Week 5 – Spatial Analysis

How does digital mapping relate and compare to other digital humanities issues and methods? This lab will build on what you’ve already learned about your college collections, humanities data cleaning, and text analysis by asking you to extract geographic data from a college newspaper and collectively contribute to a map of your findings using open source tools. 

Goals

  • Learn how to extract geographic data from plain text
  • Practice cleaning data to meet the needs of spatial analysis
  • Develop practical knowledge of when and how to use data distribution and story maps
  • Contribute to a small-scale collaborative mapping demonstration project
  • Reflect on the pros and cons of mapping for digital humanities scholarship

Specs

  • Report of approximately 500 words
  • Report addresses the questions in the “Reflections” section below
  • Report applies ideas from this week’s readings to the questions
  • Author makes their points in clear and concise ways
  • Includes links to or screenshots of contributions to group maps
  • Upload PDF to Canvas

Lab Instructions

There are a number of open source and proprietary mapping tools out there, and which one you use will depend on your goals, your access, your financial resources, and a number of other factors. The aim of this lab is to give you a sense of the steps involved in any digital mapping process by using a suite of easily accessible web-based tools. Follow the steps below as best you can, but focus on the big picture. You will be graded on your reflections on the issues raised by this methodology rather than on your technical expertise with any single step below.

Overview

  1. Obtain a digitized copy of a recent issue of your college newspaper
  2. Extract a geographic dataset of places mentioned in the newspaper issue using https://recogito.pelagios.org 
  3. Clean and organize your data to optimize for mapping
  4. Add your data to a collective point map of places mentioned by each college newspaper using the Leaflet Maps with Google Sheets template
  5. Begin to explore the meaning of your data by adding a chapter about one place mentioned in your issue to a story map using the Leaflet Storymaps with Google Sheets template

Output Maps

These two maps will update with our collective information as you contribute your data to them.


Sources

  • Get the text of a recent college newspaper as your source
  • Go to your college’s newspaper collection and choose one of the most recent student newspaper issues you can find [because there are several students from the same colleges, there may be overlap, which is fine]
  • Download the OCR’d text, if available, or download the PDF.
  • Extract the text from the PDF. The easiest way is to Select All in the PDF and paste into a new document in a text editor (if you don’t have one, I recommend the free Atom editor).
  • Save as a plain text file and make sure to set the encoding to UTF-8 in the save dialogue or Recogito won’t accept it.
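
If your text editor does not show an encoding option, the re-save can also be scripted. The following is a minimal Python sketch, not part of the lab's required workflow. The function name `save_as_utf8` and the Windows-1252 fallback are assumptions made for illustration; text copied from a PDF may use a different encoding.

```python
from pathlib import Path


def save_as_utf8(source: Path, target: Path, fallback: str = "cp1252") -> str:
    """Re-save a text file as UTF-8 and return the encoding it was read with."""
    raw = source.read_bytes()
    try:
        text = raw.decode("utf-8")
        used = "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback
        # encoding; undecodable bytes are replaced rather than raising.
        text = raw.decode(fallback, errors="replace")
        used = fallback
    # Write bytes directly so line endings are left exactly as they were.
    target.write_bytes(text.encode("utf-8"))
    return used
```

A call such as `save_as_utf8(Path("issue.txt"), Path("issue_utf8.txt"))` (both file names hypothetical) would write a UTF-8 copy next to the original.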

Process

Named Entity Recognition and Geocoding

Our goal is to identify all the places mentioned in your issue (or at least on the first page, if the issue is very long) and add geographic coordinates to them (geocode them). You could read for these manually, add them to a spreadsheet, and look up latitude and longitude coordinates for each, but we are going to speed up the process by having an algorithm find named entities and compare them to dictionaries of place names (called gazetteers) using a tool called Recogito.
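
To make the idea of a gazetteer lookup concrete, here is a toy Python sketch. It is not how Recogito or Stanford CoreNLP work internally: real NER uses trained language models and must disambiguate places that share a name. The three-entry `TOY_GAZETTEER`, its approximate coordinates, and the function name `find_places` are all invented for illustration.

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
TOY_GAZETTEER = {
    "Philadelphia": (39.95, -75.17),
    "Chicago": (41.88, -87.63),
    "New York": (40.71, -74.01),
}


def find_places(text, gazetteer):
    """Return (name, lat, lon, character offset) for each gazetteer name in text."""
    matches = []
    for name, (lat, lon) in gazetteer.items():
        # Match the name as a whole word or phrase, not inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for hit in re.finditer(pattern, text):
            matches.append((name, lat, lon, hit.start()))
    # Order the matches by where they occur in the text.
    return sorted(matches, key=lambda match: match[3])


sample = "The choir will travel from Philadelphia to Chicago in March."
for name, lat, lon, offset in find_places(sample, TOY_GAZETTEER):
    print(f"{name} (offset {offset}): {lat}, {lon}")
```

A plain dictionary lookup like this cannot tell which of several same-named places is meant, which is one reason the steps below ask you to confirm each suggested location by hand.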

  • Sign up for a free account at recogito.pelagios.org.
  • Read through the Quick 10 Minute tutorial paying particular attention to Step 3: Identify and map places
  • Upload the .txt version of your college newspaper you saved above
  • Click the document once, then choose Named Entity Recognition from the Options drop-down menu in the top right corner (Video instructions at 1:55)
  • In the NER dialogue, choose Stanford CoreNLP en to use the English language recognition engine
  • Uncheck all available Authority Files, and then check only the GeoNames gazetteer to search for modern place names in English
  • Click Start NER
  • Once it finishes, double click the document name to open the annotation viewer
  • You should see identified Persons in blue and Places in green (if you don’t, make sure to choose COLOUR: By Entity Type in the top options bar)
  • Click on each green highlighted word to confirm if the locations are correct and change them if not
  • On hitting OK, if you are prompted to Re-Apply, choose Yes & merge existing annotations to update all references to that place
  • You only need 5-10 confirmed locations for this demo, so feel free to stop if you get many more
  • Once you have corrected the locations, switch to the Map View in the top toolbar to view and verify the results in the exploratory map visualization
    • Click around the Map View to see what this data viz offers: annotations in context, links to text, color and filter options, etc.
    • How could this help you ask or answer new questions of your text?

Data Cleaning

If you were just doing a project for yourself, the Recogito map might be all the exploratory data analysis you need, but we are going to try to combine your individual data into a collaborative project. For that, we need to export our data, and Recogito offers many export formats. Our goal is a clean list of each unique place mentioned, with its lat/long coordinates.

  • Choose Download Options from the top menu
  • Note that you could get GeoJSON or KML files if you are using advanced GIS software (see the optional bonus exercise below), but we are going to download annotations as CSV to get a list of all persons and places identified
  • Practice your OpenRefine skills (or just use Excel) to make the following changes
  • Save your CSV
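
If you would rather script the cleaning than do it in OpenRefine or Excel, the sketch below shows one way to reduce an annotation export to one row per unique place with coordinates. The column names `QUOTE_TRANSCRIPTION`, `TYPE`, `LAT`, and `LNG` are assumptions about the CSV header, as is the `PLACE` type label; check the first row of your own download and adjust the parameters. The sample rows are made up.

```python
import csv
import io


def unique_places(csv_text, type_col="TYPE", name_col="QUOTE_TRANSCRIPTION",
                  lat_col="LAT", lng_col="LNG"):
    """Return one (name, lat, lng) tuple per unique place that has coordinates."""
    seen = set()
    places = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get(type_col) or "").strip().upper() != "PLACE":
            continue  # skip persons and any other entity types
        name = (row.get(name_col) or "").strip()
        lat = (row.get(lat_col) or "").strip()
        lng = (row.get(lng_col) or "").strip()
        if not name or not lat or not lng:
            continue  # skip places that were never geocoded
        key = name.lower()  # treat "Chicago" and "chicago" as the same place
        if key in seen:
            continue
        seen.add(key)
        places.append((name, float(lat), float(lng)))
    return places


# Made-up sample in the assumed column layout.
sample_export = """QUOTE_TRANSCRIPTION,TYPE,LAT,LNG
Chicago,PLACE,41.88,-87.63
Jane Doe,PERSON,,
chicago,PLACE,41.88,-87.63
Atlantis,PLACE,,
"""

for place in unique_places(sample_export):
    print(place)
```

Deduplicating on the lowercased name is a simplification: two different towns with the same name would be merged, so a stricter version might key on the gazetteer URI instead, if your export includes one.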

Presentation

Point map: compare data trends

I have set up a template map to compare the places mentioned in our newspapers so that we can perform some data distribution analysis. You add data to this map by contributing to the Google Sheet below and filling in additional attribute values, so that your points share your college’s color and show an informational pop-up.

DATA ENTRY GOOGLE SPREADSHEET

LIVE INTERACTIVE MAP (click to open full screen)

Instructions: Tutorial with much more information 

  • Import data into the map by going to the POINTS tab in the DATA ENTRY sheet linked above
  • Copy the list of Places from your cleaned Recogito export into the Names column
  • Copy the corresponding Lat and Long columns into their equivalents
  • Click on the LIVE INTERACTIVE MAP link above to refresh and see your data points!
  • Now you’ll add additional meaning through symbology to make patterns more easily recognizable.
  • Set the following values on the first record, and copy/paste them to fill down all your rows
    • Group = COLLEGE NAME (so we can filter between schools to compare geographic spread)
    • Marker Color = [the value for the college whose newspaper you used] (so you can see at a glance which points are from which college)
    • Description = Your newspaper title and date of issue (so point clicks will show the source of the information)
  • Play around with the filters and compare the data.
    • What patterns do you see?
    • Any unexpected clusters or outliers?
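
The copy-and-fill-down steps above can also be prepared in a script and then pasted into the sheet. This is only a sketch: the column labels follow the wording used in this lab (Name, Lat, Long, Group, Marker Color, Description) and may not match the exact headers or column order in the POINTS tab, so compare against the sheet before pasting. The college name, color, and newspaper title are placeholders.

```python
import csv
import io

# Column labels as described in this lab; the sheet's real headers may differ.
FIELDS = ["Name", "Lat", "Long", "Group", "Marker Color", "Description"]


def build_point_rows(places, group, marker_color, description):
    """Attach the same group, color, and description to every (name, lat, lng)."""
    return [
        {
            "Name": name,
            "Lat": lat,
            "Long": lng,
            "Group": group,
            "Marker Color": marker_color,
            "Description": description,
        }
        for name, lat, lng in places
    ]


def to_tsv(rows):
    """Render rows as tab-separated text, which pastes cleanly into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values for illustration only.
rows = build_point_rows(
    [("Chicago", 41.88, -87.63)],
    group="Example College",
    marker_color="blue",
    description="Example News, 1 Jan 2020",
)
print(to_tsv(rows))
```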

Story map: make arguments and make meaning

I have also set up a template story map to dig into some of the meaning behind these places’ inclusion in the campus newspapers. As above, you add data to this map by contributing to a separate Google Sheet, linked below.

DATA ENTRY GOOGLE SPREADSHEET FOR STORYMAP

LIVE INTERACTIVE STORYMAP 

Instructions: Tutorial with much more information 

  • Choose one location mentioned in your issue that is most interesting to you
  • Import data into the Leaflet Storymaps with Google Sheets map by adding a new row with the following information
    • Required
      • Chapter: The title of your story
      • Description: A brief (<500 characters) summary of the reason the place was mentioned in the newspaper
      • Location: Human readable place name
      • Lat/Long: Computer mappable coordinates
    • Optional
      • Media/Media Credit/Media Credit Link: Visual content to enhance your chapter
      • Play around with zoom levels and other marker options to consider what makes most sense for your story
  • What types of project would this style of map lend itself to?

CONGRATULATIONS!

Reflections

  • Describe your experience working with extracting and visualizing geographic data. How does it build on, contrast with or compare to working with other types of data you’ve explored so far?
  • Describe your experiences working with the specific tools used here. Were you able to complete all the exercises? What problems did you encounter?
  • What benefits can you see of using mapping for exploratory data analysis (as in the Recogito map) and/or explanatory data analysis (as in the Google Sheets template maps)?
  • What kinds of research questions or projects could you imagine using these tools for, especially regarding social justice collections? Give at least one example of a project idea.

Submission Details

  • Submit the lab report as a PDF to Canvas by the end of the day (local time) on Saturday, July 16
  • You can write the report in Google Docs, Word, Pages, or another application. Just be sure to save as a PDF

OPTIONAL Bonus Exercise (Intermediate)

The instructions above largely used web tools and spreadsheets to create maps for you, which hid both the geometric data structures and the code. This optional advanced exercise opens the door to some more complexity for those who want to dig into the code a bit deeper. I’ll show you how to take your Recogito output as a GeoJSON file (an open standard web data format) and use a FOSS tool called geojson.io to explore the data and/or customize a map you can host in GitHub Gists (if you create a free account) and share on the web.

  • Go back to your document in Recogito, click the Download icon and download your Place data as GeoJSON
  • Launch geojson.io and drag the file you just downloaded (e.g. “gt54k6w8jwccav.json”) onto the map window
    • You should get a similar visualization to Recogito’s Map View on the left, but this time with an editable code viewer of the GeoJSON source in the right pane. This is a great way to interrogate the structure of a JSON file and figure out how JavaScript mapping libraries parse the stored data.
  • Let’s compare a mapped point with its GeoJSON representation
    • Click “Table” at the top of the right pane to see a list of your titles and number of annotations (the “properties” object values)
    • Click on the first city or point location and the map should center it.
    • Zoom in until you can see it and make sure it is a point.
    • Toggle back to the JSON pane and compare the map popup to the code.
      • Can you tell what does what?
    • Change the color of the marker to something else from the color picker in the popup and Save
      • What changes in the “properties” object?
    • Repeat the process in reverse by changing the “properties” “marker-color” key to a new web color value, e.g. “#000000” or “black”
      • What changes in the map view?
    • Continue to explore, add or delete fields on your points, or delete features entirely, until you have a map you like. You can even use the drawing tools on the right of the map to create new geometry from scratch.
  • Finally, you can save the map if you log into (or create) a GitHub account and then embed it anywhere on the web
    • Click “Login” in the top right corner
    • Sign in to GitHub (if you have an account) or create a new free one. (If you think you might do a lot of web development/coding, you can also check out the Student Developer Pack, which is a great free value while you are in higher ed.)
    • Save your map to a Gist by choosing Save > Gist or just Cmd/Ctrl+S.
    • You should now see a GitHub icon and gist URI in the menu bar
      • You could share the URL to this page and reload the map as you see it
      • You can also click the GitHub icon to go to the gist on GitHub, where you can share an uneditable clean map directly as a link, a repository, or an embed
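
To make the comparison between a mapped point and its GeoJSON representation in this exercise more concrete, here is a minimal Python sketch that builds a one-point FeatureCollection and then changes its marker color, mirroring the edit you make by hand in the geojson.io code pane. The `title` property name and the sample coordinates are assumptions for illustration; your Recogito export will carry its own properties. Note that GeoJSON stores coordinates as [longitude, latitude], the reverse of the lat/long order used in the spreadsheets.

```python
import json


def point_feature(title, lat, lng, marker_color="#7e7e7e"):
    """Build a GeoJSON Feature for a single point with a styled marker."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude].
            "coordinates": [lng, lat],
        },
        "properties": {
            "title": title,  # assumed property name, for illustration
            "marker-color": marker_color,
        },
    }


collection = {
    "type": "FeatureCollection",
    "features": [point_feature("Chicago", 41.88, -87.63)],
}

# Equivalent to editing the "marker-color" value in the code pane.
collection["features"][0]["properties"]["marker-color"] = "#000000"

print(json.dumps(collection, indent=2))
```

Pasting the printed JSON into the geojson.io code pane is one way to check that the structure behaves as expected.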