Jesse Koblin – Week 7 Project Update

LACOL Digital Humanities 7/27/23

For my work this week analyzing and transcribing texts from the Vassar Digital Library’s Jasper Parrish Papers, I used Microsoft Vision Studio for data preparation and Voyant Tools for exploratory data visualization. Vision Studio is a useful tool for text transcription: it uses Azure’s AI-based optical character recognition (OCR) to convert English script manuscripts into a digital format. Its output was not always precise, so I had to re-transcribe passages myself to confirm the accuracy of the produced text. Nonetheless, Vision Studio saves time and physical effort by automating the initial typing process, alleviating the tedium of transcribing digitized documents. After assembling and correcting the finished transcripts, I used Voyant Tools to visualize the data. Voyant’s primary strengths are its wide range of visualizations and its streamlined UI, including a feature for comparing distinct texts. Although Voyant has some limitations, such as limited customization of its visualizations, it was the right tool for exploring the Jasper Parrish corpus.

I used Voyant’s topic modeling tools to identify correlations, collocations, trends, and thematic overlaps across the Jasper Parrish Papers. Topic modeling is a practical approach to text analysis because it treats writing as quantifiable data, such as word frequencies and co-occurrences, and surfaces patterns that are hard to see within language’s semantic and formal rules. It is especially useful for drawing out themes that close reading alone might miss, supporting a more comprehensive understanding of the object. This approach helped me recognize a trend of familial metaphors across the texts, with the U.S. President consistently called “Father” and the corresponding U.S. and Native representatives called “Brothers.” Meeks and Weingart’s article “The Digital Humanities Contribution to Topic Modeling” from Week 4 was an indispensable resource for understanding this approach and led me to discover MALLET, a topic modeling tool I have been experimenting with in addition to Voyant.
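To make the idea of a collocation concrete, here is a minimal Python sketch that counts which words appear near a keyword. It is not how Voyant or MALLET work internally, and it is a simple co-occurrence count rather than topic modeling. The function name `collocates`, the window size, and the sample sentence are all assumptions made for illustration; the sentence is invented and is not a quotation from the Jasper Parrish Papers.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5):
    """Count words found within `window` tokens of each occurrence of `keyword`."""
    # Lowercase and keep only alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the keyword occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, invented for this sketch.
sample = (
    "Brothers, your Father the President sends his words to you. "
    "Brothers, listen to your Father."
)

if __name__ == "__main__":
    print(collocates(sample, "father", window=2).most_common(3))
```

Run over the full transcripts, a count like this could be one way to check whether “Father” really does cluster around references to the President.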

My data cleaning first involved transcribing the texts as accurately as possible. The texts in the Jasper Parrish Papers span the 18th and 19th centuries and follow archaic spelling and grammar conventions, so I normalized these elements into modern English to create a consistent dataset. I also ordered the texts chronologically so that Voyant visualizations could show the progression and trends of specific terms over time. Finally, I separated the text objects and created an Excel spreadsheet to assign metadata to each object, including the creation date, creator, accession date, place of creation, materials, genre, culture, and format. I opted for separation rather than consolidation because I want to be able to analyze these objects independently and assign them independent properties using metadata, allowing for easier data manipulation and visualization.
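The two cleaning steps described above, spelling normalization and chronological ordering, could be sketched in Python roughly as follows. I did this work by hand, so this is only an illustration of the logic: the spelling map, the record fields, the titles, and the dates are all hypothetical placeholders, not entries from the actual corpus.

```python
import re
from datetime import date

# Hypothetical archaic-to-modern spelling map; a real one would be
# compiled while reviewing the transcripts.
SPELLING = {"shew": "show", "chuse": "choose", "publick": "public"}


def normalize(text, mapping=SPELLING):
    """Replace whole-word archaic spellings, keeping an initial capital."""
    def swap(match):
        word = match.group(0)
        modern = mapping.get(word.lower())
        if modern is None:
            return word  # leave words that are not in the map untouched
        return modern.capitalize() if word[0].isupper() else modern

    return re.sub(r"[A-Za-z]+", swap, text)


# Hypothetical records standing in for the separated text objects.
records = [
    {"title": "Document B", "created": date(1802, 3, 14), "text": "I chuse to write."},
    {"title": "Document A", "created": date(1794, 11, 11), "text": "Shew the publick."},
]

# Normalize each text, then sort oldest first so trends read left to right.
cleaned = [dict(r, text=normalize(r["text"])) for r in records]
ordered = sorted(cleaned, key=lambda r: r["created"])

if __name__ == "__main__":
    for r in ordered:
        print(r["created"].isoformat(), r["title"], "|", r["text"])
```

Keeping each object as its own record, rather than merging everything into one file, is what makes it possible to attach separate metadata fields to each text.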

Finally, I corresponded with the registrar of Vassar College’s Frances Lehman Loeb Art Center (my work study location during the school year!) and acquired a dataset containing 238 objects of Native American origin. The dataset features extensive metadata on each object’s accession date, medium, culture, creator, and creation date, along with a description box providing archival contextualization. I have been manipulating this dataset in OpenRefine and faceting it on different variables. With some cleaning and consolidation, it could form a critical substantiating part of our thesis for the final project and pair well with museum metadata from Amherst and Williams College.
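For readers unfamiliar with faceting, the sketch below approximates in Python what a text facet on one column shows: a count of rows for each distinct value. It is not OpenRefine’s implementation. The function name `text_facet`, the column names, and every row in the sample are hypothetical placeholders rather than data from the Loeb dataset.

```python
import csv
import io
from collections import Counter


def text_facet(rows, column):
    """Count rows per distinct value in `column`; empty cells become '(blank)'."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()  # trim stray whitespace
        counts[value or "(blank)"] += 1
    return counts


# Hypothetical rows, invented for this sketch.
SAMPLE_CSV = """accession_date,medium,culture
1995,Basketry,Culture A
2001,Beadwork,Culture B
2001,basketry ,Culture A
1988,,Culture A
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

if __name__ == "__main__":
    for column in ("culture", "medium"):
        print(column, dict(text_facet(rows, column)))
```

In this sample, “Basketry” and “basketry” land in separate buckets because the comparison is case-sensitive, which is the kind of inconsistency that cleaning and consolidation would need to resolve.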

jkoblin@vassar.edu
