The Schoenberg Database of Manuscripts contains a wealth of data about the locations, dates, auction and sales information, titles, authors, types, and physical details of medieval manuscripts produced before 1600. It holds more than 250,000 records drawn from over 12,000 sales, auction, and institutional catalogues. Its 35 data elements can be classified as either transaction or manuscript details. This allows us to examine the movements of the manuscripts across time, the sellers and buyers involved, the types, languages, and structures of the manuscripts, their provenance, and specific authors and titles.

Through our project, we hope to shed new light on these areas of history in the form of an interactive digital humanities project. In The Value of Narrativity in the Representation of Reality, Hayden White acknowledges,

“Historiography is an especially good ground on which to consider the nature of narration and narrativity because it is here that our desire for the imaginary, the possible, must contest with the imperatives of the real, the actual. If we view narration and narrativity as the instruments by which the conflicting claims of the imaginary and the real are mediated, arbitrated, or resolved in a discourse, we begin to comprehend both the appeal of narrative and the grounds for refusing it” (8-9).

White’s acknowledgement of the importance of transforming historiography into a narrative is reflected in the values with which we created our project. We analyzed the data we were given about these manuscripts, synthesized it, and transformed records and numbers into a historical narrative that tells a story different from what a textbook can tell.

Why Language?

Our project analyzes a very important aspect of everyday life: language. We took language as the lens through which to study the history of German literature and the key events it reflects. Through this lens, we were able to construct a historical narrative that allows for both an “imaginary” and a “reality,” as White describes. Our humanities research question asks, "How did key historical events in Medieval Germany influence the use of vernacular language in manuscripts?" More specifically, we focused on six key events: the Rise and Fall of the Carolingian Age, the Ottonian Kingdom, the Hohenstaufen Era, the Great Famine, the Black Death, and the Invention of the Gutenberg Printing Press.

Our overarching challenge with this project was the sheer number of records in our database. With over 250,000 rows, we needed to narrow our scope. We initially considered focusing on specific authors or time periods, but realized that the former was too specific and the latter too broad to reveal trends and insights. Therefore, we decided to compare the use of the German and Latin languages. This limited our data to a manageable 7,000 records and gave us more specific geographical boundaries for secondary research.


The main source for answering our humanities research question was the University of Pennsylvania's Schoenberg Manuscript Database. This comprehensive collection has over 250,000 records of catalogued medieval manuscripts, taking note of sales, trades, and observations. Although the organization was difficult to decipher, the database undoubtedly contains a wealth of valuable information.

Most valuable to our research question were the place, language, and manuscript date fields. To complement our main question with data visualization, we also used the "Title" field, which categorized the different types of manuscripts, such as "Book of Hours," "Psalters," "Bible," and "Medical."

Secondary research was incredibly important in understanding the complex historical material. We focused on six key historical periods or events and their relationship to language: the Rise and Fall of the Carolingian Age, the Ottonian Kingdom, the Hohenstaufen Era, the Great Famine, the Black Death, and the Invention of the Gutenberg Printing Press. Our research was predominantly conducted online, using academic journal databases such as JSTOR and the UCLA EBSCOhost Academic Search Complete. Our sources range from cultural overviews to economic history to linguistic analysis, providing key context to help analyze our data trends.

Finally, we also sourced images to add visuals to our website. To do this, we used royalty-free stock images from Wikimedia Commons and Schoenberg’s Penn in Hand.


Organization was the most complicated problem we ran into during the course of our project. With 250,000 records, thousands of duplicates, and over 20 columns in our original data set, we needed to clean the data systematically and meticulously. Using OpenRefine (formerly Google Refine), we reduced the dataset to roughly 7,000 records; the process is explained below.

First, we filtered the data by place using the keyword “Germany”. We included ambiguous data points like "Germany|Flanders."
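We did this filtering in OpenRefine, but the step can be sketched in Python with pandas; the column names and sample values below are hypothetical stand-ins for the actual Schoenberg export:

```python
import pandas as pd

# A tiny hypothetical sample; the real export has different headers
# and roughly 250,000 rows.
df = pd.DataFrame({
    "place": ["Germany", "Germany|Flanders", "France", "Italy"],
    "language": ["Latin", "German", "French", "Latin"],
})

# Keep every row whose place field mentions Germany, which also
# catches ambiguous pipe-delimited values like "Germany|Flanders".
germany = df[df["place"].str.contains("Germany", na=False)]
```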

Second, we filtered the data by language, using "German" and "Latin" as keywords. We kept ambiguous data points like "German|Latin" as their own category. After cleaning up the language variable, we loaded it into R, a statistical computing environment, for some basic analysis, such as frequency tables. We realized that the German dialects could be lumped together as simply “German”.
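We ran the frequency tables in R, but the same dialect-lumping logic can be sketched in pandas; the dialect labels here are illustrative, not the actual values in the database:

```python
import pandas as pd

# Hypothetical language values; the dialect labels are illustrative.
langs = pd.Series([
    "German", "Middle High German", "Low German",
    "Latin", "German|Latin", "Latin",
])

# A frequency table, analogous to table() in R.
counts = langs.value_counts()

# Lump dialects under plain "German", but keep mixed values
# such as "German|Latin" as their own category.
def lump(lang):
    if "|" in lang:
        return lang
    return "German" if "German" in lang else lang

lumped = langs.map(lump)
```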

Third, we removed data points with ambiguous place names such as “Germany|France” whose language was not “German”. We also created a second subset, filtered first by language using “German” as the keyword, to find places outside Germany that held German manuscripts. Using Excel, we combined this new subset with the first, taking care to avoid duplicate rows.
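The Excel merge with de-duplication can be sketched in pandas; the two subsets below are hypothetical miniatures of ours, with one deliberately overlapping row:

```python
import pandas as pd

# Two hypothetical subsets: manuscripts located in Germany, and
# German-language manuscripts held elsewhere. Row id 3 appears in both.
in_germany = pd.DataFrame({"id": [1, 2, 3], "place": ["Germany"] * 3})
german_elsewhere = pd.DataFrame({"id": [3, 4], "place": ["Germany", "Austria"]})

# Stack the subsets and drop duplicate rows, mirroring the manual
# merge we performed in Excel.
combined = pd.concat([in_germany, german_elsewhere]).drop_duplicates()
```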

Fourth, we cleaned up the “place” variable using OpenRefine’s clustering option. Additionally, we aggregated ambiguous data points such as “Germany|France” to simply “Germany”. We removed rows with pipes or blanks in "Manuscript Date" to reduce ambiguity and uncertainty. Also, because the dates were recorded as bare years, we converted them into a format readable by both CartoDB and Tableau: 01/01/YYYY.
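The date filtering and reformatting can be sketched in pandas; the sample values are hypothetical, but the 01/01/YYYY target format matches what we produced:

```python
import pandas as pd

# Hypothetical "Manuscript Date" values: bare years, plus the
# piped and blank entries we discarded.
dates = pd.Series(["1450", "1350|1400", "", "1498"])

# Keep only unambiguous single-year values (no pipes, no blanks)...
clean = dates[dates.str.fullmatch(r"\d{3,4}")]

# ...and convert each year to the 01/01/YYYY format that both
# CartoDB and Tableau can parse as a date.
formatted = clean.map(lambda y: f"01/01/{y}")
```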

Fifth, we geocoded the place names using Excel, OpenRefine, and Geocode by Awesome Table (an add-on for Google Sheets). To do this, we filtered our data set for unique place names and created a new data set. Then, using the Geocode add-on, we retrieved geocoordinates for each place name. For those place names that Geocode wasn’t able to identify, we had to manually edit them. After correcting for spelling mistakes and changing ambiguous place names accordingly, such as “Germany, southern” to “Germany”, we were then able to get the complete geocodes for the dataset. Using OpenRefine, we were able to add the latitudes and longitudes as separate columns into our cleaned-up data based on the matching place names.
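OpenRefine matched the coordinates back to our records by place name; the equivalent join can be sketched in pandas, with a hypothetical sample and approximate coordinates for illustration:

```python
import pandas as pd

# Cleaned manuscript records (hypothetical sample).
records = pd.DataFrame({"id": [1, 2], "place": ["Cologne", "Mainz"]})

# Geocoordinates retrieved for each unique place name
# (approximate values for illustration).
geocodes = pd.DataFrame({
    "place": ["Cologne", "Mainz"],
    "lat": [50.94, 49.99],
    "lon": [6.96, 8.25],
})

# A left join on the place name adds latitude and longitude columns,
# the same matching step we performed in OpenRefine.
merged = records.merge(geocodes, on="place", how="left")
```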

Sixth, we combed through the "Title" field, lumping the different types of books into 50 or so general categories, like "Psalter," "Prayerbook," "Book of Hours," and "Calligraphy." We took some liberties in lumping the data: for example, we used "Bible" for titles like "New Testament," "Old Testament," and "Genesis," and "Vitae Sanctorum" for books about the lives of Peter and Paul. Spelling and language variants were also accounted for in these broad categories. However, due to the extensive list of titles in Latin and time constraints, many titles were left as-is. We would need a Latin specialist and additional time to clean them up further.
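The category lumping amounts to a lookup table; here is a sketch in pandas with an illustrative slice of the mapping (the real mapping covered 50 or so categories):

```python
import pandas as pd

# An illustrative slice of the mapping from specific titles to broad
# categories; the titles shown are examples, not the full list.
CATEGORY = {
    "New Testament": "Bible",
    "Old Testament": "Bible",
    "Genesis": "Bible",
    "Psalterium": "Psalter",
    "Horae": "Book of Hours",
}

titles = pd.Series(["Genesis", "Horae", "Psalterium", "De Medicina"])

# Titles without a mapping (many of them Latin) are left as-is.
categories = titles.map(lambda t: CATEGORY.get(t, t))
```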


"The greatest value of a picture is when it forces us to notice what we never expected to see."
-John W. Tukey, Exploratory Data Analysis (1977)

For our project, we wanted the big “picture” to be an interactive website that lets the viewer navigate through historical background, images, maps, a timeline, and other data visualizations. In Exploring Data Visually, Yau states that “visualization as an analysis tool enables you to explore data and find stories that you might not find with formal statistical tests” (136). Through our presentation layers, we wanted to create the same experience for our viewers, so that they have a visual image of each historical event as it links to the endeavor of answering our humanities question.

For mapping, we mainly used CartoDB. Our main map is in the torque-category style, showing the frequency and locations of the manuscripts' languages over time. It is a very visual, intuitive, and colorful way to see the increasing number of German manuscripts. The map of the spread of the Black Death in Europe was created using custom, hand-constructed polygons in CartoDB; each time period has its own color to make the dissemination of the plague easy to visualize. The main map also uses a hand-constructed polygon to represent a “medieval Germany”. Because of shifting political situations, there was no official country called Germany in medieval times, so it was critical for us to indicate which parts of Europe were considered “Germany”, using a map of the Holy Roman Empire in the 9th century as a guideline.

For data visualization, we used Tableau, which let us make the manuscript data records come alive. For example, Leticia created a visualization that fuses a map, a bar graph, and a line chart to show the different locations in which manuscripts in a specific language can be found at a specific time. This visualization allows viewers to grasp the answer to our question in one visual experience and to explore it for themselves using Tableau’s interactive layout.

To create our main timeline, we used TimelineJS. In reference to timelines, Yau states that the goal is to visualize “what has passed, what is different, and what is the same, and by how much” (154). Our timeline meets Yau’s goals by showing the historical events that have passed and the story they tell across this specific period. We also show how the language shifted from Latin to the vernacular. By connecting our timeline to our narrative, we show how the culture and values of the German people changed through these historical events, and, together with the maps, how long language remained the same in Medieval Germany.

Web Tools Overview

This site was built using the Twitter Bootstrap framework, based on Start Bootstrap's Business Casual theme.

The maps were constructed on CartoDB.

The visualizations were made using Tableau.

The timeline feature was built on Northwestern University Knight Lab's TimelineJS framework.


We would like to thank Professor Miriam Posner, our wonderful mentor who introduced us to the fascinating world of digital humanities. Without her breadth and depth of knowledge of topics from data analysis to mapping and everything in between, we undoubtedly would not have been able to create a comprehensive, dynamic project! We appreciate her patience with our group's obstacles, and her constant willingness to lend a hand. Thank you Professor Posner, we had a great time in DH101!

We would also like to thank our incredible T.A. Francesca Albrezzi, who was there for our group every step of the way. Her empathetic understanding of our challenges and frustrations encouraged us to work hard and learn well. Her expertise in the various software programs we used, from OpenRefine to CartoDB, was an incredible help in troubleshooting technical issues. Her wise questions always pushed us in the right direction, especially when we felt lost in our database of 250,000 records!

Finally, Dr. Mitch Fraas from the University of Pennsylvania, our expert contact for the Schoenberg Manuscript Database, provided valuable insights to help us better understand our complex task. His knowledge about the history, context, and cataloguing of the medieval manuscripts provided key context in shaping our research questions, priorities, and organization. We're excited to see how the database evolves from here!

About Page Sources

White, Hayden. “The Value of Narrativity in the Representation of Reality.” Critical Inquiry 7, no. 1 (October 1, 1980): 5–27.

Yau, Nathan. Data Points: Visualization That Means Something. Indianapolis, IN: John Wiley & Sons, Inc., 2013, chapters three and four (99-200).