The Schoenberg Database of Manuscripts holds a large amount of data on the locations, dates, auction and sales information, titles, authors, types, and physical details of medieval manuscripts produced before 1600. It has more than 250,000 records drawn from over 12,000 sales, auction, and institutional catalogues. Each of its 35 data elements describes either a transaction or a manuscript. This allows us to research the movement of the manuscripts over time, the sellers and buyers involved, the types, languages, and structure of the manuscripts, their provenance, and specific authors and titles. We hope to gain new insights in these areas, which are the ones most likely to contribute to our digital humanities project.
Larry Schoenberg began a database of manuscripts in 1997, and other scholars have since contributed to the metadata repository. According to the Schoenberg online database, Schoenberg began with a Microsoft Excel file that was converted to a Microsoft Access database in 1999. As the database grew in popularity among scholars and researchers of manuscript studies, so did the need to make it accessible to a wider audience. Therefore, in 2005, the Schoenberg Center for Electronic Text and Image (SCETI) began hosting the database, making it freely available on the web. In 2007, the SDBM joined forces with the Jordanus Database of Scientific Manuscripts at the University of Munich, bringing the number of records to more than 125,000. By June of 2007, the Schoenbergs and Penn Libraries had begun a partnership that enabled SCETI to gather a group of experts with the skills to contribute new records to the database.
Although our dataset is very large, a lot of important information is still missing. We are left not knowing what kind of manuscripts we are working with. A large part of the data is supposed to tell us the selling and buying history of the manuscripts, yet many records lack the price of the manuscript or any indication of whether it was sold. Because we cannot see excerpts from most of the manuscripts, we have no access to internal features such as side notes, pictures, and writing style, information that would allow researchers to make more connections and arguments. Furthermore, because the descriptions come from secondary sources, the accuracy of the manuscript details is uncertain. Beyond this missing information, many of the variables are left unexplained, which is problematic because the same piece of information can be interpreted in different ways. Because we are using this dataset to draw specific conclusions, we need a clear understanding of what each data type means.
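To put a number on how much of the sales history is missing, we could count the blank values in each field. The sketch below is only an illustration of that idea: the column names `price` and `sold` are our own assumptions rather than the database's actual field labels, and the four sample rows are invented placeholders, not real SDBM records.

```python
import csv
import io

# Hypothetical column names; the real SDBM export may label these differently.
FIELDS = ["price", "sold"]

# Invented placeholder rows, used only to show the calculation.
SAMPLE_CSV = """id,title,price,sold
1,Book of Hours,1200,yes
2,Psalter,,
3,Bible,,no
4,Missal,300,
"""


def missing_rates(rows, fields):
    """Return, for each field, the share of rows where it is blank or absent."""
    rows = list(rows)
    if not rows:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        # A value counts as missing if the key is absent, None, or only whitespace.
        blank = sum(1 for row in rows if not (row.get(field) or "").strip())
        rates[field] = blank / len(rows)
    return rates


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = missing_rates(rows, FIELDS)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} of records missing")
```

Run against a real export, a table like this would show which fields are complete enough to support conclusions about sales and which are not.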
The dataset’s ontology revolves around documenting sales and auction catalogues of medieval manuscripts. Its categories are clearer to scholars and aficionados of manuscript studies than to the rest of the public. Judging by the terms used, the categorization was created with a specific audience in mind. For example, provenance, catalogue I.D., and circa have raised many questions in our group. Without looking to other sources, we have been, and still are, confused about their definitions, how they are used, and why they are important to the dataset. It is hard to grasp what the data elements refer to without prior knowledge of manuscript studies and of the process of cataloguing medieval manuscripts. This limits the accessibility of the database for the general public, and it also limits the scope of data collection regarding manuscripts and auction catalogues. If our dataset were our only source, we would be missing background information, context, definitions of the terms used for categorization, information on the importance and historical significance of the manuscripts, the meaning of the data types, the auctioneering terms on which its ontology relies, and visual representations of the manuscripts.