This is the second in a series of guest posts on the subject "Medieval Studies in the Age of Big Data," described in an earlier post here (the first contribution, by Martin Foys, can be found here). Timothy Stinson teaches at North Carolina State University and is the co-founder of the Medieval Electronic Scholarly Alliance (MESA).

Has medieval studies entered the age of big data? Yes and no, but certainly less so than many would have you believe. Such a claim may seem surprising given that medievalists were among the earliest adopters of computing technologies for humanistic inquiry and that many of the most prominent and innovative digital humanists working today are medievalists. All of that is true, but the fact is that almost all of us lack the raw materials to do “big data” in our fields of specialization. No one knows for certain how many medieval manuscript books have survived to our time; conservative estimates place the count at around 600,000, but others argue that the number is more likely as high as one million. And of course we need to add to this the very large (but unknown) number of loose leaves from codices as well as documents that often weren’t bound into codex form, such as leases, charters, and papal bulls. But our big data is still on the shelf: of these surviving medieval codices, fewer than one percent have been digitized, and of that group, most exist in digital form as images unaccompanied by textual transcriptions that can be mined or machine processed.

In his guest post for In the Middle, Martin Foys recalls a question that Dan O’Donnell posed at a Kalamazoo session devoted to digital scholarship some years ago, a question that he says has haunted him since:

“Well, okay, is all this just improving what we already do, or is it actually changing what we do?” Sitting there, I was appalled — not at the question, but at the answer I myself had — which was “no, not really.”

A couple of years ago, I had the good fortune of attending a workshop entitled “Digital Manuscript Uses and Interoperation” that was sponsored by Stanford University Libraries & Academic Information Resources (SULAIR). Both Martin and Dan, whom I count among my friends and among those doing some of the most important work in our field, were present, and I witnessed something of a reprise of this conversation. Martin argued (as he does in his post) that too many digital humanists are in thrall to the idea of doing things “better, faster, stronger (a.k.a. the Bionic Man hermeneutic),” i.e., that they try to use computing technologies to amplify what they do rather than to reimagine it.

For the most part, I am in agreement with Martin and Dan; the questions we think to pose and the professional tools we have devised to study medieval books — close reading, monographs, journal articles — are shaped by and indeed usually themselves take the form of the codex. And much early digital work has been and remains bookish, sometimes in ways that are counterproductive. But I’d like to offer a caveat here, or perhaps an addendum to the conversation — and one that I offered during the SULAIR meeting. There are times when a sufficient change in scale or speed — better, faster, stronger — is itself revolutionary. And I propose here that we face such a moment in medieval studies. Yes, we can and should ask new questions and invent new forms for disseminating our research. And we have made great strides in building new tools for analyzing manuscripts, with exciting projects such as T-PEN, Shared Canvas, the DM Project, and Virtual Vellum underway. But if we really want to do big data, and if we really want to see the full potential of these tools, we need the other 99% of manuscripts digitized.

Together, what we’ve done with the (not yet) one percent of digitized medieval books as a community offers cause both for celebration and for caution. On the one hand, some of the resources created thus far are already mind-blowing to those of us used to doing things the old way (i.e., working with manuscripts only in person or on microfilm and encountering most medieval texts in modern printed books). Imagine what a medievalist of a few decades ago would have thought about the possibility of viewing nearly all of the medieval manuscripts in Switzerland for free, any time of day, on e-codices or of having the entirety of the Parker Library to browse at will. And the scale of what we’re able to do has been greatly amplified already, especially for particular texts or libraries. For example, I recently wrote an article on Faus Semblant, an allegorical figure from the 13th-century Old French poem Roman de la Rose. Thanks to the Roman de la Rose Digital Library, I was able to view 2,925 images in 126 illuminated manuscript copies of the poem, finding along the way 170 that depicted Faus Semblant. I studied manuscripts from the Bibliothèque nationale in Paris, from municipal libraries all over France, from Oxford, Geneva, Baltimore, Philadelphia, New York, Chicago, and Los Angeles. I was able to return to these books again and again and compare any of them side by side, and getting through a far larger number of books much faster changed the research questions that I was able to pose and answer. These projects, like the tools mentioned above, are great success stories.

But in other ways, our approach thus far to digitizing medieval books — and how usage of the data is restricted after the fact — has set us on a path that frustrates the potential for “actually changing what we do” (as Dan put it) on a large scale. In the current landscape, digitization of medieval manuscripts is expensive and proceeds at a fairly slow pace. It is also rather ad hoc. There’s not much sex appeal in digitization in and of itself, or much funding for the same, and thus budgets for digitization are often attached to innovative editorial or archival projects. Digitization often slices and dices, taking out the portions of a codex needed for a given project. On top of this, images are usually accompanied by restrictions on use, are sometimes behind paywalls, and often are attached to a software infrastructure that frustrates repurposing of data (e.g., you might be able to view an image in a proprietary image viewer, but cannot download a high-resolution file for use with other tools). And there are no community-wide solutions that streamline the process of acquiring and working with these images: each scholar who wants to include digitized manuscripts in a project has to enter a lengthy period of fundraising and negotiating rights to the images, and frequently needs to build new tools for accessing the manuscripts online.

I don’t mean to be critical here, or at least not unfairly so. I myself have worked on a number of projects that are subject to precisely the problems that I articulate above. I have digitally sliced and diced. I’ve loaded images into viewing software that restricts who can download and repurpose them and posted warnings that images can’t be reused without permission. In the current climate, we all have to do such things. There are no bad guys here — scholars, librarians, and funding agencies are by and large making honest efforts to do their best with limited resources, and they all have very valid concerns and needs that drive their decisions. There are a number of good reasons why things are as they are: medieval books are fragile and thus we can’t simply digitize them en masse using the technologies that Google Books has utilized in digitizing print books; libraries face shrinking budgets and real concerns about misuse of images of their books online; and even if we could digitize all of our books, what medievalists could do with a million books is much different than what others could do with a million books, as we have no OCR technology that is capable of deciphering medieval hands.

To return to Martin’s idea of the Bionic Man: I’m not challenging his idea so much as borrowing it and tweaking it to make a point. I’d like to suggest that the single greatest need we face as a community is to develop a better, faster, stronger approach to digitizing, and that digitizing much more much faster will actually change what we do in a fundamental way. The most revolutionary thing we could do at this moment looks almost mundane: we need to digitize lots of books. And we need to do so in a way that takes into account everyone’s needs — librarians, scholars, funders, the public — while offering the fewest restrictions on the uses of the resulting data. If you want to revolutionize medieval studies, digitize the British Library. Digitize the Bibliothèque nationale. Digitize the Bodleian Library, the Vatican Library, the Bayerische Staatsbibliothek, and the Staatsbibliothek zu Berlin. Big data is out there for medievalists, but it is still bound in books awaiting digitization.

Timothy Stinson is assistant professor of English at North Carolina State University. He has published articles on the Alliterative Revival, printing history, codicology, manuscript illumination, and the application of genetic analysis to the study of medieval parchment. He is editor of the Siege of Jerusalem Electronic Archive, co-founder and co-director of the Medieval Electronic Scholarly Alliance (MESA), and co-director of the Piers Plowman Electronic Archive. His research has received funding from the National Endowment for the Humanities, the Andrew W. Mellon Foundation, the Bibliographical Society of America, and the Council on Library and Information Resources. Dr. Stinson’s research on the genetic analysis of medieval parchment has been featured by numerous magazines, news organizations, and radio news programs, including The Chronicle of Higher Education, National Geographic, Scientific American, and the BBC’s The World Today.

