How I Learned To Stop Worrying and Love Big Data (Medieval Studies in the Age of Big Data I)

***

Editorial Note

I’m delighted to publish the first in a series of guest posts on the subject “Medieval Studies in the Age of Big Data,” described in an earlier post here. The first contributor, Martin Foys, is a leading figure in the field of digital humanities as well as an award-winning scholar of Anglo-Saxon textual cultures. His full bio follows the post; he can be found on Twitter @martinfoys. Thanks to Martin for kicking off this forum so provocatively!

***

The title of this post derives from Dr. Strangelove, in which the bomb that “Big Data” here replaces ends up destroying the world as we know it, and not in a good way, due in part to the insanity of the titular mad scientist. For current purposes, Dr. Strangelove is aptly a man-machine hybrid – using a wheelchair for mobility, possessing a mechanical hand that often disobeys his own will, and promoting a computer-driven program of eugenics to selectively breed a superior human. Though one can sense a faint whiff of doomsday in some descriptions of large-scale, digitized information and the ways it is used, concerns about big data generally are not Cold War fears of a nuclear holocaust, and human-machine ecologies are not simpatico with mad scientists plagued by rebellious prosthetic arms; this is not what I am after. Rather the opposite: technologies of information and the ecological dynamic we have with them are not alien, but organic, and derived from our own informational needs. Historically, they are of our own making, and continue to be so. In medieval studies and elsewhere, big data will be as good or as bad as we allow.

An anxious sentiment colors many of the discussions on emergent work of scholars in the age of so-called Big Data. There are other sentiments as well–excited, liberating, or revolutionary are all there for the easy finding. But it’s the anxious strain I find myself thinking about more, perhaps because it seems so pervasive. “Even at a smart Economist conference, mention of big data brings the adjective Orwellian,” media critic Jeff Jarvis recently tweeted. Sometimes this anxiety manifests outrageously, as we see with Steven Marche, the Digital Humanities’ favorite straw man:

BIG DATA IS COMING for your books. It’s already come for everything else. All human endeavor has by now generated its own monadic mass of data, and through these vast accumulations of ciphers the robots now endlessly scour for significance much the way cockroaches scour for nutrition in the enormous bat dung piles hiding in Bornean caves.

But that’s an easy one, as Holger Syme deftly demonstrates. Marche’s reaction is one of a rhetorical extremist at the far end of the pendulum swing, even if it does tug at the taproot of what makes many vaguely uncomfortable when they hear the term. More revealing, I think, is the tone of Bethany Nowviskie’s outstanding MLA talk on the new nature of materialism in Digital Humanities. Describing the first of three factors which have set the current form of Digital Humanities, Nowviskie notes,

The first . . . starts with the massive, rapid, and inexorable conversion of our material cultural inheritance to digital forms. Hand-crafted, boutique digitization by humanities scholars and archivists (in the intrepid, research-oriented, hypothesis-testing mode of the ‘90s) was jarred and overwhelmed by the mid-2000s advent of mass digitization, in the form of Google Books. Least-common-denominator commercial digitization has had grave implications not only for our ability to insert humanities voices and perspectives in the process, but also for our collective capacity and will to think clearly about, to steward, and to engage with physical archives in its wake.

The concerns raised throughout Nowviskie’s thoughtful, inspiring and beautiful meditation on words, objects, digitality and resistance to the material are not to be laid aside anytime soon – especially her final assessment of the casual treatment of digital humanities labor in the academic sector (and, I would add, elsewhere as well).

Still, it is worth pausing to look at the lexicographic lines drawn in the passage above. You have a juggernaut of large-scale and automated techne – massive, rapid and inexorable, which jars and overwhelms the intrepid and boutique hand-crafted scholarly work which comes before it. The situation is grave, for those who wish to insert the voices and perspectives of humanity.

But wait. Here is where I think the opportunity for response really begins. We’ve been having this kind of conversation about modern technology for quite a while. Heidegger worried that modern machines so fundamentally re-ordered the world that they also compromised humanity’s “authentic sense of being.” To go back further (but closer in terms of homology), in her recent Divine Art, Infernal Machine, Elizabeth Eisenstein unpacks the history of reception of the printing press, noting that negative and positive attitudes existed from the beginning — though Eisenstein, in a move similar to Nowviskie’s, is at pains to distinguish the objections to the capitalistic commodification of mass printing by “iron presses” from the generally favorable view of earlier, smaller-scale wooden handpresses.

There are all kinds of directions to go from here, but the one I’ll take is this: concerns about big data are nothing new. We’ve had them for a long time, because we’ve had big data for a long time. Writing produces bigger data by virtue of the fact that it physically takes up space, and, through persistent substrates, time as well. Humanists have always swum against an incessant and inexorably accumulative informational tide, trying to come up with intellectual and material methods for mining and managing the data. In the third century, Origen of Alexandria produced the Hexapla, to collate Hebrew and Greek versions of Old Testament Scripture in six parallel columns; Jerome pored through the Hexapla as well as other Hebrew, Greek and Latin versions of Scriptures to produce the Vulgate; more than a thousand years later various printed Polyglot Bibles reverse-engineered this work, presenting Latin, Greek, and Hebrew versions in three columns. Aquinas aspired to big data; his attempt to parse the totality of existence resulted in the not tiny Summa Theologica. In one of the events signaling the birth of medieval studies, Matthew Parker faced the problem that his big data was shrinking and did everything he could to stop it. With print, the need to aggregate and tame unruly data only intensified as more and more data were more easily and quickly produced. In the field of medieval studies we see the eighteenth-century explosion of concordances, indices and thesauri to provide guides to surviving texts, artifacts and language, then the proliferation of nineteenth-century editions, and then the profusion of twentieth-century critical interpretation. The twenty-first-century digital repositories and algorithmic resources to explore them are simply the next station stop. We cannot cease producing data. We made the big data to begin with.

Of course it can be (and has already been, and may well hereafter be again) argued that this selective and convenient genealogy is too simple. Any number of arguments can be posed: big data today is different because it operates on a scale several orders of magnitude greater than these precedents, it fragments and disembodies prior cultural forms, it leads to shallow(er) knowledge, its functions and methods are fundamentally alien to past forms of human expression, and, in a related mode, it challenges the presence of the human within the work humanists do. These apprehensions are all worth a chary stance as we inevitably stride further into the age of (more) big data. But I don’t know that digital big data is really so different in ways that ultimately matter in terms of human presence. When instead of the mouth, the body used a hand that needed the pen to form the word, did we have something less “human” than before? When one person could use machinery to mass-produce millions of words in the space of a week, and then in the space of an hour, could we say the same? Or was it in 1716, with the death of Leibniz, one of the many apocryphal claimants to the title of the last individual to truly know it all?

Certainly we can chart an increasingly cybernetic presence of the machine in the work we humans do – and that certainly mounts a challenge to past formulations of the individual artist or intellectual. But I guess my perplexity has to do most with why some are now tempted to draw the line at digital big data as the place to get anxious about informational media. To me, the difference today is one not so much of ontology as of scale and mechanical intercession, and that has been going on since communication began (hint: pp. 7-8). Yet the human isn’t slowly vanishing from the equation. Chad Wellmon, who takes a welcome historical long-view in his response to another popular jeremiad (on whether Google is making us stupid), puts it better than I can:

Google’s search algorithms do not operate in absolute mechanical purity, free of outside interference. Only if we understand the Web and our search and filter technologies as elements in a digital ecology can we make sense of the emergent properties of the complex interactions of humans and technology . . . The Web is not just a technology but an ecology of human-technology interaction. It is a dynamic culture with its own norms and practices.

We have to be careful about falling into the easy trap of technological determinism: the default view that Google Books and its ilk came along and now everything is different. It’s not so simple, of course. New technological modes of information processing arise as a response to informational needs already present, and usually redlining. As with all the other information “revolutions,” digital big data came about as a method of managing the big data we already had, which had in many ways reached its limit. As with the printing press, new modes of managing older big data led to new modes of producing even more big data — with all the attendant consequences, positive and negative.

Critics are quick to point out, rightly, the drawbacks of the methods for working with the digital big data we already have; much of it, for example, is “poorly labeled and promiscuously copied,” as Siva Vaidhyanathan notes in the book referenced by Bruce in the introduction to this forum. Just so. But this is changing. We are still in the age of the digital incunabulum–the time when the new media form has not yet figured out how best to realize its own inherent logic. In linguistic terms, big data has yet to come anywhere near “communicational competence.” This is going to take a while, and it’s an uncomfortably liminal existence in the meantime. But what is clear is how quickly possible routes of the new logic are emerging. Consider the Visible Archive under development by the National Archives of Australia, a resource working to radically change the ways one might explore more than 65,000 series of documents (seriously – take five minutes to take a look at this screencast demo, and then imagine it as the engine for, oh, scribal hands across manuscript collections). The visual taxonomies under development are a giant step away from poor labeling, and hint at how refined our big data promise to become.

The Visible Archive stands as an instant classic example of what in Digital Humanities Burdick, Drucker, et al. term enhanced critical curation: the need for digital resources to match the humanist ideal that

Informed critical judgments regarding the relationship between originals and copies, the greater or lesser authority of a given object or set of objects, and the work’s meaning all become far more significant than the mere fact of accumulation (32).

It is true that accumulation is almost all that has happened so far, and clearly that is not enough. Will Straw long ago said as much in “Embedded Memories,” an essay I wish more people would read. And while digital accumulation (and, of course, fair intellectual access to such accumulation) must continue, the critical next step is exactly that: the critical way humanists curate digital materials, and generations of people study and augment them. This is the charge set before us as medievalists: all medievalists, as, to varying degrees, we all are now digital medievalists.

Certainly this charge is not an exclusive one. Witness the charges of new materialism, object oriented ontologies and speculative realisms, to name just a few of the exciting critical modes recharging our field.  And witness, again, Nowviskie’s profound recognition of the re-convergence of the virtual and the material within Digital Humanities.  And certainly there is no less space for the work we formerly have done, and should continue to do.

I was lucky enough last year to hear Michael Witmore’s keynote address at a conference sponsored by NYU’s Medieval and Renaissance Center on the subject “What is Access?” Witmore opened his talk with a shot of the stacks at the Folger Library, which on some levels run for an entire city block. He then walked us through a dizzying and at times opaque array of three-dimensional data modeling of Early Modern drama that was probably intended to confirm the best and worst that big data had to offer. One of the conclusions Witmore offered was a second look at that city block of books, with the proviso that secondary criticism, “books about books,” represents the only way to make sense of all that big textual data, and that to ignore that vast trove of already-metadata was sheer folly.

Precisely. Almost. Only with the understanding that those stacks and stacks of books were, and are, also big data, even as they remain the way to focus big data into smaller data that mean something. Like digital databases, those books still stand as both gateway and hurdle to our desire to talk about the medieval past. We need to keep reading them, to do what we have always done, but we also need better digital tools to explore them, and we need other, radically new digital methods and resources. As Nowviskie (one of a vitally growing chorus) frames it, “we’ve come to a moment of unprecedented potential for the material, embodied, and experiential digital humanities.” What we need, as Nowviskie contends (as her slide above suggests), is better control. Control, I would like to suggest, that needs to be designed as handcrafted approaches to operating massive, massive machines.

Our job today as medievalists is certainly not to fecklessly embrace big data as the most important thing out there. But our job equally is not to bemoan, malign or resist big data. Big data’s growing presence is impossible to ignore. It’s not going anywhere. Our job is to improve big data in all of its many forms, to challenge the big data with better ways into and out of it, and to use, not abuse, big data for what we want to achieve in our own work with the medieval past–whatever that work may be.


***

Martin K. Foys is Associate Professor of English at Drew University, in Madison, NJ, and the current Executive Director of the International Society of Anglo-Saxonists (ISAS). Major publications include the Bayeux Tapestry Digital Edition (2003),  Virtually Anglo-Saxon: Old Media, New Media, and Early Medieval Studies in the Late Age of Print (2007), and Bayeux Tapestry: New Interpretations (2009). Martin also co-directs the DM project (http://ada.drew.edu/dmproject/), a digital resource for the open annotation of medieval images and texts that is funded with multi-year grants from the National Endowment for the Humanities and the Andrew W. Mellon Foundation. Recent work includes an essay on “Media” for the Wiley-Blackwell Handbook of Anglo-Saxon Studies (2012), and (as co-editor) a volume of articles on “Becoming Media” for postmedieval, for which submissions were also vetted through an experimental on-line crowd review (2012).

6 comments on “How I Learned To Stop Worrying and Love Big Data (Medieval Studies in the Age of Big Data I)”

  1. Eileen A. Joy on said:

    This is a marvelous post, to be expected from Martin, but still: really great. Nothing too coherent to say, except, I agree with much of what Martin says here, although there ARE a lot of people within DH that are worrying a little bit about the “security” risks with big data, which is why I’m glad Martin concluded with that.

  2. Martin Foys on said:

    As a coda to all this, I just saw this (just) published essay in Wired, which comes to some similar conclusions about the human presence within big data, but from a much different angle (best bits at the end!): http://www.wired.co.uk/news/archive/2013-01/25/big-data-end-of-theory

    ~ M

    • Bruce Holsinger on said:

      Thanks Martin. I’m intrigued by the mention of Gelila Tilahun’s group at Toronto:

      http://www.technologyreview.com/view/509876/the-algorithms-that-automatically-date-medieval-manuscripts/

      It could be that we’ll need a broader definition of ‘theory’ as quantitative analysis comes to subsume more and more of what we do…

  3. Theresa Earenfight on said:

    In an essay in the Jan 14 New Yorker (“Structure”), John McPhee related a conversation in the 1980s with a colleague in computer science who was helping McPhee with a text editing program. Before he even thought about designing a program, he asked McPhee to tell him how he works. What a cogent question! Foys’s post makes it clear that as we move, quickly, into wrangling Big Data, we have to remain the drivers of the system. The leap from roll to codex was a profound cognitive shift, as was that from memory to printed record, and only we ourselves can calm our anxieties over who controls data, big or small.

  4. Jenna Mead on said:

    Many thanks for this series and its lucid opening: this post does a great job of framing some central questions by recalling historical and contextual moments. More than just identifying moments, this post describes a series of trajectories that enable us to relax (we’re already under way) and settle into the big questions we are already asking and thinking about. Love the format, Prof. Foys, and the coda: these are good moves in digital incunabulism as your post helps us ‘figure out how best to realize its own inherent logic.’ One other of the (many) important threads here is your recoding of the word, ontology and discourse around ‘humanist.’ Another process that’s already underway and will have useful resonances in all sorts of places and contexts.

