I came across a thoughtful article the other day in the Los Angeles Review of Books called Literature is not Data: Against the Digital Humanities by Stephen Marche. Unlike the efforts to program computers to write books, the digital humanities is something else entirely. It is a new and evolving field that serves as a catch-all for a range of humanities subjects that have gone digital.
Wikipedia has a pretty good digital humanities entry with lots of links to explore for those who are interested. But briefly, digital humanities includes things like digital libraries and archives such as the awesome Walt Whitman Archive and the Perseus Digital Library. It also includes amazing multimedia projects like The Valley of the Shadow, which closely examines two communities during the American Civil War. In addition, some digital humanities researchers use computational methods to analyze large data sets (aka: texts that have been digitized). It is this last approach that Marche is most concerned about in his article.
Marche pretty much blames Google for making the digital humanities possible. It all began, he says, in 2002 with Google figuring out the fastest, most efficient way to scan print books. It isn’t all Google’s fault, though. Marche also blames literary institutions for being so deeply conservative that they allowed Google to take control. He charges, “For at least 50 years, humanities departments have been in the business of creating problems rather than solving them.” Ouch.
Marche does acknowledge that Google was not the first to start digitizing texts. Early English Books Online has been available for a decade. Far from seeing it as a good thing, though, Marche treats it as a decline:
That wonderful database in its own way demonstrates how digitization leads to the decline of the sacred. Before EEBO arrived, every English scholar of the Renaissance had to spend time at the Bodleian library in Oxford; that’s where one found one’s material. But actually finding the material was only a part of the process of attending the Bodleian, where connections were made at the mother university in the land of the mother tongue. Professors were relics; they had snuffboxes and passed them to the right after dinner, because port is passed left. EEBO ended all that, because the merely practical reason for attending the Bodleian was no longer justifiable when the texts were all available online.
Calling it the “decline of the sacred” seems like hyperbole to me. What exactly was sacred? The books, or passing the port and the snuffbox? I am sure the Bodleian still has plenty of visiting scholars. Having its texts available online means that those who could not previously afford to visit in person can now examine them. Depending on your purpose, online viewing might be perfectly sufficient.
What Marche is really upset about, though, is the data mining aspect of some digital humanities research:
Data mining is potentially transformative, more for its shift in attitude than for any actual insight it has generated. Some of its lexigraphical generalizations have been remarkably astute as philology, establishing scalable n-grams of word sequences over time. The problem comes when these generalizations are applied to literary questions proper.
But really, applying generalizations to literary questions is not done just by digital humanities researchers. Overgeneralizing is plain bad scholarship no matter what methodology is used.
Still, that’s not the heart of the problem. Marche asserts,
But there is a deeper problem with the digital humanities in general, a fundamental assumption that runs through all aspects of the methodology and which has not been adequately assessed in its nascent theory. Literature cannot meaningfully be treated as data. The problem is essential rather than superficial: literature is not data. Literature is the opposite of data.
This is true: literature is not data. But some aspects of literature can be treated as data, as with the n-grams he mentions in the earlier quote.
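For the curious, here is a minimal sketch in Python of what counting n-grams (runs of n consecutive words) might look like. The function name `ngram_counts` and the sample sentence are my own inventions for illustration; I am not claiming this is how Marche's examples or any actual digital humanities project does it.

```python
from collections import Counter
import re

def ngram_counts(text, n=2):
    """Count word n-grams in a text (lowercased, punctuation ignored)."""
    # Hypothetical, deliberately crude tokenizer: runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words across the text.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Made-up sample sentence, not a quotation from any text.
sample = "The port is passed left and the snuffbox is passed right."
counts = ngram_counts(sample, n=2)
print(counts["is passed"])  # how often the two-word sequence occurs
```

The tally is the data. Deciding whether a repeated phrase means anything is still up to the reader.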
Now, I am not a digital humanities scholar and I just flirt with the field around the edges, so I am no expert. I do know, however, that the field is not just about literature. It is called digital humanities, and it includes history and art and music and dance and drama among other things. Literature is just a small part of the field. And when I peruse digital humanities sites and journals that focus on literature, I have yet to see very many attempts at using data to interpret a text as a person writing criticism or theory might. And that is what Marche is most worried about: computers theorizing about meaning.
His argument that computers can never do this is based on the literary record being incomplete and messy. This fact doesn’t seem to be a problem for scholars, so I am not sure how his point that there are nine different versions of Shakespeare’s Richard III counts against using computers. If anything, I would think that this is a perfect example of how computers might help scholars study all nine versions. Computers are much better than people at comparing changes across a set of texts and at tracking word and phrase usage. They can do this much faster and with fewer errors. But the computer doesn’t decide what the results mean; people decide what the resulting data reveals.
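To make that concrete, here is a rough sketch of comparing two versions of a line word by word, using Python's standard `difflib` module. The helper name `changed_words` and the two sample lines are hypothetical. They are made up for illustration and are not readings from any real edition of Richard III.

```python
import difflib

def changed_words(version_a, version_b):
    """Return (removed, added) word lists between two versions of a line."""
    a, b = version_a.split(), version_b.split()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'delete' mean words from version A are gone.
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])
        # 'replace' and 'insert' mean words appear only in version B.
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])
    return removed, added

# Two invented variant lines, not quoted from any actual text.
first = "the king rode out at dawn"
second = "the king rode forth at dawn"
print(changed_words(first, second))
```

All the computer reports is which words differ. Whether "forth" is a printer's slip or a deliberate revision is a question for a person.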
Yes, one of the dangers of data mining (for lack of a better expression) is, as Marche worries, a loss of context. I am sure as the field expands there will be problems with context and a host of other things that haven’t even arisen yet. But that doesn’t mean that what is being done is completely useless. It only means that scholars must be careful and watch for errors creeping into their research. This, to me, seems like something all good researchers are concerned with and is not exclusive to the field of digital humanities.
And now after spending all this time going through the article, I have no idea what the author was trying to do other than worry about what the digital humanities may do to literature in a worst-case scenario. The worst-case scenario, computers doing literary analysis, is not likely to happen. This is not to say that someone won’t try it, but it seems rather like the computer writing poetry from my last post. It might have something going for it but it will never have the understanding and nuance of human intelligence.
As for being able to access books, especially old books and manuscripts, online at any time from anywhere, bring it on! I am not a scholar but maybe I love Walt Whitman so much that I want to spend some time comparing his written manuscript of a poem with all of its other iterations. Thanks to the Walt Whitman Archive I can do that without having to travel all over the country to different libraries, where I probably would not be allowed to see some of the stuff anyway since I am not affiliated with a university. I don’t need snuffboxes and port and I bet in this time of tight budgets a good many academic researchers don’t either.
In case you haven’t figured it out, I find the digital humanities a fascinating field and I look forward to seeing how it develops. I expect, like any other field or method of research, there will be both good and bad things about it. Yes, we should talk about the bad things so we are aware of potential pitfalls. But focusing only on the possible negatives and blowing them up into big monsters does nobody any good. Mistakes will be made but so will discoveries.
nicely done – thank you!
N Filbert, thanks!
Eep. I just can’t get over “digitization leads to the decline of the sacred”. While I’m sure visiting the Bodleian is a wonderful experience, as you so well pointed out, the model Marche proposes limits research to the privileged few with the means to do so. The cynic in me wonders if this is part of the point. Thank you for writing such a measured and eloquent post.
Ana, I know, that left me just scratching my head. If you read the article he even goes on to compare the creation of the codex to Christianity and the digitization of books, the “unbinding” of them, to a return to paganism! I had to read that paragraph a couple times because I was so astonished by it. My mind raced down the same cynical path as yours did. And thank you for your kind words!
I agree! And data-mining has been possible as long as indices have existed. Good scholarship is still good scholarship.
This sort of lament seems to be an emotional necessity for many of us but I agree that the democratic possibility of the access of ordinary readers to resources such as the Whitman Archive is just brilliant! Surely sites like that are just what makes being online such a great deal.
Ian, interesting comment about this kind of lament being an emotional necessity. It’s kind of like mourning loudly in public over the loss of the things the author values most. It ends up saying more about the author than it does about what is being mourned, doesn’t it? I hadn’t thought of it quite like that before. Scholars benefit from digitization of texts, but as you say, the democratic possibility for the ordinary reader is the best thing. We suddenly get access to manuscripts and texts we could never have seen before.
Resources such as these electronic ones are surely an extension of things like interlibrary loan which have always been essential to students.
Ophelia, oh good point! Indices and concordances are definitely data mining and those have existed for hundreds of years.
My library has EEBO, which I think is really pretty cool. I’m not a scholar so maybe I can’t see some of the cons–not looking at it from that angle, but for people who would never have access to these sorts of materials I can only see it as a good thing.
Danielle, my library has access to EEBO too and even though we are a law library it gets used. People are interested in old books for more than just their literary merit and having them available online is incredibly useful.
Our students would be lost without EEBO. Going off to the Bodleian is all well and good but it costs money and, in case Marche hasn’t realised, that is something your average student is fairly short on. Digital resources have made all sorts of texts accessible that students couldn’t otherwise hope to have access to. It also means that rare books that couldn’t be handled by many people can be made available to all. I’m afraid I lose patience with this sort of argument.
Alex, good point about rare books handling. Having them available online allows more people to see them and probably cuts down on how many people need to actually handle the book itself. A win all around as far as I am concerned.
I studied an interdisciplinary MA in medieval and early modern literature, history, art etc. Without EEBO it would have been practically impossible – it’s a marvellous resource. Even though in world terms we were practically down the road from Oxford (in Kent), there’s just not enough time and money to keep going up there, and we had lots of texts to read. Furthermore, fragile texts are protected from too much handling. The Bodleian itself will not crumble as scholars will still take the chance to study there if they can, and communities can be built in other ways.
Ooh I see Alex has just written most of that!
Anyway, I thoroughly agreed with your excellent points. It’s not the digitisation, it’s what we do with it!
Helen, with such a fascinating area of study you chose I am sure the EEBO helped make it all possible. And you are right, if scholars have the chance to actually go to the Bodleian I am sure they will, I know I would! And as you say, it is not the digitization, it is what we do with it that matters most.
I’m equally in favour of rare texts being widely available at last through the magic of the internet. It takes time and effort and money to travel to where they are, all of which are in short supply in a student’s life. I think the poor old humanities have been under such cultural attack of late, that this must look to some like a really worrying nail that might eventually close the coffin. Science and scientific methodologies are the ones in favour when it comes to grant allocation. But I really like your approach of pointing out how broad and diverse the field of digital humanities is, and how little it actually has to do with textual interpretation. It’s even more time and effort lost for literary folk to be fighting the wrong battle!
Litlove, the internet and digitized books are little miracles really. I look back to when I was in college and think how much more I would have had access to, how much more I could have learned if I had access to the things that are available now. And thanks, the field of digital humanities is really large and still evolving and best practices and methodologies have not been settled on. It’s kind of exciting really.
Not to mention the world it opens up for those who are blind or sight impaired!
Ophelia, oh yes! You are ever so right about that!
I’m with you Stefanie … I love what digitisation and digital techniques have brought to our world. I love the fact that a whole range of classic novels, poetry, etc are now available. I love things like The Waste Land app and how in one place I could read the poem, hear the poem spoken by several readers, hear some commentaries, see the manuscript page by page, have access to the annotations which are linked further to explanations, and so on. So exciting. Like all tools there are potentials for misuse but that’s been the same since the world began (to use a cliche). It’s good to be critical (in the analytical sense) but overall naysaying makes little sense to me.
whisperinggums, yes, yes to everything you said. I can’t think of anything else to add to it!