Serving two masters is a tricky business, and this paper attempts to do just that. It is a companion to the Web site Open Source Shakespeare (www.opensourceshakespeare.org), my M.A. thesis project, but this paper is not exclusively intended for scholars. Two groups of people might benefit from this discussion: 1) literary scholars who have an interest in electronic texts, and who seek a general understanding of how developers build tools to serve those texts; and 2) online software developers searching for ideas about how to build tools that serve literary scholars.
Since the literati would be bored by a highly technical discussion of coding techniques, and the technorati would roll their collective eyes at arcane discussions of early seventeenth-century printing techniques, I have omitted anything that smacks of jargon. More than that, I hope that some casual readers might want to know how you take a 400-year-old collection of texts and put them into a medium that did not exist before 1990.
Before getting to the meat of the paper, I would like to explain the site’s name. “Open source” has two meanings: in the intelligence community, it means information that is published by normal distribution methods — say, a newspaper written in Urdu, or a television broadcast in Malaysia. In the computing world, it means a product whose source code is released freely, so other programmers can take portions of it for themselves, or else revise and extend the original product. (Most software packages are distributed as “binaries,” which are machine-readable distillations of the original program’s source code. For all intents and purposes, binaries cannot be modified in any significant way, nor read by humans.) Prominent examples of open source software include the Linux operating system, the Firefox browser, and the Apache Web server, which runs about two-thirds of all public Web sites.
Open Source Shakespeare is open in both senses. The general public can use the site without paying money, or even registering for the site at all. Further, anyone is free to download and use any part of Open Source Shakespeare. The sole restriction is that it cannot be used in a commercial site. But as long as you are not selling anything made from it, you are welcome to help yourself to any or all of OSS, including any portion of this paper.
Like many offspring, Open Source Shakespeare is the fruit of love and boredom. For a couple of years, I reviewed plays for The Washington Times and saw many of Washington’s first-rate productions, including those of the Folger Theatre and the Shakespeare Theatre. Though it was not my full-time job, it was an interesting diversion from my normal duties in managing the paper’s Web operations.
Because I wanted to be a conscientious reviewer, I read the play before seeing it, even if I had read it before. Being an Internet-enabled kind of guy, I favored using electronic texts to look up passages for the reviews, though I preferred extended reading from a copy of G.B. Harrison’s Shakespeare: The Complete Works.
In 2001, I began to build a Shakespeare repository site, just for fun. I created a rudimentary parser that fed “As You Like It” into a database. However, the responsibilities of my day job precluded turning the idea into a full-fledged Web site. Also, my wife and children deserved more attention than an interesting computer project, so the “Shakespeare database project,” as I called it, lay fallow.
In the summer of 2003, I found myself in Kuwait, with not a lot to do. During the invasion of Iraq, I had been attached to an infantry battalion with a team of fellow Marine reservists, clearing civilians away from battle areas so they would not get hurt or killed. After the country’s regime fell, we helped get an Iraqi province’s infrastructure up and running. Then we were redeployed back to Kuwait, awaiting “contingencies.” What are “contingencies”? No one ever figured that out. Mainly, my comrades and I sat in a desert camp, wondering when we would be sent home. After a few weeks of sitting around watching DVDs, playing video games, and looking at my watch, I decided to do something productive. The “Shakespeare database project” was reborn.
The first question I asked was, “Has anyone else done this before?” After looking on the Web, I concluded that, surprisingly, there were very few comprehensive Shakespeare Web sites out there. The ones that were comprehensive were not free, and the free ones were not comprehensive. The only one that was both free and comprehensive was “The Works of the Bard” (TWOTB), a venerable site with an arcane yet powerful search mechanism. I did find a German site coincidentally called the “Shakespeare database project,” which was incredibly ambitious but looked abandoned: it had not been updated in several years, and as of this writing it has been dormant for half a decade (Neuhaus).
TWOTB excludes stage directions and character descriptions from its searches, which is a small but significant omission. Its search mechanism can use word proximity and Boolean logical operators (AND, OR, NOT), and the queries can be limited to single plays, characters, acts, or scenes. Search terms can be nested and grouped, allowing for a practically infinite number of ways to search. The downside is that users have to learn the esoteric format, and they have to write out the query as a stream of text, e.g. +spot or (silver and 2+gold). This seemed like too much to ask of a casual user (Farrow).
I determined that my site had to be at least as powerful as TWOTB, but with a friendlier interface. Patrick Finn describes the ideal approach to Shakespeare editions as hospitality: “A hospitable edition is one that creates a space where a number of readers can come and feel welcome” (Finn). To accomplish that, I wanted to make it useful to four groups of people:
With the help of a very slow Internet connection — one that made a dial-up connection look speedy — I downloaded Shakespeare’s plays and the necessary software. With these things installed on my personal laptop, which I had painstakingly protected from the relentless sand and grit, I started the first version of Open Source Shakespeare.
Sitting at one of the tables in the middle of the long tent, I was frequently interrupted by curious Marines. As the Marine Corps is a haven for eccentrics, they did not think it odd to see someone creating a literary Web site in a desolate camp in one of the most God-forsaken places on Earth. The site progressed to the point where it had all the essentials: the parser read the texts into the database, which was used by the Web site to display the texts, search for keywords, and display all of a character’s lines. Open Source Shakespeare’s foundation had been laid.
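The essentials listed above (a parser that loads the texts into a database, a keyword search, and a lookup of all of one character's lines) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind Open Source Shakespeare: the "SPEAKER: text" markup, the table name lines, its column names, and every function name are invented for the example, and SQLite stands in for whatever database the site actually uses.

```python
import sqlite3

# Hypothetical markup: one spoken line per row, written "SPEAKER: text".
# The sample is abridged from the opening of "As You Like It".
SAMPLE_TEXT = """\
ORLANDO: As I remember, Adam, it was upon this fashion
ADAM: Yonder comes my master, your brother.
ORLANDO: Go apart, Adam, and thou shalt hear how he will shake me up.
"""


def parse(text):
    """Yield (line_number, character, spoken) tuples from the markup."""
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        character, _, spoken = raw.partition(":")
        yield number, character.strip(), spoken.strip()


def load(conn, work, text):
    """Create the (assumed) lines table if needed and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "work TEXT, line_number INTEGER, character TEXT, spoken TEXT)"
    )
    conn.executemany(
        "INSERT INTO lines VALUES (?, ?, ?, ?)",
        [(work, n, c, s) for n, c, s in parse(text)],
    )


def search_keyword(conn, keyword):
    """Return (character, spoken) for every line containing the keyword."""
    rows = conn.execute(
        "SELECT character, spoken FROM lines "
        "WHERE spoken LIKE ? ORDER BY work, line_number",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


def lines_for_character(conn, character):
    """Return all of one character's lines, in order."""
    rows = conn.execute(
        "SELECT spoken FROM lines WHERE character = ? "
        "ORDER BY work, line_number",
        (character,),
    )
    return [spoken for (spoken,) in rows]


conn = sqlite3.connect(":memory:")
load(conn, "As You Like It", SAMPLE_TEXT)
print(search_keyword(conn, "adam"))
print(lines_for_character(conn, "ADAM"))
```

In this sketch the keyword search only looks at the spoken text, so a search for "adam" finds Orlando's two lines that address Adam but not Adam's own line. Whether speaker names, stage directions, and character descriptions should be searchable is a design decision the sketch leaves open.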
The rest of the development history was far more prosaic. I returned home in July 2003, and worked on OSS in bursts, as my time allowed. For stretches of two or three weeks, I worked on the site for a few hours almost every night, and then I would leave it alone for a while. I did most of the donkey work as I rode the subway back and forth to work. Marking up the texts in the right format, and developing the program that processed them, was interesting for a while but then became borderline tedious. The development of the display pages for each literary form (play, sonnet, poem) had to be done at home, so once the texts were finished, I stopped bringing my laptop on the train, which my seatmates probably appreciated.
During the last half of 2004, I worked to flesh out the site so I could fulfill all of the objectives described in the abstract. I had been releasing small, incremental changes, but this time I opted for one big release at the end of the year, thinking that when I was done, I could release the new version and announce it to the world. From a developmental standpoint, this was an acceptable strategy, but the drawback was that several text errors reported by OSS users were left uncorrected during that time. My inner editor recoiled against this, but I needed to make changes all at once because they involved structural changes to the database. Performing those kinds of changes to an existing site is like working on a home’s foundation: you do not do it lightly, and you must work carefully lest you cause more problems than you solve. If the name of one field in one database table is changed, it could cause a dozen pages to fail ignominiously.
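The fragility of a renamed field is easy to demonstrate. The sketch below is a hypothetical illustration, not a description of the OSS database: the table lines, the field names charname and character_name, and the page query are all invented for the example. It runs the same page query against an "old" and a "new" schema and reports whether the query still works.

```python
import sqlite3

# Hypothetical query that a character page might run.
PAGE_QUERY = "SELECT charname, spoken FROM lines WHERE charname = ?"


def page_works(schema_sql):
    """Return True if PAGE_QUERY still runs against the given schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema_sql)
        conn.execute(PAGE_QUERY, ("HAMLET",)).fetchall()
        return True
    except sqlite3.OperationalError:
        # SQLite reports an unknown column as an OperationalError.
        return False
    finally:
        conn.close()


# The only difference between the two schemas is one field name.
old_schema = "CREATE TABLE lines (charname TEXT, spoken TEXT)"
new_schema = "CREATE TABLE lines (character_name TEXT, spoken TEXT)"

print(page_works(old_schema))  # query and schema agree
print(page_works(new_schema))  # the renamed field breaks the same query
```

Every page that embeds the old field name fails in the same way, which is one reason to batch structural changes into a single release and check each page's queries against the new schema before going live.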
At this writing, I do not know of any errors in the code. If this were a commercial product, the development manager would have at least one staff member designated as the official tester. Large software companies employ fully staffed test labs that do nothing other than try every function and attempt to generate errors. (That is why many programmers hate the test lab guys.)
Needless to say, Open Source Shakespeare lacks a test lab, as the budget — $110 a year for Web hosting — does not allow it. When there are coding errors in the live site, typically users will identify the problems via e-mail, if I do not see them first. Even more helpfully, they almost always verify that the problems are fixed once I have implemented the changes. Here is an example of a message reported by a user, whose name is removed because he was sending private correspondence:
I LOVE LOVE LOVE your absolutely AMAZING site. I recommend it to all my students and everyone I see.
In working with it this morning, preparing something for a class, I noticed what might be an error.
In the text of 3 Henry VI, Act 1, Scene 4, Richard is called “Duke of Gloucester” throughout. But this character is not Richard Duke of Gloucester — it’s his father, Richard Duke of York. Gloucester lives on to the next play to become Richard III. The first stage direction says, “Enter York” (Anonymous).
Open Source Shakespeare uses the “Moby Shakespeare” collection as its source text. An Internet search reveals thousands of references to Moby. The collection is an electronic reproduction of another set of texts, whose source the Electronic Text Center at the University of Virginia identifies as the Globe Shakespeare, a mid-nineteenth-century popular edition of the Cambridge Shakespeare:
Note: We have been unable to verify conclusively the exact source of this electronic text, but we believe it to be “The Globe Edition” of the Works of William Shakespeare edited by William George Clark and William Aldis Wright. Error checking was done against the 1866 edition noted in the “Source Description” field. These texts are public domain. (Electronic)
I performed a side-by-side comparison of the opening scenes of four plays (“King Lear,” “Macbeth,” “Romeo and Juliet,” and “Taming of the Shrew”). There were no substantial differences between the Electronic Text Center’s text and Moby Shakespeare.
Also, I compared the 1887 edition of the Globe Shakespeare, which has this note on the frontispiece: “Text of the [Old] Cambridge Shakespeare slightly modified, without the notes and critical apparatus, with a glossary by J.M. Jephson.” I selected scenes at random, and compared this edition with Moby Shakespeare. The Globe uses italics, and the plaintext Moby cannot, but that and all other noticeable differences were slight. Even the placement of brackets within the stage directions was identical. In sum, I had no serious reason to doubt that Moby Shakespeare is the Globe Shakespeare.
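The comparisons described here were done by eye, but a reader who wanted to repeat them on machine-readable transcriptions could automate the check along the following lines. This is a sketch under assumptions, not the procedure I followed: it supposes that italics in one transcription are marked with underscores (a convention invented for the example), and the sample lines are typed in for illustration rather than transcribed from either edition. It uses SequenceMatcher from Python's standard difflib module.

```python
import difflib
import re


def normalize(line):
    """Drop the assumed italic markers (_like this_) and collapse whitespace."""
    without_italics = line.replace("_", "")
    return re.sub(r"\s+", " ", without_italics).strip()


def differences(edition_a, edition_b):
    """Return (tag, lines_in_a, lines_in_b) for every stretch that differs."""
    a = [normalize(line) for line in edition_a]
    b = [normalize(line) for line in edition_b]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative samples only: the first marks italics, the second is plaintext.
italic_sample = [
    "[_Thunder and lightning. Enter three Witches._]",
    "When shall we three meet again",
    "In thunder, lightning, or in rain?",
]
plain_sample = [
    "[Thunder and lightning. Enter three Witches.]",
    "When shall we three meet again",
    "In thunder,  lightning, or in rain?",
]

print(differences(italic_sample, plain_sample))
```

Because italics and spacing are normalized away first, the comparison reports only differences in wording, punctuation, or bracket placement, which are the kinds of differences that would cast doubt on the identification of the source.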