Reflection: SCETI and the Labor of Digitization

08/26/17 on Blog

While working for END this summer, my conception of digital texts has been continuously redefined through reading, attending lectures, and developing my personal project. Prior to END, I viewed digital editions of texts with a certain amount of contempt and skepticism, as “less real” versions of works useful for parsing with the “find” function on browsers and little else. Although I still can’t imagine sitting down to read a novel on Project Gutenberg, my experiences at END have taught me to view the digital text as something just as valuable and just as rooted in the material as the human text.

One of the most significant shifts in how I view digital media and documents resulted from taking a tour of the University of Pennsylvania’s Schoenberg Center for Electronic Text and Image (SCETI) with my fellow END project members. At SCETI, workers scan and edit images of physical texts to eventually compile and upload as searchable online documents for the Van Pelt Library. What struck me about SCETI was the scale of the operation—each of the dozen or so workers occupied a separate station where they either photographed documents, manipulated images in Photoshop, or performed Optical Character Recognition and text cleanup on completed scans of text. On the tour we learned that it sometimes takes months for a SCETI worker to properly scan documents if they are lengthy, or are so fragile they require manipulation with special machinery to avoid destruction in the scanning process. Sometimes, the loss of a document is unavoidable; for example, the head of SCETI mentioned that they were gearing up to scan a set of newspapers from the 19th century that would likely disintegrate after being handled.

Visiting SCETI showed me that behind every non-digital text that is scanned and uploaded is a host of people and hours of labor. It is easy to think of scanned texts as transient or trifling because they have no presence beyond the screen of an electronic device, but the labor required to bring these texts to computers intimately ties the physical to the digital. Ironically, producing the weightless, ethereal digital scan takes an amount of work comparable to that required to bring a book into production on a printing press.

Unfortunately, the labor required to digitize texts often goes unrecognized and may even be deliberately obscured. For instance, when Google launched its ambitious Google Books platform in 2004 to “democratize knowledge,” it needed a legion of data entry workers to actually scan and upload millions of books. According to artist Andrew Norman Wilson, who was working on the Google campus at the time, most of these “ScanOps” employees were people of color who were sequestered away in a separate building at Google. Although Google fired Wilson after he started documenting these divisions, evidence of the workers’ labor still haunts the pages of Google Books in the form of scanning errors. When browsing through Google Books, readers may eventually stumble upon ripped pages, distortions, or, most strikingly, the hands of a ScanOps employee obscuring part of a page. Errors like these serve as a reminder that behind every database of textual material that was not originally created in a digital format are people who work tirelessly and anonymously to bring those materials to the web.

Working on my personal project of remediating an 18th-century novel into digital editions has moved the labor of digitization from something I think about abstractly to something that I actually experience. Performing Optical Character Recognition on my chosen text and then editing the text file by hand was a tedious and lengthy process, and I was lucky because I didn’t have to create my own scans of the text! If and when my digital editions are completed, I will be in a quandary about how to credit the scans I used, which were provided by the Van Pelt Library. Which worker at SCETI scanned and uploaded my text? How long did it take them to complete the scan? None of this information is online, so at least one person who provided me with a crucial piece of my project will go uncredited. Part of this dilemma is simply the nature of library and digitization work, which is often anonymous. Yet after learning about the labor behind digitization and conducting part of it myself, I am saddened that the people behind the resources I have used and will use during my time as a student will be largely unknown to me. When I catalogue for END, I record the names of publishers, booksellers, and even individual printers on the title pages of books created hundreds of years ago. The teams who bring these texts to life online today, however, are an enigma.

About the Author

Maya Deutsch

Student Researcher

The Early Novels Database