Hello! I’m Ashley Blewer, and I’ve recently joined the ACRL TechConnect blogging team. For my first post, I wanted to interview Lorena Ramírez-López. Lorena is working (among other places) at the D.C. Public Library on their Memory Lab initiative, which we will discuss below. Although this upcoming project targets public libraries, Lorena has a history of dedication to providing open technical workflows and documentation to support any library’s mission to set up similar “digitization stations.”
Hi Lorena! Can you please introduce yourself?
Hi! I’m Lorena Ramírez-López. I am a born and raised New Yorker from Queens. I went to New York University for Cinema Studies and Spanish, where I did an honors thesis on Paraguayan cinema in regard to sound theory. I continued my education at NYU and graduated from the Moving Image Archiving and Preservation program, where I concentrated on video and digital preservation. I was one of the National Digital Stewardship Residents for the American Archive of Public Broadcasting, and I did my residency at the Howard University television station (WHUT) in Washington, D.C. from 2016 until June 2017. Along with being the project manager for the Memory Lab Network, I do contracting work for the National Portrait Gallery on their time-based media artworks, am part of the Women Who Code community, and teach Spanish at Fluent City!
Tell us a little bit about DCPL’s Memory Lab and your role in it.
The DC Public Library’s Memory Lab was a National Digital Stewardship Project that ran from 2014 through 2015. It was the baby of DCPL’s National Digital Stewardship Resident, Jaime Mears, back in the day. A lot of my knowledge of how it started comes from reading the original project proposal, which you can find on the Library of Congress’s website, as well as Jaime Mears’s final report on the Memory Lab, which is on the DC Public Library website. But to summarize its origin story: the Memory Lab was created as a local response to the fact that communities are generating a lot of digital content while still keeping many of their physical materials, like VHS tapes, miniDVs, and photos, but might not necessarily have the equipment or knowledge to preserve that content. It has been widely accepted in the archival and preservation fields that we have an approximate 15- to 20-year window of opportunity to digitally preserve legacy audio and video recordings on magnetic tape because of the rate of degradation and the obsolescence of playback equipment. The term “video at risk” might ring a bell for some people. There are also photographs and film, particularly color slides and negatives and moving image film formats, that will fade and degrade over time. People want to save their memories as well as share them on a digital platform.
There are well-established best practices for digital preservation in archival practice, but these guidelines and documentation are generally written for a professional audience. And while there are various personal digital archiving resources for a public audience, they aren’t really easy to find on the web, and a lot of these resources aren’t updated to reflect the changes in our technology, software, and habits.
That being the case, our communities risk massive loss of history and culture! And to quote Gabriela Redwine’s Digital Preservation Coalition report, “personal digital archives are important not just because of their potential value to future scholars, but because they are important to the people who created them.”
So the Memory Lab was the library’s local response in the Washington, D.C. area to bridge this gap in digital archiving knowledge and provide the tools and resources for library patrons to digitize and preserve their own personal content.
My role is maintaining this memory lab (the digitization rack). When hardware gets worn down or breaks, I fix it. When the software on our computers upgrades to newer versions, I update our workflows.
I am currently re-doing the website to reflect the new wiring I did and updating the instructions with more explanations and images. You can expect gifs!
You recently received funding from IMLS to create a Memory Lab Network. Can you tell us more about that?
Yes! The DC Public Library in partnership with the Public Library Association received a national leadership grant to expand the memory lab model.
During this project, the Memory Lab Network will partner with seven public libraries across the United States. Our partners will receive training, mentoring, and financial support to develop their own memory lab as well as programs for their library patrons and community to digitize and preserve their personal and family collections. A lot of focus is put on the digitization rack, mostly because it’s cool, but the memory lab model is not just creating a digitization rack. It’s also developing classes and online resources for the community to understand that digital preservation doesn’t just end with digitizing analog formats.
By creating these memory labs, these libraries will help bridge the digital preservation divide between the professional archival community and the public community. But first we have to train and help the libraries set up the memory lab, which is why we are providing travel grants to Washington, D.C. for an in-depth digital preservation bootcamp and training for these seven partners.
If anyone wants to read the proposal, the Institute of Museum and Library Services has it here.
What are the goals of the Memory Lab Network, and how do you see this making an impact on the overall library field (outside of just the selected libraries)?
One of the main goals is to see how well the memory lab model holds up. The memory lab was a local response to a need, but it was meant to be replicated. This funding is our chance to see how we can adapt and improve the memory lab model for other public libraries, and not just our own urban library in Washington, D.C.
There are actually many institutions and organizations that have digitization stations and/or the knowledge and resources, but we just don’t realize who they are. Sometimes it feels like we keep reinventing the wheel with digital preservation. There are plenty of websites that, at one time, had current information on digital preservation along with links to articles and other explanations. Then those websites weren’t sustained and remained stagnant, housing a series of broken links and lost PDFs. We could (and should) be better about not just creating new resources, but updating the ones we have.
The reasons why some organizations aren’t transparent or don’t update their information, or why we aren’t searching in certain areas, vary, but we should be better at documenting and sharing our information with our archival and public communities. This is why the other goal is to create a network to better communicate and share.
What advice do you have for librarians thinking of setting up their own digitization stations? How can someone learn more about aspects of audiovisual preservation on the job?
If you are thinking of setting up your own digitization station, announce that not only to your local community but also the larger archival community. Tell us about this amazing adventure you’re about to tackle. Let us know if you need help! Circulate and cite that article you thought was super helpful. Try to communicate not only your successes but also your problems and failures.
We need to be better at documenting and sharing what we’re doing, especially when dealing with how to handle and repair playback decks for magnetic media. Besides the fact that the companies just stopped supporting this equipment, a lot of this information on how to support and repair equipment could have been shared or passed down by really knowledgeable experts, but it wasn’t. Now we’re all holding our breath and pulling our hair out because this one dude who repairs U-matic tapes is thinking about retiring. This lack of information and communication shouldn’t be the case in an environment where we can email and call.
We tend to freak out about audiovisual preservation because we see how other professional institutions set up their workflows and the amount of equipment they have. The great advantage libraries have is that not only can they act practically with their resources, but they also have the best type of feedback to learn from: library patrons. We’re creating these memory lab models for the general public, so gathering practical experience, feedback, and concerns is a great way to learn which aspects of audiovisual preservation really need to be fleshed out and researched.
And for fun, try creating and archiving your own audiovisual media! You technically already do by taking photos and videos on your phone. Getting to know your equipment and where your media goes is very helpful.
Thanks very much, Lorena!
For more information on how to set up a “digitization station” at your library, I recommend Dorothea Salo’s robust website detailing how to build an “audio/video or digital data rescue kit”, available here.
Last month we got the long-awaited ruling in favor of Google in the Authors Guild vs. Google Books case, which by now has been analyzed extensively. Ultimately the judge in the case decided that Google’s digitization was transformative and thus constituted fair use. See InfoDocket for detailed coverage of the decision.
The Google Books project was part of the Google mission to index all the information available, and as such could never have taken place without libraries, which hold all those books. While most, if not all, of the librarians I know use Google Books in their work, there has always been a sense that the project should not have been started by a commercial enterprise using the intellectual resources of libraries, but by libraries themselves working together. Yet libraries are often forced to be more conservative about digitization than we might otherwise be, due to rules designed to protect the college or university from litigation. This ruling has made it seem as though we could afford to be less cautious. As Eric Hellman points out, the decision seems to imply that with copyright the ends are the important part, not the means: “In Judge Chin’s analysis, copyright is concerned only with the ends, not the means. Copyright seems not to be concerned with what happens inside the black box.” 1 As long as the end use of the books was fair, which was deemed to be the case, the initial digitization was not a problem.
Looking at this from the perspective of a repository manager, I want to address a few of the theoretical and logistical issues behind such a conclusion for libraries.
What does this mean for digitization at libraries?
At the beginning of 2013 I took over an ongoing digitization project, and as a first-time manager of a large-scale long-term project, I learned a lot about the processes involved in such a project. The project I work with is extremely small-scale compared with many such projects, but even at this scale the project is expensive and time-consuming. What makes it worth it is that long-buried works of scholarship are finally being used and read, sometimes for reasons we do not quite understand. That gets at the heart of the Google Books decision—digitizing books in library stacks and making them more widely available does contribute to education and useful arts.
There are many issues that we need to address, however. Some of the most important ones are what access can and should be provided to what works, and making mass digitization more available to smaller and international cultural heritage institutions. Google Books could succeed because it had the financial and computing resources of Google matched with the cultural resources of the participating research libraries. This problem is international in scope. I encourage you to read this essay by Amelia Sanz, in which she argues that digitization efforts so far have been inherently unequal and a reflection of colonialism. 2 But is there a practical way of approaching this desire to make books available to a wider audience?
There are several separate issues in providing access. Books that are in the public domain are unquestionably fine to digitize, though differences in international copyright law make it difficult to determine what can be provided to whom. As Amelia Sanz points out, in Spain Google may only digitize Spanish works published prior to 1870, but it may digitize the complete works in the United States. Those complete works are not available to Spanish researchers, but they are available in full to US researchers.
That aside, there are several reasons why it is useful to digitize works still unquestionably under copyright. One of the major reasons is textual corpus analysis–you need to have every word of many texts available to draw conclusions about use of words and phrases in those texts. Google Books ngram viewer is one such tool that comes out of mass digitization. Searching for phrases in Google and finding that phrase as a snippet in a book is an important way to find information in books that might otherwise be ignored in favor of online sources. Some argue that this means that those books will not be purchased when they might have otherwise been, but it is equally possible that this leads to greater discovery and more purchases, which research into music piracy suggests may be the case.
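To make the corpus analysis point concrete, here is a minimal Python sketch, assuming a hypothetical corpus/ directory with one plain-text file per digitized volume and an arbitrary phrase; the point is simply that you need the full text of every volume on hand before you can count anything at all:

```python
from pathlib import Path

# Hypothetical layout: one plain-text file per digitized volume.
corpus_dir = Path("corpus")
phrase = "useful arts"

for text_file in sorted(corpus_dir.glob("*.txt")):
    text = text_file.read_text(encoding="utf-8", errors="ignore").lower()
    # Naive substring count; real corpus tools tokenize and normalize first.
    print(f"{text_file.name}: {text.count(phrase)} occurrence(s) of {phrase!r}")
```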
Another reason to digitize works still under copyright is to highlight the work of marginalized communities, though in that case it is imperative to work with those communities to ensure that the digitization is not exploitative. Many orphan works, for which a rights-holder cannot be located, fall under this, and I know from some volunteer work that I have done that small cultural heritage institutions are eager to digitize material that represents the cultural and intellectual output of their communities.
In all the above cases, it is crucial to put into place mechanisms for ensuring that works under copyright are not abused. Google Books uses an algorithm that makes it impossible to read an entire book, which is probably beyond the abilities of most institutions. (If anyone has an idea for how to do this, I would love to hear it.) Simpler and more practical solutions to limiting access are to only make a chapter or sample of a book available for public use, which many publishers already allow. For instance, Oxford University Press allows up to 10% of a work (within certain limits) on personal websites or institutional repositories. (That is, of course, assuming you can get permission from the author.) Many institutions maintain “dark archives”, which are digitized and (usually) indexed archives of material inaccessible to the public, whether institutional or research information. For instance, the US Department of Energy Office of Scientific and Technical Information maintains a dark archive index of technical reports comprising the equivalent of 6 million pages, which makes it possible to quickly find relevant information.
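One low-tech way to implement the “chapter or sample” approach is to derive a short public excerpt from the full digitized file while the complete copy stays in the dark archive. This is only a sketch of the idea, assuming the pypdf Python library and invented file names, and copying roughly the first 10% of pages into a separate sample PDF:

```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("digitized_book_full.pdf")  # full copy stays in the dark archive
writer = PdfWriter()

# Copy roughly the first 10% of pages (at least one) into a public sample.
sample_pages = max(1, len(reader.pages) // 10)
for i in range(sample_pages):
    writer.add_page(reader.pages[i])

with open("digitized_book_sample.pdf", "wb") as f:
    writer.write(f)
```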
In any case where an institution makes the decision to digitize and make available the full text of in-copyright materials for reasons they determine are valid, there are a few additional steps that institutions should take. Institutions should research rights-holders or at least make it widely known to potential rights-holders that a project is taking place. The Orphan Works project at the University of Michigan is an example of such a project, though it has been fraught with controversy. Another important step is to have a very good policy for taking down material when a rights-holder asks–it should be clear to the rights-holder whether any copies of the work will be maintained and for what purposes (for instance archival or textual analysis purposes).
Digitizing, Curating, Storing, Oh My!
The above considerations are only useful when it is even possible for institutions without the resources of Google to start a digitization program. There are many examples of DIY digitization by individuals, for instance see Public Collectors, which is a listing of collections held by individuals open for public access–much of it digitized by passionate individuals. Marc Fischer, the curator of Public Collectors, also digitizes important and obscure works and posts them on his site, which he funds himself. Realistically, the entire internet contains examples of digitization of various kinds and various legal statuses. Most of this takes place on cheap and widely available equipment such as flatbed scanners. But it is possible to build an overhead book scanner for large-scale digitization with individual parts and at a reasonable cost. For instance, the DIY Book Scanning project provides instructions and free software for creating a book scanner. As they say on the site, all the process involves is to “[p]oint a camera at a book and take pictures of each page. You might build a special rig to do it. Process those pictures with our free programs. Enjoy reading on the device of your choice.”
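To make the “process those pictures” step a little more concrete, here is a minimal sketch, assuming the Pillow imaging library and an invented scans/ folder of page images whose filenames sort into reading order, that stitches the captured pages into a single PDF:

```python
from pathlib import Path
from PIL import Image

# Assumes page images are named so that sorting gives reading order,
# e.g. page_001.jpg, page_002.jpg, ...
page_files = sorted(Path("scans").glob("*.jpg"))
pages = [Image.open(p).convert("RGB") for p in page_files]

# Save the first page and append the rest to produce one PDF.
pages[0].save("book.pdf", save_all=True, append_images=pages[1:])
```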
“Processing the pictures” is a key problem to solve. Turning images into PDF documents is one thing, but providing high quality optical character recognition is extremely challenging. Free tools such as FreeOCR make it possible to do OCR from image or PDF files, but this takes processing power and results vary widely, particularly if the scan quality is lower. Even expensive tools like Adobe Acrobat or ABBYY FineReader have the same problems. Karen Coyle points out that uncorrected OCR text may be sufficient for searching and corpus analysis, but it does not provide a faithful reproduction of the text and thus does not, for instance, provide access for visually impaired persons.3 This is a problem well known in the digital humanities world, and one solved by projects such as Project Gutenberg with the help of dedicated volunteer distributed proofreaders. Additionally, a great deal of material clearly in the public domain is in manuscript form or has text that modern OCR cannot recognize. In that case, crowdsourcing transcriptions is the only financially viable way for institutions to make text of the material available.4 Examples of successful projects using volunteer transcribers or proofreaders include Ancient Lives to transcribe ancient papyrus, What’s on the Menu at the New York Public Library, and DIYHistory at the University of Iowa libraries. (The latter has provided step-by-step instructions for building your own version using open source tools.)
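For a sense of what raw, uncorrected OCR output looks like, here is a minimal sketch using the open source Tesseract engine through the pytesseract wrapper (my own substitution for the tools named above, not something any of the projects mentioned here necessarily use); it writes one unproofread text file per page image:

```python
from pathlib import Path
from PIL import Image
import pytesseract  # thin wrapper; requires the Tesseract OCR engine to be installed

for page_file in sorted(Path("scans").glob("*.jpg")):
    # Raw, uncorrected OCR; quality depends heavily on the scan itself.
    text = pytesseract.image_to_string(Image.open(page_file))
    page_file.with_suffix(".txt").write_text(text, encoding="utf-8")
```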
So now you’ve built your low-cost DIY book scanner, and put together a suite of open source tools to help you process your collections for free. Now what? The whole landscape of storing and preserving digital files is far beyond the scope of this post, but the cost of accomplishing this is probably the highest of anything other than staffing a digitization project, and it is here where Google clearly has the advantage. The Internet Archive is a potential solution to storing public domain texts (though they are not immune to disaster), but if you are making in-copyright works available in any capacity you will most likely have to take the risk on your own servers. I am not a lawyer, but I have never rented server space that would allow copyrighted materials to be posted.
Conclusion: Is it Worth It?
Obviously from this post I am in favor of taking on digitization projects of both public domain and copyrighted materials when the motivations are good and the policies are well thought out. From this perspective, I think the Google Books decision was a good thing for libraries and for providing greater access to library collections. Libraries should be smart about what types of materials to digitize, but there are more possibilities for large-scale digitization, and by providing more access, the research community can determine what is useful to them.
If you have managed a DIY book scanning project, please let me know in the comments, and I can add links to your project.
1. Hellman, Eric. “Google Books and Black-Box Copyright Jurisprudence.” Go To Hellman, November 18, 2013. http://go-to-hellman.blogspot.com/2013/11/google-books-and-black-box-copyright.html.
2. Sanz, Amelia. “Digital Humanities or Hypercolonial Studies?” Responsible Innovation in ICT (June 26, 2013). http://responsible-innovation.org.uk/torrii/resource-detail/1249#_ftnref13.
3. Coyle, Karen. “It’s FAIR!” Coyle’s InFormation, November 14, 2013. http://kcoyle.blogspot.com/2013/11/its-fair.html.
4. For more on this, see Ben Brumfield’s work on crowdsourced transcription, for example Brumfield, Ben W. “Collaborative Manuscript Transcription: ‘The Landscape of Crowdsourcing and Transcription’ at Duke University.” Collaborative Manuscript Transcription, November 23, 2013. http://manuscripttranscription.blogspot.com/2013/11/the-landscape-of-crowdsourcing-and.html.