Add one part stress test of the Digital Public Library of America’s API, one part conceptual exploration of how DPLA might work, and one part interdisciplinary collaboration contact high, and what do you get? The DPLA Appfest on November 8 in Chattanooga, TN. This day-and-a-half event brought together developers, designers, and librarians from across the US and Canada to build apps on the DPLA platform. (For more on DPLA vision and scope, see previous TechConnect coverage.)
Our venue was the 4th floor of the Chattanooga Public Library. This giant, bare-floored, nearly empty space was once an oubliette for discarded things — for thirty years, a storage space. Now, mostly cleaned out, it’s a blank canvas. New Assistant Director Nate Hill has been filling it with collaboration and invention: a startup pitch day, meeting space for the local Linux users group, and Appfest.
The first night of the event was reserved for getting-to-know-you events and laying the groundwork for the next day’s hacking. With several dozen attendees from a variety of backgrounds, some of us were already good friends, some familiar with one another’s work online, and some total strangers; the relaxed, informal session built rapport for the upcoming teamwork. Tables, sofas, snacks, and beer encouraged mingling.
We also started the intellectual juices flowing with project pitches, an overview of the DPLA API, and an intro to GitHub. Participants had been encouraged in advance to brainstorm project topics and post them on the wiki. People pitched their projects more formally in person, outlining both the general idea and the skills that would be helpful, so we could start envisioning where we’d fit in. Jeffrey Licht from the DPLA Technical Development Team introduced us all to the DPLA API, to give us a sense of the metadata queries we’d be able to base projects on. At Nate Hill’s request, I also did a brief intro to the concepts behind GitHub, one of the key tools we’d be using to work together (seen earlier on TechConnect).
Friday morning, we got to the library early, took advantage of its impressive breakfast spread and lots of coffee, and plunged immediately into hackery. Huge pieces of butcher paper on the wall served as our sign-up sheet and whiteboards: one for team members to sign up for projects, and the rest quickly covered with wireframes, database models, and scribbled questions on use cases. The mood in the room was energetic, intellectually engaged, and intense.
The energy extended outside of the room, too. An IRC backchannel (#dpla-api on Freenode), shared Google docs, a sandbox server that John Blyberg set up (thanks!) with an impressive range of language and tool support, the #dpla Twitter hashtag, and GitHub collaboration allowed for virtual participation.
The intense, head-down hackery was briefly interrupted by barbecue, cupcakes, and beer (a keg at the library, by the way? Genius). Truly, though, it’s all a blur until 4:30, when the teams demonstrated their apps.
The Apps We Built
There were eleven products at day’s end:
- a workflow for integrating DPLA content with existing Drupal modules;
- a new Drupal module;
- Follow that Cab!, an IFTTT-inspired system allowing users to design a query and be automatically notified of new results (design, code);
- a Ruby library and command line tool for interfacing with DPLA;
- Catalog the Whole Earth, a site for letting users submit pictures and metadata of anything;
- a design for Cuttr, a crowdsourced question-and-answer site where users can get their questions answered using DPLA materials, rate answers, and earn karma;
- a PHP class and web interface for DPLA integration;
- a site using DPLA geographical metadata and browser geolocation to show you DPLA collection items sourced near you;
- a browsable map for exploring DPLA items near you (beta, code);
- BiblioGrapher, a Tetris-like visualization of DPLA metadata;
- DPLA Plus, an engine using probabilistic record linkages to deduplicate and recommend content (documentation, code).
While the DPLA Plus proposer referred to his algorithm-heavy idea during pitches as “super-unsexy”, the judges clearly disagreed, as this impressive bit of engineering took home the prize for best app.
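For readers curious what "probabilistic record linkage" looks like in practice, here is a minimal sketch of one common approach: score each pair of records by adding up per-field agreement and disagreement weights. This is not DPLA Plus's actual code. The field names and the m/u probabilities below are hypothetical values chosen purely for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | both records describe the same item)
#   u = P(field agrees | the records describe different items)
FIELD_PROBS = {
    "title": (0.95, 0.05),
    "creator": (0.90, 0.10),
    "date": (0.80, 0.20),
}


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(str(value or "").lower().split())


def match_weight(rec_a, rec_b):
    """Sum log-likelihood-ratio weights over the compared fields.

    Agreement on a field adds log2(m / u); disagreement adds
    log2((1 - m) / (1 - u)). A missing value is treated as a
    disagreement, which is a simplification made for this sketch.
    """
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if a and a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Two made-up records that differ only in capitalization and spacing.
rec_1 = {"title": "Walden", "creator": "Thoreau, Henry David", "date": "1854"}
rec_2 = {"title": " walden ", "creator": "THOREAU, HENRY DAVID", "date": "1854"}
rec_3 = {"title": "Moby-Dick", "creator": "Melville, Herman", "date": "1851"}

likely_duplicate = match_weight(rec_1, rec_2) > 0   # positive score
likely_distinct = match_weight(rec_1, rec_3) < 0    # negative score
```

Pairs scoring above some chosen threshold would be flagged as probable duplicates; where to set that threshold, and how to estimate m and u from real data, are the hard parts this sketch leaves out.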
The Culture We Built
Appfest had a very different vibe from DPLA’s hackathon in Cambridge last April (which I also attended). It featured a dramatically larger space, a longer and more structured schedule, more participants, and a much wider range of skills. While Cambridge had almost exclusively software developers, Appfest drew developers, designers, UX experts, metadata wonks, and librarians. This meant it took longer for teams to coalesce, and we needed to be much more intentional about the process. Instead of April’s half-hour for project pitches, we spread that process over the weeks leading up to Appfest (on the wiki), Thursday’s pitch session, and Friday morning’s signups.
With such different skills, we also were familiar with different tools and used different vocabulary; we needed to work hard at making sure we could all communicate and everyone had a useful role in the projects. Again, the longer timeframe helped; an informal dinner at the library Thursday evening and assorted bar trips Thursday night gave us all a chance to get comfortable with one another socially. (So, admittedly, did the keg.) A mutual commitment to inclusiveness helped us all remember to communicate, to break down projects into steps that gave everyone something to do, and to appreciate one another’s contributions. Finally, organizers circulated around the room and kept an eye on people’s mood, intervening when we needed help finding a team that needed our skills, or just a pep talk.
And with all that work put in to building culture? The results were amazing. The April results were generally developer-oriented; as you can see from the list above, the Appfest products ranged from back-end tools for developers to participatory, end-user-oriented web sites. They were also, in most cases, functional or nearly so, and often gorgeous.
#DPLA Appfest is honestly the most productive hackathon I’ve ever been to. More groups shipping near complete things than I’ve ever seen
— Dave Riordan (@riordan) November 9, 2012
There are some implications here for both DPLA and libraries in general. The range of apps, inspired in part by earlier work on DPLA use cases, helped to illustrate the potential impact that DPLA could have. They also illustrated both the potential and the limitations of DPLA metadata. The metadata is ingested from a variety of content hubs and overlaid with a common DPLA schema. This allows for straightforward queries against the DPLA API — you can always count on those common schema elements.
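To make "straightforward queries" concrete, here is a minimal sketch of building a keyword query URL. The endpoint and the parameter names (`q`, `page_size`, `api_key`) are assumptions for illustration and should be checked against the current DPLA API documentation; the key shown is a placeholder, and nothing here touches the network.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; confirm against the DPLA API docs.
BASE_URL = "https://api.dp.la/v2/items"


def build_item_query(keyword, api_key, page_size=10):
    """Return a URL for a keyword search over item records.

    The parameter names are assumptions; urlencode handles escaping
    of spaces and other special characters in the keyword.
    """
    params = {"q": keyword, "page_size": page_size, "api_key": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a real credential.
url = build_item_query("chattanooga railroad", "YOUR_API_KEY")
```

Fetching that URL with any HTTP client should return JSON, which is where the common schema elements come in.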
However, as anyone who’s ever met a crosswalk in a library school metadata class knows, automated ingestion doesn’t always work. The underlying metadata schemata don’t always map perfectly to DPLA’s, and therefore the contents of the fields aren’t wholly standardized and don’t always provide what developers might be looking for (e.g. thumbnail images to illustrate query results). This, of course, is why DPLA has these hackathons — both to illustrate potential and to stress-test its back end, find out what works and what doesn’t and why. Want to help? Go for it. There’s a host of ways to get involved at the DPLA web site.
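In code, coping with fields that aren't wholly standardized mostly means reading records defensively. The sketch below assumes, hypothetically, that a title lives at `sourceResource.title` and may be either a string or a list, and that a thumbnail URL lives in an optional `object` field. Both names are illustrative assumptions rather than a statement of the actual schema.

```python
def get_path(record, dotted_path, default=None):
    """Walk nested dicts along a dotted path; return default if any step is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def as_list(value):
    """Normalize a field that may be absent, a single value, or a list of values."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def summarize(record):
    """Pull out display fields without assuming every source supplied them."""
    titles = as_list(get_path(record, "sourceResource.title"))
    return {
        "title": titles[0] if titles else "(untitled)",
        "thumbnail": get_path(record, "object"),  # None when no thumbnail exists
    }


# Made-up records showing three shapes the same field might take.
full = {"sourceResource": {"title": ["A", "B"]}, "object": "http://example.org/t.jpg"}
no_thumb = {"sourceResource": {"title": "Solo"}}
empty = {}

summaries = [summarize(r) for r in (full, no_thumb, empty)]
```

An app built this way can still render a result with a placeholder image instead of crashing on the first record that lacks a thumbnail.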
And for other libraries? I keep coming back to two things: Dave Riordan’s tweet, above, and the digital humanities community’s One Week | One Tool project. This was, essentially, a week-long 2010 summer camp in which a small, diverse team of digital humanists built an electronic publishing plugin for WordPress, from scratch, after a public discussion of their community’s needs. In other words: we can do this thing. Working, useful tools can be built in a shockingly short time, if we have open conversations about what would be useful and assemble skilled, diverse teams.
Let’s do more of that.
Robert Darnton asked in the New York Review of Books blog nearly two years ago: “Can we create a National Digital Library?” 1 Anyone who recalls reference homework exercises checking bibliographic information for United States imprints versus British or French will certainly remember that the United States does not have a national library in the sense of a library that collects all the works of that country and creates a national bibliography. 2 Certain libraries, such as the Library of Congress, have certain prerogatives for collection and dissemination of standards 3, but there is no one library that creates a national bibliography. So it was for print, and so it remains, even more so, for digital. So when Darnton asks that question, as he goes on to illuminate further in his article, he is asking a much larger question about libraries in the United States. European and Asian countries have created national digital libraries as part of or in addition to their national print libraries. The question is: if others can do it, why can’t we? Furthermore, why can’t we join those libraries with our national digital library? The DPLA has announced collaboration with Europeana, which has already had notable successes with digitizing content and making it and its metadata freely available. This indicates that we could potentially create a useful worldwide digital library, or at least a North American/European one. The dream of Paul Otlet’s universal bibliography seems once again to be just out of reach.
In this post, I want to examine what the Digital Public Library of America claims to do, and what approaches it is taking. It is still so new, and there are still so many unanswered questions, that it is impossible to give any sort of final answer as to whether this will actually be the national digital library. Nonetheless, there seems to be enough traction and, perhaps more importantly, funding that we should pay close attention to what is delivered in April 2013.
Can we reach a common vision about the nature of the DPLA?
The planning for the DPLA started in the fall of 2010 when Harvard’s Berkman Center received a grant from the Sloan Foundation to begin planning the project in earnest. The initial idea was to digitize all the materials which it was legal to digitize, and create a platform that would be accessible to all people in the US (or nationally). Google had already proved that it was possible, so it seemed that with many libraries working together it would be conceivable to repeat its successes, but with solely non-commercial motives 4.
The initial stages of planning brought out many different ideas and perspectives about the philosophical and practical components of the DPLA, many of which are still unresolved. The main themes of debate that have emerged are whether the DPLA would be a true “public” library, and what in fact ought to be in such a library. David Rothman argues that the DPLA as described by Darnton would be a wonderful tool for making humanities research easy and viable for more people, but would not solve the problems of making popular e-books accessible through libraries or getting students up-to-date textbooks. The latter two aims are much more challenging than getting access to public domain or academic materials because a lot more money is at stake 5.
One of the projects for the Audience and Content workstream is to figure out how average Americans might actually use a digital public library of America. One of the potential use cases is a student who can use DPLA alone to write a whole paper on the Iroquois Nations. Teachers and librarians posted some questions about this in the comments, including questioning whether it is appropriate to tell students to use one portal for all research. We generally counsel students to check multiple sources, and getting students used to searching one place that happens to be appropriate for one topic may not work if the DPLA has nothing available on, say, the latest computer technology.
Digital content and the DPLA
What content the DPLA will provide will surely become more clear over the following months. They have appointed Emily Gore as Director of Content, and continue to hold further working groups on content and audience. The DPLA website promises a remarkable vision for content:
The DPLA will incorporate all media types and formats including the written record—books, pamphlets, periodicals, manuscripts, and digital texts—and expanding into visual and audiovisual materials in concert with existing repositories. In order to lay a solid foundation for its collections, the DPLA will begin with works in the public domain that have already been digitized and are accessible through other initiatives. Further material will be added incrementally to this basic foundation, starting with orphan works and materials that are in copyright but out-of-print. The DPLA will also explore models for digital lending of in-copyright materials. The content that is contributed to or funded by the DPLA will be made available, including through bulk download, with no new restrictions, via a service available to libraries, museums, and archives in the United States, with use and reuse governed only by public law. 6
All of these models exist in one way or another already, however, so how is this something new?
The major purveyors of out-of-copyright digital book content are Google Books and HathiTrust. The potential problems with Google Books are obvious just in the name: Google is a publicly traded company with aspirations to be the hub of all world information. Privacy and availability, not to mention legality, are a few of the concerns. HathiTrust is a collective of research universities digitizing collections, many in concert with Google Books, but the full text of these books in a convenient format is generally only available to members of HathiTrust. HathiTrust faced a lawsuit from the Authors Guild about its digitization of orphan works, which is an issue the DPLA is also planning to address.
Other projects exist that try to make digital books currently in copyright more accessible, of which Unglue.it is probably best known. This requires a critical mass of people to actively work to pay to release a book into the public domain, and so may not serve the scholar with a unique research project. Some future plans for the DPLA include obtaining funds to pay authors for use, but this may or may not include releasing books into the public domain.
DPLA is not meant to include books alone. Planning so far suggests that books make a logical jumping off point. The “Concept Note” points out that “if it takes the sky as its limit, it will never get off the ground.” Despite this caution, ideally it would eventually be a portal to all types of materials already made available by cultural institutions, including datasets and government information.
Do we need another platform?
The first element of the DPLA is code–it will use open source technologies in developing a platform, and will release all code (and the tools and services this code builds) as open source software. The so-called “Beta Sprint” that took place last year invited people to “grapple, technically and creatively, with what has already been accomplished and what still need to be developed…” 7. The winning “betas” deal largely with issues of interoperability and linked data. Certainly if a platform could be developed that solved these problems, this would be a huge boon to the library world.
Getting involved with the DPLA and looking to the future
While the governance structure is becoming more formal, there are plenty of opportunities to become involved with the DPLA. Six working groups (called workstreams) were formed to discuss content, audience, legal issues, business models, governance, and technical issues. Becoming involved with the DPLA is as easy as signing up for an account on the wiki and noting your name and comments on the page of the working group in which you are interested. You can also sign up for mailing lists to stay involved in the project. As with many such projects, the work is done by the people who show up and speak up. If you read this and have an opinion on the direction the DPLA should take, it is not difficult to make sure your opinion gets heard by the right people.
As in all writing about the DPLA since the planning began, turning to a thought experiment seems the next logical rhetorical step. Let’s say that the DPLA succeeds to the point where all public domain books in the United States are digitized and available in multiple formats to any person in the country, and a significant number of in-copyright works are also available. What does this mean for libraries as a whole? Does it make public libraries research libraries? How does it change the nature of research libraries? And lastly, will all this information create a new desire for knowledge among the American people?
- Darnton, Robert. “A Library Without Walls.” NYRblog, October 4, 2010. http://www.nybooks.com/blogs/nyrblog/2010/oct/04/library-without-walls/.
- McGowan, Ian. “National Libraries.” In Encyclopedia of Library and Information Sciences, Third Edition, 3850–3863.
- “Frequently Asked Questions – About the Library (Library of Congress).” Text, n.d. http://www.loc.gov/about/faqs.html#every_book
- Dillon, Cy. “Planning the Digital Public Library of America.” College & Undergraduate Libraries 19, no. 1 (March 2012): 101–107.
- Rothman, David H. “It’s Time for a National Digital-Library System.” The Chronicle of Higher Education, February 24, 2011, sec. The Chronicle Review. http://chronicle.com/article/Its-Time-for-a-National/126489/.
- “Elements of the DPLA.” Digital Public Library of America, n.d. http://dp.la/about/elements-of-the-dpla/.
- “Digital Public Library of America Steering Committee Announces ‘Beta Sprint.’” May 20, 2011. http://cyber.law.harvard.edu/newsroom/Digital_Public_Library_America_Beta_Sprint.