Creating a Flipbook Reader for the Web

At my library, we have a wonderful collection of artist’s books. These titles are not mere text, or even concrete poetry shaping letters across a page, but works of art that use the book as a form in the same way a sculptor might choose clay or marble. They’re all highly inventive and typically employ an interplay between language, symbol, and image that challenges our understanding of what a print volume is. As an art library, collecting these unique and inspiring works seems natural. But how can we share our artists’ books with the world? For a while, our work study students have been scanning sets of pages from the books using a typical flatbed scanner but we weren’t sure how to present these images. Below, I’ll detail how we came to fork the Internet Archive’s Bookreader to publish our images to the web.

Choosing a Book Display

Our Digital Scholarship Librarian, Lisa Conrad, led the artists’ book project. She investigated a number of potential options for creating interactive “flipbooks” out of our series of images. We had a few requirements we were looking for:

  • mobile friendly — ideally, our books would be able to be read on any device, not just desktops with Adobe Flash
  • easy to use — our work study staff shouldn’t need sophisticated skills, like editing code or markup, to publish a work
  • visually appealing — obviously we’re dealing with works of art, if our presentation is hideous or obscures the elegance of the original then it’s doing more harm than good
  • works with our repository — ultimately, the books would “live” in our institutional repository, so we needed software that would emit something like static HTML or documents that could be easily retained in it

These are not especially stringent requirements, in my mind. Still, we struggled to find decent options. The mobile limitation restricted things quite a bit; a surprising number of apps used flash or published in desktop-first formats. We felt our options to customize and simplify the user interface of other apps was too small. Even if an app spat out a nice bundle of HTML and assets that we could upload, the HTML might call out to sketchy external services or present a number of options we didn’t want or need. While I was at first hesitant to implement an open source piece of software that I knew would require much of my time, ultimately the Internet Archive’s Bookreader became the frontrunner. We felt more comfortable using an established, free piece of software that doesn’t require any server-side component (it’s all JavaScript!). My primary concern was that the last commit to the bookreader code base was over a half year ago.

Implementing the Internet Archive Bookreader

The Bookreader proved startlingly easy to implement. We simply copied the code from an example given by the Internet Archive, edited several lines of JavaScript, and were ready to go. Everything else was refinement. Still, we made a few nice decisions in working on the Bookreader that I’d like to share. If you’re not a coder, feel free to skip the rest of this section as it may be unnecessarily detailed.

First off, our code is all up on GitHub. But it won’t be useful to anyone else! It contains specific logic based on how our IR serves up links to the bookreader app itself. But one of the smarter moves, if I may indulge in some self-congratulation, was using submodules in git. If you know git then you know it’s easily one of the best ways to version code and manage projects. But what if you app is just an implementation of an existing code base? There’s a huge portion of code you’re borrowing from somewhere else, and you want to be able to segment that off and incorporate changes to it separately.

As I noted before, the bookreader isn’t exactly actively developed. But it still benefits us to separate our local modifications from the stable, external code. A submodule lets us incorporate another repository into our own. It’s the clean way of copying someone else’s files into our own version. We can reference files in the other repository as if they’re sitting right beside our own, but also pull in updates in a controlled manner without causing tons of extra commits. Elegant, handy, sublime.

Most of our work happened in a single app.js file. This file was copied from the Internet Archive’s provided example and only modified slightly. To give you a sense of how one modifies the original script, here’s a portion:

// Create the BookReader object
br = new BookReader();
// Total number of leafs
br.numLeafs = vaultItem.pages;
// Book title & URL used for the book title link
br.bookTitle= vaultItem.title;
br.bookUrl  = vaultItem.root + 'items/' + vaultItem.id + '/' + vaultItem.version + '/';
// how does the bookreader know what images to retrieve given a page number (index)?
br.getPageURI = function(index, reduce, rotate) {
    // reduce and rotate are ignored in this simple implementation, but we
    // could e.g. look at reduce and load images from a different directory
    var url = vaultItem.root + 'file/' + vaultItem.id + '/' + vaultItem.version + '/' + vaultItem.filenames + (index + 1) + '.JPG';
    return url;
}

The Internet Archive’s code provides us with a Bookreader class. All we do is instantiate it once and override certain properties and methods. The code above is all that’s necessary to display a particular image for a given page number; the vaultItem object (VAULT is the name of our IR) consists of information about a single artists’ book, like the number of pages, its title, its ID and version within the IR. The bookreader app cobbles together these pieces of info to figure out it should display images given a page or pair of pages. The getPageURI function is mostly working with a single index argument, while the group of concatenated strings for the URL are related to how our repository stores files. It’s highly specific to the IR we’re using, but not terribly complicated.

The Bookreader itself sits on a web server outside the IR. Since we cannot publicly share almost all of these books (more on this below), we restrict access to the images to users who are authenticated with the IR. So how can the external reader app serve up images of books within the repository? We expect people to discover the books via our library catalog, which links to the IR, or within the IR itself. Once they sign in, our IR’s display templates contain specially crafted URLs that pass the bookreader information about the item in our repository via their query string. Here’s a shortened example of one query string:

?title=Crystals%20to%20Aden&id=17c06cc5-c419-4e77-9bdb-43c69e94b4cd&pages=26

From there, the app parses the query string to figure out that the book’s title is Crystals to Aden, its ID within the IR is “17c06cc5-c419-4e77-9bdb-43c69e94b4cd”, and it has twenty six pages. We store those values in the vaultItem object referenced in the script above. That object contains enough information for the bookreader to determine how to retrieve images from the IR for each page of the book. Since the user has already authenticated with the IR when they discovered the URL earlier, the IR happily serves up the images.

Ch-ch-ch-challenges

Easily the most difficult part of our artists’ books project has been spreading awareness. It’s a cool project that’s required tons of work all around, from the Digital Scholarship Librarian managing the project, to our diligent work study student scanning images, to our circulation staff cataloging them in our IR, to me solving problems in JavaScript while my web design student worker polished the CSS. We’re proud of the result, but also struggling to share it outside of our walls. Our IR is great at some things but not particularly intuitive to use, so we cannot count on patrons stumbling across the artists’ books on their own very often. Furthermore, putting anything behind a login is obviously going to increase the number of people who give up before accessing it. Not being on the campus Central Authentication Service only exacerbates this for us.

The second challenge is—what else!—copyright. We don’t own the rights to any of these titles. We’ve been guerrilla digitizing them without permission from publishers. For internal sharing, that’s fine and well under Fair Use; we’re only showing excerpts and security settings in our IR ensure they’re only visible to our constituents. But I pine to share these gorgeous creations with people outside the college, with my social networks, with other librarians. Right now, we only have permission to share Crystals to Aden by Michael Bulteau (thanks to duration press for letting us!). Which is great! Except Crystals is a relatively text-heavy work; it’s fine poetry, yes, but not the earth-shattering reinterpretation of the codex I promised you in my opening paragraph.

Hopefully, we can secure permissions to share further works. Internally, we’ve pushed out notices on social media, the LMS, and email lists to inform people of the artists’ books. They’re also available for checkout, so hopefully our digital teasers bring people in to see the real deal.

Lasting Problems

And that’s something that must be mentioned; our digital simulations are definitely not the real deal. Never has a set of titles been so resistant to digitization. Our work study has even asked us “how could I possibly separate this work into distinct pairs of pages” about a work which used partially-overlapping leaves that look like vinyl record sleeves.1 Artists deliberately chose the printed book form and there’s a deep sacrilege in trying to digitally represent that. We can only ever offer a vague adumbration of what was truly intended. I still see value in our bookreader—and the works often look brilliant even as scanned images—but there are fundamental impediments to its execution.

Secondly, scanned images are not text. Many works do have text on the page, but because we’re displaying images in the browser users cannot select the text to copy it. Alongside each set of images in our IR, we also have a PDF copy of all the pages with OCR‘d text. But a decision was made not to make the PDFs visible to non-library staff, since they would be easy to download and disseminate (unlike the many discrete images in our bookreader). This all adds up to make the bookreader very inaccessible; its all images, there’s no feasible way for us to associate alt text with each one, and the PDF copy that might be of interest to visually-impaired users is hidden.

I’ve ended on a sour note, but digitizing our artists’ books and fiddling with the Internet Archive’s Bookreader was a great project. Fun, a bit challenging, with some splendid results. We have our work cut out for us if we want to draw more attention to the books and have them be compelling for everyone. But other libraries in similar situations may find the Bookreader to be a very viable, easy-to-implement solution. If you don’t have permissions issues and are dealing with more traditional works, it’s built to be customizable yet offers a pleasant reading experience.

Notes

  1. Paradoxic Mutations by Margot Lovejoy

Comments are closed.