Store and display high resolution images with the International Image Interoperability Framework (IIIF)

Recently a faculty member working in the Digital Humanities on my campus asked the library to explore International Image Interoperability Framework (IIIF) image servers, with the ultimate goal of determining whether it would be feasible for the library to support a IIIF server as a service for the campus. I am typically not very involved in supporting work in the Digital Humanities on my campus, despite my background in (and love for) the humanities (philosophy majors, unite!). Since I began investigating this technology, I have seen references to IIIF compliance popping up all over the place, mostly in discussions of IIIF compatibility in Digital Asset Management System (DAMS) repositories like Hydra1 and Rosetta2, but also including ArtStor3 and the Internet Archive.4

IIIF was created by a group of technologists from Stanford, the British Library, and Oxford to solve three problems: 1) slow loading of high-resolution images in the browser; 2) high variation in user experience across image display platforms, which requires users to learn new controls and navigation for each image site; and 3) the complexity of setting up high-performance image servers.5 Image servers have also traditionally tended to silo content, coupling back-end storage with customized or commercial systems that do not allow third-party applications to access the stored data.

Storing your images in a way that multiple applications can access and render them lets users discover your content through a variety of portals. With IIIF, images are stored in a way that facilitates API access, so a variety of applications can retrieve the data. For example, if you have images stored on a IIIF-compatible server, multiple front-end discovery platforms can access those images through the API, whether at your own institution or at other institutions interested in providing gateways to your content. You might have images that are relevant to multiple repositories or collections; for instance, you might want your images to be discoverable through your institutional repository, discovery system, and digital archives system.

IIIF systems are designed to work with two components: an image server (such as the Python-based Loris application)6 and a front-end viewer (such as Mirador7 or OpenSeadragon8). There are other viewer options out there (IIIF Viewer9, for example), and you could conceivably write your own viewer application, or a IIIF display plugin that retrieves images from IIIF servers. Your image server can serve up images via APIs (discussed below) to any IIIF-compatible front-end viewer, and any IIIF-compatible viewer can be configured to access images served by any IIIF-compatible image server.

IIIF Image API and Presentation API

IIIF-compatible software enables retrieval of content from two APIs: the Image API and the Presentation API. As you might expect, the Image API is designed to enable the retrieval of actual images. Supported file types depend on the image server application being used, but API calls can request specific formats, including .jpg, .tif, .png, .gif, .jp2, .pdf, and .webp.10 A key feature of the API is the ability to request images by region – meaning that if only a portion of the image is requested, the image server returns precisely that area.11 This enables faster, more nimble rendering of detailed image regions in the viewer.

A screenshot showing a region of an image that can be returned via a IIIF Image API request. The region to be retrieved is specified using pixel coordinates (x, y, width, height), which are then included in the request URI. (Image Source: IIIF Image API 2.0. http://iiif.io/api/image/2.0/#region)

The basic structure of a request to a IIIF image server follows a standard scheme:

{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

An example request to a IIIF image server might look like this:12

http://www.example.org/imageservice/abcd1234/full/full/0/default.jpg
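To make the scheme concrete, here is a minimal Python sketch (using the requests library) that assembles request URIs following this pattern and retrieves both a full image and a cropped, scaled region. The server base URL and identifier are placeholders, not a real service.

# A minimal sketch of building IIIF Image API request URIs.
# The base URL and identifier below are placeholders.
import requests

BASE = "http://www.example.org/imageservice"  # {scheme}://{server}{/prefix}
IDENTIFIER = "abcd1234"

def iiif_url(region="full", size="full", rotation="0",
             quality="default", fmt="jpg"):
    """Assemble /{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    return "{}/{}/{}/{}/{}/{}.{}".format(
        BASE, IDENTIFIER, region, size, rotation, quality, fmt)

# The full image at full size, as in the example above.
with open("full.jpg", "wb") as f:
    f.write(requests.get(iiif_url()).content)

# Only a 600x400 pixel region starting at x=125, y=15, scaled to 50%.
with open("region.jpg", "wb") as f:
    f.write(requests.get(iiif_url(region="125,15,600,400",
                                  size="pct:50")).content)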

The Presentation API returns contextual and descriptive information about images, such as how an image fits into a collection or compound object, or annotations and properties that help the viewer understand the origin of the image. The Presentation API retrieves metadata stored as “manifests,” which are often expressed as JSON for Linked Data, or JSON-LD.13 Image servers such as Loris may only work with the Image API; Presentation API manifests can be stored on any web server, and viewers such as Mirador can be configured to retrieve them.14
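To get a feel for what a manifest contains, the following sketch fetches a manifest (the URL is a placeholder) and walks the structure defined by Presentation API 2.0, where a manifest contains sequences, sequences contain canvases, and each canvas points back at an Image API service:

# A minimal sketch of retrieving a Presentation API 2.0 manifest.
# The manifest URL is a placeholder; any IIIF 2.0 manifest should work.
import requests

manifest_url = "http://www.example.org/manifests/abcd1234/manifest.json"
manifest = requests.get(manifest_url).json()

print(manifest.get("label"))  # human-readable title of the object
for sequence in manifest.get("sequences", []):
    for canvas in sequence.get("canvases", []):
        # Each canvas ties one or more images to a coordinate space;
        # the image's "service" block points at the Image API endpoint.
        images = canvas.get("images", [])
        service = images[0]["resource"]["service"]["@id"] if images else None
        print(canvas.get("label"), service)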

Why would you need a IIIF Image Server or Viewer?

IIIF servers and their APIs are particularly suited to cultural heritage organizations. The ability to use APIs to render high-resolution images efficiently in the browser is essential for collections like medieval manuscripts, whose very fine details lower-quality image rendering might obscure. Digital humanities, art, and history scholars who need access to high-quality images for their research can zoom, pan, and analyze images very closely. This sort of analysis can also facilitate collaborative editing of metadata – for example, a separate viewing client could be set up specifically to enable scholars to add metadata, annotations, or translations to documents without necessarily publishing the enhanced data to other repositories.
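Annotations in viewers like Mirador are typically modeled as Open Annotation documents (JSON-LD) that target a region of a canvas. As a purely hypothetical sketch – the canvas URI and the annotation store endpoint are placeholders – a scholar's note might be assembled and posted like this:

# A hypothetical Open Annotation targeting a region of a canvas.
# The canvas URI and annotation store URL are placeholders.
import json
import requests

annotation = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "oa:Annotation",
    "motivation": "oa:commenting",
    "resource": {
        "@type": "cnt:ContentAsText",
        "format": "text/plain",
        "chars": "Scribal correction in a later hand.",
    },
    # Target a 400x200 pixel region of the canvas via a media fragment.
    "on": "http://www.example.org/manifests/abcd1234/canvas/c1#xywh=100,150,400,200",
}

requests.post("http://www.example.org/annotations",
              headers={"Content-Type": "application/json"},
              data=json.dumps(annotation))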

Example: Biblissima

A nice example of the power of IIIF is the Biblissima Mirador demo site. As the project website describes it,

In this demo, the user can consult a number of manuscripts, held by different institutions, in the same interface. In particular, there are several manuscripts from Stanford and Yale, as well as the first example from Gallica and served by Biblissima (BnF Français 1728)….

It is important to note that the images displayed in the viewer do not leave their original repositories; this is one of the fundamental principles of the IIIF initiative. All data (images and associated metadata) remain in their respective repositories and the institutions responsible for them maintain full control over what they choose to share.15

The Biblissima Mirador demo site displays images that are gathered from remote repositories via API. In this screenshot, the viewer can select from manuscripts available from Yale, the National Library of Wales, and Harvard.

The approach described by Biblissima represents the increasing shift toward designing repositories that guide users toward linked or related information not actually held by the repository. While I can certainly anticipate some problems with this approach for some archival collections – injecting objects from other collections might skew the authentic representation of a collection, even if the objects are directly related – it might work well to help represent provenance for collections that have been broken up across multiple institutions. Without this kind of architecture, researchers have to visit and keep track of multiple repositories that contain similar collections or associated objects. Manuscript collections are particularly suited to this kind of approach: a single manuscript may have been separated into individual leaves held by multiple institutions worldwide, and these manuscripts can be digitally re-assembled without requiring institutions to transfer copies of files to multiple repositories.

One challenge we are running into in exploring IIIF is how to incorporate this technology into existing legacy applications that host high-resolution images (for example, ContentDM and DSpace). We wouldn’t necessarily want to build a separate IIIF image server; it would be ideal if we could continue storing our high-resolution images in our existing repositories and pull them together with a IIIF viewer such as Mirador. There is a Python-based translator to enable ContentDM to serve up images using the IIIF standard,16 but I’ve found it difficult to find case studies or step-by-step implementation and troubleshooting information (if you have set up IIIF with ContentDM, I’d love to know about your experience!). To my knowledge, there is not an existing way to integrate IIIF with DSpace (but again, I would love to stand corrected if there is something out there). Because IIIF is such a new standard, and legacy applications were not necessarily built to enable this kind of content distribution, it may be some time before legacy digital asset management applications integrate IIIF easily and seamlessly. Apart from having these applications serve up content for use with IIIF viewers, embedding IIIF viewer capabilities into existing applications would be another challenge.

Finally, another challenge is discovering IIIF repositories from which to pull images and content. Libraries looking to support IIIF viewers will certainly need to collaborate with content experts, such as archivists, historians, and digital humanities and/or art scholars, who may be familiar with external repositories and sources of IIIF content relevant to building coherent collections. Viewers are configured manually to pull in content from repositories, so any library wanting to support a IIIF viewer will need to locate sources of content and point the viewer at them.

Undertaking support for IIIF servers and viewers is not a trivial project, but it can be a way for libraries to expand the visibility and findability of their own high-resolution digital collections (by exposing content through a IIIF-compatible server) or to enable their users to find content related to their collections (by supporting a IIIF viewer). While my library hasn’t determined exactly what our role will be in supporting IIIF technology, we will definitely be taking what we have learned from this experience to shape our exploration of emerging digital asset management systems, such as Hydra and Islandora.

More Information

  • IIIF Website: http://search.iiif.io/
  • IIIF Metadata Overview: https://lib.stanford.edu/home/iiif-metadata-overview
  • IIIF Google Group: https://groups.google.com/forum/#!forum/iiif-discuss

Notes


  1. https://wiki.duraspace.org/display/hydra/Page+Turners+%3A+The+Landscape
  2. Roxanne Wyns and Stephan Pauls. Tools for Digital Humanities: Implementation of the Mirador High-Resolution Viewer on Rosetta. IGeLU 2015. http://igelu.org/wp-content/uploads/2015/08/5.42-IGeLU2015_5.42_RoxanneWyns_StephanPauls_v1.pptx
  3. “‘Bottled or Tap?’ A Map for Integrating International Image Interoperability Framework (IIIF) into Shared Shelf and Artstor.” D-Lib Magazine, July/August 2015. http://www.dlib.org/dlib/july15/ying/07ying.html
  4. https://blog.archive.org/2015/10/23/zoom-in-to-9-3-million-internet-archive-books-and-images-through-iiif/
  5. Snydman, Stuart, Robert Sanderson, and Tom Cramer. 2015. The International Image Interoperability Framework (IIIF): A community & technology approach for web-based images. Archiving Conference 1: 16–21. https://stacks.stanford.edu/file/druid:df650pk4327/2015ARCHIVING_IIIF.pdf
  6. https://github.com/pulibrary/loris
  7. http://github.com/IIIF/mirador
  8. http://openseadragon.github.io/
  9. http://klokantech.github.io/iiifviewer/
  10. http://iiif.io/api/image/2.0/#format
  11. http://iiif.io/api/image/2.0/#region
  12. Snydman, Sanderson, and Cramer, The International Image Interoperability Framework (IIIF), 2
  13. http://iiif.io/api/presentation/2.0/#primary-resource-types-1
  14. https://groups.google.com/d/msg/iiif-discuss/F2_-gA6EWjc/2E0B7sIs2hsJ
  15. http://www.biblissima-condorcet.fr/en/news/interoperable-viewer-prototype-now-online-mirador
  16. https://github.com/IIIF/image-api/tree/master/translators/ContentDM

Low Expectations Distributed: Yet Another Institutional Repository Collection Development Workflow

Anyone who has worked on an institutional repository for even a short time knows that collecting faculty scholarship is not a straightforward process, no matter how nice your workflow looks on paper or how dedicated you are. Keeping expectations for the process manageable (not necessarily low, as in my clickbaity title), and constantly simplifying and automating, can nevertheless make your process work better. I’ve written before about some ways in which I’ve automated my process for faculty collection development, as well as how I’ve used lightweight project management tools to streamline processes. My newest technique for faculty scholarship collection development brings together pieces of both to greatly improve our productivity.

Allocating Your Human and Machine Resources

First, here is the personnel situation for the institutional repository I manage. Your own circumstances will certainly vary, but I think institutions of all sizes will have some version of this distribution. I manage our repository as approximately half of my position, and I have one graduate student assistant who works about 10-15 hours a week. From week to week we average only about 30-40 hours total to devote to all aspects of the repository, of which faculty collection development is only a part. We have 12 librarians who are liaisons with departments; they do the majority of the outreach to faculty and promotion of the repository, but only limited collection development beyond specific parts of the process. While they are certainly welcome to do more, in reality they have so much else to do that it doesn’t make sense for them to spend their time on data entry unless they want to (and some of them do). The breakdown of work is roughly that the liaisons promote the repository to the faculty and answer basic questions; I answer more complex questions, develop procedures, train staff, interpret publishing agreements, and verify metadata; and my GA does the simple research and data entry. From time to time we have additional graduate or undergraduate student help in the form of faculty research assistants, and we have a group of students available for digitization if needed.

Those are our human resources. The tools we use for the day-to-day work include Digital Measures (our faculty activity system), Excel, OpenRefine, Box, and Asana. I’ll say a bit below about what each of these is and how we use it. By far the most important innovation for our faculty collection development workflow has been integration with the Faculty Activity System, which is how we refer to Digital Measures on our campus. Many colleges and universities have some type of faculty activity system or are in the process of implementing one. These are generally adopted for purposes of annual reports, retention, promotion, and tenure reviews. I have been at two different universities working on adopting such systems, and as you might imagine, it’s a slow process with varying levels of participation across departments. Faculty do not always like these systems, for a variety of reasons, and so there may be hesitation to complete profiles even when required. Nevertheless, we felt in the library that this was a great source of faculty publication information that we could use for collection development, both for the repository and for the collection in general.

We now have a required question about including the item in the repository on every item a faculty member enters in the Faculty Activity System: if a faculty member says they published an article, they also have to say whether it should be included in the repository. We started this in late 2014, and it revolutionized our ability to reach faculty and departments who had never participated in the repository before, as well as simplifying the lives of faculty who were eager participants but now only had to enter data in one place. Of course, there are still a number of people whom we are missing, but this is part of keeping your expectations low – if you can’t reach everyone, focus your efforts on the people you can. And anyway, we are now so swamped with submissions that we can’t keep up with them, which is a good if unusual problem to have in this realm. Note that the process I describe below is basically the same as when we analyze a faculty member’s CV (which I described in my OpenRefine post), but we spend relatively little time doing that these days, since it’s easier for most people to just enter their material in Digital Measures and select that they want to include it in the repository.

The ease of integration between your own institution’s faculty activity system (assuming it exists) and your repository will certainly vary, but in most cases it should be possible for the library to get access to the data. For the Office of Institutional Research or similar office that administers the system, library participation is a great selling point, since it gives faculty a reason to keep their profiles up to date even when they are between review cycles. If your institution does not yet have such a system, you might still discuss a partnership with that office, since your repository may hold extremely useful information for them about research activity of which they are not aware.

The Workflow

We get reports from the Faculty Activity System on roughly a quarterly basis. Faculty data entry tends to bunch around certain dates, so we focus on the ends of semesters as the times to get the reports. The reports come by email as Excel files with information about the person, their department, contact information, and the like, as well as information about each publication. We do some initial processing in Excel to clean them up, remove duplicates from prior reports, and remove irrelevant information. It is amazing how many people see a field like “Journal Title” as a chance to ask a question rather than provide information. We focus our efforts on items that have actually been published, since the vast majority of people have no interest in posting pre-prints, and those who do prefer to post them in arXiv or similar; the few people who know about pre-prints and don’t have a subject archive generally submit their items directly. This is another way to lower expectations of what can be done through the process. I’ve already described how I use OpenRefine to create reports from faculty CVs using the SHERPA/RoMEO API, and we follow a similar but much simplified process, since we already have the data in the correct columns. Of course, following this process doesn’t tell us what we can do with every item: the journal title may have been entered incorrectly, so the API call doesn’t pick it up, or the journal may not be in SHERPA/RoMEO at all. My graduate student assistant fills in what he is able to determine, and I work on the complex cases. As we do this, the Excel spreadsheet is saved in Box, so we have the change history tracked and can easily add collaborators.
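For anyone who wants to script the lookup step outside OpenRefine, here is a minimal sketch against the legacy SHERPA/RoMEO API (api29.php, the version current when this was written); the spreadsheet filename and column name are assumptions about your report format.

# Look up RoMEO colours for a column of journal titles from a report.
# The CSV filename and column name are assumptions; adjust to your data.
import csv
import requests
import xml.etree.ElementTree as ET

API = "http://www.sherpa.ac.uk/romeo/api29.php"

def romeo_colour(journal_title):
    """Return the RoMEO colour for an exact journal title match, or None."""
    resp = requests.get(API, params={"jtitle": journal_title,
                                     "qtype": "exact"})
    colour = ET.fromstring(resp.content).find(".//romeocolour")
    return colour.text if colour is not None else None

with open("faculty_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        title = row["Journal Title"]
        print(title, romeo_colour(title) or "not found")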

A view of how we use Asana for managing faculty collection development workflows.

At this point, we are ready to move to Asana, a lightweight project management tool ideal for several people working on a group of related projects. Asana is far more fun and easier to work with than Excel spreadsheets, and it helps us work together to manage workload and see where we are with all our ongoing projects. For each report (or faculty member CV), we create a new project in Asana with several sections. While it doesn’t always happen in practice, in theory each citation is a task that moves between sections as it is completed, and is finally checked off when it is either posted or moved off to some other fate not as glamorous as being archived as open access full text. The sections generally cover posting the publisher’s PDF, contacting publishers, reminders for follow-up, posting authors’ manuscripts, and posting to SelectedWorks, which is our faculty profile service; it is related to our repository but mainly holds citations rather than full text. Again, as part of the low expectations, we focus on posting final PDFs of articles or book chapters. We add books to a faculty book list and don’t even attempt to include full text for these unless someone wants to make special arrangements with their publisher – this is rare, but again, the people who really care make it happen. If we already know that the author’s manuscript is permitted, we don’t add these items to Asana, but keep them in the spreadsheet until we are ready for them.
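We set these projects up by hand, but for what it’s worth, the same structure could be scripted against Asana’s REST API. In this hypothetical sketch, the access token, project ID, and citations are all placeholders.

# Create one Asana task per citation in an existing project.
# The token, project GID, and citations below are placeholders.
import requests

TOKEN = "your-personal-access-token"
PROJECT = "1234567890"  # hypothetical project GID
HEADERS = {"Authorization": "Bearer " + TOKEN}

citations = [
    "Smith, J. (2015). An example article. Journal of Examples 12(3).",
    "Jones, A. (2015). A chapter title. In An Edited Volume.",
]

for citation in citations:
    requests.post("https://app.asana.com/api/1.0/tasks",
                  headers=HEADERS,
                  json={"data": {"name": citation,
                                 "projects": [PROJECT]}})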

We contact publishers in batches, trying to group citations by journal and publisher to increase efficiency, so that one letter can cover many articles or chapters. We note to follow up with a reminder in one month, and then again a month after that; usually the second notice is enough to catch the attention of the publisher. As publishers respond, we move each citation to the “post publisher’s PDF” or “post author’s manuscript” section, or, if posting is not permitted at all, to the “post to SelectedWorks” section. While we’ve tried several different procedures, I’ve determined it’s best for the liaison librarians to ask faculty for authors’ accepted manuscripts only after we’ve verified that no other version may be posted. And if we don’t ever get them, we don’t worry about it too much.
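The batching step described above is simple enough to sketch in a few lines of Python; the citation fields here are hypothetical:

# Group citations by publisher so one permission letter covers many items.
# The field names are hypothetical.
from collections import defaultdict

citations = [
    {"publisher": "Example Press", "title": "First article"},
    {"publisher": "Example Press", "title": "Second article"},
    {"publisher": "Other House", "title": "Third chapter"},
]

by_publisher = defaultdict(list)
for c in citations:
    by_publisher[c["publisher"]].append(c["title"])

for publisher, titles in sorted(by_publisher.items()):
    print("One letter to {} covering {} items:".format(publisher, len(titles)))
    for t in titles:
        print("  - " + t)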

Conclusion

I hope you’ve gotten some ideas from this post for your own procedures and new tools you might try. Even more, I hope you’ll think about which pieces of your procedures are really working for you, and discard those that aren’t working any more. Your own situation will dictate which those are, but let’s all stop beating ourselves up about not achieving perfection. Make sure to let your repository stakeholders know what works and what doesn’t, and if something that isn’t working is still important, work collaboratively to figure out a way around that obstacle. That type of collaboration is what led to our partnership with the Office of Institutional Research to use the Digital Measures platform for our collection development, and that in turn has led to other collaborative opportunities.