All too often, libraries are forced to start from scratch each time we roll out a new technology. To avoid this problem librarians should consider middleware as their technology prototyping pipeline. Middleware is a software layer that delivers data in standard XML or JSON formats from a variety of sources. The sources could be distributed in multiple third party databases, residing in HTML, or available as other XML encoded data living on another internet domain. Middleware layers help to integrate these data into more robust and flexible alternative data sources to turnkey products from vendors. A number of turnkey services we rely on do not provide APIs, access to the underlying data or the relevant data librarians need to offer compelling, twenty-first century services our students and faculties require. Middleware design is the process of architecting your own library API.
The right middleware design approach will empower libraries to move data between products and services and deliver it to users where they need it. In the example below, the library does not have access to an API of room reservation data, so a library’s data is held captive. With strategic application of middleware design, the library is free to take the data and reformat it in mobile or tablet-friendly format, greatly improving access and allowing new services for our users.
Problem: This stand alone website allows a student to look up and book available group study rooms in the library. It is a self-service product. But what if we wanted to advertise and show only the currently available study rooms into our website as a data feed? What if we needed *only* currently bookable room data to port into a mobile app? As the website stands we cannot readily access this data other than on a desktop computer.
Library and Information Science foundations can be utilized to extend library data into new interfaces and platforms. Half of digital librarianship is really just the description of data using metadata schemes (usually encoded in XML, but more recently, JSON as well.) Before I can really talk about establishing a prototyping pipeline I need to unpack XML and then I’m going to talk about how building on these foundations we can create the XML feeds we need to extend library data anywhere and everywhere.
A short primer on XML:
Many others have sung the virtues of XML. I personally like a good clean XML feed because to me, it means extensibility. XML supports all kinds of system efficency and data independence. Those are the features I like. Those who develop metadata standards and work with digital library development cite the following when promulgating XML virtues from McDonough, 2008 :
- XML helps ensure platform (and perhaps more critically vendor) independence;
- XML provides the multilingual character support critical to the handling of library materials;
- XML’s extensibility and modularity allow libraries to customize its application within their own operating environments;
- XML helps minimize software development costs by allowing libraries to leverage existing, open source development tools;
- XML, through virtue of being an open standard which enables descriptive markup, may assist in the long-term preservation of electronic materials; and perhaps most importantly
- XML provides a technological basis for interoperability of both content and metadata across library systems.
Now that you’re considering drinking the XML kool-aide, think about what might be possible if you could pull any library data you wanted (in the form of XML) into any other system — like an iPhone or iPad app. To do that you’d need a RESTful API. RESTful APIs are vitally important for extending library data across systems. You can create your own library API with a middleware layer.
One layer I’ve learned in the past couple months is a Tomcat/Jersey stack that allows you to pull in data from multiple sources, and then serialize that data to XML. A recent XML feed that was developed this way is an “available now” XML feed of group rooms in the library that are available in the next hour and can be booked immediately. For this example I pull in an additional Java package to the Jersey program — an HTML parser, Jsoup.
Jersey is implemented on a Tomcat server. Tomcat can run a number of Java based applications but it is essentially a webserver that we are running Jersey from. It also bears noting that Apache Tomcat is not the same as the Apache webserver that many of us know and use for serving HTML pages.
In order to serialize a Java data object to XML Jersey uses a standard MVC architecture, where our data model is the model, the new library web page/mobile app that we display is the view and the Jersey resource file is the controller. In essence the Jersery output is a set of three Java programs that comprise the MVC. The data that Jersey is pulling in from the website is HTML. Since the HTML needs to be parsed and then pulled into a data object, we use another Java library called jsoup. Along with the tutelage of the research programmers in the library, I followed this tutorial on creating a RESTful API with Jersey, that explains the programming annotations needed for creating the web-service, which are rather simple to implement once you have your developer environment set up — for this project I worked entirely in Eclipse since it can also simulate running a Tomcat server on your local machine.
An example of the feed is below (abbreviated):
<models> <model> <date>1/27/2013</date> <endTime>5:00 PM</endTime> <roomName>Collaboration Room 01 - Undergraduate Library</roomName> <startTime>4:00 PM</startTime> </model> <model> <date>1/27/2013</date> <endTime>5:00 PM</endTime> <roomName>Collaboration Room 02 - Undergraduate Library</roomName> <startTime>4:00 PM</startTime> </model> ...
Once you have that feed modeled and serving XML data from the page you are able to pull that into a new system/ interface. Using Apple’s Dashcode I was able to model a prototype of what the room reservation feed might look like in an iPhone app:
Middleware is the digital library prototyping pipeline, it is a profound tool in the digital library toolkit since it is the foundation for new services, initiatives, and the extension of library data. There are some areas that I breezed over in this post, like how to program the HTML screen scraping necessary to pull data in a webpage into a Java data object. I’ll cover screen scraping with Jersey and Jsoup in my next post. I’ll also submit that having access to the underlying database that powered the room reservation website would have been preferable. We could have imported a Java package that acts as a database connector from Jersey to the XML serializer — but alas, as often happens in the wild we also could not get direct database acces to the underlying data in the page. One final thought on the approach used here: the software are open source Java tools — so they are free to download and utilize for your rapid prototyping library needs.
McDonough, Jerome. “Structural Metadata and the Social Limitation of Interoperability: A Sociotechnical View of XML and Digital Library Standards Development.” Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 – 15, 2008. In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). doi:10.4242/BalisageVol1.McDonough01.