Making a Basic LTI (Learning Tools Interoperability) App

Learning Tools Interoperability, or LTI, is an open standard maintained by the IMS Global Learning Consortium used to build external tools or plugins for Learning Management Systems (LMS). A common use case for LTI is building an application that can be accessed from within the LMS to perform searches and import resources into a course. For example, the Wikipedia LTI application enables instructors to search Wikipedia and embed links to articles directly into their courses. Academic libraries frequently struggle to integrate library resources into learning management systems, so LTI is an obvious standard to embrace as a potential way to make library resources more accessible. However, when I began researching how to create an LTI app, I found it very difficult to find examples of existing app code and resources to get started. You can’t just create any old web application and have it be ‘consumable’ by a learning management system in an LTI-compliant way. In this post, I’ll outline some of the resources I found useful for getting started building your own LTI app.

LTI General Architecture

These are the basic components of an LTI application:

  • The LTI Tool Provider (TP):  This is your application.  The tool provider is the resource the user sees when they access your application from within the learning management system.  The Wikipedia LTI app mentioned above is an example of a tool provider.
  • The LTI Tool Consumer (TC): This is the learning management system (e.g., Blackboard, Moodle, Canvas) from which the user accesses your tool provider application.
  • The LTI Launch:  When a user accesses your tool provider from the tool consumer, this is called “launching” the LTI application.  Parameters are passed from the tool consumer to your tool provider, including authorization parameters that ensure the user is permitted to access your application, as well as information about the user’s identity, roles within the tool consumer, and the type of request the user is sending (e.g., a “content item message” is sent to your tool to indicate the user is expecting to import a link back to the tool consumer).
  • OAuth:  LTI applications use OAuth signatures for validating messages between the Tool Consumer and the Tool Provider.  LTI requires that the Tool Consumer and the Tool Provider each be configured with a shared key and secret, which is used to sign the launch request and enable trusted communication between the two systems (a minimal sketch of this signing step follows this list).1
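To make that signing step a bit more concrete, here is a minimal sketch (in Python, using the oauthlib package, purely for illustration; the sample apps discussed below handle this in PHP) of how a tool consumer signs the launch parameters with the shared key and secret before POSTing them to the tool. The URL, key, and secret are placeholders:

# Illustrative only: signing an LTI 1.x launch the way a tool consumer would,
# using the oauthlib package. The URL, key, and secret are placeholders.
from urllib import urlencode  # Python 2; use urllib.parse.urlencode in Python 3
from oauthlib.oauth1 import Client, SIGNATURE_TYPE_BODY

launch_url = 'https://tool.example.edu/tool.php'  # hypothetical tool provider URL
params = {
    'lti_message_type': 'basic-lti-launch-request',
    'lti_version': 'LTI-1p0',
    'resource_link_id': 'course-123-link-1',
    'roles': 'Instructor',
}

# The same key/secret pair must be configured on both the LMS and the tool
client = Client('mykey', client_secret='mysecret', signature_type=SIGNATURE_TYPE_BODY)
uri, headers, body = client.sign(
    launch_url,
    http_method='POST',
    body=urlencode(params),
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
)

# body now holds the launch parameters plus oauth_signature, oauth_nonce,
# oauth_timestamp, etc., which the tool provider verifies with the same secret
print(body)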

An additional tip for developing LTI apps: sign up for a free instructor account for the Canvas learning management system. Canvas accounts hosted on the Instructure website enable you to add a custom LTI tool to your course (once it’s hosted on a web server, of course) and also let you quickly experiment with some existing LTI applications (such as the Khan Academy and Merlot LTI apps) to explore functionality you might want to include in your own application. This way, you can see what an instructor or student would see when they interact with your tool provider through an LMS.

Building your first “Hello, World” LTI app (with some help from Harvard)

When I first started looking into LTI, I found it really difficult to find a full (but basic) LTI application that would give me an overall picture of how LTI apps work – there are lots of LTI class libraries out there, but I wanted an example of how all the pieces of an LTI app fit together. After some fruitless Googling and GitHub searches, I finally stumbled upon a Harvard LTI workshop repository that really helped me understand how LTI applications work. The repository includes a full working LTI application into which you can simply plug in some basic values to get a fully working LTI application, complete with OAuth authentication.

First, be sure to look at the presentation included in the repository (a rare example of a set of presentation slides that is 100% understandable out of context) to get a general introduction to the LTI standard and what it attempts to achieve. You’ll also want to read through the step-by-step LTI blog tutorial, which will get you set up with your first “Hello, World!” LTI application, complete with valid OAuth-signed requests.2

I found it especially useful that the Harvard LTI workshop repository includes a pseudo tool consumer (which mimics how the LMS would interact with your tool) that you can use during development on localhost. Once you follow the steps of the tutorial to build your basic “Hello, World!” single-page LTI application, you can plug its local URL into the tool consumer page and watch how the parameters are passed from the tool consumer to the tool provider. You can also examine the included basic LTI PHP class library, as well as the basic OAuth functionality, to see how the OAuth-signed request is constructed.

Use Case:  A WorldCat Discovery API Search and Retrieval Tool for LMS

My particular use case for exploring LTI involves building a search box that would enable a faculty user to add a link to a resource from the WorldCat Discovery system. If your library subscribes to FirstSearch or is a WorldShare Management System (WMS) customer, you likely now have access to WorldCat Discovery; but the framework I’m using to build my app would work for any discovery layer with an API (e.g., Summon, Primo, etc.).

Searching and retrieving via LTI is straightforward. First, starting from the Harvard LTI workshop application, I created a /lib directory to host the WorldCat Discovery PHP library published by OCLC, cloned from GitHub. I installed the library using Composer as described in the repository’s readme instructions. I created a very simple search form and response page that enable a user to enter a query and then retrieve results from the WorldCat Discovery API based on that query. Then I set up my “tool.php” application to display the search form and POST the query to the simple response page:


<?php
error_reporting(E_ALL & ~E_NOTICE);
ini_set("display_errors", 1);
require_once 'ims-blti/blti.php';

// Validate the LTI launch using the shared secret configured in the LMS
$lti = new BLTI("secret", false, false);

header('Content-Type: text/html;charset=utf-8');
?>
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8" />
  <title>Building Tools with the Learning Tools Interoperability Specification</title>
</head>
<body>
<?php if ($lti->valid) { ?>
  <h2>Search WorldCat Discovery</h2>
  <form action="results.php" method="post" encType="application/x-www-form-urlencoded">
    Search: <input type="text" name="query" id="query" />
<?php
    // Pass the LTI launch parameters along to the results page as hidden fields
    foreach ($_POST as $key => $value) {
      echo "<input type=\"hidden\" name=\"" . $key . "\" value=\"" . $value . "\" />\n";
    }
?>
    <input type="submit" name="submit" value="Submit" />
  </form>
<?php } else { ?>
  <h2>This was not a valid LTI launch</h2>
  <p>Error message: <?= $lti->message ?></p>
<?php } ?>
</body>
</html>



<?php
// results.php: query the WorldCat Discovery API with the submitted search terms.
// Adjust this path to wherever Composer installed the OCLC library in /lib.
require_once 'lib/worldcat-discovery-php/vendor/autoload.php';

use OCLC\Auth\WSKey;
use OCLC\Auth\AccessToken;
use WorldCat\Discovery\Bib;

$key = 'somekey';
$secret = 'somesecret';
$options = array('services' => array('WorldCatDiscoveryAPI', 'refresh_token'));
$wskey = new WSKey($key, $secret, $options);
$accessToken = $wskey->getAccessTokenWithClientCredentials('123', '123');

$query = $_POST["query"];
$options = array(
  'useFRBRGrouping' => 'true',
  'sortBy' => 'library_plus_relevance',
  'itemsPerPage' => 25,
);

$bib = Bib::Search($query, $accessToken, $options);

if (is_a($bib, 'WorldCat\Discovery\Error')) {
  echo $bib->getErrorCode();
  echo $bib->getErrorMessage();
} else {
  foreach ($bib->getSearchResults() as $result) {
    echo '<li><a href="' . $result . '">' . $result->getName()->getValue() . ' (';
    echo ($result->getDatePublished() ? '' . $result->getDatePublished()->getValue() : '') . ')</a></li>';
  }
}
?>



The application I’ve created so far is mostly a proof of concept, and I have a few essential tasks left to finish it. First, I need to re-write the URLs to point to a specific WorldCat Discovery instance (pointing to the generic WorldCat Discovery interface isn’t helpful when a user wants to embed the resources of a specific library with working full-text access links). Second, my app needs to enable the user to return these links to the LMS so that students and course participants can click on them.

For the second point, there is an LTI message type called the “content-item-message” that indicates that the type of interaction requested from the tool is the return of a link to the LMS. The LMS must include this input parameter in the POST request sent to the tool; the LMS “knows” to send this parameter when the tool is initially installed in the LMS:

<input type="hidden" name="lti_message_type" value="ContentItemSelectionRequest" />

The POST request to the tool must also indicate the return URL (i.e., the URL back to the LMS) where the link should be sent. The LMS should generate this input parameter for you; your tool just needs to identify it and include it in the POST request that returns the link to the LMS:

<input type="hidden" name="content_item_return_url" value="" />

The tool provider must then return the link to be imported, along with some description of the content, as JSON, for example:

  "@context" : "", 
  "@graph" : [ 
    { "@type" : "LtiLinkItem",
      "url" : "",
      "mediaType" : "text/html",
      "title" : "Global Warming: Hype or Hazard?"

See the Content Item Message documentation for more details on returning JSON suitable for consumption by the LMS.

Learn, and do, more with LTI

You may find that the basic LTI class script included in the Harvard LTI tutorial is insufficient for your use case – the code is a bit aged and the LTI specification has moved on.

A more robust LTI tool provider PHP library than the basic one included in the Harvard tutorial has been made available by IMS Global on GitHub. You can also find a complete sample app called “Rating” that is a great example of more complex kinds of interactions with an LTI app, including how you might build a server-side data store and recall that data through the LTI app, and how you might handle the assignment of grades or scores through an LTI app.

To learn more, the Canvas Learning Management system has an excellent open course on LTI development that you can enroll yourself in with a free Canvas account.  Once enrolled in the course, you can launch your own locally developed LTI app within the course to check how parameters and data are exchanged between the LMS and the tool.

  1.  See this post on LTI and OAuth for a straightforward discussion of the general implications of OAuth for LTI application development.
  2. I skipped the steps for installing Vagrant and VirtualBox and the tutorial still worked great for me on my MAMP server, so if you’re concerned about installing those and already have a local development server installed (or you’re just working from a LAMP server online) the tutorial will still work for you.

Thoughts from NDLC16

I recently had the pleasure of attending, and presenting at, the National Diversity in Libraries Conference (NDLC) at UCLA. NDLC is an irregular conference that last occurred in 2010. This year’s organizers had hoped for around 250 registrants and greatly exceeded that number. As a result, there were multiple sessions, with as many as seven concurrent sessions in a single time slot. I recommend perusing the program for a sense of the conference and full descriptions of sessions and posters. There was also a lively Twitter stream capturing sessions and continuing conversations at #NDLC16. As I could not attend all of the sessions, I will highlight in this post some key concepts that I think are ideal to carry forward in the work that we do in libraries, supporting people and systems.

Equity (and access, inclusion, and diversity)

Yes, the conference is about “diversity in libraries,” but what became apparent throughout many sessions is that we know we have “diversity.” Not a lot; shockingly little by some measures, but there is diversity: diversity of life experience, sexual orientation, gender expression, ability status, and racial/ethnic identity (this last one is often very visible and very, very poorly represented). If our goal is to increase diversity, that is great. But if we try to do that work without acknowledging that it takes equity and inclusion to actually make an impact, it is a waste of time.

Equity and inclusion are difficult to achieve when the greater system that we work within has been constructed upon bias and privilege. This is true whether the system is our national government, our institution, or the individual library where we work. You cannot talk about human interactions in a vacuum, and many of the sessions reflected this with themes like “Academic libraries and social justice on campus” and “Cultural aspects and perspectives in health sciences library services.”

Dismantling Structures

An important step is recognizing the oppressive structures that we operate within and then determining the best course of action to dismantle them, and in so doing remove or reduce the barriers to participation. The difficulty lies in the many, many ways that those structures disenfranchise communities. It can be through omission: not collecting in areas that represent the totality of American society, for example. Do collections and archives reflect the communities present? The Native American, Latinx, Asian-American, African-American experience? What of the LGBTQA community?

Or it can be through our actual information seeking systems: subjective subject headings that belie the dominant narrative. Words are there to describe items, but they are “othering” (set in contrast or aside to a presumed norm, e.g. Wikipedia editors’ one-time attempt to separate American novelists based on gender[1]) or strip away intersectionality; or perhaps they use euphemisms for uncomfortable truths, such as framing the detention and imprisonment of Japanese-Americans during WWII as relocation camp/assembly center/temporary detention center, rather than: internment camp/incarceration camp/American concentration camp[2]. (Incidentally, the LCSH for this event is Japanese Americans — Evacuation and relocation, 1942-1945, which seems pretty euphemistic to me. What were they being evacuated from? Their safe homes, schools, and jobs?)

Or perhaps it is in our technology: how common is the practice of evaluating databases for screen reader accessibility as part of the selection process? We work with profile systems like PeopleSoft, which limit the ability to self-identify gender. How do animations and lots of visual content on a website affect individuals with autism spectrum disorder (ASD)?

And what about our programming? Are makerspaces implicitly gendering activities in the promotional materials, e.g. using pinks and purples for sewing workshops and blues and black for Arduino workshops? What is really being said when a supervisor asks if there should be an “All Lives Matter” book display to “balance” the “Black Lives Matter” display?

These are complex issues because humans are complex animals. We form identities, often based on how others perceive us. How are the intersections of various facets of identity treated by society? By our organization? By our collections? By our metadata? Striving for a society wherein these oppressive structures are acknowledged, the history that put them in place is known, and all members feel equal and respected is a herculean task. But it is one that we are fortunate enough to be engaged with by virtue of our profession, because we subscribe to ALA’s Core Values of Librarianship, or create things like the ACRL Diversity Standards, or align with the Digital Library Federation’s mission statement.

Start Local

One way to be an active participant in working towards a better future is to assess your local environment. Much of the programming at NDLC concerned recruitment and retention of diverse individuals in our profession. Panel discussions about LIS diversity initiatives and diversity fellow programs and presentations on strategic planning for diversity and inclusion and cultural competencies highlight the serious representation issues in our profession. According to the ALA demographic study of 2014, 87.1% of membership identified as white[3]. The 2015 census data from the United States reports the population as 61.6% white (non-Hispanic, not multi-racial)[4]. For a profession built so firmly on the notion of service and community engagement, this is problematic. Happily, many of the presentations at NDLC made their materials available, so we may learn from each other’s experiences[5].

Representative hiring and active retention are only a portion of what can be done locally to bring about positive change. Collection practices, metadata creation, digital project investment, special collections development, and community outreach and programming are the bread and butter of library activities, whether in a public, school, or academic library. It is important that we examine our current processes not with a “how is this diverse?” perspective, but with a “how is this anti-racist/anti-oppression?” one. This is an important, yet perhaps subtle, difference. Working towards an anti-oppression mentality means recognizing the systemic inequalities woven through the fabric of our society, which may require changing some of the core practices of librarians in order to move the needle.

An example of this could be the way an academic library engages with, say, a community of students who both inhabit an ethnic minority identity and are first-generation students. This library may have done a great job of collecting a diverse range of materials that represent this population well, but if the library does not devote effort to outreach and programming for this group, it is unlikely that the students will go into the library and discover these works themselves. Why? It’s not that they are lazy, or entitled, or willfully ignorant (as I have witnessed librarians opine).

Libraries are still intimidating places for many students, regardless of identity. Compound that generalized anxiety with first-generation student status and an ethnic minority and it should be clear that the onus is on the librarians to dismantle those barriers – be they perceived or real – and communicate that the library is the students’ library – is our library – is everyone’s library. If this sounds obvious to you (“of course the library is everyone’s library!”), then I fear that you may not have a realistic understanding of the varied and complex lived experiences of people living in America. If this sounds like a challenge worthy of time, effort, and resources, then YAY! And I really hope that you are in a position to take up that challenge.

I’m Tired Now

Another strong takeaway from NDLC is that this work is WORK. Really hard, exhausting work. Work that often falls on the shoulders of those most disenfranchised, and that requires pushing back against a bureaucratic and social machine that has been running for centuries. Writing this blog post took forever, as I have tried to find words that don’t require previous knowledge of “social justice jargon” and that – hopefully – anyone reading can find something to relate to. See, working towards an equitable and inclusive society doesn’t fall only to those who lack access to societal privilege. Sure, those individuals will feel the imperative to change in their day to day experiences, but major heavy lifting must also come from those who implicitly – and often unconsciously – benefit from the constructs of the biased system.

Lasting change that doesn’t come from a burn-it-down-and-start-again revolution requires the positive participation of the most privileged. That is a lot of strength and honesty to ask of someone – to recognize that they have – in some ways – benefited from a system, and that benefit has been predicated on someone else’s oppression…that’s a heavy realization. It means recognizing that bringing equity to a system like this will most likely require some loss of benefits to the privileged group.

For example, it is possible that a collection focusing on female religious figures may purchase fewer books on Joan of Arc in order to represent Rabia Basri and Kāraikkāl Ammaiyār. That doesn’t diminish Joan of Arc’s impact on the collection, it just provides a wider lens to view the topic. Yet, some may feel a decline of Joan’s status, and that can be frightening if they’ve built an identity around that status.

This collection example is actually a decent representation of what it means to try to put some of these equity ideals into practice. Recognizing a representation imbalance and then taking action to address it may result in feelings of discomfort or fear for some, but for others it can give voice and visibility. We are lucky to be in a profession where we, as individuals/organizations/systems, can effect change and have such a positive impact on our communities, but we must be willing to recognize uncomfortable truths and believe that a just and equitable society is a future worth working for.

NDLC Again?

Rumor has it that there will be another NDLC in 2020. It was an exhilarating conference and one that I sincerely hope leaders in our profession attend and participate in, en masse. If you are looking to get more of a feel for the entire conference, several Storify collections were created, including Amelia Gibson’s on the BLM Town Hall, ARL’s, and mine.





[5] Coming soon to or look through #NDLC16

A High-Level Look at an ILS Migration

My library recently performed that most miraculous of feats—a full transition from one integrated library system to another, specifically Innovative’s Millennium to the open source Koha (supported by ByWater Solutions). We were prompted to migrate by Millennium’s approaching end-of-life and a desire to move to a more open system where we feel in greater control of our data. I’m sure many librarians have been through ILS migrations, and plenty has been written about them, but as this was my first I wanted to reflect upon the process. If you’re considering changing your ILS, or if you work in another area of librarianship & wonder how a migration looks from the systems end, I hope this post holds some value for you.


No migration is without its problems. For starters, certain pieces of data in our old ILS weren’t accessible in any meaningful format. While Millennium has a robust “Create Lists” feature for querying & exporting different types of records (patron, bibliographic, vendor, etc.), it does not expose certain types of information. We couldn’t find a way to export detailed fines information, only a lump sum for each patron. To help with this post-migration, we saved an email listing of all itemized fines that we can refer to later. The email is saved as a shared Google Doc which allows circulation staff to comment on it as fines are resolved.

We also discovered that patron checkout history couldn’t be exported in bulk. While each patron can opt-in to a reading history & view it in the catalog, there’s no way for an administrator to download everyone’s history at once. As a solution, we kept our self-hosted Millennium instance running & can login to patrons’ accounts to retrieve their reading history upon request. Luckily, this feature wasn’t heavily used, so access to it hasn’t come up many times. We plan to keep our old, self-hosted ILS running for a year and then re-evaluate whether it’s prudent to shut it down, losing the data.

While some types of data simply couldn’t be exported, many more couldn’t be migrated in exactly the same form. An ILS is a complicated piece of software with many interdependent parts, and no two systems are going to represent concepts in exactly the same way. To provide a concrete example: Millennium’s loan rules are based upon patron type & the item’s location, so a rule definition might resemble:

  • a FACULTY patron can keep items from the MAIN SHELVES for four weeks & renew them once
  • a STUDENT patron can keep items from the MAIN SHELVES for two weeks & renew them two times

Koha, however, uses patron category & item type to determine loan rules, eschewing location as the pivotal attribute of an item. Neither implementation is wrong in any way; they both make sense, but are suited to slightly different situations. This difference necessitated completely reevaluating our item types, which didn’t previously affect loan rules. We had many, many item types because they were meant to represent the different media in our collection, not act as a hook for particular ILS functionality. Under the new system, our Associate Director of Libraries put copious work into reconfiguring & simplifying our types such that they would be compatible with our loan rules. This was a time-consuming process & it’s just one example of how a straightforward migration from one system to the next was impossible.

While some data couldn’t be exported, and others needed extensive rethinking in the new ILS, there was also information that could only be migrated after much massaging. Our patron records were a good example: under Millennium, users logged in on an insecure HTTP page with their barcode & last name. Yikes. I know, I felt terrible about it, but integration with our campus authentication & upgrading to HTTPS were both additional costs that we couldn’t afford. Now, under Koha, we can use the campus CAS (a central authentication system) & HTTPS (yay!), but wait…we don’t have the usernames for any of our patrons. So I spent a while writing Python scripts to parse our patron data, attempting to extract usernames from institutional email addresses. A system administrator also helped use unique identifying information (like phone number) to find potential patron matches in another campus database.
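As a rough sketch of what that parsing looked like (not the actual migration script), something along these lines extracts a username whenever a patron’s email follows the institutional pattern; the file layout and the example.edu domain are made-up assumptions:

# A rough sketch of the username parsing described above -- not the actual
# migration script. Assumes a tab-delimited patrons.txt export with barcode,
# name, and email columns, and that campus usernames are the local part of
# institutional addresses (e.g. jdoe@example.edu -> jdoe).
import csv

with open('patrons.txt', 'rb') as infile, open('patron_usernames.txt', 'w') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for barcode, name, email in csv.reader(infile, delimiter='\t'):
        username = ''
        if email.strip().lower().endswith('@example.edu'):
            # institutional address: the part before the @ is the campus username
            username = email.strip().split('@')[0]
        writer.writerow([barcode, name, username])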

A more amusing example of weird Millennium data was active holds, which are stored in a single field on item records & look something like this:


Can you tell what’s going on here? With a little poking around in the system, it became apparent that letters like “NNB” stood for “date not needed by” & that other fields were identifiers connecting to patron & item records. So, once again, I wrote scripts to extract meaningful details from this silly format.

I won’t lie, the data munging was some of the most enjoyable work of the migration. Maybe I’m weird, but it was both challenging & interesting as we were suddenly forced to dive deeper into our old system and understand more of its hideous internal organs, just as we were leaving it behind. The problem-solving & sleuthing were fun & distracted me from some of the more frustrating challenges detailed above.

Finally, while we had a migration server where we tested our data & staff played around for almost a month’s time, when it came to the final leap things didn’t quite work as expected. The CAS integration, which I had so anticipated, didn’t work immediately. We started bumping into errors we hadn’t seen on the migration server. Much of this is inevitable; it’s simply unrealistic to create a perfect replica of our live catalog. We cannot, for instance, host the migration server on the exact same domain, and while that seems like a trivial difference it does affect a few things. Luckily, we had few summer classes so there was time to suffer a few setbacks & now that our fall semester is about to begin, we’re in great shape.

Difference & Repetition

Koha is primarily used by public libraries, and as such we’ve run into a few areas where common academic library functions aren’t implemented in a familiar way or are unavailable. Often, it’s that our perspective is so heavily rooted in Millennium that we need to think differently to achieve the same effect in Koha. But sometimes it’s clear that what’s a concern to us isn’t to other libraries.

For instance, bib records for serials with large numbers of issues are an ongoing struggle for us. We have many print periodicals with extensive holdings, including bound editions of past issues. The holdings display in the catalog is more oriented towards recent periodicals & showing whether the latest few issues have arrived yet. That’s fine for materials like newspapers or popular magazines with few back issues, and I’ve seen a few public libraries using Koha that have minimalistic periodical records intended only to point the patron to a certain shelf. However, we have complex holdings like “issues 1 through 10 are bound together, issue 11 is missing, issues 12 through 18 are held in a separate location…” Parsing the catalog record to determine if we have a certain issue, and where it might be, is quite challenging.

Another example of the public versus academic divide: there’s no “recall” feature per se in Koha, wherein a faculty member could retrieve an item they want to place on course reserve from a student. Instead, we have tried to simulate this feature with a mixture of adjustments to our loan rules & internal reports which show the status of contested items. Recall isn’t a huge feature & isn’t used all the time, so it’s not something we thought to research when selecting our new ILS, but it’s a great example of a minute difference that ended up creating a headache as we adapted to a new piece of software.

Moving from Millennium to Koha also meant we were shifting from a closed source system where we had to pay additional fees for limited API access to an open source system which boasts full read access to the database via its reporting feature. Koha’s open source nature has been perhaps the biggest boon for me during our migration. It’s very simple to look at the actual server-side code generating particular pages, or pull up specific rows in database tables, to see exactly what’s happening. In a black box ILS, everything we do is based on a vague adumbration of how we think the system operates. We can provide an input & record the output, but we’re never sure about edge cases or whether strange behavior is a bug or somehow intentional.

Koha has its share of bugs, I’ve discovered, but thankfully I’m able to jump right into the source code itself to determine what’s occurring. I’ve been able to diagnose problems by looking at open bug reports on Koha’s bugzilla tracker, pondering over perl code, and applying snippets of code from the Koha wiki or git repository. I’ve already submitted two bug patches, one of which has been pulled into the project. It’s empowering to be able to trace exactly what’s happening when troubleshooting & to submit one’s own solution, or just a detailed bug report, for it. Whether or not a patch is the best way to fix an issue, being able to see precisely how the system works is deeply satisfying. It also makes it much easier for me to design JavaScript hacks that smooth over issues on the client side, be it in the staff-facing administrative functions or the public catalog.

What I Would Do Differently

Set clearer expectations.

We had Millennium for more than a decade. We invested substantial resources, both monetary & temporal, in customizing it to suit our tastes & unique collections. As we began testing the new ILS, the most common feedback from staff fell along the lines of “this isn’t like it was in Millennium.” I think that would have been a less common observation, or perhaps phrased more productively, if I’d made it clear that a) it’ll take time to customize our new ILS to the degree of the old one, and b) not everything will be, or needs to be, the same.

Most of the customization decisions were made years ago & were never revisited. We need to return to the reason why things were set up a certain way, then determine if that reason is still legitimate, and finally find a way to achieve the best possible result in the new system. Instead, it’s felt like the process was framed more as “how do we simulate our old ILS in the new one” which sets us up for disappointment & failure from the start. I think there’s a feeling that a new system should automatically be better, and it’s true that we’re gaining several new & useful features, but we’re also losing substantial Millennium-specific customization. It’s important to realize that just because everything is not optimal out of the box doesn’t mean we cannot discover even better solutions if we approach our problems in a new light.

Encourage experimentation, deny expertise.

Because I’m the Systems Librarian, staff naturally turn to me with their systems questions. Here’s a secret: I know very little about the ILS. Like them, I’m still learning, and what’s more I’m often unfamiliar with the particular quarters of the system where they spend large amounts of time. I don’t know what it’s like to check in books & process holds all day, but our circulation staff do. It’s been tough at times when staff seek my guidance & I’m far from able to help them. Instead, we all need to approach the ongoing migration as an exploration. If we’re not sure how something works, the best way is to research & test, then test again. While Koha’s manual is long & quite detailed, it cannot (& arguably should not, lest it grow to unreasonable lengths) specify every edge case that can possibly occur. The only way to know is to test & document, which we should have emphasized & encouraged more towards the start of the process.

To be fair, many staff had reasonable expectations & performed a lot of experiments. Still, I did not do a great job of facilitating either of those as a leader. That’s truly my job as Systems Librarian during this process; I’m not here merely to mold our data so it fits perfectly in the new system, I’m here to oversee the entire transition as a process that involves data, workflows, staff, and technology.

Take more time.

Initially, the ILS migration was such an enormous amount of work that it was not clear where to start. It felt as if, for a few months before our on-site training, we did little but sit around & await a whirlwind of busyness. I wish we had a better sense of the work we could have front-loaded such that we could focus efforts on other tasks later on. For example, we ended up deleting thousands of patron, item, and bibliographic records in an effort to “clean house” & not spend effort migrating data that was unneeded in the first place. We should have attacked that much earlier, and it might have obviated the need for some work. For instance, if in the course of cleaning up Millennium we delete invalid MARC records or eliminate obscure item types, those represent fewer problems encountered later in the migration process.


As we start our fall semester, I feel accomplished. We raced through this migration, beginning the initial stages only in April for a go-live date that would occur in June. I learned a lot & appreciated the challenge but also had one horrible epiphany: I’m still relatively young, and I hope to be in librarianship for a long time, so this is likely not the last ILS migration I’ll participate in. While that very thought gives me chills, I hope the lessons I’ve taken from this one will serve me well in the future.

Getting Closer to Accessible Carousels

Carousels are a popular website feature because they allow one to fit extra information within the same footprint and provide visual interest on a page. But as you most likely know, there is wide disagreement about whether they should ever be used. Reasons include: they can be annoying, no one spends long enough on a page to ever see beyond the first item, people rarely click on them (even if they read the information) and they add bloat to pages (Michael Schofield has a very compelling set of slides on this topic). But by far the most compelling argument against them is that they are difficult if not impossible to make accessible, and accessibility issues exist for all types of users.

In reality, however, it’s not always possible to avoid carousels or other features that may be less than ideal. We all work within frameworks, both technical and political, and we need to figure out how to create the best case scenario within those frameworks. If you work in a university or college library, you may be constrained by a particular CMS you need to use, a particular set of brand requirements, and historical design choices that may be slower to go away in academia than elsewhere. This post is a description of how I made some small improvements to my library website’s carousel to increase accessibility, but I hope it can serve as a larger discussion of how we can always make small improvements within whatever frameworks we work.

What Makes an Accessible Carousel?

We’ve covered accessibility extensively on ACRL TechConnect before. Cynthia Ng wrote a three part series in 2013 on making your website accessible, and Lauren Magnuson wrote about accessibility testing LibGuides in 2015. I am not an expert by any means on web accessibility, and I encourage you to do additional research about the basics of accessibility. For this specific project I needed to understand what it is about carousels that makes them particularly inaccessible, and how to ameliorate that. While researching this project, I found several existing resources on accessible carousels especially helpful.

The basic issues with carousels are that they move at their own pace but in a way that may be difficult to predict, and are an inherently visual medium. For people with visual impairments the slideshow images are irrelevant unless they provide useful information, and their presence on the page causes difficulty for screen reading software. For people with motor or cognitive impairments (which covers nearly everyone at some point in their lives) a constantly shifting image may be distracting and even if the content is interesting it may not be possible to click on the image at the rate it is set to move.

You can increase the accessibility of carousels by making it obvious and easy for users to stop the slideshow and view images at their own pace, making the role of the slideshow and its controls obvious to screen reading software, making it possible to control the slideshow without a mouse, and making it still work without stylesheets. Alternative methods of accessing the content have to be available and useful.

I chose to work on the slideshow as part of a retheming of the library website to bring it up to current university branding standards and to make it responsive. The current slideshow lacked obvious controls or any instructions for screen readers, and could not be controlled without a mouse. My general plan was to ensure that there were obvious controls for the slideshow (and that it would pause quickly without a lot of work), to add ARIA roles for screen readers, and to make it keyboard controllable. I had to work within the additional constraints of making this something that would work in Drupal, be responsive, and allow the marketing committee to post their own images without my intervention while still requiring alt tags and other items crucial for accessibility.

My Approach

Because the library’s website uses Drupal, it made sense to look for a solution that was designed to work with Drupal. Many options exist, and everyone has a favorite or a more appropriate choice for a particular situation, so if you are looking for a good Drupal solution you’ll want to do your own research. I ended up choosing a Drupal module called Views Slideshow after looking at several options. It seemed to be customizable enough that I was pretty sure I could make it accessible even though it lacked some of the features out of the box. The important thing to me is that it would make it possible to give the keys to the slideshow operation to someone else. The way our slideshow traditionally worked required writing HTML into the middle of a hardcoded homepage and uploading the image to the server in a separate process. This meant that my department was a roadblock to updating the images, and required careful coordination before vacations or times away to ensure we could get the images changed. We all agreed that if the slideshow was going to stay, this process had to improve.

Why not just remove the slideshow entirely? That’s one option we definitely considered, but one important caveat I set early in the redesign process was to leave the site content and features alone and just update the look and feel of the site. Thus I wanted to leave every current piece of information that was an important part of the homepage as is, though slightly reorganized. I also didn’t want to change the size of the homepage slideshow images, since the PR committee already had a large stock of images they were using and I didn’t want them to have to redesign everything. In general, we are moving to a much more flexible and iterative process for changing website features and content, so nothing is ruled out for the future.

I won’t go into a lot of detail about the technical fixes I made, since this won’t be widely applicable. Views Slideshow uses a very standard Drupal module called Views to create a list of content. While it is a very popular module, I found it challenging to install correctly without a lot of help (I mainly used this site), since the settings are hard to figure out. In setting up the module, you are able to control things like whether alt text is required – the most basic type of accessibility feature, which allows users who cannot see images to understand their content through screen readers or other assistive technologies. Beyond that, you can set some things up in the templates for the module. First I created a Drupal content type called Featured Slideshow. It includes fields for the title of the slide, the image, and the link it should go to. The image has alt and title fields, which can be set automatically using tokens (text templates) or manually by the person entering data. The module uses jQuery Cycle to control which image is displayed. I then customized the templates (several PHP files) to include ARIA roles and to edit the controls so they use plain English rather than icons (I can think of downsides to this approach for sure, but at least it makes their purpose clear for many people).

The slideshow region gets an ARIA role meant for frequently updated but non-essential page content (the “marquee” role). Its default ARIA live state is “off”, meaning that unless the user is focused on it, changes in state won’t be announced. You can change this to “polite” as well, which means a change in state will be announced at the next convenient opportunity. You would never want to use “assertive”, since that would interrupt the user for no reason.

Features I’m still working on are detailed in The Unbearable Inaccessibility of Slideshows, specifically keyboard focus order and improved performance with stylesheets unavailable. However with a few small changes I’ve improved accessibility of a feature on the site–and this technique can be applied to any feature on any site.

Making Small Improvements to Accessibility

While librarians who get the privilege of working on their own library’s website have the ability to guide design choices, we are not always able to create exactly the ideal situation. Whether you are dealing with a carousel or any other feature that requires some work to improve accessibility, I would suggest the following strategy:

  • Review what the basic requirements are for making the feature work with your platform and situation. This means both technically and politically.
  • Research the approaches others have taken. You probably won’t be able to use someone else’s technique unless they are in a very similar situation, but you can at least use lessons learned.
  • Create a step by step plan to ensure you’re not missing anything, as well as a list of questions to answer as you are working through the development process.
  • Test the feature. You can use achecker or WAVE, which has a browser plugin to help you test sites in a local development environment.
  • Review errors and fix these. If you can’t fix everything, list the problems and plan for future development, or see if you can pick a new solution.
  • Test with actual users.

This may seem overwhelming, but taking it slow and only working on one feature at a time can be a good way to manage the process. And even better, you’ll improve your practices so that the next time you start a project you can do it correctly from the beginning.


Whither the workshop?

Academic libraries have long provided workshops that focus on research skills and tools to the community. Topics often include citation software or specific database search strategies. Increasingly, however, libraries are offering workshops on topics that some may consider untraditional or outside the natural home of the library. These topics include using R and other analysis packages, data visualization software, and GIS technology training, to name a few. Librarians are becoming trained as Data and Software Carpentry instructors in order to pull from their established lesson plans and become part of a larger instructional community. Librarians are also partnering with non-profit groups like Mozilla’s Science Lab to facilitate research and learning communities.

Traditional workshops have generally been conceived and executed by librarians in the library. Collaborating with outside groups like Software Carpentry (SWC) and Mozilla is a relatively new endeavor. As an example, certified trainers from SWC can come to campus and teach a topic from their course portfolio (e.g. using SQL, Python, R, Git). These workshops may or may not have a cost associated with them and are generally open to the campus community. From what I know, the library is typically the lead organizer of these events. This shouldn’t be terribly surprising. Librarians are often very aware of the research hurdles that faculty encounter, or what research skills aren’t being taught in the classroom to students (more on this later).

Librarians are helpers. If you have some biology knowledge, I find it useful to think of librarians as chaperone proteins, proteins that help other proteins get into their functional conformational shape. Librarians act in the same way, guiding and helping people to be more prepared to do effective research. We may not be altering their DNA, but we are helping them bend in new ways and take on different perspectives. When we see a skills gap, we think about how we can help. But workshops don’t just *spring* into being. They take a huge amount of planning and coordination. Librarians, on top of all the other things we do, pitch the idea to administration and other stakeholders on campus, coordinate the space, timing, refreshments, travel for the instructors (if they aren’t available in-house), registration, and advocate for the funding to pay for the event in order to make it free to the community. A recent listserv discussion regarding hosting SWC workshops resulted in consensus around a recommended minimum six week lead time. The workshops have all been hugely successful at the institutions responding on the list and there are even plans for future Library Carpentry events.

A colleague once said that everything that librarians do in instruction are things that the disciplinary faculty should be doing in the classroom anyway. That is, the research skills workshops, the use of a reference manager, searching databases, the data management best practices are all appropriately – and possibly more appropriately – taught in the classroom by the professor for the subject. While he is completely correct, that is most certainly not happening. We know this because faculty send their students to the library for help. They do this because they lack curricular time to cover any of these topics in depth and they lack professional development time to keep abreast of changes in certain research methods and technologies. And because these are all things that librarians should have expertise in. The beauty of our profession is that information is the coin of the realm for us, regardless of its form or subject. With minimal effort, we should be able to navigate information sources with precision and accuracy. This is one of the reasons why, time and again, the library is considered the intellectual center, the hub, or the heart of the university. Have an information need? We got you. Whether those information sources are in GitHub as code, spreadsheets as data, or databases as article surrogates, we should be able to chaperone our user through that process.

All of this is to the good, as far as I am concerned. Yet, I have a persistent niggle at the back of my mind that libraries are too often taking a passive posture. [Sidebar: I fully admit that this post is written from a place of feeling, of suspicions and anecdotes, and not from empirical data. Therefore, I am both uncomfortable writing it, yet unable to turn away from it.] My concern is that as libraries take on these workshops because there is a need on campus for discipline-agnostic learning experiences, we (as a community) do so without really establishing what the expectations and compensation of an academic library are, or should be. This is a natural extension of the “what types of positions should libraries provide/support?” question that seems to persist. How much of this response is based on the work of individuals volunteering to meet needs, stretching the work to fit into a job description or existing workloads, and ultimately putting user needs ahead of organizational health? I am not advocating that we ignore these needs; rather, I am advocating that we integrate the support for these initiatives within the organization, that we systematize it, and that we own our expertise in it.

This brings me back to the idea of workshops and how we claim ownership of them. Are libraries providing these workshops only because no one else on campus is meeting the need? Or are we asserting our expertise in the domain of information/data shepherding and producing these workshops because the library is the best home for them, not a home by default? And if we are making this assertion, then have we positioned our people to be supported in the continual professional development that this demands? Have we set up mechanisms within the library and within the university for this work to be appropriately rewarded? The end result may be the same – say, providing workshops on R – but the motivation and framing of the service is important.

Information is our domain. We navigate its currents and ride its waves. It is ever changing and evolving, as we must be. And while we must be agile and nimble, we must also be institutionally supported and rewarded. I wonder if libraries can table the self-reflection and self-doubt regarding the appropriateness of our services (see everything ever written regarding libraries and data, digital humanities, digital scholarship, altmetrics, etc.) and instead advocate for the resourcing and recognition that our expertise warrants.

Do Library Stuff Faster with Python

Python is a great programming language to know if you work in a library: it’s (relatively) easy to learn, its syntax is fairly clear and intuitive, and it has great, robust libraries for doing routine library tasks like hacking MARC records and working with delimited data, CSV files, JSON and XML. 1  In this post, I’ll describe a couple of projects I’ve worked on recently that have enabled me to Do Library Stuff Faster using Python.  For reference, both of these scripts were written with Python 2.7 2 in mind, but could easily be adapted for other versions of Python.
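As a tiny illustration of that point (this is not one of the scripts described below, and the filenames are placeholders), pulling titles and ISBNs out of a file of MARC records and writing them to a tab-delimited file takes only a few lines with the pymarc and csv libraries:

# Not one of the scripts from this post -- just a quick example of the kind of
# routine task these libraries make easy. Assumes a file of MARC records called
# records.mrc and the pymarc library (pip install pymarc).
import csv
from pymarc import MARCReader

with open('records.mrc', 'rb') as marc_file, open('titles.txt', 'w') as out_file:
    writer = csv.writer(out_file, delimiter='\t')
    writer.writerow(['title', 'isbn'])
    for record in MARCReader(marc_file):
        # index MARC fields and subfields by tag and code; fall back to blanks
        title = record['245']['a'] if record['245'] else ''
        isbn = record['020']['a'] if record['020'] and record['020']['a'] else ''
        writer.writerow([title, isbn])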

Library Holdings Lookup with Beautiful Soup

Here’s a very common library dilemma: a generous and well-meaning patron, faculty member, or friend of the library has a large personal collection of books or other materials that they would like to bequeath to your library. They have carefully created a spreadsheet (or Word document, or hand-written index) of all of the titles and authors (and maybe dates and ISBNs) in their library and want to know if you want the items.

Many libraries (for very good reason) have policies to just say “no” to these kinds of gifts.  Well-meaning library gift givers don’t always realize that it’s an enormous amount of work for a library to evaluate materials and decide whether or not they can be added to the library’s collection. Beyond relevance to their users and condition of the items, libraries don’t want to accept gifts of duplicate copies of titles they already have in their collection due to limited shelf space.

It’s that final point – how to avoid adding duplicate titles to the collection – that led me to develop a very simple (and very hacky) script that takes a list of titles and authors and does a lookup to see if, at minimum, we have those same titles already in the collection. Our ILS (Innovative Interfaces’ Millennium system) does not have a way to feed in a bunch of titles and generate a report of title matches – and I would venture to say that kind of functionality is probably not available in most library systems. Normally, when presented with the dilemma of having to check whether the library already has a set of titles, we’d sit down an unfortunate student worker and have them manually work through the list – copying and pasting titles into the library catalog and noting down any matches found. This work is incredibly boring for the student worker, and is a prime candidate for automation: the same task is done over and over again, with a very specific set of criteria as output (match or no match).

Python’s Beautiful Soup library is built for exactly this kind of task – instead of having your student worker scan a bunch of web pages in your catalog, the script can do approximately the same thing by sending search terms to your catalog via URL and returning page elements that tell you whether or not any matches were found. In my example script, I’m using title and author elements, but you could modify this script to use other elements as long as they are indexed in your catalog – for example, you could send ISBNs, OCLC numbers, etc.

First, using Excel I concatenate a list of titles and authors with a domain and other URL elements to search my library’s catalog. Here’s an example of what the query portion of these URLs looks like:

'%20Ending)+and+a:(Austin)&searchscope=9&SORT=DX

I’ll save the full list of these (in my example, I have over 1000 titles and authors to check) in a plain text file called advancedtitleauth.txt.
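If you would rather skip the Excel step, the same list of URLs could be generated in Python. Here is a rough sketch that assumes a made-up catalog base URL and a hypothetical tab-delimited titlelist.txt of titles and authors:

# The post builds these URLs in Excel; this is a rough Python alternative.
# The base URL is a made-up example -- substitute your own catalog's search
# URL pattern -- and titlelist.txt is a hypothetical tab-delimited file of
# titles and authors.
import csv
import urllib  # Python 2.7; in Python 3 use urllib.parse.quote

base = "https://catalog.example.edu/search/X?SEARCH=t:(%s)+and+a:(%s)&searchscope=9&SORT=DX"

with open('titlelist.txt', 'rb') as infile, open('advancedtitleauth.txt', 'w') as outfile:
    for title, author in csv.reader(infile, delimiter='\t'):
        # URL-encode the title and author before dropping them into the template
        outfile.write(base % (urllib.quote(title), urllib.quote(author)) + "\n")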

Next, I start my Python script by importing the Beautiful Soup library and some other useful libraries (urllib, a library built for fetching data by URLs; csv, a library for working with CSV files; and re, for working with regular expressions). You’ll probably have to install Beautiful Soup on your system first, which you can do, if you have the pip Python package management system 3 installed, by running sudo pip install beautifulsoup4 on your system’s command line.

from bs4 import BeautifulSoup
import urllib
import csv
import re

Then I create a blank array and define a CSV file into which to write the output of the script:

url_list = []
csv_out = csv.writer(open('output.txt', 'w'), delimiter = '\t')

The CSV file I’m creating will use tabs instead of commas as delimiters (hence delimiter = '\t'). Typically when working with library data, I prefer tab-delimited text files over comma-separated files, because you never know when a random comma is going to show up in a title and create a delimiter where there should not be one.

Then I open my list of URLs, read it, append each URL to my array, and feed each URL into Beautiful Soup:

  f = open('advancedtitleauth.txt', 'rb')
  for line in f:
    r = urllib.urlopen(line).read()
    soup = BeautifulSoup(r)

Beautiful Soup will go fetch the web page at each URL. Now that I have the web pages, Beautiful Soup can parse out specific features of each page. In my case, my catalog returns a page with a single record if exactly one match is found, and a browsable index when no exact match is found (e.g., your title would be here, but it isn’t, so here’s some stuff with titles that would be nearby). I can use Beautiful Soup to return page elements that tell me whether a match was found, and if a match is found, to write out the permanent URL of the match for later evaluation. This bit of code looks for an HTML div element with the class “bibRecordLink” on the page, which only appears when a single match is found. If this div is present on the page, the script grabs the link and drops it into the output file.

      link = soup.find_all("div", class_="bibRecordLink")   # present only when a single match is found
      directlink = str(link[0])                             # the div's HTML as a string
      directlink = "" + directlink[36:]                     # the empty string is where the catalog's base URL goes to build a permalink

In the code above, [36:] is Python’s slice notation for taking everything from a given position onward – so in this case, I’m getting the end of the string starting with the 36th character (which in my case is the bibliographic ID number of the item, allowing me to construct a permalink).
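As a quick illustration of what the slice does (using made-up markup; the right offset depends entirely on your own catalog’s HTML):

s = '<div class="bibRecordLink"><a href="/record=b1234567">'
print s[36:]   # prints '/record=b1234567">' : everything from index 36 (counting from zero) onward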

If a title/author search results in multiple possible matches – that is, we might have multiple copies, or the title/author combo is too vague to land on just one item – the page that displays in our catalog shows a browsable list of brief record info.  In my script, I just grab the top result:

      briefcit = soup.find_all("span", class_="briefcitTitle")   # brief citation titles appear on multi-match pages
      bestmatch = str(briefcit[0])                                # take the first (top) result
      sep = "&"
      bestmatch = bestmatch.split(sep, 1)[0]                      # keep everything before the first "&"
      bestmatch = "" + bestmatch[39:]                             # again, the empty string is where the catalog's base URL goes

In the code above, Beautiful Soup finds all the <span> elements with the class “briefcitTitle”, the script returns the first one, and again a URL is stored in the bestmatch variable.

You can see a sample output of my lookup script here.  You can see that for each entry, I include publication information, direct links, or a best match link if the elements are found.  If none of the elements are found for a lookup URL, the line reads:

nopub nolink nomatch

We can now divide the output file into “no match” entries, direct links, or best match links.  Direct links and best match links will need to be double-checked by a student worker to make sure they actually represent the item we looked up, including the date and edition.  The “no match” entries represent titles we don’t have in our collection, so those can be evaluated more closely to determine if we want them.
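As one example of how that division might be scripted (a minimal sketch, not from the original script, assuming the tab-delimited output format described above and hypothetical output file names):

with open('output.txt', 'rb') as infile, \
     open('no_matches.txt', 'w') as no_matches, \
     open('needs_review.txt', 'w') as needs_review:
    for line in infile:
        if 'nomatch' in line:
            no_matches.write(line)      # titles we appear not to own; evaluate for possible purchase
        else:
            needs_review.write(line)    # direct or best-match links for a student worker to double-check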

The script certainly has room for improvement; I could write in a lot more functionality to better identify publication information, for example, to possibly reduce or eliminate the need for manual review of direct or partial matches.  But the return on investment is fairly high for a 37-line script written in an afternoon; we can re-use it dozens of times, and hopefully save countless hours of student worker boredom (and re-assign those student workers to more complex and meaningful tasks!).

Rudimentary Keyword Frequency Analysis

This second example involves, again, dealing with a task that could be done manually, but can be done much more quickly with a script.

My university needed to submit data for the AASHE Sustainability Tracking, Assessment, and Rating System (STARS) Report, which requires the analysis of data from campus course offerings as well as research output by faculty.  To submit the report, we needed to know how many courses we offer in sustainability (defined by AASHE “in an inclusive way, encompassing human and ecological health, social justice, secure livelihoods and a better world for all generations”) and how many faculty do research in sustainability.  This project was broken up into two components: analysis of research data by faculty and analysis of course data.

Sustainability Keywords

Before we even started analyzing research or course data, we needed to define an approach to identify what counts as “sustainability.”  Thankfully, there was precedent from the University of North Carolina, which had developed a list of sustainability-related keywords used to search against faculty research output.4  We adopted this list of keywords to look up in faculty research articles and course descriptions.

Research data by faculty

We don’t have a comprehensive inventory of research done by faculty at our campus.  Because we were on a somewhat tight deadline to do the analysis, I came up with a very quick and dirty way of getting a lot of citations by using Web of Science.  Web of Science enables you to do a search for research published by affiliates of your university.  I was able to retrieve about 8,000 citations written by current or former faculty associated with my institution going back about 15 years.  Of course, we cannot consider the data in Web of Science to be fully representative of faculty research output, but it seemed like a good start at least.

Web of Science enables you to export 500-record chunks of metadata, so it took an hour or so to export the metadata in several pieces (see Figure 1 for my Web of Science export criteria).

A screenshot of Web of Science's Output Records function showing the following fields selected: All records in this list (up to 500), Author(s) / Editor(s), Abstract, PubMedID, Title, Source, Keywords, Web of Science Categories, Conference Information, Research Areas.

Figure 1. Web of Science’s Output Records function showing metadata fields selected.

Once I had all of the metadata for the 8,000 or so records written by faculty at my institution, I combined them into a single file.  Next, I needed to identify records that had sustainability keywords in either the title or abstract.

First, I created a list of all of the keywords and turned that list into a Python set.  A Python set is different from a list in that the order of terms does not matter and duplicates are ignored; it is ideal for quickly checking membership and computing intersections with other sets (in my case, sets of words taken from citation titles and abstracts).

word_list = 'Agriculture,Alternative,Applied%Science [..snip..]'
word_set = set(word_list.split(','))
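To illustrate what the set intersection used later does (with made-up title text and a tiny sample keyword set):

title_words = set('Renewable energy systems for sustainable agriculture'.split())
print title_words & set(['Agriculture', 'Renewable', 'Recycling'])
# prints set(['Renewable']); the lookup is case sensitive, so 'agriculture' does not match 'Agriculture'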

Note the % in “Applied%Science”.  My set lookup couldn’t match terms with spaces, because the titles and abstracts are split on whitespace before the comparison, so a multi-word keyword never matches a single word.  My hacky solution was to replace spaces with % characters, and then do a find/replace in my spreadsheet of Web of Science data to replace all keyword matches with spaces (such as Applied Science) with percentage signs (Applied%Science).  Luckily, there were only 10 or so keywords on the list with spaces, so the find/replace did not take very long.  Note also that the set match lookup is case sensitive, so I actually found it easier to just turn everything to lower case in my Web of Science spreadsheet and match on the lower case term (though I kept both upper and lower case terms in my lookup set).
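If you wanted to avoid the % workaround entirely, one alternative (not what I did, just a sketch with a few sample terms) would be to treat each multi-word keyword as a phrase and test it as a substring of the lower-cased text:

keywords = [k.strip().lower() for k in 'Applied Science,Agriculture,Alternative'.split(',')]

def keyword_matches(text, keywords):
    """Return the keywords that appear anywhere in the text, case-insensitively."""
    text = text.lower()
    return [k for k in keywords if k in text]

# keyword_matches('New frontiers in applied science', keywords) returns ['applied science'];
# substring matching can over-match (e.g., 'art' inside 'particle'), so manual review is still needed.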

Then I checked to see if any words were in the title, abstract, or both, and constructed my conditionals so that a new column would be added to an output spreadsheet indicating *which* matches were found:

for row in csv_reader:
    # row[23] is the abstract column and row[9] is the title column (counting from zero)
    if (set(row[23].split()) & word_set) and (set(row[9].split()) & word_set):
        csv_out.writerow(["title & abstract match", row[61], row[1], row[9], row[23],
                          (set(row[9].split()) & word_set), (set(row[23].split()) & word_set)])

If any of the words in my set were found in column 23 of the spreadsheet (the abstract) and in column 9 (the title), then a row would be written to an output sheet indicating that sustainability keywords were found in both the title and the abstract, pulling in some citation details about the article (including author names), as well as cells listing the matches found in the title and abstract fields.

I did similar conditionals for rows that found, for example, just a title match or just an abstract match:

    elif set(row[9].split()) & word_set:
        csv_out.writerow(["title match", row[61], row[1], row[9], row[23], (set(row[9].split()) & word_set)])
    elif set(row[23].split()) & word_set:
        csv_out.writerow(["abstract match", row[61], row[1], row[9], row[23], (set(row[23].split()) & word_set)])

And that is pretty much the whole script!  With the output file, I did have to do a bit more work to identify current faculty at my institution, but I basically used the same set matching method above using a list provided by HR.

Because the STARS report also required analysis of courses related to sustainability, I also created a very similar script to look up key terms found in course titles and descriptions.

Of course, just because a research article or course description has a keyword, or even multiple keywords, does not mean it’s relevant at all to sustainability.  One of the keywords identified as related to sustainability, for example, is “invest”, which basically meant that almost every finance class returned as a match.  Manual work was required to review the matches and weed out false positives, but because the keyword matching was already done and we could easily see what matches were found, this work was done fairly quickly.  We could, for example, identify courses and research articles that only had a single keyword match.  If that single keyword match was something like “sustainability” it was likely a sustainability-related course and would merit further review; if the single keyword match was something like “systems” it could probably be weeded out.

As with my author/title lookup script, if I had a bit more time to fuss with the script, I could probably have optimized it further (for example, by assigning greater weight to strongly sustainability-related keywords to help calculate a relevance score).  But again, a short amount of time invested in this script saved a huge amount of time, and enabled us to do something we would not otherwise have been able to do.
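A rough sketch of what that weighting might look like (the weights and keywords here are entirely hypothetical):

weights = {'sustainability': 5, 'renewable': 4, 'recycling': 3, 'systems': 1, 'invest': 1}

def relevance_score(found_keywords):
    """Sum the weights of the keywords found in a record; keywords not in the table count as 2."""
    return sum(weights.get(k.lower(), 2) for k in found_keywords)

# relevance_score(['sustainability', 'systems']) returns 6, while a lone 'systems' match scores only 1,
# which would sort the finance-course false positives mentioned above toward the bottom of the review pile.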

Python Resources

If you’re interested in learning more about Python and its syntax, and don’t have a lot of Python experience, a good (free) place to start is Google’s Python Class, created by Nick Parlante for Google (I actually took a similar class several years ago, also created by Dr. Parlante, through Coursera, which looks to still be available).  If you want to get started using Python right away and don’t want to have to fuss with installing it on your computer, you can check out the interactive course How to Think Like a Computer Scientist, created by Brad Miller and David Ranum at Luther College.  For more examples of using Python for library work, check out Charles Ed Hill, Heidi Frank, and Mark Pernotto’s Python chapter in the just-released LITA Guide The Librarian’s Introduction to Programming Languages, edited by Beth Thomsett-Scott (full disclosure: I am a contributor to this book).

  1. Working with CSV files and JSON Data.  In Sweigart, Al (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press.
  2. For an explanation of the difference between Python 2 and 3, see  The reason I use Python 2.7 for these scripts is because of my computing environment (in which Python 2 is installed by default), but if you have Python 3 installed on your computer, note that syntactical changes in Python 3 mean that many Python 2.x scripts may require revision in order to work.
  3. For instructions on using Pip with your Python installation, see:
  4.  Blank-White, Kristen. 2014. Researching the Researchers: Developing a Sustainability Research Inventory.  Presented at the 2014 AASHE Conference and Expo, Portland OR.

Cybersecurity, Usability, Online Privacy, and Digital Surveillance

Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information safe and secure.

However, keeping private information safe and secure is a challenging task. We have all heard of security breaches at J.P. Morgan, Target, Sony, Anthem Blue Cross and Blue Shield, the Office of Personnel Management of the U.S. federal government, University of Maryland at College Park, and Indiana University. Sometimes, a data breach takes place when an institution fails to patch a hole in its network systems. Sometimes, people fall for a phishing scam, or a virus in a user’s computer infects the target system. Other times, online companies compile customer data into personal profiles. The profiles are then sold to data brokers and on into the hands of malicious hackers and criminals.


Cybersecurity vs. Usability

To prevent such a data breach, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful about dealing with institutional or customers’ data. There are systematic measures that organizations can implement such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.

While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like a CAPTCHA can cause an accessibility issue for people with disabilities.

Or, as another example, imagine that a university IT office concerned about the data security of cloud services starts requiring all faculty, students, and staff to use only cloud services that are SOC 2 Type II certified. SOC stands for “Service Organization Controls.” It consists of a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies that will satisfactorily protect its clients’ data in five areas known as “Trust Services Principles.” Those include the security of the service provider’s system, the processing integrity of this system, the availability of the system, the privacy of personal information that the service provider collects, retains, uses, discloses, and disposes of for its clients, and the confidentiality of the information that the service provider’s system processes or maintains for the clients. The SOC 2 Type II certification means that the business has maintained relevant security policies and procedures over a period of at least six months, and therefore it is a good indicator that the business will keep its clients’ sensitive data secure. Dropbox for Business is SOC 2 certified, but it costs money. The free version is not as secure, but many faculty, students, and staff in academia use it frequently for collaboration. If a university IT office simply bans people from using the free version of Dropbox without offering an alternative that is as easy to use as Dropbox, people will undoubtedly suffer.

Some of you may know that the USPS website does not provide a way to reset the password for users who forgot their usernames. They are instead asked to create a new account. If they remember the account username but enter the wrong answers to the two security questions more than twice, the system automatically locks their accounts for a certain period of time. Again, users have to create a new account. Clearly, a system that does not allow password resets for those forgetful users is more secure than one that does. However, in reality, this security measure creates a huge usability issue, because average users do forget their passwords and the answers to the security questions that they set up themselves. It’s not hard to guess how frustrated people will be when they realize that they entered a wrong mailing address for mail forwarding and are now unable to get back into the system to correct it, because they cannot remember their passwords or the answers to their security questions.

To give an example related to libraries, a library may decide to block all international traffic to its licensed e-resources to prevent foreign hackers who have gotten hold of the username and password of a legitimate user from accessing those e-resources. This would certainly help libraries avoid a potential breach of licensing terms in advance and spare them from having to shut down compromised user accounts one by one whenever those are found. However, it would also make it impossible for legitimate users traveling outside of the country to access those e-resources, which many users would find unacceptable. Furthermore, malicious hackers would probably just use a proxy to make their IP address appear to be located in the U.S. anyway.

What would users do if their organization required them to reset passwords on a weekly basis for their work computers and for several or more systems that they also use constantly for work? While this may strengthen the security of those systems, it’s easy to see that it would be a nightmare to reset all those passwords every week and keep track of them without forgetting or mixing them up. Most likely, users would start choosing less complicated passwords or even adopt a single password for all of the different services. Some might even stick to the same password every time the system requires a reset, unless the system automatically detects the previous password and prevents them from continuing to use it. Ill-thought-out cybersecurity measures can easily backfire.

Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby making a network more secure. Users who face cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.



Cybersecurity vs. Privacy

Usability and productivity may be a small issue, however, compared to the risk of mass surveillance resulting from aggressive security measures. In 2013, the Guardian reported that the communication records of millions of people were being collected by the National Security Agency (NSA) in bulk, regardless of suspicion of wrongdoing. A secret court order prohibited Verizon from disclosing the NSA’s information request. After a cyberattack against the University of California at Los Angeles, the University of California system installed a device that is capable of capturing, analyzing, and storing all network traffic to and from the campus for over 30 days. This security monitoring was implemented secretly, without consulting or notifying the faculty and others who would be subject to the monitoring. The San Francisco Chronicle reported that the IT staff who installed the system were given strict instructions not to reveal that it was taking place. Selected committee members on the campus were told to keep this information to themselves.

The invasion of privacy and the lack of transparency in these network monitoring programs have caused great controversy. Such wide and indiscriminate monitoring programs must have a very good justification and offer clear answers to vital questions such as what exactly will be collected, who will have access to the collected information, when and how the information will be used, what controls will be put in place to prevent the information from being used for unrelated purposes, and how the information will be disposed of.

We have recently seen another case in which security concerns conflicted with people’s right to privacy. In February 2016, the FBI requested that Apple create a backdoor application that would bypass the current security measures in place in its iOS. This was because the FBI wanted to unlock an iPhone 5C recovered from one of the shooters in the San Bernardino shooting incident. Apple iOS secures users’ devices by permanently erasing all data when a wrong password is entered more than ten times, if people choose to activate this option in the iOS settings. The FBI’s request was met with strong opposition from Apple and others. Such a backdoor application can easily be exploited for illegal purposes by black hat hackers, for unjustified privacy infringement by other capable parties, and even for dictatorship by governments. Apple refused to comply with the request, and a court hearing was scheduled for March 22. The FBI, however, withdrew the request, saying that it had found a way to hack into the phone in question without Apple’s help. Now, Apple has to figure out what the vulnerability in its iOS is if it wants its encryption mechanism to be foolproof. In the meantime, iOS users know that their data is no longer as secure as they once thought.

Around the same time, a draft Senate bill titled the “Compliance with Court Orders Act of 2016” proposed that people should be required to comply with any authorized court order for data, and that if that data is “unintelligible” – meaning encrypted – then it must be decrypted for the court. This bill is problematic because it practically nullifies the efficacy of any end-to-end encryption, which we use every day, from our iPhones to messaging services like WhatsApp and Signal.

Because security is essential to privacy, it is ironic that certain cybersecurity measures are used to greatly invade privacy rather than protect it. Because we do not always fully understand how the technology actually works or how it can be exploited for both good and bad purposes, we need to be careful about giving any party blanket permission to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and the government will struggle even more to balance privacy concerns with security issues.

Why Libraries Should Advocate for Online Privacy

The fact that people may no longer have privacy on the Web should concern libraries. Historically, libraries have been strong advocates of intellectual freedom, striving to keep patrons’ data safe and protected from the unwanted eyes of the authorities. As librarians, we believe in people’s right to read, think, and speak freely and privately as long as such an act itself does not pose harm to others. The Library Freedom Project is an example that reflects this belief held strongly within the library community. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom. It also helped the Kilton Public Library in Lebanon, New Hampshire, become the first library to operate a Tor exit relay, providing anonymity for patrons while they browse the Internet at the library.

New technologies have brought us the unprecedented convenience of collecting, storing, and sharing massive amounts of sensitive data online. But the ease with which such sensitive data can fall into the wrong hands and be exploited has also created an unparalleled potential for invasion of privacy. While the majority of librarians take a very strong stance in favor of intellectual freedom and against censorship, it is often hard to discern a correct stance on online privacy, particularly when it is pitted against cybersecurity. Some even argue that those who have nothing to hide do not need privacy at all.

However, privacy is not equivalent to hiding a wrongdoing. Nor do people keep certain things secret because those things are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy, whether s/he is guilty of any wrongdoing or not. Privacy allows us a safe space to form our thoughts and consider our actions on our own, without being subject to others’ eyes and judgments. Even in the absence of actual mass surveillance, just the belief that one can be placed under surveillance at any moment is sufficient to trigger self-censorship and to negatively affect one’s thoughts, ideas, creativity, imagination, choices, and actions, making people more conformist and compliant. This is further corroborated by a recent study from Oxford University, which provides empirical evidence that the mere existence of a surveillance state breeds fear and conformity and stifles free expression. Privacy is an essential part of being human, not some trivial condition that we can do without in the face of a greater concern. That’s why many people under political dictatorships continue to choose death over life under mass surveillance and censorship in their fight for freedom and privacy.

The Electronic Frontier Foundation states that privacy means respect for individuals’ autonomy, anonymous speech, and the right to free association. We want to live as autonomous human beings, free to speak our minds and think on our own. If part of a library’s mission is to help people become such autonomous human beings through learning and sharing knowledge with one another, without having to worry about being observed and/or censored, then libraries should advocate for people’s privacy both online and offline, as well as in all forms of communication technologies and devices.

Are Your LibGuides 2.0 (images, tables, & videos) mobile friendly? Maybe not, and here’s what you can do about it.

LibGuides version 2 was released in summer 2014 and is built on Bootstrap 3. However, after examining my own institution’s guides and conducting a simple random sampling of academic libraries in the United States, I found that many LibGuides did not display well on phones or mobile devices when it came to images, videos, and tables. Springshare documentation states that LibGuides version 2 is mobile friendly out of the box and that no additional coding is necessary; however, I found this is not necessarily accurate. While the responsive features are available, they aren’t presented clearly as options in the graphical interface, and additional coding needs to be added using the HTML editor in order for the mobile display to be truly responsive when it comes to images, videos, and tables.

At my institution, our LibGuides are reserved for our subject librarians to use for their research and course guides. We also use the A-Z database list and other modules. As the LibGuides administrator, I’d known since its version 2 release that the new system was built on Bootstrap, but I didn’t know enough about responsive design to do anything about it at the time. It wasn’t until this past October when I began redesigning our library’s website using Bootstrap as the framework that I delved into customizing our Springshare products utilizing what I had learned.

While looking at our own guides individually and speaking with the subject librarians about their process, I found that they have been creating and designing their guides by letting the default settings take over for images, tables, and videos.  As a result, several tables are running out of their boxes, images are getting distorted, and videos are stretched vertically with large black top and bottom margins.  This is because additional coding and/or tweaking is indeed necessary in most cases for these elements to display correctly on mobile.

I’m by no means a Bootstrap expert, but my findings have been verified with Springshare, and I was told by Springshare support that they will be looked at by the developers. Support indicated that there may be a good reason things work as they do, perhaps to give users flexibility in their decisions, or perhaps a technical reason. I’m not sure, but for now we have begun work on making the adjustments so our guides display correctly. I’d be interested to hear others’ experiences with these elements and what they have had to do, if anything, to ensure they are responsive.


Initially, as I learned how to use Bootstrap with the LibGuides system, I looked at my own library’s subject guides and tested their responsiveness and display. To start, I browsed through our guides with my Android phone. I then used Chrome and IE11 on the desktop and resized the windows to see whether the tables stayed within their boxes and the images responded appropriately. I peeked at the HTML and elements within LibGuides to see how the librarians had their items configured. Once I realized the issues were similar across all guides, I took my search further. Selfishly hoping it wasn’t just us, I used the LibGuides Community site, where I sorted the list by libraries on version 2, then sorted by academic libraries. Each state’s list had to be looked at separately (you can’t sort by the whole United States). I placed all libraries from each state in a separate Excel sheet in alphabetical order. Using the random sort function, I examined two to three, sometimes five, libraries per state (25 states viewed) by following the link provided in the community site list. I also inspected the elements of several LibGuides in my spreadsheet live in Chrome, removing dimensions or styling to see how the pages responded, since I don’t have admin access to any other universities’ guides. Finally, I created a demo guide for testing purposes where I inserted various tables, images, and videos.

Some Things You Can Try

Whether or not you or your LibGuides authors are familiar with Bootstrap or the fundamentals of responsive design, anyone should be able to design or update these guide elements using the instructions below; no serious Bootstrap knowledge is needed for these solutions.


As we know, tables should not be used for layout. They are meant to display tabular data. This is another issue I came across in my investigation. Many librarians are using tables in this manner. Aside from being an outdated practice, this poses a more serious issue on a mobile device. Authors can learn how to float images, or create columns and rows right within the HTML Editor as an alternative. So for the purposes of this post, I’ll only be using a table in a tabular format.

When inserting a table using the table icon in the rich text editor, you are asked typical table questions: How many rows? How many columns? In speaking with the librarians here at my institution, I found that no one is really giving it much thought beyond this. They are filling in these blanks, inserting the table, and populating it. Or worse, copying and pasting a table created in Word.

However, if you leave things as they are and the table has any width to it, this will be your result once minimized or viewed on a mobile device:


Figure 1: LibGuides default table with no responsive class added

As you can see, the table runs out of the container (the box). To alleviate this, you will have to open the HTML Editor, find where the table begins, and wrap the table in a div with the table-responsive class. The HTML Editor is available to all regular users; no administrative access is needed. If you aren’t familiar with adding classes, note that you will also need to close the wrapping tag after the last table code you see. The HTML looks something like this:

<div class="table-responsive">
    all the other elements go here

Below is the result of wrapping all table elements in the table-responsive class. As you can see, it is cleaner, there is no run-off, and Bootstrap added a horizontal scroll bar since the table is really too big for the box once it is resized. On a phone, you can now swipe sideways to scroll through the table.


Figure 2: Result of adding the responsive table class.

Springshare has also made the Bootstrap table styling classes available, which you can see in the editor dropdown as well. You can experiment with these to see which styling you prefer (borders, hover rows, striped rows…), but they don’t replace wrapping the table in the table-responsive div.


When inserting an image in a LibGuides box, the system brings the dimensions of the image with it into the Image Properties box by default. After various tests I found it best to match the image size to the layout/box prior to uploading, and then remove the dimensions altogether from within the Image Properties box (and don’t place it in an unresponsive table). This can easily be done right within the Image Properties box when the image is inserted. It can also be done in the HTML Editor afterwards.


Figure 3: Image dimensions can be removed in the Image Properties box.



On the left: dimensions in place. On the right: dimensions removed.

By removing the dimensions, the image is better able to resize accordingly, especially in IE which seems to be less forgiving than Chrome. Guide creators should also add descriptive Alternative Text while in the Image Properties box for accessibility purposes.

Some users may be tempted to resize large images by adjusting the dimensions right in the Properties box. However, doing this doesn’t actually decrease the file size that gets passed to the user, so it doesn’t help download speed. Substantial resizing needs to be done prior to upload. Springshare recommends adjustments of no more than 10-15%.1


There are a few things I tried while figuring out the best way to embed a YouTube video:

    1.  Use the YouTube embed code as is, which can result in a squished image and a lot of black border in the top and bottom margins.

Default Youtube embed code

    2. Use the YouTube embed code but remove the iframe dimensions (width=”560″ height=”315″). This results in a small image that looks fine, but it stays small regardless of the box size.
    3. Use the YouTube embed code, remove the iframe dimensions, and add the embed-responsive class. In this case, 16by9. This results in a nice responsive display, with no black margins (a sketch of the resulting markup appears below). Alternately, I discovered that leaving the iframe dimensions while adding the responsive class looks nearly the same.
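Here is a sketch of what that responsive markup might look like; the video ID is a placeholder, the embed-responsive classes come from Bootstrap 3, and the title attribute is included for accessibility (as discussed below):

<div class="embed-responsive embed-responsive-16by9">
  <iframe class="embed-responsive-item" title="Title of the video" src="https://www.youtube.com/embed/VIDEO_ID" allowfullscreen></iframe>
</div>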

Youtube embed code dimensions removed and responsive class added

It should also be noted that LibGuides creators and editors should manually add a “title” attribute to the embed code for accessibility.2 Neither LibGuides nor YouTube does this automatically, so it’s up to the guide creator to add it in the HTML Editor. In addition, the “frameborder=0” attribute will be overwritten by Bootstrap, so you can remove it or leave it; it’s up to you.

Considering Box Order/Stacking

The way boxes stack and order on smaller devices is also something LibGuides creators or editors should take into consideration. The layout is essentially comprised of columns, and in Bootstrap the columns stack a certain way depending on device size.

I’ve tested several guides and believe the following are representative of how boxes will stack on a phone, or small mobile device. However, it’s always best to test your layout to be sure. Test your own guides by minimizing and resizing your browser window and watch how they stack.

Box stacking order of a guide with no large top box and three columns.


Box stacking order of a guide with a large top box and two columns.



After looking at the number of libraries that have these same issues, it may be safe to say that our subject librarians are similar to others in regard to having limited HTML, CSS, or design skills. They rely on LibGuides’ easy-to-use interface and system to do most of the work, as their time is limited or they have no interest in learning these additional skills. Our librarians spend most of their teaching time in a classroom, using a podium and large screen, or at the reference desk on large screens. Because of this, they are not highly attuned to the mobile user and how their guides display on other devices, even though their guides are being accessed by students on phones and tablets. We will be initiating a mobile reference service soon; perhaps this will help bring further awareness. For now, I recently taught an internal workshop in order to demonstrate and share what I have learned, in hopes of helping the librarians get these elements fixed. Helping ensure new guides will be created with mobile in mind is also a priority. To date, several librarians have gone through their guides and made the changes where necessary. Others have summer plans to update their guides and will address these issues at the same time. I’m not aware of any way to make these changes in bulk, since they are very individual in nature.


Danielle Rosenthal is the Web Development & Design Librarian at Florida Gulf Coast University. She is responsible for the library’s web site and its applications in support of teaching, learning, and scholarship activities of the FGCU Library community. Her interests include user interface, responsive, and information design.



1 Maximizing your LibGuides for Mobile

Looking Across the Digital Preservation Landscape

When it comes to digital preservation, everyone agrees that a little bit is better than nothing. Look no further than these two excellent presentations from Code4Lib 2016, “Can’t Wait for Perfect: Implementing “Good Enough” Digital Preservation” by Shira Peltzman and Alice Sara Prael, and “Digital Preservation 101, or, How to Keep Bits for Centuries” by Julie Swierczek. I highly suggest you go check those out before reading more of this post if you are new to digital preservation, since they get into some technical details that I won’t.

The takeaway from these for me was twofold. First, digital preservation doesn’t have to be hard, but it does have to be intentional, and secondly, it does require institutional commitment. If you’re new to the world of digital preservation, understanding all the basic issues and what your options are can be daunting. I’ve been fortunate enough to lead a group at my institution that has spent the last few years working through some of these issues, and so in this post I want to give a brief overview of the work we’ve done, as well as the current landscape for digital preservation systems. This won’t be an in-depth exploration, more like a key to the map. Note that ACRL TechConnect has covered a variety of digital preservation issues before, including data management and preservation in “The Library as Research Partner” and using bash scripts to automate digital preservation workflow tasks in “Bash Scripting: automating repetitive command line tasks”.

The committee I chair started by examining born-digital materials, but expanded its focus to all digital materials, since our digitized materials were an easier test case for a lot of our ideas. The committee spent a long time understanding the basic tenets of digital preservation–and in truth, we’re still working on this. For this process, we found working through the NDSA Levels of Digital Preservation an extremely helpful exercise–you can find a helpfully annotated version with tools by Shira Peltzman and Alice Sara Prael, as well as an additional explanation by Shira Peltzman. We also relied on the Library of Congress Signal blog and the work of Brad Houston, among other resources. A few of the tasks we accomplished were to create a rough inventory of digital materials and a workflow manual, and to acquire many terabytes (currently around 8) of secure networked storage space for files, to replace all the removable hard drives being used for backups. While backups aren’t exactly digital preservation, we wanted to at the very least secure the backups we did have. An inventory and workflow manual may sound impressive, but I want to emphasize that these are living and somewhat messy documents. The major advantage of having them is not so much for what we do have, but for identifying gaps in our processes. Through this process, we were able to develop a lengthy (but prioritized) list of tasks that need to be completed before we’ll be satisfied with our processes. For example, one of the major workflow gaps we discovered is that we have many items on obsolete digital media formats, such as floppy disks, that need to be imaged before they can even be inventoried. We identified the tool we wanted to use for that, but time and staffing pressures have left the completion of this project in limbo. We’re now working on hiring a graduate student who can help work on this and similar projects.

The other piece of our work has been trying to understand what systems are available for digital preservation. I’ll summarize my understanding of this below, with two major caveats. First, this is a world that is currently undergoing a huge amount of change as many companies and people work on developing new systems or improving existing systems, so there is a lot missing from what I will say. Second, none of these solutions are necessarily mutually exclusive. Some by design require various pieces to be used together; some may not require it, but your circumstances may dictate a different solution. For instance, you may not like the access layer built into one system, and so will choose something else. The dream that you can just throw money at the problem and it will go away is, at present, still just a dream–as are so many library technology problems.

The closest to such a dream is the end-to-end system. This is something where at one end you load in a file or set of files you want to preserve (for example, a large set of donated digital photographs in TIFF format), and at the other end have a processed archival package (which might include the TIFF files, some metadata about the processing, and a way to check for bit rot in your files), as well as an access copy (for example, a smaller sized JPG appropriate for display to the public) if you so desire–not all digital files should be available to the public, but still need to be preserved.

Examples of such systems include Preservica, ArchivesDirect, and Rosetta. All of these are hosted vended products, but ArchivesDirect is based on open source Archivematica, so it is possible to get some idea of the experience of using it if you are able to install the tools on which it is based. The issues with end-to-end systems are similar to any other choice you make in library systems. First, they come at a high price–Preservica and ArchivesDirect are open about their pricing, and for a plan that will meet the needs of a medium-sized library you will be looking at a $10,000-$14,000 annual cost. You are pretty much stuck with the options offered in the product, though you still have many decisions to make within that framework. Migrating from one system to another if you change your mind may involve some very difficult processes, so inertia dictates that you will be using that system for the long haul, and a short trial period or demo may not be enough to really tell you whether that is a good idea. But you do have the potential for more simplicity and therefore a stronger likelihood that you will actually use the system, and a hosted product is much more manageable for smaller staffs that lack dedicated positions for digital preservation work–or even room in current positions for digital preservation work. A hosted product is ideal if you don’t have the staff or servers to install anything yourself, and it helps you get your long-term archival files onto Amazon Glacier. Amazon Glacier is, by the way, where pretty much all the services we’re discussing store everything you are submitting for long-term storage. It’s dirt cheap to store on Amazon Glacier, and if you can restore slowly, not too expensive to restore–it is only expensive if you need to restore a lot quickly. But using it is somewhat technically challenging, since you only interact with it through APIs–there’s no way to log in and upload or download files as with a cloud storage service like Dropbox. For that reason, when you’re paying a service hundreds of dollars a terabyte that ultimately stores all your material on Amazon Glacier, which costs pennies per gigabyte, you’re paying for the technical infrastructure to get your stuff on and off of there as much as anything else. In another sense, you’re paying for an insurance policy for accessing materials in a catastrophic situation where you do need to recover all your files–theoretically, you don’t have to pay extra in such a situation.

A related option to an end-to-end system that has some attractive features is to join a preservation network. Examples of these include the Digital Preservation Network (DPN) and APTrust. In this model, you pay an annual membership fee (right now $20,000 annually, though this could change soon) to join the consortium. This gives you access to a network of preservation nodes (either Amazon Glacier or nodes at other institutions), access to tools, and a right (and requirement) to participate in the governance of the network. Another larger preservation goal of such networks is to ensure long-term access to material even if the owning institution disappears. Of course, $20,000 plus travel to meetings and work time to participate in governance may be out of reach for many, but it appears that both DPN and APTrust are investigating new pricing models that may meet the needs of smaller institutions who would like to participate but can’t contribute as much in money or time. This is a world that I would recommend watching closely.

Up until recently, the way that many institutions were achieving digital preservation was through some kind of repository that they created themselves, either with open source repository software such as Fedora Repository or DSpace or some other type of DIY system. With open source Archivematica, and a few other tools, you can build your own end-to-end system that will allow you to process files, store the files and preservation metadata, and provide access as is appropriate for the collection. This is theoretically a great plan. You can make all the choices yourself about your workflows, storage, and access layer. You can do as much or as little as you need to do. But in practice for most of us, this just isn’t going to happen without a strong institutional commitment of staff and servers to maintain this long term, at possibly a higher cost than any of the other solutions. That realization is one of the driving forces behind Hydra-in-a-Box, which is an exciting initiative that is currently in development. The idea is to make it possible for many different sizes of institutions to take advantage of the robust feature sets for preservation in Fedora and workflow management/access in Hydra, but without the overhead of installing and maintaining them. You can follow the project on Twitter and by joining the mailing list.

After going through all this, I am reminded of one of my favorite slides from Julie Swierczek’s Code4Lib presentation. She works through the Open Archival Information System (OAIS) reference model graph to explain it in depth, and comes to a point in the workflow that calls for “Sustainable Financing”, and then zooms in on this. For many, this is the crux of the digital preservation problem. It’s possible to do a sort of OK job with digital preservation for nothing or very cheap, but ensuring long-term preservation requires institutional commitment for the long haul, just as any library collection requires. Given how much attention digital preservation is starting to receive, we can hope that more libraries will see this as a priority and start to participate. This may lead to even more options, tools, and knowledge, but it will still require making it a priority and putting in the work.