Information Architecture for a Library Website Redesign

My library is about to embark upon a large website redesign during this summer semester. This isn’t going to be just a new layer of CSS, or a minor version upgrade to Drupal, or moving a few pages around within the same general site. No, it’s going to be a huge, sweeping change that affects the whole of our web presence. With such an enormous task at hand, I wanted to discuss some of the tools and approaches that we’re using to make sure the new site meets our needs.

Why Redesign?

I’ve heard about why the wholesale website redesign is a flawed approach, why we should be continually, iteratively working on our sites. Continual changes stop problems from building up, plus large swaths of changes can disrupt our users who were used to the old site. The gradual redesign makes a lot of sense to me, and also seems like a complete luxury that I’ve never had in my library positions.

The primary problem with a series of smaller changes is that the approach assumes a solid foundation to begin with. Our current site, however, has a host of interconnected problems that make tackling any individual issue a challenge. It’s like your holiday lights sitting in a box all year; they’re hopelessly tangled by the time you take them out again.

Our site has decades of discarded, forgotten content. That’s mostly harmless; it’s hard to find and sees virtually no traffic. But it’s still not great to have outdated information scattered around. In particular, I’m not thrilled that a lot of it is static HTML, images, and documents sitting outside our content management system. It’s hard to know how much content we even have because it cannot be managed in one place.

We also fell into a pattern of adding content to the site but never removing or re-organizing existing content. Someone would ask for a button here, or a page dictating a policy there, or a new FAQ entry. Pages that were added didn’t have particular owners responsible for their currency and maintenance; I, as Systems Librarian, was expected to run the technical aspects of the site but also be its primary content editor. That’s simply an impossible task, as I don’t know every detail of the library’s operations or have the time to keep on top of a menagerie of pages of dubious importance.

I tried to create a “website changes form” to manage things, but it didn’t work for staff or for me. The few staff who did fill out the form ended up requesting things that were difficult to do: large theme changes that I wasn’t comfortable making without user testing or approval from our other librarians. The little content that was added was minor text ferried through the form and through me, which slowed down the editorial process and furthered the idea that web content was solely my domain.

To top our content troubles off, we’re also on an unsupported, outdated version of Drupal. Upgrading or switching a CMS isn’t necessarily related to a website redesign. If you have a functional website on a broken piece of software, you probably don’t want to toss out the good with the bad. But in our case, similar to how our ILS migration gave us the opportunity to clean up our bibliographic records, a CMS migration gives us a chance to rebuild a crumbling website. It just doesn’t make sense to invest technical effort in migrating all our existing content when it’s so clearly in need of major structural change.

Card Sort

Cards in the middle of being constructed.

Not wanting to go into a redesign process blind, we set out to collect data on our current site and how it could be improved. One of the first ways we gathered data was to ask all library staff to perform a card sort. A card sort is an activity wherein pieces of web content are put on cards which can then be placed into categories; the idea is to form a rough information architecture for your site which can dictate structure and main menus. You can do either open or closed card sorts, meaning the categories are either left to the participants to invent or provided ahead of time.

For our card sort, I chose to do an open card sort since we were so uncertain about the categories. I also selected web content based on our existing site’s analytics. It was clear to me that our current site was bloated and disorganized; there were pages tucked into the nooks of cyberspace that no one had visited in years. There were all sorts of overlapping and unnecessary content. So I selected ≈20 popular pages but also gave each group two pieces of blank paper on which to add whatever content they felt was missing.

Finally, trying to get as much and as useful data as possible, I modified the card sort procedure in a couple ways. I asked people to role play as different types of stakeholders (graduate & undergraduate students, faculty, administrators) and to justify their decisions from that vantage point. I also had everyone, after sorting was done, put dots on content they felt was important enough for the home page. Since one of our current site’s primary challenges is maintenance, or the lack thereof, I wanted to add one last activity wherein participants would write a “responsible staff member” on each card (e.g. the instruction librarian maintains the instruction policy page). Sadly, we ran out of time and couldn’t do that bit.

The results of the card sort were informative. A few categories were common to everyone’s sorts: collections, “about us”, policies, and current events/news. We discovered a need for new content to cover workshops, exhibits, and events happening in the library which were currently only represented (and not very well) on blog posts. In terms of the home page, it was clear that LibGuides, collections, news, and most importantly our open hours needed to be represented.

Treejack & Analytics

Once we had enough information to build out the site’s architecture, I organized our content into a few major categories. But there were still several questions on my mind: would users understand terms like “special collections”? Would they understand where to look for LibGuides? Would they know how to find the right contact for various questions? To answer some of these questions, I turned to Optimal Workshop’s “Treejack” tool. Treejack tests a site’s information architecture by having users navigate basic text links to perform basic tasks. We created a few tasks aimed at answering our questions and recruited students to perform them. While we’re only using the free tier of Optimal Workshop, and only using student stakeholders, the data was still informative.

For one, Optimal Workshop’s results data is rich and visualized well. It shows the exact routes each user took through our site’s content, the time it took to complete a task, and whether a task was completed directly, completed indirectly, or failed. Completed directly means the user took an ideal route through our content; no bouncing up and down the site’s hierarchy. Indirect completion means they eventually got to the right place, but didn’t take a perfect path there, while failure means they ended in the wrong place. The graphs that demonstrate each task’s outcomes are wonderful:

The data & charts Treejack shows for a moderately successful task.

A “pie tree” showing users’ paths while attempting a task.

We can see here that most of our users found their way to LibGuides (named “study guides” here). But a few people expected to find them under our “Collections” category and bounced around in there, clearly lost. This tells us we should represent our guides under Collections alongside items like databases, print collections, and course reserves. While building and running your own Treejack-type tests would be easy, I definitely recommend Optimal Workshop as a great product which provides much insight.
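For anyone curious what a homegrown version of this scoring might look like, here is a minimal sketch in JavaScript. It is not how Treejack works internally; the `classifyAttempt` function, the path format (an array of node labels in the order a participant visited them), and the sample labels are all assumptions made for illustration. It applies the three outcomes described earlier: direct success, indirect success, and failure.

```javascript
// Hypothetical scoring helper; not part of Optimal Workshop's product.
// `visited` is the ordered list of nodes a participant clicked through,
// `correctPath` is the ideal route from the top of the tree to the answer.
function classifyAttempt(visited, correctPath) {
  const destination = visited[visited.length - 1];
  const target = correctPath[correctPath.length - 1];
  if (destination !== target) {
    return 'failure';
  }
  // A direct success follows the ideal route step for step.
  const isDirect =
    visited.length === correctPath.length &&
    visited.every((node, i) => node === correctPath[i]);
  return isDirect ? 'direct' : 'indirect';
}

// Sample labels are invented for the example.
const ideal = ['Home', 'Research Help', 'Study Guides'];
console.log(classifyAttempt(['Home', 'Research Help', 'Study Guides'], ideal));
console.log(classifyAttempt(['Home', 'Collections', 'Home', 'Research Help', 'Study Guides'], ideal));
console.log(classifyAttempt(['Home', 'Collections', 'Databases'], ideal));
```

This simple version assumes a single ideal route per task. A tree with more than one acceptable destination would need a list of correct paths instead.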

There’s much work to be done in terms of testing—ideally we would adjust our architecture to address the difficulties that users had, recruit different sets of users (faculty & staff), and attempt to answer more questions. That’ll be difficult during the summer while there are fewer people on campus but we know enough now to start adjusting our site and moving along in the redesign process.

Another piece of our redesign philosophy is using analytics about the current site to inform our decisions about the new one. For instance, I track interactions with our home page search box using Google Analytics events 1. The search box has three tabs corresponding to our discovery layer, catalog, and LibGuides. Despite thousands of searches and interactions with the search box, LibGuides search is seeing only trace usage. The tab was clicked on a mere 181 times this year; what’s worse, only 51 times did a user actually search afterwards. This trace amount of usage, plus the fact that users are clearly clicking onto the tab and then not finding what they want there, indicates it’s just not worth any real estate on the home page. When you add in that our LibGuides now appear in our discovery layer, their search tab is clearly disposable.
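To put those two numbers side by side, the arithmetic is simple enough to sketch in JavaScript. The counts are the ones quoted above; the function name is my own and not anything Google Analytics provides.

```javascript
// Share of tab clicks that were followed by an actual search.
function followThroughRate(tabClicks, searches) {
  if (tabClicks === 0) return 0;
  return searches / tabClicks;
}

const rate = followThroughRate(181, 51);
console.log((rate * 100).toFixed(1) + '%'); // prints 28.2%
```

In other words, fewer than three in ten clicks on the LibGuides tab led to a search.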

What’s Next

Data, tests, and conceptual frameworks aside, our next stage will involve building something much closer to an actual, functional website. Tools like Optimal Workshop are wonderful for providing high-level views on how to structure our information, but watching a user interact with a prototype site is so much richer. We can see their hesitation, hear them discuss the meanings of our terms, get their opinions on our stylistic choices. Prototype testing has been a struggle for me in the past; users tend to fixate on the unfinished or unrefined nature of the prototype, providing feedback that tells me what I already know (yes, we need to replace the placeholder images; yes, “Lorem ipsum dolor sit amet” is written on every page) rather than something new. I hope to counter that by setting appropriate expectations and building a small but fairly robust prototype.

We’re also building our site in an entirely new piece of software, Wagtail. Wagtail is exciting for a number of reasons, and will probably have to be the subject of future posts, but it does help address some of the existing issues I noted earlier. We’re excited by the innovative Streamfield approach to content, a replacement for large, rich text fields which are unstructured and often let users override a site’s base styles. We’ve also heard whispers of new workflow features which let us send reminders to owners of different content pages to revisit them periodically. While I could do something like this myself with an ad hoc mess of calendar events and spreadsheets, having it built right into the CMS bodes well for our future maintenance plans. Obviously, the concepts underlying Wagtail and the tools it offers will influence how we implement our information architecture. But we also started gathering data long before we knew what software we’d use, so exactly how it will work remains to be figured out.

Has your library done a website redesign or information architecture test recently? What tools or approaches did you find useful? Let us know in the comments!


  1. I described Google Analytics events in a previous Tech Connect post.

Representing Online Journal Holdings in the Library Catalog

The Problem

It isn’t easy to communicate to patrons what serials they have access to and in what form (print, online). They can find these details, sure, but they’re scattered across our library’s web presence. What’s most frustrating is that we clearly have all the necessary information but the systems offer no built-in way to produce a clear display of it. My fellow librarians noted that “it’d be nice if the catalog showed our exact online holdings” and my initial response was to sigh and say “yes, that would be nice”.

To illustrate the scope of the problem, a user can search for journals in a few of our disparate systems:

  • we use a knowledgebase to track database subscriptions and which journals are included in each subscription package
  • the public catalog for our Koha ILS has records for our print journals, sometimes with a MARC 856$u 1 link to our online holdings in the knowledgebase
  • our discovery layer has both article-level results for the journals in our knowledgebase and journal-level search results for the ones in our catalog

While these systems overlap, they also serve distinct purposes, so it’s not so awful. However, there are a few downsides to our triad of serials information systems. First of all, if a patron searches the knowledgebase looking for a journal which we only have in print, our database holdings wouldn’t show that they have access to print issues. To work around this, we track our print issues both in our ILS and the knowledgebase, which duplicates work and introduces possible inconsistencies.

Secondly, someone might start their research in the discovery layer, finding a journal-level record that links out to our catalog. But it’s too much to ask a user to search the discovery layer, click into the catalog, click a link out to the knowledgebase, and only then discover our online holdings don’t include the particular volume they’re looking for. Possessing three interconnected systems creates labyrinthine search patterns and confusion amongst patrons. Simply describing the systems and their nuanced areas of overlap in this post feels like a challenge, and the audience is librarians. I can imagine how our users must feel when we try to outline the differences.

The 360 XML API

Our knowledgebase is Serials Solutions 360KB. I went looking in the vendor’s help documentation for answers, which refers to an API for the product but apparently provides no information on using said API. Luckily, a quick search through GitHub projects yielded several using the API and I was able to determine its URL structure: http://{{your Serials Solution ID}}{{the journal’s ISSN}}

It’s probably possible to search by other parameters as well, but for my purposes ISSN was ideal so I didn’t bother investigating further. If you send a request to the address above, you receive XML in response:

<ssopenurl:openURLResponse xmlns:dc="" xmlns:ssdiag="" xmlns:ssopenurl="" xmlns:xsi="" xsi:schemaLocation="">
    <ssopenurl:results dbDate="2017-02-15">
        <ssopenurl:result format="journal">
                <ssopenurl:issn type="print">0212-5633</ssopenurl:issn>
                <ssopenurl:linkGroup type="holding">
                        <ssopenurl:providerName>Library Specific Holdings</ssopenurl:providerName>
                        <ssopenurl:databaseName>CCA Print Holdings</ssopenurl:databaseName>
                    <ssopenurl:url type="source"></ssopenurl:url>
                    <ssopenurl:url type="journal">
    <ssopenurl:echoedQuery timeStamp="2017-02-15T16:14:12">
        <ssopenurl:library id="EY7MR5FU9X">
            <ssopenurl:name>California College of the Arts</ssopenurl:name>

If you’ve read XML before, then it’s apparent how useful the above data is. It contains a list of our “holdings” for the periodical with information about the start and end dates of the subscription (the end date is absent here, which implies the holdings run to the present), which database they’re in, and what URL they can be accessed at. Perfect! The XML contains precisely the information we want to display in our catalog.

Unfortunately, our catalog’s JavaScript doesn’t have permission to access the 360 XML API. Because of the browser’s same-origin policy, a resource must explicitly say which other domains or pages are allowed to request its data. The server does so by sending the Access-Control-Allow-Origin HTTP header with its response, a mechanism called Cross-Origin Resource Sharing (CORS), and the 360 API does not.

We can work around this limitation but it requires extra code on our part. While JavaScript from a web page cannot request data directly from 360, we can write a server-side script to pull data. That server-side script can then add its own CORS header which lets our catalog use it. So, in essence, we set up a proxy service that acts as a go-between for our catalog and the API that the catalog cannot use. Typically, this takes little code; the server-side script takes a parameter passed to it in the URL, sends it in an HTTP request to another server, and serves back up whatever response it receives.
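As a rough illustration of that go-between, here is a minimal sketch written for Node.js rather than the PHP I actually used. Everything named here is an assumption: `API_BASE` and `CATALOG_ORIGIN` are placeholder addresses, the function names are invented, and the sketch presumes Node 18 or later so that a global `fetch` exists. The second argument to `handle` lets you swap in a fake `fetch` to try the logic without touching the network.

```javascript
const http = require('http');

// Placeholders, not real addresses: substitute your own API URL and catalog origin.
const API_BASE = 'https://api.example.org/openurl?issn=';
const CATALOG_ORIGIN = 'https://catalog.example.edu';

// Accept only ISSN-shaped input so the proxy cannot relay arbitrary requests.
function cleanIssn(value) {
  const issn = String(value || '').trim().toUpperCase();
  return /^\d{4}-?\d{3}[\dX]$/.test(issn) ? issn : null;
}

// Headers the proxy adds to its own response so the browser lets the catalog read it.
function corsHeaders(origin) {
  return {
    'Access-Control-Allow-Origin': origin,
    'Content-Type': 'application/xml; charset=utf-8',
  };
}

// Read ?issn= from the incoming URL, ask the API, and pass the answer back.
async function handle(req, res, fetchImpl = fetch) {
  const headers = corsHeaders(CATALOG_ORIGIN);
  const params = new URL(req.url, 'http://localhost').searchParams;
  const issn = cleanIssn(params.get('issn'));
  if (!issn) {
    res.writeHead(400, headers);
    res.end('<error>missing or invalid issn</error>');
    return;
  }
  try {
    const upstream = await fetchImpl(API_BASE + encodeURIComponent(issn));
    const body = await upstream.text();
    res.writeHead(upstream.ok ? 200 : 502, headers);
    res.end(body);
  } catch (err) {
    res.writeHead(502, headers);
    res.end('<error>upstream request failed</error>');
  }
}

// Not called automatically; run startProxy(8080) to begin listening.
function startProxy(port) {
  return http.createServer((req, res) => handle(req, res)).listen(port);
}
```

With the proxy running, the catalog’s JavaScript would request something like `/?issn=0212-5633` from the proxy instead of calling the API directly.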

Of course, it didn’t turn out to be that simple in practice. As I experimented with my scripts, I could tell that the 360 data was being received, but I couldn’t parse meaningful pieces of information out of it. It’s clearly there; I could see the full XML structure with holdings details. But neither my server-side PHP nor my client-side JavaScript could “find” XML elements like <ssopenurl:linkGroup> and <ssopenurl:normalizedData>. The text before the colon in the tag names is the namespace prefix. Simple jQuery code like $('ssopenurl:linkGroup', xml), which can typically parse XML data, wasn’t working with these namespaced elements.

Finally, I discovered the solution by reading the PHP manual’s entry for the simplexml_load_string function: I can tell PHP how to parse namespaced XML by passing a namespace parameter to the parser function. So my function call turned into:

// parameters: 1) the XML string to parse, fetched from $url (the Serials Solutions API address)
// 2) the class of object that the function should return (this is the default)
// 3) Libxml options (also the default, no special options)
// 4) (finally!) ns, the XML namespace
// 5) true here means ns is a prefix and not a URI
$xml = simplexml_load_string( file_get_contents($url), 'SimpleXMLElement', 0, 'ssopenurl', true );

As you can see, two of those parameters don’t even differ from the function’s defaults, but I still need to provide them to get to the “ssopenurl” namespace later. As an aside, technical digressions like these are some of the best and worst parts of my job. It’s rewarding to encounter a problem, perform research, test different approaches, and eventually solve it. But it’d also be nice, and a lot quicker, if code would just work as expected the first time around.

The Catalog

We’re lucky that Koha’s catalog both allows for JavaScript customization and has a well-structured, easy-to-modify record display. Now that I’m able to grab online holdings data from our knowledgebase, inserting it into the catalog is trivial. If you wanted to do the same with a different library catalog, the only changes come in the JavaScript that finds ISSN information in a record and then inserts the retrieved holdings information into the display. The complete outline of the data flow from catalog to KB and back looks like:

  • my JavaScript looks for an ISSN on the record’s display page
  • if there’s an ISSN, it sends the ISSN to my proxy script
  • the proxy script adds a few parameters & asks for information from the 360 XML API
  • the 360 XML API returns XML, which my proxy script parses into JSON and sends to the catalog
  • the catalog JavaScript receives the JSON and parses holdings information into formatted HTML like “Online resources: 1992 to present in DOAJ”
  • the JS inserts the formatted text into the record’s “online resources” section, creating that section if it doesn’t already exist
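
The first and fifth steps above can be sketched in plain JavaScript. This is an illustration, not the production script: the JSON shape (`start`, `end`, `database`, `url`), the function names, and the example URL are all assumptions, and the final step of inserting the text into Koha’s record page is left out because it depends on the catalog’s markup.

```javascript
// Find the first ISSN-shaped string in a block of record text.
function findIssn(text) {
  const match = /\b\d{4}-\d{3}[\dXx]\b/.exec(text);
  return match ? match[0] : null;
}

// Escape text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Assumed shape of one holding from the proxy: { start, end, database, url },
// where `end` is null or missing when the holdings run to the present.
function formatHolding(holding) {
  const end = holding.end ? holding.end : 'present';
  const range = escapeHtml(`${holding.start} to ${end}`);
  const name = escapeHtml(holding.database);
  return `${range} in <a href="${escapeHtml(holding.url)}">${name}</a>`;
}

function formatHoldings(holdings) {
  if (!holdings.length) return '';
  return 'Online resources: ' + holdings.map(formatHolding).join('; ');
}

console.log(findIssn('ISSN: 0212-5633'));
console.log(formatHoldings([
  { start: '1992', end: null, database: 'DOAJ', url: 'https://example.org/journal' },
]));
// Online resources: 1992 to present in <a href="https://example.org/journal">DOAJ</a>
```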

Is there a better way to do this? Almost certainly. The six steps above should give you a sense of how convoluted the process is, hacking around a few limitations. Still, the outcome is positive: we stopped updating our print holdings in our knowledgebase and our users have more information at their fingertips. It obviates the final step in the protracted “discovery layer to catalog” search described in the opening of this post.

Our next steps are obvious, too: we should aim to get this information into the discovery layer’s search results for our journals. The general frame of this project would be the same; we already know how to get the data from the API. Much like working with a different library catalog, the only edits are in parsing ISSNs from discovery layer search results and finding a spot in the HTML to insert the holdings data. Finally, we can also remove the redundant and less useful 856$u links from our periodical MARC records now.

The Scripts

These are highly specific to our catalog, but may be of general use to others who want to see how the pieces work together:


  1. For those unfamiliar, 856 is the MARC field for URLs, whether the URL represents the actual resource being described or something supplementary. It’s pretty common for print journals to also have 856 fields for their online counterparts.

Creating an OAI-PMH Feed From Your Website

Libraries that use a flexible content management system such as Drupal or WordPress for their library website and/or resource discovery face a challenge in ensuring that their data is accessible to the rest of the library world. Whether making metadata useable by other libraries or portals such as DPLA, or harvesting content in a discovery layer, there are some additional steps libraries need to take to make this happen. While there are a number of ways to accomplish this, the most straightforward is to create an OAI-PMH feed. OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting, and is a well-supported and understood protocol in many metadata management systems. There’s a tutorial available to understand the details you might want to know, and the Open Archives Initiative has detailed documentation.

Content management tools designed specifically for library and archives usage, such as LibGuides and Omeka, have a built in OAI-PMH feed, and generally all you need to do is find the base URL and plug it in. (For instance, here is what a LibGuides OAI feed looks like). In this post I’ll look at what options are available for Drupal and WordPress to create the feed and become a data provider.
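
To make “find the base URL and plug it in” a little more concrete: a harvester talks to an OAI-PMH feed by adding a `verb` parameter, plus a few arguments, to the base URL. The sketch below builds such request URLs in JavaScript. The base URL is a made-up placeholder and `oaiRequestUrl` is my own helper name; `Identify`, `ListRecords`, and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself.

```javascript
// Build an OAI-PMH request URL from a data provider's base URL.
function oaiRequestUrl(baseUrl, verb, args = {}) {
  const url = new URL(baseUrl);
  url.searchParams.set('verb', verb);
  for (const [key, value] of Object.entries(args)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical base URL; substitute the one your system exposes.
const base = 'https://library.example.edu/oai';
console.log(oaiRequestUrl(base, 'Identify'));
// https://library.example.edu/oai?verb=Identify
console.log(oaiRequestUrl(base, 'ListRecords', { metadataPrefix: 'oai_dc' }));
// https://library.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Opening the `Identify` URL in a browser is a quick way to check that a feed is alive before handing the base URL to a harvester.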


The WordPress section is short, since there aren’t that many options. If you use WordPress for your library website you will have to experiment, as there is nothing well-supported. Lincoln University in New Zealand has created a script that converts a WordPress RSS feed to a minimal OAI feed. This requires editing a PHP file to include your RSS feed URL, and uploading it to a server. I admit that I have been unsuccessful at testing this, but Lincoln University has a working example, and uses this to harvest their WordPress library website into Primo.


If you use Drupal, you will need to first install a module called Views OAI-PMH. What this does is create a Drupal view formatted as an OAI-PMH data provider feed. Those familiar with Drupal know that you can use the Views module to present content in a variety of ways. For instance, you can include certain fields from certain content types in a list or chart that allows you to reuse content rather than recreating it. This is no different, only the formatting is an OAI-PMH compliant XML structure. Rather than placing the view in a Drupal page or block, you create a separate page. This page becomes your base URL to provide to others or reuse in whatever way you need.

The Views OAI-PMH module isn’t the most obvious module to set up, so here are the basic steps you need to follow. First, enable and set permissions as usual. You will also want to refresh your caches (I had trouble until I did this). You’ll discover that unlike other modules the documentation and configuration are not in the interface, but in the README file, so you will need to open that out of the module directory to get the configuration instructions.

To create your OAI-PMH view you have two choices. You can add it to a view that is already created, or create a new one. The module will create an example view called Biblio OAI-PMH (based on an earlier Biblio module used for creating bibliographic metadata). You can just edit this to create your OAI feed. Alternatively, if you have a view that already exists with all the data you want to include, you can add an OAI-PMH display as an additional display. You’ll have to create a path for your view that will make it accessible via a URL.

The details screen for the OAI-PMH display.

The Views OAI-PMH module only supports Dublin Core at this time. If you are using Drupal for bibliographic metadata of some kind, mapping the fields is a fairly straightforward process. However, choosing the Dublin Core mappings for data that is not bibliographic by nature requires some creativity and thought about where the data will end up. When I was setting this up I was trying to harvest most of the library website into our discovery layer, so I knew how the discovery layer parsed OAI DC and could choose fields accordingly.

After adding fields to the view (just as you normally would in creating a view), you will need to open the settings for the OAI view and select the Dublin Core element name for each content field.

You can then map each element to the appropriate Dublin Core field. The example from my site includes some general metadata that appears on all content (such as Title), and some that only appears in specific content types. For instance, Collection Description only appears on digital collection content types. I did not choose to include the body content for any page on the site, since most of those pages contain a lot of scripts or other code that wasn’t useful to harvest into the discovery layer. Explanatory content such as the description of a digital collection or a database was more useful to display in the discovery layer, and exists only in special fields for those content types on my Drupal site, so we could pull those out and display those.

In the end, I have a feed that looks like this. Regular pages end up with very basic metadata in the feed:

<oai_dc:dc xsi:schemaLocation="">
<dc:identifier></dc:identifier><dc:creator>Loyola University Libraries</dc:creator></oai_dc:dc>

Databases, by contrast, get more information pulled in. Note that there are two identifiers, one for the database URL, and one for the database description link. We will make these both available, but may choose to use only one in the discovery layer and hide the other one.

<oai_dc:dc xsi:schemaLocation="">
<dc:title>Annual Bibliography of English Language and Literature</dc:title>
<dc:subject>Modern Languages</dc:subject>
<dc:creator>Loyola University Libraries</dc:creator>

When someone does a search in the discovery layer for something on the library website, the result shows the page right in the interface. We are still doing usability tests on this right now, but expect to move it into production soon.


I’ve just touched on two content management systems, but there are many more out there. Do you create OAI-PMH feeds of your data? What do you do with them? Share your examples in the comments.

Data Refuge and the Role of Libraries

Society is always changing. For some, the change can seem slow and frustrating, while others may feel as though the change occurred in a blink of an eye. What is this change that I speak of? It can be anything…civil rights, autonomous cars, or national leaders. One change that no one ever seems particularly prepared for, however, is when a website link becomes broken. One day, you could click a link and get to a site and the next day you get a 404 error. Sometimes this occurs because a site was migrated to a new server and the link was not redirected. Sometimes this occurs because the owner ceased to maintain the site. And sometimes, this occurs for less benign reasons.

Information access via the Internet is an activity that many (but not all) of us do every day, in sometimes unconscious fashion: checking the weather, reading email, receiving news alerts. We also use the Internet to make datasets and other sources of information widely available. Individuals, universities, corporations, and governments share data and information in this way. In the Obama administration, the Open Government Initiative led to the development of Project Open Data and Federal agencies started looking at ways to make information sharing easier, especially in areas where the data are unique.

One area of unique data is in climate science. Since climate data is captured on a specific day, time, and under certain conditions, it can never be truly reproduced. It will never be January XX, 2017 again. With these constraints, climate data can be thought of as fragile. The copies that we have are the only records that we have. Much of our nation’s climate data has been captured by research groups at institutes, universities, and government labs and agencies. During the election, much of the rhetoric from Donald Trump was rooted in the belief that climate change is a hoax. Upon his election, Trump tapped Scott Pruitt, who has fought much of the EPA’s attempts to regulate pollution, to lead the EPA. This, along with other messages from the new administration, has raised alarms within the scientific community that the United States may repeat the actions of the Harper administration in Canada, which literally threw away thousands of items from federal libraries that were deemed outside scope, through a process that was criticized as not transparent.

In an effort to safeguard and preserve this data, the Penn Program in Environmental Humanities (PPEH) helped organize a collaborative project called Data Refuge. This project requires the expertise of scientists, librarians, archivists, and programmers to organize, document, and back up data that is distributed across federal agencies’ websites. Maintaining the integrity of the data, while ensuring the re-usability of it, are paramount concerns and areas where librarians and archivists must work hand in glove with the programmers (sometimes one and the same) who are writing the code to pull, duplicate, and push content. Wired magazine recently covered one of the Data Refuge events and detailed the way that the group worked together, while much of the process is driven by individual actions.

In order to capture as much of this data as possible, the Data Refuge project relies on groups of people organizing around this topic across the country. The PPEH site details the requirements to host a successful DataRescue event and has a Toolkit to help promote and document the event. There is also a survey that you can use to nominate climate or environmental data to be part of the Data Refuge. Not in a position to organize an event? Don’t like people? You can also work on your own! An interesting observation from the work on your own page is the option to nominate any “downloadable data that is vulnerable and valuable.” This means that Internet Archive and the End of Term Harvest Team (a project to preserve government websites from the Obama administration) are interested in any data that you have reason to believe may be in jeopardy under the current administration.

A quick note about politics. Politics are messy and it can seem odd that people are organizing in this way, when administrations change every four or eight years and, when there is a party change in the presidency, it is almost a certainty that there will be major departures in policy and prioritizations from administration to administration. What is important to recognize is that our data holdings are increasingly solely digital, and therefore fragile. The positions on issues like climate, environment, civil rights, and many, many others are so diametrically opposite from the Obama to Trump offices, that we – the public – have no assurances that the data will be retained or made widely available for sharing. This administration speaks of “alternative facts” and “disagree[ing] with the facts” and this makes people charged with preserving facts wary.

Many questions about the sustainability and longevity of the project remain. Will End of Term or Data Refuge be able to/need to expand the scope of these DataRescue efforts? How much resourcing can people donate to these events? What is the role of institutions in these efforts? This is a fantastic way for libraries to build partnerships with entities across campus and across a community, but some may view the political nature of these actions as incongruous with the library mission.

I would argue that policies and political actions are not inert abstractions. There is a difference between promoting a political party and calling attention to policies that are in conflict with human rights and freedom of information. Loath as I am to make this comparison, would anyone truly claim that burning books is protected political speech, and that opposing such burning is “playing politics”? Yet these were the actions of a political party, in living memory, hosted at university towns across Germany. Considering the initial attempt to silence the USDA and the temporary freeze on the EPA, libraries should strongly support the efforts of PPEH, Data Refuge, End of Term, and concerned citizens across the country.


#1Lib1Ref Edit (2017)

I participated in the “#1Lib1Ref” campaign again this year, recording my experience and talking through why I think it’s important.

Cybersecurity, Usability, Online Privacy, and Digital Surveillance

Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information safe and secure.

However, keeping private information safe and secure is a challenging task. We have all heard of security breaches at J.P. Morgan, Target, Sony, Anthem Blue Cross and Blue Shield, the Office of Personnel Management of the U.S. federal government, University of Maryland at College Park, and Indiana University. Sometimes, a data breach takes place when an institution fails to patch a hole in its network systems. Sometimes, people fall for a phishing scam, or a virus in a user’s computer infects the target system. Other times, online companies compile customer data into personal profiles. The profiles are then sold to data brokers and on into the hands of malicious hackers and criminals.

Cybersecurity vs. Usability

To prevent such a data breach, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful about dealing with institutional or customers’ data. There are systematic measures that organizations can implement such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.
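To make the last of those measures concrete, here is a minimal sketch of account lockout after repeated failed logins. The threshold of three attempts, the fifteen-minute lock, and every name in it are assumptions chosen for illustration; it does not describe any particular institution's system.

```javascript
// Assumed policy values, chosen only for illustration.
const MAX_ATTEMPTS = 3;              // failures allowed before locking
const LOCK_MS = 15 * 60 * 1000;      // lock duration: 15 minutes

// Hypothetical in-memory tracker; a real system would persist this state.
function createLockoutTracker() {
    const state = new Map(); // username -> { failures, lockedUntil }
    return {
        // True while the account's lock has not yet expired.
        isLocked(user, now) {
            const s = state.get(user);
            return Boolean(s && s.lockedUntil > now);
        },
        // Count a failed login and lock the account at the threshold.
        recordFailure(user, now) {
            const s = state.get(user) || { failures: 0, lockedUntil: 0 };
            s.failures += 1;
            if (s.failures >= MAX_ATTEMPTS) {
                s.lockedUntil = now + LOCK_MS;
                s.failures = 0;
            }
            state.set(user, s);
        },
        // A successful login clears the failure count.
        recordSuccess(user) {
            state.delete(user);
        }
    };
}
```

Timestamps are passed in as plain millisecond numbers rather than read from the clock, which keeps the sketch easy to test. Even in this small form, the security setting is also a usability setting: the two constants decide how often a forgetful but legitimate user gets locked out.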

While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like a CAPTCHA can cause an accessibility issue for people with disabilities.

As another example, imagine that a university IT office concerned about the data security of cloud services starts requiring all faculty, students, and staff to use only cloud services that are SOC 2 Type II certified. SOC stands for “Service Organization Controls.” It consists of a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies that will satisfactorily protect its clients’ data in five areas known as “Trust Services Principles.” Those include the security of the service provider’s system, the processing integrity of this system, the availability of the system, the privacy of personal information that the service provider collects, retains, uses, discloses, and disposes of for its clients, and the confidentiality of the information that the service provider’s system processes or maintains for the clients. The SOC 2 Type II certification means that the business has maintained relevant security policies and procedures over a period of at least six months, and it is therefore a good indicator that the business will keep its clients’ sensitive data secure. Dropbox for Business is SOC 2 certified, but it costs money. The free version is not as secure, but many faculty, students, and staff in academia use it frequently for collaboration. If a university IT office simply bans people from using the free version of Dropbox without offering an alternative that is as easy to use as Dropbox, people will undoubtedly suffer.

Some of you may know that the USPS website does not provide a way to reset the password for users who forgot their usernames. They are instead asked to create a new account. If they remember the account username but enter the wrong answers to the two security questions more than twice, the system also automatically locks their accounts for a certain period of time. Again, users have to create a new account. Clearly, a system that does not allow password resets for forgetful users is more secure than one that does. However, in reality, this security measure creates a huge usability issue because average users do forget their passwords and the answers to the security questions that they set up themselves. It’s not hard to guess how frustrated people will be when they realize that they entered a wrong mailing address for mail forwarding and are now unable to get back into the system to correct it because they can remember neither their passwords nor the answers to their security questions.

To give an example related to libraries, a library may decide to block all international traffic to its licensed e-resources to prevent foreign hackers who have gotten hold of the username and password of a legitimate user from accessing those e-resources. This would certainly help libraries to avoid a potential breach of licensing terms in advance and spare them from having to shut down compromised user accounts one by one whenever those are found. However, this would also make it impossible for legitimate users traveling outside of the country to access those e-resources, which many users would find unacceptable. Furthermore, malicious hackers would probably just use a proxy to make their IP address appear to be located in the U.S. anyway.

What would users do if their organization required them to reset passwords on a weekly basis for their work computers and several other systems that they also use constantly for work? While this may strengthen the security of those systems, it’s easy to see that it would be a nightmare to reset all those passwords every week and keep track of them so as not to forget or mix them up. Most likely, users will start choosing less complicated passwords or even adopt just one password for all the different services. Some may even stick to the same password every time the system requires them to reset it, unless the system automatically detects the previous password and prevents them from continuing to use it. Ill-thought-out cybersecurity measures can easily backfire.

Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby making a network more secure. Users who face cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.


Cybersecurity vs. Privacy

Usability and productivity may be a small issue, however, compared to the risk of mass surveillance resulting from aggressive security measures. In 2013, the Guardian reported that the communication records of millions of people were being collected by the National Security Agency (NSA) in bulk, regardless of suspicion of wrongdoing. A secret court order prohibited Verizon from disclosing the NSA’s information request. After a cyberattack against the University of California at Los Angeles, the University of California system installed a device that is capable of capturing, analyzing, and storing all network traffic to and from the campus for over 30 days. This security monitoring was implemented secretly without consulting or notifying the faculty and those who would be subject to the monitoring. The San Francisco Chronicle reported the IT staff who installed the system were given strict instructions not to reveal it was taking place. Selected committee members on the campus were told to keep this information to themselves.

The invasion of privacy and the lack of transparency in these network monitoring programs have caused great controversy. Such wide and indiscriminate monitoring programs must have a very good justification and offer clear answers to vital questions such as what exactly will be collected, who will have access to the collected information, when and how the information will be used, what controls will be put in place to prevent the information from being used for unrelated purposes, and how the information will be disposed of.

We have recently seen another case in which security concerns conflicted with people’s right to privacy. In February 2016, the FBI requested that Apple create a backdoor application that would bypass the security measures in place in iOS. The FBI wanted to unlock an iPhone 5C recovered from one of the shooters in the San Bernardino shooting incident. iOS secures users’ devices by permanently erasing all data when a wrong password is entered more than ten times, if people choose to activate this option in the iOS settings. The FBI’s request was met with strong opposition from Apple and others. Such a backdoor application can easily be exploited for illegal purposes by black hat hackers, for unjustified privacy infringement by other capable parties, and even for dictatorship by governments. Apple refused to comply with the request, and the court hearing was to take place on March 22. The FBI, however, withdrew the request, saying that it had found a way to hack into the phone in question without Apple’s help. Now, Apple has to figure out what the vulnerability in iOS is if it wants its encryption mechanism to be foolproof. In the meantime, iOS users know that their data is no longer as secure as they once thought.

Around the same time, the Senate’s draft bill titled “Compliance with Court Orders Act of 2016” proposed that people should be required to comply with any authorized court order for data and that if that data is “unintelligible” – meaning encrypted – then it must be decrypted for the court. This bill is problematic because it practically nullifies the efficacy of any end-to-end encryption, which we use every day, from our iPhones to messaging services like WhatsApp and Signal.

Because security is essential to privacy, it is ironic that certain cybersecurity measures are used to greatly invade privacy rather than protect it. Because we do not always fully understand how the technology actually works or how it can be exploited for both good and bad purposes, we need to be careful about giving blank permission to any party to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and the government will struggle even more to balance privacy concerns with security issues.

Why Should Libraries Advocate for Online Privacy?

The fact that people may no longer have privacy on the Web should concern libraries. Historically, libraries have been strong advocates of intellectual freedom, striving to keep patrons’ data safe and protected from the unwanted eyes of the authorities. As librarians, we believe in people’s right to read, think, and speak freely and privately as long as such an act itself does not pose harm to others. The Library Freedom Project is an example that reflects this belief held strongly within the library community. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom, and it helped the Kilton Public Library in Lebanon, New Hampshire, become the first library to operate a Tor exit relay to provide anonymity for patrons while they browse the Internet at the library.

New technologies brought us the unprecedented convenience of collecting, storing, and sharing massive amounts of sensitive data online. But the fact that such sensitive data can easily be exploited if it falls into the wrong hands has also created an unparalleled level of potential invasion of privacy. While the majority of librarians take a very strong stance in favor of intellectual freedom and against censorship, it is often hard to discern a correct stance on online privacy, particularly when it is pitted against cybersecurity. Some even argue that those who have nothing to hide do not need their privacy at all.

However, privacy is not equivalent to hiding a wrongdoing. Nor do people keep certain things secret because those things are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy whether s/he is guilty of any wrongdoing or not. Privacy allows us safe space to form our thoughts and consider our actions on our own without being subject to others’ eyes and judgments. Even in the absence of actual massive surveillance, just the belief that one can be placed under surveillance at any moment is sufficient to trigger self-censorship and negatively affect one’s thoughts, ideas, creativity, imagination, choices, and actions, making people more conformist and compliant. This is further corroborated by the recent study from Oxford University, which provides empirical evidence that the mere existence of a surveillance state breeds fear and conformity and stifles free expression. Privacy is an essential part of being human, not some trivial condition that we can do without in the face of a greater concern. That’s why many people under political dictatorship continue to choose death over life under mass surveillance and censorship in their fight for freedom and privacy.

The Electronic Frontier Foundation states that privacy means respect for individuals’ autonomy, anonymous speech, and the right to free association. We want to live as autonomous human beings free to speak our minds and think on our own. If part of a library’s mission is to contribute to helping people to become such autonomous human beings through learning and sharing knowledge with one another without having to worry about being observed and/or censored, libraries should advocate for people’s privacy both online and offline as well as in all forms of communication technologies and devices.

Evaluating Whether You Should Move Your Library Site to Drupal 8

After much hard work over years by the Drupal community, Drupal users rejoiced when Drupal 8 came out late last year. The system has been completely rewritten and does a lot of great stuff–but can it do what we need Drupal websites to do for libraries?  The quick answer seems to be that it’s not quite ready, but depending on your needs it might be worth a look.

For those who aren’t familiar with Drupal, it’s a content management system designed to manage complex sites with multiple types of content, users, features, and appearances.  Certain “core” features are available to everyone out of the box, but even more useful are the “modules”, which extend the features to do all kinds of things from the mundane but essential backup of a site to a flashy carousel slider. However, the modules are created by individuals or companies and contributed back to the community, and thus when Drupal makes a major version change they need to be rewritten, quite drastically in the case of Drupal 8. That means that right now we are in a period where developers may or may not be redoing their modules, or they may be rethinking how a certain task should be done in the future. Because most of these developers are doing this work as volunteers, it’s not reasonable to expect that they will complete the work on your timeline. The expectation is that if a feature is really important to you, then you’ll work on development to make it happen. That is, of course, easier said than done for people who barely have enough time to do the basic web development asked of them, much less complex programming or learning a new system top to bottom, so most of us are stuck waiting or figuring out our own solutions.

Despite my knowledge of the reality of how Drupal works, I was very excited at the prospect of getting into Drupal 8 and learning all the new features. I installed it right away and started poking around, but realized pretty quickly I was going to have to do a complete evaluation for whether it was actually practical to use it for my library’s website. Our website has been on Drupal 7 since 2012, and works pretty well, though it does need a new theme to bring it into line with 2016 design and accessibility standards. Ideally, however, we could be doing even more with the site, such as providing better discovery for our digital special collections and making the site information more semantic web friendly. It was those latter, more advanced, feature desires that made me really wish to use Drupal 8, which includes semantic HTML5 integration and markup, as well as better integration with other tools and libraries. But the question remains–would it really be practical to work on migrating the site immediately, or would it make more sense to spend some development time on improving the Drupal 7 site to make it work for the next year or so while working on Drupal 8 development more slowly?

A bit of research online will tell you that there’s no right answer, but that the first thing to do in an evaluation is determine whether any of the modules on which your site depends are available for Drupal 8, and if not, whether there is a good alternative. I must add that while all the functions I am going to mention can be done manually or through custom code, a lot of that work would take more time to write and maintain than I expect to have going forward. In fact, we’ve been working to move more of our customized code to modules already, since that makes it possible to distribute some of the workload to others outside of the very few people at our library who write code or even know HTML well, not to mention taking advantage of all the great expertise of the Drupal community.

I tried two different methods for the evaluation. First, I created a spreadsheet with all the modules we actually use in Drupal 7, their versions, and the current status of those modules in Drupal 8, or whether I had found a reasonable substitute. Next, I tried a site that automates that process. Basically, you fill in your website URL and email and wait a day for your report, which is very straightforward: a list of modules found for your site, and whether each has a stable release, an alpha or beta release, or no Drupal 8 release yet. This is a useful timesaver, but it will need some manual work to complete and isn’t always completely up to date.

My manual analysis determined that there were 30 modules on which we depend to a greater or lesser extent. Of those, 10 either moved into Drupal core (so would automatically be included) or the functions for which we used them moved into another piece of core. 5 had versions available in Drupal 8, with varying levels of release (i.e. several in stable alpha release, so questionable to use for production sites but probably fine), and 5 were not migrated but it was possible to identify substitute Drupal 8 modules. That’s pretty good: 18 modules were available in Drupal 8, and in several cases one module could do the job that two or more had done in Drupal 7. Of the additional 11 modules that weren’t migrated and didn’t have an easy substitution, three of them are critical to maintaining our current site workflows. I’ll talk about those in more detail below.

The automated report found 21 modules in use, though I didn’t include all of them on my own spreadsheet if I didn’t intend to keep using them in the future. I’ve included a screenshot of the report, and there are a few things to note. This list does not have all the modules I had on my list, since some of those are used purely behind the scenes for administrative purposes and would have no indication of use without administrative access. The very last item on the list is Core, which of course isn’t going to be upgraded to Drupal 8–it is Drupal 8. I also found that the report is not completely up to date. For instance, my own analysis found a pre-release version of Workbench Moderation, but that information had not made it to this site yet. A quick email to them fixed it almost immediately, however, so this screenshot is out of date.

I decided that there were three dealbreaker modules for the upgrade, and I want to talk about why we rely on them, since I think my reasoning will be applicable to many libraries with limited web development time. I will also give honorable mention to a module that we are not currently using, but I know a lot of libraries rely on and that I would potentially like to use in the future.

Webform is a module that creates a very simple to use interface for creating webforms and doing all kinds of things with them beyond just simply sending emails. We have many, many custom PHP/MySQL forms throughout our website and intranet, but there are only two people on the staff who can edit those or download the submitted entries from them. They also occasionally have dreadful spam problems. We’ve been slowly working on migrating these custom forms to the Drupal Webform module, since that allows much more distribution of effort across the staff, and provides easier ways to stop spam using, for instance, the Honeypot module or Mollom. (We’ve found that the Honeypot module stopped nearly all our spam problems, and we didn’t need to move to Mollom, since we don’t have user comments to moderate.) The thought of going back to coding all those webforms myself is not appealing, so for now I can’t move forward until I come up with a Drupal solution.

Redirect does a seemingly tiny job that’s extremely helpful. It allows you to create redirects for URLs on your site, which is useful for all kinds of reasons: for instance, if you want to create a library-branded link that forwards somewhere else, like a database vendor or another page on your university site, or if you want to change a page URL but ensure people with bookmarks to the old page will still find it. This is, of course, something that you can do on your web server, assuming you have access to it, but this module takes a lot of the administrative overhead away and helps keep things organized.
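To illustrate the idea only (this is not how the Redirect module is implemented), a redirect table boils down to a lookup from a source path to a destination. The paths, the destinations, and the choice of a permanent 301 status below are hypothetical.

```javascript
// Hypothetical redirect table: source path on our site -> destination.
const redirects = new Map([
    ['/go/database', 'https://vendor.example.com/search'], // branded link to a vendor
    ['/about/hours.html', '/hours']                        // old page URL -> new URL
]);

// Look up a requested path; return redirect details, or null if none applies.
function resolveRedirect(path) {
    const target = redirects.get(path);
    return target ? { status: 301, location: target } : null;
}

console.log(resolveRedirect('/about/hours.html'));
console.log(resolveRedirect('/no-such-page'));
```

Keeping the mappings in one table, rather than scattered through server configuration, is the organizational benefit described above.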

Backup and Migrate is my greatest helper in my goal to be someone who would like to at least be in the neighborhood of best practices for web development when web development is only half my job, or some weeks more like a quarter of my job. It makes a very quick process of keeping my development, staging, and production sites in sync, and since I created a workflow using this module I have been far more successful in keeping my development processes sane. It provides an interface for creating a backup of your site database, files directories, or your database and files that you can use in the Backup and Migrate module to completely restore a site. I use it at least every two weeks, or more often when working on a particular feature to move the database between servers (I don’t move the files with the module for this process, but that’s useful for backups that are for emergency restoration of the site). There are other ways to accomplish this work, but this particular workflow has been so helpful that I hate to dump a lot of time into redoing it just now.

One last honorable mention goes to Workbench, which we don’t use but I know a lot of libraries do use. This allows you to create a much more friendly interface for content editors so they don’t have to deal with the administrative backend of Drupal and allows them to just see their own content. We do use Workbench Moderation, which does have a Drupal 8 release, and allows a moderation queue for the six or so members of staff who can create or edit content but don’t have administrative rights to have their content checked by an administrator. None of them particularly like the standard Drupal content creation interface, and it’s not something that we would ever ask the rest of the staff to use. We know from the lack of use of our intranet, which also is on Drupal, that no one particularly cares for editing content there. So if we wanted to expand access to website editing, which we’ve talked about a lot, this would be a key module for us to use.

Given the current status of these modules, with rewrites in progress, it seems likely that by the end of the year it may be possible to migrate to Drupal 8 with our current setup, or that in playing around with Drupal 8 on a development site we will determine a different way to approach these needs. If you have the interest and time to do this, there are worse ways to pass the time. If you are creating a completely new Drupal site and don’t have a time crunch, starting in Drupal 8 now is probably the way to go, since by the time the site would be ready you may have additional modules available and get to take advantage of all the new features. If this is something you’re trying to roll out by the end of the semester, maybe wait on it.

Have you considered upgrading your library’s site to Drupal 8? Have you been successful? Let us know in the comments.

Creating a Flipbook Reader for the Web

At my library, we have a wonderful collection of artists’ books. These titles are not mere text, or even concrete poetry shaping letters across a page, but works of art that use the book as a form in the same way a sculptor might choose clay or marble. They’re all highly inventive and typically employ an interplay between language, symbol, and image that challenges our understanding of what a print volume is. As an art library, collecting these unique and inspiring works seems natural. But how can we share our artists’ books with the world? For a while, our work study students have been scanning sets of pages from the books using a typical flatbed scanner, but we weren’t sure how to present these images. Below, I’ll detail how we came to fork the Internet Archive’s Bookreader to publish our images to the web.

Choosing a Book Display

Our Digital Scholarship Librarian, Lisa Conrad, led the artists’ book project. She investigated a number of potential options for creating interactive “flipbooks” out of our series of images. We had a few requirements we were looking for:

  • mobile friendly — ideally, our books would be able to be read on any device, not just desktops with Adobe Flash
  • easy to use — our work study staff shouldn’t need sophisticated skills, like editing code or markup, to publish a work
  • visually appealing — obviously we’re dealing with works of art, if our presentation is hideous or obscures the elegance of the original then it’s doing more harm than good
  • works with our repository — ultimately, the books would “live” in our institutional repository, so we needed software that would emit something like static HTML or documents that could be easily retained in it

These are not especially stringent requirements, in my mind. Still, we struggled to find decent options. The mobile limitation restricted things quite a bit; a surprising number of apps used Flash or published in desktop-first formats. We felt our options to customize and simplify the user interface of other apps were too limited. Even if an app spat out a nice bundle of HTML and assets that we could upload, the HTML might call out to sketchy external services or present a number of options we didn’t want or need. While I was at first hesitant to implement an open source piece of software that I knew would require much of my time, ultimately the Internet Archive’s Bookreader became the frontrunner. We felt more comfortable using an established, free piece of software that doesn’t require any server-side component (it’s all JavaScript!). My primary concern was that the last commit to the bookreader code base was over a half year ago.

Implementing the Internet Archive Bookreader

The Bookreader proved startlingly easy to implement. We simply copied the code from an example given by the Internet Archive, edited several lines of JavaScript, and were ready to go. Everything else was refinement. Still, we made a few nice decisions in working on the Bookreader that I’d like to share. If you’re not a coder, feel free to skip the rest of this section as it may be unnecessarily detailed.

First off, our code is all up on GitHub. But it won’t be useful to anyone else! It contains specific logic based on how our IR serves up links to the bookreader app itself. But one of the smarter moves, if I may indulge in some self-congratulation, was using submodules in git. If you know git then you know it’s easily one of the best ways to version code and manage projects. But what if your app is just an implementation of an existing code base? There’s a huge portion of code you’re borrowing from somewhere else, and you want to be able to segment that off and incorporate changes to it separately.

As I noted before, the bookreader isn’t exactly actively developed. But it still benefits us to separate our local modifications from the stable, external code. A submodule lets us incorporate another repository into our own. It’s the clean way of copying someone else’s files into our own version. We can reference files in the other repository as if they’re sitting right beside our own, but also pull in updates in a controlled manner without causing tons of extra commits. Elegant, handy, sublime.

Most of our work happened in a single app.js file. This file was copied from the Internet Archive’s provided example and only modified slightly. To give you a sense of how one modifies the original script, here’s a portion:

// Create the BookReader object
br = new BookReader();
// Total number of leafs
br.numLeafs = vaultItem.pages;
// Book title & URL used for the book title link
br.bookTitle= vaultItem.title;
br.bookUrl  = vaultItem.root + 'items/' + + '/' + vaultItem.version + '/';
// how does the bookreader know what images to retrieve given a page number (index)?
br.getPageURI = function(index, reduce, rotate) {
    // reduce and rotate are ignored in this simple implementation, but we
    // could e.g. look at reduce and load images from a different directory
    var url = vaultItem.root + 'file/' + + '/' + vaultItem.version + '/' + vaultItem.filenames + (index + 1) + '.JPG';
    return url;
};

The Internet Archive’s code provides us with a Bookreader class. All we do is instantiate it once and override certain properties and methods. The code above is all that’s necessary to display a particular image for a given page number; the vaultItem object (VAULT is the name of our IR) consists of information about a single artists’ book, like the number of pages, its title, and its ID and version within the IR. The bookreader app cobbles together these pieces of info to figure out which images it should display for a given page or pair of pages. The getPageURI function works mostly with a single index argument, while the group of concatenated strings for the URL is related to how our repository stores files. It’s highly specific to the IR we’re using, but not terribly complicated.
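To show the URL-building step in isolation, here is a small standalone sketch. The property name id, the example.edu base URL, the version number, and the "page" filename prefix are all assumptions made for illustration; only the general shape (root, ID, version, filename, page number) comes from the description above.

```javascript
// Hypothetical item record; the values are invented for illustration.
const vaultItem = {
    root: 'https://vault.example.edu/',          // assumed base URL of the IR
    id: '17c06cc5-c419-4e77-9bdb-43c69e94b4cd',  // assumed property name for the item ID
    version: 1,                                  // assumed version number
    filenames: 'page',                           // assumed shared filename prefix
    pages: 26
};

// Build the image URL for a zero-based page index.
// Files are assumed to be numbered from 1, hence index + 1.
function buildPageUrl(item, index) {
    return item.root + 'file/' + item.id + '/' + item.version + '/' +
        item.filenames + (index + 1) + '.JPG';
}

console.log(buildPageUrl(vaultItem, 0));
```

In the reader itself, a function like this would be the body of the getPageURI override, with the extra reduce and rotate arguments ignored.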

The Bookreader itself sits on a web server outside the IR. Since we cannot publicly share most of these books (more on this below), we restrict access to the images to users who are authenticated with the IR. So how can the external reader app serve up images of books within the repository? We expect people to discover the books via our library catalog, which links to the IR, or within the IR itself. Once they sign in, our IR’s display templates contain specially crafted URLs that pass the bookreader information about the item in our repository via their query string. Here’s a shortened example of one query string:


From there, the app parses the query string to figure out that the book’s title is Crystals to Aden, its ID within the IR is “17c06cc5-c419-4e77-9bdb-43c69e94b4cd”, and it has twenty-six pages. We store those values in the vaultItem object referenced in the script above. That object contains enough information for the bookreader to determine how to retrieve images from the IR for each page of the book. Since the user has already authenticated with the IR when they discovered the URL earlier, the IR happily serves up the images.
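The parsing step might be sketched roughly as follows. This is a minimal illustration rather than our actual app.js: the parameter names (title, id, pages) and the example query string are assumptions made up for the sketch, and it relies on the browser's built-in URLSearchParams. The real vaultItem also carries a root, version, and filenames, which are left out here.

```javascript
// Hypothetical query string. The parameter names are assumptions,
// not necessarily the ones our IR templates really emit.
var exampleQuery = '?title=Crystals%20to%20Aden' +
    '&id=17c06cc5-c419-4e77-9bdb-43c69e94b4cd' +
    '&pages=26';

// Turn a query string into an object shaped like vaultItem
function parseVaultItem(queryString) {
    // URLSearchParams ignores a leading "?" and decodes %20 into spaces
    var params = new URLSearchParams(queryString);
    return {
        title: params.get('title'),
        id: params.get('id'),
        // query string values are always strings, so convert the page count
        pages: parseInt(params.get('pages'), 10)
    };
}

var vaultItem = parseVaultItem(exampleQuery);
```

In the browser, the same function would be handed window.location.search instead of a hard-coded string.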


Easily the most difficult part of our artists’ books project has been spreading awareness. It’s a cool project that’s required tons of work all around, from the Digital Scholarship Librarian managing the project, to our diligent work study student scanning images, to our circulation staff cataloging them in our IR, to me solving problems in JavaScript while my web design student worker polished the CSS. We’re proud of the result, but also struggling to share it outside of our walls. Our IR is great at some things but not particularly intuitive to use, so we cannot count on patrons stumbling across the artists’ books on their own very often. Furthermore, putting anything behind a login is obviously going to increase the number of people who give up before accessing it. Not being on the campus Central Authentication Service only exacerbates this for us.

The second challenge is—what else!—copyright. We don’t own the rights to any of these titles. We’ve been guerrilla digitizing them without permission from publishers. For internal sharing, that’s fine and well under Fair Use; we’re only showing excerpts and security settings in our IR ensure they’re only visible to our constituents. But I pine to share these gorgeous creations with people outside the college, with my social networks, with other librarians. Right now, we only have permission to share Crystals to Aden by Michael Bulteau (thanks to duration press for letting us!). Which is great! Except Crystals is a relatively text-heavy work; it’s fine poetry, yes, but not the earth-shattering reinterpretation of the codex I promised you in my opening paragraph.

Hopefully, we can secure permissions to share further works. Internally, we’ve pushed out notices on social media, the LMS, and email lists to inform people of the artists’ books. They’re also available for checkout, so hopefully our digital teasers bring people in to see the real deal.

Lasting Problems

And that’s something that must be mentioned; our digital simulations are definitely not the real deal. Never has a set of titles been so resistant to digitization. Our work study has even asked us “how could I possibly separate this work into distinct pairs of pages” about a work which used partially-overlapping leaves that look like vinyl record sleeves.1 Artists deliberately chose the printed book form and there’s a deep sacrilege in trying to digitally represent that. We can only ever offer a vague adumbration of what was truly intended. I still see value in our bookreader—and the works often look brilliant even as scanned images—but there are fundamental impediments to its execution.

Secondly, scanned images are not text. Many works do have text on the page, but because we’re displaying images in the browser users cannot select the text to copy it. Alongside each set of images in our IR, we also have a PDF copy of all the pages with OCR’d text. But a decision was made not to make the PDFs visible to non-library staff, since they would be easy to download and disseminate (unlike the many discrete images in our bookreader). This all adds up to make the bookreader very inaccessible; it’s all images, there’s no feasible way for us to associate alt text with each one, and the PDF copy that might be of interest to visually-impaired users is hidden.

I’ve ended on a sour note, but digitizing our artists’ books and fiddling with the Internet Archive’s Bookreader was a great project. Fun, a bit challenging, with some splendid results. We have our work cut out for us if we want to draw more attention to the books and have them be compelling for everyone. But other libraries in similar situations may find the Bookreader to be a very viable, easy-to-implement solution. If you don’t have permissions issues and are dealing with more traditional works, it’s built to be customizable yet offers a pleasant reading experience.


  1. Paradoxic Mutations by Margot Lovejoy

From Consensus to Expertise: Rethinking Library Web Governance

The world is changing, the web is changing and libraries are changing along with them. Commercial behemoths like Amazon, Google and Facebook, together with significant advancements in technical infrastructure and consumer technology, have established a new set of expectations for even casual users of the web. These expectations have created new mental models of how things ought to work, and why—not just online, either. The Internet of Things may not yet be fully realized but we clearly see its imminent appearance in our daily lives.

Within libraries, has our collective concept of the intention and purpose of the library website evolved as well? How should the significant changes in how the web works, what websites do and how we interact with them also impact how we manage, assess and maintain library websites?

In some cases it has been easier to say what the library website is not – a catalog, a fixed-form document, a repository—although it facilitates access to these things, and perhaps makes them discoverable. What, then, is the library website? As academic librarians, we define it as follows.

The library website is an integrated representation of the library, providing continuously updated content and tools to engage with the academic mission of the college/university.

It is constructed and maintained for the benefit of the user. Value is placed on consumption of content by the user rather than production of content by staff.

Moving from a negative definition to a positive definition empowers both stewards of and contributors to the website to participate in an ongoing conversation about how to respond proactively to the future, our changing needs and expectations and, chiefly, to our users’ changing needs and expectations. Web content management systems have moved from being just another content silo to being a key part of library service infrastructure. Building on this forward momentum enables progress to a better, more context-sensitive user experience for all as we consider our content independent of its platform.

It is just this reimagining of how and why the library website contributes value, and what role it fulfills within the organization in terms of our larger goals for connecting with our local constituencies—supporting research and teaching through providing access to resources and expertise—that demands a new model for library web governance.

Emerging disciplines like content strategy and a surge of interest in user experience design and design thinking give us new tools to reflect on our practice and even to redefine what constitutes best practice in the area of web librarianship.

Historically, libraries have managed websites through committees and task forces. Appointments to these governing bodies were frequently driven by a desire to ensure adequate balance across the organizational chart, and to varying degrees by individuals’ interest and expertise. As such, we must acknowledge the role of internal politics as a variable factor in these groups’ ability to succeed—one might be working either with, or against, the wind. Librarians, particularly in groups of this kind, notoriously prefer consensus-driven decision making.

The role of expertise is largely taken for granted across most library units; that is, not just anyone is qualified to perform a range of essential duties, from cataloging to instruction to server administration to website management.1 Consciously according ourselves and our colleagues the trust to employ their unique expertise allows individuals to flourish and enlarges the capacity of the organization and the profession. In the context of web design and governance, consensus is a blocker to nimble, standards-based, user-focused action. Collaborative processes, in which all voices are heard, together with empirical data are essential inputs for effective decision-making by domain experts in web librarianship as in other areas of library operations.

Web librarianship, through bridging and unifying individual and collaborative contributions to better enable discovery, supports the overall mission of libraries in the context of the following critical function:

providing multiple systems and/or interfaces
to browse, identify, locate, obtain and use
spaces, collections, and services,
either known or previously unknown to the searcher
with the goal of enabling completion of an information-related task or goal.

The scope of content potentially relevant to the user’s discovery journey encompasses the hours a particular library is open on a given day up to and including advanced scholarship, and all points between. This perhaps revives in the reader’s mind the concept of library website as portal – that analogy has its strengths and weaknesses, to be sure. Ultimately, success for a library website may be defined as the degree to which it enables seamless passage; the user’s journey only briefly intersects with our systems and services, and we should permit her to continue on to her desired information destination without unnecessary inconvenience or interference. A friction-free experience of this kind requires a holistic vision and relies on thoughtful stewardship and effective governance of meaningful content – in other words, on a specific and cultivated expertise, situated within the context of library practice. Welcome to a new web librarianship.

Courtney Greene McDonald (@xocg) is Head of Discovery & Research Services at the Indiana University Libraries in Bloomington. A technoluddite at heart, she’s equally likely to be leafing through the NUC to answer a reference question as she is to be knee-deep in a config file. She presents and writes about user experience in libraries, and is the author of Putting the User First: 30 Strategies for Transforming Library Services (ACRL 2014). She’s also a full–time word nerd and gourmand, a fair–weather gardener, and an aspiring world traveler.

Anne Haines (@annehaines) is the Web Content Specialist for the Indiana University Bloomington Libraries. She loves creating webforms in Drupal, talking to people about how to make their writing work better on the web, and sitting in endless meetings. (Okay, maybe not so much that last one.) You can find her hanging out at the intersection of content strategy and librarianship, singing a doo-wop tune underneath the streetlight.

Rachael Cohen (@RachaelCohen1) is the Discovery User Experience Librarian at Indiana University Bloomington Libraries, where she is the product owner for the library catalog discovery layer and manages the web-scale discovery service. When she’s not negotiating with developers, catalogers, and public service people you can find her hoarding books and Googling for her family.



  1. While it is safe to say that all library staff have amassed significant experience in personal web use, not all staff are equally equipped with the growing variety of skillsets and technical mastery necessary to oversee and steward a thriving website.

Best Practices for Hacking Third-Party Sites

While customizing vendor web services is not the most glamorous task, it’s something almost every library does. Whether we have full access to a templating system, as with LibGuides 2, or merely the ability to insert an HTML header or footer, as on many database platforms, we are tested by platform limitations and a desire to make our organization’s fractured web presence cohesive and usable.

What does customizing a vendor site look like? Let’s look at one example before going into best practices. Many libraries subscribe to EBSCO databases, which have a corresponding administrative side “EBSCOadmin”. Electronic Resources and Web Librarians commonly have credentials for these admin sites. When we sign into EBSCOadmin, there are numerous configuration options for our database subscriptions, including a “branding” tab under the “Customize Services” section.

While EBSCO’s branding options include specifying the primary and secondary colors of their databases, there’s also a “bottom branding” section which allows us to inject custom HTML. Branding colors can be important, but this post focuses on effectively injecting markup onto vendor web pages. The steps for doing so in EBSCOadmin are numerous and not informative for any other system, but the point is that when given custom HTML access one can make many modifications, from inserting text on the page, to an entirely new stylesheet, to modifying user interface behavior with JavaScript. Below, I’ve turned footer links orange and written a message to my browser’s JavaScript console using the custom HTML options in EBSCOadmin.

customized EBSCO database

These opportunities for customization come in many flavors. We might have access only to a section of HTML in the header or footer of a page. We might be customizing the appearance of our link resolver, subscription databases, or catalog. Regardless, there are a few best practices which can aid us in making modifications that are effective.

General Best Practices

Ditch best practices when they become obstacles

It’s too tempting; I have to start this post about best practices by noting their inherent limitations. When we’re working with a site designed by someone else, the quality of our own code is restricted by decisions they made for unknown reasons. Commonly-spouted wisdom—reduce HTTP requests! don’t use eval! ID selectors should be avoided!—may be unusable or even counter-productive.

To note but one shining example: CSS specificity. If you’ve worked long enough with CSS then you know that it’s easy to back yourself into a corner by using overly powerful selectors like IDs or—the horror—inline style attributes. These methods of applying CSS have high specificity, which means that CSS written later in a stylesheet or loaded later in the HTML document might not override them as anticipated, a seeming contradiction in the “cascade” part of CSS. The hydrogen bomb of specificity is the !important modifier which automatically overrides anything but another !important later in the page’s styles.

So it’s best practice to avoid inline style attributes, ID selectors, and especially !important. Except when hacking on vendor sites it’s often necessary. What if we need to override an inline style? Suddenly, !important looks necessary. So let’s not get caught up following rules written for people in greener pastures; we’re already in the swamp, throwing some mud around may be called for.

There are dozens of other examples that come to mind. For instance, in serving content from a vendor site where we have no server-side control, we may be forced to violate web performance best practices such as sending assets with caching headers and utilizing compression. While minifying code is another performance best practice, for small customizations it adds little but obfuscates our work for other staff. Keeping a small script or style tweak human-readable might be more prudent. Overall, understanding why certain practices are recommended, and when it’s appropriate to sacrifice them, can aid our decision-making.

Test. Test. Test. When you’re done testing, test again

Whenever we’re creating an experience on the web it’s good to test. To test with Chrome, with Firefox, with Internet Explorer. To test on an iPhone, a Galaxy S4, a Chromebook. To test on our university’s wired network, on wireless, on 3G. Our users are vast; they contain multitudes. We try to represent their experiences as best as possible in the testing environment, knowing that we won’t emulate every possibility.

Testing is important, sure. But when hacking a third party site, the variance is more than doubled. The vendor has likely done their own testing. They’ve likely introduced their own hacks that work around issues with specific browsers, devices, or connectivity conditions. They may be using server-side device detection to send out subtly different versions of the site to different users; they may not offer the same functionality in all situations. All of these circumstances mean that testing is vitally important and unending. We will never cover enough ground to be sure our hacks are foolproof, but we had better try or they’ll not work at all.

Analytics and error reporting

Speaking of testing, how will we know when something goes wrong? Surely, our users will send us a detailed error report, complete with screenshots and the full schematics of every piece of hardware and software involved. After all, they do not have lives or obligations of their own. They exist merely to make our code more error-proof.

If, however, for some odd reason someone does not report an error, we may still want to know that one occurred. It’s good to set up unobtrusive analytics that record errors or other measures of interaction. Did we revamp a form to add additional validation? Try tracking what proportion of visitors successfully submit the form, how often the validation is violated, how often users submit invalid data multiple times in a row, and how often our code encounters an error. There are some intriguing client-side error reporting services out there that can catch JavaScript errors and detail them for our perusal later. But even a little work with events in Google Analytics can log errors, successes, and everything in between. With the mere information that problems are occurring, we may be able to identify patterns, focus our testing, and ultimately improve our customizations and end-user experience.
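Here is a rough sketch of what unobtrusive error logging could look like. Everything in it is illustrative: makeErrorLogger and the shape of the event object are names I made up, and the report callback is a placeholder for whatever analytics call is actually available (with Google Analytics it would wrap an event-tracking call). In this sketch it only pushes onto an array.

```javascript
// Build a logger that forwards error details to a reporting function.
// `report` is a hypothetical placeholder for a real analytics call.
function makeErrorLogger(report) {
    return function (message, source, lineno) {
        report({
            category: 'customization',
            action: 'error',
            label: message + ' @ ' + source + ':' + lineno
        });
    };
}

// For demonstration, collect events in an array instead of sending them anywhere
var logged = [];
var logError = makeErrorLogger(function (event) {
    logged.push(event);
});

// Only attach in a browser. addEventListener leaves any error handler
// the vendor has already set on window.onerror untouched.
if (typeof window !== 'undefined') {
    window.addEventListener('error', function (e) {
        logError(e.message, e.filename, e.lineno);
    });
}
```

Listening with addEventListener instead of assigning window.onerror is a deliberate choice here: it keeps our logging from clobbering whatever error handling the vendor already has in place.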

Know when to cut your losses

Some aspects of a vendor site are difficult to customize. I don’t want to say impossible, since one can do an awful lot with only a single <script> tag to work with, but unfeasible. Sometimes it’s best to know when sinking more time and effort into a customization isn’t worth it.

For instance, our repository has a “hierarchy browse” feature which allows us to present filtered subsets of items to users. We often get requests to customize the hierarchies for specific departments or purposes—can we change the default sort, can we hide certain info here but not there, can we use grid instead of list-based results? We probably can, because the hierarchy browse allows us to inject arbitrary custom HTML at the top of each section. But the interface for doing so is a bit clumsy and would need to be repeated everywhere a customization is made, sometimes across dozens of places simply to cover a single department’s work. So while many of these change requests are technically possible, they’re unwise. Updates would be difficult and impossible to automate, virtually ensuring errors are introduced over time as I forget to update one section or make a manual mistake somewhere. Instead, I can focus on customizing the site-wide theme to fix other, potentially larger issues with more maintainable solutions.

A good alternative to tricky and unmaintainable customizations is to submit a feature request to the vendor. Some vendors have specific sites where we can submit ideas for new features and put our support behind others’ ideas. For instance, the Innovative Users Group hosts an annual vote where members can select their most desired enhancement requests. Remember that vendors want to make a better product after all; our feedback is valued. Even if there’s no formal system for submitting feature requests, a simple email to our sales representative or customer support can help.

CSS Best Practices

While the above section spoke to general advice, CSS and JavaScript have a few specific peculiarities to keep in mind while working within a hostile host environment.

Don’t write brittle, overly-specific selectors

There are two unifying characteristics of hacking on third-party sites: 1) we’re unfamiliar with the underlying logic of why the site is constructed in a particular way and 2) everything is subject to change without notice. Both of these make targeting HTML elements, whether with CSS or JavaScript, challenging. We want our selectors to be as flexible as possible, to withstand as much change as possible without breaking. Say we have the following list of helpful tools in a sidebar:

<div id="tools">
    <ul>
        <li><span class="icon icon-hat"></span><a href="#">Email a Librarian</a></li>
        <li><span class="icon icon-turtle"></span><a href="#">Citations</a></li>
        <li><span class="icon icon-unicorn"></span><a href="#">Catalog</a></li>
    </ul>
</div>

We can modify the icons listed with a selector like #tools > ul > li > span.icon.icon-hat. But many small changes could break this style: a wrapper layer injected in between the #tools div and the unordered list, a switch from unordered to ordered list, moving from <span>s for icons to another tag such as <i>. Instead, a selector like #tools .icon.icon-hat assumes that little will stay the same; it thinks there’ll be icons inside the #tools section, but doesn’t care about anything in between. Some assumptions have to stay, that’s the nature of customizing someone else’s site, but it’s pretty safe to bet on the icon classes to remain.

In general, sibling and child selectors make for poor choices for vendor sites. We’re suddenly relying not just on tags, classes, and IDs to stay the same, but also the particular order that elements appear in. I’d also argue that pseudo-selectors like :first-child, :last-child, and :nth-child() are dangerous for the same reason.

Avoid positioning if possible

Positioning and layout can be tricky to get right on a vendor site. Unless we’re confident in our tests and have covered all the edge cases, try to avoid properties like position and float. In my experience, many poorly structured vendor sites employ ad hoc box-sizing measurements, float-based layout, and lack a grid system. These are all a recipe for weird interconnections between disparate parts—we try to give a call-out box a bit more padding and end up sending the secondary navigation flying a thousand pixels to the right offscreen.

display: none is your friend

display: none is easily my most frequently used CSS property when I customize vendor sites. Can’t turn off a feature in the admin options? Hide it from the interface entirely. A particular feature is broken on mobile? Hide it. A feature is of niche appeal and adds more clutter than it’s worth? Hide it. The footer? Yeah, it’s a useless advertisement, let’s get rid of it. display: none is great but remember it does affect a site’s layout; the hidden element will collapse and no longer take up space, so be careful when hiding structural elements that are presented as menus or columns.

Attribute selectors are excellent

Attribute selectors, which enable us to target an element by the value of any of its HTML attributes, are incredibly powerful. They aren’t very common, so here’s a quick refresher on what they look like. Say we have the following HTML element:

<a href="" title="the best site, seriously" target="_blank">

This is an anchor tag with three attributes: href, title, and target. Attribute selectors allow us to target an element by whether it has an attribute or an attribute with a particular value, like so:

/* applies to <a> tags with a "target" attribute */
a[target] {
    background: red;
}
/* applies to <a> tags with an "href" that begins with "http://"
this is a great way to style links pointed at external websites
or one particular external website! */
a[href^="http://"] {
    cursor: help;
}
/* applies to <a> tags with the text "best" anywhere in their "title" attribute */
a[title*="best"] {
    font-variant: small-caps;
}

Why is this useful among the many ways we can select elements in CSS? Vendor sites often aren’t anticipating all the customizations we want to employ; they may not provide handy class and ID styling hooks where we need them. Or, as noted above, the structure of the document may be subject to change either over time or across different pieces of the site. Attribute selectors can help mitigate this by making style bindings more explicit. Instead of saying “change the background icon for some random span inside a list inside a div”, we can say “change the background icon for the link that points at our citation management tool”.

If that’s unclear, let me give another example from our institutional repository. While we have the ability to list custom links in the main left-hand navigation of our site, we cannot control the icons that appear with them. What’s worse, there are virtually no styling hooks available; we have an unadorned anchor tag to work with. But that turns out to be plenty for a selector of the form a[href$=hierarchy] to target all <a>s with an href ending in “hierarchy”; suddenly we can define icon styles based on the URLs we’re pointing at, which is exactly what we want to base them on anyways.

Attribute selectors are brittle in their own ways—when our URLs change, these icons will break. But they’re a handy tool to have.

JavaScript Best Practices

Avoid the global scope

JavaScript has a notorious problem with global variables. By default, all variables lacking the var keyword are made global. Furthermore, variables outside the scope of any function will also be global. Global variables are considered harmful because they too easily allow unrelated pieces of code to interact; when everything’s sharing the same namespace, the chance that common names like i for index or count are used in two conflicting contexts increases greatly.

To avoid polluting the global scope with our own code, we wrap our entire script customizations in an immediately-invoked function expression (IIFE):

(function() {
    // do stuff here
})();

Wrapping our code in this hideous-looking construction gives it its own scope, so we can define variables without fear of overwriting ones in the global scope. As a bonus, our code still has access to global variables like window and navigator. However, global variables defined by the vendor site itself are best avoided; it is possible they will change or are subject to strange conditions that we can’t determine. Again, the fewer assumptions our code makes about how the vendor’s site works, the more resilient it will be.
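To make the scoping concrete, here is a small sketch. The outer count variable is an assumption standing in for something the vendor's script defined; nothing here comes from a real vendor site.

```javascript
// Stand-in for a variable the vendor's own script already defined.
// (In a browser, a top-level var like this becomes a global.)
var count = 10;

(function () {
    // This count is local to the IIFE: it shadows the outer one
    // instead of overwriting it.
    var count = 0;
    count += 1;
})();

// The outer variable is untouched: count is still 10 here.
```

Had we left off the wrapper, our var count = 0 would have quietly reset the vendor's value.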

Avoid calling vendor-provided functions

Oftentimes the vendor site itself will put important functions in the global scope, functions like submitForm or validate where their intention seems quite obvious. We may even be able to reverse engineer their code a bit, determining what parameters we should pass to these functions. But we must not succumb to the temptation to actually reference their code within our own!

Even if we have a decent handle on the vendor’s current code, it is far too subject to change. Instead, we should seek to add or modify site functionality in a more macro-like way; instead of calling vendor functions in our code, we can automate interactions with the user interface. For instance, say the “save” button is in an inconvenient place on a form and has the following code:

<button type="submit" class="btn btn-primary" onclick="submitForm(0)">Save</button>

We can see that the button saves the form by calling the submitForm function when it’s clicked with a value of 0. Maybe we even figure out that 0 means “no errors” whereas 1 means “error”.1 So we could create another button somewhere which calls this same submitForm function. But so many changes could break our code: the meaning of the “0” might change, the function name might change, or something else might happen when the save button is clicked that’s not evident in the markup. Instead, we can have our new button trigger the click event on the original save button exactly as a user interacting with the site would. In this way, our new save button should emulate exactly the behavior of the old one through many types of changes.
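A minimal sketch of that click-forwarding approach follows. The addProxyButton helper, the #toolbar container, and the selector used to find the original button are all hypothetical; a real vendor page will need its own selectors. In practice this would also sit inside the IIFE described earlier.

```javascript
// Create a second "Save" button that forwards clicks to the original.
// `doc` is passed in so the function does not depend on a global document.
function addProxyButton(original, container, doc) {
    var proxy = doc.createElement('button');
    proxy.type = 'button'; // a plain button, so it does not submit the form itself
    proxy.textContent = 'Save';
    proxy.addEventListener('click', function () {
        // Click the vendor's button just as a user would, instead of
        // calling submitForm ourselves
        original.click();
    });
    container.appendChild(proxy);
    return proxy;
}

// Only run in a browser; both selectors are assumptions for illustration
if (typeof document !== 'undefined') {
    var saveButton = document.querySelector('button[type="submit"]');
    var toolbar = document.querySelector('#toolbar');
    if (saveButton && toolbar) {
        addProxyButton(saveButton, toolbar, document);
    }
}
```

Note that our code never mentions submitForm or its argument, so the vendor can rename or rework either without breaking the new button.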

{{Insert Your Best Practices Here}}

Web-savvy librarians of the world, what are the practices you stick to when modifying your LibGuides, catalog, discovery layer, databases, etc.? It’s actually been a while since I did customization outside of my college’s IR, so the ideas in this post are more opinion than practice. If you have your own techniques—or disagree with the ones in this post!—we’d love to hear about it in the comments.


  1. True story, I reverse engineered a vendor form where this appeared to be the case.