Evaluating Whether You Should Move Your Library Site to Drupal 8

After years of hard work by the Drupal community, Drupal users rejoiced when Drupal 8 came out late last year. The system has been completely rewritten and does a lot of great stuff–but can it do what we need Drupal websites to do for libraries? The quick answer seems to be that it’s not quite ready, but depending on your needs it might be worth a look.

For those who aren’t familiar with Drupal, it’s a content management system designed to manage complex sites with multiple types of content, users, features, and appearances. Certain “core” features are available to everyone out of the box, but even more useful are the “modules”, which extend the features to do all kinds of things, from the mundane but essential backup of a site to a flashy carousel slider. However, the modules are created by individuals or companies and contributed back to the community, and thus when Drupal makes a major version change they need to be rewritten, quite drastically in the case of Drupal 8. That means that right now we are in a period where developers may or may not be redoing their modules, or they may be rethinking how a certain task should be done in the future. Because most of these developers are doing this work as volunteers, it’s not reasonable to expect that they will complete the work on your timeline. The expectation is that if a feature is really important to you, then you’ll work on development to make it happen. That is, of course, easier said than done for people who barely have enough time to do the basic web development asked of them, much less complex programming or learning a new system top to bottom, so most of us are stuck waiting or figuring out our own solutions.

Despite my knowledge of the reality of how Drupal works, I was very excited at the prospect of getting into Drupal 8 and learning all the new features. I installed it right away and started poking around, but realized pretty quickly I was going to have to do a complete evaluation of whether it was actually practical to use it for my library’s website. Our website has been on Drupal 7 since 2012, and works pretty well, though it does need a new theme to bring it into line with 2016 design and accessibility standards. Ideally, however, we could be doing even more with the site, such as providing better discovery for our digital special collections and making the site information more semantic web friendly. It was those latter, more advanced features that made me really want to use Drupal 8, which includes semantic HTML5 integration and schema.org markup, as well as better integration with other tools and libraries. But the question remains: would it really be practical to work on migrating the site immediately, or would it make more sense to spend some development time improving the Drupal 7 site so it works for the next year or so while working on Drupal 8 development more slowly?

A bit of research online will tell you that there’s no right answer, but that the first thing to do in an evaluation is determine whether any of the modules on which your site depends are available for Drupal 8, and if not, whether there is a good alternative. I must add that while all the functions I am going to mention can be done manually or through custom code, a lot of that work would take more time to write and maintain than I expect to have going forward. In fact, we’ve been working to move more of our customized code to modules already, since that makes it possible to distribute some of the workload to others outside of the very few people at our library who write code or even know HTML well, not to mention taking advantage of all the great expertise of the Drupal community.

I tried two different methods for the evaluation. First, I created a spreadsheet with all the modules we actually use in Drupal 7, their versions, and the current status of those modules in Drupal 8 or if I found a reasonable substitute. Next, I tried a site that automates that process, d8upgrade.org. Basically you fill in your website URL and email, and wait a day for your report, which is very straightforward with a list of modules found for your site, whether there is a stable release, an alpha or beta release, or no Drupal 8 release found yet. This is a useful timesaver, but will need some manual work to complete and isn’t always completely up to date.

My manual analysis determined that there were 30 modules on which we depend to a greater or lesser extent. Of those, 10 either moved into Drupal core (so would automatically be included) or the functions for which we used them moved into another piece of core. 5 had versions available in Drupal 8 with varying levels of release (e.g., several in alpha release, so questionable for production sites but probably fine), and 5 were not migrated but it was possible to identify substitute Drupal 8 modules. That’s pretty good: 18 modules were available in Drupal 8, and in several cases one module could do the job that two or more had done in Drupal 7. Of the additional 11 modules that weren’t migrated and didn’t have an easy substitution, three are critical to maintaining our current site workflows. I’ll talk about those in more detail below.

d8upgrade.org found 21 modules in use, though I didn’t include all of them on my own spreadsheet if I didn’t intend to keep using them in the future. I’ve included a screenshot of the report, and there are a few things to note. This list does not have all the modules I had on my list, since some of those are used purely behind the scenes for administrative purposes and would have no indication of use without administrative access. The very last item on the list is Core, which of course isn’t going to be upgraded to Drupal 8–it is Drupal 8. I also found that it’s not completely up to date. For instance, my own analysis found a pre-release version of Workbench Moderation, but that information had not made it to this site yet. A quick email to them fixed it almost immediately, however, so this screenshot is out of date.

I decided that there were three dealbreaker modules for the upgrade, and I want to talk about why we rely on them, since I think my reasoning will be applicable to many libraries with limited web development time. I will also give honorable mention to a module that we are not currently using, but I know a lot of libraries rely on and that I would potentially like to use in the future.

Webform is a module that creates a simple-to-use interface for creating webforms and doing all kinds of things with them beyond simply sending emails. We have many, many custom PHP/MySQL forms throughout our website and intranet, but there are only two people on the staff who can edit those or download the submitted entries from them. They also occasionally have dreadful spam problems. We’ve been slowly working on migrating these custom forms to the Drupal Webform module, since that allows much more distribution of effort across the staff and provides easier ways to stop spam using, for instance, the Honeypot module or Mollom. (We’ve found that the Honeypot module stopped nearly all our spam, so we didn’t need to move to Mollom, since we don’t have user comments to moderate.) The thought of going back to coding all those webforms myself is not appealing, so for now I can’t move forward until I come up with a Drupal solution.

Redirect does a seemingly tiny job that’s extremely helpful: it allows you to create redirects for URLs on your site. This is useful for all kinds of reasons, for instance creating a library-branded link that forwards somewhere else, like a database vendor or another page on your university site, or changing a page URL while ensuring that people with bookmarks to the old page will still find it. This is, of course, something you can do on your web server, assuming you have access to it, but this module takes a lot of the administrative overhead away and helps keep things organized.

Backup and Migrate is my greatest helper in my goal of staying at least in the neighborhood of best practices for web development when web development is only half my job, or some weeks more like a quarter of my job. It makes keeping my development, staging, and production sites in sync a very quick process, and since I created a workflow using this module I have been far more successful in keeping my development processes sane. It provides an interface for backing up your site’s database, files directories, or both, and those backups can be used to completely restore a site. I use it at least every two weeks, or more often when working on a particular feature, to move the database between servers (I don’t move the files with the module for this process, but file backups are useful for emergency restoration of the site). There are other ways to accomplish this work, but this particular workflow has been so helpful that I hate to dump a lot of time into redoing it just now.

One last honorable mention goes to Workbench, which we don’t use but I know a lot of libraries do. It creates a much friendlier interface for content editors so they don’t have to deal with the administrative backend of Drupal and can see just their own content. We do use Workbench Moderation, which does have a Drupal 8 release; it provides a moderation queue so the six or so members of staff who can create or edit content but don’t have administrative rights can have their content checked by an administrator. None of them particularly like the standard Drupal content creation interface, and it’s not something that we would ever ask the rest of the staff to use. We know from the lack of use of our intranet, which is also on Drupal, that no one particularly cares for editing content there. So if we wanted to expand access to website editing, which we’ve talked about a lot, this would be a key module for us to use.

Given that rewrites of these modules are in progress, it seems likely that by the end of the year it will be possible to migrate to Drupal 8 with our current setup, or that in playing around with Drupal 8 on a development site we will settle on a different way to approach these needs. If you have the interest and time to do this, there are worse ways to pass the time. If you are creating a completely new Drupal site and don’t have a time crunch, starting in Drupal 8 now is probably the way to go, since by the time the site is ready you may have additional modules available and will get to take advantage of all the new features. If this is something you’re trying to roll out by the end of the semester, maybe wait on it.

Have you considered upgrading your library’s site to Drupal 8? Have you been successful? Let us know in the comments.


Creating a Flipbook Reader for the Web

At my library, we have a wonderful collection of artists’ books. These titles are not mere text, or even concrete poetry shaping letters across a page, but works of art that use the book as a form in the same way a sculptor might choose clay or marble. They’re all highly inventive and typically employ an interplay between language, symbol, and image that challenges our understanding of what a print volume is. As an art library, collecting these unique and inspiring works seems natural. But how can we share our artists’ books with the world? For a while, our work study students have been scanning sets of pages from the books using a typical flatbed scanner, but we weren’t sure how to present these images. Below, I’ll detail how we came to fork the Internet Archive’s Bookreader to publish our images to the web.

Choosing a Book Display

Our Digital Scholarship Librarian, Lisa Conrad, led the artists’ book project. She investigated a number of potential options for creating interactive “flipbooks” out of our series of images. We had a few requirements we were looking for:

  • mobile friendly — ideally, our books would be able to be read on any device, not just desktops with Adobe Flash
  • easy to use — our work study staff shouldn’t need sophisticated skills, like editing code or markup, to publish a work
  • visually appealing — obviously we’re dealing with works of art; if our presentation is hideous or obscures the elegance of the original then it’s doing more harm than good
  • works with our repository — ultimately, the books would “live” in our institutional repository, so we needed software that would emit something like static HTML or documents that could be easily retained in it

These are not especially stringent requirements, in my mind. Still, we struggled to find decent options. The mobile limitation restricted things quite a bit; a surprising number of apps used Flash or published in desktop-first formats. We felt our options to customize and simplify the user interface of other apps were too limited. Even if an app spat out a nice bundle of HTML and assets that we could upload, the HTML might call out to sketchy external services or present a number of options we didn’t want or need. While I was at first hesitant to implement an open source piece of software that I knew would require much of my time, ultimately the Internet Archive’s Bookreader became the frontrunner. We felt more comfortable using an established, free piece of software that doesn’t require any server-side component (it’s all JavaScript!). My primary concern was that the last commit to the bookreader code base was over half a year ago.

Implementing the Internet Archive Bookreader

The Bookreader proved startlingly easy to implement. We simply copied the code from an example given by the Internet Archive, edited several lines of JavaScript, and were ready to go. Everything else was refinement. Still, we made a few nice decisions in working on the Bookreader that I’d like to share. If you’re not a coder, feel free to skip the rest of this section as it may be unnecessarily detailed.

First off, our code is all up on GitHub. But it won’t be useful to anyone else! It contains specific logic based on how our IR serves up links to the bookreader app itself. But one of the smarter moves, if I may indulge in some self-congratulation, was using submodules in git. If you know git then you know it’s easily one of the best ways to version code and manage projects. But what if your app is just an implementation of an existing code base? There’s a huge portion of code you’re borrowing from somewhere else, and you want to be able to segment that off and incorporate changes to it separately.

As I noted before, the bookreader isn’t exactly actively developed. But it still benefits us to separate our local modifications from the stable, external code. A submodule lets us incorporate another repository into our own. It’s the clean way of copying someone else’s files into our own version. We can reference files in the other repository as if they’re sitting right beside our own, but also pull in updates in a controlled manner without causing tons of extra commits. Elegant, handy, sublime.

Most of our work happened in a single app.js file. This file was copied from the Internet Archive’s provided example and only modified slightly. To give you a sense of how one modifies the original script, here’s a portion:

// Create the BookReader object
br = new BookReader();
// Total number of leafs
br.numLeafs = vaultItem.pages;
// Book title & URL used for the book title link
br.bookTitle= vaultItem.title;
br.bookUrl  = vaultItem.root + 'items/' + vaultItem.id + '/' + vaultItem.version + '/';
// how does the bookreader know what images to retrieve given a page number (index)?
br.getPageURI = function(index, reduce, rotate) {
    // reduce and rotate are ignored in this simple implementation, but we
    // could e.g. look at reduce and load images from a different directory
    var url = vaultItem.root + 'file/' + vaultItem.id + '/' + vaultItem.version + '/' + vaultItem.filenames + (index + 1) + '.JPG';
    return url;
}

The Internet Archive’s code provides us with a Bookreader class. All we do is instantiate it once and override certain properties and methods. The code above is all that’s necessary to display a particular image for a given page number; the vaultItem object (VAULT is the name of our IR) consists of information about a single artists’ book, like the number of pages, its title, and its ID and version within the IR. The bookreader app cobbles together these pieces of info to figure out which images it should display for a given page or pair of pages. The getPageURI function mostly works with a single index argument, while the concatenated strings that build the URL reflect how our repository stores files. It’s highly specific to the IR we’re using, but not terribly complicated.

The Bookreader itself sits on a web server outside the IR. Since we cannot publicly share most of these books (more on this below), we restrict access to the images to users who are authenticated with the IR. So how can the external reader app serve up images of books within the repository? We expect people to discover the books via our library catalog, which links to the IR, or within the IR itself. Once they sign in, our IR’s display templates contain specially crafted URLs that pass the bookreader information about the item in our repository via their query string. Here’s a shortened example of one query string:

?title=Crystals%20to%20Aden&id=17c06cc5-c419-4e77-9bdb-43c69e94b4cd&pages=26

From there, the app parses the query string to figure out that the book’s title is Crystals to Aden, its ID within the IR is “17c06cc5-c419-4e77-9bdb-43c69e94b4cd”, and it has twenty-six pages. We store those values in the vaultItem object referenced in the script above. That object contains enough information for the bookreader to determine how to retrieve images from the IR for each page of the book. Since the user has already authenticated with the IR when they discovered the URL earlier, the IR happily serves up the images.
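As a rough illustration of that parsing step (not our exact code), the query string can be read with the browser’s URLSearchParams API and turned into the vaultItem object; the version parameter and the root URL below are assumptions, since they don’t appear in the shortened example above.

// a sketch of building vaultItem from the query string; "version" and "root"
// are illustrative and would depend on your repository's actual URLs
var params = new URLSearchParams(window.location.search);
var vaultItem = {
    title: params.get('title'),                // "Crystals to Aden"
    id: params.get('id'),                      // IR item identifier
    pages: parseInt(params.get('pages'), 10),  // total number of leafs
    version: params.get('version'),            // assumed parameter
    root: 'https://vault.example.edu/'         // hypothetical IR base URL
};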

Ch-ch-ch-challenges

Easily the most difficult part of our artists’ books project has been spreading awareness. It’s a cool project that’s required tons of work all around, from the Digital Scholarship Librarian managing the project, to our diligent work study student scanning images, to our circulation staff cataloging them in our IR, to me solving problems in JavaScript while my web design student worker polished the CSS. We’re proud of the result, but also struggling to share it outside of our walls. Our IR is great at some things but not particularly intuitive to use, so we cannot count on patrons stumbling across the artists’ books on their own very often. Furthermore, putting anything behind a login is obviously going to increase the number of people who give up before accessing it. Not being on the campus Central Authentication Service only exacerbates this for us.

The second challenge is—what else!—copyright. We don’t own the rights to any of these titles. We’ve been guerrilla digitizing them without permission from publishers. For internal sharing, that’s fine and well within Fair Use; we’re only showing excerpts, and security settings in our IR ensure they’re only visible to our constituents. But I pine to share these gorgeous creations with people outside the college, with my social networks, with other librarians. Right now, we only have permission to share Crystals to Aden by Michael Bulteau (thanks to duration press for letting us!). Which is great! Except Crystals is a relatively text-heavy work; it’s fine poetry, yes, but not the earth-shattering reinterpretation of the codex I promised you in my opening paragraph.

Hopefully, we can secure permissions to share further works. Internally, we’ve pushed out notices on social media, the LMS, and email lists to inform people of the artists’ books. They’re also available for checkout, so hopefully our digital teasers bring people in to see the real deal.

Lasting Problems

And that’s something that must be mentioned: our digital simulations are definitely not the real deal. Never has a set of titles been so resistant to digitization. Our work study student has even asked us “how could I possibly separate this work into distinct pairs of pages?” about a work which used partially-overlapping leaves that look like vinyl record sleeves.1 Artists deliberately chose the printed book form, and there’s a deep sacrilege in trying to digitally represent that. We can only ever offer a vague adumbration of what was truly intended. I still see value in our bookreader—and the works often look brilliant even as scanned images—but there are fundamental impediments to its execution.

Secondly, scanned images are not text. Many works do have text on the page, but because we’re displaying images in the browser, users cannot select the text to copy it. Alongside each set of images in our IR, we also have a PDF copy of all the pages with OCR’d text. But a decision was made not to make the PDFs visible to non-library staff, since they would be easy to download and disseminate (unlike the many discrete images in our bookreader). This all adds up to make the bookreader very inaccessible; it’s all images, there’s no feasible way for us to associate alt text with each one, and the PDF copy that might be of interest to visually-impaired users is hidden.

I’ve ended on a sour note, but digitizing our artists’ books and fiddling with the Internet Archive’s Bookreader was a great project. Fun, a bit challenging, with some splendid results. We have our work cut out for us if we want to draw more attention to the books and have them be compelling for everyone. But other libraries in similar situations may find the Bookreader to be a very viable, easy-to-implement solution. If you don’t have permissions issues and are dealing with more traditional works, it’s built to be customizable yet offers a pleasant reading experience.

Notes

  1. Paradoxic Mutations by Margot Lovejoy

From Consensus to Expertise: Rethinking Library Web Governance

The world is changing, the web is changing and libraries are changing along with them. Commercial behemoths like Amazon, Google and Facebook, together with significant advancements in technical infrastructure and consumer technology, have established a new set of expectations for even casual users of the web. These expectations have created new mental models of how things ought to work, and why—not just online, either. The Internet of Things may not yet be fully realized but we clearly see its imminent appearance in our daily lives.

Within libraries, has our collective concept of the intention and purpose of the library website evolved as well? How should the significant changes in how the web works, what websites do and how we interact with them also impact how we manage, assess and maintain library websites?

In some cases it has been easier to say what the library website is not—a catalog, a fixed-form document, a repository—although it facilitates access to these things, and perhaps makes them discoverable. What, then, is the library website? As academic librarians, we define it as follows.

The library website is an integrated representation of the library, providing continuously updated content and tools to engage with the academic mission of the college/university.

It is constructed and maintained for the benefit of the user. Value is placed on consumption of content by the user rather than production of content by staff.

Moving from a negative definition to a positive definition empowers both stewards of and contributors to the website to participate in an ongoing conversation about how to respond proactively to the future, our changing needs and expectations and, chiefly, to our users’ changing needs and expectations. Web content management systems have moved from being just another content silo to being a key part of library service infrastructure. Building on this forward momentum enables progress to a better, more context-sensitive user experience for all as we consider our content independent of its platform.

It is just this reimagining of how and why the library website contributes value, and what role it fulfills within the organization in terms of our larger goals for connecting with our local constituencies—supporting research and teaching through providing access to resources and expertise—that demands a new model for library web governance.

Emerging disciplines like content strategy and a surge of interest in user experience design and design thinking give us new tools to reflect on our practice and even to redefine what constitutes best practice in the area of web librarianship.

Historically, libraries have managed websites through committees and task forces. Appointments to these governing bodies were frequently driven by a desire to ensure adequate balance across the organizational chart, and to varying degrees by individuals’ interest and expertise. As such, we must acknowledge the role of internal politics as a variable factor in these groups’ ability to succeed—one might be working either with, or against, the wind. Librarians, particularly in groups of this kind, notoriously prefer consensus-driven decision making.

The role of expertise is largely taken for granted across most library units; that is, not just anyone is qualified to perform a range of essential duties, from cataloging to instruction to server administration to website management.1 Consciously according ourselves and our colleagues the trust to employ their unique expertise allows individuals to flourish and enlarges the capacity of the organization and the profession. In the context of web design and governance, consensus is a blocker to nimble, standards-based, user-focused action. Collaborative processes, in which all voices are heard, together with empirical data are essential inputs for effective decision-making by domain experts in web librarianship as in other areas of library operations.

Web librarianship, through bridging and unifying individual and collaborative contributions to better enable discovery, supports the overall mission of libraries in the context of the following critical function:

providing multiple systems and/or interfaces
to browse, identify, locate, obtain and use
spaces, collections, and services,
either known or previously unknown to the searcher
with the goal of enabling completion of an information-related task or goal.

The scope of content potentially relevant to the user’s discovery journey encompasses the hours a particular library is open on a given day up to and including advanced scholarship, and all points between. This perhaps revives in the reader’s mind the concept of library website as portal – that analogy has its strengths and weaknesses, to be sure. Ultimately, success for a library website may be defined as the degree to which it enables seamless passage; the user’s journey only briefly intersects with our systems and services, and we should permit her to continue on to her desired information destination without unnecessary inconvenience or interference. A friction-free experience of this kind requires a holistic vision and relies on thoughtful stewardship and effective governance of meaningful content – in other words, on a specific and cultivated expertise, situated within the context of library practice. Welcome to a new web librarianship.


Courtney Greene McDonald (@xocg) is Head of Discovery & Research Services at the Indiana University Libraries in Bloomington. A technoluddite at heart, she’s equally likely to be leafing through the NUC to answer a reference question as she is to be knee-deep in a config file. She presents and writes about user experience in libraries, and is the author of Putting the User First: 30 Strategies for Transforming Library Services (ACRL 2014). She’s also a full–time word nerd and gourmand, a fair–weather gardener, and an aspiring world traveler.

Anne Haines (@annehaines) is the Web Content Specialist for the Indiana University Bloomington Libraries. She loves creating webforms in Drupal, talking to people about how to make their writing work better on the web, and sitting in endless meetings. (Okay, maybe not so much that last one.) You can find her hanging out at the intersection of content strategy and librarianship, singing a doo-wop tune underneath the streetlight.

Rachael Cohen (@RachaelCohen1) is the Discovery User Experience Librarian at Indiana University Bloomington Libraries, where she is the product owner for the library catalog discovery layer and manages the web-scale discovery service. When she’s not negotiating with developers, catalogers, and public service people you can find her hoarding books and Googling for her family.

 

Notes

  1. While it is safe to say that all library staff have amassed significant experience in personal web use, not all staff are equally equipped with the growing variety of skillsets and technical mastery necessary to oversee and steward a thriving website.

Best Practices for Hacking Third-Party Sites

While customizing vendor web services is not the most glamorous task, it’s something almost every library does. Whether we have full access to a templating system, as with LibGuides 2, or merely the ability to insert an HTML header or footer, as on many database platforms, we are tested by platform limitations and a desire to make our organization’s fractured web presence cohesive and usable.

What does customizing a vendor site look like? Let’s look at one example before going into best practices. Many libraries subscribe to EBSCO databases, which have a corresponding administrative side “EBSCOadmin”. Electronic Resources and Web Librarians commonly have credentials for these admin sites. When we sign into EBSCOadmin, there are numerous configuration options for our database subscriptions, including a “branding” tab under the “Customize Services” section.

While EBSCO’s branding options include specifying the primary and secondary colors of their databases, there’s also a “bottom branding” section which allows us to inject custom HTML. Branding colors can be important, but this post focuses on effectively injecting markup onto vendor web pages. The steps for doing so in EBSCOadmin are numerous and not informative for any other system, but the point is that when given custom HTML access one can make many modifications, from inserting text on the page, to an entirely new stylesheet, to modifying user interface behavior with JavaScript. Below, I’ve turned footer links orange and written a message to my browser’s JavaScript console using the custom HTML options in EBSCOadmin.

Screenshot: a customized EBSCO database
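For a sense of what that kind of snippet can look like, here is a hedged sketch of JavaScript one might paste into the bottom branding field (wrapped in a script tag when pasted); the footer selector is invented and would need to match EBSCO’s actual markup.

// hypothetical JavaScript for a "bottom branding" field; the '#footer a'
// selector is made up and would need to match the vendor's real markup
document.querySelectorAll('#footer a').forEach(function (link) {
    link.style.color = 'orange'; // recolor the footer links
});
console.log('Hello from the library: custom branding is loaded');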

These opportunities for customization come in many flavors. We might have access only to a section of HTML in the header or footer of a page. We might be customizing the appearance of our link resolver, subscription databases, or catalog. Regardless, there are a few best practices which can aid us in making modifications that are effective.

General Best Practices

Ditch best practices when they become obstacles

It’s too tempting; I have to start this post about best practices by noting their inherent limitations. When we’re working with a site designed by someone else, the quality of our own code is restricted by decisions they made for unknown reasons. Commonly-spouted wisdom—reduce HTTP requests! don’t use eval! ID selectors should be avoided!—may be unusable or even counter-productive.

To note but one shining example: CSS specificity. If you’ve worked long enough with CSS then you know that it’s easy to back yourself into a corner by using overly powerful selectors like IDs or—the horror—inline style attributes. These methods of applying CSS have high specificity, which means that CSS written later in a stylesheet or loaded later in the HTML document might not override them as anticipated, a seeming contradiction in the “cascade” part of CSS. The hydrogen bomb of specificity is the !important modifier which automatically overrides anything but another !important later in the page’s styles.

So it’s best practice to avoid inline style attributes, ID selectors, and especially !important. Except when hacking on vendor sites, it’s often necessary. What if we need to override an inline style? Suddenly, !important looks necessary. So let’s not get caught up following rules written for people in greener pastures; we’re already in the swamp, and throwing some mud around may be called for.

There are dozens of other examples that come to mind. For instance, in serving content from a vendor site where we have no server-side control, we may be forced to violate web performance best practices such as sending assets with caching headers and utilizing compression. While minifying code is another performance best practice, for small customizations it adds little benefit while obfuscating our work for other staff. Keeping a small script or style tweak human-readable might be more prudent. Overall, understanding why certain practices are recommended, and when it’s appropriate to sacrifice them, can aid our decision-making.

Test. Test. Test. When you’re done testing, test again

Whenever we’re creating an experience on the web it’s good to test. To test with Chrome, with Firefox, with Internet Explorer. To test on an iPhone, a Galaxy S4, a Chromebook. To test on our university’s wired network, on wireless, on 3G. Our users are vast; they contain multitudes. We try to represent their experiences as best as possible in the testing environment, knowing that we won’t emulate every possibility.

Testing is important, sure. But when hacking a third party site, the variance is more than doubled. The vendor has likely done their own testing. They’ve likely introduced their own hacks that work around issues with specific browsers, devices, or connectivity conditions. They may be using server-side device detection to send out subtly different versions of the site to different users; they may not offer the same functionality in all situations. All of these circumstances mean that testing is vitally important and unending. We will never cover enough ground to be sure our hacks are foolproof, but we had better try or they won’t work at all.

Analytics and error reporting

Speaking of testing, how will we know when something goes wrong? Surely, our users will send us a detailed error report, complete with screenshots and the full schematics of every piece of hardware and software involved. After all, they do not have lives or obligations of their own. They exist merely to make our code more error-proof.

If, however, for some odd reason someone does not report an error, we may still want to know that one occurred. It’s good to set up unobtrusive analytics that record errors or other measures of interaction. Did we revamp a form to add additional validation? Try tracking what proportion of visitors successfully submit the form, how often the validation is violated, how often users submit invalid data multiple times in a row, and how often our code encounters an error. There are some intriguing client-side error reporting services out there that can catch JavaScript errors and detail them for our perusal later. But even a little work with events in Google Analytics can log errors, successes, and everything in between. With the mere information that problems are occurring, we may be able to identify patterns, focus our testing, and ultimately improve our customizations and end-user experience.
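As one hedged example, assuming the page already loads Google Analytics’ analytics.js (the ga global), a few lines can log uncaught JavaScript errors as events; the category and label names here are purely illustrative.

// report uncaught JavaScript errors as Google Analytics events; assumes the
// analytics.js "ga" global is already available on the page
window.addEventListener('error', function (event) {
    ga('send', 'event', 'JS Error', event.message, window.location.pathname);
});
// the same pattern can track successes, e.g. after a form submits cleanly:
// ga('send', 'event', 'Form', 'submitted', 'research consultation request');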

Know when to cut your losses

Some aspects of a vendor site are difficult to customize. I don’t want to say impossible, since one can do an awful lot with only a single <script> tag to work with, but unfeasible. Sometimes it’s best to know when sinking more time and effort into a customization isn’t worth it.

For instance, our repository has a “hierarchy browse” feature which allows us to present filtered subsets of items to users. We often get requests to customize the hierarchies for specific departments or purposes—can we change the default sort, can we hide certain info here but not there, can we use grid instead of list-based results? We probably can, because the hierarchy browse allows us to inject arbitrary custom HTML at the top of each section. But the interface for doing so is a bit clumsy and would need to be repeated everywhere a customization is made, sometimes across dozens of places simply to cover a single department’s work. So while many of these change requests are technically possible, they’re unwise. Updates would be difficult and impossible to automate, virtually ensuring errors are introduced over time as I forget to update one section or make a manual mistake somewhere. Instead, I can focus on customizing the site-wide theme to fix other, potentially larger issues with more maintainable solutions.

A good alternative to tricky and unmaintainable customizations is to submit a feature request to the vendor. Some vendors have specific sites where we can submit ideas for new features and put our support behind others’ ideas. For instance, the Innovative Users Group hosts an annual vote where members can select their most desired enhancement requests. Remember that vendors want to make a better product after all; our feedback is valued. Even if there’s no formal system for submitting feature requests, a simple email to our sales representative or customer support can help.

CSS Best Practices

While the above section spoke to general advice, CSS and JavaScript have a few specific peculiarities to keep in mind while working within a hostile host environment.

Don’t write brittle, overly-specific selectors

There are two unifying characteristics of hacking on third-party sites: 1) we’re unfamiliar with the underlying logic of why the site is constructed in a particular way and 2) everything is subject to change without notice. Both of these make targeting HTML elements, whether with CSS or JavaScript, challenging. We want our selectors to be as flexible as possible, to withstand as much change as possible without breaking. Say we have the following list of helpful tools in a sidebar:

<div id="tools">
    <ul>
        <li><span class="icon icon-hat"></span><a href="#">Email a Librarian</a></li>
        <li><span class="icon icon-turtle"></span><a href="#">Citations</a></li>
        <li><span class="icon icon-unicorn"></span><a href="#">Catalog</a></li>
    </ul>
</div>

We can modify the icons listed with a selector like #tools > ul > li > span.icon.icon-hat. But many small changes could break this style: a wrapper layer injected in between the #tools div and the unordered list, a switch from unordered to ordered list, moving from <span>s for icons to another tag such as <i>. Instead, a selector like #tools .icon.icon-hat assumes that little will stay the same; it thinks there’ll be icons inside the #tools section, but doesn’t care about anything in between. Some assumptions have to stay; that’s the nature of customizing someone else’s site. But it’s pretty safe to bet on the icon classes remaining.

In general, sibling and child selectors make for poor choices for vendor sites. We’re suddenly relying not just on tags, classes, and IDs to stay the same, but also the particular order that elements appear in. I’d also argue that pseudo-selectors like :first-child, :last-child, and :nth-child() are dangerous for the same reason.

Avoid positioning if possible

Positioning and layout can be tricky to get right on a vendor site. Unless we’re confident in our tests and have covered all the edge cases, try to avoid properties like position and float. In my experience, many poorly structured vendor sites employ ad hoc box-sizing measurements, float-based layout, and lack a grid system. These are all a recipe for weird interconnections between disparate parts—we try to give a call-out box a bit more padding and end up sending the secondary navigation flying a thousand pixels to the right offscreen.

display: none is your friend

display: none is easily my most frequently used CSS property when I customize vendor sites. Can’t turn off a feature in the admin options? Hide it from the interface entirely. A particular feature is broken on mobile? Hide it. A feature is of niche appeal and adds more clutter than it’s worth? Hide it. The footer? Yeah, it’s a useless advertisement, let’s get rid of it. display: none is great but remember it does affect a site’s layout; the hidden element will collapse and no longer take up space, so be careful when hiding structural elements that are presented as menus or columns.

Attribute selectors are excellent

Attribute selectors, which enable us to target an element by the value of any of its HTML attributes, are incredibly powerful. They aren’t very common, so here’s a quick refresher on what they look like. Say we have the following HTML element:

<a href="http://example.com" title="the best site, seriously" target="_blank">

This is an anchor tag with three attributes: href, title, and target. Attribute selectors allow us to target an element by whether it has an attribute or an attribute with a particular value, like so:

/* applies to <a> tags with a "target" attribute */
a[target] {
    background: red;
}
/* applies to <a> tags with an "href" that begin with "http://"
this is a great way to style links pointed at external websites
or one particular external website! */
a[href^="http://"] {
    cursor: help;
}
/* applies to <a> tags with the text "best" anywhere in their "title" attribute */
a[title*="best"] {
    font-variant: small-caps;
}

Why is this useful among the many ways we can select elements in CSS? Vendor sites often aren’t anticipating all the customizations we want to employ; they may not provide handy class and ID styling hooks where we need them. Or, as noted above, the structure of the document may be subject to change either over time or across different pieces of the site. Attribute selectors can help mitigate this by making style bindings more explicit. Instead of saying “change the background icon for some random span inside a list inside a div”, we can say “change the background icon for the link that points at our citation management tool”.

If that’s unclear, let me give another example from our institutional repository. While we have the ability to list custom links in the main left-hand navigation of our site, we cannot control the icons that appear with them. What’s worse, there are virtually no styling hooks available; we have an unadorned anchor tag to work with. But that turns out to be plenty for a selector of the form a[href$=hierarchy] to target all <a>s with an href ending in “hierarchy”; suddenly we can define icon styles based on the URLs we’re pointing at, which is exactly what we want to base them on anyway.

Attribute selectors are brittle in their own ways—when our URLs change, these icons will break. But they’re a handy tool to have.

JavaScript Best Practices

Avoid the global scope

JavaScript has a notorious problem with global variables. By default, all variables lacking the var keyword are made global. Furthermore, variables outside the scope of any function will also be global. Global variables are considered harmful because they too easily allow unrelated pieces of code to interact; when everything’s sharing the same namespace, the chance that common names like i for index or count are used in two conflicting contexts increases greatly.

To avoid polluting the global scope with our own code, we wrap our entire script customizations in an immediately-invoked function expression (IIFE):

(function() {
    // do stuff here 
}())

Wrapping our code in this hideous-looking construction gives it its own scope, so we can define variables without fear of overwriting ones in the global scope. As a bonus, our code still has access to global variables like window and navigator. However, global variables defined by the vendor site itself are best avoided; it is possible they will change or are subject to strange conditions that we can’t determine. Again, the fewer assumptions our code makes about how the vendor’s site works, the more resilient it will be.

Avoid calling vendor-provided functions

Oftentimes the vendor site itself will put important functions in the global scope, functions like submitForm or validate whose intention seems quite obvious. We may even be able to reverse engineer their code a bit, determining what parameters we should pass to these functions. But we must not succumb to the temptation to actually reference their code within our own!

Even if we have a decent handle on the vendor’s current code, it is far too subject to change. Instead, we should seek to add or modify site functionality in a more macro-like way; instead of calling vendor functions in our code, we can automate interactions with the user interface. For instance, say the “save” button is in an inconvenient place on a form and has the following code:

<button type="submit" class="btn btn-primary" onclick="submitForm(0)">Save</button>

We can see that the button saves the form by calling the submitForm function when it’s clicked with a value of 0. Maybe we even figure out that 0 means “no errors” whereas 1 means “error”.1 So we could create another button somewhere which calls this same submitForm function. But so many changes could break our code: if the meaning of the “0” changes, if the function name changes, or if something else happens when the save button is clicked that’s not evident in the markup. Instead, we can have our new button trigger the click event on the original save button exactly as a user interacting with the site would. In this way, our new save button should emulate the behavior of the old one through many types of changes.
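A minimal sketch of that approach might look like the following; the id on our own relocated button is hypothetical.

// instead of calling submitForm(0) directly, re-fire the vendor's own button
var originalSave = document.querySelector('button[type="submit"].btn-primary');
var ourSave = document.querySelector('#our-save-button'); // hypothetical button we added
ourSave.addEventListener('click', function () {
    originalSave.click(); // runs whatever the vendor wired to the real button
});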

{{Insert Your Best Practices Here}}

Web-savvy librarians of the world, what are the practices you stick to when modifying your LibGuides, catalog, discovery layer, databases, etc.? It’s actually been a while since I did customization outside of my college’s IR, so the ideas in this post are more opinion than practice. If you have your own techniques—or disagree with the ones in this post!—we’d love to hear about it in the comments.

Notes

  1. True story, I reverse engineered a vendor form where this appeared to be the case.

A Video on Browser Extensions

I thought we’d try something new on ACRL TechConnect, so I recorded a fifteen-minute video discussing general use cases for browser extensions and some specifics of Google Chrome extensions.

The video mentions my WikipeDPLA post on this blog and walks through some slides I presented at a Code4Lib Northern California event.

If you’re looking for another good extension example in libraryland, Stephen Schor of New York Public Library recently wrote extensions for Chrome and Firefox that improve the appearance and utility of the Library of Congress’ EAD documentation. The Chrome extension uses the same content script approach as my silly example in the video. It’s a good demonstration of how you can customize a site you don’t control using the power of browser add-ons.

Have you found a use case for browser add-ons at your library? Let us know in the comments!


Using Grunt to Automate Repetitive Tasks

Riding a tangent from my previous post on web performance, here is an introduction to Grunt, a JavaScript task runner.

Why would we use Grunt? It’s become a common tool for web development as it puts together a number of tedious but necessary steps for optimizing a website. However, “task runner” is intentionally generic; Grunt isn’t limited to websites, and it can move and modify files in manifold ways.

Initial Setup

Unfortunately, virtually no operating system comes prepared to use Grunt out of the box. Grunt is written in Node.js and thus requires a few install steps. The good news is that Node is cross-platform; in my experience, it works better on Windows than most other programming frameworks.

  • Install Node
  • Ensure that the node and npm commands are on your path by running node --version and npm --version
    • If not, try a web search like “add node to path {{operating system}}”; the fix usually takes at most editing a single line of a particular file
  • Install Grunt globally with npm install -g grunt-cli
  • Ensure Grunt is on your path (grunt --version) and, again, search for an answer if not
  • Inside your project, run npm install grunt to install a local copy of grunt (it’ll appear in a folder named “node_modules”)

We’re ready to run! Let’s do a basic example.

First Example: Basic Web App Optimization

Say we have a simple website: there’s an index.html page, a stylesheet in a “css” subfolder, and a script in a “js” subfolder. First, let’s define what we want to accomplish: we want to keep a full-size, easily readable copy of all our code while also building minified versions of both the CSS and JS. To do this, we’ll use three plugins: cssmin, uglify, and copy. This whole example is available on GitHub; even if you don’t use git, you can download a zip archive of the files.

First, inside our project, we run npm install grunt-contrib-cssmin grunt-contrib-uglify grunt-contrib-copy. These plugins are now installed in a “node_modules” folder, but Grunt still needs to know how to use them. It needs to know what tasks manipulate what files and other options. We provide this information in a file named “Gruntfile.js” in the root of our project. Here’s our initial one:

module.exports = function(grunt) {
    // this tells Grunt about the 3 plugins we installed 
    grunt.loadNpmTasks('grunt-contrib-cssmin');
    grunt.loadNpmTasks('grunt-contrib-uglify');
    grunt.loadNpmTasks('grunt-contrib-copy');
    grunt.initConfig({
        // our configuration will go here 
    });
    // we're going to want to run cssmin, uglify, & copy together 
    // so let's group them under a single "build" task 
    grunt.registerTask('build', ['cssmin', 'uglify', 'copy']);
};

For each task, there’ll be a section inside initConfig with its settings. For cssmin, the setup is nested a few layers but is really just a single line of code. Under a cssmin property we specify a target name, which can be anything. Targets allow us to have multiple configurations for a single task, which is handy in more complex projects but often unneeded. Under our target, which we’ll name “minify”, there’s a files property where we associate an array of input files with a single output file. This makes concatenating multiple stylesheets easy.

cssmin: {
  minify: {
    files: {
      'build/css/main.css': ['css/main.css']
    }
  }
}, // trailing comma since we'll add another section

Uglify’s setup is identical. We’ll name the target the same and only change the paths inside the files property.

uglify: {
  minify: {
    files: {
      'build/js/main.js': ['js/main.js']
    }
  }
}, // again, trailing comma, we add more below

While cssmin handles stylesheets and uglify handles JavaScript, our index.html only needs to be copied and not modified.1 See if you can write out the copy task’s settings by yourself, mimicking what we’ve already done.
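If you want to check your answer afterward, one possible configuration looks roughly like this; the target name is arbitrary, and the exact paths depend on your project layout.

copy: {
  html: {
    files: {
      // copy index.html unchanged into the build folder
      'build/index.html': ['index.html']
    }
  }
}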

Now we run grunt build in our command prompt and some messages tell us about each task’s status. Look in the “build” folder which appears after we ran the command. Far smaller, optimized versions of our main CSS and JS files are in there.

Great! We’ve accomplished our goals. But we must run grunt build over and over each time we want to remake our optimized assets, switching between our code editor and command prompt each time. If we’re doing a lot of piecemeal editing, this is most annoying. Instead, let’s use another plugin by running npm install grunt-contrib-watch to get the “watch” task and load it with the line grunt.loadNpmTasks('grunt-contrib-watch').2 Then, write this configuration:

watch: {
  minify: {
    files: ['*.html', 'css/*.css', 'js/*.js'],
    tasks: ['build']
  }
}

Watch has just two intuitive parameters in its configuration: an array of files to watch and an array of tasks to execute when those files change. The asterisks are wildcards, so unlike our settings above this stanza isn’t dependent on exact file names. Now, by running grunt watch in our command prompt, optimized assets are magically constructed every time we save changes to a file. We can edit in peace without continually switching between the command line and our editor. Better yet, watch can work with a local development server to reload new versions of files upon every edit.3 The right combination of tasks can yield super efficient workflows where we edit and view results without worrying about optimizations made behind the scenes.

More Advanced: Portable Header

While the above is suitable for much small-scale web development, Grunt can handle far more complex situations. For example, I wrote a portable HTML header which can be inserted onto various vendor websites such as LibGuides or an A to Z list. The project’s Gruntfile is 159 lines long and makes use of ten Grunt plugins.

I won’t go into detail explaining how each Grunt task’s settings work, but I will outline what’s happening. A “sass” task uses an external program to compile SCSS code into minified CSS that browsers can understand. A couple of linting tools, jshint and scss-lint, check files against code quality heuristics. Our good friends copy and uglify are back doing their jobs, only this time they’re joined by htmlmin, which handles the index page. “String-replace” is an example of a multi-target task; its first target searches over a series of files for strings wrapped in double curly braces like “{{example}}” and swaps out these placeholders with values specified in another file. The second target takes the entire contents of a stylesheet and a script and inlines them into the main HTML.
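To give a flavor of a multi-target task, here is a rough, simplified sketch of what that first “string-replace” target could look like; the file paths and the {{websiteUrl}} placeholder are invented for illustration and are not the project’s actual settings.

'string-replace': {
  // swap made-up {{websiteUrl}} tokens in source files for a real value
  tpl: {
    files: {
      'build/': ['src/index.html', 'src/css/*.css']
    },
    options: {
      replacements: [
        { pattern: /{{websiteUrl}}/g, replacement: 'https://library.example.edu' }
      ]
    }
  }
}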

That’s a lot of labor being handled by computer programs instead of humans. While passing a couple of files through tools that remove comments and whitespace isn’t tough, the many steps in constructing an optimized HTML header from several files provide a good demonstration of Grunt’s value. While it took me some time to configure everything properly, the combined “build” task for this project has probably run hundreds of times and saved me hours of work. Not only that, because of the linting and minification, the final product is doubtless higher-quality than I could assemble manually.

The length and complexity of my Gruntfile points to one of the tougher pieces of using Grunt heavily; the order and delegated responsibilities of numerous tasks is tricky to coordinate properly. Throughout my Gruntfile, there are comments indicating when particular tasks run because running them out of order would either be fruitless or cause an error. For instance, the “string-replace” task’s “inline” target must run after those other files have been minified, otherwise the minification serves no purpose (the inlined code would be full size).

Similarly, coordinating which tasks move which files has been a constant headache for me in many projects. While the “copy” task moves images to the build folder, the “tpl” target of the “string-replace” task moves everything else. But I could’ve also used the uglify or sass tasks to move files! Since every task can potentially move the files it operates upon, it’s difficult to keep track of where a file is at a particular time in the workflow. The best way to debug these issues is to run multi-task aliases like build one at a time; first run uglify, then run cssmin, then run htmlmin… checking the state of files in between each to make sure that changes are occurring as anticipated.

Use Cases Abound

I use Grunt in almost all my projects, whether they’re web development or not. I use Grunt to copy my shell customizations into place, so that when I’m working on a new one I can just run grunt watch and rely on the changes being synched into place. I use Grunt to build Chrome extensions which require extra packaging steps before they can be pushed into the Chrome Web Store. I use Grunt in our catalog’s customized pages to minify code and also to check for potential errors. As an additional step, Grunt can be hooked up to deploy processes such that once a successful build is made the new files are pushed off to a remote server.

Grunt can be used to construct almost any workflow from a series of discrete pieces: compiling EAD finding aids into an HTML website via XSLT, say, or processing vendor MARC files with a PyMARC script and then uploading them into an ILS. Anything that can be scripted could be tied to Grunt tasks with a plug-in like grunt-exec, which executes arbitrary shell commands. However, there is a limit to what it’s sensible to do with Grunt; those last two examples are arguably better accomplished with shell scripts. Grunt is at its best when its great suite of plug-ins is relied upon, and those tend to perform web-specific tasks. Grunt also requires at least a modicum of comfort with coding. It falls into an odd space, because while the configuration file is indeed JavaScript, it reads like a series of lists of settings, files, and ordered tasks. If you have more complex needs that involve if-then conditions and custom scripts, a lot of Grunt’s utility is negated. On the other hand, for those who would rather avoid code and the command line, options like the CodeKit app make more sense.
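For reference, hooking a shell command into Grunt with grunt-exec is about as simple as the sketch below; the command itself is invented, and this fragment would sit inside the usual module.exports wrapper of a Gruntfile.

// a minimal grunt-exec configuration; the command is purely illustrative
grunt.loadNpmTasks('grunt-exec');
grunt.initConfig({
  exec: {
    process_marc: {
      cmd: 'python process_marc.py vendor-records.mrc'
    }
  }
});
// run it with: grunt exec:process_marc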

The Grunt site’s Getting Started and Sample Gruntfile pages are helpful sources of documentation.

A Beginner’s Guide to Grunt: Redux — a nice, updated overview. Some of the steps here are unnecessary for beginners, however, as they require a lot of files and structure. That’s great for experienced developers in the long run, because everything is smaller and more modular, but too much setup for simple projects.

I find myself constantly consulting the readmes for various Grunt plugins to figure out how they work, since their options are not necessarily discoverable otherwise. A quick way to pop open the home page of a package is to run npm home grunt-contrib-uglify (inserting the plugin name of your choice), which will open the registered home page of the package, often on GitHub.

Finally, it’s worth mentioning Gulp, a competing JavaScript task runner. Gulp is the same type of tool as Grunt (you wouldn’t use both in a project) but follows a different design philosophy. In short, Gulp tends to run faster due to its design and setting it up looks more like code and less like a configuration file, which some people prefer.

Notes

  1. There’s actually another great plugin, grunt-contrib-htmlmin, which minifies HTML. Its settings are only a little bit more involved than the copy task, so trying to configure htmlmin would make another nice exercise for those wanting to build on this post.
  2. Writing grunt.loadNpmTasks for each plugin we add to a complex Gruntfile gets tiresome. The load-grunt-tasks plugin gets this down to a single line that doesn’t need to be updated each time a new plugin is added. It takes a bit more initial work (we need to run npm init before anything else, fill in some prompts, and append --save-dev to all our npm install commands), so I decided to skip it in this intro.
  3. The appropriate setting is options.livereload as documented here. While our scenario doesn’t quite capture how time-saving this can be, grunt watch shines when working with a language that compiles to CSS like Sass. A process like “edit Sass, compile CSS, reload web page, view changes” becomes simply “edit Sass, view changes” because the intermediary stages are triggered by grunt watch. A minimal configuration is sketched below.
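
This sketch assumes grunt-contrib-watch and a Sass plugin are installed as devDependencies; the folder names are hypothetical:

module.exports = function (grunt) {
    // one line replaces every grunt.loadNpmTasks call (see note 2)
    require('load-grunt-tasks')(grunt);

    grunt.initConfig({
        watch: {
            options: { livereload: true },   // see note 3
            styles: {
                files: ['scss/**/*.scss'],   // watch the (hypothetical) Sass source folder
                tasks: ['sass']              // recompile CSS whenever a Sass file changes
            }
        }
    });
};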

Analyzing EZProxy Logs

Analyzing EZProxy logs may not be the most glamorous task in the world, but it can be illuminating. Depending on your EZProxy configuration, log analysis can allow you to see the top databases your users are visiting, the busiest days of the week, the number of connections to your resources occurring on or off-campus, what kinds of users (e.g., staff or faculty) are accessing proxied resources, and more.

What’s an EZProxy Log?

EZProxy logs are not significantly different from regular server logs.  Server logs are generally just plain text files that record activity that happens on the server.  Logs that are frequently analyzed to provide insight into how the server is doing include error logs (which can be used to help diagnose problems the server is having) and access logs (which can be used to identify usage activity).

EZProxy logs are a kind of modified access log, which record activities (page loads, HTTP requests, etc.) your users undertake while connected in an EZProxy session. This article will briefly outline five potential methods for analyzing EZProxy logs:  AWStats, Piwik, EZPaarse, a custom Python script for parsing Starting Point URL (SPU) logs, and a paid option called Splunk.

What any log analyzer can tell you will of course depend upon how your EZProxy log directives are configured.  You will need to know your LogFormat and/or LogSPU directives in order to configure most log file analyzing solutions.  In EZProxy, you can see how your logs are formatted in config.txt/ezproxy.cfg by looking for the LogFormat directive,1 e.g.,

LogFormat %h %l %u %t "%r" %s %b "%{user-agent}i"

and / or, to log Starting Point URLs (SPUs):

LogSPU -strftime log/spu/spu%Y%m.log %h %l %u %t "%r" %s %b "%{ezproxy-groups}i"

Logging Starting Point URLs can be useful because those entries tend to represent a user clicking into a database or into the full text of an article; no activity after that initial contact is logged.  This type of logging skips extraneous resource loading, such as scripts and images, which often clutters up traditional LogFormat logs and leads to misleadingly high hit counts.  LogSPU directives can be defined in addition to a traditional LogFormat to provide two different views of your users’ data.  SPU logs can be easier to analyze and can give a clearer picture of which links and databases are most popular among your EZProxy users.  If you haven’t already set it up, SPU logging can be a very useful way to observe general usage trends by database.

You can find some very brief anonymized EZProxy log sample files on Gist.

On a typical EZProxy installation, historical monthly logs can be found inside the ezproxy/log directory.  By default they will rotate out every 12 months, so you may only find the past year of data stored on your server.

AWStats

Get It:  http://www.awstats.org/#DOWNLOAD

Best Used With:  Full Logs or SPU Logs

Code / Framework:  Perl

An example AWStats monthly history report. Can you tell when our summer break begins?

AWStats Pros:

  • Easy installation, including on localhost
  • You can define your unique LogFormat easily in AWStats’ .conf file.
  • Friendly, albeit somewhat dated-looking, charts show overall usage trends.
  • Extensive (but sometimes tricky) customization options can be used to more accurately represent sometimes unusual EZProxy log data.
Hourly traffic distribution in AWStats. While our traffic peaks during normal working hours, we have steady usage going on until about midnight, after which point it crashes pretty hard. We could use this data to determine how much virtual reference staffing we should have available during these hours.

 

AWStats Cons:

  • If you make a change to .conf files after you’ve ingested logs, the changes do not take effect on already ingested data.  You’ll have to re-ingest your logs.
  • Charts and graphs are not particularly (at least easily) customizable, and are not very modern-looking.
  • Charts are static and not interactive; you cannot easily cross-section the data to make custom charts.

Piwik

Get It:  http://piwik.org/download/

Best Used With:  SPU Logs, or embedded on web pages as a traditional web traffic analytics tool

Code / Framework:  Python (for the log import script; Piwik itself is a PHP application)


The Piwik visitor dashboard showing visits over time. Each point on the graph is interactive. The report shown is actually only displaying stats for a single day. The graphs are friendly and modern-looking, but can be slow to load.

Piwik Pros:

  • The charts and graphs generated by Piwik are much more attractive and interactive than those produced by AWStats, with report customizations very similar to what’s available in Google Analytics.
  • If you are comfortable with Python, you can do additional customizations to get more details out of your logs.
Piwik file ingestion in PowerShell

To ingest a single monthly log took several hours. On the plus side, with this running on one of Lauren’s monitors, anytime someone walked into her office they thought she was doing something *really* technical.

Piwik Cons:

  • By default, parsing of large log files seems to be pretty slow, but performance may depend on your environment, the size of your log files and how often you rotate your logs.
  • In order to fully take advantage of the library-specific information your logs might contain and your LogFormat setup, you might have to do some pretty significant modification of Piwik’s import_logs.py script.
When looking at popular pages in Piwik you’re somewhat at the mercy of whether the subdirectories of database URLs have meaningful labels; luckily EBSCO’s do, as shown here. We have a lot of users looking at EBSCO Ebooks, apparently.

EZPaarse

Get It:  http://analogist.couperin.org/ezpaarse/download

Best Used With:  Full Logs or SPU Logs

Code / Framework:  Node.js

ezPaarse’s friendly drag and drop interface. You can also copy/paste lines from your logs to try out the functionality by creating an account at http://ezpaarse.couperin.org.

EZPaarse Pros:

  • Has a lot of potential for analyzing existing log data to better understand e-resource usage.
  • Drag-and-drop interface, as well as copy/paste log analysis
  • No command-line needed
  • Its goal is to be able to associate meaningful metadata (domains, ISSNs) to provide better electronic resource usage statistics.
ezPaarse Excel output generated from a sample log file, showing type of resource (article, book, etc.), ISSN, publisher, domain, file size, and more.

EZPaarse Cons:

  • In Lauren’s testing, we couldn’t get any of our logs to ingest correctly (perhaps due to a somewhat non-standard EZProxy LogFormat), but the sample files provided worked well. UPDATE 11/26:  With some gracious assistance from ezPaarse’s developers, we got ezPaarse to work!  It took about 10 minutes to process 2.5 million log lines, which is pretty awesome. Lesson learned – if you get stuck, reach out to ezpaarse [at] couperin.org or tweet for help @ezpaarse.  Also be sure to try out some of the pre-defined parameters set up by other institutions under Parameters. Check out the comments below for some more detail from ezPaarse’s developers.
  • Output is in Excel Sheets rather than a dashboard-style format – but as pointed out in the comments below, you can optionally output the results in JSON.

Write Your Own with Python

Get Started With:  https://github.com/robincamille/ezproxy-analysis/blob/master/ezp-analysis.py

Best used with: SPU logs

Code / Framework:  Python


Screenshot of a Python script, available at Robin Davis’ GitHub

 

Custom Script Pros:

  • You will have total control over what data you care about. DIY analyzers are usually written because you’re looking to answer a specific question, such as “How many connections come from within the Library?”
  • You will become very familiar with the data! As librarians in an age of user tracking, we need to have a very good grasp of the kinds of data that our various services collect from our patrons, like IP addresses.
  • If your script is fairly simple, it should run quickly. Robin’s script took 5 minutes to analyze almost 6 years of SPU logs.
  • Your output will probably be a CSV, a flexible and useful data format, but could be any format your heart desires. You could even integrate Python libraries like Plotly to generate beautiful charts in addition to tabular data.
  • If you use Python for other things in your day-to-day, analyzing structured data is a fun challenge. And you can impress your colleagues with your Pythonic abilities!

 

Action shot: running the script from the command line. (Source)

Custom Script Cons:

  • If you have not used Python to input/output files or analyze tables before, this could be challenging.
  • The easiest way to run the script is within an IDE or from the command line; if this is the case, it will likely only be used by you.
  • You will need to spend time ascertaining what’s what in the logs.
  • If you choose to output data in a CSV file, you’ll need more elbow grease to turn the data into a beautiful collection of charts and graphs.

Output of the sample script is a labeled CSV that divides connections by locations and user type (student or faculty). (Source)

Splunk (Paid Option)

Best Used with:  Full Logs and SPU Logs

Get It (as a free trial):  http://www.splunk.com/download

Code / Framework:  Various, including Python

A Splunk distribution showing traffic by days of the week. You can choose to visualize this data in several formats, such as a bar chart or scatter plot. Notice that this chart was generated by a syntactical query in the upper left corner: host=lmagnuson| top limit=20 date_wday

Splunk Pros:  

  • Easy-to-use interface; no scripting or command line required (although a command line interface (CLI) is available)
  • Incredibly fast processing.  As soon as you import a file, Splunk begins ingesting it and indexing it for searching
  • It’s really strong in interactive searching.  Rather than relying on canned reports, you can dynamically and quickly search by keywords or structured queries to generate data and visualizations on the fly.
Here’s a search for log entries containing a URL (digital.films.com), which Splunk uses to display a chart showing the hours of the day that this URL is being accessed. This particular database is most popular around 4 PM.

Splunk Cons:

    • It has a little bit of a learning curve, but it’s worth it for the kind of features and intelligence you can get from Splunk.
    • It’s the only paid option on this list.  You can try it out for 60 days with up to 500MB of data a day, and certain non-profits can apply to continue using Splunk under the 500MB/day limit.  Splunk can be used with any server access or error log, so a library might consider partnering with other departments on campus to purchase a license.2

What should you choose?

It depends on your needs, but AWStats is a tried-and-true solution that is easy to install and maintain.  If you have the knowledge, a custom Python script is definitely better, but obviously takes time to test and develop.  If you have money and could partner with others on your campus (or just need a one-time report generated through a free trial), Splunk is very powerful, generates some slick-looking charts, and is definitely worth looking into.  If there are other options not covered here, please let us know in the comments!

About our guest author: Robin Camille Davis is the Emerging Technologies & Distance Services Librarian at John Jay College of Criminal Justice (CUNY) in New York City. She received her MLIS from the University of Illinois Urbana-Champaign in 2012 with a focus in data curation. She is currently pursuing an MA in Computational Linguistics from the CUNY Graduate Center.

Notes
  1. Details about LogFormat and what each % letter value means can be found at http://www.oclc.org/support/services/ezproxy/documentation/cfg/logformat.en.html; LogSPU details can be found at http://oclc.org/support/services/ezproxy/documentation/cfg/logspu.en.html
  2. Another paid option that offers a free trial, and comes with extensions made for parsing EZProxy logs, is Sawmill: https://www.sawmill.net/downloads.html

Using the Stripe API to Collect Library Fines by Accepting Online Payment

Recently, my library has been considering accepting library fine payments online. Currently, small fines owed by many patrons are hard to collect. The total amount is significant, but each individual fine often does not warrant even the cost of the postage and the staff work that goes into creating and sending out a fine notice letter. Libraries that are able to collect fines through the bursar’s office of their parent institutions may have a better chance at collecting those fines. However, others can only expect patrons to show up with a check or to mail one in to clear their fines. Offering an online payment option for library fines is one way to make library service more user-friendly to those patrons who are too busy to visit the library in person or to mail a check but are willing to pay online with their credit cards.

If you are new to the world of online payment, there are several terms you need to become familiar with. The following information from an article in Six Revisions is very useful for understanding those terms.1

  • ACH (Automated Clearing House) payments: Electronic credit and debit transfers. Most payment solutions use ACH to send money (minus fees) to their customers.
  • Merchant Account: A bank account that allows a customer to receive payments through credit or debit cards. Merchant providers are required to obey regulations established by card associations. Many processors act as both the merchant account as well as the payment gateway.
  • Payment Gateway: The middleman between the merchant and their sponsoring bank. It allows merchants to securely pass credit card information between the customer and the merchant and also between merchant and the payment processor.
  • Payment Processor: A company that a merchant uses to handle credit card transactions. Payment processors implement anti-fraud measures to ensure that both the front-facing customer and the merchant are protected.
  • PCI (the Payment Card Industry) Compliance: A merchant or payment gateway must set up their payment environment in a way that meets the Payment Card Industry Data Security Standard (PCI DSS).

Often, the same company functions as both payment gateway and payment processor, thereby processing the credit card payment securely. Such a product is called an ‘online payment system.’ Meyer’s article cited above also lists 10 popular online payment systems: Stripe, Authorize.Net, PayPal, Google Checkout, Amazon Payments, Dwolla, Braintree, Samurai by FeeFighters, WePay, and 2Checkout. Bear in mind that different payment gateways, merchant accounts, and bank accounts may or may not work together, your bank may or may not work as a merchant account, and your library may or may not have a merchant account.2

Also note that there are fees for using online payment systems like these and that different systems have different fee structures. For example, Authorize.Net has a $99 setup fee and then charges $20 per month plus a $0.10 per-transaction fee. Stripe charges 2.9% + $0.30 per transaction with no setup or monthly fees. Fees for mobile payment solutions with a physical card reader, such as Square, may go much higher.

Among various online payment systems, I picked Stripe because it was recommended on the Code4Lib listserv. One of the advantages of using Stripe is that it acts as both the payment gateway and the merchant account. What this means is that your library does not have to have a merchant account to accept payment online. Another big advantage of using Stripe is that you do not have to worry about the PCI compliance part of your website, because the Stripe API uses a clever way to send the sensitive credit card information over to the Stripe server while keeping your local server, on which your payment form sits, completely blind to such sensitive data. I will explain this in more detail later in this post.

Below I will share some of the code that I have used to set up Stripe as my library’s online payment option for testing. This may be of interest to you if you are thinking about offering online payment as an option for your patrons or if you are simply interested in how an online payment API works. Even if your library doesn’t need to collect library fines online, an online payment option can be a handy tool for a small-scale fund-raising drive or donation.

The first step to make Stripe work is getting API keys. You do not have to create an account to get API keys for testing, but if you are going to work on your code for more than one day, it’s probably worth getting an account. The Stripe API has excellent documentation. I read the ‘Getting Started’ section and then jumped over to the ‘Examples’ section, which can quickly get you off the ground. (https://stripe.com/docs/examples) I found an example by Daniel Schröter in GitHub from the list of examples in Stripe’s Examples section and decided to test it out. (https://github.com/myg0v/Simple-Bootstrap-Stripe-Payment-Form) Most of the time, getting example code running requires some probing and tweaking, such as downloading all the required libraries, sorting out the paths in the code, and adding API keys. This one required relatively little work.

Now, let’s take a look at the form that this code creates.

Screenshot: the payment form created by the borrowed example code.

In order to create a form of my own for testing, I decided to change a few things in the code.

  1. Add Patron & Payment Details.
  2. Allow custom amount for payment.
  3. Change the currency from Euro to US dollars.
  4. Configure the validation for new fields.
  5. Hide the payment form once the charge goes through instead of showing the payment form below the payment success message.

Screenshot: the modified HTML for the payment form.

Item 4 can be done as follows. The client-side validation is performed by the BootstrapValidator jQuery plugin, so you need to get the syntax right for the code, which now has new fields, to work properly; a rough sketch follows the screenshot below.
Screenshot: the BootstrapValidator configuration for the new fields.
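
As a sketch, the added validation rules look something like the following; the form id, field names, and messages are hypothetical stand-ins, not the exact code in the screenshot:

$(document).ready(function () {
    $('#payment-form').bootstrapValidator({
        fields: {
            // the custom payment amount field added to the form
            amount: {
                validators: {
                    notEmpty: { message: 'The payment amount is required' },
                    numeric:  { message: 'The payment amount must be a number' }
                }
            },
            // the patron e-mail field used later for the receipt
            email: {
                validators: {
                    notEmpty:     { message: 'An e-mail address is required' },
                    emailAddress: { message: 'This is not a valid e-mail address' }
                }
            }
        }
    });
});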

This is the JavaScript that allows you to send the data submitted to your payment form to the Stripe server. First, include the Stripe JS library (line 24). Include jQuery, Bootstrap, the Bootstrap Form Helpers plugin, and the Bootstrap Validator plugin (lines 25-28). The next block of code is an event handler for the form, which sends the payment information to Stripe via AJAX when the form is submitted. Stripe will validate the payment information and then return a token that identifies this particular transaction.

Screenshot: the JavaScript that sends the payment information to Stripe.

When the token is received, this code calls the function stripeResponseHandler(), which checks whether the Stripe server returned any error upon receiving the payment information and, if no error has been returned, attaches the token information to the form and submits the form.

Screenshot: the stripeResponseHandler() function.
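
Pieced together, the client-side flow looks roughly like the sketch below. It follows the legacy Stripe.js (v2) pattern from Stripe’s examples rather than reproducing the exact code in the screenshots; the publishable key, form id, and error element are placeholders:

Stripe.setPublishableKey('pk_test_YOUR_PUBLISHABLE_KEY');  // placeholder test key

$('#payment-form').submit(function () {
    var $form = $(this);
    $form.find('button').prop('disabled', true);  // prevent repeated clicks
    // Stripe reads the card fields via their data-stripe attributes;
    // those inputs deliberately have no name attribute, so they never reach our server
    Stripe.card.createToken($form, stripeResponseHandler);
    return false;  // hold the form until we have a token
});

function stripeResponseHandler(status, response) {
    var $form = $('#payment-form');
    if (response.error) {
        // show the problem and let the user try again
        $form.find('.payment-errors').text(response.error.message);
        $form.find('button').prop('disabled', false);
    } else {
        // append the one-time token and submit the form to our server-side script
        $form.append($('<input type="hidden" name="stripeToken" />').val(response.id));
        $form.get(0).submit();
    }
}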

The server-side PHP script then checks whether the Stripe token has been received and, if so, creates a charge and sends it to Stripe, as shown below. I am using PHP here, but the Stripe API supports many languages other than PHP, such as Ruby and Python, so you have many options. The real payment amount appears here as part of the charge array in line 326. If the charge succeeds, the payment success message is stored in a div to be displayed.

Screenshot: the server-side PHP script that creates the charge.

The reason why you do not have to worry about PCI compliance with Stripe is that the Stripe API receives the payment information via AJAX, and the input fields for sensitive information do not have name attributes or values. (See below for the Card Holder Name and Card Number fields as an example; click to bring up the clear version of the image.)  By omitting the name attribute and value, the local server where the online form sits is deprived of any means to retrieve the information in those input fields submitted through the form. Since sensitive information does not touch the local server at all, PCI compliance for the local server becomes a non-issue. To clarify, not all fields in the payment form need to be deprived of the name attribute; only the sensitive fields that you do not want your web server to have access to need to be protected this way. Here, for example, I am assigning the name attribute and value to fields such as name and e-mail in order to use them later to send an e-mail receipt.

(NB. Please click images to see the enlarged version.)

Screenshot: the Card Holder Name and Card Number input fields, which have no name attributes.

Now the modified form has a ‘Fee Category’ field, a custom ‘Payment Amount’ field, and some other information relevant to my library’s billing purposes.

Screenshot: the updated payment form.

When the payment succeeds, the page changes to display the following message.

Screenshot: the payment success message.

Stripe provides a number of fake card numbers for testing, so you can try out various failure cases. The Stripe website also displays all payments and the related tokens and charges associated with those payments, which greatly helps troubleshooting. One thing that I noticed while troubleshooting is that the Stripe logs sometimes lag behind: when a payment succeeds, the associated token and charge may not appear under the “Logs” section immediately. But the payment itself shows up right away, so you know that the associated token and charge will eventually appear in the log as well.

Screenshot: a recent payment listed in the Stripe dashboard.

Once you are ready to test real payment transactions, you need to flip the switch from TEST to LIVE located in the top left corner. You will also need to replace your API keys for ‘TESTING’ (both secret and public) with those for ‘LIVE’ transactions. One more thing that is needed before your library can get paid with real money online is setting up SSL (Secure Sockets Layer) for your live online payment page. This is not required for testing but is necessary for processing live payment transactions. It is not very complicated work, so don’t be discouraged at this point. You just have to buy a security certificate and install it on your web server. Speak to your system administrator about how to get SSL set up for your payment page. More information about setting up SSL can be found in the Stripe documentation I just linked above.

My library has not yet gone live with this online payment option. Before we do, I may make some more modifications to the code to better fit the staff workflow, which is still being mapped out. I am also planning to place the online payment page behind the university’s Shibboleth authentication in order to cut down on spam and to save library patrons some tedious data entry: their name, university e-mail, and student/faculty/staff ID number can be pulled directly from the campus directory exposed through Shibboleth and automatically inserted into the payment form fields.

In this post, I have described my experience of testing out the Stripe API as an online payment solution. As I have mentioned above, however, there are many other online payment systems out there. Depending on your library’s environment and financial setup, different solutions may work better than others. To me, not having to worry about PCI compliance by using Stripe was a big plus. If your library accepts online payment, please share what solution you chose and what factors led you to that particular online payment system in the comments.

* This post has been based upon my recent presentation, “Accepting Online Payment for Your Library and ‘Stripe’ as an Example”, given at the Code4Lib DC Unconference. Slides are available at the link above.

Notes
  1. Meyer, Rosston. “10 Excellent Online Payment Systems.” Six Revisions, May 15, 2012. http://sixrevisions.com/tools/online-payment-systems/.
  2. Ullman, Larry. “Introduction to Stripe.” Larry Ullman, October 10, 2012. http://www.larryullman.com/2012/10/10/introduction-to-stripe/.

Please Don’t Kill the Web: A Screed on Performance & Austerity

Understanding web performance is important for everyone, whether you’re a front-end web developer working with HTML and CSS, an application developer writing services that will interact with the web, or a content author who writes for the web without using code. While there are only a few simple tricks to making web pages load quickly, they’re unfortunately eschewed all too often.

Libraries can ill afford to alienate anyone in their communities. We aim to be open institutions that welcome anyone to use our services, regardless of race, gender, class, creed, ability, or anything else. Yet when we make websites that work only for high-powered desktop computers with broadband connections, we privilege the wealthy. Design a slow enough website and even laptops on decent wireless connections may struggle to load a site in a timely fashion. But what about people who only own smart phones or people who live in rural areas where dial-up is the only option? Poor web performance renders sites unusable for some and frustrating for all.

But what can I do?!?

Below, I’ll elaborate on why certain practices make sites faster, but I want to outline some actions you can take to immediately speed up your sites.

First of all, test your main web properties to identify where the pain points are. Google Pagespeed and Yahoo’s YSlow are two good services that not only spot a broad range of potential problems but provide easy instructions for ameliorating them. They both output letter grades but it’s not worth worrying about getting an A; simply find the laggard sites and look for low-hanging fruit to fix.

Speaking of low-hanging fruit, I’m going to guess you have images on your site. In fact, images make up the bulk of most websites’ download size.1 So if you’re a content author uploading images to a site, ensuring that their file sizes are as small as possible goes a long way to speeding up your site. First of all, ask if the image is really necessary. While images certainly aid a site’s visual appeal, so does clean design with thoughtful colors and highlighted calls to action. It’s becoming easier and easier to avoid images too, as CSS advances and techniques like SVG and icon fonts become more popular. In general, if an image isn’t a logo or picture, it can be replaced through those technologies with less of a performance impact.

If you’re certain an image is necessary, ensure that you’re uploading an appropriately sized one. If you upload a 1200×900 image which will be scaled down to 400×300 inside an <img> tag, change the dimensions before uploading. Lastly, run the image through an optimizer that removes useless gunk from the file (this includes metadata and unused color profiles). ImageOptim is a great choice on Mac. I don’t use Windows so I can’t vouch for these, but FileOptimizer and RIOT (Radical Image Optimization Tool) appear to be popular choices for that platform.
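
If your site already uses a build tool like Grunt, image optimization can be automated as well. Here is a minimal sketch using the grunt-contrib-imagemin plugin; the folder names are hypothetical:

module.exports = function (grunt) {
    grunt.initConfig({
        imagemin: {
            dynamic: {
                files: [{
                    expand: true,                 // build the file list dynamically
                    cwd: 'src/img/',              // hypothetical source folder
                    src: ['**/*.{png,jpg,gif}'],  // every image beneath it
                    dest: 'build/img/'            // optimized copies end up here
                }]
            }
        }
    });

    grunt.loadNpmTasks('grunt-contrib-imagemin');
    grunt.registerTask('default', ['imagemin']);
};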

What’s the second largest source of bytes in a website? JavaScript. Scripts are far smaller than images but still a prime opportunity to trim download size. You should always minify your JavaScript before putting it on a live site. What’s minification? It takes all the functionally useless bytes out of a file, producing text that’s operational but smaller. Here’s an example script before and after minification.

Before (236 bytes):

// add a proxy server prefix to all links with class=proxied
var links = document.querySelectorAll('a[href].proxied');

Array.prototype.forEach.call(links, function (link) {
    var proxy = 'https://proxy.example.com/login?url='
        , newHref = proxy + link.href;

    link.href = newHref;
});

After (169 bytes):

var links=document.querySelectorAll("a[href].proxied");Array.prototype.forEach.call(links,function(r){var e="https://proxy.example.com/login?url=",l=e+r.href;r.href=l});

By running my script through UglifyJS, I reduced its size by 28%. We can see where Uglify applied its tricks: it removed a comment, removed whitespace (including line breaks), and changed my local variable names to single letters. Yet my script’s actions are the same, thus there’s no reason to deliver the larger, human-readable version to a browser.

Minifying JavaScript is a best practice, but CSS can also be minified. CSS benefits primarily through whitespace removal, because selectors and properties must all remain the same. While it may not be an option in situations where your pages are served up by a CMS or vendor application, HTML can benefit substantially from having comments removed, quotes taken off of attributes, and even closing tags erased (perfectly valid under HTML5). While the outcome is ugly, the vast majority of your users won’t peek at your source code but will appreciate the quicker page load.

There’s one last optimization that most content authors or front-end web developers can do: combine like files. Rather than make the browser request several separate scripts and stylesheets, combine your assets into as few files as possible. Saving HTTP requests helps sites load quickly, especially over cellular connections. While it’s possible to minify and combine files one-by-one by pasting them into online tools, ideally you have a development workflow that employs a tool like Grunt or Gulp to minify your text-based resources and concatenate them into one big package. 2
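
As a taste of what such a workflow looks like, here is a bare-bones Grunt configuration that concatenates and then minifies scripts and stylesheets; the paths are hypothetical, and a real workflow would add more steps:

module.exports = function (grunt) {
    grunt.initConfig({
        concat: {
            js:  { src: ['js/*.js'],   dest: 'build/site.js'  },  // combine all scripts
            css: { src: ['css/*.css'], dest: 'build/site.css' }   // combine all stylesheets
        },
        uglify: {
            js:  { src: 'build/site.js',  dest: 'build/site.min.js'  }  // then minify
        },
        cssmin: {
            css: { src: 'build/site.css', dest: 'build/site.min.css' }
        }
    });

    grunt.loadNpmTasks('grunt-contrib-concat');
    grunt.loadNpmTasks('grunt-contrib-uglify');
    grunt.loadNpmTasks('grunt-contrib-cssmin');
    grunt.registerTask('build', ['concat', 'uglify', 'cssmin']);
};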

When going through one’s scripts and stylesheets looking for optimizations, it’s again pertinent to ask how much is truly necessary. Yes, you can accomplish some gorgeous things with just a little code, slick modal dialogs and luscious fly-out menus. But is that really what your website needs? Are users struggling because your nav bar doesn’t collapse and stick to the top of the page as you scroll, or because your link text is obscure? Throwing scripts at a site can create more issues than it resolves.

But what can my server admin do?!?

If you have access to your library’s servers, then you’ll be able to change a few settings that will make broad improvements. If not, try emailing your local server admin to see if they can make these alterations.

One of the easiest, and biggest, improvements you can make is to turn on compression. With compression on, your server packs up requests into smaller files (think .zip archives) before sending them over the web. This makes a huge difference with most web assets because they’re repetitive plain text which is very amenable to compression algorithms. The best part is that turning on compression is typically a matter of copy-pasting a few lines into a server configuration file. If you’re unsure of how to do it, see the HTML5 Boilerplate Server Configs project. If you’re not certain your site is using compression, try the aptly-named GzipWTF.

What’s the only thing faster than sending a minified, compressed file? Not having to send that file at all. Browsers cache assets if they’re told to, meaning that repeat visits to your site won’t incur repeat downloads. This significantly increases speed if a visitor spends a lot of time on your site. Caching is, again, a matter of inserting a few configuration lines. It can be a little tricky, however, because when you update cached files you want users to download the new version, bypassing their local cache. One work-around is filename-based versioning. 3
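
The idea behind filename-based versioning is simple enough to sketch in a few lines of Node; in practice a build plugin (grunt-filerev, for example) handles this, and the file path here is hypothetical:

// rename site.min.css to site.<contenthash>.css so an updated file gets a new name
// and browsers can't keep serving a stale cached copy
var fs = require('fs');
var crypto = require('crypto');

var css = fs.readFileSync('build/site.min.css');
var hash = crypto.createHash('md5').update(css).digest('hex').slice(0, 8);

fs.renameSync('build/site.min.css', 'build/site.' + hash + '.css');
// whatever generates your HTML must then reference the new, hashed file name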

Don’t have server access? Or are your servers taxed to the breaking point? Popular front-end resources are often available over Content Delivery Networks (CDNs). These CDNs have servers spread across the globe whose only job is to serve up static assets which get cached in browsers. Because many sites use them, users may already have assets cached in their browser (e.g. if they’ve visited Reddit recently, they have version 2.1.1. of the jQuery JavaScript library from a popular CDN run by Google). While Google and Microsoft run well-known CDNs for a few popular libraries, the lesser-known jsDelivr, CDNJS, and Bootstrap CDN give you even greater options.

The Basic Rules

I’ve avoided the theory behind web performance thus far, but not because it’s complex. There are only two primary rules: send fewer bytes and send fewer requests. So given a choice between two files that fulfill the same function, choose the smaller one (the logic behind minification and compression). If you can combine multiple files into one or avoiding sending something altogether, that saves requests (the logic behind concatenation and caching).

Sending fewer bytes makes intuitive sense; we know that larger files take longer to download. Not only that, we’ve all probably visually witnessed the effects of file size as manifested in progress bars or massive images slowly tiling into place. The benefits of concatenation, however, may seem peculiar. Why would sending one 50kb file be quicker than sending two 25kb files, or five 10kb files? It turns out that there are inefficient redundancies across multiple HTTP requests. A request has a few stages:

  • A DNS lookup translates a domain like “acrl.ala.org” into an IP address like “38.69.5.149”
  • The client and server establish a connection with a TCP handshake
  • The server receives the request and determines what to send back
  • Finally, the data is pushed out over the network

Throughout these steps there may be a great amount of latency as data travels through the air and over wires. All of that latency is repeated with each request, making multiple requests slower. So even if we serve all our assets from the same domain, thus requiring only one DNS request, and configure our server to not repeat the TCP connection on each request, there’s still additional overhead to multiple requests.

Here’s a visual explanation of this phenomenon:

While the DNS, TCP, and download times are identical for two scripts sent independently and the product of their concatenation, the total “Time To First Byte” (analogous to latency) is greater with the dual requests.

Incidentally, the next version of the HTTP specification—HTTP 2.0, currently active as the SPDY protocol—features request multiplexing. The ability to pack multiple requests into one connection virtually eliminates the need for concatenation. It’s a long way off though, since the spec hasn’t been finalized and it will need to be implemented in a backwards-compatible way in both web browsers and servers.

Don’t Sweat What You Can’t Control

Here’s a sad truth: a lot of your library’s web properties probably have terrible performance and there’s little you can do about it. Libraries rely heavily upon third-party websites that may give you a box for pasting custom HTML into, but no or limited control over page templates. Since we’re purchasing a web site and not a web service, we trade convenience for quality. 4 Perhaps because making a generically useful website is really tough, or perhaps because they just don’t know, most of these vendor sites ignore best practices. Similarly, we tend to reuse large open source applications without auditing their performance characteristics. I began listing some of the more egregious offenders, but the list became so long that I’ve put it in a separate appendix below.

There’s More

There’s a lot more to web performance, including avoiding redirects, combining images into sprites, and putting scripts down towards the end of the <body> tag. An entire field has grown up around the topic, spurred on by the rising complexity of web applications and the prominence of mobile devices. Google’s Make the Web Faster project is a great place to learn more, as is Steve Souders’ dated but classic High Performance Web Sites. The main takeaway, however, is that understanding a few tenets of web performance can make a difference. By slimming your images and combining your CSS, you can make your sites accessible to more devices and quicker for everyone.

Appendix A: The Wall of Performance Shame

I don’t mean to imply that any of these software packages are poorly made or bad choices. In fact, I’m quite fond of many of them. But run a few through tools like Google Pagespeed and it quickly becomes apparent that even the easiest optimizations aren’t being applied.


LibGuides 1.0 loads all 5 of its scripts in the <head> and includes two large utility libraries, jQuery and Dojo, which I struggle to believe is necessary. LibGuides 2.0 does the right thing in relying on CDNs to serve up a few libraries, but still includes 5 scripts, two of which are unminified (despite having “.min” in the name).

The EQUELLA digital repository software loads 11 CSS files and 14 scripts in the <head> of the document. I didn’t bother to check them all, but the ones I did view are unminified (e.g. they’ve left jQuery at full size, which increases the download by 154kb for no good reason). This essentially crushes the page load on all but desktop devices with decent bandwidth; I cannot imagine a user enduring the wait on a phone’s 3G connection.

DSpace loads 9 stylesheets, most of which appear unminified, and 5 scripts (which, luckily, are minified) so far up in the <head> of the document that the <title> of the page loads noticeably slowly on a phone.

VuFind loads 10 scripts in a row in the <head>; these could be easily combined.

Thinking a little broader beyond library-specific software, Drupal could do a better job concatenating and minifying resources. While the performance tools have a simple checkbox for combining scripts, by default Drupal will still load multiple resources (3-6 stylesheets and scripts) on each page. The JavaScript and CSS that come with many contributed modules tend not to be minified for some reason, and Drupal sometimes doesn’t process them itself. I don’t know the best practices for providing client-side assets with a module, but many authors seem to be ignoring performance.

On a positive note, the Blacklight discovery layer does a great job of combining all CSS and JavaScript into one file apiece, probably due to the fact it’s built with Ruby on Rails which provides some nice tools for conforming to best practices. The Koha ILS also appears to keep the number of files down and pushes most of its scripts to the bottom of the document.

Notes

  1. The HTTP Archive is a wonderful source for data like this.
  2. These development workflows are a bit much to go into at the moment, but trust me that they’re not too difficult to set up. There are template configurations which you can use that make it relatively painless to simply point a piece of software at a folder full of scripts and watch an optimized output appear in a prescribed destination. If there’s interest in another blog post with more details, leave me a note in the comments and I’ll happily write one.
  3. This Stack Overflow answer gives a decent overview of filename-based versioning with some further reading. Google’s Make the Web Faster project has a nice article on how HTTP caching works.
  4. When I say “web service” I mean platforms which expose all their functionality through an API, which allows one to write a custom interface layer on top. That’s often a lot of work but would provide total control over things like the number and order of resources loaded in the HTML.

Bootstrap Responsibly

Bootstrap is the most popular front-end framework used for websites. An estimate by meanpath several months ago sat it firmly behind 1% of the web – for good reason: Bootstrap makes it relatively painless to puzzle together a pretty awesome plug-and-play, component-rich site. Its modularity is its key feature, developed so Twitter could rapidly spin up internal microsites and dashboards.

Oh, and it’s responsive. This is kind of a thing. There’s not a library conference today that doesn’t showcase at least one talk about responsive web design. There’s a book, countless webinars, courses, whole blogs dedicated to it (ahem), and more. The pressure for libraries to have responsive, usable websites can seem to come more from the likes of us than from the patron base itself, but don’t let that discredit it. The trend is clear and it is only a matter of time before our libraries have their mobile moment.

Library websites that aren’t responsive feel dated, and more importantly they are missing an opportunity to reach a bevy of mobile-only users that in 2012 already made up more than a quarter of all web traffic. Library redesigns are often quickly pulled together in a rush to meet the growing demand from stakeholders, pressure from the library community, and users. The sprint makes the allure of frameworks like Bootstrap that much more appealing, but Bootstrapped library websites often suffer the cruelest of responsive ironies:

They’re not mobile-friendly at all.

Assumptions that Frameworks Make

Let’s take a step back and consider whether using a framework is the right choice at all. A front-end framework like Bootstrap is a Lego set with all the pieces conveniently packed. It comes with a series of templates, a blown-out stylesheet, and scripts tuned to the environment that let users essentially copy-and-paste fairly complex web machinery into being. Carousels, tabs, responsive dropdown menus, all sorts of buttons, alerts for every occasion, gorgeous galleries, and very smart decisions made by a robust team of far more capable developers than we are.

Except for the specific layout and the content, every Bootstrapped site is essentially a complete organism years in the making. This is also the reason that designers sometimes scoff, joking that these sites look the same. Decked-out frameworks are ideal for rapid prototyping with a limited timescale and budget because the design decisions have by and large already been made. They assume you plan to use the framework as-is, and they don’t make customization easy.

In fact, Bootstrap’s guide points out that any customization is better suited to be cosmetic than a complete overhaul. The trade-off is that Bootstrap is otherwise complete. It is tried, true, usable, accessible out of the box, and only waiting for your content.

Not all Responsive Design is Created Equal

It is still common to hear the selling point for a swanky new site is that it is “responsive down to mobile.” The phrase probably rings a bell. It describes a website that collapses its grid as the width of the browser shrinks until its layout is appropriate for whatever screen users are carrying around. This is kind of the point – and cool, as any of us with a browser-resizing obsession could tell you.

Today, “responsive down to mobile” has a lot of baggage. Let me explain: it represents a telling and harrowing ideology that for these projects mobile is the afterthought when mobile optimization should be the most important part. Library design committees don’t actually say aloud or conceive of this stuff when researching options, but it should be implicit. When mobile is an afterthought, the committee presumes users are more likely to visit from a laptop or desktop than a phone (or refrigerator). This is not true.

See, a website, responsive or not, originally laid out for a 1366×768 desktop monitor in the designer’s office, wistfully depends on visitors with that same browsing context. If it looks good in-office and loads fast, then looking good and loading fast must be the default. “Responsive down to mobile” is divorced from the reality that a similarly wide screen is not the common denominator. As such, responsive down to mobile sites have a superficial layout optimized for the developers, not the user.

In a recent talk at An Event Apart–a conference–in Atlanta, Georgia, Mat Marquis stated that 72% of responsive websites send the same assets to mobile devices as they do to desktops, and this is largely contributing to the web feeling slower. While setting img { width: 100%; } will scale media to fit snugly to the container, it still sends the same high-resolution image to a 320px-wide phone as to a 720px-wide tablet. A 1.6mb page loads differently on a phone than on the machine it was designed on. The digital divide with which librarians are so familiar is certainly nowhere near closed, but while internet access is increasingly available, its ubiquity doesn’t translate to speed:

  1. 50% of users ages 12-29 are “mostly mobile” users, and you know what wireless connections are like,
  2. even so, the weight of the average website (currently 1.6mb) is increasing.

Last December, analysis of data from pagespeed quantiles during an HTTP Archive crawl tried to determine how fast the web was getting slower. The fastest sites are slowing at a greater rate than the big bloated sites, likely because the assets we send–like increasingly high resolution images to compensate for increasing pixel density in our devices–are getting bigger.

The havoc this wreaks on the load times of “mobile friendly” responsive websites is detrimental. Why? Well, we know that

  • users expect a mobile website to load as fast on their phone as it does on a desktop,
  • three-quarters of users will give up on a website if it takes longer than 4 seconds to load,
  • the optimistic average load time for just a 700kb website on 3G is more like 10-12 seconds

eep O_o.

A Better Responsive Design

So there was a big change to Bootstrap in August 2013 when it was restructured from a “responsive down to mobile” framework to “mobile-first.” It has also been given a simpler, flat design, which has 100% faster paint time – but I digress. “Mobile-first” is key. Emblazon this over the door of the library web committee. Strike “responsive down to mobile.” Suppress the record.

Technically, “mobile-first” describes the structure of the stylesheet using CSS3 Media Queries, which determine when certain styles are rendered by the browser.

.example {
  styles: these load first;
}

@media screen and (min-width: 48em) {
  .example {
    styles: these load once the screen is 48 ems wide;
  }
}

The most basic styles are loaded first. As more space becomes available, designers can assume (sort of) that the user’s device has a little extra juice, that their connection may be better, so they start adding pizzazz. One might make the decision that, hey, most of the devices less than 48em wide (approximately 768px with a base font size of 16px) are probably touch-only, so let’s not load any hover effects until the screen is wider.

Nirvana

In a literal sense, mobile-first is asset management. More than that, mobile-first is this philosophical undercurrent, an implicit zen of user-centric thinking that aligns with libraries’ missions to be accessible to all patrons. Designing mobile-first means designing to the lowest common denominator: functional and fast on a cracked Blackberry at peak time; functional and fast on a ten year old machine in the bayou, a browser with fourteen malware toolbars trudging through the mire of a dial-up connection; functional and fast [and beautiful?] on a 23″ iMac. Thinking about the mobile layout first makes design committees more selective of the content squeezed on to the front page, which makes committees more concerned with the quality of that content.

The Point

This is the important statement that Bootstrap now makes. It expects the design committee to think mobile-first. It comes with all the components you could want, but they want you to trim the fat.

Future Friendly Bootstrapping

This is what you get in the stock Bootstrap:

  • buttons, tables, forms, icons, etc. (97kb)
  • a theme (20kb)
  • javascripts (30kb)
  • oh, and jQuery (94kb)

That’s almost 250kb of website. This is like a browser eating a brick of Mackinac Island Fudge – and this high calorie bloat doesn’t include images. Consider that if the median load time for a 700kb page is 10-12 seconds on a phone, half that time with out-of-the-box Bootstrap is spent loading just the assets.

While it’s not totally deal-breaking, 100kb is 5x as much CSS as an average site should have, as well as 15%-20% of what all the assets on an average page should weigh. – Josh Broton

To put this in context, I like to fall back on Ilya Grigorik’s example comparing load time to user reaction in his talk “Breaking the 1000ms Time to Glass Mobile Barrier.” If the site loads in just 0-100 milliseconds, this feels instant to the user. By 100-300ms, the site already begins to feel sluggish. At 300-1000ms, uh – is the machine working? After 1 second there is a mental context switch, which means that the user is impatient, distracted, or consciously aware of the load time. After 10 seconds, the user gives up.

By choosing not to pare down, your Bootstrapped Library starts off on the wrong foot.

The Temptation to Widgetize

Even though Bootstrap provides modals, tabs, carousels, autocomplete, and other modules, this doesn’t mean a website needs to use them. Bootstrap lets you tailor which jQuery plugins are included in the final script. The hardest part of any redesign is to let quality content determine the tools, not to let the ability to tabularize or scrollspy become an excuse to implement them. Oh, don’t Google those. I’ll touch on tabs and scrollspy in a few minutes.

I am going to be super presumptuous now and walk through the total Bootstrap package, then make recommendations for lightening the load.

Transitions

Transitions.js is a fairly lightweight CSS transition polyfill. What this means is that the script checks to see if your user’s browser supports CSS Transitions, and if it doesn’t then it simulates those transitions with javascript. For instance, CSS transitions often handle the smooth, uh, transition between colors when you hover over a button. They are also a little more than just pizzazz. In a recent article, Rachel Nabors shows how transition and animation increase the usability of the site by guiding the eye.

With that said, CSS Transitions have pretty good browser support and they probably aren’t crucial to the functionality of the library website on IE9.

Recommendation: Don’t Include.

Modals

“Modals” are popup windows. There are plenty of neat things you can do with them, but modals are a pain to design consistently for every browser. Let Bootstrap do that heavy lifting for you.

Recommendation: Include

Dropdown

It’s hard to conclude a library website design committee meeting without a lot of links in your menu bar. Dropdown menus are kind of tricky to code, and Bootstrap does a really nice job of keeping them a consistent and responsive experience.

Recommendation: Include

Scrollspy

If you have a fixed sidebar or menu that follows the user as they read, scrollspy.js can highlight the section of that menu you are currently viewing. This is useful if your site has a lot of long-form articles, or if it is a one-page app that scrolls forever. I’m not sure this describes many library websites, but even if it does, you probably want more functionality than Scrollspy offers. I recommend jQuery-Waypoints – but only if you are going to do something really cool with it.

Recommendation: Don’t Include

Tabs

Tabs are a good way to break-up a lot of content without actually putting it on another page. A lot of libraries use some kind of tab widget to handle the different search options. If you are writing guides or tutorials, tabs could be a nice way to display the text.

Recommendation: Include

Tooltips

Tooltips are often descriptive popup bubbles for a section, option, or icon requiring more explanation. Tooltips.js helps handle the predictable positioning of the tooltip across browsers. With that said, I don’t think tooltips are that engaging; they’re sometimes appropriate, but you definitely used to see more of them in the past. Your library’s time is better spent de-jargoning any content that would warrant a tooltip. Need a tooltip? Why not just make whatever needs the tooltip more obvious O_o?

Recommendation: Don’t Include

Popover

Even fancier tooltips.

Recommendation: Don’t Include

Alerts

Alerts.js lets your users dismiss alerts that you might put in the header of your website. It’s always a good idea to give users some kind of control over these things. Better they read and dismiss than get frustrated from the clutter.

Recommendation: Include

Collapse

The collapse plugin allows for accordion-style sections for content similarly distributed as you might use with tabs. The ease-in-ease-out animation triggers motion-sickness and other aaarrghs among users with vestibular disorders. You could just use tabs.

Recommendation: Don’t Include

Button

Button.js gives a little extra jolt to Bootstrap’s buttons, allowing them to communicate an action or state. By that I mean: imagine you fill out a reference form and click “submit.” Button.js will put a little loader icon in the button itself and change the text to “sending ….” This way, users are told that the process is running, and maybe they won’t feel compelled to click and click and click until the page refreshes. This is a good thing.

Recommendation: Include

Carousel

Carousels are the most popular design element on the web. They let a website slideshow content like upcoming events or new material. Carousels exist because design committees must be appeased. There are all sorts of reasons why you probably shouldn’t put a carousel on your website: they are largely inaccessible, have low engagement, are slooooow, and kind of imply that libraries hate their patrons.

Recommendation: Don’t Include.

Affix

I’m not exactly sure what this does. I think it’s a fixed-menu thing. You probably don’t need this. You can use CSS.

Recommendation: Don’t Include

Now, Don’t You Feel Better?

Just comparing the bootstrap.js and bootstrap.min.js files between out-of-the-box Bootstrap and a build tailored to the specs above – which of course doesn’t consider the differences in the CSS or the weight of the images not included in a carousel (not to mention the unquantifiable amount of pain you would have inflicted) – the numbers are telling:

File                Before    After
bootstrap.js        54kb      19kb
bootstrap.min.js    29kb      10kb
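
For the curious, Bootstrap’s online customizer will generate a build containing only the plugins you check off, but you can get the same result yourself. Here is a hedged sketch that concatenates and minifies only the plugins recommended above, assuming a checkout of the Bootstrap 3 source in a bootstrap/ folder:

module.exports = function (grunt) {
    grunt.initConfig({
        concat: {
            bootstrap: {
                // only the plugins recommended above, pulled from Bootstrap's source
                src: [
                    'bootstrap/js/alert.js',
                    'bootstrap/js/button.js',
                    'bootstrap/js/dropdown.js',
                    'bootstrap/js/modal.js',
                    'bootstrap/js/tab.js'
                ],
                dest: 'dist/bootstrap.custom.js'
            }
        },
        uglify: {
            bootstrap: {
                src: 'dist/bootstrap.custom.js',
                dest: 'dist/bootstrap.custom.min.js'
            }
        }
    });

    grunt.loadNpmTasks('grunt-contrib-concat');
    grunt.loadNpmTasks('grunt-contrib-uglify');
    grunt.registerTask('default', ['concat', 'uglify']);
};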

So, Bootstrap Responsibly

There is more to say. When bouncing this topic around Twitter a while ago, Jeremy Prevost pointed out that Bootstrap’s minified assets can be gzipped down to about 20kb total. This is the right way to serve assets from any framework. It requires an Apache config or .htaccess rule. Here is the .htaccess file used in HTML5 Boilerplate. You’ll find it well commented and modular: go ahead and just copy and paste the parts you need. You can eke out even more performance by “lazy loading” scripts at a given time, but that is a little out of the scope of this post.

Here’s the thing: when we talk about having good library websites we’re mostly talking about the look. This is the wrong discussion. Web designs driven by anything but the content they already have make grasping assumptions about how slick it would look to have this killer carousel, these accordions, nifty tooltips, and of course a squishy responsive design. As a result, these responsive sites miss the point: if anything, they’re mobile unfriendly.

Much of the time, a responsive library website is used as a marker that such-and-such site is credible and not irrelevant, but as such the website reflects a lack of purpose (e.g., “this website needs to increase library-card registration”). A superficial understanding of responsive web design and easy-to-grab frameworks ends up making the patron the last priority.

 

About Our Guest Author:

Michael Schofield is a front-end librarian in south Florida, where it is hot and rainy – always. He tries to do neat things there. You can hear him talk design and user experience for libraries on LibUX.