Farewell

As is probably evident from the lack of recent posts, ACRL Tech Connect is no longer publishing. We had a good run, publishing 221 posts between 2011 and 2019, and are very proud of the work we did. Two of the founding members, Bohyun Kim and Margaret Heller, wrote for the blog right up until the end. In the last couple years, we struggled to recruit new writers and balance writing for Tech Connect with meeting our other professional commitments, so alas it is time to shut down this blog.

We greatly appreciate everyone who contributed to Tech Connect over the years, whether as an author and editor, a guest writer, or through thoughtful commentary and discussion. We are grateful to ACRL for hosting the blog and for the other support they provided. We will leave the content up as long as ACRL sees fit to host it.

~ Eric Phetteplace, Editor

Broken Links in the Discovery Layer—Pt. I: Researching a Problem

Like many administrators of discovery layers, I’m constantly baffled and frustrated when users can’t access full text results from their searches. After implementing Summon, we heard a few reports of problems and gradually our librarians started to stumble across them on their own. At first, we had no formal system for tracking these errors. Eventually, I added a script which inserted a “report broken link” form into our discovery layer’s search results. 1 I hoped that collecting reported problems and then reviewing them would identify certain systemic issues that could be resolved, ultimately leading to fewer problems. Pointing out patterns in these errors to vendors should lead to actual progress in terms of user experience.

From the broken links form, I began to collect some data on the problem. I can tell you, for instance, which destination databases experience the most problems or what the character of the most common problems is. The issue is sample bias—are the problems that are reported really the most common? Or are they just the ones that our most diligent researchers (mostly our librarians, graduate students, and faculty) are likely to report? I long for quantifiable evidence of the issue without this bias.

How I classify the broken links that have been reported via our form. N = 57

Select Searches & Search Results

So how would one go about objectively studying broken links in a discovery layer? The first issue to solve is what searches and search results to review. Luckily, we have data on this—we can view in our analytics what the most popular searches are. But a problem becomes apparent when one goes to review those search terms:

  • artstor
  • hours
  • jstor
  • kanopy

Of course, the most commonly occurring searches tend to be single words. These searches all trigger “best bet” or database suggestions that send users directly to other resources. If their result lists do contain broken links, those links are unlikely to ever be visited, making them a poor choice for our study. If I go a little further into the set of most common searches, I see single-word subject searches for “drawing” followed by some proper nouns (“suzanne lacy”, “chicago manual of style”). These are better candidates, since it’s more likely users actually select items from their results, but they still aren’t a great representation of all the types of searches that occur.

Why are these types of single-word searches not the best test cases? Because search phrases necessarily have a long tail distribution; the most popular searches aren’t that popular in the context of the total quantity of searches performed 2. There are many distinct search queries that were only ever executed once. Our most popular search of “artstor”? It was executed 122 times over the past two years. Yet we’ve had somewhere near 25,000 searches in the past six months alone. This supposedly popular phrase has a negligible share of that total. Meanwhile, just because a search for “How to Hack it as a Working Parent. Jaclyn Bedoya, Margaret Heller, Christina Salazar, and May Yan. Code4Lib (2015) iss. 28” has only been run once doesn’t mean it doesn’t represent a type of search—exact citation search—that is fairly common and worth examining, since broken links during known item searches are more likely to be frustrating.

Even our 500 most popular searches evince a long tail distribution.

So let’s say we resolve the problem of which searches to choose by creating a taxonomy of search types, from single-word subjects to copy-pasted citations. 3 We can select a few real world samples of each type to use in our study. Yet we still haven’t decided which search results we’re going to examine! Luckily, this proves much easier to resolve. People don’t look very far down in the search results 4, rarely scrolling past the first “page” listed (Summon has an infinite scroll so there technically are no pages, but you get the idea). Only items within the first ten results are likely to be selected.

Once we have our searches and know that we want to examine only the first ten or so results, my next thought is that it might be worth filtering out results that are unlikely to have problems. But does skipping the records from our catalog, institutional repository, LibGuides, etc. make other problems seem disproportionately common? After all, these sorts of results are likely to work since we supply those links to Summon directly. Also, our users do not heavily employ facets—they would be unlikely to filter out results from the library catalog. 5 In a way, by focusing a study on the search results that are most likely to fail and thus give us information about underlying linking issues, we’re diverging from the typical search experience. In the end, I think it’s worthwhile to stay true to more realistic search patterns and not apply, for instance, a “Full Text Online” filter which would exclude our library catalog.

Next Time on Tech Connect—oh how many ways can things go wrong?!? I’ll start investigating broken links and attempt to enumerate their differing natures.

Notes

  1. This script was largely copied from Robert Hoyt of Fairfield University, so all credit due to him.
  2. For instance, see: Beitzel, S. M., Jensen, E. C., Chowdhury, A., Frieder, O., & Grossman, D. (2007). Temporal analysis of a very large topically categorized web query log. Journal of the American Society for Information Science and Technology, 58(2), 166–178. “… it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume”
  3. Ignore, for the moment, that this taxonomy’s constitution is an entire field of study to itself.
  4. Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823.
  5. In fact, the most common facet used in our discovery layer is “library catalog” showing that users often want only bibliographic records; the precise opposite of a search aimed at only retrieving article database results.

Managing ILS Updates

We’ve done a few screencasts in the past here at TechConnect and I wanted to make a new one to cover a topic that’s come up this summer: managing ILS updates. Integrated Library Systems are huge, unwieldy pieces of software and it can be difficult to track what changes with each update: new settings are introduced, behaviors change, bugs are (hopefully) fixed. The video belows shows my approach to managing this process and keeping track of ongoing issues with our Koha ILS.

Names are Hard

A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “My system will never have to deal with names from China/Japan/Korea” were apparent everywhere. I consider myself a fairly critical and studious person; I devote time to thinking about the consequences of design decisions and carefully attempt to avoid poor assumptions. But I’ve repeatedly run into trouble when handling personal names as data. There is a cognitive dissonance surrounding names; we treat them as rigid identifiers when they’re anything but. We acknowledge their importance but struggle to take them seriously.

Names change. They change due to marriage, divorce, child custody, adoption, gender identity, religious devotion, performance art, witness protection, or none of these at all. Sometimes people just want a new name. And none of these reasons for change are more or less valid than others, though our legal system doesn’t always treat them equally. We have students who change their legal name, which is often something systems expect, but then they have the audacity to want to change their username, too! And that works less often because all sorts of system integrations expect usernames to be persistent.

Names do not have a universal structure. There is no set quantity of components in a name nor an established order to those components. At my college, we have students without surnames. In almost all our systems, surname is a required field, so we put a period “.” there to satisfy that requirement. Then, on displays in our digital repository where surnames are assumed, we end up with bolded section headers like “., Johnathan” which look awkward.

Many Western names might follow a [Given name] – [Middle name] – [Surname] structure and an unfortunate number of the systems I have to deal with assume all names share this structure. It’s easy to see how this yields problematic results. For instance, if you want to see a sorted list of users, you probably want to sort by family name, but many systems sort by the name in the last position, causing, for instance, Chinese names 1 to be handled differently from Western ones. 2 But it’s not only that someone might not have a middle name, or might have two middle names, or might have a family name in the first position—no, even that would be too simple! Some name components defy simple classifications. I once met a person named “Bus Stop”. “Stop” is clearly not a family affiliation, despite coming in the final position of the name. Sometimes the second component of a tripartite Western name isn’t a middle name at all, but a maiden name or the second word of a two-word first name (e.g. “Mary Anne” or “Lady Bird”)! Even when a name appears to follow a familiar structure, one cannot determine the role of each of its pieces just by looking!

Names are also contextual. One’s name with family, with legal institutions, and with classmates can all differ. Many of our international students have alternative Westernized first names. Their family may call them Qiáng but they introduce themselves as Brian in class. We ask for a “preferred name” in a lot of systems, which is a nice step forward, but don’t ask when it’s preferred. Names might be meant for different situations. We have no system remotely ready for this, despite the personalization that’s been seeping into web platforms for decades.

So if names are such trouble, why not do our best and move on? Aren’t these fringe cases that don’t affect the vast majority of our users? These issues simply cannot be ignored because names are vital. What one is called, even if it’s not a stable identifier, has great effects on one’s life. It’s dispiriting to witness one’s name misspelled, mispronounced, treated as an inconvenience, botched at every turn. A system that won’t adapt to suit a name delegitimizes the name. It says, “oh that’s not your real name” as if names had differing degrees of reality. But a person may have multiple names—or many overlapping names over time—and while one may be more institutionally recognized at a given time, none are less real than the others. Even if only a single student a year is affected, affirming their name(s) is the absolute least amount of respect we can show.

So what do we do? Endless enumeration of the difficulties of working with names does little but paralyze us. Honestly, when I think about the best implementation of personal names, the MODS metadata schema comes to mind. Having a <name> element with any number of <namePart> children is the best model available. The <namePart>s can be ordered in particular ways, a “@type” attribute can define a part’s function 3, a record can include multiple names referencing the same person, multiple names with distinct parts can be linked to the same authority record, etc. MODS has a flexible and comprehensive treatment of name data. Unfortunately, returning to “Falsehoods Programmers Believe”, none of the library systems I administer do anywhere near as good a job as this metadata schema. Nor is it necessarily a problem of Western bias—even the Chinese government can’t develop computer systems to accurately represent the names of people in the country, or even agree on what the legal character set should be! 4 It seems that programmers start their apps by creating a “users” database table with columns for unique identifier, username, “firstname”/”lastname” [sic], and work from there. On the bright side, at least the name isn’t used as the identifier! We all learned that in databases class, but we didn’t learn to make “names” a separate table linked to “users” in our relational databases.
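To make that model concrete, here is a rough sketch—my own illustration, not code from any particular MODS tool—of treating a name as an ordered list of typed parts the way MODS does; the example names and the “personal” type value are placeholders.

# Build a MODS-style <name> from an ordered list of typed parts.
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)

def mods_name(parts):
    """Build a <mods:name> from ordered (type, value) pairs; type may be None
    for parts that defy classification."""
    name = ET.Element(f"{{{MODS}}}name", {"type": "personal"})
    for part_type, value in parts:
        part = ET.SubElement(name, f"{{{MODS}}}namePart")
        if part_type:
            part.set("type", part_type)
        part.text = value
    return name

# A name with familiar given/family parts...
print(ET.tostring(mods_name([("given", "Mary Anne"), ("family", "Evans")]), encoding="unicode"))
# ...and a mononym, which is simply one untyped part.
print(ET.tostring(mods_name([(None, "Johnathan")]), encoding="unicode"))

Nothing about this structure assumes a quantity, order, or classification of parts, which is exactly what makes it a better fit for real names than a firstname/lastname pair of columns.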

In my day-to-day work, the best I’ve done is to be sensitive to the importance of name changes specifically and how our systems handle them. After a few meetings with a cross-departmental team, we developed a name change process at our college. System administrators from across the institution are on a shared listserv where name changes are announced. In the libraries, I spoke with our frontline service staff about assisting with name changes. Our people at the circulation desk know to notice name discrepancies—sometimes a name badge has been updated but not our catalog records, and we can offer to make them match—but also to guide students who may need to contact the registrar or other departments on campus to initiate the top-down name change process. While most of the library’s systems don’t easily accommodate username changes, I can write administrative scripts for our institutional repository that alter the ownership of a set of items from an old username to a new one. I think it’s important to remember that we’re inconveniencing the user with the work of implementing their name change, not the other way around. So taking whatever extra steps we can on our own, without pushing labor onto our students and staff, is the best way to mitigate how poorly our tools support the protean nature of personal names.

Notes

  1. Chinese names typically have the surname first, followed by the given name.
  2. Another poor implementation can be seen in The Chicago Manual of Style‘s indexing instructions, which has an extensive list of exceptions to the Western norm and how to handle them. But CMoS provides no guidance on how one would go about identifying a name’s cultural background or, for instance, identifying a compound surname.
  3. Although the MODS user guidelines sadly limit the use of the type attribute to a fixed list of values which includes “family” and “given”, rendering it subject to most of the critiques in this post. Substantially expanding this list with “maiden”, “patronymic/matronymic” (names based on a parental given name, e.g. Mikhailovich), and more, as well as some sort of open-ended “other” option, would be a great improvement.
  4. https://www.nytimes.com/2009/04/21/world/asia/21china.html

Reflections on Code4Lib 2018

A few members of Tech Connect attended the recent Code4Lib 2018 conference in Washington, DC. If you missed it, the full livestream of the conference is on the Code4Lib YouTube channel. We wanted to highlight some of our favorite talks and tie them into the work we’re doing.

Also, it’s worth pointing to the Code4Lib community’s Statement in Support of opening keynote speaker Chris Bourg. Chris offered some hard truths in her speech that angry men on the internet, predictably, were unhappy about, but it’s a great model that the conference organizers and attendees promptly stood in support.


Ashley:

One of my favorite talks at Code4lib this year was Amy Wickner’s talk, “Web Archiving and You / Web Archiving and Us.” (Video, slides) I felt this talk really captured some of the essence of what I love most about Code4lib, this being my 4th conference in the past 5 years. (And I believe it was Amy’s first!) The talk was about a technical topic relevant to collecting libraries, handled in a way that acknowledges and prioritizes the essential personal component of any technical endeavor. That is what I found so wonderful about Amy’s talk, and it is what I find so refreshing about Code4lib as an inherently technical conference with intentionality behind its human aspects.

Web archiving is something many of us are interested in but find overwhelming to begin tackling. I mean, the internet is just so big. Amy brought forth a sort of proposal for ways in which a person or institution can begin thinking about how to start a web archiving project, focusing first on the significance of appraisal. Wickner, citing Terry Cook, spoke of the “care and feeding of archives” and of thinking about appraisal as storytelling. I think this is a great way to make a big internet seem smaller: understanding the importance of care in appraisal while acknowledging that, for web archiving, it is an essential practice. Who gets represented in web archives is more likely to be a deliberate appraisal choice than it historically has been for other formats.

This statement resonated with me: “Much of the power that archivists wield are in how we describe or create metadata that tells a story of a collection and its subjects.”

And also: For web archives, “the narrative of how they are built is closely tied to the stories they tell and how they represent the world.”

Wickner went on to discuss how web archives are and will be used, and who they will be used by, giving some examples while emphasizing there are many more. We must learn to “critically read as much as learn to critically build” web archives, acknowledging that web archives exist both within and outside of institutions. And personal web archiving can be as simple as replacing links in documents with perma.cc, Wayback Machine, or WebRecorder links.

Another topic I enjoyed in this talk was the celebration of precarious web content through community storytelling on Twitter with the hashtags #VinesWithoutVines and #GifHistory, two brief but joyous moments.


Bohyun:

The part of this year’s Code4Lib conference that I found most interesting was the talks and the discussion at a breakout session related to machine learning and deep learning. Machine learning is a subfield of artificial intelligence and deep learning is a kind of machine learning that utilizes hidden layers between the input layer and the output layer in order to refine and produce the algorithm that best represents the result in the output. Once such algorithm is produced from the data in the training set, it can be applied to a new set of data to predict results. Deep learning has been making waves in many fields such as Go playing, autonomous driving, and radiology to name a few. There were a few different talks on this topic ranging from reference chat sentiment analysis to feature detection (such as railroads) in the map data using the convolutional neural network model.

“Deep Learning for Libraries,” presented by Lauren Di Monte and Nilesh Patil from the University of Rochester, was the most practical one among those talks, as it started with a specific problem to solve and resulted in action that will address the problem. In their talk, Di Monte and Patil showed how they applied deep learning techniques to a problem in their library’s space assessment. The problem they wanted to solve was determining how many people visit the library to use its space and services and how many people are simply passing through to get to another building or to the campus bus stop that is adjacent to the library. Not knowing this made it difficult for the library to decide on the appropriate staffing level or the hours that best serve users’ needs. It also prevented the library from demonstrating its reach and impact with data and advocating for needed resources or budget to the decision-makers on campus. The goal of their project was to develop automated and scalable methods for conducting space assessment and reporting tools that support decision-making for operations, service design, and service delivery.

For this project, they chose an area bounded by four smart control access gates on the first floor. They obtained the log files (with data at the sensor level, minute by minute) from the eight bi-directional sensors on those gates and analyzed the data in order to create a recurrent neural network model. They trained the model so that they could predict future incoming and outgoing traffic in that area and visually present those findings as a data dashboard application. For data preparation, processing, and modeling, they used Python. The tools used included Seaborn, Matplotlib, Pandas, NumPy, SciPy, TensorFlow, and Keras. They picked a recurrent neural network with stochastic gradient descent optimization, which is less complex than a time series model. For data visualization, they used Tableau. The project code is available at the library’s GitHub repo: https://github.com/URRCL/predicting_visitors.
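For readers who want a feel for the shape of such a model, here is a generic sketch—emphatically not the Rochester team’s code, which lives in the repository above—of a small recurrent network compiled with stochastic gradient descent in Keras; the array shapes, layer sizes, and random data are placeholders.

# A tiny recurrent model trained with SGD; shapes are illustrative only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.optimizers import SGD

# Pretend data: 1,000 samples of 60 one-minute readings from 8 gate sensors,
# each labeled with incoming and outgoing counts for the following interval.
X = np.random.rand(1000, 60, 8)
y = np.random.rand(1000, 2)

model = Sequential([
    SimpleRNN(32, input_shape=(60, 8)),  # learns temporal patterns in gate counts
    Dense(2),                            # predicts incoming and outgoing traffic
])
model.compile(optimizer=SGD(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)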

Their project results led the library to install six more gates in order to get a better overview of the library space usage. As a side benefit, the library was also able to pinpoint the times when the gates malfunctioned and communicate the issue to the gate vendor. Di Monte and Patil plan to hand over this project to the library’s assessment team for ongoing monitoring and, as a next step, to look for ways to map the library’s traffic flow across multiple buildings.

Overall, there was a lot of interest in machine learning, deep learning, and artificial intelligence at the Code4Lib conference this year. The breakout session I led at the conference on these topics produced a lively discussion covering a variety of tools, current and future projects at many different libraries, and the impact of rapidly developing AI technologies on society. This breakout session also generated the #ai-dl-ml channel in the Code4Lib Slack Space. The growing interest in these areas is also shown in the newly formed Machine and Deep Learning Research Interest Group of the Library and Information Technology Association. I hope to see more talks and discussion on these topics at future Code4Lib and other library technology conferences.


Eric:

One of the talks which struck me the most this year was Matthew Reidsma’s Auditing Algorithms. He used examples of search suggestions in the Summon discovery layer to show biased and inaccurate results:

In 2015 my colleague Jeffrey Daniels showed me the Summon search results for his go-to search: “Stress in the workplace.” Jeff likes this search because ‘stress’ is a common engineering term as well as one common to psychology and the social sciences. The search demonstrates how well a system handles word proximities, and in this regard, Summon did well. There are no apparent results for evaluating bridge design. But Summon’s Topic Explorer, the right-hand sidebar that provides contextual information about the topic you are searching for, had an issue. It suggested that Jeff’s search for “stress in the workplace” was really a search about women in the workforce. Implying that stress at work was caused, perhaps, by women.

This sort of work is not, for me, novel or groundbreaking. Rather, it was so important to hear because of its relation to similar issues I’ve been reading about since library school: from the bias present in Library of Congress subject headings, where “Homosexuality” used to be filed under “Sexual deviance”, to Safiya Noble’s work on the algorithmic bias of major search engines like Google, where her queries for the term “black girls” yielded pornographic results. Our systems are not neutral but reify the existing power relations of our society. They reflect the dominant, oppressive forces that constructed them. I contrast LC subject headings and Google search suggestions intentionally; this problem is as old as the organization of information itself. Whether we use hierarchical, browsable classifications developed by experts or estimated proximities generated by an AI with massive amounts of user data at its disposal, there will be oppressive misrepresentations if we don’t work to prevent them.

Reidsma’s work engaged with algorithmic bias in a way I found relatable since I manage a discovery layer. The talk made me want to immediately implement his recording script in our instance so I can start looking for and reporting problematic results. It also touched on some of what makes me despair in library work lately—our reliance on vendors and their proprietary black boxes. We’ve had a number of issues lately related to full-text linking that are confusing for end users and make me feel powerless. I submit support ticket after support ticket only to be told there’s no timeline for the fix.

On a happier note, there were many other talks at Code4Lib that I enjoyed and admired: Chris Bourg gave a rousing opening keynote featuring a rallying cry against mansplaining; Andreas Orphanides, who keynoted last year’s conference, gave yet another great talk on design and systems theory full of illuminating examples; Jason Thomale’s introduction to Pycallnumber wowed me and gave me a new tool I immediately planned to use; Becky Yoose navigated the tricky balance between using data to improve services and upholding our duty to protect patron privacy. I fear I’ve not mentioned many more excellent talks but I don’t want to ramble any further. Suffice to say, I always find Code4Lib worthwhile and this year was no exception.

Yet Another Library Open Hours App

a.k.a. Yet ALOHA.

It’s a problem as old as library websites themselves: how to represent the times when a library building is open in a way that’s easy for patrons to understand and easy for staff to update?

Every website or content management system has its own solution, and none quite suits our needs. In a previous position, I remember using a Drupal module which looked slick and had a nice menu for entering data on the administrative side…but it was made by a European developer and displayed dates in the (inarguably more logical) DD/MM/YYYY format. I didn’t know enough PHP at the time to fix it, and it would’ve confused our users, so I scrapped it.

Then there’s the practice of simply manually updating an HTML fragment that has the hours written out. This approach has advantages that aren’t easily dismissed: you can write out detailed explanations, highlight one-off closures, adjust to whatever oddity comes up. But it’s tedious for staff to edit a web page and easy to forget. This is especially true if hours information is displayed in several places; keeping everything in sync is an additional burden, with a greater possibility for human error. So when we went to redesign our library website, developing an hours application that made entering data and then reusing it in multiple places easy was at the forefront of my mind.

Why is this so hard?

One might think displaying hours is easy; the end products often look innocuous. But there’s a bevy of reasons why it’s complicated for many libraries:

  • open hours differ across different branches
  • hours of particular services within a branch may not fully overlap with the library building’s open hours
  • a branch might close and re-open during the day
  • a branch might be open later than midnight, so technically “closing” on a date different than when it opened
  • holidays, campus closures, unexpected emergencies, and other exceptions disrupt regular schedules
  • in academia, schedules differ depending on whether class is in session, it’s a break during a term, or it’s a break between terms
  • the staff who know or determine a branch’s open hours aren’t necessarily technically skilled and may be spread across disparate library departments
  • dates and times are unique forms of data with their own unique displays, storage types, and operations (e.g. chronological comparisons)

Looking at other libraries, the struggle to represent their business hours is evident. For instance, the University of Illinois has an immense list of library branches and their open hours on its home page. There’s a lot to like about the display: it’s on the home page so patrons don’t have to go digging for the info, there’s a filter-by-name feature, the distinct open/closed colors help one identify at a glance which places are open, and the library branch rows expand with extra information. But it’s also an overwhelming amount of information that runs longer than a typical laptop screen.

Hours display on the UIUC Libraries’ home page.

Many libraries use SpringShare’s LibCal as a way of managing and displaying their open hours. See Loyola’s Hours page with its embedded table from LibCal. As a disclaimer, I’ve not used LibCal, but it comes with some obvious caveats: it’s a paid service that not all libraries can afford and it’s Yet Another App outside the website CMS. I’ve also been told that the hours entry has a learning curve and that it’s necessary to use the API for most customization. So, much as I appreciate the clarity of the LibCal schedule, I wanted to build an hours app that would work well for us, providing flexibility in terms of data format and display.

Our Hours

Our website CMS Wagtail uses a concept called “snippets” to store pieces of content which aren’t full web pages. If you’re familiar with Drupal, Snippets are like a more abstract version of Blocks. We have a snippet for each staff member, for instance, so that we can connect particular pages to different staff members but also have a page where all staff are displayed in a configurable list. When I built our hours app, snippets were clearly the appropriate way to handle the data. Ideally, hours would appear in multiple places, not be tied to a single page. Snippets also have their own section in the CMS admin side which makes entering them straightforward.

Our definition of an “open hours” snippet has but a few components:

  • the library branch the hours are for
  • the date range being described, e.g. “September 5th through December 15th” for our Fall semester
  • a list of open hours for each weekday, e.g. Monday = “8am – 10pm”, Tuesday = “8am – 8pm”, etc.

There are some nuances here. First, for a given academic term, staff have to enter hours once for each branch, so there is quite a bit of data entry. Secondly, the weekday hours are actually stored as text, not a numeric data type. This lets us add parentheticals such as “8am – 5pm (no checkouts)”. While I can see some theoretical scenarios where having numeric data is handy, such as determining if a particular branch is open on a given hour on a given date, using text simplified building the app’s data model for me and data entry for staff.

But what about when the library closes for a holiday? Each holiday effectively triples the data entry for a term: we need a data set for the time period leading up to the holiday, one for the holiday itself, and one for the time following it. For example, when we closed for Thanksgiving, our Fall term would’ve been split into pre-Thanksgiving, Thanksgiving, and post-Thanksgiving data sets. And so on for each additional holiday.

To alleviate the holiday problem, I made a second snippet type called “closures”. Closures let us punch holes in a set of open hours; rather than require pre- and post- data sets, we have one open hours snippet for the whole term and then any number of closures within it. A closure is composed of only a library branch and a date range. Whenever data about open hours is passed around inside our CMS, the app first consults the list of closures and then adjusts appropriately.
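To give a sense of how little data modeling this requires, here is a condensed sketch of the two snippet types; the model and field names are illustrative rather than our actual code, and import paths differ between Wagtail versions.

# Illustrative Wagtail snippet models for open hours and closures.
from django.db import models
from wagtail.snippets.models import register_snippet

@register_snippet
class OpenHours(models.Model):
    branch = models.CharField(max_length=100)   # which library branch
    start_date = models.DateField()             # e.g. first day of the Fall term
    end_date = models.DateField()
    # Weekday hours are plain text so staff can write "8am - 5pm (no checkouts)"
    monday = models.CharField(max_length=100, blank=True)
    tuesday = models.CharField(max_length=100, blank=True)
    # ...one CharField per remaining weekday

@register_snippet
class Closure(models.Model):
    """Punches a hole in an OpenHours range, e.g. a holiday or emergency closure."""
    branch = models.CharField(max_length=100)
    start_date = models.DateField()
    end_date = models.DateField()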

The open hours for the current day are displayed prominently on our home page. When we rebuilt our website, surfacing hours information was a primary design goal. Our old site’s hours page wasn’t exactly easy to find…yet it was the second most-visited page behind the home page. 1 In our new site, the hours app allows us to show the same information in a few places, for instance as a larger table that shows our open times for a full week. The page showing the full table will also accept a date parameter in its URL, showing our schedule for future times. This lets us put up a notice about changes for periods like Thanksgiving week or Spring break.

Hours API

What really excited me about building an hours application from the ground up was the chance to include an API (inside the app’s views.py file, which in turn uses a couple functions from models.py). The app’s public API endpoint is at https://libraries.cca.edu/hours?format=json and by default it returns the open hours for the current day for all our library branches. The branch parameter allows API consumers to get the weekly schedule for a single branch while the date parameter lets them discover the hours for a specific date.

// GET https://libraries.cca.edu/hours/?format=json
{
    "Materials": "11am - 4pm",
    "Meyer": "8am - 5pm",
    "Simpson": "9am - 6pm"
}

I’m using the API in two places: on our library catalog home page and as an HTML snippet shown when users search our discovery layer for “hours” or “library hours”. I have hopes that other college websites will also want to reuse this information, for instance on our student portal or on a campus map. One can see the limitation of using text strings as the data format for temporal intervals; an application trying to use this API to determine “is a given library open at this instant” would have to do a bit of parsing to determine if the current time falls within the range. In the end, though, the benefits for data entry and straightforward display make text the best choice for us.
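Here is a quick example of how another campus application might consume the endpoint; the branch name comes from the sample response above, while the exact date format is my assumption and worth checking against the app.

# Consuming the hours API with the parameters described above.
import requests

BASE = "https://libraries.cca.edu/hours/"

# Today's hours for every branch
today = requests.get(BASE, params={"format": "json"}).json()

# One branch's weekly schedule, and every branch's hours on a future date
meyer_week = requests.get(BASE, params={"format": "json", "branch": "Meyer"}).json()
spring_break = requests.get(BASE, params={"format": "json", "date": "2019-03-25"}).json()

print(today, meyer_week, spring_break, sep="\n")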

To summarize, the hours app fulfills our goals for the new website in a few ways. It allows us to surface our schedule not only on our home page but also in other places, sets us up to be able to reuse the information in even more places, and minimizes the burden of data entry on our staff. There are still improvements to be made—as I was writing this post I discovered a problem with cached API responses being outdated—but on the whole I’m very happy with how everything worked out.

Notes


  1. Libraries, I beg you, make your open hours obvious! People want to know.

Working with a Web Design Firm

As I’ve mentioned in the previous post, my library is undergoing a major website redesign. As part of that process, we contracted with an outside web design and development firm to help build the theme layer. I’ve done a couple major website overhauls in the course of my career, but never with an outside developer participating so much. In fact, I’ve always handled the coding part of redesigns entirely by myself as I’ve worked at smaller institutions. This post discusses what the process has been like in case other libraries are considering working with a web designer.

An Outline

To start with, our librarians had already been working to identify components of other library websites that we liked. We used Airtable, a more dynamic sort of spreadsheet, to collect our ideas and articulate why we liked certain peer websites, some of which were libraries and some not (mostly museums and design companies). From prior work, we already knew we wanted a few different page template types. We organized our ideas around how they fit into these templates, such as a special collections showcase, a home page with a central search box, or a text-heavy policy page.

Once we knew we were going to work with the web development firm, we had a conference call with them to discuss the goals of our website redesign and show the contents of our Airtable. As we’re a small art and design library, our library director was actually the one to create an initial set of mockups to demonstrate our vision. Shortly afterwards, the designer had his own visual mockups for a few of our templates. The mockups included inline comments explaining stylistic choices. One aspect I liked about their mockups was that they were divided into desktop and mobile; there wasn’t just a “blog post” example, but a “blog post on mobile” and “blog post on desktop”. This division showed that the designer was already thinking ahead towards how the site’s theme would function on a variety of devices.

With some templates in hand, we could provide feedback. There was some push and pull—some of our initial ideas the designer thought were unimportant or against best practices, while we also had strong opinions. The discussion was interesting for me, as someone who is a librarian foremost but empathetic to usability concerns and web conventions. It was good to have a designer who didn’t mindlessly follow our every request; when he felt a stylistic choice was counterproductive, he could articulate why, and that changed a few of our ideas. However, on some principles we were insistent. For instance, we wanted to avoid multiple search boxes on a single page—not a central catalog search and a site search in the header. I find that users are easily confused when confronted with two search engines and struggle to distinguish the different purposes and domains of each. The designer thought that it was a common enough pattern to be familiar to users, but our experiences led us to insist otherwise.

Finally, once we had settled on agreeable mockups, a frontend developer turned them into code with an impressive turnaround; about 90% of the mockups were implemented within a week and a half. We weren’t given something like Drupal or WordPress templates; we received only frontend code (CSS, JavaScript) and some example templates showing how to structure our HTML. It was all in a single git repository complete with fake data, Mustache templates, and instructions for running a local Node.js server to view the examples. I was able to get the frontend repo working easily enough, but it was a bit surprising to work with code completely decoupled from its eventual destination. If we had had more funds, I would have liked to have the web design firm go all the way to implementing their theme in our CMS, since I did struggle in a few places when combining the two (more on that later). But, like many libraries, we’re frugal, and it was a luxury to get this kind of design work at all.

The final code took a few months to deliver, mostly due to a single user interface bug we pointed out that the developer struggled to recreate and then fix. I was ready to start working with the frontend code almost exactly a month after our first conversation with the firm’s designer. The total time from that conversation to signing off on the final templates was a little under two months. Given our hurried timeline for rebuilding our entire site over the summer, that quick delivery was a serious boon.

Code Quirks

I’ve a lot of opinions about how code should look and be structured, even if I don’t always follow them myself. So I was a bit apprehensive working with an outside firm; would they deliver something highly functional but structured in an alien way? Luckily, I was pleasantly surprised with how the CSS was delivered.

First of all, the designer didn’t use plain CSS; he used SASS, which Margaret wrote about previously on Tech Connect. SASS adds several nice tools to CSS, from variables to darken and lighten functions for adjusting colors. But perhaps most importantly, it gives you much more control when structuring your stylesheets, using imports, nested selectors, and mixins. Basically, SASS is the antithesis of having one gigantic CSS file with thousands of lines. Instead, the frontend code we were given was about fifty files neatly divided by our different templates and some reusable components. Here’s the directory tree of the SASS files:

components
    about-us
    blog
    collections
    footer
    forms
    header
    home
    misc
    search
    services
fonts
reset
settings
utilities

Other than the uninformative “misc”, these folders all have meaningful names (“about-us” and “collections” refer to styles specific to particular templates we’d asked for) and it never takes me more than a moment to locate the styles I want.

Within the SASS itself, almost all styles (excepting the “reset” portion) hinge on class names. This is a best practice for CSS since it doesn’t couple your styles tightly to markup; whether a particular element is a <div>, <section>, or <article>, it will appear correctly if it bears the right class name. When our new CMS output some HTML in an unexpected manner, I was still able to utilize the designer’s theme by applying the appropriate class names. Even better, the class names are written in BEM “Block-Element-Modifier” form. BEM is a methodology I’d heard of before and read about, but never used. It uses underscores and dashes to show which high-level “block” is being styled, which element inside that block, and what variation or state the element takes on. The introduction to BEM nicely defines what it means by Block-Element-Modifier. Its usage is evident if you look at the styles related to the “see next/previous blog post” pagination at the bottom of our blog template:

.blog-post-pagination {
  border-top: 1px solid black(0.1);
 
  @include respond($break-medium) {
    margin-top: 40px;
  }
}
 
  .blog-post-pagination__title {
    font-size: 16px;
  }
 
  .blog-post-pagination__item {
    @include clearfix();
    flex: 1 0 50%;
 }
 
  .blog-post-pagination__item--prev {
    display: none;
  }

Here, blog-post-pagination is the block, __title and __item are elements within it, and the --prev modifier affects just the “previous blog post” item element. Even in this small excerpt, other advantages of SASS are evident: the respond mixin and $break-medium variable for writing responsive styles that adapt to differing device screen sizes, the clearfix include, and these related styles all being nested inside the brackets of the parent blog-post-pagination block.

Trouble in Paradise

However, as much as I admire the BEM class names and the structure of the styles given to us, of course I can’t be perfectly happy. As I’ve started building out our site I’ve run into a few obvious problems. First of all, while all the components and templates we’d asked for are well-designed with clearly written code, there’s no generic framework for adding on anything new. I’d hoped, and to be honest simply assumed, that a framework like Bootstrap or Foundation would be used as the basis of our styles, with more specific CSS for our components and templates. Instead, apart from a handful of minor utilities like the clearfix include referenced above, everything that we received is intended only for our existing templates. That’s fine up to a point, but as soon as I went to write a page with an HTML table in it I noticed there was no styling whatsoever.

Relatedly, since the class names are so focused on distinct blocks, when I want to write something similar but slightly different I end up with a bunch of misleading class names. Some of our non-blog pages, for instance, have templates littered with class names bearing a .blog- prefix. The easiest way for me to build them was to co-opt the blog styles, but now the HTML looks misleading. I suppose if I had more time I could write new styles that simply copy the blog ones under new names, but that also seems unideal in that it’s a) a lot more work and b) a lot of redundant code.

Lastly, the way our CMS handles “rich text” fields (think: HTML edited in a WYSIWYG editor, not coded by hand) has caused numerous problems for our theme. The rich text output is always wrapped in a <div class="rich-text">, which made translating some of the HTML templates from the frontend code a bit tricky. The frontend styles also included a “reset” stylesheet which erased the default styles for most HTML tags. That’s fine, and a common approach for most sites, but many of the styles for elements available in the rich text editor ended up being reset. As content authors went about creating lower-level headings and unordered lists, they discovered that those elements appeared as plain text.

Reflecting on these issues, I think they boil down primarily to insufficient communication on our part. When we first asked for design work, it was very much centered around the specific templates we wanted to use for a few different sections of our site. I never specifically outlined a need for a generic framework which could encompass new, unanticipated types of content. While there was an offhand mention of Bootstrap early on in our discussions, I didn’t make it explicit that I’d like it or something similar to form the backbone of the styles we wanted. I should have also made it clearer that the styles should specifically anticipate working within our CMS and alongside rich text content. Instead, by the time I realized some of these issues, we had already approved much of the frontend work as complete.

Conclusion

For me, as someone who has worked at smaller libraries for the duration of my professional career, working with a web design company was a unique experience. I’m curious: has your library contracted for design or web development work? Was it successful or not? As tech-savvy librarians, we’re often asked to do everything, even if some of the tasks are beyond our skills. Working with professionals was a nice break from that and a learning experience. If I could do anything differently, I’d be more assertive about requirements in our initial talks. Outlining expectations that the styles should include a generic framework and anticipate working with our particular CMS would have saved me some time and headaches later on.

Information Architecture for a Library Website Redesign

My library is about to embark upon a large website redesign during this summer semester. This isn’t going to be just a new layer of CSS, or a minor version upgrade to Drupal, or moving a few pages around within the same general site. No, it’s going to be a huge, sweeping change that affects the whole of our web presence. With such an enormous task at hand, I wanted to discuss some of the tools and approaches that we’re using to make sure the new site meets our needs.

Why Redesign?

I’ve heard about why the wholesale website redesign is a flawed approach and why we should instead be continually, iteratively working on our sites. Continual changes stop problems from building up, and they avoid the disruption that large swaths of changes cause for users accustomed to the old site. The gradual redesign makes a lot of sense to me, and also seems like a complete luxury that I’ve never had in my library positions.

The primary problem with a series of smaller changes is that the approach assumes a solid foundation to begin with. Our current site, however, has a host of interconnected problems that makes tackling any individual issue a challenge. It’s like your holiday lights sitting in a box all year; they’re hopelessly tangled by the time you take them out again.

Our site has decades of discarded, forgotten content. That’s mostly harmless; it’s hard to find and sees virtually no traffic. But it’s still not great to have outdated information scattered around. In particular, I’m not thrilled that a lot of it is static HTML, images, and documents sitting outside our content management system. It’s hard to know how much content we even have because it cannot be managed in one place.

We also fell into a pattern of adding content to the site but never removing or re-organizing existing content. Someone would ask for a button here, or a page dictating a policy there, or a new FAQ entry. Pages that were added didn’t have particular owners responsible for their currency and maintenance; I, as Systems Librarian, was expected to run the technical aspects of the site but also be its primary content editor. That’s simply an impossible task, as I don’t know every detail of the library’s operations or have the time to keep on top of a menagerie of pages of dubious importance.

I tried to create a “website changes form” to manage things, but it didn’t work for staff or for me. The few staff who did fill out the form ended up requesting things that were difficult to do—large theme changes that I wasn’t comfortable making without user testing or approval from our other librarians. The little content that was added was minor text ferried through the form and through me, essentially slowing down the editorial process and furthering the idea that web content was solely my domain.

To top our content troubles off, we’re also on an unsupported, outdated version of Drupal. Upgrading or switching a CMS isn’t necessarily related to a website redesign. If you have a functional website on a broken piece of software, you probably don’t want to toss out the good with the bad. But in our case, similar to how our ILS migration gave us the opportunity to clean up our bibliographic records, a CMS migration gives us a chance to rebuild a crumbling website. It just doesn’t make sense to invest technical effort in migrating all our existing content when it’s so clearly in need of major structural change.

Card Sort

Cards in the middle of being constructed.

Not wanting to go into a redesign process blind, we set out to collect data on our current site and how it could be improved. One of the first ways we gathered data was to ask all library staff to perform a card sort. A card sort is an activity wherein pieces of web content are put on cards which can then be placed into categories; the idea is to form a rough information architecture for your site which can dictate structure and main menus. You can do either open or closed card sorts, meaning the categories are up to the participants to invent or provided ahead of time.

For our card sort, I chose an open sort since we were so uncertain about the categories. I then selected web content based on our existing site’s analytics. It was clear to me that our current site was bloated and disorganized; there were pages tucked into the nooks of cyberspace that no one had visited in years, and all sorts of overlapping and unnecessary content. So I selected ≈20 popular pages but also gave each group two pieces of blank paper on which to add whatever content they felt was missing.

Finally, trying to get as much useful data as possible, I modified the card sort procedure in a couple of ways. I asked people to role play as different types of stakeholders (graduate & undergraduate students, faculty, administrators) and to justify their decisions from that vantage point. I also had everyone, after sorting was done, put dots on content they felt was important enough for the home page. Since one of our current site’s primary challenges is maintenance, or the lack thereof, I wanted to add one last activity wherein participants would write a “responsible staff member” on each card (e.g. the instruction librarian maintains the instruction policy page). Sadly, we ran out of time and couldn’t do that bit.

The results of the card sort were informative. A few categories emerged as commonalities across everyone’s sorts: collections, “about us”, policies, and current events/news. We discovered a need for new content to cover workshops, exhibits, and events happening in the library, which were currently only represented (and not very well) in blog posts. In terms of the home page, it was clear that LibGuides, collections, news, and most importantly our open hours needed to be represented.

Treejack & Analytics

Once we had enough information to build out the site’s architecture, I organized our content into a few major categories. But there were still several questions on my mind: would users understand terms like “special collections”? Would they understand where to look for LibGuides? Would they know how to find the right contact for various questions? To answer some of these questions, I turned to Optimal Workshop’s “Treejack” tool. Treejack tests a site’s information architecture by having users navigate plain text links to perform basic tasks. We created a few tasks aimed at answering our questions and recruited students to perform them. While we’re only using the free tier of Optimal Workshop, and only recruited student stakeholders, the data was still informative.

For one, Optimal Workshop’s results data is rich and visualized well. It shows the exact routes each user took through our site’s content, the time it took to complete a task, and whether a task was completed directly, completed indirectly, or failed. Completed directly means the user took an ideal route through our content: no bouncing up and down the site’s hierarchy. Indirect completion means they eventually got to the right place but didn’t take a perfect path there, while failure means they ended up in the wrong place. The graphs that demonstrate each task’s outcomes are wonderful:

The data & charts Treejack shows for a moderately successful task.

"Pie tree" visualizing users' paths
A “pie tree” showing users’ paths while attempting a task.

We can see here that most of our users found their way to LibGuides (named “study guides” here). But a few people expected to find them under our “Collections” category and bounced around in there, clearly lost. This tells us we should represent our guides under Collections alongside items like databases, print collections, and course reserves. While building and running your own Treejack-type tests would be easy, I definitely recommend Optimal Workshop as a great product which provides much insight.

There’s much work to be done in terms of testing—ideally we would adjust our architecture to address the difficulties that users had, recruit different sets of users (faculty & staff), and attempt to answer more questions. That’ll be difficult during the summer while there are fewer people on campus but we know enough now to start adjusting our site and moving along in the redesign process.

Another piece of our redesign philosophy is using analytics from the current site to inform our decisions about the new one. For instance, I track interactions with our home page search box using Google Analytics events 1. The search box has three tabs corresponding to our discovery layer, catalog, and LibGuides. Despite thousands of searches and interactions with the search box, LibGuides search sees only trace usage. The tab was clicked a mere 181 times this year; what’s worse, only 51 times did a user actually search afterwards. This trace amount of usage, plus the fact that users are clearly clicking the tab and then not finding what they want there, indicates it’s just not worth any real estate on the home page. When you add in that our LibGuides now appear in our discovery layer, their search tab is clearly disposable.

What’s Next

Data, tests, and conceptual frameworks aside, our next stage will involve building something much closer to an actual, functional website. Tools like Optimal Workshop are wonderful for providing high-level views on how to structure our information, but watching a user interact with a prototype site is so much richer. We can see their hesitation, hear them discuss the meanings of our terms, get their opinions on our stylistic choices. Prototype testing has been a struggle for me in the past; users tend to fixate on the unfinished or unrefined nature of the prototype, providing feedback that tells me what I already know (yes, we need to replace the placeholder images; yes, “Lorem ipsum dolor sit amet” is written on every page) rather than something new. I hope to counter that by setting appropriate expectations and building a small but fairly robust prototype.

We’re also building our site in an entirely new piece of software, Wagtail. Wagtail is exciting for a number of reasons, and will probably have to be the subject of future posts, but it does help address some of the existing issues I noted earlier. We’re excited by the innovative StreamField approach to content—a replacement for large, unstructured rich text fields which often let users override a site’s base styles. We’ve also heard whispers of new workflow features which would let us send reminders to the owners of different content pages to revisit them periodically. While I could do something like this myself with an ad hoc mess of calendar events and spreadsheets, having it built right into the CMS bodes well for our future maintenance plans. Obviously, the concepts underlying Wagtail and the tools it offers will influence how we implement our information architecture. But we also started gathering data long before we knew what software we’d use, so exactly how it all will work remains to be figured out.
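As a rough illustration of what StreamField offers—this is a generic sketch, not our eventual page models, and the block choices and import paths are placeholders that vary by Wagtail version—a page’s body becomes a list of typed blocks rather than one monolithic rich text field:

# Illustrative StreamField usage: the body is composed of typed blocks.
from wagtail.core import blocks
from wagtail.core.fields import StreamField
from wagtail.core.models import Page
from wagtail.images.blocks import ImageChooserBlock

class StandardPage(Page):
    body = StreamField([
        ("heading", blocks.CharBlock()),        # a plain heading authors can't arbitrarily restyle
        ("paragraph", blocks.RichTextBlock()),  # rich text is just one block type among many
        ("image", ImageChooserBlock()),
    ])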

Has your library done a website redesign or information architecture test recently? What tools or approaches did you find useful? Let us know in the comments!

Notes

  1. I described Google Analytics events in a previous Tech Connect post.

Representing Online Journal Holdings in the Library Catalog

The Problem

It isn’t easy to communicate to patrons which serials they have access to and in what form (print, online). They can find these details, sure, but they’re scattered across our library’s web presence. What’s most frustrating is that we clearly have all the necessary information, yet the systems offer no built-in way to produce a clear display of it. My fellow librarians noted that “it’d be nice if the catalog showed our exact online holdings” and my initial response was to sigh and say “yes, that would be nice”.

To illustrate the scope of the problem, a user can search for journals in a few of our disparate systems:

  • we use a knowledgebase to track database subscriptions and which journals are included in each subscription package
  • the public catalog for our Koha ILS has records for our print journals, sometimes with a MARC 856$u 1 link to our online holdings in the knowledgebase
  • our discovery layer has both article-level results for the journals in our knowledgebase and journal-level search results for the ones in our catalog

While these systems overlap, they also serve distinct purposes, so the situation isn’t so awful. However, there are a few downsides to our triad of serials information systems. First of all, if a patron searches the knowledgebase for a journal we only have in print, the knowledgebase wouldn’t show that they have access to print issues. To work around this, we track our print issues in both our ILS and the knowledgebase, which duplicates work and introduces possible inconsistencies.

Secondly, someone might start their research in the discovery layer, finding a journal-level record that links out to our catalog. But it’s too much to ask a user to search the discovery layer, click into the catalog, click a link out to the knowledgebase, and only then discover our online holdings don’t include the particular volume they’re looking for. Having three interconnected systems creates labyrinthine search patterns and confusion amongst patrons. Simply describing the systems and their nuanced areas of overlap in this post feels like a challenge, and the audience is librarians. I can imagine how our users must feel when we try to outline the differences.

The 360 XML API

Our knowledgebase is Serials Solutions 360KB. I went looking for answers in the vendor’s help documentation, which refers to an API for the product but apparently provides no information on how to use it. Luckily, a quick search through GitHub projects turned up several that use the API, and I was able to determine its URL structure: http://{{your Serials Solution ID}}.openurl.xml.serialssolutions.com/openurlxml?version=1.0&url_ver=Z39.88-2004&issn={{the journal’s ISSN}}
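
To make the request concrete, here’s a minimal sketch of assembling and sending it from PHP. This isn’t our production code, and “yourid” is a placeholder for your own Serials Solutions ID:

<?php
// Sketch: build the 360 XML API request URL described above and fetch the response.
// 'yourid' is a placeholder; substitute your library's Serials Solutions ID.
$query = http_build_query(array(
    'version' => '1.0',
    'url_ver' => 'Z39.88-2004',
    'issn'    => '0212-5633', // the ISSN of the journal you're asking about
));
$url = 'http://yourid.openurl.xml.serialssolutions.com/openurlxml?' . $query;
// file_get_contents() returns the body of the HTTP response as a string of XML
$response = file_get_contents($url);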

It’s probably possible to search by other parameters as well, but for my purposes ISSN was ideal, so I didn’t bother investigating further. If you send a request to the address above, you receive XML in response:

<ssopenurl:openURLResponse xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ssdiag="http://xml.serialssolutions.com/ns/diagnostics/v1.0" xmlns:ssopenurl="http://xml.serialssolutions.com/ns/openurl/v1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xml.serialssolutions.com/ns/openurl/v1.0 http://xml.serialssolutions.com/ns/openurl/v1.0/ssopenurl.xsd http://xml.serialssolutions.com/ns/diagnostics/v1.0 http://xml.serialssolutions.com/ns/diagnostics/v1.0/diagnostics.xsd">
    <ssopenurl:version>1.0</ssopenurl:version>
    <ssopenurl:results dbDate="2017-02-15">
        <ssopenurl:result format="journal">
            <ssopenurl:citation>
                <dc:source>Croquis</dc:source>
                <ssopenurl:issn type="print">0212-5633</ssopenurl:issn>
            </ssopenurl:citation>
            <ssopenurl:linkGroups>
                <ssopenurl:linkGroup type="holding">
                    <ssopenurl:holdingData>
                        <ssopenurl:startDate>1989</ssopenurl:startDate>
                        <ssopenurl:providerId>PRVLSH</ssopenurl:providerId>
                        <ssopenurl:providerName>Library Specific Holdings</ssopenurl:providerName>
                        <ssopenurl:databaseId>ZYW</ssopenurl:databaseId>
                        <ssopenurl:databaseName>CCA Print Holdings</ssopenurl:databaseName>
                        <ssopenurl:normalizedData>
                            <ssopenurl:startDate>1989-01-01</ssopenurl:startDate>
                        </ssopenurl:normalizedData>
                    </ssopenurl:holdingData>
                    <ssopenurl:url type="source">https://library.cca.edu/</ssopenurl:url>
                    <ssopenurl:url type="journal">
                    https://library.cca.edu/cgi-bin/koha/opac-search.pl?idx=ns&amp;q=0212-5633
                    </ssopenurl:url>
                </ssopenurl:linkGroup>
            </ssopenurl:linkGroups>
        </ssopenurl:result>
    </ssopenurl:results>
    <ssopenurl:echoedQuery timeStamp="2017-02-15T16:14:12">
        <ssopenurl:library id="EY7MR5FU9X">
            <ssopenurl:name>California College of the Arts</ssopenurl:name>
        </ssopenurl:library>
        <ssopenurl:queryString>version=1.0&amp;url_ver=Z39.88-2004&amp;issn=0212-5633</ssopenurl:queryString>
    </ssopenurl:echoedQuery>
</ssopenurl:openURLResponse>

If you’ve read XML before, then it’s apparent how useful the above data is. It contains a list of our “holdings” for the periodical, with information about the subscription’s start and end dates (the end date is absent here, which implies the holdings run to the present), which database the holdings are in, and what URL they can be accessed at. Perfect! The XML contains precisely the information we want to display in our catalog.

Unfortunately, our catalog’s JavaScript doesn’t have permission to access the 360 XML API. Due to a browser security policy, a resource must explicitly state that pages on other domains are allowed to request its data. To grant that permission under this policy, known as Cross-Origin Resource Sharing (CORS), a response needs to include the Access-Control-Allow-Origin HTTP header, and the 360 API’s responses do not.

We can work around this limitation, but it requires extra code on our part. While JavaScript from a web page cannot request data directly from 360, we can write a server-side script to pull the data. That server-side script can then add its own CORS header, which lets our catalog use it. In essence, we set up a proxy service that acts as a go-between for our catalog and the API the catalog cannot reach directly. Typically, this takes little code: the server-side script takes a parameter passed to it in the URL, sends it in an HTTP request to another server, and serves back up whatever response it receives.
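
In its simplest form, that pass-through proxy is only a handful of lines. Here’s a rough sketch rather than our exact script; the “issn” query parameter name, the “yourid” account ID, and the allowed origin are all illustrative values you’d swap for your own:

<?php
// Sketch of a pass-through proxy: accept an ISSN from the catalog's JavaScript,
// forward it to the 360 XML API, and relay the response with a CORS header added.
header('Access-Control-Allow-Origin: https://library.cca.edu');

// only forward values that actually look like ISSNs
$issn = isset($_GET['issn']) ? $_GET['issn'] : '';
if (!preg_match('/^\d{4}-\d{3}[\dXx]$/', $issn)) {
    http_response_code(400);
    exit('missing or malformed ISSN');
}

$url = 'http://yourid.openurl.xml.serialssolutions.com/openurlxml'
    . '?version=1.0&url_ver=Z39.88-2004&issn=' . urlencode($issn);

header('Content-Type: text/xml');
echo file_get_contents($url);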

Of course, it didn’t turn out to be that simple in practice. As I experimented with my scripts, I could tell that the 360 data was being received, but I couldn’t parse meaningful pieces of information out of it. The data was clearly there; I could see the full XML structure with holdings details. But neither my server-side PHP nor my client-side JavaScript could “find” XML elements like <ssopenurl:linkGroup> and <ssopenurl:normalizedData>. The text before the colon in those tag names is a namespace prefix. Simple jQuery code like $('ssopenurl:linkGroup', xml), which can typically parse XML data, wasn’t working with these namespaced elements.

Finally, I discovered the solution by reading the PHP manual’s entry for the simplexml_load_string function: I can tell PHP how to parse namespaced XML by passing a namespace parameter to the parser function. So my function call turned into:

// parameters: 1) the Serials Solutions data: file_get_contents() fetches the raw XML from the API URL in $url
// 2) the class of object the function should return ('SimpleXMLElement' is the default)
// 3) Libxml options (0 is also the default; no special options)
// 4) (finally!) the namespace prefix we care about, 'ssopenurl'
// 5) true means the previous parameter is a namespace prefix rather than a URI
$xml = simplexml_load_string( file_get_contents($url), 'SimpleXMLElement', 0, 'ssopenurl', true );

As you can see, two of those parameters don’t even differ from the function’s defaults, but I still need to provide them to get to the “ssopenurl” namespace later. As an aside, technical digressions like these are some of the best and worst parts of my job. It’s rewarding to encounter a problem, perform research, test different approaches, and eventually solve it. But it’d also be nice, and a lot quicker, if code would just work as expected the first time around.
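
To round out the picture, here’s roughly what “getting to the ssopenurl namespace later” looks like in the rest of the proxy: instead of echoing the raw XML as in the earlier pass-through sketch, we walk the parsed object and boil it down to JSON for the catalog’s JavaScript. Again, this is an illustrative sketch rather than our exact code, and it assumes, per the call above, that supplying the namespace prefix at load time scopes element access to the ssopenurl elements:

// Sketch: reduce the parsed 360 response to a small JSON array for the catalog.
$holdings = array();
foreach ($xml->results->result as $result) {
    foreach ($result->linkGroups->linkGroup as $group) {
        $data = $group->holdingData;
        $holdings[] = array(
            'database' => (string) $data->databaseName,
            'start'    => (string) $data->normalizedData->startDate,
            // endDate may be absent, which the API uses to mean "to the present"
            'end'      => (string) $data->normalizedData->endDate,
            // this grabs the first <ssopenurl:url>; a fuller script would check its type attribute
            'url'      => (string) $group->url,
        );
    }
}
header('Content-Type: application/json');
echo json_encode($holdings);

The catalog-side JavaScript then only has to loop over that array and format it for display.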

The Catalog

We’re lucky that Koha’s catalog both allows for JavaScript customization and has a well-structured, easy-to-modify record display. Now that I’m able to grab online holdings data from our knowledgebase, inserting it into the catalog is trivial. If you wanted to do the same with a different library catalog, the only changes come in the JavaScript that finds ISSN information in a record and then inserts the retrieved holdings information into the display. The complete outline of the data flow from catalog to knowledgebase and back looks like this:

  • my JavaScript looks for an ISSN on the record’s display page
  • if there’s an ISSN, it sends the ISSN to my proxy script
  • the proxy script adds a few parameters & asks for information from the 360 XML API
  • the 360 XML API returns XML, which my proxy script parses into JSON and sends to the catalog
  • the catalog JavaScript receives the JSON and parses holdings information into formatted HTML like “Online resources: 1992 to present in DOAJ”
  • the JS inserts the formatted text into the record’s “online resources” section, creating that section if it doesn’t already exist

Is there a better way to do this? Almost certainly. The six steps above should give you a sense of how convoluted the process is, hacking around a few limitations as it does. Still, the outcome is positive: we stopped updating our print holdings in our knowledgebase and our users have more information at their fingertips. It also obviates the final click out to the knowledgebase in the protracted “discovery layer to catalog” search described in the opening of this post.

Our next steps are obvious, too: we should aim to get this information into the discovery layer’s search results for our journals. The general frame of this project would be the same; we already know how to get the data from the API. Much like working with a different library catalog, the only edits are in parsing ISSNs from discovery layer search results and finding a spot in the HTML to insert the holdings data. Finally, we can now remove the redundant and less useful 856$u links from our periodical MARC records.

The Scripts

These are highly specific to our catalog, but may be of general use to others who want to see how the pieces work together:

Notes

  1. For those unfamiliar, 856 is the MARC field for URLs, whether the URL represents the actual resource being described or something supplementary. It’s pretty common for print journals to also have 856 fields for their online counterparts.