Broken Links in the Discovery Layer—Pt. I: Researching a Problem

Like many administrators of discovery layers, I’m constantly baffled and frustrated when users can’t access full text results from their searches. After implementing Summon, we heard a few reports of problems and gradually our librarians started to stumble across them on their own. At first, we had no formal system for tracking these errors. Eventually, I added a script which inserted a “report broken link” form into our discovery layer’s search results. 1 I hoped that collecting reported problems and then reporting then would identify certain systemic issues that could be resolved, ultimately leading to fewer problems. Pointing out patterns in these errors to vendors should lead to actual progress in terms of user experience.

From the broken links form, I began to cull some data on the problem. I can tell you, for instance, which destination databases experience the most problems or what the character of the most common problems is. The issue is the sample bias—are the problems that are reported really the most common? Or are they just the ones that our most diligent researchers (mostly our librarians, graduate students, and faculty) are likely to report? I long for quantifiable evidence of the issue without this bias.

How I classify the broken links that have been reported via our form. N = 57

Select Searches & Search Results

So how would one go about objectively studying broken links in a discovery layer? The first issue to solve is what searches and search results to review. Luckily, we have data on this—we can view in our analytics what the most popular searches are. But a problem becomes apparent when one goes to review those search terms:

  • artstor
  • hours
  • jstor
  • kanopy

Of course, the most commonly occurring searches tend to be single words. These searches all trigger “best bet” or database suggestions that send users directly to other resources. If their result lists do contain broken links, those links are unlikely to ever be visited, making them a poor choice for our study. If I go a little further into the set of most common searches, I see single-word subject searches for “drawing” followed by some proper nouns (“suzanne lacy”, “chicago manual of style”). These are better since it’s more likely users actually select items from their results but still aren’t a great representation of all the types of searches that occur.

Why are these types of single-word searches not the best test cases? Because search phrases necessarily have a long tail distribution; the most popular searches aren’t that popular in the context of the total quantity of searches performed 2. There are many distinct search queries that were only ever executed once. Our most popular search of “artstor”? It was executed 122 times over the past two years. Yet we’ve had somewhere near 25,000 searches in the past six months alone. This supposedly popular phrase has a negligible share of that total. Meanwhile, just because a search for “How to Hack it as a Working Parent. Jaclyn Bedoya, Margaret Heller, Christina Salazar, and May Yan. Code4Lib (2015) iss. 28″ has only been run once doesn’t mean it doesn’t represent a type of search—exact citation search—that is fairly common and worth examining, since broken links during known item searches are more likely to be frustrating.

Even our 500 most popular searches evince a long tail distribution.

So let’s say we resolve the problem of which searches to choose by creating a taxonomy of search types, from single-word subjects to copy-pasted citations. 3 We can select a few real world samples of each type to use in our study. Yet we still haven’t decided which search results we’re going to examine! Luckily, this proves much easier to resolve. People don’t look very far down in the search results 4, rarely scrolling past the first “page” listed (Summon has an infinite scroll so there technically are no pages, but you get the idea). Only items within the first ten results are likely to be selected.

Once we have our searches and know that we want to examine only the first ten or so results, my next thought is that it might be worth filtering our results that are unlikely to have problems. But does skipping the records from our catalog, institutional repository, LibGuides, etc. make other problems abnormally more apparent? After all, these sorts of results are likely to work since we’re providing direct links to the Summon link. Also, our users do not heavily employ facets—they would be unlikely to filter out results from the library catalog. 5 In a way, by focusing a study on search results that are the most likely to fail and thus give us information about underlying linking issues, we’re diverging away from the typical search experience. In the end, I think it’s worthwhile to stay true to more realistic search patterns and not apply, for instance, a “Full Text Online” filter which would exclude our library catalog.

Next Time on Tech Connect—oh how many ways can things go wrong?!? I’ll start investigating broken links and attempt to enumerate their differing natures.

Notes

  1. This script was largely copied from Robert Hoyt of Fairfield University, so all credit due to him.
  2. For instance, see: Beitzel, S. M., Jensen, E. C., Chowdhury, A., Frieder, O., & Grossman, D. (2007). Temporal analysis of a very large topically categorized web query log. Journal of the American Society for Information Science and Technology, 58(2), 166–178. “… it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume”
  3. Ignore, for the moment, that this taxonomy’s constitution is an entire field of study to itself.
  4. Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823.
  5. In fact, the most common facet used in our discovery layer is “library catalog” showing that users often want only bibliographic records; the precise opposite of a search aimed at only retrieving article database results.

ORCID for System Interoperability in Scholarly Communication Workflows

What is ORCID?

If you work in an academic library or otherwise provide support for research and scholarly communication, you have probably heard of ORCID (Open Contributor & Researcher Identifier) in terms of “ORCID iD,” a unique 16-digit identifier that represents an individual in order to mitigate name ambiguity. The ORCID iD number is presented as a URI (unique resource identifier) that serves as the link to a corresponding ORCID record, where disambiguating data about an individual is stored. For example, https://orcid.org/0000-0002-9079-593X is the ORCID iD for the late Stephen Hawking, and clicking on this link will take you to Hawking’s ORCID record. Data within ORCID records can include things like names(s) and other identifiers, biographical information, organizational affiliations, and works.

The image is a screenshot of an ORCID record, showing fields for name, ORCID iD, alternate names, country, keywords, websites, other IDs, biography, employment, education and qualifications, invited positions and distinctions, membership and service, funding, and works.
Figure 1: This screenshot shows the types of data that can be contained in an ORCID record.

Anyone can register for an ORCID iD for free, and individuals have full control over what data appears in their record, the visibility of that data, and whether other individuals or organizations are authorized to add data to their ORCID record on their behalf. Individuals can populate information in their ORCID record themselves, or they can grant permission to organizations, like research institutions, publishers, and funding agencies, to connect with their ORCID record as trusted parties, establishing an official affiliation between the individual and the organization. For example, Figures 2 and 3 illustrate an authenticated ORCID connection between an individual author and the University of Virginia (UVA) as represented in LibraOpen, the UVA Library’s Samvera institutional repository.

Screenshot image shows the user interface for the University of Virginia Library's LibraOpen institutional repository, showing the page for an article titled, "Data Management Assessment and Planning Tools." There are two authors listed, both with an ORCID iD icon and URI listed next to their names.
Figure 2: The University of Virginia Library’s LibraOpen Institutional Repository is configured to make authenticated connections with authors’ ORCID records, linking the author to their contributions and to the institution. Once an author authenticates/connects their ORCID iD in the system, ORCID iD URIs are displayed next to the authors’ names. Image source: doi.org/10.18130/V3FB8T
Screenshot shows an ORCID record with several works listed. The source of each work is "University of Virginia"
Figure 3: By clicking on the author’s ORCID iD URI in LibraOpen, we can see the work listed on the individual’s ORCID record, with “University of Virginia” as the source of the data, which means that the author gave permission for UVA to write to their ORCID record. This saves time for the author, ensures integrity of metadata, and contributes trustworthy data back to the scholarly communication ecosystem that can then be used by other systems connected with ORCID. Image courtesy of Sherry Lake, UVA https://orcid.org/0000-0002-5660-2970

ORCID Ecosystem & Interoperability

These authenticated connections are made possible by configuring software systems to communicate with the ORCID registry through the ORCID API, which is based on OAuth 2.0. With individual researchers/contributors at the center, and their affiliated organizations connecting with them through the ORCID API, all participating organizations’ systems can also communicate with each other. In this way, ORCID not only serves as a mechanism for name disambiguation, it also provides a linchpin for system interoperability in the research and scholarly communication ecosystem.

Graphic shows an ecosystem diagram with a researcher at the center with their ORCID iD, and publishers, employers, and funders forming a connected circle showing how they can connect with the research and each other. Text within the graphic reads: "Interoperability, Enter Once Re-use often"
Figure 4: ORCID serves as a mechanism for interoperability between systems and data in the scholarly communication ecosystem. Graphic courtesy of the ORCID organization.

Publishers, funders, research institutions (employers), government agencies, and other stakeholders have been adopting and using ORCID increasingly in their systems over the past several years. As a global initiative, over 5 million individuals around the world have registered for an ORCID iD, and that number continues to grow steadily as more organizations start to require ORCID iDs in their workflows. For example, over 65 publishers have signed on to an open letter committing to use ORCID in their processes, and grant funders are continuing to come on board with ORCID as well, having recently released their own open letter demonstrating commitment to ORCID. A full list of participating ORCID member organizations around the globe can be found at https://orcid.org/members.

ORCID Integrations

ORCID can be integrated into any system that touches the types of data contained within an ORCID record, including repositories, publishing and content management platforms, data management systems, central identity management systems, human resources, grants management, and Current Research Information Systems (CRIS). ORCID integrations can either be custom built into local systems, such as the example from UVA above, or made available through a vendor system out of the box. Several vendor-hosted CRIS such as Pure, Faculty 180, Digital Measures, and Symplectic Elements, already have built-in support for authenticated ORCID connections that can be utilized by institutional ORCID members, which provides a quick win for pulling ORCID data into assessment workflows with no development required. While ORCID has a public API that offers limited functionality for connecting with ORCID iDs and reading public ORCID data, the ORCID member API allows organizations to read from, write to, and auto-update ORCID data for their affiliated researchers. The ORCID institutional membership model allows organizations to support the ORCID initiative and benefit from the more robust functionality that the member API provides. ORCID can be integrated with disparate systems, or with one system from which data flows into others, as illustrated in Figure 5.

Graphic shows one central ID management system connected to the ORCID registry, with arrows fllowing to other systems such as CRIS, SIS, and DSpace
Figure 5: This graphic from the Czech Technical University in Prague illustrates how a central identity management system is configured to connect with the ORCID registry via the ORCID API, with ORCID data flowing internally to other institutional systems. Image Source: Czech Technical University in Prague Central Library & Computing and Information Centre , 2016: Solving a Problem of Authority Control in DSpace During ORCID Implementation

ORCID in US Research Institutions

In January of 2018, four consortia in the US – the NorthEast Research Libraries (NERL), the Greater Western Library Alliance (GWLA), the Big Ten Academic Alliance (BTAA), and LYRASIS – joined forces to form a national partnership for a consortial approach to ORCID membership among research institutions in the US, known as the ORCID US Community. The national partnership allows non-profit research institutions to become premium ORCID member organizations for a significantly discounted fee and employs staff to provide dedicated technical and community support for its members. As of December 1, 2018, there are 107 member organizations in the ORCID US Community.

In addition to encouraging adoption of ORCID, a main goal of the consortium approach is to build a community of practice around ORCID in the US. Prior to 2018, any institutions participating in ORCID were essentially going it alone and there were no dedicated communication channels or forums for discussion and sharing around ORCID at a national level. However, with the formation of the ORCID US Community, there is now a website with community resources for ORCID adoption specific to the US, dedicated communication channels, and an open door to collaboration between member institutions.

Among ORCID US Community member organizations, just under half have integrated ORCID with one or more systems, and the other slightly more than half are either in early planning stages or technical development. (See the ORCID US Community 2018 newsletter for more information.) As an ecosystem, ORCID relies not only on organizations but also the participation of individual researchers, so all members have also been actively reaching out to their affiliated researchers to encourage them to register for, connect, and use their ORCID iD.

Getting Started with ORCID

ORCID can benefit research institutions by mitigating confusion caused by name ambiguity, providing an interoperable data source that can be used for individual assessment and aggregated review of institutional impact, allowing institutions to assert authority over their institutional name and verify affiliations with researchers, ultimately saving time and reducing administrative burden for both organizations and individuals. To get the most value from ORCID, research institutions should consider the following three activities as outlined in the ORCID US Planning Guide:

  1. Forming a cross-campus ORCID committee or group with stakeholders from different campus units (libraries, central IT, research office, graduate school, grants office, human resources, specific academic units, etc.) to strategically plan ORCID system integration and outreach efforts
  2. Assessing all of the current systems used on campus to determine which workflows could benefit from ORCID integration
  3. Conducting outreach and education around research impact and ORCID to encourage researchers to register for and use their ORCID iD

The more people and organizations/systems using ORCID, the more all stakeholders can benefit from ORCID by maintaining a record of an individuals’ scholarly and cultural contributions throughout their career, mitigating confusion caused by name ambiguity, assessing individual contributions as well as institutional impact, and enabling trustworthy and efficient sharing of data across scholarly communication workflows. Effectively, ORCID represents a paradigm shift from siloed, repetitive workflows to the ideal of being able to “enter once, re-use often” by using ORCID to transfer data between systems, workflows, and individuals, ultimately making everyone’s lives easier.

Sheila Rabun is the ORCID US Community Specialist at LYRASIS, providing technical and community support for 100+ institutional members of the ORCID US Community. In prior roles, she managed community and communication for the International Image Interoperability Framework (IIIF) Consortium, and served as a digital project manager for several years at the University of Oregon Libraries’ Digital Scholarship Center. Learn more at https://orcid.org/0000-0002-1196-6279

Creating Presentations with Beautiful.AI

Updated 2018-11-12 at 3:30PM with accessibility information.

Beautiful.AI is a new website that enables users to create dynamic presentations quickly and easily with “smart templates” and other design optimized features. So far the service is free with a paid pro tier coming soon. I first heard about Beautiful.AI in an advertisement on NPR and was immediately intrigued. The landscape of presentation software platforms has broadened in recent years to include websites like Prezi, Emaze, and an array of others beyond the tried and true PowerPoint. My preferred method of creating presentations for the past couple of years has been to customize the layouts available on Canva and download the completed PDFs for use in PowerPoint. I am also someone who enjoys tinkering with fonts and other design elements until I get a presentation just right, but I know that these steps can be time consuming and overwhelming for many people. With that in mind, I set out to put Beautiful.AI to the test by creating a short “prepare and share” presentation about my first experience at ALA’s Annual Conference this past June for an upcoming meeting.

A title slide created with Beautiful.AI.

Features

To help you get started, Beautiful.AI includes an introductory “Design Tips for Beautiful Slides” presentation. It is also fully customizable so you can play around with all of of the features and options as you explore, or you can click on “create new presentation” to start from scratch. You’ll then be prompted to choose a theme, and you can also choose a color palette. Once you start adding slides you can make use of Beautiful.AI’s template library. This is the foundation of the site’s usefulness because it helps alleviate guesswork about where to put content and that dreaded “staring at the blank slide” feeling. Each individual slide becomes a canvas as you create a presentation, similar to what is likely familiar in PowerPoint. In fact, all of the most popular PowerPoint features are available in Beautiful.AI, they’re just located in very different places. From the navigation at the left of the screen users can adjust the colors and layout of each slide as well as add images, animation, and presenter notes. Options to add, duplicate, or delete a slide are available on the right of the screen. The organize feature also allows you to zoom out and see all of the slides in the presentation.

Beautiful.AI offers a built-in template to create a word cloud.

One of Beautiful.AI’s best features, and my personal favorite, is its built-in free stock image library. You can choose from pre-selected categories such as Data, Meeting, Nature, or Technology or search for other images. An import feature is also available, but providing the stock images is extremely useful if you don’t have your own photos at the ready. Using these images also ensures that no copyright restrictions are violated and helps add a professional polish to your presentation. The options to add an audio track and advance times to slides are also nice to have for creating presentations as tutorials or introductions to a topic. When you’re ready to present, you can do so directly from the browser or export to PDF or PowerPoint. Options to share with a link or embed with code are also available.

Usability

While intuitive design and overall usability won’t necessarily make or break the existence of a presentation software platform, each will play a role in influencing whether someone uses it more than once. For the most part, I found Beautiful.AI to be easy and fun to use. The interface is bold, yet simplistic, and on trend with current website design aesthetics. Still, users who are new to creating presentations online in a non-PowerPoint environment may find the Beautiful.AI interface to be confusing at first. Most features are consolidated within icons and require you to hover over them to reveal their function. Icons like the camera to represent “Add Image” are pretty obvious, but others such as Layout and Organize are less intuitive. Some of Beautiful.AI’s terminology may also not be as easily recognizable. For example, the use of the term “variations” was confusing to me at first, especially since it’s only an option for the title slide.

The absence of any drag and drop capability for text boxes is definitely a feature that’s missing for me. This is really where the automated design adaptability didn’t seem to work as well as I would’ve expected given that it’s one of the company’s most prominent marketing statements. On the title slide of my presentation, capitalizing a letter in the title caused the text to move closer to the edge of the slide. In Canva, I could easily pull the text block over to the left a little or adjust the font size down by a few points. I really am a stickler for spacing in my presentations, and I would’ve expected this to be an element that the “Design AI” would pick up on. Each template also has different pre-set design elements, and it can be confusing when you choose one that includes a feature that you didn’t expect. Yet, text sizes that are pre-set to fit the dimensions of each template does help not only with readability in the creation phase but with overall visibility for audiences. Again, this alleviates some of the guesswork that often happens in PowerPoint with not knowing exactly how large your text sizes will appear when projected onto larger screens.

A slide created using a basic template and stock photos available in Beautiful.AI.

One feature that does work really well is the export option. Exporting to PowerPoint creates a perfectly sized facsimile presentation, and being able to easily download a PDF is very useful for creating handouts or archiving a presentation later on. Both are nice to have as a backup for conferences where Internet access may be spotty, and it’s nice that Beautiful.AI understands the need for these options. Unfortunately, Beautiful.AI doesn’t address accessibility on its FAQ page nor does it offer alternative text or other web accessibility features. Users will need to add their own slide titles and alt text in PowerPoint and Adobe Acrobat after exporting from Beautiful.AI to create an accessible presentation. 

Conclusion

Beautiful.AI challenged me to think in new ways about how best to deliver information in a visually engaging way. It’s a useful option for librarians and students who are looking for a presentation website that is fun to use, engaging, and on trend with current web design.

Click here to view “My first ALA”presentation created with Beautiful.AI.


Jeanette Sewell is the Database and Metadata Management Coordinator at Fondren Library, Rice University.

National Forum on Web Privacy and Web Analytics

We had the fantastic experience of participating in the National Forum on Web Privacy and Web Analytics in Bozeman, Montana last month. This event brought together around forty people from different areas and types of libraries to do in-depth discussion and planning about privacy issues in libraries. Our hosts from Montana State University, Scott Young, Jason Clark, Sara Mannheimer, and Jacqueline Frank, framed the event with different (though overlapping) areas of focus. We broke into groups based on our interests from a pre-event survey and worked through a number of activities to identify projects. You can follow along with all the activities and documents produced during the Forum in this document that collates all of them.

Drawing of ship
Float your boat exercise

            While initially worried that the activities would feel too forced, instead they really worked to release creative ideas. Here’s an example: our groups drew pictures of boats with sails showing opportunities, and anchors showing problems. We started out in two smaller subgroups of our subgroups and drew a boat, then met with the large subgroup to combine the boat ideas. This meant that it was easy to spot the common themes—each smaller group had written some of the same themes (like GDPR). Working in metaphor meant we could express some more complex issues, like politics, as the ocean—something that always surrounds the issue and can be helpful or unhelpful without much warning. This helped us think differently about issues and not get too focused on our own individual perspective.

The process of turning metaphor into action was hard. We had to take the whole world of problems and opportunities and come up with how these could be realistically accomplished. Good and important ideas had to get left behind because they were so big there was no way to feasibly plan them, certainly not in a day or two. The differing assortment of groups (which were mixable where ideas overlapped) ensured that we were able to question each other’s assumptions and ask some hard questions. For example, one of the issues Margaret’s group had identified as a problem was disagreement in the profession about what the proper limits were on privacy. Individually identifiable usage metrics are a valuable commodity to some, and a thing not to be touched to others. While everyone in the room was probably biased more in favor of privacy than perhaps the profession at large is, we could share stories and realities of the types of data we were collecting and what it was being used for. Considering the realities of our environments, one of our ideas to bring everyone from across the library and archives world to create a unified set of privacy values was not going to happen. Despite that, we were able to identify one of the core problems that led to a lack of unity, which was, in many cases, lack of knowledge about what privacy issues existed and how these might affect institutions. When you don’t completely understand something, or only half understand it, you are more likely to be afraid of it.

            On the afternoon of the second day and continuing into the morning of the third day, we had to get serious and pick just one idea to focus on to create a project plan. Again, the facilitators utilized a few processes that helped us take a big idea and break it down into more manageable components. We used “Big SCAI” thinking to frame the project: what is the status quo, what are the challenges, what actions are required, and what are the ideals. From there we worked through what was necessary for the project, nice to have, unlikely to get, and completely unnecessary to the project. This helped focus efforts and made the process of writing a project implementation plan much easier.

Laptop with postits on wall.
What the workday looked like.

Writing the project implementation plan as a group was made easier by shared documents, but we all commented on the irony of using Google Docs to write privacy plans. On the other hand, trying to figure out how to write in groups and easily share what we wrote using any other platform was a challenge in the moment. This reality illustrates the problems with privacy: the tool that is easiest to use and comes to mind first will be the one that ends up being used. We have to create tools that make privacy easy (which was a discussion many of us at the Forum had), but even more so we need to think about the tradeoffs that we make in choosing a tool and educate ourselves and others about this. In this case, since all the outcomes of the project were going to be public anyway, going on the “quick and easy” side was ok.

            The Forum project leaders recently presented about their work at the DLF Forum 2018 conference. In this presentation, they outlined the work that they did leading up to the Forum, and the strategies that emerged from the day. They characterized the strategies as Privacy Badging and Certifications, Privacy Leadership Training, Privacy for Tribal Communities and Organizations, Model License for Vendor Contracts, Privacy Research Institute, and a Responsible Assessment Toolkit. You can read through the thought process and implementation strategies for these projects and others yourself at the project plan index. The goal is to ensure that whoever wants to do the work can do it. To quote Scott Young’s follow-up email, “We ask only that you keep in touch with us for the purposes of community facilitation and grant reporting, and to note the provenance of the idea in future proposals—a sort of CC BY designation, to speak in copyright terms.”

            For us, this three-day deep dive into privacy was an inspiration and a chance to make new connections (while also catching up with some old friends). But even more, it was a reminder that you don’t need much of anything to create a community. Provided the right framing, as long as you have people with differing experiences and perspectives coming together to learn from each other, you’ve facilitated the community-building.  

The Ex Libris Knowledge Center and Orangewashing

Two days after ProQuest completed their acquisition of Ex Libris in December 2015, Ex Libris announced the launch of their new online Customer Knowledge Center. In the press release for the Knowledge Center, the company describes it as “a single gateway to all Ex Libris knowledge resources,” including training materials, release notes, and product manuals. A defining feature is that there has never been any paywall or log-on requirement, so that all Knowledge Center materials remain freely accessible to any site visitor. Historically, access to documentation for automated library systems has been restricted to subscribing institutions, so the Knowledge Center represents a unique change in approach.

Within the press release, it is also readily apparent how Ex Libris aims to frame the openness of the Knowledge Center as a form of support for open access. As the company states in the second paragraph, “Demonstrating the Company’s belief in the importance of open access, the site is open to all, without requiring any logon procedure.” Former Ex Libris CEO Matti Shem Tov goes a step further in the following paragraph: “We want our resources and documentation to be as accessible and as open as our library management, discovery, and higher-education technology solutions are.”

The problem with how Ex Libris frames their press release is that it elides the difference between mere openness and actual open access. They are a for-profit company, and their currently burgeoning market share is dependent upon a software-as-a-service (SaaS) business model. Therefore, one way to describe their approach in this case is orangewashing. During a recent conversation with me, Margaret Heller came up with the term, based on the color of the PLOS open access symbol. Similar in concept to greenwashing, we can define orangewashing as a misappropriation of open access rhetoric for business purposes.

What perhaps makes orangewashing more initially difficult to diagnose in Ex Libris’s (and more broadly, ProQuest’s) case is that they attempt to tie support for open access to other product offerings. Even before purchasing Ex Libris, ProQuest had been including an author-side paid open-access publishing option to its Electronic Thesis and Dissertation platform, though we can question whether this is actually a good option for authors. For its part, Ex Libris has listened to customer feedback about open access discovery. As an example, there are now open access filters for both the Primo and Summon discovery layers.

Ex Libris has also, generally speaking, remained open to customer participation regarding systems development, particularly with initiatives like the Developer Network and Idea Exchange. Perhaps the most credible example is in a June 24, 2015 press release, where the company declares “support of the Open Discovery Initiative (ODI) and conformance with ODI’s recommended practice for pre-indexed ‘web-scale’ discovery services.” A key implication is that “conforming to ODI regulations about ranking of search results, linking to content, inclusion of materials in Primo Central, and discovery of open access content all uphold the principles of content neutrality.”

Given the above information, in the case of the Knowledge Center, it is tempting to give Ex Libris the benefit of the doubt. As an access services librarian, I understand how much of a hassle it can be to find and obtain systems documentation in order to properly do my job. I currently work for an Ex Libris institution, and can affirm that the Knowledge Center is of tangible benefit. Besides providing easier availability for their materials, Ex Libris has done fairly well in keeping information and pathing up to date. Notably, as of last month, customers can also contribute their own documentation to product-specific Community Knowledge sections within the Knowledge Center.

Nevertheless, this does not change the fact that while the Knowledge Center is unique in its format, it represents a low bar to clear for a company of Ex Libris’s size. Their systems documentation should be openly accessible in any case. Moreover, the Knowledge Center represents openness—in the form of company transparency and customer participation—for systems and products that are not open. This is why when we go back to the Knowledge Center press release, we can identify it as orangewashing. Open access is not the point of a profit-driven company offering freely accessible documentation, and any claims to this effect ultimately ring hollow.

So what is the likely point of the Knowledge Center, then? We should consider that Alma has become the predominant service platform within academic libraries, with Primo and Summon being the only supported discovery layers for it. While OCLC and EBSCO offer or support competing products, Ex Libris already held an advantageous position even before the ProQuest purchase. Therefore, besides the Knowledge Center serving as supportive measure for current customers, we can view it as a sales pitch to future ones. This may be a smart business strategy, but again, it has little to do with open access.

Two other recent developments provide further evidence of Ex Libris’s orangewashing. The first is MLA’s announcement that EBSCO will become the exclusive vendor for the MLA International Bibliography. On the PRIMO-L listserv, Ex Libris posted a statement [listserv subscription required] noting that the agreement “goes against the goals of NISO’s Open Discovery Initiative…to promote collaboration and transparency among content and discovery providers.” Nevertheless, despite not being involved in the agreement, Ex Libris shares some blame given the long-standing difficulty over EBSCO not providing content to the Primo Central Index. As a result, what may occur is the “siloing” of an indispensable research database, while Ex Libris customers remain dependent on the company to help determine an eventual route to access.

Secondly, in addition to offering research publications through ProQuest and discovery service through Primo/Summon, Ex Libris now provides end-to-end content management through Esploro. Monetizing more aspects of the research process is certainly far from unusual among academic publishers and service providers. Elsevier arguably provides the most egregious example, and as Lisa Janicke Hinchliffe notes, their pattern of recent acquisitions belies an apparent goal of creating a vertical stack service model for publication services.

In considering what Elsevier is doing, it is unsurprising—from a business standpoint—for Ex Libris and ProQuest to pursue profits in a similar manner. That said, we should bear in mind that libraries are already losing control over open access as a consequence of the general strategy that Elsevier is employing. Esploro will likely benefit from having strong library development partners and “open” customer feedback, but the potential end result could place its customers in a more financially disadvantageous and less autonomous position. This is simply antithetical to open access.

Over the past few years, Ex Libris has done well not just in their product development, but also their customer support. Making the Knowledge Center “open to all” in late 2015 was a very positive step forward. Yet the company’s decision to orangewash through claiming support for open access as part of a product unveiling still warrants critique. Peter Suber reminds us that open access is a “revolutionary kind of access”—one that is “unencumbered by a motive of financial gain.” While Ex Libris can perhaps talk about openness with a little more credibility than their competitors, their bottom line is still what really matters.

Managing ILS Updates

We’ve done a few screencasts in the past here at TechConnect and I wanted to make a new one to cover a topic that’s come up this summer: managing ILS updates. Integrated Library Systems are huge, unwieldy pieces of software and it can be difficult to track what changes with each update: new settings are introduced, behaviors change, bugs are (hopefully) fixed. The video belows shows my approach to managing this process and keeping track of ongoing issues with our Koha ILS.

Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases

Blockchain holds a great potential for both innovation and disruption. The adoption of blockchain also poses certain risks, and those risks will need to be addressed and mitigated before blockchain becomes mainstream. A lot of people have heard of blockchain at this point. But many are unfamiliar with how this new technology exactly works and unsure about under which circumstances or on what conditions it may be useful to libraries.

In this post, I will provide a brief overview of the merits and the issues of blockchain. I will also make some suggestions for compelling use cases of blockchain at the end of this post.

What Blockchain Accomplishes

Blockchain is the technology that underpins a well-known decentralized cryptocurrency, Bitcoin. To simply put, blockchain is a kind of distributed digital ledger on a peer-to-peer (P2P) network, in which records are confirmed and encrypted. Blockchain records and keeps data in the original state in a secure and tamper-proof manner[1] by its technical implementation alone, thereby obviating the need for a third-party authority to guarantee the authenticity of the data. Records in blockchain are stored in multiple ledgers in a distributed network instead of one central location. This prevents a single point of failure and secures records by protecting them from potential damage or loss. Blocks in each blockchain ledger are chained to one another by the mechanism called ‘proof of work.’ (For those familiar with a version control system such as Git, a blockchain ledger can be thought of as something similar to a P2P hosted git repository that allows sequential commits only.[2]) This makes records in a block immutable and irreversible, that is, tamper-proof.

In areas where the authenticity and security of records is of paramount importance, such as electronic health records, digital identity authentication/authorization, digital rights management, historic records that may be contested or challenged due to the vested interests of certain groups, and digital provenance to name a few, blockchain can lead to efficiency, convenience, and cost savings.

For example, with blockchain implemented in banking, one will be able to transfer funds across different countries without going through banks.[3] This can drastically lower the fees involved, and the transaction will take effect much more quickly, if not immediately. Similarly, adopted in real estate transactions, blockchain can make the process of buying and selling a property more straightforward and efficient, saving time and money.[4]

Disruptive Potential of Blockchain

The disruptive potential of blockchain lies in its aforementioned ability to render the role of a third-party authority obsolete, which records and validates transactions and guarantees their authenticity, should a dispute arise. In this respect, blockchain can serve as an alternative trust protocol that decentralizes traditional authorities. Since blockchain achieves this by public key cryptography, however, if one loses one’s own personal key to the blockchain ledger holding one’s financial or real estate asset, for example, then that will result in the permanent loss of such asset. With the third-party authority gone, there will be no institution to step in and remedy the situation.

Issues

This is only some of the issues with blockchain. Other issues include (a) interoperability between different blockchain systems, (b) scalability of blockchain at a global scale with large amount of data, (c) potential security issues such as the 51% attack [5], and (d) huge energy consumption [6] that a blockchain requires to add a block to a ledger. Note that the last issue of energy consumption has both environmental and economic ramifications because it can cancel out the cost savings gained from eliminating a third-party authority and related processes and fees.

Challenges for Wider Adoption

There are growing interests in blockchain among information professionals, but there are also some obstacles to those interests gaining momentum and moving further towards wider trial and adoption. One obstacle is the lack of general understanding about blockchain in a larger audience of information professionals. Due to its original association with bitcoin, many mistake blockchain for cryptocurrency. Another obstacle is technical. The use of blockchain requires setting up and running a node in a blockchain network, such as Ethereum[7], which may be daunting to those who are not tech-savvy. This makes a barrier to entry high to those who are not familiar with command line scripting and yet still want to try out and test how a blockchain functions.

The last and most important obstacle is the lack of compelling use cases for libraries, archives, and museums. To many, blockchain is an interesting new technology. But even many blockchain enthusiasts are skeptical of its practical benefits at this point when all associated costs are considered. Of course, this is not an insurmountable obstacle. The more people get familiar with blockchain, the more ways people will discover to use blockchain in the information profession that are uniquely beneficial for specific purposes.

Suggestions for Compelling Use Cases of Blockchain

In order to determine what may make a compelling use case of blockchain, the information profession would benefit from considering the following.

(a) What kind of data/records (or the series thereof) must be stored and preserved exactly the way they were created.

(b) What kind of information is at great risk to be altered and compromised by changing circumstances.

(c) What type of interactions may need to take place between such data/records and their users.[8]

(d) How much would be a reasonable cost for implementation.

These will help connecting the potential benefits of blockchain with real-world use cases and take the information profession one step closer to its wider testing and adoption. To those further interested in blockchain and libraries, I recommend the recordings from the Library 2.018 online mini-conference, “Blockchain Applied: Impact on the Information Profession,” held back in June. The Blockchain National Forum, which is funded by IMLS and is to take place in San Jose, CA on August 6th, will also be livestreamed.

Notes

[1] For an excellent introduction to blockchain, see “The Great Chain of Being Sure about Things,” The Economist, October 31, 2015, https://www.economist.com/news/briefing/21677228-technology-behind-bitcoin-lets-people-who-do-not-know-or-trust-each-other-build-dependable.

[2] Justin Ramos, “Blockchain: Under the Hood,” ThoughtWorks (blog), August 12, 2016, https://www.thoughtworks.com/insights/blog/blockchain-under-hood.

[3] The World Food Programme, the food-assistance branch of the United Nations, is using blockchain to increase their humanitarian aid to refugees. Blockchain may possibly be used for not only financial transactions but also the identity verification for refugees. Russ Juskalian, “Inside the Jordan Refugee Camp That Runs on Blockchain,” MIT Technology Review, April 12, 2018, https://www.technologyreview.com/s/610806/inside-the-jordan-refugee-camp-that-runs-on-blockchain/.

[4] Joanne Cleaver, “Could Blockchain Technology Transform Homebuying in Cook County — and Beyond?,” Chicago Tribune, July 9, 2018, http://www.chicagotribune.com/classified/realestate/ct-re-0715-blockchain-homebuying-20180628-story.html.

[5] “51% Attack,” Investopedia, September 7, 2016, https://www.investopedia.com/terms/1/51-attack.asp.

[6] Sherman Lee, “Bitcoin’s Energy Consumption Can Power An Entire Country — But EOS Is Trying To Fix That,” Forbes, April 19, 2018, https://www.forbes.com/sites/shermanlee/2018/04/19/bitcoins-energy-consumption-can-power-an-entire-country-but-eos-is-trying-to-fix-that/#49ff3aa41bc8.

[7] Osita Chibuike, “How to Setup an Ethereum Node,” The Practical Dev, May 23, 2018, https://dev.to/legobox/how-to-setup-an-ethereum-node-41a7.

[8] The interaction can also be a self-executing program when certain conditions are met in a blockchain ledger. This is called a “smart contract.” See Mike Orcutt, “States That Are Passing Laws to Govern ‘Smart Contracts’ Have No Idea What They’re Doing,” MIT Technology Review, March 29, 2018, https://www.technologyreview.com/s/610718/states-that-are-passing-laws-to-govern-smart-contracts-have-no-idea-what-theyre-doing/.

Introducing Our New Best Friend, GDPR

You’ve seen the letters GDPR in every single email you’ve gotten from a vendor or a mailing list lately, but you might not be exactly sure what it is. With GDPR enforcement starting on May 25, it’s time for a crash course in what GDPR is, and why it could be your new best friend whether you are in the EU or not.

First, you can check out the EU GDPR information site (though it probably will be under heavy load for a few days!) for lots of information on this. It’s important to recognize, however, that for universities like mine with a campus located in the EU, it has created additional oversight to ensure that our own data collection practices are GDPR compliant, or that we restrict people residing in the EU from accessing those services. You should definitely work with legal counsel on your own campus in making any decisions about GDPR compliance.

So what does the GDPR actually mean in practice? The requirements break down this way: any company which holds the data of any EU citizen must provide data controls, no matter where the company or the data is located. This means that every large web platform and pretty much every library vendor must comply or face heavy fines. The GDPR offers the following protections for personally identifiable information, which includes things like IP address: privacy terms and conditions must be written in easy to understand language, data breaches require quick notifications, the right to know what data is being collected and to receive a copy of it, the “right to be forgotten” or data erasure (unless it’s in the public interest for the data to be retained), ability to transfer data between providers, systems to be private by design and only collect necessary data, and for companies to appoint data privacy officers without conflicts of interest. How this all works in practice is not consistent, and there will be a lot to be worked out in the courts in the coming years. Note that Google recently lost several right to be forgotten cases, and were required to remove information that they had originally stated was in the public interest to retain.

The GDPR has actually been around for a few years, but May 25, 2018 was set as the enforcement date, so many people have been scrambling to meet that deadline. If you’re reading this today, there’s probably not a lot of time to do anything about your own practices, but if you haven’t yet reviewed what your vendors are doing, this would be a good time. Note too that there are no rights guaranteed for any Americans, and several companies, including Facebook, have moved data governance out of their Irish office to California to be out of reach of suits brought in Irish courts.

Where possible, however, we should be using all the features at our disposal. As librarians, we already tend to the “privacy by design” philosophy, even though we aren’t always perfect at it. As I wrote in my last post, my library worked on auditing our practices and creating a new privacy policy, and one of the last issues was trying to figure out how we would approach some of our third-party services which we need to provide services to our patrons but that did not allow deleting data. Now some of those features are being made available. For example, Google Analytics now has a data retention feature, which allows you to set data to expire and be deleted after a certain amount of time. Google provides some more detailed instructions to ensure that you are not accidentally collecting personally-identifiable information in your analytics data.

Lots of our library vendors provide personal account features, and those too are subject to these new GDPR features. This means that there are new levels of transparency about what kinds of tracking they are doing, and greater ability for patrons to control data, and for you to control data on the behalf of patrons. Here are a few example vendor GDPR compliance statements or FAQs:

Note that some vendors, like EBSCO, are moving to HTTPS for all sites that weren’t before, and so this may require changes to proxy servers or other links.

I am excited about GDPR because no matter where we are located, it gives us new tools to defend the privacy of our patrons. Even better than that, it is providing lots of opportunities on our campuses to talk about privacy with all stakeholders. At my institution, the library has been able to showcase our privacy expertise and have some good conversations about data governance and future goals for privacy. It doesn’t mean that all our problems will be solved, but we are moving in a more positive direction.

Names are Hard

A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “My system will never have to deal with names from China/Japan/Korea” were apparent everywhere. I consider myself a fairly critical and studious person, I devote time to thinking about the consequences of design decisions and carefully attempt to avoid poor assumptions. But I’ve repeatedly run into trouble when handling personal names as data. There is a cognitive dissonance surrounding names; we treat them as rigid identifiers when they’re anything but. We acknowledge their importance but struggle to take them as seriously.

Names change. They change due to marriage, divorce, child custody, adoption, gender identity, religious devotion, performance art, witness protection, or none of these at all. Sometimes people just want a new name. And none of these reasons for change are more or less valid than others, though our legal system doesn’t always treat them equally. We have students who change their legal name, which is often something systems expect, but then they have the audacity to want to change their username, too! And that works less often because all sorts of system integrations expect usernames to be persistent.

Names do not have a universal structure. There is no set quantity of components in a name nor an established order to those components. At my college, we have students without surnames. In almost all our systems, surname is a required field, so we put a period “.” there to satisfy that requirement. Then, on displays in our digital repository where surnames are assumed, we end up with bolded section headers like “., Johnathan” which look awkward.

Many Western names might follow a [Given name] – [Middle name] – [Surname] structure and an unfortunate number of the systems I have to deal with assume all names share this structure. It’s easy to see how this yields problematic results. For instance, if you want to a see a sorted list of users, you probably want to sort by family name, but many systems sort by the name in the last position causing, for instance, Chinese names 1 to be handled differently from Western ones. 2 But it’s not only that someone might not have a middle name, or might have two middle names, or might have a family name in the first position—no, even that would be too simple! Some name components defy simple classifications. I once met a person named “Bus Stop”. “Stop” is clearly not a family affiliation, despite coming in the final position of the name. Sometimes the second component of a tripartite Western name isn’t a middle name at all, but a maiden name or the second word of a two-word first name (e.g. “Mary Anne” or “Lady Bird”)! One cannot even determine by looking at a familiar structure the roles of all of a name’s pieces!

Names are also contextual. One’s name with family, with legal institutions, and with classmates can all differ. Many of our international students have alternative Westernized first names. Their family may call them Qiáng but they introduce themselves as Brian in class. We ask for a “preferred name” in a lot of systems, which is a nice step forward, but don’t ask when it’s preferred. Names might be meant for different situations. We have no system remotely ready for this, despite the personalization that’s been seeping into web platforms for decades.

So if names are such a trouble, why not do our best and move on? Aren’t these fringe cases that don’t affect the vast majority of our users? These issues simply cannot be ignored because names are vital. What one is called, even if it’s not a stable identifier, has great effects on one’s life. It’s dispiriting to witness one’s name misspelled, mispronounced, treated as an inconvenience, botched at every turn. A system that won’t adapt to suit a name delegitimizes the name. It says, “oh that’s not your real name” as if names had differing degrees of reality. But a person may have multiple names—or many overlapping names over time—and while one may be more institutionally recognized at a given time, none are less real than the others. If even a single student a year is affected, it’s the absolute least amount of respect we can show to affirm their name(s).

So what do we to do? Endless enumerations of the difficulties of working with names does little but paralyze us. Honestly, when I consider about the best implementation of personal names, the MODS metadata schema comes to mind. Having a <name> element with any number of <namePart> children is the best model available. The <namePart>s can be ordered in particular ways, a “@type” attribute can define a part’s function 3, a record can include multiple names referencing the same person, multiple names with distinct parts can be linked to the same authority record, etc. MODS has a flexible and comprehensive treatment of name data. Unfortunately, returning to “Falsehoods Programmers Believe”, none of the library systems I administer do anywhere near as good a job as this metadata schema. Nor is it necessarily a problem with Western bias—even the Chinese government can’t develop computer systems to accurately represent the names of people in the country, or even agree on what the legal character set should be! 4 It seems that programmers start their apps by creating a “users” database table with columns for unique identifier, username, “firstname”/”lastname” [sic], and work from there. On the bright side, the name isn’t used as the identifier at least! We all learned that in databases class but we didn’t learn to make “names” a separate table linked to “users” in our relational databases.

In my day-to-day work, the best I’ve done is to be sensitive to the importance of names changes specifically and how our systems handle them. After a few meetings with a cross-departmental team, we developed a name change process at our college. System administrators from across the institution are on a shared listserv where name changes are announced. In the libraries, I spoke with our frontline service staff about assisting with name changes. Our people at the circulation desk know to notice name discrepancies—sometimes a name badge has been updated but not our catalog records, we can offer to make them match—but also to guide students who may need to contact the registrar or other departments on campus to initiate the top-down name change process. While most of our the library’s systems don’t easily accommodate username changes, I can write administrative scripts for our institutional repository that alter the ownership of a set of items from an old username to a new one. I think it’s important to remember that we’re inconveniencing the user with the work of implementing their name change and not the other way around. So taking whatever extra steps we can do on our own, without pushing labor onto our students and staff, is the best way we can mitigate how poorly our tools are able to support the protean nature of personal names.

Notes

  1. Chinese names typically have the surname first, followed by the given name.
  2. Another poor implementation can be seen in The Chicago Manual of Style‘s indexing instructions, which has an extensive list of exceptions to the Western norm and how to handle them. But CMoS provides no guidance on how one would go about identifying a name’s cultural background or, for instance, identifying a compound surname.
  3. Although the MODS user guidelines sadly limit the use of the type attribute to a fixed list of values which includes “family” and “given”, rendering it subject to most of the critiques in this post. Substantially expanding this list with “maiden”, “patronymic/matronymic” (names based on a parental given name, e.g. Mikhailovich), and more, as well as some sort of open-ended “other” option, would be a great improvement.
  4. https://www.nytimes.com/2009/04/21/world/asia/21china.html

Our Assumptions: of Neutrality, of People, & of Systems

Discussions of neutrality have been coming up a lot in libraryland recently. I would argue that people have been talking about this for years1 2 3 4,  but this year we saw a confluence of events drive the “neutrality of libraries” topic to the fore. To be clear, I have a position on this on this topic5 and it is that libraries cannot be neutral players and still claim to be a part of the society they serve. But this post is about what we assume to be neutral, what we bring forward with those assumptions, and how to we react when those assumptions are challenged. When we challenge ideas that have been built into systems, either as “benevolent, neutral” librarians or “pure logic, neutral” algorithms, what part of ourselves are we challenging? How do reactions change based on who is doing the challenging? Be forewarned, this is a convoluted landscape.

At the 2018 ALA Midwinter conference, the ALA President’s program was a debate about neutrality. I will not summarize that event (see here), but I do want to call attention to something that became very clear in the course of the program: everyone was using a different definition of neutrality. People spoke with assumptions of what neutrality means and why they do, or do not, believe that it is important for libraries to maintain. But what are we assuming when we make these assumptions? Without an agreed upon definition, some referred to legal rulings to define neutrality, some used a dictionary definition (“not aligned with a political or ideological grouping” – Merriam Webster) without probing how political or ideological perspectives play out in real life. But why do we assume libraries should be neutral? What safety or security does that assumption carry? What else are we assuming should be neutral? Software? Analytics? What value judgements are we bringing forward with those assumptions?

An assumption of neutrality often comes with a transference of trust. A speaker at ALA even said that the three professions thought of as the most trustworthy (via a national poll) are firefighting, nursing, and librarianship, and so, by his logic, we must be neutral. Perhaps some do not conflate trust and neutrality, but when we do assume neutrality equates with trust in these situations, we remove the human aspect from the equation. Nurses and librarians, as people, are not neutral. People hold biases and a variety of lived experiences that shape perspectives and approaches. If you engage this line of thought and interrogate your assumptions and beliefs, it can become apparent that it takes effort to recognize and mitigate our human biases throughout the various interactions in our lives.

What of our technology? Systems and software are often put forward as logic-driven, neutral devices, considered apart from their human creators. The position of some people is that machines lack emotions and are, therefore, immune to our human biases and prejudices. This position is inaccurate and dangerous and requires our attention. Algorithms and analytics are not neutral. They are designed by people, who carry forward their own notions of what is true and what is neutral. These ideas are built into the structure of the systems and have the potential to influence our perception of reality. As we rely on “data-driven decision-making” across all aspects of our society — education, healthcare, entertainment, policy — we transfer trust and power to that data. All too often, we do that without scrutinizing the sources of the data, or the algorithms acting upon them. Moreover, as we push further into machine learning systems – systems that are trained on data to look for patterns and optimize processes – we open the door for those systems to amplify biases. To “learn” our systemic prejudices and inequities.

People far more expert in this domain than me have raised these questions and researched the effects that biased systems can have on our society6 7 8. I often bring these issues up when I want to emphasize how problematic it is to let the assumption of data-driven outcomes as “truth” persist and how critical it is to apply information literacy practices to data. But as I thought about this issue and read more from these experts, I have been struck by the variety of responses that these experts illicit. How do reactions change based on who is doing the challenging?

Angela Galvan questioned assumptions related to hiring, performance, and belonging in librarianship, based on the foundation of the profession’s “whiteness,” and was met with hostile comments on the post9. Nicole A. Cooke wrote about implicit assumptions when we write about tolerance and diversity and has been met with hostile comments10 while her micro-aggression research has been highlighted by Campus Reform11, which led to a series of hostile communications to her. Chris Bourg’s keynote about diversity and technology at Code4Lib was met with hostility12. Safiya Noble wrote a book about bias in algorithms and technology, which resulted in one of the more spectacular Twitter disasters13 14, wherein someone found it acceptable to dismiss her research without even reading the book.

Assumptions of neutrality, whether it be related to library services, space, collections, or the people doing the work, allow oppressive systems to persist and contribute to a climate where the perspectives and expertise of marginalized people in particular can be dismissed. Insisting that we promulgate the library and technology – and the people working in it and with it – as neutral actors, erases the realities that these women (and countless others) have experienced. Moreover, it allows the those operating with harmful and discriminatory assumptions to believe that they *are* neutral, by virtue of working in those spaces, and that their truth is an objective truth. It limits the desire for dialog, discourse, and growth – because who is really motivated to listen when you think you are operating from a place of “Truth”…when you feel that the strength of your assumptions can invalidate a person’s life?