Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases

Blockchain holds great potential for both innovation and disruption. Its adoption also poses certain risks, which will need to be addressed and mitigated before blockchain becomes mainstream. Many people have heard of blockchain at this point, but many are unfamiliar with how exactly this new technology works and unsure of the circumstances under which it may be useful to libraries.

In this post, I will provide a brief overview of the merits and issues of blockchain, and close with some suggestions for compelling use cases.

What Blockchain Accomplishes

Blockchain is the technology that underpins a well-known decentralized cryptocurrency, Bitcoin. Put simply, blockchain is a kind of distributed digital ledger on a peer-to-peer (P2P) network, in which records are confirmed and encrypted. Blockchain records and keeps data in its original state in a secure and tamper-proof manner[1] by its technical implementation alone, thereby obviating the need for a third-party authority to guarantee the authenticity of the data. Records in a blockchain are stored in multiple ledgers in a distributed network instead of one central location. This prevents a single point of failure and secures records by protecting them from potential damage or loss. Blocks in each blockchain ledger are chained to one another by cryptographic hashes, and new blocks are added through a consensus mechanism called ‘proof of work.’ (For those familiar with a version control system such as Git, a blockchain ledger can be thought of as something similar to a P2P hosted Git repository that allows sequential commits only.[2]) This makes records in a block immutable and irreversible, that is, tamper-proof.
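To make the chaining concrete, here is a minimal Python sketch of a hash-chained ledger. It is an illustration of the concept only; real blockchains add proof of work, digital signatures, and a network consensus protocol on top of this structure.

import hashlib
import json

def block_hash(block):
    # Hash a block's contents, including its link to the previous block.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, record):
    # Append a new block whose hash covers the previous block's hash.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"record": record, "prev_hash": prev_hash}
    block["hash"] = block_hash({"record": record, "prev_hash": prev_hash})
    chain.append(block)

def is_valid(chain):
    # Tampering with any record breaks the hash links that follow it.
    for i, block in enumerate(chain):
        expected = block_hash({"record": block["record"], "prev_hash": block["prev_hash"]})
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, "Alice pays Bob 5")
add_block(chain, "Bob pays Carol 2")
chain[0]["record"] = "Alice pays Bob 500"  # tampering...
print(is_valid(chain))                     # ...is detected: False

Because each block’s hash covers its predecessor’s hash, changing any old record invalidates every block that follows it, which is what makes tampering evident across the network.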

In areas where the authenticity and security of records are of paramount importance, such as electronic health records, digital identity authentication/authorization, digital rights management, historic records that may be contested or challenged due to the vested interests of certain groups, and digital provenance, to name a few, blockchain can lead to efficiency, convenience, and cost savings.

For example, with blockchain implemented in banking, one will be able to transfer funds across different countries without going through banks.[3] This can drastically lower the fees involved, and the transaction will take effect much more quickly, if not immediately. Similarly, adopted in real estate transactions, blockchain can make the process of buying and selling a property more straightforward and efficient, saving time and money.[4]

Disruptive Potential of Blockchain

The disruptive potential of blockchain lies in its aforementioned ability to render obsolete the role of a third-party authority that records and validates transactions and guarantees their authenticity should a dispute arise. In this respect, blockchain can serve as an alternative trust protocol that decentralizes traditional authorities. Since blockchain achieves this through public-key cryptography, however, losing the private key to the blockchain ledger holding one’s financial or real estate assets, for example, results in the permanent loss of those assets. With the third-party authority gone, there will be no institution to step in and remedy the situation.

Issues

These are only some of the issues with blockchain. Others include (a) interoperability between different blockchain systems, (b) scalability of blockchain at a global scale with large amounts of data, (c) potential security issues such as the 51% attack,[5] and (d) the huge energy consumption[6] that a blockchain requires to add a block to a ledger. Note that the last issue of energy consumption has both environmental and economic ramifications, because it can cancel out the cost savings gained from eliminating a third-party authority and related processes and fees.

Challenges for Wider Adoption

There is growing interest in blockchain among information professionals, but there are also obstacles to that interest gaining momentum and moving toward wider trial and adoption. One obstacle is the lack of general understanding of blockchain among the larger audience of information professionals. Due to its original association with Bitcoin, many mistake blockchain for cryptocurrency. Another obstacle is technical. The use of blockchain requires setting up and running a node in a blockchain network, such as Ethereum,[7] which may be daunting to those who are not tech-savvy. This raises the barrier to entry for those who are not familiar with command-line scripting yet still want to try out and test how a blockchain functions.

The last and most important obstacle is the lack of compelling use cases for libraries, archives, and museums. To many, blockchain is an interesting new technology, but even many blockchain enthusiasts are skeptical of its practical benefits at this point, when all associated costs are considered. Of course, this is not an insurmountable obstacle. The more familiar people become with blockchain, the more ways they will discover to use it in the information profession for purposes where it is uniquely beneficial.

Suggestions for Compelling Use Cases of Blockchain

In order to determine what may make a compelling use case of blockchain, the information profession would benefit from considering the following.

(a) What kind of data/records (or the series thereof) must be stored and preserved exactly the way they were created.

(b) What kind of information is at great risk of being altered and compromised by changing circumstances.

(c) What type of interactions may need to take place between such data/records and their users.[8]

(d) How much would be a reasonable cost for implementation.

These questions will help connect the potential benefits of blockchain with real-world use cases and take the information profession one step closer to wider testing and adoption. For those further interested in blockchain and libraries, I recommend the recordings from the Library 2.018 online mini-conference, “Blockchain Applied: Impact on the Information Profession,” held back in June. The Blockchain National Forum, which is funded by IMLS and will take place in San Jose, CA on August 6th, will also be livestreamed.

Notes

[1] For an excellent introduction to blockchain, see “The Great Chain of Being Sure about Things,” The Economist, October 31, 2015, https://www.economist.com/news/briefing/21677228-technology-behind-bitcoin-lets-people-who-do-not-know-or-trust-each-other-build-dependable.

[2] Justin Ramos, “Blockchain: Under the Hood,” ThoughtWorks (blog), August 12, 2016, https://www.thoughtworks.com/insights/blog/blockchain-under-hood.

[3] The World Food Programme, the food-assistance branch of the United Nations, is using blockchain to increase their humanitarian aid to refugees. Blockchain may possibly be used for not only financial transactions but also the identity verification for refugees. Russ Juskalian, “Inside the Jordan Refugee Camp That Runs on Blockchain,” MIT Technology Review, April 12, 2018, https://www.technologyreview.com/s/610806/inside-the-jordan-refugee-camp-that-runs-on-blockchain/.

[4] Joanne Cleaver, “Could Blockchain Technology Transform Homebuying in Cook County — and Beyond?,” Chicago Tribune, July 9, 2018, http://www.chicagotribune.com/classified/realestate/ct-re-0715-blockchain-homebuying-20180628-story.html.

[5] “51% Attack,” Investopedia, September 7, 2016, https://www.investopedia.com/terms/1/51-attack.asp.

[6] Sherman Lee, “Bitcoin’s Energy Consumption Can Power An Entire Country — But EOS Is Trying To Fix That,” Forbes, April 19, 2018, https://www.forbes.com/sites/shermanlee/2018/04/19/bitcoins-energy-consumption-can-power-an-entire-country-but-eos-is-trying-to-fix-that/#49ff3aa41bc8.

[7] Osita Chibuike, “How to Setup an Ethereum Node,” The Practical Dev, May 23, 2018, https://dev.to/legobox/how-to-setup-an-ethereum-node-41a7.

[8] The interaction can also be a self-executing program when certain conditions are met in a blockchain ledger. This is called a “smart contract.” See Mike Orcutt, “States That Are Passing Laws to Govern ‘Smart Contracts’ Have No Idea What They’re Doing,” MIT Technology Review, March 29, 2018, https://www.technologyreview.com/s/610718/states-that-are-passing-laws-to-govern-smart-contracts-have-no-idea-what-theyre-doing/.

Names are Hard

A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with, and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “my system will never have to deal with names from China/Japan/Korea” were apparent everywhere. I consider myself a fairly critical and studious person; I devote time to thinking about the consequences of design decisions and carefully attempt to avoid poor assumptions. But I’ve repeatedly run into trouble when handling personal names as data. There is a cognitive dissonance surrounding names: we treat them as rigid identifiers when they’re anything but. We acknowledge their importance but struggle to take them seriously.

Names change. They change due to marriage, divorce, child custody, adoption, gender identity, religious devotion, performance art, witness protection, or none of these at all. Sometimes people just want a new name. And none of these reasons for change are more or less valid than others, though our legal system doesn’t always treat them equally. We have students who change their legal name, which is often something systems expect, but then they have the audacity to want to change their username, too! And that works less often because all sorts of system integrations expect usernames to be persistent.

Names do not have a universal structure. There is no set quantity of components in a name nor an established order to those components. At my college, we have students without surnames. In almost all our systems, surname is a required field, so we put a period “.” there to satisfy that requirement. Then, on displays in our digital repository where surnames are assumed, we end up with bolded section headers like “., Johnathan” which look awkward.

Many Western names might follow a [Given name] – [Middle name] – [Surname] structure, and an unfortunate number of the systems I have to deal with assume all names share this structure. It’s easy to see how this yields problematic results. For instance, if you want to see a sorted list of users, you probably want to sort by family name, but many systems sort by the name in the last position, causing, for instance, Chinese names 1 to be handled differently from Western ones. 2 But it’s not only that someone might not have a middle name, or might have two middle names, or might have a family name in the first position—no, even that would be too simple! Some name components defy simple classification. I once met a person named “Bus Stop”. “Stop” is clearly not a family affiliation, despite coming in the final position of the name. Sometimes the second component of a tripartite Western name isn’t a middle name at all, but a maiden name or the second word of a two-word first name (e.g. “Mary Anne” or “Lady Bird”)! One cannot determine the role of every piece of a name just by looking at a familiar structure!
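A short Python sketch, with made-up names, shows exactly how sorting on the final name component goes wrong:

# Each name is stored the way too many systems store it: as ordered parts.
names = [
    ["Mary", "Anne", "Smith"],   # "Mary Anne" is a two-word given name
    ["Zhang", "Wei"],            # family name "Zhang" comes first
    ["Johnathan"],               # a mononym with no surname at all
]

# Sorting by the final component assumes it is always the family name...
by_last_position = sorted(names, key=lambda parts: parts[-1])

# ...which files Zhang Wei under "Wei" even though "Zhang" is the
# family name, and sorts the mononym by its only component.
for parts in by_last_position:
    print(" ".join(parts))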

Names are also contextual. One’s name with family, with legal institutions, and with classmates can all differ. Many of our international students have alternative Westernized first names. Their family may call them Qiáng but they introduce themselves as Brian in class. We ask for a “preferred name” in a lot of systems, which is a nice step forward, but don’t ask when it’s preferred. Names might be meant for different situations. We have no system remotely ready for this, despite the personalization that’s been seeping into web platforms for decades.

So if names are such trouble, why not do our best and move on? Aren’t these fringe cases that don’t affect the vast majority of our users? These issues simply cannot be ignored, because names are vital. What one is called, even if it’s not a stable identifier, has great effects on one’s life. It’s dispiriting to witness one’s name misspelled, mispronounced, treated as an inconvenience, botched at every turn. A system that won’t adapt to suit a name delegitimizes the name. It says, “oh, that’s not your real name” as if names had differing degrees of reality. But a person may have multiple names—or many overlapping names over time—and while one may be more institutionally recognized at a given time, none are less real than the others. Even if only a single student a year is affected, affirming their name(s) is the least amount of respect we can show.

So what do we do? Endless enumeration of the difficulties of working with names does little but paralyze us. Honestly, when I consider the best implementation of personal names, the MODS metadata schema comes to mind. Having a <name> element with any number of <namePart> children is the best model available. The <namePart>s can be ordered in particular ways, a “@type” attribute can define a part’s function 3, a record can include multiple names referencing the same person, multiple names with distinct parts can be linked to the same authority record, etc. MODS has a flexible and comprehensive treatment of name data. Unfortunately, returning to “Falsehoods Programmers Believe”, none of the library systems I administer do anywhere near as good a job as this metadata schema. Nor is it necessarily a problem of Western bias—even the Chinese government can’t develop computer systems to accurately represent the names of people in the country, or even agree on what the legal character set should be! 4 It seems that programmers start their apps by creating a “users” database table with columns for unique identifier, username, “firstname”/”lastname” [sic], and work from there. On the bright side, at least the name isn’t used as the identifier! We all learned that in databases class, but we didn’t learn to make “names” a separate table linked to “users” in our relational databases.
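Returning to MODS for a moment, here is a rough illustration of its name model using Python’s standard library. The element names come from MODS itself; the sample values, borrowed from the “Bus Stop” example above, are purely illustrative.

import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def mods_name(parts):
    # Build a MODS <name> from an ordered list of (type, value) pairs.
    # A part's type may be None: not every name component fits the
    # schema's fixed vocabulary ("given", "family", etc.).
    name = ET.Element(f"{{{MODS_NS}}}name")
    for part_type, value in parts:
        part = ET.SubElement(name, f"{{{MODS_NS}}}namePart")
        if part_type:
            part.set("type", part_type)
        part.text = value
    return name

# Ordered parts, with roles recorded only where they are actually known.
el = mods_name([("given", "Bus"), (None, "Stop")])
print(ET.tostring(el, encoding="unicode"))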

In my day-to-day work, the best I’ve done is to be sensitive to the importance of name changes specifically and to how our systems handle them. After a few meetings with a cross-departmental team, we developed a name change process at our college. System administrators from across the institution are on a shared listserv where name changes are announced. In the libraries, I spoke with our frontline service staff about assisting with name changes. Our people at the circulation desk know to notice name discrepancies—sometimes a name badge has been updated but not our catalog records, and we can offer to make them match—but also to guide students who may need to contact the registrar or other departments on campus to initiate the top-down name change process. While most of the library’s systems don’t easily accommodate username changes, I can write administrative scripts for our institutional repository that alter the ownership of a set of items from an old username to a new one. It’s important to remember that we’re inconveniencing the user with the work of implementing their name change, not the other way around. So taking whatever extra steps we can on our own, without pushing labor onto our students and staff, is the best way to mitigate how poorly our tools support the protean nature of personal names.
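Such a script can be quite small. The sketch below is hypothetical (the API endpoint, field names, and token are stand-ins, not our repository’s actual interface), but it shows the shape of the task:

import requests

API = "https://repository.example.edu/api"  # hypothetical repository API
TOKEN = "admin-token"                       # hypothetical credential

def transfer_ownership(old_username, new_username):
    # Reassign every item owned by old_username to new_username.
    headers = {"Authorization": f"Bearer {TOKEN}"}
    resp = requests.get(f"{API}/items", params={"owner": old_username}, headers=headers)
    resp.raise_for_status()
    for item in resp.json():
        requests.patch(
            f"{API}/items/{item['id']}",
            json={"owner": new_username},
            headers=headers,
        ).raise_for_status()

transfer_ownership("jsmith", "jdoe")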

Notes

  1. Chinese names typically have the surname first, followed by the given name.
  2. Another poor implementation can be seen in The Chicago Manual of Style‘s indexing instructions, which have an extensive list of exceptions to the Western norm and how to handle them. But CMoS provides no guidance on how one would go about identifying a name’s cultural background or, for instance, identifying a compound surname.
  3. Although the MODS user guidelines sadly limit the use of the type attribute to a fixed list of values which includes “family” and “given”, rendering it subject to most of the critiques in this post. Substantially expanding this list with “maiden”, “patronymic/matronymic” (names based on a parental given name, e.g. Mikhailovich), and more, as well as some sort of open-ended “other” option, would be a great improvement.
  4. https://www.nytimes.com/2009/04/21/world/asia/21china.html

Our Assumptions: of Neutrality, of People, & of Systems

Discussions of neutrality have been coming up a lot in libraryland recently. I would argue that people have been talking about this for years1 2 3 4, but this year we saw a confluence of events drive the “neutrality of libraries” topic to the fore. To be clear, I have a position on this topic5, and it is that libraries cannot be neutral players and still claim to be a part of the society they serve. But this post is about what we assume to be neutral, what we bring forward with those assumptions, and how we react when those assumptions are challenged. When we challenge ideas that have been built into systems, either as “benevolent, neutral” librarians or “pure logic, neutral” algorithms, what part of ourselves are we challenging? How do reactions change based on who is doing the challenging? Be forewarned, this is a convoluted landscape.

At the 2018 ALA Midwinter conference, the ALA President’s program was a debate about neutrality. I will not summarize that event (see here), but I do want to call attention to something that became very clear in the course of the program: everyone was using a different definition of neutrality. People spoke with assumptions of what neutrality means and why they do, or do not, believe that it is important for libraries to maintain. But what are we assuming when we make these assumptions? Without an agreed-upon definition, some referred to legal rulings to define neutrality; some used a dictionary definition (“not aligned with a political or ideological grouping” – Merriam-Webster) without probing how political or ideological perspectives play out in real life. But why do we assume libraries should be neutral? What safety or security does that assumption carry? What else are we assuming should be neutral? Software? Analytics? What value judgements are we bringing forward with those assumptions?

An assumption of neutrality often comes with a transference of trust. A speaker at ALA even said that the three professions thought of as the most trustworthy (via a national poll) are firefighting, nursing, and librarianship, and so, by his logic, we must be neutral. Perhaps some do not conflate trust and neutrality, but when we do assume neutrality equates with trust in these situations, we remove the human aspect from the equation. Nurses and librarians, as people, are not neutral. People hold biases and a variety of lived experiences that shape perspectives and approaches. If you engage this line of thought and interrogate your assumptions and beliefs, it can become apparent that it takes effort to recognize and mitigate our human biases throughout the various interactions in our lives.

What of our technology? Systems and software are often put forward as logic-driven, neutral devices, considered apart from their human creators. The position of some people is that machines lack emotions and are, therefore, immune to our human biases and prejudices. This position is inaccurate and dangerous and requires our attention. Algorithms and analytics are not neutral. They are designed by people, who carry forward their own notions of what is true and what is neutral. These ideas are built into the structure of the systems and have the potential to influence our perception of reality. As we rely on “data-driven decision-making” across all aspects of our society — education, healthcare, entertainment, policy — we transfer trust and power to that data. All too often, we do that without scrutinizing the sources of the data, or the algorithms acting upon them. Moreover, as we push further into machine learning systems – systems that are trained on data to look for patterns and optimize processes – we open the door for those systems to amplify biases. To “learn” our systemic prejudices and inequities.

People far more expert in this domain than I have raised these questions and researched the effects that biased systems can have on our society6 7 8. I often bring these issues up when I want to emphasize how problematic it is to let the assumption of data-driven outcomes as “truth” persist and how critical it is to apply information literacy practices to data. But as I thought about this issue and read more from these experts, I was struck by the variety of responses that these experts elicit. How do reactions change based on who is doing the challenging?

Angela Galvan questioned assumptions related to hiring, performance, and belonging in librarianship, based on the foundation of the profession’s “whiteness,” and was met with hostile comments on the post9. Nicole A. Cooke wrote about implicit assumptions when we write about tolerance and diversity and has been met with hostile comments10 while her micro-aggression research has been highlighted by Campus Reform11, which led to a series of hostile communications to her. Chris Bourg’s keynote about diversity and technology at Code4Lib was met with hostility12. Safiya Noble wrote a book about bias in algorithms and technology, which resulted in one of the more spectacular Twitter disasters13 14, wherein someone found it acceptable to dismiss her research without even reading the book.

Assumptions of neutrality, whether related to library services, spaces, collections, or the people doing the work, allow oppressive systems to persist and contribute to a climate where the perspectives and expertise of marginalized people in particular can be dismissed. Insisting that we promulgate the library and technology – and the people working in and with them – as neutral actors erases the realities that these women (and countless others) have experienced. Moreover, it allows those operating with harmful and discriminatory assumptions to believe that they *are* neutral, by virtue of working in those spaces, and that their truth is an objective truth. It limits the desire for dialog, discourse, and growth – because who is really motivated to listen when you think you are operating from a place of “Truth”…when you feel that the strength of your assumptions can invalidate a person’s life?

Information Architecture for a Library Website Redesign

My library is about to embark upon a large website redesign during this summer semester. This isn’t going to be just a new layer of CSS, or a minor version upgrade to Drupal, or moving a few pages around within the same general site. No, it’s going to be a huge, sweeping change that affects the whole of our web presence. With such an enormous task at hand, I wanted to discuss some of the tools and approaches that we’re using to make sure the new site meets our needs.

Why Redesign?

I’ve heard about why the wholesale website redesign is a flawed approach, why we should be continually, iteratively working on our sites. Continual changes stop problems from building up, plus large swaths of changes can disrupt our users who were used to the old site. The gradual redesign makes a lot of sense to me, and also seems like a complete luxury that I’ve never had in my library positions.

The primary problem with a series of smaller changes is that that approach assumes a solid foundation to begin with. Our current site, however, has a host of interconnected problems that make tackling any individual issue a challenge. It’s like your holiday lights sitting in a box all year; they’re hopelessly tangled by the time you take them out again.

Our site has decades of discarded, forgotten content. That’s mostly harmless; it’s hard to find and sees virtually no traffic. But it’s still not great to have outdated information scattered around. In particular, I’m not thrilled that a lot of it is static HTML, images, and documents sitting outside our content management system. It’s hard to know how much content we even have because it cannot be managed in one place.

We also fell into a pattern of adding content to the site but never removing or re-organizing existing content. Someone would ask for a button here, or a page dictating a policy there, or a new FAQ entry. Pages that were added didn’t have particular owners responsible for their currency and maintenance; I, as Systems Librarian, was expected to run the technical aspects of the site but also be its primary content editor. That’s simply an impossible task, as I don’t know every detail of the library’s operations or have the time to keep on top of a menagerie of pages of dubious importance.

I tried to create a “website changes form” to manage things, but it didn’t work for staff or for me. The few staff who did fill out the form ended up requesting things that were difficult to do: large theme changes that I wasn’t comfortable making without user testing or approval from our other librarians. What little content was added was minor text being ferried through this form and me, essentially slowing down the editorial process and furthering the idea that web content was solely my domain.

To top our content troubles off, we’re also on an unsupported, outdated version of Drupal. Upgrading or switching a CMS isn’t necessarily related to a website redesign. If you have a functional website on a broken piece of software, you probably don’t want to toss out the good with the bad. But in our case, similar to how our ILS migration gave us the opportunity to clean up our bibliographic records, a CMS migration gives us a chance to rebuild a crumbling website. It just doesn’t make sense to invest technical effort in migrating all our existing content when it’s so clearly in need of major structural change.

Card Sort

Cards in the middle of being constructed.

Not wanting to go into a redesign process blind, we set out to collect data on our current site and how it could be improved. One of the first ways we gathered data was to ask all library staff to perform a card sort. A card sort is an activity wherein pieces of web content are put on cards, which can then be placed into categories; the idea is to form a rough information architecture for your site, which can dictate structure and main menus. Card sorts can be either open or closed, meaning the categories are either invented by the participants or provided ahead of time.

For our card sort, first, I chose to do an open sort, since we were so uncertain about the categories. Second, I selected web content based on our existing site’s analytics. It was clear to me that our current site was bloated and disorganized; there were pages tucked into the nooks of cyberspace that no one had visited in years. There was all sorts of overlapping and unnecessary content. So I selected ≈20 popular pages but also gave each group two pieces of blank paper on which to add whatever content they felt was missing.

Finally, to get as much useful data as possible, I modified the card sort procedure in a couple of ways. I asked people to role-play as different types of stakeholders (graduate & undergraduate students, faculty, administrators) and to justify their decisions from that vantage point. I also had everyone, after sorting was done, put dots on content they felt was important enough for the home page. Since one of our current site’s primary challenges is maintenance, or the lack thereof, I wanted to add one last activity wherein participants would write a “responsible staff member” on each card (e.g. the instruction librarian maintains the instruction policy page). Sadly, we ran out of time and couldn’t do that bit.

The results of the card sort were informative. A few categories emerged as commonalities across everyone’s sorts: collections, “about us”, policies, and current events/news. We discovered a need for new content to cover workshops, exhibits, and events happening in the library, which were currently represented (and not very well) only in blog posts. In terms of the home page, it was clear that LibGuides, collections, news, and most importantly our open hours needed to be represented.

Treejack & Analytics

Once we had enough information to build out the site’s architecture, I organized our content into a few major categories. But there were still several questions on my mind: would users understand terms like “special collections”? Would they understand where to look for LibGuides? Would they know how to find the right contact for various questions? To answer some of these questions, I turned to Optimal Workshop’s “Treejack” tool. Treejack tests a site’s information architecture by having users navigate basic text links to perform basic tasks. We created a few tasks aimed at answering our questions and recruited students to perform them. While we’re only using the free tier of Optimal Workshop, and only testing student stakeholders, the data was still informative.

For one, Optimal Workshop’s results data is rich and visualized well. It shows the exact routes each user took through our site’s content, the time it took to complete a task, and whether a task was completed directly, completed indirectly, or failed. Completed directly means the user took an ideal route through our content, with no bouncing up and down the site’s hierarchy. Indirect completion means they eventually got to the right place but didn’t take a perfect path there, while failure means they ended in the wrong place. The graphs that demonstrate each task’s outcomes are wonderful:

The data & charts Treejack shows for a moderately successful task.
A “pie tree” showing users’ paths while attempting a task.

We can see here that most of our users found their way to LibGuides (named “study guides” here). But a few people expected to find them under our “Collections” category and bounced around in there, clearly lost. This tells us we should represent our guides under Collections alongside items like databases, print collections, and course reserves. While building and running your own Treejack-type tests would be easy, I definitely recommend Optimal Workshop as a great product which provides much insight.

There’s much work to be done in terms of testing—ideally we would adjust our architecture to address the difficulties that users had, recruit different sets of users (faculty & staff), and attempt to answer more questions. That’ll be difficult during the summer while there are fewer people on campus but we know enough now to start adjusting our site and moving along in the redesign process.

Another piece of our redesign philosophy is using analytics about the current site to inform our decisions about the new one. For instance, I track interactions with our home page search box using Google Analytics events 1. The search box has three tabs corresponding to our discovery layer, catalog, and LibGuides. Despite thousands of searches and interactions with the search box, LibGuides search is seeing only trace usage. The tab was clicked on a mere 181 times this year; what’s worse, only 51 times did a user actually search afterwards. This trace amount of usage, plus the fact that users are clearly clicking onto the tab and then not finding what they want there, indicates it’s just not worth any real estate on the home page. When you add in that our LibGuides now appear in our discovery layer, their search tab is clearly disposable.

What’s Next

Data, tests, and conceptual frameworks aside, our next stage will involve building something much closer to an actual, functional website. Tools like Optimal Workshop are wonderful for providing high-level views on how to structure our information, but watching a user interact with a prototype site is so much richer. We can see their hesitation, hear them discuss the meanings of our terms, get their opinions on our stylistic choices. Prototype testing has been a struggle for me in the past; users tend to fixate on the unfinished or unrefined nature of the prototype, providing feedback that tells me what I already know (yes, we need to replace the placeholder images; yes, “Lorem ipsum dolor sit amet” is written on every page) rather than something new. I hope to counter that by setting appropriate expectations and building a small but fairly robust prototype.

We’re also building our site in an entirely new piece of software, Wagtail. Wagtail is exciting for a number of reasons, and will probably have to be the subject of future posts, but it does help address some of the existing issues I noted earlier. We’re excited by the innovative StreamField approach to content—a replacement for large, rich text fields, which are unstructured and often let users override a site’s base styles. We’ve also heard whispers of new workflow features that would let us send reminders to owners of different content pages to revisit them periodically. While I could do something like this myself with an ad hoc mess of calendar events and spreadsheets, having it built right into the CMS bodes well for our future maintenance plans. Obviously, the concepts underlying Wagtail and the tools it offers will influence how we implement our information architecture. But we also started gathering data long before we knew what software we’d use, so exactly how it will work remains to be figured out.
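For a taste of the StreamField approach, here is a minimal sketch of a Wagtail page model. It uses Wagtail 2.x import paths, and the block names are examples, not our actual site models.

# A minimal sketch of a page model using Wagtail's StreamField.
from wagtail.core import blocks
from wagtail.core.fields import StreamField
from wagtail.core.models import Page
from wagtail.admin.edit_handlers import StreamFieldPanel

class LibraryPage(Page):
    # Instead of one big rich text blob, content is a sequence of
    # typed blocks, each rendered with its own template.
    body = StreamField([
        ("heading", blocks.CharBlock()),
        ("paragraph", blocks.RichTextBlock()),
        ("hours_table", blocks.RawHTMLBlock()),
    ])

    content_panels = Page.content_panels + [
        StreamFieldPanel("body"),
    ]

Because each block is typed, editors can only assemble content from sanctioned pieces, which keeps pages structured and on-style.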

Has your library done a website redesign or information architecture test recently? What tools or approaches did you find useful? Let us know in the comments!

Notes

  1. I described Google Analytics events before in a previous Tech Connect post

Creating an OAI-PMH Feed From Your Website

Libraries that use a flexible content management system such as Drupal or WordPress for their library website and/or resource discovery face a challenge in ensuring that their data is accessible to the rest of the library world. Whether making metadata usable by other libraries or portals such as DPLA, or harvesting content into a discovery layer, there are some additional steps libraries need to take to make this happen. While there are a number of ways to accomplish this, the most straightforward is to create an OAI-PMH feed. OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting, and it is a well-supported and well-understood protocol in many metadata management systems. There’s a tutorial available that covers the details you might want to know, and the Open Archives Initiative has detailed documentation.

Content management tools designed specifically for library and archives usage, such as LibGuides and Omeka, have a built-in OAI-PMH feed, and generally all you need to do is find the base URL and plug it in. (For instance, here is what a LibGuides OAI feed looks like.) In this post I’ll look at what options are available for Drupal and WordPress to create the feed and become a data provider.
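Regardless of which system generates the feed, consuming one looks the same everywhere. Here is a minimal Python sketch of a harvest request; the base URL is a placeholder, while the verb and argument names come from the OAI-PMH specification.

import requests
import xml.etree.ElementTree as ET

# Any OAI-PMH data provider is queried the same way: base URL + verb.
BASE_URL = "https://example.edu/oai"  # placeholder base URL

resp = requests.get(BASE_URL, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
resp.raise_for_status()

root = ET.fromstring(resp.content)
ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for record in root.iter(f"{{{ns['oai']}}}record"):
    title = record.find(".//dc:title", ns)
    print(title.text if title is not None else "(no title)")

A real harvester would also follow the resumptionToken that providers return when a result set spans multiple requests.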

WordPress

This is short, since there aren’t that many options. If you use WordPress for your library website you will have to experiment, as there is nothing well-supported. Lincoln University in New Zealand has created a script that converts a WordPress RSS feed to a minimal OAI feed. This requires editing a PHP file to include your RSS feed URL, and uploading to a server. I admit that I have been unsuccessful at testing this, but Lincoln University has a working example, and uses this to harvest their WordPress library website into Primo.

Drupal

If you use Drupal, you will need to first install a module called Views OAI-PMH. What this does is create a Drupal view formatted as an OAI-PMH data provider feed. Those familiar with Drupal know that you can use the Views module to present content in a variety of ways. For instance, you can include certain fields from certain content types in a list or chart that allows you to reuse content rather than recreating it. This is no different, only the formatting is an OAI-PMH compliant XML structure. Rather than placing the view in a Drupal page or block, you create a separate page. This page becomes your base URL to provide to others or reuse in whatever way you need.

The Views OAI-PMH module isn’t the most obvious module to set up, so here are the basic steps you need to follow. First, enable the module and set permissions as usual. You will also want to refresh your caches (I had trouble until I did this). You’ll discover that, unlike other modules, the documentation and configuration instructions are not in the interface but in the README file, so you will need to open that file in the module directory to get the configuration instructions.

To create your OAI-PMH view you have two choices. You can add it to a view that is already created, or create a new one. The module will create an example view called Biblio OAI-PMH (based on an earlier Biblio module used for creating bibliographic metadata). You can just edit this to create your OAI feed. Alternatively, if you have a view that already exists with all the data you want to include, you can add an OAI-PMH display as an additional display. You’ll have to create a path for your view that will make it accessible via a URL.

The details screen for the OAI-PMH display.

The Views OAI-PMH module only supports Dublin Core at this time. If you are using Drupal for bibliographic metadata of some kind, mapping the fields is a fairly straightforward process. However, choosing the Dublin Core mappings for data that is not bibliographic by nature requires some creativity and thought about where the data will end up. When I was setting this up I was trying to harvest most of the library website into our discovery layer, so I knew how the discovery layer parsed OAI DC and could choose fields accordingly.

After adding fields to the view (just as you normally would in creating a view), you will need to adjust the OAI view’s settings to assign a Dublin Core element name to each content field.

You can then map each element to the appropriate Dublin Core field. The example from my site includes some general metadata that appears on all content (such as Title) and some that appears only in specific content types. For instance, Collection Description appears only on digital collection content types. I chose not to include the body content for any page on the site, since most pages contain a lot of scripts or other code that wasn’t useful to harvest into the discovery layer. Explanatory content, such as the description of a digital collection or a database, was more useful to display in the discovery layer, and exists only in special fields on those content types on my Drupal site, so we could pull those fields out and display them.

In the end, I have a feed that looks like this. Regular pages end up with very basic metadata in the feed:

<metadata>
  <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>Hours</dc:title>
    <dc:identifier>http://libraries.luc.edu/hours</dc:identifier>
    <dc:creator>Loyola University Libraries</dc:creator>
  </oai_dc:dc>
</metadata>

Databases, on the other hand, get more information pulled in. Note that there are two identifiers: one for the database URL and one for the database description link. We will make both available, but may choose to use only one in the discovery layer and hide the other.

<metadata>
  <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>Annual Bibliography of English Language and Literature</dc:title>
    <dc:identifier>http://flagship.luc.edu/login?url=http://collections.chadwyck.com/home/home_abell.jsp</dc:identifier>
    <dc:subject>Modern Languages</dc:subject>
    <dc:type>Index/Database</dc:type>
    <dc:identifier>http://libraries.luc.edu/annual-bibliography-english-language-and-literature</dc:identifier>
    <dc:creator>Loyola University Libraries</dc:creator>
  </oai_dc:dc>
</metadata>

When someone does a search in the discovery layer for something on the library website, the result shows the page right in the interface. We are still doing usability tests on this right now, but expect to move it into production soon.

Conclusion

I’ve just touched on two content management systems, but there are many more out there. Do you create OAI-PMH feeds of your data? What do you do with them? Share your examples in the comments.

Data Refuge and the Role of Libraries

Society is always changing. For some, the change can seem slow and frustrating, while others may feel as though the change occurred in the blink of an eye. What is this change that I speak of? It can be anything…civil rights, autonomous cars, or national leaders. One change that no one ever seems particularly prepared for, however, is when a website link breaks. One day, you can click a link and get to a site; the next day, you get a 404 error. Sometimes this occurs because a site was migrated to a new server and the link was not redirected. Sometimes this occurs because the owner ceased to maintain the site. And sometimes, this occurs for less benign reasons.

Information access via the Internet is an activity that many (but not all) of us engage in every day, sometimes unconsciously: checking the weather, reading email, receiving news alerts. We also use the Internet to make datasets and other sources of information widely available. Individuals, universities, corporations, and governments share data and information in this way. During the Obama administration, the Open Government Initiative led to the development of Project Open Data and data.gov. Federal agencies started looking at ways to make information sharing easier, especially in areas where the data are unique.

One area of unique data is climate science. Since climate data is captured at a specific day and time, under certain conditions, it can never be truly reproduced. It will never be January XX, 2017 again. With these constraints, climate data can be thought of as fragile. The copies that we have are the only records that we have. Much of our nation’s climate data has been captured by research groups at institutes, universities, and government labs and agencies. During the election, much of the rhetoric from Donald Trump was rooted in the belief that climate change is a hoax. Upon his election, Trump tapped Scott Pruitt, who has fought many of the EPA’s attempts to regulate pollution, to lead the EPA. This, along with other messages from the new administration, has raised alarms within the scientific community that the United States may repeat the actions of the Harper administration in Canada, which literally threw away thousands of items from federal libraries that were deemed out of scope, through a process criticized as not transparent.

In an effort to safeguard and preserve this data, the Penn Program in Environmental Humanities (PPEH) helped organize a collaborative project called Data Refuge. This project requires the expertise of scientists, librarians, archivists, and programmers to organize, document, and back up data that is distributed across federal agencies’ websites. Maintaining the integrity of the data and ensuring its re-usability are paramount concerns, and areas where librarians and archivists must work hand in glove with the programmers (sometimes one and the same) who are writing the code to pull, duplicate, and push content. Wired magazine recently covered one of the Data Refuge events and detailed the way that the group worked together, while much of the process is driven by individual actions.

In order to capture as much of this data as possible, the Data Refuge project relies on groups of people organizing around this topic across the country. The PPEH site details the requirements for hosting a successful DataRescue event and has a Toolkit to help promote and document the event. There is also a survey that you can use to nominate climate or environmental data to be part of the Data Refuge. Not in a position to organize an event? Don’t like people? You can also work on your own! An interesting observation from the work-on-your-own page is the option to nominate any “downloadable data that is vulnerable and valuable.” This means that the Internet Archive and the End of Term Harvest Team (a project to preserve government websites from the Obama administration) are interested in any data that you have reason to believe may be in jeopardy under the current administration.

A quick note about politics. Politics are messy, and it can seem odd that people are organizing in this way when administrations change every four or eight years and, when there is a party change in the presidency, it is almost a certainty that there will be major departures in policy and prioritization from administration to administration. What is important to recognize is that our data holdings are increasingly solely digital, and therefore fragile. The positions on issues like climate, environment, civil rights, and many, many others are so diametrically opposed between the Obama and Trump offices that we – the public – have no assurances that the data will be retained or made widely available for sharing. This administration speaks of “alternative facts” and “disagree[ing] with the facts”, and this makes people charged with preserving facts wary.

Many questions about the sustainability and longevity of the project remain. Will End of Term or Data Refuge be able to/need to expand the scope of these DataRescue efforts? How much resourcing can people donate to these events? What is the role of institutions in these efforts? This is a fantastic way for libraries to build partnerships with entities across campus and across a community, but some may view the political nature of these actions as incongruous with the library mission.

I would argue that policies and political actions are not inert abstractions. There is a difference between promoting a political party and calling attention to policies that are in conflict with human rights and freedom of information. Loath as I am to make this comparison, would anyone truly claim that burning books is protected political speech, and that opposing such burning is “playing politics”? Yet these were the actions of a political party – in living memory – hosted in university towns across Germany. Considering the initial attempt to silence the USDA and the temporary freeze on the EPA, libraries should strongly support the efforts of PPEH, Data Refuge, End of Term, and concerned citizens across the country.


Online Privacy in Post-Election America

A commitment to protecting the privacy of our patrons is enshrined in the ALA Code of Ethics. While that has always been an important aspect of librarianship, it’s become even more pivotal in an information age where privacy is far more nuanced and difficult to achieve. Given the rhetoric of the election season, and statements made by our President-Elect as well as his Cabinet nominees 1, the American surveillance state has become even more disconcerting. As librarians, we have an obligation to empower our communities with the knowledge they need to secure their own personal information. This post will cover, at a high level, a few areas where librarians of various types can assist patrons.

The Tools

Given that so much information is exchanged online these days, librarians are in a unique position to educate patrons about the Internet. We spend so much time either building web services or utilizing them that it’s highly likely a librarian knows more about the web than the average citizen. As such, we can relate some of the powerful pieces of software and services that aid in protecting one’s online presence. To name just a handful that almost everyone could benefit from knowing:

DuckDuckGo is a privacy-aware search engine which explicitly does not track individual users. While it is a for-profit endeavor earning money through ad revenue, its policies set it apart from major competitors such as Google and Bing.

TorBrowser is a web browser utilizing The Onion Router protocol which obfuscates the user’s IP address, essentially masking their online activities behind a web of redirects. The Tor network is run by volunteers and TorBrowser is open source software developed by a non-profit organization.

HTTPS is the encrypted version of HTTP, the data transfer protocol that powers the web. HTTPS sites are less likely to have their traffic intercepted or surveilled. Tools like HTTPS Everywhere help one find HTTPS versions of sites without too much trouble.

Two-factor authentication is available for many apps and web services. It decreases the possibility that a third party can access your account by providing an additional layer of protection beyond your password, e.g. through a code sent to your phone (see the sketch after this list for how time-based codes are generated).

Signal is an open source private messaging app which uses end-to-end encryption; think of it as HTTPS for your text messages. Signal is made by Open Whisper Systems which, like the Tor Project, is a non-profit.

These are just a few major tools in different areas, all of which are worth knowing about. Many have usability trade-offs but switching to just one or two is enough to substantially improve an individual’s privacy.
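As an aside, the time-based codes behind many two-factor authenticator apps are easy to demystify. Here is a minimal sketch using the pyotp library (pip install pyotp); the secret is generated on the spot here, whereas in practice it is shared once between the service and your authenticator app, often via a QR code.

import pyotp

secret = pyotp.random_base32()      # the shared secret
totp = pyotp.TOTP(secret)

print(totp.now())                   # 6-digit code that changes every 30 seconds
print(totp.verify(totp.now()))      # the server-side check: True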

Privacy Workshops

Merely knowing about particular pieces of software is not enough to secure one’s communications. Tor perhaps says it best in their “Tips on Staying Anonymous”:

Tor is NOT all you need to browse anonymously! You may need to change some of your browsing habits to ensure your identity stays safe.

A laundry list of web browsers, extensions, and apps doesn’t do much by itself. A person’s behavior is still the largest factor in how private their information is. One can visit a secure HTTPS site but still use a password that’s trivial to crack; one can use the “incognito” or “privacy” mode of a browser but still be tracked by their IP address. Online privacy is an immensely complicated and difficult subject which requires knowledge of practices as well as tools. As such, libraries can offer workshops that teach both at once. Most libraries teach skills-based workshops, whether they’re on using a citation manager or evaluating information sources for credibility. Adding privacy skills is a natural extension of work we already do. Workshops can fit into particular classes—whether they’re history, computer science, or ethics—or be extra-curricular. Look for sympathetic partners on campus, such as student groups or concerned faculty, to see if you can collaborate or at least find an avenue for advertising your events.

Does your library not have anyone qualified or willing to teach a privacy workshop? Consider contacting an outside expert. The Library Freedom Project immediately comes to mind as a wonderful resource offering: a privacy toolkit for librarians, an online class, “train the trainers” type events, and community-focused workshops.2 Academic librarians may also have access to local computer security experts, whether they’re computer science instructors or particularly savvy students, who would be willing to lend their expertise. My one caution would be that just because someone is a subject expert doesn’t mean they’re equipped to effectively lead a workshop, and that working with an expert to ensure an event is tailored to your community will be more successful than simply outsourcing the entire task.

Patron Data

Depending on your position at your library, this final section might either be the most or least obvious thing to be done: control access to data about your patrons. If you’re an instruction or reference librarian, I imagine workshops were the first thing on your mind. If you’re a systems librarian such as myself, you may have thought of technologies like HTTPS or considered data security measures. This section will be longer not because it’s more important, but because these are topics I think about often as they directly relate to my job responsibilities.

Patron data is tricky. I’ll be the first to admit that my library collects quite a bit of data about patrons, a rather small amount of which contains personally identifying information. Data is extremely useful both in fine-tuning our services to meet community needs as well as in demonstrating our value to stakeholders like the college administration. Still, there is good reason to review data practices and web services to see if anything can be improved. Here’s a brief list of heuristics to use:

Are your websites using HTTPS? Secure sites, especially ones with patron accounts that hold sensitive information, help prevent data from being intercepted by third parties. I fully realize this is actually more difficult than it appears; our previous ILS offered HTTPS, but only as a paid add-on which we couldn’t afford. If a vendor is the holdup here, pester them relentlessly until progress is made. I’ve found that most vendors understand that HTTPS is important; it’s just further down in their development priorities. Making a fuss can change that.

Is personal information being unnecessarily collected? What’s “necessary” is subjective, certainly. A good measure is looking at the last time personal information was actually used in any substantive manner. If you’re tracking the names of students who ask reference questions, have you ever actually needed them for follow-ups? Could an anonymized ID be used instead (see the sketch after this list)? Could names be deleted after a certain amount of time has passed? Which brings us to…

Where personal information is collected, do retention policies exist? E.g. if you’re doing website user studies that record someone’s name, likeness, or voice, do you eventually delete the files? This goes for paper files as well, which can be reviewed and then shredded if deemed unnecessary. Retention policies are beneficial in a few ways. They not only prevent old data from leaking into the wrong hands, they often help with organization and “spring cleaning” tasks. I try to review my hard drive periodically for random files I’ve been sent by faculty or students which can be cleaned out.

Can patrons be empowered with options regarding their own data? Opt-in policies regarding data retention are desirable because they allow a library to collect information that might prove valuable while also giving people the ability to limit their vulnerabilities. Catalog reading lists are the quintessential example: some patrons find them helpful as a tool to review what they’ve read, while others would prefer to obscure their checkout history. It should go without saying that these options are rather useless without any surrounding education. Patrons need to know what’s at stake and how to use the systems at their disposal; the setting does nothing by itself. While optional workshops typically reach only a fragment of the overall student population, perhaps in-browser tips and suggestions can prompt our users to consider the ramifications of their account’s configuration.
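To make the anonymized-ID idea from the list above concrete, here is a minimal Python sketch; the salt handling is deliberately simplified and would need real key management in production.

import hashlib

# Keep this secret and rotate it periodically; anyone who has it can
# re-identify patrons by hashing candidate names.
SALT = b"rotate-me-each-semester"  # placeholder secret

def anonymized_id(patron_name):
    # A stable pseudonymous ID: lets us count repeat reference visits
    # without storing the patron's actual name.
    return hashlib.sha256(SALT + patron_name.lower().encode()).hexdigest()[:12]

print(anonymized_id("Jane Doe"))  # the same patron always yields the same ID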

Relevance Ranking

Every so often, an event will happen which foregrounds the continued relevance of our profession. The most recent American election was an unmitigated disaster in terms of information literacy 3, but it also presents an opportunity for us to redouble our efforts where they are needed. Like the terrifying revelations of Edward Snowden, we are reminded that we serve communities that are constantly at risk of oppression, surveillance, and strife. As information professionals, we should strive to take on the challenge of protecting our patrons, and much of that protection occurs online. We can choose to be paralyzed by distress when faced with the state of affairs in our country, or to be challenged to rise to the occasion.

Notes

  1. To name a few examples, incoming CIA chief Mike Pompeo supports NSA bulk data collection and President-Elect Trump has been ambiguous as to whether he supports the idea of a registry or database for Muslim Americans.
  2. Library Freedom Director Alison Macrina has an excellent running Twitter thread on privacy topics which is worth consulting whether you’re an expert or novice.
  3. To note but two examples, the President-Elect persistently made false statements during his campaign and “fake news” appeared as a distinct phenomenon shortly after the election.

A High-Level Look at an ILS Migration

My library recently performed that most miraculous of feats—a full transition from one integrated library system to another, specifically Innovative’s Millennium to the open source Koha (supported by ByWater Solutions). We were prompted to migrate by Millennium’s approaching end-of-life and a desire to move to a more open system where we feel in greater control of our data. I’m sure many librarians have been through ILS migrations, and plenty has been written about them, but as this was my first I wanted to reflect upon the process. If you’re considering changing your ILS, or if you work in another area of librarianship & wonder how a migration looks from the systems end, I hope this post holds some value for you.

Challenges

No migration is without its problems. For starters, certain pieces of data in our old ILS weren’t accessible in any meaningful format. While Millennium has a robust “Create Lists” feature for querying & exporting different types of records (patron, bibliographic, vendor, etc.), it does not expose certain types of information. We couldn’t find a way to export detailed fines information, only a lump sum for each patron. To help with this post-migration, we saved an email listing of all itemized fines that we can refer to later. The email is saved as a shared Google Doc which allows circulation staff to comment on it as fines are resolved.

We also discovered that patron checkout history couldn’t be exported in bulk. While each patron can opt-in to a reading history & view it in the catalog, there’s no way for an administrator to download everyone’s history at once. As a solution, we kept our self-hosted Millennium instance running & can login to patrons’ accounts to retrieve their reading history upon request. Luckily, this feature wasn’t heavily used, so access to it hasn’t come up many times. We plan to keep our old, self-hosted ILS running for a year and then re-evaluate whether it’s prudent to shut it down, losing the data.

While some types of data simply couldn't be exported, many more couldn't migrate in exactly the same form. An ILS is a complicated piece of software with many interdependent parts, and no two are going to represent concepts in exactly the same way. To provide a concrete example: Millennium's loan rules are based upon patron type & the item's location, so a rule definition might resemble

  • a FACULTY patron can keep items from the MAIN SHELVES for four weeks & renew them once
  • a STUDENT patron can keep items from the MAIN SHELVES for two weeks & renew them two times

Koha, however, uses patron category & item type to determine loan rules, eschewing location as the pivotal attribute of an item. Neither implementation is wrong in any way; they both make sense, but are suited to slightly different situations. This difference necessitated completely reevaluating our item types, which didn’t previously affect loan rules. We had many, many item types because they were meant to represent the different media in our collection, not act as a hook for particular ILS functionality. Under the new system, our Associate Director of Libraries put copious work into reconfiguring & simplifying our types such that they would be compatible with our loan rules. This was a time-consuming process & it’s just one example of how a straightforward migration from one system to the next was impossible.
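To make the structural difference concrete, here's a toy Python sketch of the two rule shapes, using the example rules above (the Koha item type "BOOK" and the dictionary layout are invented for illustration; neither system actually stores rules this way):

    # Millennium keys loan rules on (patron type, item location):
    millennium_rules = {
        ("FACULTY", "MAIN SHELVES"): {"loan_weeks": 4, "renewals": 1},
        ("STUDENT", "MAIN SHELVES"): {"loan_weeks": 2, "renewals": 2},
    }

    # Koha keys them on (patron category, item type), so each location-based
    # rule must be mapped onto whichever item types live in that location:
    koha_rules = {
        ("FACULTY", "BOOK"): {"loan_weeks": 4, "renewals": 1},
        ("STUDENT", "BOOK"): {"loan_weeks": 2, "renewals": 2},
    }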

While some data couldn’t be exported, and others needed extensive rethinking in the new ILS, there was also information that could only be migrated after much massaging. Our patron records were a good example: under Millennium, users logged in on an insecure HTTP page with their barcode & last name. Yikes. I know, I felt terrible about it, but integration with our campus authentication & upgrading to HTTPS were both additional costs that we couldn’t afford. Now, under Koha, we can use the campus CAS (a central authentication system) & HTTPS (yay!), but wait…we don’t have the usernames for any of our patrons. So I spent a while writing Python scripts to parse our patron data, attempting to extract usernames from institutional email addresses. A system administrator also helped use unique identifying information (like phone number) to find potential patron matches in another campus database.

A more amusing example of weird Millennium data was active holds, which are stored in a single field on item records & look like this:

P#=12312312,H#=1331,I#=999909,NNB=12/12/2016,DP=09/01/2016

Can you tell what’s going on here? With a little poking around in the system, it became apparent that letters like “NNB” stood for “date not needed by” & that other fields were identifiers connecting to patron & item records. So, once again, I wrote scripts to extract meaningful details from this silly format.
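For the curious, the parsing itself is straightforward once the field codes are decoded. A simplified Python sketch along the lines of what I wrote (the real data had more edge cases, and my reading of "DP" as the date the hold was placed is a guess):

    from datetime import datetime

    def parse_hold(raw: str) -> dict:
        fields = dict(pair.split("=", 1) for pair in raw.split(","))
        parse_date = lambda key: (datetime.strptime(fields[key], "%m/%d/%Y").date()
                                  if key in fields else None)
        return {
            "patron_id": fields.get("P#"),      # connects to a patron record
            "hold_id": fields.get("H#"),
            "item_id": fields.get("I#"),        # connects to an item record
            "not_needed_by": parse_date("NNB"), # "date not needed by"
            "date_placed": parse_date("DP"),    # my guess at DP's meaning
        }

    print(parse_hold("P#=12312312,H#=1331,I#=999909,NNB=12/12/2016,DP=09/01/2016"))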

I won’t lie, the data munging was some of the most enjoyable work of the migration. Maybe I’m weird, but it was both challenging & interesting as we were suddenly forced to dive deeper into our old system and understand more of its hideous internal organs, just as we were leaving it behind. The problem-solving & sleuthing were fun & distracted me from some of the more frustrating challenges detailed above.

Finally, while we had a migration server where we tested our data & staff played around for almost a month’s time, when it came to the final leap things didn’t quite work as expected. The CAS integration, which I had so anticipated, didn’t work immediately. We started bumping into errors we hadn’t seen on the migration server. Much of this is inevitable; it’s simply unrealistic to create a perfect replica of our live catalog. We cannot, for instance, host the migration server on the exact same domain, and while that seems like a trivial difference it does affect a few things. Luckily, we had few summer classes so there was time to suffer a few setbacks & now that our fall semester is about to begin, we’re in great shape.

Difference & Repetition

Koha is primarily used by public libraries, and as such we’ve run into a few areas where common academic library functions aren’t implemented in a familiar way or are unavailable. Often, it’s that our perspective is so heavily rooted in Millennium that we need to think differently to achieve the same effect in Koha. But sometimes it’s clear that what’s a concern to us isn’t to other libraries.

For instance, bib records for serials with large numbers of issues are an ongoing struggle for us. We have many print periodicals with extensive holdings, including bound editions of past issues. The holdings display in the catalog is oriented more towards recent periodicals & displaying whether the latest few issues have arrived yet. That's fine for materials like newspapers or popular magazines with few back issues, and I've seen a few public libraries using Koha that have minimalistic periodical records intended only to point the patron to a certain shelf. However, we have complex holdings like "issues 1 through 10 are bound together, issue 11 is missing, issues 12 through 18 are held in a separate location…" Parsing the catalog record to determine if we have a certain issue, and where it might be, is quite challenging.

Another example of the public versus academic divide: there's no "recall" feature per se in Koha, wherein a faculty member could retrieve an item they want to place on course reserve from a student. Instead, we have tried to simulate this feature with a mixture of adjustments to our loan rules & internal reports which show the status of contested items. Recall isn't a huge feature & isn't used all the time, so it's not something we thought to research when selecting our new ILS, but it's a great example of a minute difference that ended up creating a headache as we adapted to a new piece of software.
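Our contested-items reports live in Koha's reporting feature, but the underlying idea is simple enough to sketch in Python: cross-reference current checkouts with pending holds and flag the overlaps. The CSV file names and column headings below are invented for illustration:

    import csv

    def load(path: str, key: str) -> dict:
        # Index a CSV export by one of its columns.
        with open(path, newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    # Hypothetical exports: checkouts.csv has barcode, borrower, date_due;
    # holds.csv has barcode, requester, placed.
    checkouts = load("checkouts.csv", "barcode")
    holds = load("holds.csv", "barcode")

    for barcode, hold in holds.items():
        if barcode in checkouts:
            out = checkouts[barcode]
            print(f"{barcode}: due {out['date_due']}, requested by {hold['requester']}")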

Moving from Millennium to Koha also meant we were shifting from a closed source system where we had to pay additional fees for limited API access to an open source system which boasts full read access to the database via its reporting feature. Koha’s open source nature has been perhaps the biggest boon for me during our migration. It’s very simple to look at the actual server-side code generating particular pages, or pull up specific rows in database tables, to see exactly what’s happening. In a black box ILS, everything we do is based on a vague adumbration of how we think the system operates. We can provide an input & record the output, but we’re never sure about edge cases or whether strange behavior is a bug or somehow intentional.

Koha has its share of bugs, I've discovered, but thankfully I'm able to jump right into the source code itself to determine what's occurring. I've been able to diagnose problems by looking at open bug reports on Koha's bugzilla tracker, pondering over Perl code, and applying snippets of code from the Koha wiki or git repository. I've already submitted two bug patches, one of which has been pulled into the project. It's empowering to be able to trace exactly what's happening when troubleshooting & to submit one's own solution, or just a detailed bug report. Whether or not a patch is the best way to fix an issue, being able to see precisely how the system works is deeply satisfying. It also makes it much easier for me to design JavaScript hacks that smooth over issues on the client side, be it in the staff-facing administrative functions or the public catalog.

What I Would Do Differently

Set clearer expectations.

We had Millennium for more than a decade. We invested substantial resources, both monetary & temporal, in customizing it to suit our tastes & unique collections. As we began testing the new ILS, the most common feedback from staff fell along the lines of "this isn't like it was in Millennium". I think that would have been a less common observation, or perhaps phrased more productively, if I'd made it clear that a) it'll take time to customize our new ILS to the degree of the old one, and b) not everything will be or needs to be the same.

Most of the customization decisions were made years ago & were never revisited. We need to return to the reason why things were set up a certain way, then determine if that reason is still legitimate, and finally find a way to achieve the best possible result in the new system. Instead, it’s felt like the process was framed more as “how do we simulate our old ILS in the new one” which sets us up for disappointment & failure from the start. I think there’s a feeling that a new system should automatically be better, and it’s true that we’re gaining several new & useful features, but we’re also losing substantial Millennium-specific customization. It’s important to realize that just because everything is not optimal out of the box doesn’t mean we cannot discover even better solutions if we approach our problems in a new light.

Encourage experimentation, deny expertise.

Because I’m the Systems Librarian, staff naturally turn to me with their systems questions. Here’s a secret: I know very little about the ILS. Like them, I’m still learning, and what’s more I’m often unfamiliar with the particular quarters of the system where they spend large amounts of time. I don’t know what it’s like to check in books & process holds all day, but our circulation staff do. It’s been tough at times when staff seek my guidance & I’m far from able to help them. Instead, we all need to approach the ongoing migration as an exploration. If we’re not sure how something works, the best way is to research & test, then test again. While Koha’s manual is long & quite detailed, it cannot (& arguably should not, lest it grow to unreasonable lengths) specify every edge case that can possibly occur. The only way to know is to test & document, which we should have emphasized & encouraged more towards the start of the process.

To be fair, many staff had reasonable expectations & performed a lot of experiments. Still, I did not do a great job of facilitating either of those as a leader. That’s truly my job as Systems Librarian during this process; I’m not here merely to mold our data so it fits perfectly in the new system, I’m here to oversee the entire transition as a process that involves data, workflows, staff, and technology.

Take more time.

Initially, the ILS migration was such an enormous amount of work that it was not clear where to start. It felt as if, for a few months before our on-site training, we did little but sit around & await a whirlwind of busyness. I wish we had a better sense of the work we could have front-loaded such that we could focus efforts on other tasks later on. For example, we ended up deleting thousands of patron, item, and bibliographic records in an effort to “clean house” & not spend effort migrating data that was unneeded in the first place. We should have attacked that much earlier, and it might have obviated the need for some work. For instance, if in the course of cleaning up Millennium we delete invalid MARC records or eliminate obscure item types, those represent fewer problems encountered later in the migration process.

Finished?

As we start our fall semester, I feel accomplished. We raced through this migration, beginning the initial stages only in April for a go-live date that would occur in June. I learned a lot & appreciated the challenge but also had one horrible epiphany: I’m still relatively young, and I hope to be in librarianship for a long time, so this is likely not the last ILS migration I’ll participate in. While that very thought gives me chills, I hope the lessons I’ve taken from this one will serve me well in the future.

Cybersecurity, Usability, Online Privacy, and Digital Surveillance

Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information safe and secure.

However, keeping private information safe and secure is a challenging task. We have all heard of security breaches at J.P. Morgan, Target, Sony, Anthem Blue Cross and Blue Shield, the Office of Personnel Management of the U.S. federal government, University of Maryland at College Park, and Indiana University. Sometimes, a data breach takes place when an institution fails to patch a hole in its network systems. Sometimes, people fall for a phishing scam, or a virus in a user's computer infects the target system. Other times, online companies compile customer data into personal profiles, which are then sold to data brokers and on into the hands of malicious hackers and criminals.

Image from Flickr – https://www.flickr.com/photos/topgold/4978430615

Cybersecurity vs. Usability

To prevent such data breaches, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful when handling institutional or customer data. There are also systematic measures that organizations can implement, such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.
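To illustrate the last of those measures, account lockout amounts to little more than counting failures per user. A simplified Python sketch (the threshold and in-memory storage are placeholders; a real system would persist this state and lock accounts only for a set period):

    from collections import defaultdict

    MAX_ATTEMPTS = 5  # placeholder threshold
    failed_attempts = defaultdict(int)
    locked = set()

    def record_login(username, success):
        """Return True if the login should be allowed to proceed."""
        if username in locked:
            return False
        if success:
            failed_attempts[username] = 0  # reset the counter on success
            return True
        failed_attempts[username] += 1
        if failed_attempts[username] >= MAX_ATTEMPTS:
            locked.add(username)  # a real system would lock temporarily
        return False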

While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like a CAPTCHA can cause an accessibility issue for people with disabilities.

Or imagine, as another example, that a university IT office concerned about the data security of cloud services starts requiring all faculty, students, and staff to use only cloud services that are SOC 2 Type II certified. SOC stands for "Service Organization Controls." It consists of a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies to satisfactorily protect its clients' data in five areas known as the "Trust Services Principles":

  • the security of the service provider's system,
  • the processing integrity of the system,
  • the availability of the system,
  • the privacy of personal information that the service provider collects, retains, uses, discloses, and disposes of for its clients, and
  • the confidentiality of the information that the service provider's system processes or maintains for its clients.

SOC 2 Type II certification means that the business has maintained relevant security policies and procedures over a period of at least six months, so it is a good indicator that the business will keep its clients' sensitive data secure. Dropbox for Business is SOC 2 certified, but it costs money. The free version is not as secure, but many faculty, students, and staff in academia use it frequently for collaboration. If a university IT office simply bans the free version of Dropbox without offering an alternative that is as easy to use, people will undoubtedly suffer.

Some of you may know that the USPS website does not provide a way to reset the password for users who have forgotten their usernames. They are instead asked to create a new account. If they remember the username but enter wrong answers to the two security questions more than twice, the system automatically locks their account for a certain period of time. Again, users have to create a new account. Clearly, a system that does not allow password resets for forgetful users is more secure than one that does. In reality, however, this security measure creates a huge usability issue, because average users do forget their passwords and the answers to the security questions they set up themselves. It's not hard to imagine how frustrated people will be when they realize that they entered a wrong mailing address for mail forwarding and are now unable to get back into the system to correct it, because they can remember neither their password nor the answers to their security questions.

To give an example related to libraries, a library may decide to block all international traffic to its licensed e-resources to prevent foreign hackers who have gotten hold of the username and password of a legitimate user from accessing those e-resources. This would certainly help libraries avoid potential breaches of licensing terms and spare them from having to shut down compromised user accounts one by one as they are found. However, it would also make it impossible for legitimate users traveling outside the country to access those e-resources, which many would find unacceptable. Furthermore, malicious hackers would probably just use a proxy to make their IP addresses appear to be located in the U.S. anyway.

What would users do if their organization required them to reset the passwords for their work computers, and for several or more systems they use constantly for work, on a weekly basis? While this may strengthen the security of those systems, it's easy to see that having to reset all those passwords every week, and keeping track of them without forgetting or mixing them up, would be a nightmare. Most likely, users would start choosing less complicated passwords, or even adopt a single password for all the different services. Some might stick to the same password every time the system requires a reset, unless the system detects the previous password and prevents them from reusing it. Ill-thought-out cybersecurity measures can easily backfire.
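Incidentally, that last kind of detection is usually done by keeping a history of hashes of each user's recent passwords and rejecting any new password whose hash matches one of them. A simplified Python sketch, using PBKDF2 purely for illustration (real systems would salt each entry separately and use a vetted KDF configuration):

    import hashlib

    def hash_password(password: str, salt: bytes) -> str:
        # Derive a slow, salted hash of the password.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()

    def is_reused(new_password: str, salt: bytes, previous_hashes: list) -> bool:
        # Reject the new password if it matches any stored history entry.
        return hash_password(new_password, salt) in previous_hashes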

Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby making a network more secure. Users who face cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.


Cybersecurity vs. Privacy

Usability and productivity issues may be small, however, compared to the risk of mass surveillance resulting from aggressive security measures. In 2013, the Guardian reported that the communication records of millions of people were being collected by the National Security Agency (NSA) in bulk, regardless of suspicion of wrongdoing. A secret court order prohibited Verizon from disclosing the NSA's information request. After a cyberattack against the University of California at Los Angeles, the University of California system installed a device capable of capturing, analyzing, and storing all network traffic to and from the campus for over 30 days. This security monitoring was implemented secretly, without consulting or notifying the faculty and others who would be subject to it. The San Francisco Chronicle reported that the IT staff who installed the system were given strict instructions not to reveal it was taking place, and that selected committee members on the campus were told to keep the information to themselves.

The invasion of privacy and the lack of transparency in these network monitoring programs have caused great controversy. Such wide and indiscriminate monitoring must have a very good justification and offer clear answers to vital questions: what exactly will be collected, who will have access to the collected information, when and how the information will be used, what controls will be put in place to prevent it from being used for unrelated purposes, and how it will be disposed of.

We have recently seen another case in which security concerns conflicted with people's right to privacy. In February 2016, the FBI asked Apple to create a backdoor application that would bypass the security measures in place in its iOS. The FBI wanted to unlock an iPhone 5C recovered from one of the shooters in the San Bernardino shooting incident. iOS secures users' devices by permanently erasing all data when a wrong password is entered more than ten times, if users choose to activate this option in their settings. The FBI's request was met with strong opposition from Apple and others, since such a backdoor could easily be exploited for illegal purposes by black hat hackers, for unjustified privacy infringement by other capable parties, and even for dictatorship by governments. Apple refused to comply with the request, and a court hearing was scheduled for March 22. The FBI, however, withdrew the request, saying that it had found a way to hack into the phone in question without Apple's help. Now Apple has to figure out what the vulnerability in its iOS is if it wants its encryption mechanism to be foolproof. In the meantime, iOS users know that their data is no longer as secure as they once thought.

Around the same time, a Senate draft bill titled the "Compliance with Court Orders Act of 2016" proposed that people be required to comply with any authorized court order for data, and that if that data is "unintelligible" – meaning encrypted – it must be decrypted for the court. This bill is problematic because it would practically nullify the efficacy of end-to-end encryption, which we use every day in everything from our iPhones to messaging services like WhatsApp and Signal.

Because security is essential to privacy, it is ironic that certain cybersecurity measures end up invading privacy rather than protecting it. Because we do not always fully understand how a technology works or how it can be exploited for both good and bad purposes, we need to be careful about giving any party blanket permission to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and governments will struggle even more to balance privacy concerns with security issues.

Why Libraries Should Advocate for Online Privacy

The fact that people may no longer have privacy on the Web should concern libraries. Historically, libraries have been strong advocates of intellectual freedom, striving to keep patrons' data safe from the unwanted eyes of the authorities. As librarians, we believe in people's right to read, think, and speak freely and privately, as long as such activity does not harm others. The Library Freedom Project reflects this belief held strongly within the library community. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom, and it helped the Kilton Public Library in Lebanon, New Hampshire, become the first library to operate a Tor exit relay, providing anonymity for patrons while they browse the Internet at the library.

New technologies have brought us unprecedented convenience in collecting, storing, and sharing massive amounts of sensitive data online. But the ease with which such sensitive data can fall into the wrong hands and be exploited has also created an unparalleled potential for invasion of privacy. While the majority of librarians take a very strong stance in favor of intellectual freedom and against censorship, it is often hard to discern the correct stance on online privacy, particularly when it is pitted against cybersecurity. Some even argue that those who have nothing to hide do not need privacy at all.

However, privacy is not equivalent to hiding wrongdoing, nor do people keep certain things secret because those things are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy, guilty of wrongdoing or not. Privacy allows us a safe space to form our thoughts and consider our actions on our own, without being subject to others' eyes and judgments. Even in the absence of actual mass surveillance, the mere belief that one may be placed under surveillance at any moment is sufficient to trigger self-censorship and to negatively affect one's thoughts, ideas, creativity, imagination, choices, and actions, making people more conformist and compliant. This is corroborated by a recent study from Oxford University, which provides empirical evidence that the mere existence of a surveillance state breeds fear and conformity and stifles free expression. Privacy is an essential part of being human, not some trivial condition that we can do without in the face of a greater concern. That's why many people under political dictatorships continue to choose death over life under mass surveillance and censorship in their fight for freedom and privacy.

The Electronic Frontier Foundation states that privacy means respect for individuals' autonomy, anonymous speech, and the right to free association. We want to live as autonomous human beings, free to speak our minds and think on our own. If part of a library's mission is to help people become such autonomous human beings through learning and sharing knowledge with one another, without having to worry about being observed and/or censored, then libraries should advocate for people's privacy online and offline, and in all forms of communication technologies and devices.