Voice, Natural Language Processing, and the Future of Library Experiences

Is the future of research voice controlled? It might be, because when I originally had the idea for this post my first instinct was to grab my phone and dictate my half-formed ideas into a note, rather than typing it out. Writing things down often makes them seem wrong and not at all what we are trying to say in our heads. (Maybe it’s not so new, since as you may remember Socrates had a similar instinct.) The idea came out of a few different talks at the national Code4Lib conference held in Los Angeles in March of 2017 and a talk given by Chris Bourg. Among these presentations the themes of machine learning, artificial intelligence, natural language processing, voice search, and virtual assistants intersect to give us a vision for what is coming. The future might look like a system that can parse imprecise human language and turn it into an appropriately structured search query in a database or variety of databases, bearing in mind other variables, and return the correct results. Pieces of this exist already, of course, but I suspect over the next few years we will be building or adapting tools to perform these functions. As we do this, we should think about how we can incorporate our values and skills as librarians into these tools along the way.

Natural Language Processing

I will not attempt to summarize natural language processing (NLP) here, except to say that speaking to a computer requires that the computer be able to understand what we are saying. Human—or natural—language is messy, full of nuance and context that requires years for people to master, and even then often leads to misunderstandings that can range from funny to deadly. Using a machine to understand and parse natural language requires complex techniques, but luckily there are a lot of tools that can make the job easier. For more details, you should review the NLP talks by Corey Harper and Nathan Lomeli at Code4Lib. Both these talks showed that there is a great deal of complexity involved in NLP, and that its usefulness is still relatively confined. Nathan Lomeli puts it like this. NLP can “cut strings, count beans, classify things, and correlate everything”. 1 Given a corpus, you can use NLP tools to figure out what certain words might be, how many of those words there are, and how they might connect to each other.

Processing language to understand a textual corpus has a long history but is now relatively easy for anyone to do with the tools out there. The easiest is Voyant Tools, which is a project by Sinclair, Stéfan Sinclair and Geoffrey Rockwell. It is a portal to a variety of tools for NLP. You can feed it a corpus and get back all kind of counts and correlations. For example, Franny Gaede and I used VoyantTools to analyze social justice research websites to develop a social justice term corpus for a research project. While a certain level of human review is required for any such project, it’s possible to see that this technology can replace a lot of human-created language. This is already happening, in fact. A tool called Wordsmith can create convincing articles about finance, sports, and technology, or really any field with a standard set of inputs and outputs in writing. If computers are writing stories, they can also find stories.

Talking to the Voice in the Machine

Finding those stories, and in turn, finding the data with which to tell more stories, is where machine learning and artificial intelligence enter. In libraries we have a lot of words, and while we have various projects that are parsing those words and doing things with them, we have only begun to see where this can go. There are two sides to this. Chris Bourg’s talk at Harvard Library Leadership in a Digital Age, asks the question “What happens to libraries and librarians when machines can read all the books?” One suggestion she makes is that:

we would be wise to start thinking now about machines and algorithms as a new kind of patron  — a patron that doesn’t replace human patrons, but has some different needs and might require a different set of skills and a different way of thinking about how our resources could be used. 2

One way in which we can start to address the needs of machines as patrons is by creating searches that work with them, which is for now ultimately to serve the needs of humans, but in the future could be for their own artificial intelligence purposes. Most people are familiar with virtual assistants that have popped up on all platforms over the past few years. As an iOS and a Windows user, I am now constantly invited to speak to Siri or Cortana to search for answers to my questions or fix something in my schedule. While I’m perfectly happy to ask Siri to remind me to bring my laptop to work at 7:45 AM or to wake me up in 20 minutes, I find mixed results when I try to ask a more complex question. 3 Sometimes when I ask the temperature on the surface of Jupiter I get the answer, other times I get today’s weather in a town called Jupiter. This is not too surprising, as asking “What is the temperature of Jupiter?” could mean a number of things. It’s on the human to specify to the computer to which domain of knowledge you are referring, which requires knowing exactly how to ask the question. Computers cannot yet do a reference interview, since they cannot pick up on the subtle hidden meanings or helping with the struggle for the right words that librarians do so well. But they can help with certain types of research tasks quite well, if you know how to ask the question.  Eric Frierson (PPT) gave a demonstration of his project working on voice powered search in EBSCO using Alexa. In the presentation he demonstrates the Alexa “skills” he set up for people to ask Alexa for help. They are “do you have”, “the book”, “information about”, “an overview of”, “what I should read after”, or “books like”. There is a demonstration of what this looks like on YouTube. The results are useful when you say the correct thing in the correct order, and for an active user it would be fairly quick to learn what to say, just as we learn how best to type in a search query in various services.

Some Considerations

People in the Sixties: The government will wiretap your home. People now: Hey wire tap, can cats eat pancakes?

Alexa meme

Why ask a question of a computer rather than type in a question to a computer? For the reason I started this piece with, certainly–voice is there, and it’s often easier to say what you mean than write it. This can be taken pragmatically as well. If you find typing difficult, being able to speak makes life easier. When I was home with a newborn baby I really appreciated being able to dictate and ask Siri about the weather forecast and what time the doctor’s appointment was. Herein lies one of the many potential pitfalls of voice: who is listening to what you are saying? One recent news story puts this in perspective, as Amazon agreed to turn over data from Alexa to police in a murder investigation after the suspect gave the ok. They refused to do at first, but it is an open question as to the legal nature of the conversation with a virtual assistant. Nor is it entirely clear when you speak to a device where the data is being processed. So before we all rush out and write voice search tools for all our systems, it is useful to think about where that data lives what the purpose of it is.

If we would protect a user’s search query by ensuring that our catalogs are encrypted (and let’s be honest, we aren’t there yet), how do we do the same for virtual search assistants in the library catalog? For Alexa, that’s built into creating an Alexa skill, since a basic requirement for the web service used is that it meet Amazon’s security requirements. But if this data is subject to subpoena, we would have to think about it in the same way we would any other search data on a third party system. And we also have to recognize that these tools are created by these companies for commercial purposes, and part of that is to gather data about people and sell things to them based on that data. Machine learning could eventually build on that to learn a lot more about people than they think, which the Amazon Echo Look recently brought up as a subject of debate. There are likely to be other services popping up in addition to those offered by Amazon, Google, Apple, and Microsoft. Before long, we might expect our vendors to be offering voice search in their interfaces, and we need to be aware of the transmission of that data and where it is being processed. A recent alliance formed called The Voice Privacy Alliance, which is developing some standards for this.

The invisibility of the result processing has another dark side. The biases inherent in the algorithms become even more hidden, as the first result becomes the “right” one. If Siri tells me the weather in Jupiter, that’s a minor inconvenience, but if Siri tells me that “Black girls” are something hypersexualized, as Safiya Noble has found that Google does, do I (or let’s say, a kid) necessarily know something has gone wrong? 4 Without human intervention and understanding, machines can perpetuate the worst side of humanity.

This comes back to Chris Bourg’s question. What happens to librarians when machines can read all the books, and have a conversation with patrons about those books? Luckily for us, it is unlikely that artificial intelligence will ever be truly self-aware with desires, metacognition, love, and need for growth and adventure. Those qualities will continue to make librarians useful to creating vibrant and unique collections and communities. But we will need to fit that in a world where we are having conversations with our computers about those collections and communities.


  1. Lomeli, Nathan. “Natural Language Processing: Parsing Through The Hype”. Code4Lib. Los Angeles, CA. March 7, 2017.
  2. “What Happens to Libraries and Librarians When Machines Can Read All the Books?” Feral Librarian, March 17, 2017. https://chrisbourg.wordpress.com/2017/03/16/what-happens-to-libraries-and-librarians-when-machines-can-read-all-the-books/.
  3. As a side issue, I don’t have a private office and I feel weird speaking to my computer when there are people around.
  4. Noble, Safiya Umoja. “Google Search: Hyper-Visibility as a Means of Rendering Black Women and Girls Invisible – InVisible Culture.” InVisible Culture: An Electronic Journal for Visual Culture, no. 19 (2013). http://ivc.lib.rochester.edu/google-search-hyper-visibility-as-a-means-of-rendering-black-women-and-girls-invisible/.

Online Privacy in Post-Election America

A commitment to protecting the privacy of our patrons is enshrined in the ALA Code of Ethics. While that has always been an important aspect of librarianship, it’s become even more pivotal in an information age where privacy is far more nuanced and difficult to achieve. Given the rhetoric of the election season, and statements made by our President-Elect as well as his Cabinet nominees 1, the American surveillance state has become even more disconcerting. As librarians, we have an obligation to empower our communities with the knowledge they need to secure their own personal information. This post will cover, at a high level, a few areas where librarians of various types can assist patrons.

The Tools

Given that so much information is exchanged online these days, librarians are in a unique position to educate patrons about the Internet. We spend so much time either building web services or utilizing them, it’s highly likely that a librarian knows more about the web than your average citizen. As such, we can relate some of the powerful pieces of software and services that aid in protecting one’s online presence. To name just a handful that almost everyone could benefit from knowing:

DuckDuckGo is a privacy-aware search engine which explicitly does not track individual users. While it is a for-profit endeavor earning money through ad revenue, its policies set it apart from major competitors such as Google and Bing.

TorBrowser is a web browser utilizing The Onion Router protocol which obfuscates the user’s IP address, essentially masking their online activities behind a web of redirects. The Tor network is run by volunteers and TorBrowser is open source software developed by a non-profit organization.

HTTPS is the encrypted version of HTTP, the data transfer protocol that powers the internet. HTTPS sites are less likely to have their traffic intercepted or surveilled. Tools like HTTPS Everywhere help one to find HTTPS versions of sites without too much trouble.

Two-factor authentication is available for many apps and web services. It decreases the possibility that a third-party can access your account by providing an additional layer of protection beyond your password, e.g. through a code sent to your phone.

Signal is an open source private messaging app which uses end-to-end encryption, think of it as HTTPS for your text messages. Signal is made by Open Whisper Systems which, like the Tor Foundation, is a non-profit.

These are just a few major tools in different areas, all of which are worth knowing about. Many have usability trade-offs but switching to just one or two is enough to substantially improve an individual’s privacy.

Privacy Workshops

Merely knowing about particular pieces of software is not enough to secure one’s communications. Tor perhaps says it best in their “Tips on Staying Anonymous“:

Tor is NOT all you need to browse anonymously! You may need to change some of your browsing habits to ensure your identity stays safe.

A laundry list of web browsers, extensions, and apps doesn’t do much by itself. A person’s behavior is still the largest factor in how private their information is. One can visit a secure HTTPS site but still use a password that’s trivial to crack; one can use the “incognito” or “privacy” mode of a browser but still be tracked by their IP address. Online privacy is an immensely complicated and difficult subject which requires knowledge of practices as well tools. As such, libraries can offer workshops that teach both at once. Most libraries teach skills-based workshops, whether they’re on using a citation manager or how to evaluate information sources for credibility. Adding privacy skills is a natural extension of work we already do. Workshops can fit into particular classes—whether they’re history, computer science, or ethics—or be extra-curricular. Look for sympathetic partners on campus, such as student groups or concerned faculty, to see if you can collaborate or at least find an avenue for advertising your events.

Does your library not have anyone qualified or willing to teach a privacy workshop? Consider contacting an outside expert. The Library Freedom Project immediately comes to mind as a wonderful resource offering: a privacy toolkit for librarians, an online class, “train the trainers” type events, and community-focused workshops.2 Academic librarians may also have access to local computer security experts, whether they’re computer science instructors or particularly savvy students, who would be willing to lend their expertise. My one caution would be that just because someone is a subject expert doesn’t mean they’re equipped to effectively lead a workshop, and that working with an expert to ensure an event is tailored to your community will be more successful than simply outsourcing the entire task.

Patron Data

Depending on your position at your library, this final section might either be the most or least obvious thing to be done: control access to data about your patrons. If you’re an instruction or reference librarian, I imagine workshops were the first thing on your mind. If you’re a systems librarian such as myself, you may have thought of technologies like HTTPS or considered data security measures. This section will be longer not because it’s more important, but because these are topics I think about often as they directly relate to my job responsibilities.

Patron data is tricky. I’ll be the first to admit that my library collects quite a bit of data about patrons, a rather small amount of which contains personally identifying information. Data is extremely useful both in fine-tuning our services to meet community needs as well as in demonstrating our value to stakeholders like the college administration. Still, there is good reason to review data practices and web services to see if anything can be improved. Here’s a brief list of heuristics to use:

Are your websites using HTTPS? Secure sites, especially for one’s with patron accounts that hold sensitive information, help prevent data from being intercepted by third parties. I fully realize this is actually more difficult than it appears; our previous ILS offered HTTPS but only as a paid add-on which we couldn’t afford. If a vendor is the holdup here, pester them relentlessly until progress is made. I’ve found that most vendors understand that HTTPS is important, it’s just further down in their development priorities. Making a fuss can change that.

Is personal information being unnecessarily collected? What’s “necessary” is subjective, certainly. A good measure is looking at when the last time personal information was actually used in any substantive manner. If you’re tracking the names of students who ask reference questions, have you ever actually needed them for follow-ups? Could an anonymized ID be used instead? Could names be deleted after a certain amount of time has passed? Which brings us to…

Where personal information is collected, do retention policies exist? E.g. if you’re doing website user studies that record someone’s name, likeness, or voice, do you eventually delete the files? This goes for paper files as well, which can be reviewed and then shredded if deemed unnecessary. Retention policies are beneficial in a few ways. They not only prevent old data from leaking into the wrong hands, they often help with organization and “spring cleaning” tasks. I try to review my hard drive periodically for random files I’ve been sent by faculty or students which can be cleaned out.

Can patrons be empowered with options regarding their own data? Opt-in policies regarding data retention are desirable because they allow a library to collect information that might prove valuable while also giving people the ability to limit their vulnerabilities. Catalog reading lists are the quintessential example: some patrons find these helpful as a tool to review what they’ve read, while others would prefer to obscure their checkout history. It should go without saying that these options existing without any surrounding education is rather useless. Patrons need to know what’s at stake and how to use the systems at their disposal; the setting does nothing by itself. While optional workshops typically only touch a fragment of the overall student population, perhaps in-browser tips and suggestions can be presented to prompt our users to consider about the ramifications of their account’s configuration.

Relevance Ranking

Every so often, an event will happen which foregrounds the continued relevance of our profession. The most recent American election was an unmitigated disaster in terms of information literacy 3, but it also presents an opportunity for us to redouble our efforts where they are needed. Like the terrifying revelations of Edward Snowden, we are reminded that we serve communities that are constantly at risk of oppression, surveillance, and strife. As information professionals, we should strive to take on the challenge of protecting our patrons, and much of that protection occurs online. We can choose to be paralyzed by distress when faced with the state of affairs in our country, or to be challenged to rise to the occasion.


  1. To name a few examples, incoming CIA chief Mike Pompeo supports NSA bulk data collection and President-Elect Trump has been ambiguous as to whether he supports the idea of a registry or database for Muslim Americans.
  2. Library Freedom Director Alison Macrina has an excellent running Twitter thread on privacy topics which is worth consulting whether you’re an expert or novice.
  3. To note but two examples, the President-Elect persistently made false statements during his campaign and “fake news” appeared as a distinct phenomenon shortly after the election.

Cybersecurity, Usability, Online Privacy, and Digital Surveillance

Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information safe and secure.

However, keeping private information safe and secure is a challenging task. We have all heard of security breaches at J.P Morgan, Target, Sony, Anthem Blue Cross and Blue Shield, the Office of Personnel Management of the U.S. federal government, University of Maryland at College Park, and Indiana University. Sometimes, a data breach takes place when an institution fails to patch a hole in its network systems. Sometimes, people fall for a phishing scam, or a virus in a user’s computer infects the target system. Other times, online companies compile customer data into personal profiles. The profiles are then sold to data brokers and on into the hands of malicious hackers and criminals.


Image from Flickr – https://www.flickr.com/photos/topgold/4978430615

Cybersecurity vs. Usability

To prevent such a data breach, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful about dealing with institutional or customers’ data. There are systematic measures that organizations can implement such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.

While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like a CAPTCHA can cause an accessibility issue for people with disabilities.

Or imagine that a university IT office concerned about the data security of cloud services starts requiring all faculty, students, and staff to only use cloud services that are SOC 2 Type II certified as an another example. SOC stands for “Service Organization Controls.” It consists of a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies that will satisfactorily protect its clients’ data in five areas known as “Trust Services Principles.” Those include the security of the service provider’s system, the processing integrity of this system, the availability of the system, the privacy of personal information that the service provider collects, retains, uses, discloses, and disposes of for its clients, and the confidentiality of the information that the service provider’s system processes or maintains for the clients. The SOC 2 Type II certification means that the business had maintained relevant security policies and procedures over a period of at least six months, and therefore it is a good indicator that the business will keep the clients’ sensitive data secure. The Dropbox for Business is SOC 2 certified, but it costs money. The free version is not as secure, but many faculty, students, and staff in academia use it frequently for collaboration. If a university IT office simply bans people from using the free version of Dropbox without offering an alternative that is as easy to use as Dropbox, people will undoubtedly suffer.

Some of you may know that the USPS website does not provide a way to reset the password for users who forgot their usernames. They are instead asked to create a new account. If they remember the account username but enter the wrong answers to the two security questions more than twice, the system also automatically locks their accounts for a certain period of time. Again, users have to create a new account. Clearly, the system that does not allow the password reset for those forgetful users is more secure than the one that does. However, in reality, this security measure creates a huge usability issue because average users do forget their passwords and the answers to the security questions that they set up themselves. It’s not hard to guess how frustrated people will be when they realize that they entered a wrong mailing address for mail forwarding and are now unable to get back into the system to correct because they cannot remember their passwords nor the answers to their security questions.

To give an example related to libraries, a library may decide to block all international traffic to their licensed e-resources to prevent foreign hackers who have gotten hold of the username and password of a legitimate user from accessing those e-resources. This would certainly help libraries to avoid a potential breach of licensing terms in advance and spare them from having to shut down compromised user accounts one by one whenever those are found. However, this would make it impossible for legitimate users traveling outside of the country to access those e-resources as well, which many users would find it unacceptable. Furthermore, malicious hackers would probably just use a proxy to make their IP address appear to be located in the U.S. anyway.

What would users do if their organization requires them to reset passwords on a weekly basis for their work computers and several or more systems that they also use constantly for work? While this may strengthen the security of those systems, it’s easy to see that it will be a nightmare having to reset all those passwords every week and keeping track of them not to forget or mix them up. Most likely, they will start using less complicated passwords or even begin to adopt just one password for all different services. Some may even stick to the same password every time the system requires them to reset it unless the system automatically detects the previous password and prevents the users from continuing to use the same one. Ill-thought-out cybersecurity measures can easily backfire.

Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby making a network more secure. Users who face cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.

Image from Flickr - https://www.flickr.com/photos/topgold/4978430615

Image from Flickr – https://www.flickr.com/photos/topgold/4978430615

Cybersecurity vs. Privacy

Usability and productivity may be a small issue, however, compared to the risk of mass surveillance resulting from aggressive security measures. In 2013, the Guardian reported that the communication records of millions of people were being collected by the National Security Agency (NSA) in bulk, regardless of suspicion of wrongdoing. A secret court order prohibited Verizon from disclosing the NSA’s information request. After a cyberattack against the University of California at Los Angeles, the University of California system installed a device that is capable of capturing, analyzing, and storing all network traffic to and from the campus for over 30 days. This security monitoring was implemented secretly without consulting or notifying the faculty and those who would be subject to the monitoring. The San Francisco Chronicle reported the IT staff who installed the system were given strict instructions not to reveal it was taking place. Selected committee members on the campus were told to keep this information to themselves.

The invasion of privacy and the lack of transparency in these network monitoring programs has caused great controversy. Such wide and indiscriminate monitoring programs must have a very good justification and offer clear answers to vital questions such as what exactly will be collected, who will have access to the collected information, when and how the information will be used, what controls will be put in place to prevent the information from being used for unrelated purposes, and how the information will be disposed of.

We have recently seen another case in which security concerns conflicted with people’s right to privacy. In February 2016, the FBI requested Apple to create a backdoor application that will bypass the current security measure in place in its iOS. This was because the FBI wanted to unlock an iPhone 5C recovered from one of the shooters in San Bernadino shooting incident. Apple iOS secures users’ devices by permanently erasing all data when a wrong password is entered more than ten times if people choose to activate this option in the iOS setting. The FBI’s request was met with strong opposition from Apple and others. Such a backdoor application can easily be exploited for illegal purposes by black hat hackers, for unjustified privacy infringement by other capable parties, and even for dictatorship by governments. Apple refused to comply with the request, and the court hearing was to take place in March 22. The FBI, however, withdrew the request saying that it found a way to hack into the phone in question without Apple’s help. Now, Apple has to figure out what the vulnerability in their iOS if it wants its encryption mechanism to be foolproof. In the meanwhile, iOS users know that their data is no longer as secure as they once thought.

Around the same time, the Senate’s draft bill titled as “Compliance with Court Orders Act of 2016,” proposed that people should be required to comply with any authorized court order for data and that if that data is “unintelligible” – meaning encrypted – then it must be decrypted for the court. This bill is problematic because it practically nullifies the efficacy of any end-to-end encryption, which we use everyday from our iPhones to messaging services like Whatsapp and Signal.

Because security is essential to privacy, it is ironic that certain cybersecurity measures are used to greatly invade privacy rather than protect it. Because we do not always fully understand how the technology actually works or how it can be exploited for both good and bad purposes, we need to be careful about giving blank permission to any party to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and the government will struggle even more to balance privacy concerns with security issues.

Why Libraries Should Advocate for Online Privacy?

The fact that people may no longer have privacy on the Web should concern libraries. Historically, libraries have been strong advocates of intellectual freedom striving to keep patron’s data safe and protected from the unwanted eyes of the authorities. As librarians, we believe in people’s right to read, think, and speak freely and privately as long as such an act itself does not pose harm to others. The Library Freedom Project is an example that reflects this belief held strongly within the library community. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom, and helped the Kilton Public Library in Lebanon, New Hampshire, to become the first library to operate a Tor exit relay, to provide anonymity for patrons while they browse the Internet at the library.

New technologies brought us the unprecedented convenience of collecting, storing, and sharing massive amount of sensitive data online. But the fact that such sensitive data can be easily exploited by falling into the wrong hands created also the unparalleled level of potential invasion of privacy. While the majority of librarians take a very strong stance in favor of intellectual freedom and against censorship, it is often hard to discern a correct stance on online privacy particularly when it is pitted against cybersecurity. Some even argue that those who have nothing to hide do not need their privacy at all.

However, privacy is not equivalent to hiding a wrongdoing. Nor do people keep certain things secrets because those things are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy whether s/he is guilty of any wrongdoing or not. Privacy allows us safe space to form our thoughts and consider our actions on our own without being subject to others’ eyes and judgments. Even in the absence of actual massive surveillance, just the belief that one can be placed under surveillance at any moment is sufficient to trigger self-censorship and negatively affects one’s thoughts, ideas, creativity, imagination, choices, and actions, making people more conformist and compliant. This is further corroborated by the recent study from Oxford University, which provides empirical evidence that the mere existence of a surveillance state breeds fear and conformity and stifles free expression. Privacy is an essential part of being human, not some trivial condition that we can do without in the face of a greater concern. That’s why many people under political dictatorship continue to choose death over life under mass surveillance and censorship in their fight for freedom and privacy.

The Electronic Frontier Foundation states that privacy means respect for individuals’ autonomy, anonymous speech, and the right to free association. We want to live as autonomous human beings free to speak our minds and think on our own. If part of a library’s mission is to contribute to helping people to become such autonomous human beings through learning and sharing knowledge with one another without having to worry about being observed and/or censored, libraries should advocate for people’s privacy both online and offline as well as in all forms of communication technologies and devices.

Libraries & Privacy in the Internet Age

Recently, we covered library data collection practices with an eye towards identifying what your library really needs to retain. In an era of seemingly comprehensive surveillance, libraries do their best to afford their patrons some privacy. Limiting our circulation statistics is a prime example: while many libraries track how many times a particular item circulates, it’s common practice to delete loan histories in patron records once items have been returned. Thus, in keeping with the Library Code of Ethics, we can “protect each library user’s right to privacy and confidentiality” while at once using data to improve our services.

However, not all information lives in books and our privacy protections must stay current with technology. Obfuscating the circulation of physical items is one thing, but what about all of our online resources? Most of the data noted in the data collection post is in and of the digital: web analytics, server logs, and heat maps. Today, people expose more and more of their personal information online and do so mostly on for-profit websites. In this post, I’ll go beyond library-specific data to talk further about how we can offer patrons enhanced privacy even when they’re not using resources we control, such as the library website or ILS.

Public Computers

Libraries are a great bastion of public computer access. We’re pretty much the only institution in modern society that a community can rely upon for free software use and web access. But how much thought do we put into the configuration of our public computers? Are we sure that each user’s session is completely isolated, unable to be accessed by others?

For a while, I tried to do quantitative research on how well libraries handled web browser settings on public computers. I went to whatever libraries I could—public, academic, law, anyone who would let me in the door and sit down at a computer, typically without a library card, which is not everyone. If I could get to a machine, I ran a brief audit of sorts, these being the main items:

  • List the web browsers on the machine, their versions, settings, & any add-ons present
  • Run Mozilla’s Plugin Check to test for outdated plugins, a common security vulnerability for browsers
  • Attempt to install problematic add-ons, such as keyloggers [1]
  • Attempt to change the browser’s settings, e.g. set it to offer to save passwords
  • Close the browser, then reopen it to see if my history and settings changes persisted

After awhile, I gave up on this effort, because I became busy with other projects and I never received a satisfactory sample size. Of the fourteen browsers across six (see what I mean about sample size?) libraries I tested, results were discouraging:

  • 93% (all but one) of browsers were outdated
  • On average, browsers had two plug-ins with known security vulnerabilities and two-and-a-half more which were outdated
  • The majority of browsers (79%) retained their history after being closed
  • A few (36%) offered to remember passwords, which could lead to dangerous accidents on shared computers
  • The majority (86%) had no add-ons installed
  • All but one allowed its settings to be changed
  • All but one allowed arbitrary add-ons to be installed

I understand that IT departments often control public computer settings and that there are issues beyond privacy which dictate these settings. But these are miserable figures by any standard. I encourage all librarians to run similar audits on their public computers and see if any improvements can be made. We’re allowing users’ sessions to bleed over into each other and giving them too much power to monitor each others’ activity. Much as libraries commonly anonymize circulation information to protect patrons from invasive government investigations, we should strive to keep web activities safe with sensible defaults. [2]

Many libraries force users to sign in or reserve a computer. Academic libraries may use Active Directory, wherein students sign in with a common login they use for other services like email, while public libraries may use PC reservation software like EnvisionWare. These approaches go a long way towards isolating user sessions, but at the cost of imposing access barriers and slowing start-up times. Now users need an AD account or library card to use your computers. Furthermore, users don’t always remember to sign off at the end of their session, meaning someone else could still sit down at their machine and potentially access their information. These can seem like unimportant edge cases, but they’re still worthy of consideration. Privacy almost always involves some kind of tradeoff, for users and for libraries. We need to ensure we’re making the right tradeoffs with due diligence.

Proactive Privacy

Libraries needn’t be on the defensive about privacy. We can also proactively help patrons in two ways: by modifying the browsers on our public computers to offer enhanced protections and by educating the public about their privacy.

While providing sensible defaults, such as not offering to remember passwords and preventing the installation of keylogging software, is helpful, it does little to offer privacy above and beyond what one would experience on a personal machine. However, libraries can use a little knowledge and research to offer default settings which are unobtrusive and advantageous. The most obvious example is HTTPS. HTTPS is probably familiar to most people; when you see a lock or other security-connoting icon in your browser’s address bar, it’ll be right alongside a URL that begins with the HTTPS scheme. You can think of the S in HTTPS as standing for “Secure,” meaning your web traffic is encrypted as it goes from node to node in between your browser and the server delivering data.

Banking sites, social media, and indeed most web accounts are commonly accessed over HTTPS connections. They operate rather seamlessly, the same as HTTP connections, with one slight caveat: HTTPS sites don’t load HTTP resources (e.g. if https://example.com happens to include the image http://cats.com/lol.jpg) by default, meaning sometimes pieces of a page are missing or broken. This commonly results in a “mixed content” warning which the user can override, though how intuitive that process is varies widely across browser user interfaces.

In any case, mixed content happens rarely enough that HTTPS, when available, is a no-brainer benefit. But here’s the rub: not all sites default to HTTPS, even if they should. Most notably, Facebook doesn’t. Do you want your patrons logging into Facebook with unencrypted credentials? No, you don’t, because anyone monitoring network traffic, using a tool like Firesheep for instance, can grab and reuse those credentials. So installing an extension like the superlative HTTPS Everywhere [3], which uses a crowdsourced set of formulas to deliver HTTPS sites where available, is of immense benefit to users even though they likely will never notice it.

HTTPS is just a start: there are numerous add-ons which offer security and privacy enhancements, from blocking tracking cookies to the NoScript Security Suite which blocks, well, pretty much everything. How disruptive these add-ons are is variable and putting NoScript or a similar script-blocking tool on public computers is probably a bad idea; it’s simply too strange for unacquainted users to understand. But awareness of these tools is vital and some of the less disruptive ones still offer benefits that the majority of your patrons would enjoy. If you’re on the fence about a particular option, a little targeted usability testing could highlight whether it’s worth it or not.


In terms of education, online privacy is a massively under-taught field. Workshops in public libraries and courses in academic libraries are obvious and in-demand services we can provide. They can cater to users of all skill levels. A basic introduction might appeal to people just beginning to use the web, covering core concepts like HTTPS, session data (e.g. cookies), and the importance of auditing account settings. An advanced workshop could cover privacy software, two-factor authentication, and pivotal extensions that have a more niche appeal.

Password management alone is a rich topic. Why? Because it’s a problem for everyone. Being a modern web user virtually necessitates maintaining a double-digit number of accounts. Password best practices are fairly well-known: use lengthy, unique passwords with a mixture of character types (lowercase and uppercase letters, numbers, and punctuation). Applying them is another matter. Repeating one password across accounts means if one company get hacked, suddenly all your accounts are potentially vulnerable. Using tricky number-letter replacement strategies can lead to painful forgetting—was it LibrarianFervor with “1”s instead of “i”s, a “3” instead of an “e”, a “0” instead of an “o”, or any combination thereof? Or did I spell it in reverse? These strategies aren’t much more secure and yet they make remembering passwords tough.

Users aren’t to be blamed: creating a well-considered and scalable approach to managing online accounts is difficult. But many wonderful software packages exist for this, e.g. the open source KeePass or paid solutions like 1Password and LastPass. Merely showing users these options and explaining their immense benefits is a public service.

To use a specific example, I co-taught an interdisciplinary course recently with a title broad enough—”The Nature of Knowledge,” try that on for size—that sneaking in privacy, social media, and web browsers was easy. One task I had willing students perform was to install the PrivacyFix extension [4] and then report back on their findings. PrivacyFix analyzes your Google and Facebook settings, telling you how much you’re worth to each company and pointing out places where you might be overexposing your information. It also includes a database of site ratings, judging sites based on how well they handle users data.

Our class was as diverse as any at my community college: we had adult students, teenage students, working parents, athletes, future teachers, future nurses, future police officers, black students, white students, Latino students, women, men. And you know what? Virtually everyone was shocked by their findings. They gasped, they changed their settings, they did independent research on online privacy, and at the end of the course they said still wanted to learn more. I hardly think this class was an anomaly. Americans know they’re being monitored at every turn. They want to share information online but they want to do so intelligently. If we offer them the tools to do so, they’ll jump at the chance.


For those who are curious about browser extensions, I wrote (shameless plug) a RUSQ column on web privacy that covers most of this post but goes further in detail in terms of recommendations. The Sec4Lib listserv is worth keeping an eye on as well, and if you really want to go the extra mile you could attend the Security preconference at the upcoming LITA Forum in November. Online privacy is not likely to get any less complicated in the future, but libraries should see that as an opportunity. We’re uniquely poised, both as information professionals with a devotion to privacy and as providers of public computing services, to tackle this issue. And no one is going to do it for us.


[1]^ Keyloggers are software which record each keystroke. As such, they can be used to find username and password information entered into web forms. I couldn’t find a free keylogger add-on for every browser so I only tested in browsers which had one available.

[2]^ I have a GitHub repository of what I consider to be sensible defaults for Mozilla Firefox, which happens to store settings in a JavaScript file and thus makes it easy to version and share them. The settings are liberally commented but not tested in production.

[3]^ As you’ll notice if you visit that link, HTTPS Everywhere is only available for Google Chrome and Mozilla Firefox. In my experience, it almost never causes problems, especially with major websites like Facebook, and there are a few similar extensions which one could try e.g. KB SSL for Chrome. Unfortunately, Internet Explorer has a much weaker add-on ecosystem with no real HTTPS solution that I’m aware of. Safari also has a weak extension ecosystem, though there is at least one HTTPS Everywhere-type option that I haven’t tried and has acknowledged limitations.

[4]^ Update 2016-05-27: PrivacyFix has been discontinued, sadly. Here’s a post discussing what your options are.

At the very least, installing HTTPS Everywhere on Firefox and Chrome still helps users who employ those browsers, without affecting users who prefer the others.