Academic libraries have long provided workshops that focus on research skills and tools to the community. Topics often include citation software or specific database search strategies. Increasingly, however, libraries are offering workshops on topics that some may consider untraditional or outside the natural home of the library. These topics include using R and other analysis packages, data visualization software, and GIS technology training, to name a few. Librarians are becoming trained as Data and Software Carpentry instructors in order to pull from their established lesson plans and become part of a larger instructional community. Librarians are also partnering with non-profit groups like Mozilla’s Science Lab to facilitate research and learning communities.
Traditional workshops have generally been conceived and executed by librarians in the library. Collaborating with outside groups like Software Carpentry (SWC) and Mozilla is a relatively new endeavor. As an example, certified trainers from SWC can come to campus and teach a topic from their course portfolio (e.g. using SQL, Python, R, Git). These workshops may or may not have a cost associated with them and are generally open to the campus community. From what I know, the library is typically the lead organizer of these events. This shouldn’t be terribly surprising. Librarians are often very aware of the research hurdles that faculty encounter, or what research skills aren’t being taught in the classroom to students (more on this later).
Librarians are helpers. If you have some biology knowledge, I find it useful to think of librarians as chaperone proteins, proteins that help other proteins get into their functional conformational shape. Librarians act in the same way, guiding and helping people to be more prepared to do effective research. We may not be altering their DNA, but we are helping them bend in new ways and take on different perspectives. When we see a skills gap, we think about how we can help. But workshops don’t just *spring* into being. They take a huge amount of planning and coordination. Librarians, on top of all the other things we do, pitch the idea to administration and other stakeholders on campus, coordinate the space, timing, refreshments, travel for the instructors (if they aren’t available in-house), registration, and advocate for the funding to pay for the event in order to make it free to the community. A recent listserv discussion regarding hosting SWC workshops resulted in consensus around a recommended minimum six week lead time. The workshops have all been hugely successful at the institutions responding on the list and there are even plans for future Library Carpentry events.
A colleague once said that everything that librarians do in instruction are things that the disciplinary faculty should be doing in the classroom anyway. That is, the research skills workshops, the use of a reference manager, searching databases, the data management best practices are all appropriately – and possibly more appropriately – taught in the classroom by the professor for the subject. While he is completely correct, that is most certainly not happening. We know this because faculty send their students to the library for help. They do this because they lack curricular time to cover any of these topics in depth and they lack professional development time to keep abreast of changes in certain research methods and technologies. And because these are all things that librarians should have expertise in. The beauty of our profession is that information is the coin of the realm for us, regardless of its form or subject. With minimal effort, we should be able to navigate information sources with precision and accuracy. This is one of the reasons why, time and again, the library is considered the intellectual center, the hub, or the heart of the university. Have an information need? We got you. Whether those information sources are in GitHub as code, spreadsheets as data, or databases as article surrogates, we should be able to chaperone our user through that process.
All of this is to the good, as far as I am concerned. Yet, I have a persistent niggle at the back of my mind that libraries are too often taking a passive posture. [Sidebar: I fully admit that this post is written from a place of feeling, of suspicions and anecdotes, and not from empirical data. Therefore, I am both uncomfortable writing it, yet unable to turn away from it.] My concern is that as libraries extend to take on these workshops because there is a need on campus for discipline-agnostic learning experiences, we (as a community) do so without really fomenting what the expectations and compensations of an academic library are, or should be. This is a natural extension of the “what types of positions should libraries provide/support?” question that seems to persist. How much of this response is based on the work of individuals volunteering to meet needs, stretching the work to fit into a job description or existing work loads, and ultimately putting user needs ahead of organizational health? I am not advocating that we ignore these needs; rather I am advocating that we integrate the support for these initiatives within the organization, that we systematize it, and that we own our expertise in it.
This brings me back to the idea of workshops and how we claim ownership of them. Are libraries providing these workshops only because no one else on campus is meeting the need? Or are we asserting our expertise in the domain of information/data shepherding and producing these workshops because the library is the best home for them, not a home by default? And if we are making this assertion, then have we positioned our people to be supported in the continual professional development that this demands? Have we set up mechanisms within the library and within the university for this work to be appropriately rewarded? The end result may be the same – say, providing workshops on R – but the motivation and framing of the service is important.
Information is our domain. We navigate its currents and ride its waves. It is ever changing and evolving, as we must be. And while we must be agile and nimble, we must also be institutionally supported and rewarded. I wonder if libraries can table the self-reflection and self-doubt regarding the appropriateness of our services (see everything ever written regarding libraries and data, digital humanities, digital scholarship, altmetrics, etc.) and instead advocate for the resourcing and recognition that our expertise warrants.
Python is a great programming language to know if you work in a library: it’s (relatively) easy to learn, its syntax is fairly clear and intuitive, and it has great, robust libraries for doing routine library tasks like hacking MARC records and working with delimited data, CSV files, JSON and XML. 1 In this post, I’ll describe a couple of projects I’ve worked on recently that have enabled me to Do Library Stuff Faster using Python. For reference, both of these scripts were written with Python 2.7 2 in mind, but could easily be adapted for other versions of Python.
Library Holdings Lookup with Beautiful Soup
Here’s a very common library dilemma: A generous and well-meaning patron, faculty member, or friend of the library has a large personal collection of books or other materials that they would like to bequeath to your library. They have carefully created a spreadsheet (or word document, or hand-written index) of all of the titles and authors (and maybe dates and ISBNs) in their library and want to know if you want the items.
Many libraries (for very good reason) have policies to just say “no” to these kinds of gifts. Well-meaning library gift givers don’t always realize that it’s an enormous amount of work for a library to evaluate materials and decide whether or not they can be added to the library’s collection. Beyond relevance to their users and condition of the items, libraries don’t want to accept gifts of duplicate copies of titles they already have in their collection due to limited shelf space.
It’s that final point – how to avoid adding duplicate titles to the collection – that led me to develop a very simple (and very hacky) script to enable me to take a list of titles and authors and do a very simple lookup to see if, at minimum, we have those same titles already in the collection. Our ILS (Innovative Interface’s Millennium system) does not have a way to feed in a bunch of titles and generate a report of title matches – and I would venture to say that kind of functionality is probably not available in most library systems. Normally when presented with a dilemma of having to check to see if the library already has a set of titles, we’d sit down an unfortunate student worker and have them manually work through the list – copying and pasting titles into the library catalog and noting down any matches found. This work is incredibly boring for the student worker, and is a prime candidate for automation (the same task is done over and over again, with a very specific set of criteria as output (match or no match).
Python’s Beautiful Soup library is built for exactly this kind of task – instead of having your student worker scan a bunch of web pages in your catalog, the script can do approximately the same thing by sending search terms to your catalog via URL, and returning back page elements that can tell you whether or not any matches were found. In my example script, I’m using title and author elements, but you could modify this script to use other elements as long as they are indexed in your catalog – for example, you could send ISBNs, OCLC numbers, etc.
First, using Excel I concatenate a list of titles and authors with a domain and other URL elements to search my library’s catalog. Here’s a few examples of what the URLs look like:
http://suncat.csun.edu/search~S9/X?SEARCH=t:(Los%20Angeles%20Two%20Hundred)+and+a:(Lavender)&searchscope=9&SORT=DX http://suncat.csun.edu/search~S9/X?SEARCH=t:(The%20Land%20of%20Journeys'%20Ending)+and+a:(Austin)&searchscope=9&SORT=DX http://suncat.csun.edu/search~S9/X?SEARCH=t:(Mathematics%20and%20Sex)+and+a:(Ernest)&searchscope=9&SORT=DX
I’ll save the full list of these (in my example, I have over 1000 titles and authors to check) in a plain text file called advancedtitleauth.txt.
Next, I start my Python script by calling the Beautiful Soup library, and some other libraries that are useful (urllib – a library built for fetching data by URLs; csv – a library for working with CSV files; and re, for working with regular expressions ). You’ll probably have to install Beautiful Soup on your system first, which you can do if you have the pip Python package management system 3 installed by using sudo pip install beautifulsoup4 on your system’s command line.
from bs4 import BeautifulSoup import urllib import csv import re
Then I create a blank array and define a CSV file into which to write the output of the script:
url_list = 
csv_out = csv.writer(open('output.txt', 'w'), delimiter = '\t')
The CSV file I’m creating will use tabs instead of commas as delimiters (hence delimiter = ‘\t’). Typically when working with library data, I prefer tab-delimited text files over comma-separated files, because you never know when a random comma is going to show up in a title and create a delimiter where their should not be one.
Then I open my list of URLs, read it, append each URL to my array, and feed each URL into Beautiful Soup:
try: f = open('advancedtitleauth.txt', 'rb') for line in f: url_list.append(line) r = urllib.urlopen(line).read() soup = BeautifulSoup(r)
Beautiful Soup will go fetch the web page of each URL. Now that I have the web pages, Beautiful Soup can parse out specific features of each page. In my case, my catalog returns a page with a single record if a match is found, and a browsable index when a match is found (e.g., your title would be here, but it isn’t, so here’s some stuff with titles that would be nearby). I can use Beautiful Soup to return page elements that tell me whether a match was found, and if a match is found, to write the permanent URL of the match for later evaluation. This bit of code looks for an HTML div element with the class “bibRecordLink” on the page, which only appears when a single match is found. If this div is present on the page, the script grabs the link and drops it into the output file.
try: link = soup.find_all("div", class_="bibRecordLink") directlink = str(link) directlink = "http://suncat.csun.edu" + directlink[36:]
In the code above, [36:] is Python’s way of noting the start position of a string – so in this case, I’m getting the end of the string starting with the 36th character (which in my case, is the bibliographic ID number of the item that allows me to construct a permalink).
If a title/author search results in multiple possible matches – that is, we might have multiple copies, or the title/author combo is too vague to land on just one item, the page that displays in our catalog shows a browsable list of brief record info. In my script, I just grab the top result:
try: briefcit = soup.find_all("span", class_="briefcitTitle") bestmatch = str(briefcit) sep = "&" bestmatch = bestmatch.split(sep, 1) bestmatch = "http://suncat.csun.edu/" + bestmatch[39:]
In the code above, Beautiful Soup finds all the<span> elements with the class “briefcitTitle”, the script returns the first one, and again returns a URL stored in the bestmatch variable.
You can see a sample output of my lookup script here. You can see that for each entry, I include publication information, direct links, or a best match link if the elements are found. If none of the elements are found for a lookup URL, the line reads:
nopub nolink nomatch
We can now divide the output file into “no match” entries, direct links, or best match links. Direct links and best match links will need to be double-checked by a student worker to make sure they actually represent the item we looked up, including the date and edition. The “no match” entries represent titles we don’t have in our collection, so those can be evaluated more closely to determine if we want them.
The script certainly has room for improvement; I could write in a lot more functionality to better identify publication information, for example, to possibly reduce or eliminate the need for manual review of direct or partial matches. But the return on investment for this script is fairly highfor a 37-line script written in an afternoon; we can re-use this dozens of times, and hopefully save countless hours of student worker boredom (and re-assign those student workers to more complex and meaningful tasks!).
Rudimentary Keyword Frequency Analysis
This second example involves, again, dealing with a task that could be done manually, but can be done much more quickly with a script.
My university needed to submit data for the AASHE Sustainability Tracking, Assessment, and Rating System (STARS) Report (https://stars.aashe.org/), which requires the analysis of data from campus course offerings as well as research output by faculty. To submit the report, we needed to know how many courses we offer in sustainability (defined by AASHE “in an inclusive way, encompassing human and ecological health, social justice, secure livelihoods and a better world for all generations”) and how many faculty do research in sustainability. This project was broken up into two components: Analysis of research data by faculty and analysis of course data.
Before we even started analyzing research or course data, we needed to define an approach to identify what counts as “sustainability.” Thankfully, there was precedent from the University of North Carolina, which had developed a list of sustainability-related keywords used to search against faculty research output 4 We adopted this list of keywords to lookup in faculty research articles and course descriptions.
Research data by faculty
We don’t have a comprehensive inventory of research done by faculty at our campus. Because we were on a somewhat tight deadline to do the analysis, I came up with a very quick and dirty way of getting a lot of citations by using Web of Science. Web of Science enables you to do a search for research published by affiliates of your university. I was able to retrieve about 8,000 citations written by current or former faculty associated with my institution going back about 15 years. Of course, we cannot consider the data in Web of Science to be fully representative of faculty research output, but it seemed like a good start at least.
Web of Science enables you to export 500-record chunks of metadata, so it took an hour or so to export the metadata in several pieces (see Figure 1 for my Web of Science export criteria).
Once I had all of the metadata for the 8,000 or so records written by faculty at my institution, I combined them into a single file. Next, I needed to identify records that had sustainability keywords in either the title or abstract.
First, I created an array of all of the keywords, and turned that list into a Python set. A Python set is different from a list in that the order of terms does not matter, and is ideal for checking membership of items in the set against strings (or in my case, a bunch of citation and abstract strings).
word_list = 'Agriculture,Alternative,Applied%Science [..snip..]' word_set <span class="pl-k">=</span> <span class="pl-c1">set</span>(word_list.split(<span class="pl-s"><span class="pl-pds">'</span>,<span class="pl-pds">'</span></span>))
Note the % in “Applied%Science”. For some reason my set lookup couldn’t match terms with spaces. My hacky solution was to replace spaces with % characters, and then do a find/replace in my spreadsheet of Web of Science data to replace all keyword matches with spaces (such as Applied Science) with percentage signs (Applied%Science). Luckily, there were only 10 or so keywords on the list with spaces, so the find/replace did not take very long. Note also that the set match lookup is case sensitive, so I actually found it easier to just turn everything to lower case in my Web of Science spreadsheet and match on the lower case term (though I kept both upper and lower case terms in my lookup set).
Then I checked to see if any words were in the title, abstract, or both, and constructed my query so that a new column would be added to an output spreadsheet indicating *which* matches were found:
for row in csv_reader: if (set(row.split()) & word_set) & (set(row.split()) & word_set) : csv_out.writerow(["title & abstract match",row,row,row,row,(set(row.split()) & word_set), (set(row.split()) & word_set)])
If any of the words in my set were found in the 23rd cell of the spreadsheet (the abstract) and the 9th cell of the spreadsheet (the title), then a row would be written to an output sheet indicating that sustainability keywords were found in the title and abstract, pulling in some citation details about the article (including author names), as well as a cell with a list of the matches found for both title and abstract fields.
I did similar conditionals for rows that found, for example, just a title match, or just an author match:
elif set(row.split()) & word_set: csv_out.writerow(["title match",row,row,row,row, (set(row.split()) & word_set)]) elif set(row.split()) & word_set: csv_out.writerow(["abstract match",row,row,row,row, (set(row.split()) & word_set)])
And that is pretty much the whole script! With the output file, I did have to do a bit more work to identify current faculty at my institution, but I basically used the same set matching method above using a list provided by HR.
Because the STARS report also required analysis of courses related to sustainability, I also created a very similar script to lookup key terms found in course titles and descriptions.
Of course, just because a research article or course description has a keyword, or even multiple keywords, does not mean it’s relevant at all to sustainability. One of the keywords identified as related to sustainability, for example, is “invest”, which basically meant that almost every finance class returned as a match. Manual work was required to review the matches and weed out false positives, but because the keyword matching was already done and we could easily see what matches were found, this work was done fairly quickly. We could, for example, identify courses and research articles that only had a single keyword match. If that single keyword match was something like “sustainability” it was likely a sustainability-related course and would merit further review; if the single keyword match was something like “systems” it could probably be weeded out.
As with my author/title lookup script, if I had a bit more time to fuss with the script, I could have probably optimized it further (for example, by assigning weight to more sustainability-related keywords to help calculate a relevance score). But again, a short amount of time invested in this script saved a huge amount of time, and enabled us to do something we would not otherwise have been able to do.
If you’re interested in learning more about Python and its syntax, and don’t have a lot of Python experience, a good (free) place to start is Google’s Python Class, created by Nick Parlante for Google (I actually took a similar class several years ago, also created by Dr. Parlante, through Coursera, which looks to still be available). If you want to get started using Python right away and don’t want to have to fuss with installing it on your computer, you can check out the interactive course How to Think Like a Computer Scientist created by Brad Miller and David Ranum at Luther College. For more examples of usage in Python for library work, check out Charles Ed Hill, Heidi Frank, and Mark Pernotto’s Python chapter in the just-released LITA Guide The Librarian’s Introduction to Programming Languages, edited by Beth Thomsett-Scott (full-disclosure: I am a contributor to this book).
- Working with CSV files and JSON Data. In Sweigart, Al (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners. San Francisco: No Starch Press. ↩
- For an explanation of the difference between Python 2 and 3, see https://wiki.python.org/moin/Python2orPython3. The reason I use Python 2.7 for these scripts is because of my computing environment (in which Python 2 is installed by default), but if you have Python 3 installed on your computer, note that syntactical changes in Python 3 mean that many Python 2.x scripts may require revision in order to work. ↩
- For instructions on using Pip with your Python installation, see: https://pip.pypa.io/en/latest/installing/ ↩
- Blank-White, Kristen. 2014. Researching the Researchers: Developing a Sustainability Research Inventory. Presented at the 2014 AASHE Conference and Expo, Portland OR. http://www.aashe.org/files/2014conference/presentations/secondPresentationUpload/Blank-White-Kristin_Researching-the-Researchers-Developing-a-Sustainability-Research-Inventory.pdf. ↩
Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others quickly, and it excels at that job. Businesses that process transactions with customers and store the information online are responsible for keeping that information private. No one wants social security numbers, credit card information, medical history, or personal e-mails shared with the world. We expect and trust banks, online stores, and our doctor’s offices to keep our information safe and secure.
However, keeping private information safe and secure is a challenging task. We have all heard of security breaches at J.P Morgan, Target, Sony, Anthem Blue Cross and Blue Shield, the Office of Personnel Management of the U.S. federal government, University of Maryland at College Park, and Indiana University. Sometimes, a data breach takes place when an institution fails to patch a hole in its network systems. Sometimes, people fall for a phishing scam, or a virus in a user’s computer infects the target system. Other times, online companies compile customer data into personal profiles. The profiles are then sold to data brokers and on into the hands of malicious hackers and criminals.
Cybersecurity vs. Usability
To prevent such a data breach, institutional IT staff are trained to protect their systems against vulnerabilities and intrusion attempts. Employees and end users are educated to be careful about dealing with institutional or customers’ data. There are systematic measures that organizations can implement such as two-factor authentication, stringent password requirements, and locking accounts after a certain number of failed login attempts.
While these measures strengthen an institution’s defense against cyberattacks, they may negatively affect the usability of the system, lowering users’ productivity. As a simple example, security measures like a CAPTCHA can cause an accessibility issue for people with disabilities.
Or imagine that a university IT office concerned about the data security of cloud services starts requiring all faculty, students, and staff to only use cloud services that are SOC 2 Type II certified as an another example. SOC stands for “Service Organization Controls.” It consists of a series of standards that measure how well a given service organization keeps its information secure. For a business to be SOC 2 certified, it must demonstrate that it has sufficient policies and strategies that will satisfactorily protect its clients’ data in five areas known as “Trust Services Principles.” Those include the security of the service provider’s system, the processing integrity of this system, the availability of the system, the privacy of personal information that the service provider collects, retains, uses, discloses, and disposes of for its clients, and the confidentiality of the information that the service provider’s system processes or maintains for the clients. The SOC 2 Type II certification means that the business had maintained relevant security policies and procedures over a period of at least six months, and therefore it is a good indicator that the business will keep the clients’ sensitive data secure. The Dropbox for Business is SOC 2 certified, but it costs money. The free version is not as secure, but many faculty, students, and staff in academia use it frequently for collaboration. If a university IT office simply bans people from using the free version of Dropbox without offering an alternative that is as easy to use as Dropbox, people will undoubtedly suffer.
Some of you may know that the USPS website does not provide a way to reset the password for users who forgot their usernames. They are instead asked to create a new account. If they remember the account username but enter the wrong answers to the two security questions more than twice, the system also automatically locks their accounts for a certain period of time. Again, users have to create a new account. Clearly, the system that does not allow the password reset for those forgetful users is more secure than the one that does. However, in reality, this security measure creates a huge usability issue because average users do forget their passwords and the answers to the security questions that they set up themselves. It’s not hard to guess how frustrated people will be when they realize that they entered a wrong mailing address for mail forwarding and are now unable to get back into the system to correct because they cannot remember their passwords nor the answers to their security questions.
To give an example related to libraries, a library may decide to block all international traffic to their licensed e-resources to prevent foreign hackers who have gotten hold of the username and password of a legitimate user from accessing those e-resources. This would certainly help libraries to avoid a potential breach of licensing terms in advance and spare them from having to shut down compromised user accounts one by one whenever those are found. However, this would make it impossible for legitimate users traveling outside of the country to access those e-resources as well, which many users would find it unacceptable. Furthermore, malicious hackers would probably just use a proxy to make their IP address appear to be located in the U.S. anyway.
What would users do if their organization requires them to reset passwords on a weekly basis for their work computers and several or more systems that they also use constantly for work? While this may strengthen the security of those systems, it’s easy to see that it will be a nightmare having to reset all those passwords every week and keeping track of them not to forget or mix them up. Most likely, they will start using less complicated passwords or even begin to adopt just one password for all different services. Some may even stick to the same password every time the system requires them to reset it unless the system automatically detects the previous password and prevents the users from continuing to use the same one. Ill-thought-out cybersecurity measures can easily backfire.
Security is important, but users also want to be able to do their job without being bogged down by unwieldy cybersecurity measures. The more user-friendly and the simpler the cybersecurity guidelines are to follow, the more users will observe them, thereby making a network more secure. Users who face cumbersome and complicated security measures may ignore or try to bypass them, increasing security risks.
Cybersecurity vs. Privacy
Usability and productivity may be a small issue, however, compared to the risk of mass surveillance resulting from aggressive security measures. In 2013, the Guardian reported that the communication records of millions of people were being collected by the National Security Agency (NSA) in bulk, regardless of suspicion of wrongdoing. A secret court order prohibited Verizon from disclosing the NSA’s information request. After a cyberattack against the University of California at Los Angeles, the University of California system installed a device that is capable of capturing, analyzing, and storing all network traffic to and from the campus for over 30 days. This security monitoring was implemented secretly without consulting or notifying the faculty and those who would be subject to the monitoring. The San Francisco Chronicle reported the IT staff who installed the system were given strict instructions not to reveal it was taking place. Selected committee members on the campus were told to keep this information to themselves.
The invasion of privacy and the lack of transparency in these network monitoring programs has caused great controversy. Such wide and indiscriminate monitoring programs must have a very good justification and offer clear answers to vital questions such as what exactly will be collected, who will have access to the collected information, when and how the information will be used, what controls will be put in place to prevent the information from being used for unrelated purposes, and how the information will be disposed of.
We have recently seen another case in which security concerns conflicted with people’s right to privacy. In February 2016, the FBI requested Apple to create a backdoor application that will bypass the current security measure in place in its iOS. This was because the FBI wanted to unlock an iPhone 5C recovered from one of the shooters in San Bernadino shooting incident. Apple iOS secures users’ devices by permanently erasing all data when a wrong password is entered more than ten times if people choose to activate this option in the iOS setting. The FBI’s request was met with strong opposition from Apple and others. Such a backdoor application can easily be exploited for illegal purposes by black hat hackers, for unjustified privacy infringement by other capable parties, and even for dictatorship by governments. Apple refused to comply with the request, and the court hearing was to take place in March 22. The FBI, however, withdrew the request saying that it found a way to hack into the phone in question without Apple’s help. Now, Apple has to figure out what the vulnerability in their iOS if it wants its encryption mechanism to be foolproof. In the meanwhile, iOS users know that their data is no longer as secure as they once thought.
Around the same time, the Senate’s draft bill titled as “Compliance with Court Orders Act of 2016,” proposed that people should be required to comply with any authorized court order for data and that if that data is “unintelligible” – meaning encrypted – then it must be decrypted for the court. This bill is problematic because it practically nullifies the efficacy of any end-to-end encryption, which we use everyday from our iPhones to messaging services like Whatsapp and Signal.
Because security is essential to privacy, it is ironic that certain cybersecurity measures are used to greatly invade privacy rather than protect it. Because we do not always fully understand how the technology actually works or how it can be exploited for both good and bad purposes, we need to be careful about giving blank permission to any party to access, collect, and use our private data without clear understanding, oversight, and consent. As we share more and more information online, cyberattacks will only increase, and organizations and the government will struggle even more to balance privacy concerns with security issues.
Why Libraries Should Advocate for Online Privacy?
The fact that people may no longer have privacy on the Web should concern libraries. Historically, libraries have been strong advocates of intellectual freedom striving to keep patron’s data safe and protected from the unwanted eyes of the authorities. As librarians, we believe in people’s right to read, think, and speak freely and privately as long as such an act itself does not pose harm to others. The Library Freedom Project is an example that reflects this belief held strongly within the library community. It educates librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedom, and helped the Kilton Public Library in Lebanon, New Hampshire, to become the first library to operate a Tor exit replay, to provide anonymity for patrons while they browse the Internet at the library.
New technologies brought us the unprecedented convenience of collecting, storing, and sharing massive amount of sensitive data online. But the fact that such sensitive data can be easily exploited by falling into the wrong hands created also the unparalleled level of potential invasion of privacy. While the majority of librarians take a very strong stance in favor of intellectual freedom and against censorship, it is often hard to discern a correct stance on online privacy particularly when it is pitted against cybersecurity. Some even argue that those who have nothing to hide do not need their privacy at all.
However, privacy is not equivalent to hiding a wrongdoing. Nor do people keep certain things secrets because those things are necessarily illegal or unethical. Being watched 24/7 will drive any person crazy whether s/he is guilty of any wrongdoing or not. Privacy allows us safe space to form our thoughts and consider our actions on our own without being subject to others’ eyes and judgments. Even in the absence of actual massive surveillance, just the belief that one can be placed under surveillance at any moment is sufficient to trigger self-censorship and negatively affects one’s thoughts, ideas, creativity, imagination, choices, and actions, making people more conformist and compliant. This is further corroborated by the recent study from Oxford University, which provides empirical evidence that the mere existence of a surveillance state breeds fear and conformity and stifles free expression. Privacy is an essential part of being human, not some trivial condition that we can do without in the face of a greater concern. That’s why many people under political dictatorship continue to choose death over life under mass surveillance and censorship in their fight for freedom and privacy.
The Electronic Frontier Foundation states that privacy means respect for individuals’ autonomy, anonymous speech, and the right to free association. We want to live as autonomous human beings free to speak our minds and think on our own. If part of a library’s mission is to contribute to helping people to become such autonomous human beings through learning and sharing knowledge with one another without having to worry about being observed and/or censored, libraries should advocate for people’s privacy both online and offline as well as in all forms of communication technologies and devices.
Are Your LibGuides 2.0 (images, tables, & videos) mobile friendly? Maybe not, and here’s what you can do about it.Posted: April 28, 2016 | Author: Danielle Rosenthal | Filed under: academic librarianship, coding, library, mobile | 3 Comments »
LibGuides version 2 was released in summer 2014, and built on Bootstrap 3. However, after examining my own institutions’ guides, and conducting a simple random sampling of academic libraries in the United States, I found that many LibGuides did not display well on phones or mobile devices when it came to images, videos, and tables. Springshare documentation stated that LibGuides version 2 is mobile friendly out of the box and no additional coding is necessary, however, I found this not to necessarily be accurate. While the responsive features are available, they aren’t presented clearly as options in the graphical interface and additional coding needs to be added using the HTML editor in order for mobile display to be truly responsive when it comes to images, videos, and tables.
At my institution, our LibGuides are reserved for our subject librarians to use for their research and course guides. We also use the A-Z database list and other modules. As the LibGuides administrator, I’d known since its version 2 release that the new system was built on Bootstrap, but I didn’t know enough about responsive design to do anything about it at the time. It wasn’t until this past October when I began redesigning our library’s website using Bootstrap as the framework that I delved into customizing our Springshare products utilizing what I had learned.
I have found while looking at our own guides individually, and speaking to the subject librarians about their process, that they have been creating and designing their guides by letting the default settings take over for images, tables, and videos. As a result, several tables are running out of their boxes, images are getting distorted, and videos are stretched vertically and have large black top and bottom margins. This is because additional coding and/or tweaking is indeed necessary in most cases for these to display correctly on mobile.
I’m by no means a Bootstrap expert, but my findings have been verified with Springshare, and I was told by Springshare support that they will be looked at by the developers. Support indicated that there may be a good reason things work as they do, perhaps to give users flexibility in their decisions, or perhaps a technical reason. I’m not sure, but for now we have begun work on making the adjustments so they display correctly. I’d be interested to hear others’ experiences with these elements and what they have had to do, if anything to assure they are responsive.
Initially, as I learned how to use Bootstrap with the LibGuides system, I looked at my own library’s subject guides and testing the responsiveness and display. To start, I browsed through our guides with my Android phone. I then used Chrome and IE11 on desktop and resized the windows to see if the tables stayed within their boxes, and images respond appropriately. I peeked at the HTML and elements within LibGuides to see how the librarians had their items configured. Once I realized the issues were similar across all guides, I took my search further. Selfishly hoping it wasn’t just us, I used the LibGuides Community site where I sorted the list by libraries on version 2, then sorted by academic libraries. Each state’s list had to be looked at separately (you can’t sort by the whole United States). I placed all libraries from each state in a separate Excel sheet in alphabetical order. Using the random sort function, I examined two to three, sometimes five libraries per state (25 states viewed) by following the link provided in the community site list. I also inspected the elements of several LibGuides in my spreadsheet live in Chrome. I removed dimensions or styling to see how the pages responded since I don’t have admin access to any other universities guides. I created a demo guide for the purposes of testing where I inserted various tables, images and videos.
Some Things You Can Try
Even if you or your LibGuides authors may or may not be familiar with Bootstrap or fundamentals of responsive design, anyone should be able to design or update these guide elements using the instructions below; there is no serious Bootstrap knowledge needed for these solutions.
As we know, tables should not be used for layout. They are meant to display tabular data. This is another issue I came across in my investigation. Many librarians are using tables in this manner. Aside from being an outdated practice, this poses a more serious issue on a mobile device. Authors can learn how to float images, or create columns and rows right within the HTML Editor as an alternative. So for the purposes of this post, I’ll only be using a table in a tabular format.
When inserting a table using the table icon in the rich text editor you are asked typical table questions. How many rows? How many columns? In speaking with the librarians here at my institution, no one is really giving it much thought beyond this. They are filling in these blanks, inserting the table, and populating it. Or worse, copying and pasting a table created in Word.
However, if you leave things as they are and the table has any width to it, this will be your result once minimized or viewed on mobile device:
Figure 1: LibGuides default table with no responsive class added
As you can see, the table runs out of the container (the box). To alleviate this, you will have to open the HTML Editor, find where the table begins, and wrap the table in the table-responsive class. The HTML Editor is available to all regular users, no administrative access is needed. If you aren’t familiar with adding classes, you will also need to close the tag after the last table code you see. The HTML looks something like this:
<div class="table-responsive"> all the other elements go here </div>
Below is the result of wrapping all table elements in the table-responsive class. As you can see it is cleaner, there is no run-off, and bootstrap added a horizontal scroll bar since the table is really too big for the box once it is resized. On a phone, you can now swipe sideways to scroll through the table.
Figure 2: Result of adding the responsive table class.
Springshare has also made the Bootstrap table styling classes available, which you can see in the editor dropdown as well. You can experiment with these to see which styling you prefer (borders, hover rows, striped rows…), but they don’t replace adding the table-responsive class to the table.
When inserting an image in a LibGuides box, the system brings the dimensions of the image with it into the Image Properties box by default. After various tests I found it best to match the image size to the layout/box prior to uploading, and then remove the dimensions altogether from within the Image Properties box (and don’t place it in an unresponsive table). This can easily be done right within the Image Properties box when the image is inserted. It can also be done in the HTML Editor afterwards.
Figure 3: Image dimensions can be removed in the Image Properties box.
On the left: dimensions in place. On the right: dimensions removed.
By removing the dimensions, the image is better able to resize accordingly, especially in IE which seems to be less forgiving than Chrome. Guide creators should also add descriptive Alternative Text while in the Image Properties box for accessibility purposes.
Some users may be tempted to resize large images by adjusting the dimensions right in the Properties box . However, doing this doesn’t actually decrease the size that gets passed to the user so it doesn’t help download speed. Substantial resizing needs to be done prior to upload. Springshare recommends adjustments of no more than 10-15%.1
There are a few things I tried while figuring the best way to embed a YouTube video:
- Use the YouTube embed code as is. Which can result in a squished image, and a lot of black border in the top and bottom margins.
- Use the YouTube embed code but remove the iframe dimensions (width=”560″ height=”315″). Results in a small image that looks fine, but stays small regardless of the box size.
- Use the YouTube embed code, remove the iframe dimensions and add the embed-responsive class. In this case, 16by9. This results in a nice responsive display, with no black margins. Alternately, I discovered that leaving the iframe dimensions while adding the responsive class looks nearly the same.
It should also be noted that LibGuides creators and editors should manually add a “title” attribute to the embed code for accessibility.2 Neither LibGuides nor YouTube does this automatically, so it’s up to the guide creator to add it in the HTML Editor. In addition, the “frameborder=0” will be overwritten by Bootstrap, so you can remove it or leave, it’s up to you.
Considering Box Order/Stacking
The way boxes stack and order on smaller devices is also something LibGuides creators or editors should take into consideration. The layout is essentially comprised of columns, and in Bootstrap the columns stack a certain way depending on device size.
I’ve tested several guides and believe the following are representative of how boxes will stack on a phone, or small mobile device. However, it’s always best to test your layout to be sure. Test your own guides by minimizing and resizing your browser window and watch how they stack.
Box stacking order of a guide with no large top box and three columns.
Box stacking order of a guide with a large top box and two columns.
After looking at the number of libraries that have these same issues, it may be safe to say that our subject librarians are similar to others in regard to having limited HTML, CSS, or design skills. They rely on LibGuides easy to use interface and system to do most of the work as their time is limited, or they have no interest in learning these additional skills. Our librarians spend most of their teaching time in a classroom, using a podium and large screen, or at the reference desk on large screens. Because of this they are not highly attuned to the mobile user and how their guides display on other devices, even though their guides are being accessed by students on phones or tablets. We will be initiating a mobile reference service soon, perhaps this will help bring further awareness. For now, I recently taught an internal workshop in order demonstrate and share what I have learned in hopes of helping the librarians get these elements fixed. Helping ensure new guides will be created with mobile in mind is also a priority. To date, several librarians have gone through their guides and made the changes where necessary. Others have summer plans to update their guides and address these issues at the same time. I’m not aware of any way to make these changes in bulk, since they are very individual in nature.
Danielle Rosenthal is the Web Development & Design Librarian at Florida Gulf Coast University. She is responsible for the library’s web site and its applications in support of teaching, learning, and scholarship activities of the FGCU Library community. Her interests include user interface, responsive, and information design.
1 Maximizing your LibGuides for Mobile http://buzz.springshare.com/springynews/news-29/tips
When it comes to digital preservation, everyone agrees that a little bit is better than nothing. Look no further than these two excellent presentations from Code4Lib 2016, “Can’t Wait for Perfect: Implementing “Good Enough” Digital Preservation” by Shira Peltzman and Alice Sara Prael, and “Digital Preservation 101, or, How to Keep Bits for Centuries” by Julie Swierczek. I highly suggest you go check those out before reading more of this post if you are new to digital preservation, since they get into some technical details that I won’t.
The takeaway from these for me was twofold. First, digital preservation doesn’t have to be hard, but it does have to be intentional, and secondly, it does require institutional commitment. If you’re new to the world of digital preservation, understanding all the basic issues and what your options are can be daunting. I’ve been fortunate enough to lead a group at my institution that has spent the last few years working through some of these issues, and so in this post I want to give a brief overview of the work we’ve done, as well as the current landscape for digital preservation systems. This won’t be an in-depth exploration, more like a key to the map. Note that ACRL TechConnect has covered a variety of digital preservation issues before, including data management and preservation in “The Library as Research Partner” and using bash scripts to automate digital preservation workflow tasks in “Bash Scripting: automating repetitive command line tasks”.
The committee I chair started examining born digital materials, but expanded focus to all digital materials, since our digitized materials were an easier test case for a lot of our ideas. The committee spent a long time understanding the basic tenets of digital preservation–and in truth, we’re still working on this. For this process, we found working through the NDSA Levels of Digital Preservation an extremely helpful exercise–you can find a helpfully annotated version with tools by Shira Peltzman and Alice Sara Prael, as well as an additional explanation by Shira Peltman. We also relied on the Library of Congress Signal blog and the work of Brad Houston, among other resources. A few of the tasks we accomplished were to create a rough inventory of digital materials, a workflow manual, and to acquire many terabytes (currently around 8) of secure networked storage space for files to replace all removable hard drives being used for backups. While backups aren’t exactly digital preservation, we wanted to at the very least secure the backups we did have. An inventory and workflow manual may sound impressive, but I want to emphasize that these are living and somewhat messy documents. The major advantage of having these is not so much for what we do have, but for identifying gaps in our processes. Through this process, we were able to develop a lengthy (but prioritized) list of tasks that need to be completed before we’ll be satisfied with our processes. An example of this is that one of the major workflow gaps we discovered is that we have many items on obsolete digital media formats, such as floppy disks, that needs to be imaged before it can even be inventoried. We identified the tool we wanted to use for that, but time and staffing pressures have left the completion of this project in limbo. We’re now working on hiring a graduate student who can help work on this and similar projects.
The other piece of our work has been trying to understand what systems are available for digital preservation. I’ll summarize my understanding of this below, with several major caveats. This is a world that is currently undergoing a huge amount of change as many companies and people work on developing new systems or improving existing systems, so there is a lot missing from what I will say. Second, none of these solutions are necessarily mutually exclusive. Some by design require various pieces to be used together, some may not require it, but your circumstances may dictate a different solution. For instance, you may not like the access layer built into one system, and so will choose something else. The dream that you can just throw money at the problem and it will go away is, at present, still just a dream–as are so many library technology problems.
The closest to such a dream is the end-to-end system. This is something where at one end you load in a file or set of files you want to preserve (for example, a large set of donated digital photographs in TIFF format), and at the other end have a processed archival package (which might include the TIFF files, some metadata about the processing, and a way to check for bit rot in your files), as well as an access copy (for example, a smaller sized JPG appropriate for display to the public) if you so desire–not all digital files should be available to the public, but still need to be preserved.
Examples of such systems include Preservica, ArchivesDirect, and Rosetta. All of these are hosted vended products, but ArchivesDirect is based on open source Archivematica so it is possible to get some idea of the experience of using it if you are able to install the tools on which it based. The issues with end-t0-end systems are similar to any other choice you make in library systems. First, they come at a high price–Preservica and ArchivesDirect are open about their pricing, and for a plan that will meet the needs of medium-sized libraries you will be looking at $10,000-$14,000 annual cost. You are pretty much stuck with the options offered in the product, though you still have many decisions to make within that framework. Migrating from one system to another if you change your mind may involve some very difficult processes, and so inertia dictates that you will be using that system for the long haul, which a short trial period or demos may not be enough to really tell you that it’s a good idea. But you do have the potential for more simplicity and therefore a stronger likelihood that you will actually use them, as well as being much more manageable for smaller staffs that lack dedicated positions for digital preservation work–or even room in the current positions for digital preservation work. A hosted product is ideal if you don’t have the staff or servers to install anything yourself, and helps you get your long-term archival files onto Amazon Glacier. Amazon Glacier is, by the way, where pretty much all the services we’re discussing store everything you are submitting for long-term storage. It’s dirt cheap to store on Amazon Glacier and if you can restore slowly, not too expensive to restore–only expensive if you need to restore a lot quickly. But using it is somewhat technically challenging since you only interact with it through APIs–there’s no way to log in and upload files or download files as with a cloud storage service like Dropbox. For that reason, when you’re paying a service hundreds of dollars a terabyte that ultimately stores all your material on Amazon Glacier which costs pennies per gigabye, you’re paying for the technical infrastructure to get your stuff on and off of there as much as anything else. In another way you’re paying an insurance policy for accessing materials in a catastrophic situation where you do need to recover all your files–theoretically, you don’t have to pay extra for such a situation.
A related option to an end-to-end system that has some attractive features is to join a preservation network. Examples of these include Digital Preservation Network (DPN) or APTrust. In this model, you pay an annual membership fee (right now $20,000 annually, though this could change soon) to join the consortium. This gives you access to a network of preservation nodes (either Amazon Glacier or nodes at other institutions), access to tools, and a right (and requirement) to participate in the governance of the network. Another larger preservation goal of such networks is to ensure long-term access to material even if the owning institution disappears. Of course, $20,000 plus travel to meetings and work time to participate in governance may be out of reach of many, but it appears that both DPN and APTrust are investigating new pricing models that may meet the needs of smaller institutions who would like to participate but can’t contribute as much in money or time. This a world that I would recommend watching closely.
Up until recently, the way that many institutions were achieving digital preservation was through some kind of repository that they created themselves, either with open source repository software such as Fedora Repository or DSpace or some other type of DIY system. With open source Archivematica, and a few other tools, you can build your own end-to-end system that will allow you to process files, store the files and preservation metadata, and provide access as is appropriate for the collection. This is theoretically a great plan. You can make all the choices yourself about your workflows, storage, and access layer. You can do as much or as little as you need to do. But in practice for most of us, this just isn’t going to happen without a strong institutional commitment of staff and servers to maintain this long term, at possibly a higher cost than any of the other solutions. That realization is one of the driving forces behind Hydra-in-a-Box, which is an exciting initiative that is currently in development. The idea is to make it possible for many different sizes of institutions to take advantage of the robust feature sets for preservation in Fedora and workflow management/access in Hydra, but without the overhead of installing and maintaining them. You can follow the project on Twitter and by joining the mailing list.
After going through all this, I am reminded of one of my favorite slides from Julie Swierczek’s Code4Lib presentation. She works through the Open Archival Initiative System model graph to explain it in depth, and comes to a point in the workflow that calls for “Sustainable Financing”, and then zooms in on this. For many, this is the crux of the digital preservation problem. It’s possible to do a sort of ok job with digital preservation for nothing or very cheap, but to ensure long term preservation requires institutional commitment for the long haul, just as any library collection requires. Given how much attention digital preservation is starting to receive, we can hope that more libraries will see this as a priority and start to participate. This may lead to even more options, tools, and knowledge, but it will still require making it a priority and putting in the work.
About a month ago was the 2016 Code4Lib conference in sunny Philadelphia. I’ve only been to a few Code4Lib conferences, starting with Raleigh in 2014, but it’s quickly become my favorite libraryland conference. This won’t be a comprehensive recap but a little taste of what makes the event so special.
One of the best things about Code4Lib is the affordable preconferences. It’s often a pittance to add on a preconference or two, extending your conference for a whole day. Not only that, there’s typically a wealth of options: the 2015 conference boasted fifteen preconferences to choose from, and Philadelphia somehow managed to top that with an astonishing twenty-four choices. Not only are they numerous, the preconferences vary widely in their topics and goals. There’s always intensely practical ones focused on bootstrapping people new to a particular framework, programming language, or piece of software (e.g. Railsbridge, workshops focused on Blacklight or Hydra). But there are also events for practicing your presentation or the aptly named “Getting Ready for Workshops” Workshop. One of my personal favorite ideas—though I must admit I’ve never attended—is the perennial “Fail4Lib” sessions where attendees examine their projects that haven’t succeeded and discuss what they’ve learned.
This year, I wanted to run a preconference of my own. I enjoy teaching, but I rarely get to do it in my current position. Previously, in a more generalist technologist position, I would teach information literacy alongside the other librarians. But as a Systems Librarian, it can sometimes feel like I rarely get out from behind my terminal. A preconference was an appealing chance to teach information professionals on a topic that I’ve accumulated some expertise in. So I worked with Coral Sheldon-Hess to put together a workshop focused on the fundamentals of the command line: what it is, how to use it, and some of the pivotal concepts. I won’t say too much more about the workshop because Coral wrote an excellent, detailed blog post right after we were done. The experience was great and feedback we received, including a couple kind emails from our participants, was very positive. Perhaps we, or someone else, can repeat the workshop in the future, as we put all our materials online.
Main Course: Presentations
Thankfully I don’t have to detail the conference talks too much, because they’re all available on YouTube. If a talk looks intriguing, I strongly encourage you to check out the recording. I’m not too ashamed to admit that a few went way over my head, so seeing the original will certainly be more informative than any summary I could offer.
One thing that was striking was how the two keynotes centered on themes of privacy and surveillance. Kate Krauss, Director of Communications of the Tor Project, lead the conference off. Naturally, Tor being privacy software, Krauss focused on stories of government surveillance. She noted how surveillance focuses on the most marginalized people, citing #BlackLivesMatter and the transgender community as examples. Krauss’ talk provided concrete steps that librarians could take, for instance examining our own data collection practices, ensuring our services are secure, hosting privacy workshops, and running a Tor relay. She even mentioned The Library Freedom Project as a positive example of librarians fighting online surveillance, which she posited as one of the premier civil rights issues of our time.
— Ranti Junus (@ranti) March 11, 2016
On the final day, Gabriel Weinberg of the search engine DuckDuckGo spoke on similar themes, except he concentrated on how his company’s lack of personalization and tracking differentiated it from companies like Google and Apple. To me, Weinberg’s talk bookended well with Krauss’ because he highlighted the dangers of corporate surveillance. While the government certainly has abused its access to certain fundamental pieces of our country’s infrastructure—obtaining records from major telecom companies without a warrant comes to mind—tech companies are also culpable in enabling the unparalleled degree of surveillance possible in the modern era, simply by collecting such massive quantities of data linked to individuals (and, all too often, by failing to secure their applications properly).
While the pair of keynotes were excellent and thematic, my favorite moments of the conference were the talks by librarians. Becky Yoose gave perhaps the most rousing, emotional talk I’ve ever heard at a conference on the subject of burnout. Burnout is all too real in our profession, but not often spoken of, particularly in such a public venue. Becky forced us all to confront the healthiness and sustainability of our work/life balance, stressing the importance not only of strong organizational policies to prevent burnout but also personal practices. Finally, Andreas Orphanides gave a thoughtful presentation on the political implications of design choices. Dre’s well-chosen, alternatingly brutal and funny examples—from sidewalk spikes that prevent homeless people from lying in doorways, to an airline website labelling as “lowest” a price clearly higher than others on the very same page—outlined how our design choices reflect our values, and how we can better align our values with those of our users.
I don’t mean to discredit anyone else’s talks—there were many more excellent ones, on a variety of topics. Dinah Handel captured my feelings best in this enthusiastic tweet:
OMFG this data/linked open data/metadata panel is 🔥🔥🔥🔥🔥🔥🔥🔥 👏🏻 #c4l16 transparency, collaboration, version control, distributed systems 💓
— Dinah Handel (@DinahHandel) March 8, 2016
My main enjoyment from Code4Lib is the sense of community. You’ll hear a lot of people at conferences state things like “I feel like these are my people.” And we are lucky as a profession to have plenty of strong conference options, depending on our locality, specialization, and interests. At Code4Lib, I feel like I can strike up a conversation with anyone I meet about an impending ILS migration, my favorite command-line tool, or the vagaries of mapping between metadata schemas. While I love my present position, I’m mostly a solo systems person surrounded by a few other librarians all with a different expertise. As much as I want to discuss how ludicrous the webpub.def syntax is, or why reading XSLT makes me faintly ill, I know it’d bore my colleagues to death. At Code4Lib, people can at least tolerate such subjects of conversation, if not revel in them.
Code4Lib is great not solely because of it’s focus on technology and code, which a few other library organizations share, but because of the efforts of community members to make it a pleasurable experience for all. To name just a couple of the new things Code4Lib introduced this year: while previous years have had Duty Officers whom attendees could safely report harassment to, they were announced & much more visible this year; sponsored child care was available for conference goers with small children; and a service provided live transcription of all the talks.1 This is in addition to a number of community-building measures that previous Code4Lib conferences featured, such as a series of newcomers dinners on the first night, a “share and play” game night, and diversity scholarships. Overall, it’s evident that the Code4Lib community is committed to being positive and welcoming. Not that other library organizations aren’t, but it should be evident that our profession isn’t immune from problems. Being proactive and putting in place measures to prevent issues like harassment is a shining example of what makes Code4Lib great.
All this said, the community does have its issues. While a 40% female attendance rate is fair for a technology conference, it’s clear that the intersection of coding and librarianship is more male-dominated than the rest of the profession at large. Notably, Code4Lib has done an incredible job of democratically selecting keynote speakers over the past few years—five female and one male for the past three conferences—but the conference has also been largely white, so much so that the 2016 conference’s Program Committee gave a lightning talk addressing the lack of speaker diversity. Hopefully, measures like the diversity scholarships and conscious efforts on the part of the community can make progress here. But the unbearable whiteness of librarianship remains a very large issue.
Finally, it’s worth noting that Code4Lib is entirely volunteer-run. Since it’s not an official professional organization with membership dues and full-time staff members, everything is done by people willing to spare their own time to make the occasion a great one. A huge thanks to the local planning committee and all the volunteers who made such a great event possible. It’s pretty stunning to me that Code4Lib manages to put together some of the nicest benefits of any conference—the live streaming and transcribed talks come to mind—without a huge backing organization, and while charging pretty reasonable registration prices.
I’d recommend Code4Lib to anyone in the library community who deals with technology, whether you’re a manager, cataloger, systems person, or developer. There’s a wide breadth of material suitable for anyone and a great, supportive community. If that’s not enough, the proportion of presentations featuring pictures of cats and/or animated gifs is higher than your average conference.
Keeping up with technical skills and finding time to learn new things can be a struggle, no matter your role in a library (or in any organization, for that matter). In some academic libraries, professional development opportunities have been historically available to librarians and library faculty, and less available (or totally unavailable) for staff positions. In this post, I argue that this disparity, where it may exist, is not only prima facie unfair, but can reduce innovation and willingness to change in the library. If your library does not have a policy that specifically addresses training and professional development for all library staff, this post will provide some ideas on how to start crafting one.
In this post, when referring to “training and professional development,” I mostly have in mind technology training – though a training policy could cover non-technical training, such as leadership, time management, or project management training (though of course, some of those skills are closely related to technology).
In the absence of a staff training policy or formal support for staff training, staff are likely still doing the training, but may not feel supported by the library to do so. In ACRL TechConnect’s 2015 survey on learning programming in libraries, respondents noted disparities at their libraries between support for technical training for faculty or librarian positions and staff positions. Respondents also noted that even though support for training was available in principle (e.g., funding was potentially available for travel or training), workloads were too high to find the time to complete training and professional development, and some respondents indicated learning on their own time was the only feasible way to train. A policy promoting staff training and professional development should therefore explicitly allocate time and resources for training, so that training can actually be completed during work hours.
There is not a significant amount of recent research reflecting the impact of staff training on library operations. Research in other industries has found that staff training can improve morale, reduce employee turnover and increase organizational innovation.1 In a review of characteristics of innovative companies, Choudhary (2014) found that “Not surprisingly, employees are the most important asset of an organization and the most important source of innovation.” 2 Training and workshops – particularly those that feature “lectures/talks from accomplished persons outside the organization” are especially effective in fostering happy and motivated employees 3 – and it’s happy and motivated employees that contribute most to a culture of innovation in an organization.
Key Policy Elements
Your policy should outline how much time for training is available to each employee (for example, 2 hours a week or 8 hours a month). Ensuring that staff have enough time for training while covering their existing duties is the most challenging part of implementing a training policy or plan. For service desks in particular, scheduling adequate coverage while staff are doing professional development can be very difficult – especially as many libraries are understaffed. To free up time, an option might be to train and promote a few student workers to do higher-level tasks to cover staff during training (you’ll need to budget to pay these students a higher wage for this work). If your library wants to promote a culture of learning among staff, but there really is no time available to staff to do training, then the library probably needs more staff.
A training policy should be clear that training should be scheduled in advance with supervisor approval, and supervisors should be empowered to integrate professional development time into existing schedules. Your policy may also specify that training hours can be allocated more heavily during low-traffic times in the library, such as summer, spring, and winter breaks, and that employees will likely train less during high-traffic or project-intensive times of the year. In this way, a policy that specifies that an employee has X number of training hours per month or year might be more flexible than a policy that calls for X number of training hours per week.
Equipment and Space
Time is not enough. Equipment, particularly mobile devices such as iPads or laptops – should also be available for staff use and checkout. These devices should be configured to enable staff to install required plugins and software for viewing webinars and training videos. Library staff whose offices are open and vulnerable to constant interruption by patrons or student workers may find training is more effective if they have the option to check out a mobile device and head to another area – away from their desk – to focus. Quiet spaces and webinar viewing rooms may also be required, and most libraries already have group or individual study areas. Ensure that your policy states whether or how staff may reserve these spaces for training use.
There are tons of training materials, videos, and courses that are freely available online – but there are also lots of webinars and workshops that have a cost that are totally worth paying for. A library that offers funding for professional development for some employees (such as librarians or those with faculty status), but not others, risks alienating staff and sending the message that staff learning is not valued by the organization. Staff should know what the process is to apply for funding to travel, attend workshops, and view webinars. Be sure to write up the procedures for requesting this funding either in the training policy itself or documented elsewhere but available to all employees. Funding might be limited, but it’s vital to be transparent about travel funding request procedures.
An issue that is probably outside of the scope of a training policy, but is nonetheless very closely related, is staff pay. If you’re asking staff to train more, know more, and do more, compensation needs to reflect this. Pay scales may not have caught up to the reality that many library staff positions now require technology skills that were not necessary in the past; some positions may need to be re-classed. For this reason, creating a staff training policy may not be possible in a vacuum, but this process may need to be integrated with a library strategic planning and/or re-organization plan. It’s incredibly important on this point that library leadership is on board with a potential training policy and its strategic and staffing implications.
Align Training with Organizational Goals
It likely goes without saying that training and professional development should align with organizational goals, but you should still say it in your policy – and specify where those organizational goals are documented. How those goals are set is determined by the strategic planning process at your library, but you may wish to outline in your policy that supervisors and department heads can set departmental goals and encourage staff to undertake training that aligns with these goals. This can, in theory, get a little tricky: if we want to take a yoga class as part of our professional development, is that OK? If your organization values mindfulness and/or wellness, it might be!
If your library wants to promote a culture of experimentation and risk-taking, consider explicitly defining and promoting those values in your policy. This can help guide supervisors when working with staff to set training priorities. One exciting potential outcome of implementing a training policy is to foster an environment where employees feel secure in trying out new skills, so make it clear that employees are empowered to do so. Communication / Collaboration
Are there multiple people in your library interested in learning Ruby? If there were, would you have any way of knowing? Effective communication can be a massive challenge on its own (and is way beyond the scope of this post), but when setting up and documenting a training policy staff, you could include guidance for how staff should communicate their training activities with the rest of the library. This could take the form of something totally low-tech (like a bulletin board or shared training calendar in the break room) or could take the form of an intranet blog where everyone is encouraged to write a post about their recent training and professional development experiences. Consider planning to hold ‘share-fests’ a few times a year where staff can share new ideas and skills with others in the library to further recognize training accomplishments.
Training is in the Job Description
Training and professional development should be included in all job descriptions (a lot easier said than done, admittedly). Employees need to know they are empowered to use work time to complete training and professional development. There may be union, collective bargaining, and employee review implications to this – which I certainly am not qualified to speak on – but these issues should be addressed when planning to implement a training policy. For new hires going forward, expect to have a period of ‘onboarding’ during which time the new staff member will devote a significant amount of time to training (this may already be happening informally, but I have certainly had experiences as a staff member being hired in and spending the first few weeks of my new job trying to figure out what my job is on my own!).
Closing the Loop: Idea and Innovation Management
OK, so you’ve implemented a training policy, and now training and professional development is happening constantly in your library. Awesome! Not only is everyone learning new skills, but staff have great ideas for new services, or are learning about new software they want to implement. How do you keep the momentum going?
One option might be to set up a process to track ideas and innovative projects in your library. There’s a niche software industry around idea and innovation management that features some highly robust and specialized products (Brightidea, Spigit and Ideascale are some examples), but you could start small and integrate idea tracking into an existing ticket system like SpiceWorks, OSTicket, or even LibAnswers. A periodic open vote could be held to identify high-impact projects and prioritize new ideas and services. It’s important to be transparent and accountable for this – adopting internally-generated ideas can in and of itself be a great morale-booster if handled properly, but if staff feel like their ideas are not valued, a culture of innovation will die before it gets off the ground.
Does your library have a truly awesome culture of learning and employee professional development? I’d love to hear about it in the comments or @lpmagnuson.
- Sung, S. , & Choi, J. (2014). Do organizations spend wisely on employees? effects of training and development investments on learning and innovation in organizations. Journal of Organizational Behavior,35(3), 393-412. ↩
- Choudhary, A. (2014). Four Critical Traits of Innovative Organizations. Journal of Organizational Culture, Communication and Conflict, 18(2), 45-58. ↩
- Ibid. ↩
After much hard work over years by the Drupal community, Drupal users rejoiced when Drupal 8 came out late last year. The system has been completely rewritten and does a lot of great stuff–but can it do what we need Drupal websites to do for libraries? The quick answer seems to be that it’s not quite ready, but depending on your needs it might be worth a look.
For those who aren’t familiar with Drupal, it’s a content management system designed to manage complex sites with multiple types of content, users, features, and appearances. Certain “core” features are available to everyone out of the box, but even more useful are the “modules”, which extend the features to do all kinds of things from the mundane but essential backup of a site to a flashy carousel slider. However, the modules are created by individuals or companies and contributed back to the community, and thus when Drupal makes a major version change they need to be rewritten, quite drastically in the case of Drupal 8. That means that right now we are in a period where developers may or may not be redoing their modules, or they may be rethinking about how a certain task should be done in the future. Because most of these developers are doing this work as volunteers, it’s not reasonable to expect that they will complete the work on your timeline. The expectation is that if a feature is really important to you, then you’ll work on development to make it happen. That is, of course, easier said than done for people who barely have enough time to do the basic web development asked of them, much less complex programming or learning a new system top to bottom, so most of us are stuck waiting or figuring out our own solutions.
Despite my knowledge of the reality of how Drupal works, I was very excited at the prospect of getting into Drupal 8 and learning all the new features. I installed it right away and started poking around, but realized pretty quickly I was going to have to do a complete evaluation for whether it was actually practical to use it for my library’s website. Our website has been on Drupal 7 since 2012, and works pretty well, though it does need a new theme to bring it into line with 2016 design and accessibility standards. Ideally, however, we could be doing even more with the site, such as providing better discovery for our digital special collections and making the site information more semantic web friendly. It was those latter, more advanced, feature desires that made me really wish to use Drupal 8, which includes semantic HTML5 integration and schema.org markup, as well as better integration with other tools and libraries. But the question remains–would it really be practical to work on migrating the site immediately, or would it make more sense to spend some development time on improving the Drupal 7 site to make it work for the next year or so while working on Drupal 8 development more slowly?
A bit of research online will tell you that there’s no right answer, but that the first thing to do in an evaluation is determine whether any the modules on which your site depends are available for Drupal 8, and if not, whether there is a good alternative. I must add that while all the functions I am going to mention can be done manually or through custom code, a lot of that work would take more time to write and maintain than I expect to have going forward. In fact, we’ve been working to move more of our customized code to modules already, since that makes it possible to distribute some of the workload to others outside of the very few people at our library who write code or even know HTML well, not to mention taking advantage of all the great expertise of the Drupal community.
I tried two different methods for the evaluation. First, I created a spreadsheet with all the modules we actually use in Drupal 7, their versions, and the current status of those modules in Drupal 8 or if I found a reasonable substitute. Next, I tried a site that automates that process, d8upgrade.org. Basically you fill in your website URL and email, and wait a day for your report, which is very straightforward with a list of modules found for your site, whether there is a stable release, an alpha or beta release, or no Drupal 8 release found yet. This is a useful timesaver, but will need some manual work to complete and isn’t always completely up to date.
My manual analysis determined that there were 30 modules on which we depend to a greater or lesser extent. Of those, 10 either moved into Drupal core (so would automatically be included) or the functions on which used them moved into another piece of core. 5 had versions available in Drupal 8, with varying levels of release (i.e. several in stable alpha release, so questionable to use for production sites but probably fine), and 5 were not migrated but it was possible to identify substitute Drupal 8 modules. That’s pretty good– 18 modules were available in Drupal 8, and in several cases one module could do the job that two or more had done in Drupal 7. Of the additional 11 modules that weren’t migrated and didn’t have an easy substitution, three of them are critical to maintaining our current site workflows. I’ll talk about those in more detail below.
d8upgrade.org found 21 modules in use, though I didn’t include all of them on my own spreadsheet if I didn’t intend to keep using them in the future. I’ve included a screenshot of the report, and there are a few things to note. This list does not have all the modules I had on my list, since some of those are used purely behind the scenes for administrative purposes and would have no indication of use without administrative access. The very last item on the list is Core, which of course isn’t going to be upgraded to Drupal 8–it is Drupal 8. I also found that it’s not completely up to date. For instance, my own analysis found a pre-release version of Workbench Moderation, but that information had not made it to this site yet. A quick email to them fixed it almost immediately, however, so this screenshot is out of date.
I decided that there were three dealbreaker modules for the upgrade, and I want to talk about why we rely on them, since I think my reasoning will be applicable to many libraries with limited web development time. I will also give honorable mention to a module that we are not currently using, but I know a lot of libraries rely on and that I would potentially like to use in the future.
Webform is a module that creates a very simple to use interface for creating webforms and doing all kinds of things with them beyond just simply sending emails. We have many, many custom PHP/MySQL forms throughout our website and intranet, but there are only two people on the staff who can edit those or download the submitted entries from them. They also occasionally have dreadful spam problems. We’ve been slowly working on migrating these custom forms to the Drupal Webform module, since that allows much more distribution of effort across the staff, and provides easier ways to stop spam using, for instance, the Honeypot module or Mollom. (We’ve found that the Honeypot module stopped nearly all our spam problems and didn’t need to move to Mollom, since we don’t have user comments to moderate). The thought of going back to coding all those webforms myself is not appealing, so for now I can’t move forward until I come up with a Drupal solution.
Redirect does a seemingly tiny job that’s extremely helpful. It allows you to create redirects for URLs on your site, which is incredibly helpful for all kinds of reasons. For instance, if you want to create a library site branded link that forwards somewhere else like a database vendor or another page on your university site, or if you want to change a page URL but ensure people with bookmarks to the old page will still find it. This is, of course, something that you can do on your web server, assuming you have access to it, but this module takes a lot of the administrative overhead away and helps keep things organized.
Backup and Migrate is my greatest helper in my goal to be someone who would like to at least be in the neighborhood of best practices for web development when web development is only half my job, or some weeks more like a quarter of my job. It makes a very quick process of keeping my development, staging, and production sites in sync, and since I created a workflow using this module I have been far more successful in keeping my development processes sane. It provides an interface for creating a backup of your site database, files directories, or your database and files that you can use in the Backup and Migrate module to completely restore a site. I use it at least every two weeks, or more often when working on a particular feature to move the database between servers (I don’t move the files with the module for this process, but that’s useful for backups that are for emergency restoration of the site). There are other ways to accomplish this work, but this particular workflow has been so helpful that I hate to dump a lot of time into redoing it just now.
One last honorable mention goes to Workbench, which we don’t use but I know a lot of libraries do use. This allows you to create a much more friendly interface for content editors so they don’t have to deal with the administrative backend of Drupal and allows them to just see their own content. We do use Workbench Moderation, which does have a Drupal 8 release, and allows a moderation queue for the six or so members of staff who can create or edit content but don’t have administrative rights to have their content checked by an administrator. None of them particularly like the standard Drupal content creation interface, and it’s not something that we would ever ask the rest of the staff to use. We know from the lack of use of our intranet, which also is on Drupal, that no one particularly cares for editing content there. So if we wanted to expand access to website editing, which we’ve talked about a lot, this would be a key module for us to use.
Given the current status of these modules with rewrites in progress, it seems likely that by the end of the year it may be possible to migrate to Drupal 8 with our current setup, or in playing around with Drupal 8 on a development site that we determine a different way to approach these needs. If you have the interest and time to do this, there are worse ways to pass the time. If you are creating a completely new Drupal site and don’t have a time crunch, starting in Drupal 8 now is probably the way to go, since by the time the site would be ready you may have additional modules available and get to take advantage of all the new features. If this is something you’re trying to roll out by the end of the semester, maybe wait on it.
Have you considered upgrading your library’s site to Drupal 8? Have you been successful? Let us know in the comments.
Store and display high resolution images with the International Image Interoperability Framework (IIIF)Posted: February 25, 2016 | Author: Lauren Magnuson | Filed under: digital libraries | Comments Off on Store and display high resolution images with the International Image Interoperability Framework (IIIF)
Recently a faculty member working in the Digital Humanities on my campus asked the library to explore International Image Interoperability Framework (IIIF) image servers, with the ultimate goal of determining whether it would be feasible for the library to support a IIIF server as a service for the campus. I typically am not very involved in supporting work in the Digital Humanities on my campus, despite my background in (and love for) the humanities (philosophy majors, unite!). Since I began investigating this technology, I seem to see references to IIIF-compliance popping up all over the place, mostly in discussions related to IIIF compatibility in Digital Asset Management System (DAMS) repositories like Hydra 1 and Rosetta 2, but also including ArtStor3 and the Internet Archive 4.
IIIF was created by a group of technologists from Stanford, the British Library, and Oxford to solve three problems: 1) slow loading of high resolution images in the browser, 2) high variation of user experience across image display platforms, requiring users to learn new controls and navigation for different image sites, and 3) the complexity of setting up high performance image servers.5 Image servers traditionally have also tended to silo content, coupling back-end storage with either customized or commercial systems that do not allow additional 3rd party applications to access the stored data.
By storing your images in a way that multiple applications can access them and render them, you enable users to discover your content through a variety of different portals. With IIIF, images can be stored in a way that facilitates API access to them. This enables a variety of applications to retrieve the data. For example, if you have images stored in a IIIF-compatible server, you could have multiple front-end discovery platforms access the images through API, either at your own institution or other institutions that would be interested in providing gateways to your content. You might have images that are relevant to multiple repositories or collections; for instance, you might want your images to be discoverable through your institutional repository, discovery system, and digital archives system.
IIIF systems are designed to work with two components: an image server (such as the Python-based Loris application)6 and a front-end viewer (such as Mirador 7 or OpenSeadragon8). There are other viewer options out there (IIIF Viewer 9, for example), and you could conceivably write your own viewer application, or write a IIIF display plugin that can retrieve images from IIIF servers. Your image server can serve up images via APIs (discussed below) to any IIIF-compatible front-end viewer, and any IIIF-compatible front-end viewer can be configured to access information served by any IIIF-compatible image server.
IIIF Image API and Presentation API
IIIF-compatible software enables retrieval of content from two APIs: the Image API and the Presentation API. As you might expect, the Image API is designed to enable the retrieval of actual images. Supported file types depends on the image server application being used, but API calls enable the retrieval of specific file type extensions including .jpg, .tif, .png, .gif, .jp2, .pdf, and .webp.10. A key feature of the API is the ability to request images to be returned by image region – meaning that if only a portion of the image is requested, the image server can return precisely the area of the image requested.11 This enables faster, more nimble rendering of detailed image regions in the viewer.
The basic structure of a request to a IIIF image server follows a standard scheme:
An example request to a IIIF image server might look like this:
The Presentation API returns contextual and descriptive information about images, such as how an image fits in with a collection or compound object, or annotations and properties to help the viewer understand the origin of the image. The Presentation API retrieves metadata stored as “manifests” that are often expressed as JSON for Linked Data, or JSON-LD.13 Image servers such as Loris may only provide the ability to work with the Image API; Presentation API data and metadata can be stored on any server and image viewers such as Mirador can be configured to retrieve presentation API data.14
Why would you need a IIIF Image Server or Viewer?
IIIF servers and their APIs are particularly suited for use by cultural heritage organizations. The ability to use APIs to render high resolution images in the browser efficiently is essential for collections like medieval manuscripts that have very fine details that lower-quality image rendering might obscure. Digital humanities, art, and history scholars who need access to high quality images for their research would be able to zoom, pan and analyze images very closely. This sort of an analysis can also facilitate collaborative editing of metadata – for example, a separate viewing client could be set up specifically to enable scholars to add metadata, annotations, or translations to documents without necessarily publishing the enhanced data to other repositories.
A nice example of the power of the IIIF Framework is with the Biblissima Mirador demo site. As the project website describes it,
In this demo, the user can consult a number of manuscripts, held by different institutions, in the same interface. In particular, there are several manuscripts from Stanford and Yale, as well as the first example from Gallica and served by Biblissima (BnF Français 1728)….
It is important to note that the images displayed in the viewer do not leave their original repositories; this is one of the fundamental principles of the IIIF initiative. All data (images and associated metadata) remain in their respective repositories and the institutions responsible for them maintain full control over what they choose to share. 15.
The approach described by Biblissima represents the increasing shift toward designing repositories to guide users toward linked or related information that may not be actually held by the repository. While I can certainly anticipate some problems with this approach for some archival collections – injecting objects from other collections might skew the authentic representation of some collections, even if the objects are directly related to each other – this approach might work well to help represent provenance for collections that have been broken up across multiple institutions. Without this kind of architecture, researchers would have to visit and keep track of multiple repositories that contain similar collections or associated objects. Manuscript collections are particularly suited to this kind approach, where a single manuscript may have been separated into individual leaves that can be found in multiple institutions worldwide – these manuscripts can be digitally re-assembled without requiring institutions to transfer copies of files to multiple repositories.
One challenge we are running into in exploring IIIF is how to incorporate this technology into existing legacy applications that host high resolution images (for example, ContentDM and DSpace). We wouldn’t necessarily want to build a separate IIIF image server – it would be ideal if we could continue storing our high res images on our existing repositories and pull them together with a IIIF viewer such as Loris). There is a Python-based translator to enable ContentDM to serve up images using the IIIF standard16, but I’ve found it difficult to find case studies or step-by-step implementation and troubleshooting information (if you have set up IIIF with ContentDM, I’d love to know about your experience!). To my knowledge, there is not an existing way to integrate IIIF with DSpace (but again, I would love to stand corrected if there is something out there). Because IIIF is such a new standard, and legacy applications were not necessarily built to enable this kind of content distribution, it may be some time before legacy digital asset management applications integrate IIIF easily and seamlessly. Apart from these applications serving up content for use with IIIF viewers, embedding IIIF viewer capabilities into existing applications would be another challenge.
Finally, another challenge is discovering IIIF repositories from which to pull images and content. Libraries looking to explore supporting IIIF viewers will certainly need to collaborate with content experts, such as archivists, historians, digital humanities and/or art scholars, who may be familiar with external repositories and sources of IIIF content that would be relevant to building coherent collections for IIIF viewers. Viewers are manually configured to pull in content from repositories, and so any library wanting to support a IIIF viewer will need to locate sources of content and configure the viewer to pull in that content.
Undertaking support for IIIF servers and viewers is fundamentally not a trivial project, but can be a way for libraries to potentially expand the visibility and findability of their own high-resolution digital collections (by exposing content through a IIIF-compatible server) or enable their users to find content related to their collections (by supporting a IIIF viewer). While my library hasn’t determined what exactly our role will be in supporting IIIF technology, we will definitely be taking information learned from this experiences to shape our exploration of emerging digital asset management systems, such as Hydra and Islandora.
- IIIF Website: http://search.iiif.io/
- IIIF Metadata Overview: https://lib.stanford.edu/home/iiif-metadata-overview
- IIIF Google Group: https://groups.google.com/forum/#!forum/iiif-discuss
- https://wiki.duraspace.org/display/hydra/Page+Turners+%3A+The+Landscape ↩
- Tools for Digital Humanities: Implementation of the Mirador high-resolution viewer on Rosetta – Roxanne Wyns, Business Consultant, KU Leuven/LIBIS – Stephan Pauls, Software architect. http://igelu.org/wp-content/uploads/2015/08/5.42-IGeLU2015_5.42_RoxanneWyns_StephanPauls_v1.pptx ↩
- D-Lib Magazine. 2015. “”Bottled or Tap?” A Map for Integrating International Image Interoperability Framework (IIIF) into Shared Shelf and Artstor”. D-Lib Magazine. 2015-08. http://www.dlib.org/dlib/july15/ying/07ying.html ↩
- https://blog.archive.org/2015/10/23/zoom-in-to-9-3-million-internet-archive-books-and-images-through-iiif/ ↩
- Snydman, Stuart, Robert Sanderson and Tom Cramer. 2015. The International Image Interoperability Framework (IIIF): A
community & technology approach for web-based images. Archiving Conference 1. 16-21(6). https://stacks.stanford.edu/file/druid:df650pk4327/2015ARCHIVING_IIIF.pdf. ↩
- https://github.com/pulibrary/loris ↩
- http://github.com/IIIF/mirador ↩
- http://openseadragon.github.io/ ↩
- http://klokantech.github.io/iiifviewer/ ↩
- http://iiif.io/api/image/2.0/#format ↩
- http://iiif.io/api/image/2.0/#region ↩
- Snydman, Sanderson, and Cramer, The International Image Interoperability Framework (IIIF), 2 ↩
- http://iiif.io/api/presentation/2.0/#primary-resource-types-1 ↩
- https://groups.google.com/d/msg/iiif-discuss/F2_-gA6EWjc/2E0B7sIs2hsJ ↩
- http://www.biblissima-condorcet.fr/en/news/interoperable-viewer-prototype-now-online-mirador ↩
- https://github.com/IIIF/image-api/tree/master/translators/ContentDM ↩