This year’s Open Access Week at my institution was a bit different than before. With our time constrained by conference travel and staff shortages leaving everyone over-scheduled, we decided to aim for a week of “virtual programming”, with a week of blog posts and an invitation to view our open access research guide. While this lacked the splashiness of programming in prior years, in another way it felt important to do this work in this way. Yes, it may well be that only people already well-connected to the library saw any of this material. But promotion of open access requires a great deal of self-education among librarians or other library insiders before we can promote it more broadly. For many libraries, it may be the case that there are only a few “open access” people, and Open Access Week ends up being the only time during the year the topic is addressed by the library as a whole.
All the Colors of Open Access: Black and Green and Gold
There were a few shakeups in scholarly communication and open access over the past few months that made some of these discussions more broadly interesting across the academic landscape. The on-going saga of the infamous Beall’s List has been a major 2017 story. An article in the Chronicle of Higher Education about Jeffrey Beall was emailed to me more than once, and captured the complexity of why such a list is both an appealing solution to a problem but also reliant on sometimes questionable personal judgements. Jeffrey Beall’s attitude towards other scholarly communications librarians can be simplistic and vindictive, as an interview with Times Higher Education in August made clear. June saw the announcement of Cabell’s Blacklist, which is based on Beall’s list, and uses a list of criteria to judge journal quality. At my own institution I know this prompted discussions of what the purpose of a blacklist is, versus using a vetted list of open access journals like the Directory of Open Access Journals. As a researcher in an article in Nature about this product states, it’s likely that a blacklist is more useful for promotion and tenure committees or hiring committees to judge applicants more than for potential authors to find good journals in which to publish.
This also completely leaves aside the green open access options, in which authors can negotiate with their publisher to make a version of their article openly available–often the final published version, but at least the text before layout. While publishing an article in an open access journal has many benefits, green open access can meet the open access goals of faculty without worrying about paying additional fees or worrying about journal quality. But we still need to educate people on green open access. I was chatting with a friend who is an economist recently, and he was wondering about how open access worked in other disciplines, since he was used to all papers being released as working papers before being published in traditional journals. I contrast this conversation with another where someone in a very different discipline who was concerned that putting even a summary of research could constitute prior publication. Given this wide disparity between disciplines, we will always struggle with widely casting a message about green open access. But I firmly believe that there are individuals within all disciplines who will be excited about open access, and that they will get at least some of their colleagues on board–or perhaps their graduate students. These people may be located in the interdisciplinary side, with one foot in a more preprint-friendly discipline. For instance, the bioethicists in the theology department, or the history of science people in the history department. And even the most well-meaning people forget to make their work open access, so making it as easy as possible while not making it so easy that people don’t know why they would do it–make sure there are still avenues for conversation.
Making things easy to do requires having a good platform, but that became more complicated in August when Elsevier acquired bepress, which prompted discussions among many librarians about their values around open access and whether relying on vendors for open access platforms was a foolish gamble (the Library Loon summarizes this discussion well). This is a complex question, as the kinds of services provided by bepress’s Digital Commons go well beyond a simple hosting platform, and goes along with the strategy I pointed out Elsevier was pursuing in my Open Access 2016 post. Convincing faculty to participate in open access requires a number of strategies, and things like faculty profiles, readership dashboards, and attractive interfaces go a long way. No surprise that after purchasing platforms that make this easy, Elsevier (along other publishers) would go after ResearchGate in October, which is even easier to use in some ways, and certainly appealing for researchers.
All the discussion of predatory journals and blacklists (not to mention SciHub being ordered blocked thanks to an ACS lawsuit) seems old to those of us who have been doing this work for years, but it is still a conversation we need to have. More importantly, focusing on the positive aspects of open access helps get at the reasons people to participate in open access and move the conversation forward. We can do work to educate our communities about finding good open access journals, and how to participate legally. I believe that publishers are providing more green access options because their authors are asking for them, and we are helping authors to know how to ask.
I hope we were not too despairing this Open Access Week. We are doing good work, even if there is still a lot of poisonous rhetoric floating around. In the years I’ve worked in scholarly communication I’ve helped make thousands of articles, book chapters, dissertations, and books open access. Those items have in turn gone on to be cited in new publications. The scholarly communication cycle still goes on.
UPDATE: Just after this post was published, the U.S. Copyright Office released the long-awaited Discussion Document that was referenced below in this post. In this document the Copyright Office affirms a commitment to retaining the Fair Use Savings clause.
Libraries rely on exceptions to copyright law and provisions for fair use to provide services. Any changes to those rules have big implications to services we provide. With potential changes coming in an uncertain political climate, I would like to take a look at what we know, what we don’t know, and how it’s all related. Each piece as it currently stands works in relation to the others, and a change to any one of them changes the overall situation for libraries. We need to understand how everything relates, so that when we approach lawmakers or create policies we think holistically.
The International Situation
A few months back at the ALA Annual Conference in Chicago, I attended a panel called “Another Report from the Swamp,” which was a copyright policy specific session put on by the ALA Office of Information Technology Policy (OITP) featuring copyright experts Carrie Russell (OITP), Stephen Wyber (IFLA), Krista Cox (the Association of Research Libraries [ARL]). This panel addressed international issues in copyright in addition to those in the United States, which was a helpful perspective. They covered a number of topics, but I will focus on the Marrakesh Treaty and potential changes to US Code Title 17, Section 108 (Reproduction by libraries and archives).
Stephen Wyber and Krista Cox covered the WIPO Marrakesh Treaty to Facilitate Access to Published Works for Persons Who Are Blind, Visually Impaired or Otherwise Print Disabled (aka the Marrakesh Treaty), which the US is a signatory to, but has not yet been ratified by the US Senate (see Barack Obama’s statement in February 2016). According to them, in much of the developing world only 1% of published work is available for those with print disabilities. This was first raised as issue in 1980, and 33 years later debates at WIPO began to address the situation. This treaty was ratified last year, and permits authorized parties (including libraries) to make accessible copies of any work in a medium that is appropriate for the disabled individual. In the United States, this is generally understood to be permitted by Title 17 sections Fair Use (Section 107) and Section 121 (aka the Chaffee amendment), though this is still legally murky 1. This is not the case internationally. Stephen Wyber pointed out that IFLA must engage at the European level with the European Commission for negotiations at WIPO, and there is no international or cross-border provision for libraries, archives, or museums.
According to Krista Cox, a reason for the delay in ratification was that the Senate Committee on Foreign Relations wouldn’t move it to ratification unless it was a non-controversial treaty with no changes required for US law (and it should not have required changes). The American Association of Publishers (AAP) wanted to include recordkeeping requirements, which disability and library advocates argued would be onerous. (A good summary of the issues is available from the ALA Committee on Legislation). During the session, a staff attorney from the AAP stood up and made the point that their position was that it would be helpful for libraries to track what accessible versions of material they had made. While not covered in the session explicitly, a problem with this approach is that it would create a list of potentially sensitive information about patron activities. Even if no names were attached, the relatively few people making use of the service would make it possible to identify individual users. In any event, the 114th Congress took no action, and it is unclear when this issue will be taken up again. For this reason, we have to continue to rely on existing provisions of the US Code.
Along those lines, the panel gave a short update on potential changes to Section 108 of the Copyright Act, which have been under discussion for many years. Last year, the Copyright Office invited stakeholders to set up meetings to discussion revisions. The library associations met with them last July, and generally while the beneficiaries of Section 108 find revisions controversial and oppose reform, the Copyright Office is continuing work on this. One fear with revisions is that the Fair Use exception clause (17 § 108 (F)(4)) would be removed. Krista Cox reported that at the Copyright Society of the USA meeting in early June 2017, the Copyright Office reported that they were working on a report with proposed legislation, but no one has seen this report [NOTE: the report is now available.].
Implications for Revisions to Title 17
Moving beyond the panel, let’s look at the larger implications for revisions to Title 17. There are some excellent reasons to revise Section 108 and others–just as the changes in 1976 reflected changes in photocopying technology 2, changes in digital technology and the services of libraries require additional help. In 2008, the Library of Congress Section 108 Study Group released a lengthy report with a set of recommendations for revisions, which can be boiled down into extending permissions for preservation activities (though that is a gross oversimplification). In 2015 Maria A. Pallante testified to the Committee on the Judiciary of the House of Representatives on a wide variety of changes to the Copyright Act (not just for libraries), which incorporated the themes from that 2008 report, in addition to other later discussions. Essentially, she says that changes in technology and culture in the past 20 years made much of the act unclear and required application of loopholes and workarounds that were legally unclear. For instance, libraries rely heavily on Section 107, which covers fair use, to perform their daily functions. This report points out that those activities should be explicitly permitted rather than relying on potentially ambiguous language in Section 107, since the ambiguity means some institutions are unwilling to perform activities that may be permitted due to fear. On the other hand, that ambiguous language opens up more possibilities that adventurous projects such as HathiTrust have used to push on boundaries and expand the nature of fair use and customary practice. The ARL has a Code of Best Practices in Fair Use that details what is currently considered customary practice. With revisions, there enters the possibility that what is allowed will be dictated by, for instance, the publishing lobby, and that what libraries can do will be overly circumscribed. Remember, too, that one reason for not ratifying the Marrakesh Treaty is that allowances for reproductions for the disabled are covered by Fair Use and the Chaffee amendment.
Orphan works are another issue. While the Pallante report suggests that it would be in everyone’s interest to have clear guidance on what a good faith effort to identify a copyright holder actually meant, in many ways we would rather have general practice mandate this. Speaking as someone who spends a good portion of my time clearing permissions for material and frequently running into unresponsive or unknown copyright holders, I feel more comfortable pushing the envelope if I have clearly documented and consistently followed procedures based on practices that I know other institutions follow as well (see the Statement of Best Practices). This way I have been able to make useful scholarship more widely available despite the legal gray area. But there is a calculated risk, and many institutions choose to never make such works available due to the legal uncertainty. Last year the Harvard Office of Scholarly Communication released a report on legal strategies for orphan work digitization to give some guidance in this area. To summarize over 100 pages, there are a variety of legal strategies libraries can take to either minimize the risk of a dispute or reduce negative consequences of a copyright dispute–which remains largely hypothetical when it comes to orphan works and libraries anyway.
There is one other important wrinkle in all this. The Copyright Office’s future is politically uncertain. It could be removed from the purview of the Library of Congress, and the Register of Copyrights be made a political appointment. This was passed by the House in April and introduced in the Senate in May, and was seen as a rebuke to Carla Haydenn. Karyn Temple Claggett is the acting Registrar, replacing Maria Pallante who resigned last year after Carla Hayden became the new Librarian of Congress and appointed (some say demoted) her to the post of Senior Advisor for Digital Strategy. Maria Pallante is now CEO of–you guessed it–the American Association of Publishers. The story is full of intrigue and clashing opinions–one only has to see the “possibly not neutral” banner on Pallante’s Wikipedia page to see that no one will agree on the reasons for Pallante’s move from Register of Copyrights (it may have been related to wasteful spending), but libraries do not see the removal of copyright from the Library of Congress as a good thing. More on this is available at the recent ALA report “Lessons From History: The Copyright Office Belongs in the Library of Congress.”
Given that we do not know what will happen to the Copyright Office, nor exactly what their report will recommend, it is critical that we pay attention to what is happening with copyright. While more explicit provisions to allow more would be excellent news, as the panel at ALA pointed out, lawmakers are more exposed to Hollywood and the content creator organizations such as AAP, RIAA and MPAA and so may be more likely to see arguments from their point of view. We should continue to take advantage of provisions we currently have for fair use and providing access to orphan works, since exercising this right is one way we keep it.
- “Briefing: Accessibility, the Chafee Amendment, and Fair Use. ” (2012). Association of Research Libraries. http://www.arl.org/focus-areas/copyright-ip/fair-use/code-of-best-practices/2445-briefing-accessibility-the-chafee-amendment-and-fair-use#.WbadWMiGNPY ↩
- Federal Register Vol. 81, No. 109 https://www.federalregister.gov/d/2016-13426/p-8 ↩
Is the future of research voice controlled? It might be, because when I originally had the idea for this post my first instinct was to grab my phone and dictate my half-formed ideas into a note, rather than typing it out. Writing things down often makes them seem wrong and not at all what we are trying to say in our heads. (Maybe it’s not so new, since as you may remember Socrates had a similar instinct.) The idea came out of a few different talks at the national Code4Lib conference held in Los Angeles in March of 2017 and a talk given by Chris Bourg. Among these presentations the themes of machine learning, artificial intelligence, natural language processing, voice search, and virtual assistants intersect to give us a vision for what is coming. The future might look like a system that can parse imprecise human language and turn it into an appropriately structured search query in a database or variety of databases, bearing in mind other variables, and return the correct results. Pieces of this exist already, of course, but I suspect over the next few years we will be building or adapting tools to perform these functions. As we do this, we should think about how we can incorporate our values and skills as librarians into these tools along the way.
Natural Language Processing
I will not attempt to summarize natural language processing (NLP) here, except to say that speaking to a computer requires that the computer be able to understand what we are saying. Human—or natural—language is messy, full of nuance and context that requires years for people to master, and even then often leads to misunderstandings that can range from funny to deadly. Using a machine to understand and parse natural language requires complex techniques, but luckily there are a lot of tools that can make the job easier. For more details, you should review the NLP talks by Corey Harper and Nathan Lomeli at Code4Lib. Both these talks showed that there is a great deal of complexity involved in NLP, and that its usefulness is still relatively confined. Nathan Lomeli puts it like this. NLP can “cut strings, count beans, classify things, and correlate everything”. 1 Given a corpus, you can use NLP tools to figure out what certain words might be, how many of those words there are, and how they might connect to each other.
Processing language to understand a textual corpus has a long history but is now relatively easy for anyone to do with the tools out there. The easiest is Voyant Tools, which is a project by Sinclair, Stéfan Sinclair and Geoffrey Rockwell. It is a portal to a variety of tools for NLP. You can feed it a corpus and get back all kind of counts and correlations. For example, Franny Gaede and I used VoyantTools to analyze social justice research websites to develop a social justice term corpus for a research project. While a certain level of human review is required for any such project, it’s possible to see that this technology can replace a lot of human-created language. This is already happening, in fact. A tool called Wordsmith can create convincing articles about finance, sports, and technology, or really any field with a standard set of inputs and outputs in writing. If computers are writing stories, they can also find stories.
Talking to the Voice in the Machine
Finding those stories, and in turn, finding the data with which to tell more stories, is where machine learning and artificial intelligence enter. In libraries we have a lot of words, and while we have various projects that are parsing those words and doing things with them, we have only begun to see where this can go. There are two sides to this. Chris Bourg’s talk at Harvard Library Leadership in a Digital Age, asks the question “What happens to libraries and librarians when machines can read all the books?” One suggestion she makes is that:
we would be wise to start thinking now about machines and algorithms as a new kind of patron — a patron that doesn’t replace human patrons, but has some different needs and might require a different set of skills and a different way of thinking about how our resources could be used. 2
One way in which we can start to address the needs of machines as patrons is by creating searches that work with them, which is for now ultimately to serve the needs of humans, but in the future could be for their own artificial intelligence purposes. Most people are familiar with virtual assistants that have popped up on all platforms over the past few years. As an iOS and a Windows user, I am now constantly invited to speak to Siri or Cortana to search for answers to my questions or fix something in my schedule. While I’m perfectly happy to ask Siri to remind me to bring my laptop to work at 7:45 AM or to wake me up in 20 minutes, I find mixed results when I try to ask a more complex question. 3 Sometimes when I ask the temperature on the surface of Jupiter I get the answer, other times I get today’s weather in a town called Jupiter. This is not too surprising, as asking “What is the temperature of Jupiter?” could mean a number of things. It’s on the human to specify to the computer to which domain of knowledge you are referring, which requires knowing exactly how to ask the question. Computers cannot yet do a reference interview, since they cannot pick up on the subtle hidden meanings or helping with the struggle for the right words that librarians do so well. But they can help with certain types of research tasks quite well, if you know how to ask the question. Eric Frierson (PPT) gave a demonstration of his project working on voice powered search in EBSCO using Alexa. In the presentation he demonstrates the Alexa “skills” he set up for people to ask Alexa for help. They are “do you have”, “the book”, “information about”, “an overview of”, “what I should read after”, or “books like”. There is a demonstration of what this looks like on YouTube. The results are useful when you say the correct thing in the correct order, and for an active user it would be fairly quick to learn what to say, just as we learn how best to type in a search query in various services.
Why ask a question of a computer rather than type in a question to a computer? For the reason I started this piece with, certainly–voice is there, and it’s often easier to say what you mean than write it. This can be taken pragmatically as well. If you find typing difficult, being able to speak makes life easier. When I was home with a newborn baby I really appreciated being able to dictate and ask Siri about the weather forecast and what time the doctor’s appointment was. Herein lies one of the many potential pitfalls of voice: who is listening to what you are saying? One recent news story puts this in perspective, as Amazon agreed to turn over data from Alexa to police in a murder investigation after the suspect gave the ok. They refused to do at first, but it is an open question as to the legal nature of the conversation with a virtual assistant. Nor is it entirely clear when you speak to a device where the data is being processed. So before we all rush out and write voice search tools for all our systems, it is useful to think about where that data lives what the purpose of it is.
If we would protect a user’s search query by ensuring that our catalogs are encrypted (and let’s be honest, we aren’t there yet), how do we do the same for virtual search assistants in the library catalog? For Alexa, that’s built into creating an Alexa skill, since a basic requirement for the web service used is that it meet Amazon’s security requirements. But if this data is subject to subpoena, we would have to think about it in the same way we would any other search data on a third party system. And we also have to recognize that these tools are created by these companies for commercial purposes, and part of that is to gather data about people and sell things to them based on that data. Machine learning could eventually build on that to learn a lot more about people than they think, which the Amazon Echo Look recently brought up as a subject of debate. There are likely to be other services popping up in addition to those offered by Amazon, Google, Apple, and Microsoft. Before long, we might expect our vendors to be offering voice search in their interfaces, and we need to be aware of the transmission of that data and where it is being processed. A recent alliance formed called The Voice Privacy Alliance, which is developing some standards for this.
The invisibility of the result processing has another dark side. The biases inherent in the algorithms become even more hidden, as the first result becomes the “right” one. If Siri tells me the weather in Jupiter, that’s a minor inconvenience, but if Siri tells me that “Black girls” are something hypersexualized, as Safiya Noble has found that Google does, do I (or let’s say, a kid) necessarily know something has gone wrong? 4 Without human intervention and understanding, machines can perpetuate the worst side of humanity.
This comes back to Chris Bourg’s question. What happens to librarians when machines can read all the books, and have a conversation with patrons about those books? Luckily for us, it is unlikely that artificial intelligence will ever be truly self-aware with desires, metacognition, love, and need for growth and adventure. Those qualities will continue to make librarians useful to creating vibrant and unique collections and communities. But we will need to fit that in a world where we are having conversations with our computers about those collections and communities.
- Lomeli, Nathan. “Natural Language Processing: Parsing Through The Hype”. Code4Lib. Los Angeles, CA. March 7, 2017. ↩
- “What Happens to Libraries and Librarians When Machines Can Read All the Books?” Feral Librarian, March 17, 2017. https://chrisbourg.wordpress.com/2017/03/16/what-happens-to-libraries-and-librarians-when-machines-can-read-all-the-books/. ↩
- As a side issue, I don’t have a private office and I feel weird speaking to my computer when there are people around. ↩
- Noble, Safiya Umoja. “Google Search: Hyper-Visibility as a Means of Rendering Black Women and Girls Invisible – InVisible Culture.” InVisible Culture: An Electronic Journal for Visual Culture, no. 19 (2013). http://ivc.lib.rochester.edu/google-search-hyper-visibility-as-a-means-of-rendering-black-women-and-girls-invisible/. ↩
Libraries who use a flexible content management system such as Drupal or WordPress for their library website and/or resource discovery have a challenge in ensuring that their data is accessible to the rest of the library world. Whether making metadata useable by other libraries or portals such as DPLA, or harvesting content in a discovery layer, there are some additional steps libraries need to take to make this happen. While there are a number of ways to accomplish this, the most straightforward is to create an OAI-PMH feed. OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting, and is a well-supported and understood protocol in many metadata management systems. There’s a tutorial available to understand the details you might want to know, and the Open Archives Initiative has detailed documentation.
Content management tools designed specifically for library and archives usage, such as LibGuides and Omeka, have a built in OAI-PMH feed, and generally all you need to do is find the base URL and plug it in. (For instance, here is what a LibGuides OAI feed looks like). In this post I’ll look at what options are available for Drupal and WordPress to create the feed and become a data provider.
This is short, since there aren’t that many options. If you use WordPress for your library website you will have to experiment, as there is nothing well-supported. Lincoln University in New Zealand has created a script that converts a WordPress RSS feed to a minimal OAI feed. This requires editing a PHP file to include your RSS feed URL, and uploading to a server. I admit that I have been unsuccessful at testing this, but Lincoln University has a working example, and uses this to harvest their WordPress library website into Primo.
If you use Drupal, you will need to first install a module called Views OAI-PMH. What this does is create a Drupal view formatted as an OAI-PMH data provider feed. Those familiar with Drupal know that you can use the Views module to present content in a variety of ways. For instance, you can include certain fields from certain content types in a list or chart that allows you to reuse content rather than recreating it. This is no different, only the formatting is an OAI-PMH compliant XML structure. Rather than placing the view in a Drupal page or block, you create a separate page. This page becomes your base URL to provide to others or reuse in whatever way you need.
The Views OAI-PMH module isn’t the most obvious module to set up, so here are the basic steps you need to follow. First, enable and set permissions as usual. You will also want to refresh your caches (I had trouble until I did this). You’ll discover that unlike other modules the documentation and configuration is not in the interface, but in the README file, so you will need to open that out of the module directory to get the configuration instructions.
To create your OAI-PMH view you have two choices. You can add it to a view that is already created, or create a new one. The module will create an example view called Biblio OAI-PMH (based on an earlier Biblio module used for creating bibliographic metadata). You can just edit this to create your OAI feed. Alternatively, if you have a view that already exists with all the data you want to include, you can add an OAI-PMH display as an additional display. You’ll have to create a path for your view that will make it accessible via a URL.
The Views OAI-PMH module only supports Dublin Core at this time. If you are using Drupal for bibliographic metadata of some kind, mapping the fields is a fairly straightforward process. However, choosing the Dublin Core mappings for data that is not bibliographic by nature requires some creativity and thought about where the data will end up. When I was setting this up I was trying to harvest most of the library website into our discovery layer, so I knew how the discovery layer parsed OAI DC and could choose fields accordingly.
After adding fields to the view (just as you normally would in creating a view), you will need to select settings for the OAI view to select the Dublin Core element name for each content field.
You can then map each element to the appropriate Dublin Core field. The example from my site includes some general metadata that appears on all content (such as Title), and some that only appears in specific content types. For instance, Collection Description only appears on digital collection content types. I did not choose to include the body content for any page on the site, since most of those pages contain a lot of scripts or other code that wasn’t useful to harvest into the discovery layer. Explanatory content such as the description of a digital collection or a database was more useful to display in the discovery layer, and exists only in special fields for those content types on my Drupal site, so we could pull those out and display those.
In the end, I have a feed that looks like this. Regular pages end up with very basic metadata in the feed:
<metadata> <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"> <dc:title>Hours</dc:title> <dc:identifier>http://libraries.luc.edu/hours</dc:identifier><dc:creator>Loyola University Libraries</dc:creator></oai_dc:dc> </metadata>
Whereas databases get more information pulled in. Note that there are two identifiers, one for the database URL, and one for the database description link. We will make these both available, but may choose one to use only one in the discovery layer and hide the other one.
<metadata> <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"> <dc:title>Annual Bibliography of English Language and Literature</dc:title> <dc:identifier>http://flagship.luc.edu/login?url=http://collections.chadwyck.com/home/home_abell.jsp</dc:identifier> <dc:subject>Modern Languages</dc:subject> <dc:type>Index/Database</dc:type> <dc:identifier>http://libraries.luc.edu/annual-bibliography-english-language-and-literature</dc:identifier> <dc:creator>Loyola University Libraries</dc:creator> </oai_dc:dc> </metadata>
When someone does a search in the discovery layer for something on the library website, the result shows the page right in the interface. We are still doing usability tests on this right now, but expect to move it into production soon.
I’ve just touched on two content management systems, but there are many more out there. Do you create OAI-PMH feeds of your data? What do you do with them? Share your examples in the comments.
It’s Open Access Week, which for scholarly communications librarians and institutional repository managers is one of the big events of the year to reflect on our work and educate others. Over the years, it has become less necessary to explain what open access is. Rather, everyone seems to have a perception of open access and an opinion about it. But those perceptions and opinions may not be based on the original tenets of the open access movement. The commercialization of open access means that it may now seem too expensive to pursue for individuals to publish open access, and too complicated for institutions to attempt without buying a product.
In some ways, the open access movement is analogous to punk music–a movement that grew out of protest and DIY sensibilities, but was quickly coopted as soon as it became financially successful. While it was never free or easy to make work open access, changes in the market recently might make it feel even more expensive and complicated. Those who want to continue to build open access repositories and promote open access need to understand where their institution fits in the larger picture, the motivations of researchers and administration, and be able to put the right solutions together to avoid serious missteps that will set back the open access movement.
Like many exciting new ideas, open access is partially a victim of its own success. Heather Morrison has kept for the past ten years a tally of the dramatic growth of open access in an on-going series. Her post for this year’s Open Access Week is the source for the statistics in this paragraph. Open access content makes up a sizeable portion of online content, and therefore is more of a market force. BASE now includes 100 million articles. Directory of Open Access Journals, even after the stricter inclusion process, has an 11% growth in article level searching with around 500,000 items. There are well over a billion items with a Creative Commons license. These numbers are staggering, and give a picture of how overwhelming the amount of content available all told is, much less open access. But it also means that almost everyone doing academic research will have benefited from open access content. Not everyone who has used open access (or Creative Commons licensed) content will know what it is, but as it permeates more of the web it becomes more and more relevant. It also becomes much harder to manage, and dealing with that complexity requires new solutions–which may bring new areas of profit.
An example of this new type of service is 1Science, which launched a year ago. This is a service that helps libraries manage their open access collections, both in terms of understanding what is available in their subscribed collections as well as their faculty output. 1Science grew out of longer term research projects around emerging bibliometrics, started by Eric Archambault, and according to their About Us page started as a way to improve findability of open access content, and grew into a suite of tools that analyzes collections for open access content availability. The market is now there for this to be a service that libraries are interested in purchasing. Similar moves happened with alternative metrics in the last few years as well (for instance, Plum Analytics).
But the big story for commercial open access in 2016 was Elsevier. Elsevier already had a large stable of open access author-pays journals, with fees of up to $5000. That is the traditional way that large commercial publishers have participated in open access. But Elsevier has made several moves in 2016 that are changing the face of open access. They acquired SSRN in May, which built on their acquisition of Mendeley in 2013, and hints at a longer term strategy for combining a content platform and social citation network that potentially could form a new type of open access product that could be marketed to libraries. Their move into different business models for open access is also illustrated in their controversial partnership with the University of Florida. This uses an API to harvest content from ScienceDirect published by UF researchers, but will not provide access to those without subscriptions except to certain accepted manuscripts, and grew out of a recognition that UF researchers published heavily in Elsevier journals and that working directly with Elsevier would allow them to get a large dataset of their researchers’ content and funder compliance status more easily. 1 There is a lot to unpack in this partnership, but the fact that it can even take place shows that open access–particularly funder compliance for open access versions–is something about which university administration outside the library in the Office of Research Services is taking note. Such a partnership serves certain institutional needs, but it does not create an open access repository, and in most ways serves the needs of the publisher in driving content to their platform (though UF did get a mention of interlibrary loan into the process rather than just a paywall). It also removes incentives for UF faculty to publish in non-Elsevier journals, since their content in those journals will become even easier to find, and there will be no need to look elsewhere for open access grant compliance. Either way, this type of move takes the control of open access out of the hands of libraries, just as so many previous deals with commercial enterprises have done.
As I said in the beginning of this piece, more and more people already know about and benefit from open access, but all those people have different motivations. I break those into three categories, and which administrative unit I think is most likely to care about that aspect of open access:
- Open access is about the justice of wider access to academic content or getting back at the big publishers for exploitative practices. These people aren’t going to be that interested in a commercial open access solution, except inasmuch as it allows more access for a lower cost–for instance, a hosted institutional repository that doesn’t require institutional investment in developers. This group may include librarians and individual researchers.
- Open access is about following the rules for a grant-funded project since so many of those require open access versions of articles. Such requirements lead to an increase in author-pays open access, since publishers can command a higher fee that can be part of the grant award or subsidized by an institution. Repositories to serve these requirements and to address these needs are in progress but still murky to many. This group may include the Office of Research Services or Office of Institutional Research.
- “Open access” is synonymous with putting articles or article citations online to create a portfolio for reputation-building purposes. This last group is going to find something like that UF/Elsevier partnership to be a great situation, since they may not be aware of how many people cannot actually read the published articles. This last group may include administrators concerned with building the institution’s reputation.
For librarians who fall into the first category but are sensitive to the needs of everyone in each category, it’s important to select the right balance of solutions to meet everyone’s needs but still maintain the integrity of the open access repository. That said, this is not easy. Meeting these variety of needs is exactly why some of these new products are entering the market, and it may seem easier to go with one of them even if it’s not exactly the right long-term solution. I see this as an important continuing challenge facing librarians who believe in open access, and have to underpin future repository and outreach strategies.
- Russell, Judith C.; Wise, Alicia; Dinsmore, Chelsea S.; Spears, Laura I.; Phillips, Robert V.; and Taylor, Laurie (2016) “Academic Library and Publisher Collaboration: Utilizing an Institutional Repository to Maximize the Visibility and Impact of Articles by University Authors,” Collaborative Librarianship: Vol. 8: Iss. 2, Article 4.
Carousels are a popular website feature because they allow one to fit extra information within the same footprint and provide visual interest on a page. But as you most likely know, there is wide disagreement about whether they should ever be used. Reasons include: they can be annoying, no one spends long enough on a page to ever see beyond the first item, people rarely click on them (even if they read the information) and they add bloat to pages (Michael Schofield has a very compelling set of slides on this topic). But by far the most compelling argument against them is that they are difficult if not impossible to make accessible, and accessibility issues exist for all types of users.
In reality, however, it’s not always possible to avoid carousels or other features that may be less than ideal. We all work within frameworks, both technical and political, and we need to figure out how to create the best case scenario within those frameworks. If you work in a university or college library, you may be constrained by a particular CMS you need to use, a particular set of brand requirements, and historical design choices that may be slower to go away in academia than elsewhere. This post is a description of how I made some small improvements to my library website’s carousel to increase accessibility, but I hope it can serve as a larger discussion of how we can always make small improvements within whatever frameworks we work.
What Makes an Accessible Carousel?
We’ve covered accessibility extensively on ACRL TechConnect before. Cynthia Ng wrote a three part series in 2013 on making your website accessible, and Lauren Magnuson wrote about accessibility testing LibGuides in 2015. I am not an expect by any means on web accessibility, and I encourage you to do additional research about the basics of accessibility. For this specific project I needed to understand what it is specifically about carousels that makes them particular inaccessible, and how to ameliorate that. When I was researching this specific project, I found the following resources the most helpful.
The basic issues with carousels are that they move at their own pace but in a way that may be difficult to predict, and are an inherently visual medium. For people with visual impairments the slideshow images are irrelevant unless they provide useful information, and their presence on the page causes difficulty for screen reading software. For people with motor or cognitive impairments (which covers nearly everyone at some point in their lives) a constantly shifting image may be distracting and even if the content is interesting it may not be possible to click on the image at the rate it is set to move.
You can increase accessibility of carousels by making it obvious and easy for users to stop the slideshow and view images at their own pace, make the role of the slideshow and the controls on the page obvious to screen reading software, to make it possible to control the slideshow without a mouse, and to make it still work without stylesheets. Alternative methods of accessing the content have to be available and useful.
I chose to work on the slideshow as part of a retheming of the library website to bring it up to current university branding standards and to make it responsive. The current slideshow lacked obvious controls or any instructions for screen readers, and was not possible to control without a mouse. My general plan in approaching this was to ensure that there were obvious controls to control the slideshow (and that it would pause quickly without a lot of work), have ARIA roles for screen readers, and be keyboard controllable. I had to work with the additional constraints of making this something that would work in Drupal, be responsive, and that would allow the marketing committee to post their own images without my intervention but would still require alt tags and other crucial items for accessibility.
Because the library’s website uses Drupal, it made sense to look for a solution that was designed to work with Drupal. Many options exist, and everyone has a favorite or a more appropriate choice for a particular situation, so if you are looking for a good Drupal solution you’ll want to do your own research. I ended up choosing a Drupal module called Views Slideshow after looking at several options. It seemed to be customizable enough that I was pretty sure I could make it accessible even though it lacked some of the features out of the box. The important thing to me is that it would make it possible to give the keys to the slideshow operation to someone else. The way our slideshow traditionally worked required writing HTML into the middle of a hardcoded homepage and uploading the image to the server in a separate process. This meant that my department was a roadblock to updating the images, and required careful coordination before vacations or times away to ensure we could get the images changed. We all agreed that if the slideshow was going to stay, this process had to improve.
Why not just remove the slideshow entirely? That’s one option we definitely considered, but one important caveat I set early in the redesign process was to leave the site content and features alone and just update the look and feel of the site. Thus I wanted to leave every current piece of information that was an important part of the homepage as is, though slightly reorganized. I also didn’t want to change the size of the homepage slideshow images, since the PR committee already had a large stock of images they were using and I didn’t want them to have to redesign everything. In general, we are moving to a much more flexible and iterative process for changing website features and content, so nothing is ruled out for the future.
I won’t go into a lot of detail about the technical fixes I made, since this won’t be widely applicable. Views Slideshow uses a very standard Drupal module called Views to create a list of content. While it is a very popular module, I found it challenging to install correctly without a lot of help (I mainly used this site), since the settings are hard to figure out. In setting up the module, you are able to control things like whether alt text is required, the most basic type of accessibility feature, which allows users who cannot see images to understand their content through screen readers or other assistive technologies. Beyond that, you can set some things up in the templates for the modules. First I created a Drupal content type is called Featured Slideshow. It includes fields for title of the slide, image, and the link it should go to. The image has an alt and title field, which can be set automatically using tokens (text templates), or manually by the person entering data. The module uses jQuery Cycle to control which image is available. I then customized the templates (several PHP files) to include ARIA roles and to edit the controls to make them plain English rather than icons (I can think of downsides to this approach for sure, but at least it makes the point of them clear for many people).
ARIA role. This is frequently updated but non-essential page content. Its default ARIA live state is “off”, meaning unless the user is focused on it changes in state won’t be announced. You can change this to “polite” as well, which means a change in state will be announced at the next convenient opportunity. You would never want to use “assertive”, since that would interrupt the user for no reason.
Features I’m still working on are detailed in The Unbearable Inaccessibility of Slideshows, specifically keyboard focus order and improved performance with stylesheets unavailable. However with a few small changes I’ve improved accessibility of a feature on the site–and this technique can be applied to any feature on any site.
Making Small Improvements to Improve Accessibility.
While librarians who get the privilege of working on their own library’s website have the possibilities to guide the design choices, we are not always able to create exactly the ideal situation. Whether you are dealing with a carousel or any other feature that requires some work to improve accessibility, I would suggest the following strategy:
- Review what the basic requirements are for making the feature work with your platform and situation. This means both technically and politically.
- Research the approaches others have taken. You probably won’t be able to use someone else’s technique unless they are in a very similar situation, but you can at least use lessons learned.
- Create a step by step plan to ensure you’re not missing anything, as well as a list of questions to answer as you are working through the development process.
- Test the feature. You can use achecker or WAVE, which has a browser plugin to help you test sites in a local development environment.
- Review errors and fix these. If you can’t fix everything, list the problems and plan for future development, or see if you can pick a new solution.
- Test with actual users.
This may seem overwhelming, but taking it slow and only working on one feature at a time can be a good way to manage the process. And even better, you’ll improve your practices so that the next time you start a project you can do it correctly from the beginning.
When it comes to digital preservation, everyone agrees that a little bit is better than nothing. Look no further than these two excellent presentations from Code4Lib 2016, “Can’t Wait for Perfect: Implementing “Good Enough” Digital Preservation” by Shira Peltzman and Alice Sara Prael, and “Digital Preservation 101, or, How to Keep Bits for Centuries” by Julie Swierczek. I highly suggest you go check those out before reading more of this post if you are new to digital preservation, since they get into some technical details that I won’t.
The takeaway from these for me was twofold. First, digital preservation doesn’t have to be hard, but it does have to be intentional, and secondly, it does require institutional commitment. If you’re new to the world of digital preservation, understanding all the basic issues and what your options are can be daunting. I’ve been fortunate enough to lead a group at my institution that has spent the last few years working through some of these issues, and so in this post I want to give a brief overview of the work we’ve done, as well as the current landscape for digital preservation systems. This won’t be an in-depth exploration, more like a key to the map. Note that ACRL TechConnect has covered a variety of digital preservation issues before, including data management and preservation in “The Library as Research Partner” and using bash scripts to automate digital preservation workflow tasks in “Bash Scripting: automating repetitive command line tasks”.
The committee I chair started examining born digital materials, but expanded focus to all digital materials, since our digitized materials were an easier test case for a lot of our ideas. The committee spent a long time understanding the basic tenets of digital preservation–and in truth, we’re still working on this. For this process, we found working through the NDSA Levels of Digital Preservation an extremely helpful exercise–you can find a helpfully annotated version with tools by Shira Peltzman and Alice Sara Prael, as well as an additional explanation by Shira Peltman. We also relied on the Library of Congress Signal blog and the work of Brad Houston, among other resources. A few of the tasks we accomplished were to create a rough inventory of digital materials, a workflow manual, and to acquire many terabytes (currently around 8) of secure networked storage space for files to replace all removable hard drives being used for backups. While backups aren’t exactly digital preservation, we wanted to at the very least secure the backups we did have. An inventory and workflow manual may sound impressive, but I want to emphasize that these are living and somewhat messy documents. The major advantage of having these is not so much for what we do have, but for identifying gaps in our processes. Through this process, we were able to develop a lengthy (but prioritized) list of tasks that need to be completed before we’ll be satisfied with our processes. An example of this is that one of the major workflow gaps we discovered is that we have many items on obsolete digital media formats, such as floppy disks, that needs to be imaged before it can even be inventoried. We identified the tool we wanted to use for that, but time and staffing pressures have left the completion of this project in limbo. We’re now working on hiring a graduate student who can help work on this and similar projects.
The other piece of our work has been trying to understand what systems are available for digital preservation. I’ll summarize my understanding of this below, with several major caveats. This is a world that is currently undergoing a huge amount of change as many companies and people work on developing new systems or improving existing systems, so there is a lot missing from what I will say. Second, none of these solutions are necessarily mutually exclusive. Some by design require various pieces to be used together, some may not require it, but your circumstances may dictate a different solution. For instance, you may not like the access layer built into one system, and so will choose something else. The dream that you can just throw money at the problem and it will go away is, at present, still just a dream–as are so many library technology problems.
The closest to such a dream is the end-to-end system. This is something where at one end you load in a file or set of files you want to preserve (for example, a large set of donated digital photographs in TIFF format), and at the other end have a processed archival package (which might include the TIFF files, some metadata about the processing, and a way to check for bit rot in your files), as well as an access copy (for example, a smaller sized JPG appropriate for display to the public) if you so desire–not all digital files should be available to the public, but still need to be preserved.
Examples of such systems include Preservica, ArchivesDirect, and Rosetta. All of these are hosted vended products, but ArchivesDirect is based on open source Archivematica so it is possible to get some idea of the experience of using it if you are able to install the tools on which it based. The issues with end-t0-end systems are similar to any other choice you make in library systems. First, they come at a high price–Preservica and ArchivesDirect are open about their pricing, and for a plan that will meet the needs of medium-sized libraries you will be looking at $10,000-$14,000 annual cost. You are pretty much stuck with the options offered in the product, though you still have many decisions to make within that framework. Migrating from one system to another if you change your mind may involve some very difficult processes, and so inertia dictates that you will be using that system for the long haul, which a short trial period or demos may not be enough to really tell you that it’s a good idea. But you do have the potential for more simplicity and therefore a stronger likelihood that you will actually use them, as well as being much more manageable for smaller staffs that lack dedicated positions for digital preservation work–or even room in the current positions for digital preservation work. A hosted product is ideal if you don’t have the staff or servers to install anything yourself, and helps you get your long-term archival files onto Amazon Glacier. Amazon Glacier is, by the way, where pretty much all the services we’re discussing store everything you are submitting for long-term storage. It’s dirt cheap to store on Amazon Glacier and if you can restore slowly, not too expensive to restore–only expensive if you need to restore a lot quickly. But using it is somewhat technically challenging since you only interact with it through APIs–there’s no way to log in and upload files or download files as with a cloud storage service like Dropbox. For that reason, when you’re paying a service hundreds of dollars a terabyte that ultimately stores all your material on Amazon Glacier which costs pennies per gigabye, you’re paying for the technical infrastructure to get your stuff on and off of there as much as anything else. In another way you’re paying an insurance policy for accessing materials in a catastrophic situation where you do need to recover all your files–theoretically, you don’t have to pay extra for such a situation.
A related option to an end-to-end system that has some attractive features is to join a preservation network. Examples of these include Digital Preservation Network (DPN) or APTrust. In this model, you pay an annual membership fee (right now $20,000 annually, though this could change soon) to join the consortium. This gives you access to a network of preservation nodes (either Amazon Glacier or nodes at other institutions), access to tools, and a right (and requirement) to participate in the governance of the network. Another larger preservation goal of such networks is to ensure long-term access to material even if the owning institution disappears. Of course, $20,000 plus travel to meetings and work time to participate in governance may be out of reach of many, but it appears that both DPN and APTrust are investigating new pricing models that may meet the needs of smaller institutions who would like to participate but can’t contribute as much in money or time. This a world that I would recommend watching closely.
Up until recently, the way that many institutions were achieving digital preservation was through some kind of repository that they created themselves, either with open source repository software such as Fedora Repository or DSpace or some other type of DIY system. With open source Archivematica, and a few other tools, you can build your own end-to-end system that will allow you to process files, store the files and preservation metadata, and provide access as is appropriate for the collection. This is theoretically a great plan. You can make all the choices yourself about your workflows, storage, and access layer. You can do as much or as little as you need to do. But in practice for most of us, this just isn’t going to happen without a strong institutional commitment of staff and servers to maintain this long term, at possibly a higher cost than any of the other solutions. That realization is one of the driving forces behind Hydra-in-a-Box, which is an exciting initiative that is currently in development. The idea is to make it possible for many different sizes of institutions to take advantage of the robust feature sets for preservation in Fedora and workflow management/access in Hydra, but without the overhead of installing and maintaining them. You can follow the project on Twitter and by joining the mailing list.
After going through all this, I am reminded of one of my favorite slides from Julie Swierczek’s Code4Lib presentation. She works through the Open Archival Initiative System model graph to explain it in depth, and comes to a point in the workflow that calls for “Sustainable Financing”, and then zooms in on this. For many, this is the crux of the digital preservation problem. It’s possible to do a sort of ok job with digital preservation for nothing or very cheap, but to ensure long term preservation requires institutional commitment for the long haul, just as any library collection requires. Given how much attention digital preservation is starting to receive, we can hope that more libraries will see this as a priority and start to participate. This may lead to even more options, tools, and knowledge, but it will still require making it a priority and putting in the work.
After much hard work over years by the Drupal community, Drupal users rejoiced when Drupal 8 came out late last year. The system has been completely rewritten and does a lot of great stuff–but can it do what we need Drupal websites to do for libraries? The quick answer seems to be that it’s not quite ready, but depending on your needs it might be worth a look.
For those who aren’t familiar with Drupal, it’s a content management system designed to manage complex sites with multiple types of content, users, features, and appearances. Certain “core” features are available to everyone out of the box, but even more useful are the “modules”, which extend the features to do all kinds of things from the mundane but essential backup of a site to a flashy carousel slider. However, the modules are created by individuals or companies and contributed back to the community, and thus when Drupal makes a major version change they need to be rewritten, quite drastically in the case of Drupal 8. That means that right now we are in a period where developers may or may not be redoing their modules, or they may be rethinking about how a certain task should be done in the future. Because most of these developers are doing this work as volunteers, it’s not reasonable to expect that they will complete the work on your timeline. The expectation is that if a feature is really important to you, then you’ll work on development to make it happen. That is, of course, easier said than done for people who barely have enough time to do the basic web development asked of them, much less complex programming or learning a new system top to bottom, so most of us are stuck waiting or figuring out our own solutions.
Despite my knowledge of the reality of how Drupal works, I was very excited at the prospect of getting into Drupal 8 and learning all the new features. I installed it right away and started poking around, but realized pretty quickly I was going to have to do a complete evaluation for whether it was actually practical to use it for my library’s website. Our website has been on Drupal 7 since 2012, and works pretty well, though it does need a new theme to bring it into line with 2016 design and accessibility standards. Ideally, however, we could be doing even more with the site, such as providing better discovery for our digital special collections and making the site information more semantic web friendly. It was those latter, more advanced, feature desires that made me really wish to use Drupal 8, which includes semantic HTML5 integration and schema.org markup, as well as better integration with other tools and libraries. But the question remains–would it really be practical to work on migrating the site immediately, or would it make more sense to spend some development time on improving the Drupal 7 site to make it work for the next year or so while working on Drupal 8 development more slowly?
A bit of research online will tell you that there’s no right answer, but that the first thing to do in an evaluation is determine whether any the modules on which your site depends are available for Drupal 8, and if not, whether there is a good alternative. I must add that while all the functions I am going to mention can be done manually or through custom code, a lot of that work would take more time to write and maintain than I expect to have going forward. In fact, we’ve been working to move more of our customized code to modules already, since that makes it possible to distribute some of the workload to others outside of the very few people at our library who write code or even know HTML well, not to mention taking advantage of all the great expertise of the Drupal community.
I tried two different methods for the evaluation. First, I created a spreadsheet with all the modules we actually use in Drupal 7, their versions, and the current status of those modules in Drupal 8 or if I found a reasonable substitute. Next, I tried a site that automates that process, d8upgrade.org. Basically you fill in your website URL and email, and wait a day for your report, which is very straightforward with a list of modules found for your site, whether there is a stable release, an alpha or beta release, or no Drupal 8 release found yet. This is a useful timesaver, but will need some manual work to complete and isn’t always completely up to date.
My manual analysis determined that there were 30 modules on which we depend to a greater or lesser extent. Of those, 10 either moved into Drupal core (so would automatically be included) or the functions on which used them moved into another piece of core. 5 had versions available in Drupal 8, with varying levels of release (i.e. several in stable alpha release, so questionable to use for production sites but probably fine), and 5 were not migrated but it was possible to identify substitute Drupal 8 modules. That’s pretty good– 18 modules were available in Drupal 8, and in several cases one module could do the job that two or more had done in Drupal 7. Of the additional 11 modules that weren’t migrated and didn’t have an easy substitution, three of them are critical to maintaining our current site workflows. I’ll talk about those in more detail below.
d8upgrade.org found 21 modules in use, though I didn’t include all of them on my own spreadsheet if I didn’t intend to keep using them in the future. I’ve included a screenshot of the report, and there are a few things to note. This list does not have all the modules I had on my list, since some of those are used purely behind the scenes for administrative purposes and would have no indication of use without administrative access. The very last item on the list is Core, which of course isn’t going to be upgraded to Drupal 8–it is Drupal 8. I also found that it’s not completely up to date. For instance, my own analysis found a pre-release version of Workbench Moderation, but that information had not made it to this site yet. A quick email to them fixed it almost immediately, however, so this screenshot is out of date.
I decided that there were three dealbreaker modules for the upgrade, and I want to talk about why we rely on them, since I think my reasoning will be applicable to many libraries with limited web development time. I will also give honorable mention to a module that we are not currently using, but I know a lot of libraries rely on and that I would potentially like to use in the future.
Webform is a module that creates a very simple to use interface for creating webforms and doing all kinds of things with them beyond just simply sending emails. We have many, many custom PHP/MySQL forms throughout our website and intranet, but there are only two people on the staff who can edit those or download the submitted entries from them. They also occasionally have dreadful spam problems. We’ve been slowly working on migrating these custom forms to the Drupal Webform module, since that allows much more distribution of effort across the staff, and provides easier ways to stop spam using, for instance, the Honeypot module or Mollom. (We’ve found that the Honeypot module stopped nearly all our spam problems and didn’t need to move to Mollom, since we don’t have user comments to moderate). The thought of going back to coding all those webforms myself is not appealing, so for now I can’t move forward until I come up with a Drupal solution.
Redirect does a seemingly tiny job that’s extremely helpful. It allows you to create redirects for URLs on your site, which is incredibly helpful for all kinds of reasons. For instance, if you want to create a library site branded link that forwards somewhere else like a database vendor or another page on your university site, or if you want to change a page URL but ensure people with bookmarks to the old page will still find it. This is, of course, something that you can do on your web server, assuming you have access to it, but this module takes a lot of the administrative overhead away and helps keep things organized.
Backup and Migrate is my greatest helper in my goal to be someone who would like to at least be in the neighborhood of best practices for web development when web development is only half my job, or some weeks more like a quarter of my job. It makes a very quick process of keeping my development, staging, and production sites in sync, and since I created a workflow using this module I have been far more successful in keeping my development processes sane. It provides an interface for creating a backup of your site database, files directories, or your database and files that you can use in the Backup and Migrate module to completely restore a site. I use it at least every two weeks, or more often when working on a particular feature to move the database between servers (I don’t move the files with the module for this process, but that’s useful for backups that are for emergency restoration of the site). There are other ways to accomplish this work, but this particular workflow has been so helpful that I hate to dump a lot of time into redoing it just now.
One last honorable mention goes to Workbench, which we don’t use but I know a lot of libraries do use. This allows you to create a much more friendly interface for content editors so they don’t have to deal with the administrative backend of Drupal and allows them to just see their own content. We do use Workbench Moderation, which does have a Drupal 8 release, and allows a moderation queue for the six or so members of staff who can create or edit content but don’t have administrative rights to have their content checked by an administrator. None of them particularly like the standard Drupal content creation interface, and it’s not something that we would ever ask the rest of the staff to use. We know from the lack of use of our intranet, which also is on Drupal, that no one particularly cares for editing content there. So if we wanted to expand access to website editing, which we’ve talked about a lot, this would be a key module for us to use.
Given the current status of these modules with rewrites in progress, it seems likely that by the end of the year it may be possible to migrate to Drupal 8 with our current setup, or in playing around with Drupal 8 on a development site that we determine a different way to approach these needs. If you have the interest and time to do this, there are worse ways to pass the time. If you are creating a completely new Drupal site and don’t have a time crunch, starting in Drupal 8 now is probably the way to go, since by the time the site would be ready you may have additional modules available and get to take advantage of all the new features. If this is something you’re trying to roll out by the end of the semester, maybe wait on it.
Have you considered upgrading your library’s site to Drupal 8? Have you been successful? Let us know in the comments.
Anyone who has worked on an institutional repository for even a short time knows that collecting faculty scholarship is not a straightforward process, no matter how nice your workflow looks on paper or how dedicated you are. Keeping expectations for the process manageable (not necessarily low, as in my clickbaity title) and constant simplification and automation can make your process more manageable, however, and therefore work better. I’ve written before about some ways in which I’ve automated my process for faculty collection development, as well as how I’ve used lightweight project management tools to streamline processes. My newest technique for faculty scholarship collection development brings together pieces of all those to greatly improve our productivity.
Allocating Your Human and Machine Resources
First, here is the personnel situation we have for the institutional repository I manage. Your own circumstances will certainly vary, but I think institutions of all sizes will have some version of this distribution. I manage our repository as approximately half my position, and I have one graduate student assistant who works about 10-15 hours a week. From week to week we only average about 30-40 hours total to devote to all aspects of the repository, of which faculty collection development is only a part. We have 12 librarians who are liaisons with departments and do the majority of the outreach to faculty and promotion of the repository, but a limited amount of the collection development except for specific parts of the process. While they are certainly welcome to do more, in reality, they have so much else to do that it doesn’t make sense for them to spend their time on data entry unless they want to (and some of them do). The breakdown of work is roughly that the liaisons promote the repository to the faculty and answer basic questions; I answer more complex questions, develop procedures, train staff, make interpretations of publishing agreements, and verify metadata; and my GA does the simple research and data entry. From time to time we have additional graduate or undergraduate student help in the form of faculty research assistants, and we have a group of students available for digitization if needed.
Those are our human resources. The tools that we use for the day-to-day work include Digital Measures (our faculty activity system), Excel, OpenRefine, Box, and Asana. I’ll say a bit about what each of these are and how we use them below. By far the most important innovation for our faculty collection development workflow has been integration with the Faculty Activity System, which is how we refer to Digital Measures on our campus. Many colleges and universities have some type of faculty activity system or are in the process of implementing one. These generally are adopted for purposes of annual reports, retention, promotion, and tenure reviews. I have been at two different universities working on adopting such systems, and as you might imagine, it’s a slow process with varying levels of participation across departments. Faculty do not always like these systems for a variety of reasons, and so there may be hesitation to complete profiles even when required. Nevertheless, we felt in the library that this was a great source of faculty publication information that we could use for collection development for the repository and the collection in general.
We now have a required question about including the item in the repository on every item the faculty member enters in the Faculty Activity System. If a faculty member is saying they published an article, they also have to say whether it should be included in the repository. We started this in late 2014, and it revolutionized our ability to reach faculty and departments who never had participated in the repository before, as well as simplify the lives of faculty who were eager participants but now only had to enter data in one place. Of course, there are still a number of people whom we are missing, but this is part of keeping your expectation low–if you can’t reach everyone, focus your efforts on the people you can. And anyway, we are now so swamped with submissions that we can’t keep up with them, which is a good if unusual problem to have in this realm. Note that the process I describe below is basically the same as when we analyze a faculty member’s CV (which I described in my OpenRefine post), but we spend relatively little time doing that these days since it’s easier for most people to just enter their material in Digital Measures and select that they want to include it in the repository.
The ease of integration between your own institution’s faculty activity system (assuming it exists) and your repository certainly will vary, but in most cases it should be possible for the library to get access to the data. It’s a great selling point for the faculty to participate in the system for your Office of Institutional Research or similar office who administers it, since it gives faculty a reason to keep it up to date when they may be in between review cycles. If your institution does not yet have such a system, you might still discuss a partnership with that office, since your repository may hold extremely useful information for them about research activity of which they are not aware.
We get reports from the Faculty Activity System on roughly a quarterly basis. Faculty member data entry tends to bunch around certain dates, so we focus on end of semesters as the times to get the reports. The reports come by email as Excel files with information about the person, their department, contact information, and the like, as well as information about each publication. We do some initial processing in Excel to clean them up, remove duplicates from prior reports, and remove irrelevant information. It is amazing how many people see a field like “Journal Title” as a chance to ask a question rather than provide information. We focus our efforts on items that have actually been published, since the vast majority of people have no interest in posting pre-prints and those that do prefer to post them in arXiv or similar. The few people who do know about pre-prints and don’t have a subject archive generally submit their items directly. This is another way to lower expectations of what can be done through the process. I’ve already described how I use OpenRefine for creating reports from faculty CVs using the SHERPA/RoMEO API, and we follow a similar but much simplified process since we already have the data in the correct columns. Of course, following this process doesn’t tell us what we can do with every item. The journal title may be entered incorrectly so the API call didn’t pick it up, or the journal may not be in SHERPA/RoMEO. My graduate student assistant fills in what he is able to determine, and I work on the complex cases. As we are doing this, the Excel spreadsheet is saved in Box so we have the change history tracked and can easily add collaborators.
At this point, we are ready to move to Asana, which is a lightweight project management tool ideal for several people working on a group of related projects. Asana is far more fun and easy to work with than Excel spreadsheets, and this helps us work together better to manage workload and see where we are with all our on-going projects. For each report (or faculty member CV), we create a new project in Asana with several sections. While it doesn’t always happen in practice, in theory each citation is a task that moves between sections as it is completed, and finally checked off when it is either posted or moved off into some other fate not as glamorous as being archived as open access full text. The sections generally cover posting the publisher’s PDF, contacting publishers, reminders for followup, posting author’s manuscripts, or posting to SelectedWorks, which is our faculty profile service that is related to our repository but mainly holds citations rather than full text. Again, as part of the low expectations, we focus on posting final PDFs of articles or book chapters. We add books to a faculty book list, and don’t even attempt to include full text for these unless someone wants to make special arrangements with their publisher–this is rare, but again the people who really care make it happen. If we already know that the author’s manuscript is permitted, we don’t add these to Asana, but keep them in the spreadsheet until we are ready for them.
We contact publishers in batches, trying to group citations by journal and publisher to increase efficiency so we can send one letter to cover many articles or chapters. We note to follow up with a reminder in one month, and then again in a month after that. Usually the second notice is enough to catch the attention of the publisher. As they respond, we move the citation to either posting publisher’s PDF section or to author’s manuscript section, or if it’s not permitted at all to the post to SelectedWorks section. While we’ve tried several different procedures, I’ve determined it’s best for the liaison librarians to ask just for author’s accepted manuscripts for items after we’ve verified that no other version may be posted. And if we don’t ever get them, we don’t worry about it too much.
I hope you’ve gotten some ideas from this post about your own procedures and new tools you might try. Even more, I hope you’ll think about which pieces of your procedures are really working for you, and discard those that aren’t working any more. Your own situation will dictate which those are, but let’s all stop beating ourselves up about not achieving perfection. Make sure to let your repository stakeholders know what works and what doesn’t, and if something that isn’t working is still important, work collaboratively to figure out a way around that obstacle. That type of collaboration is what led to our partnership with the Office of Institutional Research to use the Digital Measures platform for our collection development, and that in turn has led to other collaborative opportunities.
A few of us at Tech Connect participated in the #1Lib1Ref campaign that’s running from January 15th to the 23rd . What’s #1Lib1Ref? It’s a campaign to encourage librarians to get involved with improving Wikipedia, specifically by citation chasing (one of my favorite pastimes!). From the project’s description:
Imagine a World where Every Librarian Added One More Reference to Wikipedia.
Wikipedia is a first stop for researchers: let’s make it better! Your goal today is to add one reference to Wikipedia! Any citation to a reliable source is a benefit to Wikipedia readers worldwide. When you add the reference to the article, make sure to include the hashtag #1Lib1Ref in the edit summary so that we can track participation.
Below, we each describe our experiences editing Wikipedia. Did you participate in #1Lib1Ref, too? Let us know in the comments or join the conversation on Twitter!
I recorded a short screencast of me adding a citation to the Darbhanga article.
— Eric Phetteplace
I used the Citation Hunt tool to find an article that needed a citation. I selected the second one I found, which was about urinary tract infections in space missions. That is very much up my alley. I discovered after a quick Google search that the paragraph in question was plagiarized from a book on Google Books! After a hunt through the Wikipedia policy on quotations, I decided to rewrite the paragraph to paraphrase the quote, and then added my citation. As is usual with plagiarism, the flow was wrong, since there was a reference to a theme in the previous paragraph of the book that wasn’t present in the Wikipedia article, so I chose to remove that entirely. The Wikipedia Citation Tool for Google Books was very helpful in automatically generating an acceptable citation for the appropriate page. Here’s my shiny new paragraph, complete with citation: https://en.wikipedia.org/wiki/
— Margaret Heller
I edited the “Library Facilities” section of the “University of Maryland Baltimore” article in Wikipedia. There was an outdated link in the existing citation, and I also wanted to add two additional sentences and citations. You can see how I went about doing this in my screen recording below. I used the “edit source” option to get the source first in the Text Editor and then made all the changes I wanted in advance. After that, I copy/pasted the changes I wanted from my text file to the Wikipedia page I was editing. Then, I previewed and saved the page. You can see that I also had a typo in my text and had to fix that again to make the citation display correctly. So I had to edit the article more than once. After my recording, I noticed another typo in there, which I fixed it using the “edit” option. The “edit” option is much easier to use than the “edit source” option for those who are not familiar with editing Wiki pages. It offers a menu bar on the top with several convenient options.
The recording of editing a Wikipedia article:
— Bohyun Kim
It has been so long since I’ve edited anything on Wikipedia that I had to make a new account and read the “how to add a reference” link; which is to say, if I could do it in 30 minutes while on vacation, anyone can. There is a WYSIWYG option for the editing interface, but I learned to do all this in plain text and it’s still the easiest way for me to edit. See the screenshot below for a view of the HTML editor.
I wondered what entry I would find to add a citation to…there have been so many that I’d come across but now I was drawing a total blank. Happily, the 1Lib1Ref campaign gave some suggestions, including “Provinces of Afghanistan.” Since this is my fatherland, I thought it would be a good service to dive into. Many of Afghanistan’s citations are hard to provide for a multitude of reasons. A lot of our history has been an oral tradition. Also, not insignificantly, Afghanistan has been in conflict for a very long time, with much of its history captured from the lens of Great Game participants like England or Russia. Primary sources from the 20th century are difficult to come by because of the state of war from 1979 onwards and there are not many digitization efforts underway to capture what there is available (shout out to NYU and the Afghanistan Digital Library project).
Once I found a source that I thought would be an appropriate reference for a statement on the topography of Uruzgan Province, I did need to edit the sentence to remove the numeric values that had been written since I could not find a source that quantified the area. It’s not a precise entry, to be honest, but it does give the opportunity to link to a good map with other opportunities to find additional information related to Afghanistan’s agriculture. I also wanted to chose something relatively uncontroversial, like geographical features rather than historical or person-based, for this particular campaign.
— Yasmeen Shorish