Data, data everywhere…but do we want to drink?

The role of data, digital curation, and scholarly communication in academic libraries.

Ask around and you’ll hear that data is the new bacon (or turkey bacon, in my case. Sorry, vegetarians). It’s the hot thing that everyone wants a piece of. It is another medium with which we interact and from which we derive meaning. It is information[1], potentially valuable and abundant. But much like [turkey] bacon, unmoderated gorging, without balance or diversity of content, can raise blood pressure and give you a heart attack. To understand how best to interact with the data landscape, it is important to look beyond it.

What do academic libraries need to know about data? A lot, but in order to separate the signal from the noise, it is imperative to look at the entire environment. To do this, one can look to job postings as a measure of engagement. The data curation positions, research data services departments, and data management specializations focus almost exclusively on digital data. However, these positions, which are often catch-alls for many other things, do not place the data management and curation activities within the larger frame of digital curation, let alone scholarly communication. Missing from job descriptions is an awareness of digital preservation or archival theory as it relates to data management or curation. In some cases, this omission could be because a fully staffed digital collections department has purview over these areas. Nonetheless, it is important to articulate the need to communicate with those stakeholders in the job description. It may be said that if the job ad discusses data curation, digital preservation should be an assumed skill, yet given the tendency to have these positions “do-all-the-things,” it is negligent not to explicitly mention it.

Digital curation is an area that has wide appeal for those working in academic and research libraries. The ACRL Digital Curation Interest Group (DCIG) has one of the largest memberships within ACRL, with 1075 members as of March 2015. The interest group was intentionally named “digital curation” rather than “data curation” because the founders (Patricia Hswe and Marisa Ramirez) understood the interconnectivity of the domains and that the work in one area, like archives, could influence the work in another, like data management. For example, the work from Digital POWRR can help inform digital collection platform decisions or workflows, including data repository concerns. This Big Tent philosophy can help frame the data conversations within libraries in a holistic, unified manner, where the various library stakeholders work collaboratively to meet the needs of the community.

The absence of a holistic approach to data can result in the propensity to separate data from the corpus of information for which librarians already provide stewardship. Academic libraries may recognize the need to provide leadership in the area of data management, but balk when asked to consider data a special collection or to ingest data into the institutional repository. While librarians should be working to help the campus community become critical users and responsible producers of data, the library institution must empower that work by recognizing this as an extension of the scholarly communication guidance currently in place. This means that academic libraries must incorporate the work of data information literacy into their existing information literacy and scholarly communication missions, else risk excluding these data librarian positions from the natural cohort of colleagues doing that work, or risk overextending the work of the library.

This overextension is most obvious in the positions that seek a librarian to do instruction in data management, reference, and outreach, and also provide expertise in all areas of data analysis, statistics, visualization, and other data manipulation. There are some academic libraries where this level of support is reasonable, given the mission, focus, and resourcing of the specific institution. However, considering the diversity of scope across academic libraries, I am skeptical that the prevalence of job ads that describe this suite of services is justified. Most “general” science librarians would scoff if a job ad asked for experience with interpreting spectra. The science librarian should know where to direct the person who needs help with reading the spectra, or finding comparative spectra, but it should not be a core competency to have expertise in that domain. Yet experience with SPSS, R, Python, statistics and statistical literacy, and/or data visualization software finds its way into librarian position descriptions, some more specialized than others.

For some institutions this is not an overextension, but just an extension of the suite of specialized services offered, and that is well and good. My concern is that academic libraries, feeling the rush of an approved line for all things data, begin to think this is a normal role for a librarian. Do not mistake me: I do not write from the perspective that libraries should not evolve services or that librarians should not develop specialized areas of expertise. Rather, I raise a concern that too often these extensions are made without the strategic planning and commitment from the institution to fully support the work that this would entail.

Framing data management and curation within the construct of scholarly communication, and its intersections with information literacy, allows for the opportunity to build more of this content delivery across the organization, enfranchising all librarians in the conversation. A team approach can help with sustainability and message penetration, and moves the organization away from the single-position skill and knowledge-sink trap. Subject expertise is critical in the fast-moving realm of data management and curation, but it is an expertise that can be shared and that must be strategically supported. For example, with sufficient cross-training, liaison librarians can work with their constituents to advise on meeting federal data sharing requirements, without requiring an immediate punt to the “data person” in the library (if such a person exists). In cases where there is no data point person, creating a data working group is a good approach to distribute across the organization both the knowledge and the responsibility for seeking out additional information.

Data specialization cuts across disciplinary bounds and concerns both public services and technical services. It is no easy task, but I posit that institutions must take a simultaneously expansive yet well-scoped approach to data engagement – mindful of the larger context of digital curation and scholarly communication, while limiting responsibilities to those most appropriate for a particular institution.

[1] Lest the “data-information-knowledge-wisdom” hierarchy (DIKW) torpedo the rest of this post, let me encourage readers to allow for an expansive definition of data. One that allows for the discrete bits of data that have no meaning without context, such as a series of numbers in a .csv file, and the data that is described and organized, such as those exact same numbers in a .csv file, but with column and row descriptors and perhaps an associated data dictionary file. Undoubtedly, the second .csv file is more useful and could be classified as information, but most people will continue to call it data.
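To make the footnote's distinction concrete, here is a minimal sketch in Python. The numbers, column names, and data dictionary entries are all invented for illustration; they do not come from any real dataset.

```python
import csv
import io

# Hypothetical example: the same numbers, first without any context...
bare_csv = "12.1,0.43\n13.4,0.51\n"

# ...and then with column descriptors (names are assumptions for illustration).
described_csv = "temperature_c,absorbance\n12.1,0.43\n13.4,0.51\n"

# A hypothetical data dictionary that would accompany the described file.
data_dictionary = {
    "temperature_c": "Sample temperature in degrees Celsius",
    "absorbance": "Measured absorbance (unitless)",
}

# Without headers, each row is just a list of strings with no meaning attached.
bare_rows = list(csv.reader(io.StringIO(bare_csv)))

# With headers, each value can be looked up by the name of what it measures.
described_rows = list(csv.DictReader(io.StringIO(described_csv)))

for row in described_rows:
    for column, value in row.items():
        print(f"{column} = {value} ({data_dictionary[column]})")
```

Both files hold the same values, and most people would call both of them data. Only the second can be read without asking the creator what the columns mean.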

Yasmeen Shorish is assistant professor and Physical & Life Sciences librarian at James Madison University. She is a past-convener for the ACRL Digital Curation Interest Group and her research focus is in the areas of data information literacy and scholarly communication.

A Brief Trip into Technology Planning, Brought to You By Meebo

The Day That Meebo Died

Today is the day that many librarians running reference services dreaded – Meebo discontinuing most of their products (with the exception of the Meebo Bar). Even though Meebo (or parts of it) will still live on in various Google products, that still doesn’t help those libraries who have built services and applications around a product that has been around for a while (Meebo was established in 2005).

If Meebo is any indication, even established, long-running technology services can go away without much advance notice. What is a library to do about incorporating third party applications, then? There is no way to ensure that all the services and applications that you use at your library will still be in existence for any length of time. Change is about the only constant in technology and it is up to those of us who deal with technology to plan for that change.

How to avoid backing your library into a corner with no escape route in sight

The worst has happened – the application you’re using is no longer being supported. Or, in a more positive light, there’s a new alternative out there that performs better than the application your library is currently using. The scenarios above have different priorities; migration due to discontinuation of support will probably happen on a faster timeline than upgrading to a better application. Overall, you should be prepared to survive without your current third party applications with a minimal amount of content loss and service disruption. For this post I’ll be focusing on third party application support and availability. Disruptions due to natural disasters, like fire, flooding, or, in Grinnell’s case, tornadoes, are equally important, but will not be covered at length in this post.

Competition (or lack thereof)

When news broke that Google purchased Meebo, most weren’t sure about what would be next for the chat service. Soon afterwards, Meebo gave a month’s notice about the discontinuation of most of their products. Fortunately, alternative chat services were plentiful. Our library, for example, subscribes to LibraryH3lp, but we were using Meebo Messenger as well as the MeeboMe widget for some course pages to supplement LibraryH3lp’s services. After the announcement, our library quickly replaced the messenger with Pidgin, and we are working on replacing the Meebo widgets with LibraryH3lp’s widgets.

Having a diverse, healthy pool of different applications to choose from for a particular service is a good place to be when the application you use is no longer supported. Migrations are never fun, but consider the alternative. If you’re using a service or application that does not have readily available alternatives, how will your services be affected when that application is no longer supported?

The last question wasn’t rhetorical. If your answer involves a major service disruption, especially to services that your library deems mission-critical, then you’re putting yourself and the library in a precarious position. The same goes if the alternatives out there require a different technical skill set from your library staff. Applications that require a more advanced technical skill set will require more training and run a heightened risk of staff rejection if the required skill level is set too high.

Data wants to be backed up

Where’s your data right now? Can you export it out of the application? Do you even know if you can export your data or not? If not, then you’re setting yourself up for a preventable emergency. Exporting functionality and backups are especially important for services that are living outside of your direct control, like a hosted service. While most hosted services have backup servers to prevent loss of customer data, you should still have the ability to export your data and store it outside of the application. It’s best practice and gives you the peace of mind that you do not have to recreate years’ worth of work to restore data lost due to vendor error or lack of export functionality.

Another product that is widely used by academic libraries, LibGuides, provides a backup feature where you can export all of your guides in XML or individual guides in HTML. It will take some work to format and post the data if needed, but the important thing is that you have your data and you can either host it locally in case of emergencies or harvest the content when the time comes to move on to another application.
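Once you have an export file in hand, keeping dated copies of it outside the application takes very little code. The following is a minimal sketch in Python, not a description of any vendor's tooling: the function names, the file name, and the backup folder are all assumptions for illustration, and it presumes you have already downloaded the export manually.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def archive_export(export_file: Path, backup_dir: Path) -> Path:
    """Copy an exported file into backup_dir under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{export_file.stem}-{stamp}{export_file.suffix}"
    # shutil.copy does not preserve the source's modification time,
    # so the copy's timestamp reflects when the backup was made.
    shutil.copy(export_file, target)
    return target


def days_since_last_backup(backup_dir: Path) -> Optional[float]:
    """Return the age in days of the newest backup copy, or None if there is none."""
    if not backup_dir.exists():
        return None
    files = [p for p in backup_dir.iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return max(0.0, (time.time() - newest) / 86400)
```

For example, calling archive_export(Path("guides.xml"), Path("backups")) after each manual export would build up a dated history, and days_since_last_backup(Path("backups")) answers the question of how stale your most recent copy is. Pointing the backup folder at storage outside the hosted service is the part that matters most.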

Some technology service audit questions

Here are some general questions to start you down the path of evaluating where your library currently stands with third party applications you rely on for providing specific library services. Don’t worry if you find yourself not as prepared as you want to be. It’s better to start now than when you learn that another application you use will be shutting down.

  • What third party applications does your library currently use to provide library services?
  • Are there other comparable services/applications available?
    • What training resources are available for alternative applications?
    • What technical skills do these applications require? Are they compatible with the technical skills found with the majority of library staff?
  • Which applications are used for mission-critical library services?
  • Can you export your data and/or settings from the application?
    • If so, how often is the data being exported?
    • Where is the backup file stored? Locally? Remotely?
  • What is the plan if the application…
    • …is no longer supported?
    • …goes offline due to a service disruption?
      • …for a couple of hours?
      • …longer than a day?
      • …during finals week/first week of the semester/midterms (high pressure/high stakes times for library users)?
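If you want to keep the answers to these questions somewhere more durable than a meeting whiteboard, one option is a small inventory with one record per application. The sketch below is only one way to do it, written in Python; the field names, the 30-day export threshold, and the warning messages are assumptions for illustration rather than any standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AppAudit:
    """Answers to the audit questions for one third party application."""
    name: str
    mission_critical: bool
    alternatives: List[str] = field(default_factory=list)
    can_export: bool = False
    days_since_export: Optional[int] = None  # None means never exported


def audit_warnings(app: AppAudit, max_export_age_days: int = 30) -> List[str]:
    """List the gaps in planning for one application (threshold is an assumption)."""
    warnings = []
    if not app.alternatives:
        warnings.append("no comparable alternative identified")
    if not app.can_export:
        warnings.append("data cannot be exported")
    elif app.days_since_export is None:
        warnings.append("export available but never run")
    elif app.days_since_export > max_export_age_days:
        warnings.append("last export is older than the chosen threshold")
    # Escalate any gap that affects a mission-critical service.
    if app.mission_critical and warnings:
        warnings.insert(0, "mission-critical service at risk")
    return warnings


# Hypothetical inventory entries for illustration only.
inventory = [
    AppAudit("chat widget", mission_critical=True),
    AppAudit("research guides", mission_critical=False,
             alternatives=["locally hosted pages"], can_export=True,
             days_since_export=7),
]

for app in inventory:
    for warning in audit_warnings(app):
        print(f"{app.name}: {warning}")
```

The questions about what to do during an outage do not reduce to a true-or-false field, so those are better kept as written procedures alongside the inventory.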

While there are many potential landmines when using third party applications for library services, these applications overall help expand and provide user services in various ways. Instead of becoming a technological recluse and shunning outside applications, use these applications wisely and make sure that your library has a plan in place.