Data Refuge and the Role of Libraries

Society is always changing. For some, the change can seem slow and frustrating, while others may feel as though the change occurred in the blink of an eye. What is this change that I speak of? It can be anything…civil rights, autonomous cars, or national leaders. One change that no one ever seems particularly prepared for, however, is when a website link becomes broken. One day, you could click a link and get to a site, and the next day you get a 404 error. Sometimes this occurs because a site was migrated to a new server and the link was not redirected. Sometimes this occurs because the owner ceased to maintain the site. And sometimes, this occurs for less benign reasons.

Accessing information via the Internet is something that many (but not all) of us do every day, sometimes without thinking about it: checking the weather, reading email, receiving news alerts. We also use the Internet to make datasets and other sources of information widely available. Individuals, universities, corporations, and governments share data and information in this way. In the Obama administration, the Open Government Initiative led to the development of Project Open Data and data.gov. Federal agencies started looking at ways to make information sharing easier, especially in areas where the data are unique.

One area of unique data is climate science. Since climate data is captured on a specific day, at a specific time, and under certain conditions, it can never be truly reproduced. It will never be January XX, 2017 again. With these constraints, climate data can be thought of as fragile. The copies that we have are the only records that we have. Much of our nation’s climate data has been captured by research groups at institutes, universities, and government labs and agencies. During the election, much of the rhetoric from Donald Trump was rooted in the belief that climate change is a hoax. Upon his election, Trump tapped Scott Pruitt, who has fought many of the EPA’s attempts to regulate pollution, to lead the EPA. This, along with other messages from the new administration, has raised alarms within the scientific community that the United States may repeat the actions of the Harper administration in Canada, which literally threw away thousands of items from federal libraries that were deemed out of scope, through a process that was criticized as not transparent.

In an effort to safeguard and preserve this data, the Penn Program in Environmental Humanities (PPEH) helped organize a collaborative project called Data Refuge. This project requires the expertise of scientists, librarians, archivists, and programmers to organize, document, and back up data that is distributed across federal agencies’ websites. Maintaining the integrity of the data while ensuring its re-usability is the paramount concern, and it is an area where librarians and archivists must work hand in glove with the programmers (sometimes one and the same) who are writing the code to pull, duplicate, and push content. Wired magazine recently covered one of the Data Refuge events and detailed the way that the group worked together, even though much of the process is driven by individual actions.
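
To make the “pull, duplicate, and push” step a little more concrete, here is a minimal sketch, in Python, of one common way to check that a duplicated file is byte-for-byte identical to the file it was copied from: compute a checksum before and after copying and compare the two. This is an illustration only, not the Data Refuge project’s actual tooling; the function names `sha256_of` and `copy_verified`, and the choice of SHA-256, are my own assumptions.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source, destination):
    """Copy a file, then confirm the copy matches the source checksum.

    Hypothetical helper for illustration: returns the checksum on success
    and raises ValueError if the two digests differ.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise ValueError(f"checksum mismatch copying {source} to {destination}")
    return after
```

Recording the returned checksum alongside the dataset’s documentation would let anyone who later downloads the copy confirm that it has not changed since it was rescued.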

In order to capture as much of this data as possible, the Data Refuge project relies on groups of people organizing around this topic across the country. The PPEH site details the requirements to host a successful DataRescue event and has a Toolkit to help promote and document the event. There is also a survey that you can use to nominate climate or environmental data to be part of the Data Refuge. Not in a position to organize an event? Don’t like people? You can also work on your own! An interesting observation from the “work on your own” page is the option to nominate any “downloadable data that is vulnerable and valuable.” This means that Internet Archive and the End of Term Harvest Team (a project to preserve government websites from the Obama administration) are interested in any data that you have reason to believe may be in jeopardy under the current administration.

A quick note about politics. Politics are messy, and it can seem odd that people are organizing in this way when administrations change every four or eight years and, when there is a party change in the presidency, major departures in policy and priorities from one administration to the next are almost a certainty. What is important to recognize is that our data holdings are increasingly solely digital, and therefore fragile. The positions on issues like climate, environment, civil rights, and many, many others are so diametrically opposed between the Obama and Trump administrations that we – the public – have no assurance that the data will be retained or made widely available for sharing. This administration speaks of “alternative facts” and “disagree[ing] with the facts” and this makes people charged with preserving facts wary.

Many questions about the sustainability and longevity of the project remain. Will End of Term or Data Refuge be able to/need to expand the scope of these DataRescue efforts? How much resourcing can people donate to these events? What is the role of institutions in these efforts? This is a fantastic way for libraries to build partnerships with entities across campus and across a community, but some may view the political nature of these actions as incongruous with the library mission.

I would argue that policies and political actions are not inert abstractions. There is a difference between promoting a political party and calling attention to policies that are in conflict with human rights and freedom of information. Loath as I am to make this comparison, would anyone truly claim that burning books is protected political speech, and that opposing such burning is “playing politics”? Yet, these were the actions of a political party – in living memory – hosted in university towns across Germany. Considering the initial attempt to silence the USDA and the temporary freeze on the EPA, libraries should strongly support the efforts of PPEH, Data Refuge, End of Term, and concerned citizens across the country.



Thoughts from NDLC16

I recently had the pleasure of going to, and presenting at, the National Diversity in Libraries Conference (NDLC) at UCLA. NDLC is an irregular conference that last occurred in 2010. This year’s organizers had hoped to have around 250 registrants and they greatly exceeded those numbers. As a result, there were multiple sessions running at once, with as many as seven concurrent sessions in a single time slot. I recommend perusing the program for a sense of the conference and full descriptions of sessions and posters. Moreover, there was a lively Twitter stream capturing sessions and continuing conversations at #NDLC16. As I could not attend all of the sessions, I will highlight some key concepts in this post that I think are ideal to carry forward in the work that we do in libraries, supporting people and systems.

Equity (and access, inclusion, and diversity)

Yes, the conference is about “diversity in libraries,” but what became apparent throughout many sessions is that we know we have “diversity.” Not a lot; shockingly little by some measures, but there is diversity. Diversity of life experience, sexual orientation, gender expression, ability status, and racial/ethnic identity (this last one is often very visible and very, very poorly represented). If our goals are to increase diversity, that is great. But if we try to do that work without acknowledging that it takes equity and inclusion to actually make an impact, it is a waste of time.

Equity and inclusion are difficult to achieve when the greater system that we work within has been constructed upon bias and privilege. This is true whether the system is our national government, our institution, or the individual library where we work. You cannot talk about human interactions in a vacuum, and many of the sessions reflected this with themes like “Academic libraries and social justice on campus” and “Cultural aspects and perspectives in health sciences library services.”

Dismantling Structures

An important point is in recognizing the oppressive structures that we operate within and then determining the best course of action to dismantle those and in so doing, remove or reduce the barriers to participation. The difficulty in this lies in the many, many ways that those structures disenfranchise communities. It can be through omission: not collecting in areas that represent the totality of American society, for example. Do collections and archives reflect the communities present? The Native American, Latinx, Asian-American, African-American experience? What of the LGBTQA community?

Or it can be through our actual information seeking systems: subjective subject headings that betray the dominant narrative. Words are there to describe items, but they can be “othering” (set in contrast to, or apart from, a presumed norm, e.g. Wikipedia editors’ one-time attempt to separate American novelists based on gender[1]), strip away intersectionality, or use euphemisms for uncomfortable truths, such as framing the detention and imprisonment of Japanese-Americans during WWII as relocation camp/assembly center/temporary detention center, rather than internment camp/incarceration camp/American concentration camp[2]. (Incidentally, the LCSH for this event is: Japanese Americans--Evacuation and relocation, 1942-1945, which seems pretty euphemistic to me. What were they being evacuated from? Their safe homes, schools, and jobs?)

Or perhaps it is in our technology: How common is it to evaluate databases for screen reader accessibility as part of the selection process? We work with profile systems like PeopleSoft, which limit people’s ability to self-identify their gender. How do animations and lots of visual content on the website affect individuals with autism spectrum disorder (ASD)?

And what about our programming? Are makerspaces implicitly gendering activities in the promotional materials, e.g. using pinks and purples for sewing workshops and blues and black for Arduino workshops? What is really being said when a supervisor asks if there should be an “All Lives Matter” book display to “balance” the “Black Lives Matter” display?

These are complex issues because humans are complex animals. We form identities, often based on how others perceive us. How are the intersections of various facets of identity treated by society? By our organization? By our collections? By our metadata? To strive for a society in which these oppressive structures are acknowledged, the history that put them in place is known, and all members feel equal and respected is a herculean task. But it is one that we are fortunate enough to be engaged with by virtue of our profession, because we subscribe to ALA’s Core Values of Librarianship, or create things like the ACRL Diversity Standards, or align with the Digital Library Federation’s mission statement.

Start Local

One way to be an active participant in working towards a better future is to assess your local environment. Much of the programming at NDLC concerned recruitment and retention of diverse individuals in our profession. Panel discussions about LIS diversity initiatives and diversity fellow programs and presentations on strategic planning for diversity and inclusion and cultural competencies highlight the serious representation issues in our profession. According to the ALA demographic study of 2014, 87.1% of membership identified as white[3]. The 2015 census data from the United States reports the population as 61.6% white (non-Hispanic, not multi-racial)[4]. For a profession built so firmly on the notion of service and community engagement, this is problematic. Happily, many of the presentations at NDLC made their materials available, so we may learn from each other’s experiences[5].

Representative hiring and active retention are only a portion of what can be done locally to bring about positive change. Collection practices, metadata creation, digital project investment, special collections development, and community outreach and programming are the bread and butter of library activities, whether in a public, school, or academic library. It is important that we examine our current processes, not with a “how is this diverse?” perspective, but with a “how is this anti-racist/anti-oppression?” perspective. This is an important, yet perhaps subtle, difference. Working towards an anti-oppression mentality means recognizing the systemic inequalities woven through the fabric of our society, which may require changing some of the core practices of librarians in order to move the needle.

An example of this could be in the way that an academic library engages with, say, a community of students who both inhabit an ethnic minority identity and are first-generation students. This library may have done a great job at collecting a diverse range of materials that represent this population well, but if the library does not devote effort to outreach and programming for this group, it is unlikely that the students will go into the library to discover these works themselves. Why? It’s not that they are lazy, or entitled, or willfully ignorant (as I have witnessed librarians opine).

Libraries are still intimidating places for many students, regardless of identity. Compound that generalized anxiety with first-generation student status and an ethnic minority identity and it should be clear that the onus is on the librarians to dismantle those barriers – be they perceived or real – and communicate that the library is the students’ library – is our library – is everyone’s library. If this sounds obvious to you (“of course the library is everyone’s library!”), then I fear that you may not have a realistic understanding of the varied and complex lived experiences of people living in America. If this sounds like a challenge worthy of time, effort, and resources, then YAY! And I really hope that you are in a position to take up that challenge.

I’m Tired Now

Another strong takeaway from NDLC is that this work is WORK. Really hard, exhausting work. Work that often falls on the shoulders of those most disenfranchised, and that requires pushing back against a bureaucratic and social machine that has been running for centuries. Writing this blog post took forever, as I have tried to find words that don’t require previous knowledge of “social justice jargon” and that – hopefully – anyone reading can find something to relate to. See, working towards an equitable and inclusive society doesn’t fall only to those who lack access to societal privilege. Sure, those individuals will feel the imperative to change in their day-to-day experiences, but major heavy lifting must also come from those who implicitly – and often unconsciously – benefit from the constructs of the biased system.

Lasting change that doesn’t come from a burn-it-down-and-start-again revolution requires the positive participation of the most privileged. That is a lot of strength and honesty to ask of someone – to recognize that they have – in some ways – benefited from a system, and that benefit has been predicated on someone else’s oppression…that’s a heavy realization. It means recognizing that bringing equity to a system like this will most likely require some loss of benefits to the privileged group.

For example, it is possible that a collection focusing on female religious figures may purchase fewer books on Joan of Arc in order to represent Rabia Basri and Kāraikkāl Ammaiyār. That doesn’t diminish Joan of Arc’s impact on the collection; it just provides a wider lens through which to view the topic. Yet, some may feel a decline of Joan’s status, and that can be frightening if they’ve built an identity around that status.

This collection example is actually a decent representation of what it means to try to put some of these equity ideals into practice. Recognizing a representation imbalance and then taking action to address it may result in feelings of discomfort or fear for some, but for others it can give voice and visibility. We are lucky to be in a profession where we, as individuals/organizations/systems, can effect change and have such a positive impact on our communities, but we must be willing to recognize uncomfortable truths and believe that a just and equitable society is a future worth working for.

NDLC Again?

Rumor has it that there will be another NDLC in 2020. It was an exhilarating conference and one that I sincerely hope leaders in our profession come to and participate in, en masse. If you are looking to get more of a feel for the entire conference, several Storify stories were created: see http://ndlc.info/ for some, as well as Amelia Gibson’s on the BLM Town Hall, ARL’s, and mine.

[1] http://www.nytimes.com/2013/04/28/opinion/sunday/wikipedias-sexism-toward-female-novelists.html?_r=0

[2] http://www.discovernikkei.org/en/journal/2008/4/24/enduring-communities/

[3] http://www.ala.org/research/sites/ala.org.research/files/content/initiatives/membershipsurveys/September2014ALADemographics.pdf

[4] https://www.census.gov/quickfacts/table/PST045215/00

[5] Coming soon to http://ndlc.info/ or look through #NDLC16


Whither the workshop?

Academic libraries have long provided workshops that focus on research skills and tools to the community. Topics often include citation software or specific database search strategies. Increasingly, however, libraries are offering workshops on topics that some may consider untraditional or outside the natural home of the library. These topics include using R and other analysis packages, data visualization software, and GIS technology training, to name a few. Librarians are becoming trained as Data and Software Carpentry instructors in order to pull from their established lesson plans and become part of a larger instructional community. Librarians are also partnering with non-profit groups like Mozilla’s Science Lab to facilitate research and learning communities.

Traditional workshops have generally been conceived and executed by librarians in the library. Collaborating with outside groups like Software Carpentry (SWC) and Mozilla is a relatively new endeavor. As an example, certified trainers from SWC can come to campus and teach a topic from their course portfolio (e.g. using SQL, Python, R, Git). These workshops may or may not have a cost associated with them and are generally open to the campus community. From what I know, the library is typically the lead organizer of these events. This shouldn’t be terribly surprising. Librarians are often very aware of the research hurdles that faculty encounter, or what research skills aren’t being taught in the classroom to students (more on this later).

Librarians are helpers. If you have some biology knowledge, I find it useful to think of librarians as chaperone proteins: proteins that help other proteins fold into their functional shape. Librarians act in the same way, guiding and helping people to be more prepared to do effective research. We may not be altering their DNA, but we are helping them bend in new ways and take on different perspectives. When we see a skills gap, we think about how we can help. But workshops don’t just *spring* into being. They take a huge amount of planning and coordination. Librarians, on top of all the other things we do, pitch the idea to administration and other stakeholders on campus; coordinate the space, timing, refreshments, registration, and travel for the instructors (if they aren’t available in-house); and advocate for the funding to pay for the event in order to make it free to the community. A recent listserv discussion regarding hosting SWC workshops resulted in consensus around a recommended minimum six-week lead time. The workshops have all been hugely successful at the institutions responding on the list and there are even plans for future Library Carpentry events.

A colleague once said that everything librarians do in instruction is something that the disciplinary faculty should be doing in the classroom anyway. That is, the research skills workshops, the use of a reference manager, database searching, and data management best practices are all appropriately – and possibly more appropriately – taught in the classroom by the professor for the subject. While he is completely correct, that is most certainly not happening. We know this because faculty send their students to the library for help. They do this because they lack curricular time to cover any of these topics in depth and they lack professional development time to keep abreast of changes in certain research methods and technologies. And because these are all things that librarians should have expertise in. The beauty of our profession is that information is the coin of the realm for us, regardless of its form or subject. With minimal effort, we should be able to navigate information sources with precision and accuracy. This is one of the reasons why, time and again, the library is considered the intellectual center, the hub, or the heart of the university. Have an information need? We got you. Whether those information sources are in GitHub as code, spreadsheets as data, or databases as article surrogates, we should be able to chaperone our users through that process.

All of this is to the good, as far as I am concerned. Yet, I have a persistent niggle at the back of my mind that libraries are too often taking a passive posture. [Sidebar: I fully admit that this post is written from a place of feeling, of suspicions and anecdotes, and not from empirical data. Therefore, I am both uncomfortable writing it and unable to turn away from it.] My concern is that as libraries extend to take on these workshops because there is a need on campus for discipline-agnostic learning experiences, we (as a community) do so without really articulating what the expectations and compensations of an academic library are, or should be. This is a natural extension of the “what types of positions should libraries provide/support?” question that seems to persist. How much of this response is based on the work of individuals volunteering to meet needs, stretching the work to fit into a job description or existing workloads, and ultimately putting user needs ahead of organizational health? I am not advocating that we ignore these needs; rather I am advocating that we integrate the support for these initiatives within the organization, that we systematize it, and that we own our expertise in it.

This brings me back to the idea of workshops and how we claim ownership of them. Are libraries providing these workshops only because no one else on campus is meeting the need? Or are we asserting our expertise in the domain of information/data shepherding and producing these workshops because the library is the best home for them, not a home by default? And if we are making this assertion, then have we positioned our people to be supported in the continual professional development that this demands? Have we set up mechanisms within the library and within the university for this work to be appropriately rewarded? The end result may be the same – say, providing workshops on R – but the motivation and framing of the service is important.

Information is our domain. We navigate its currents and ride its waves. It is ever changing and evolving, as we must be. And while we must be agile and nimble, we must also be institutionally supported and rewarded. I wonder if libraries can table the self-reflection and self-doubt regarding the appropriateness of our services (see everything ever written regarding libraries and data, digital humanities, digital scholarship, altmetrics, etc.) and instead advocate for the resourcing and recognition that our expertise warrants.


#1Lib1Ref

A few of us at Tech Connect participated in the #1Lib1Ref campaign that’s running from January 15th to the 23rd. What’s #1Lib1Ref? It’s a campaign to encourage librarians to get involved with improving Wikipedia, specifically by citation chasing (one of my favorite pastimes!). From the project’s description:

Imagine a World where Every Librarian Added One More Reference to Wikipedia.
Wikipedia is a first stop for researchers: let’s make it better! Your goal today is to add one reference to Wikipedia! Any citation to a reliable source is a benefit to Wikipedia readers worldwide. When you add the reference to the article, make sure to include the hashtag #1Lib1Ref in the edit summary so that we can track participation.

Below, we each describe our experiences editing Wikipedia. Did you participate in #1Lib1Ref, too? Let us know in the comments or join the conversation on Twitter!



I recorded a short screencast of me adding a citation to the Darbhanga article.

— Eric Phetteplace



I used the Citation Hunt tool to find an article that needed a citation. I selected the second one I found, which was about urinary tract infections in space missions. That is very much up my alley. I discovered after a quick Google search that the paragraph in question was plagiarized from a book on Google Books! After a hunt through the Wikipedia policy on quotations, I decided to rewrite the paragraph to paraphrase the quote, and then added my citation. As is usual with plagiarism, the flow was wrong, since there was a reference to a theme in the previous paragraph of the book that wasn’t present in the Wikipedia article, so I chose to remove that entirely. The Wikipedia Citation Tool for Google Books was very helpful in automatically generating an acceptable citation for the appropriate page. Here’s my shiny new paragraph, complete with citation: https://en.wikipedia.org/wiki/Astronautical_hygiene#Microbial_hazards_in_space.

— Margaret Heller



I edited the “Library Facilities” section of the “University of Maryland Baltimore” article in Wikipedia. There was an outdated link in the existing citation, and I also wanted to add two additional sentences and citations. You can see how I went about doing this in my screen recording below. I used the “edit source” option to get the source into a text editor first and then made all the changes I wanted in advance. After that, I copied and pasted the changes from my text file to the Wikipedia page I was editing. Then, I previewed and saved the page. You can see that I also had a typo in my text and had to fix it to make the citation display correctly, so I had to edit the article more than once. After my recording, I noticed another typo in there, which I fixed using the “edit” option. The “edit” option is much easier to use than the “edit source” option for those who are not familiar with editing Wiki pages. It offers a menu bar on the top with several convenient options.

The menu bar for the “edit” option in Wikipedia

The recording of editing a Wikipedia article:

— Bohyun Kim



It has been so long since I’ve edited anything on Wikipedia that I had to make a new account and read the “how to add a reference” link; which is to say, if I could do it in 30 minutes while on vacation, anyone can. There is a WYSIWYG option for the editing interface, but I learned to do all this in plain text and it’s still the easiest way for me to edit. See the screenshot below for a view of the HTML editor.

I wondered what entry I would find to add a citation to…there have been so many that I’d come across but now I was drawing a total blank. Happily, the 1Lib1Ref campaign gave some suggestions, including “Provinces of Afghanistan.” Since this is my fatherland, I thought it would be a good service to dive into. Many of Afghanistan’s citations are hard to provide for a multitude of reasons. A lot of our history has been an oral tradition. Also, not insignificantly, Afghanistan has been in conflict for a very long time, with much of its history captured from the lens of Great Game participants like England or Russia. Primary sources from the 20th century are difficult to come by because of the state of war from 1979 onwards and there are not many digitization efforts underway to capture what there is available (shout out to NYU and the Afghanistan Digital Library project).

Once I found a source that I thought would be an appropriate reference for a statement on the topography of Uruzgan Province, I did need to edit the sentence to remove the numeric values that had been written, since I could not find a source that quantified the area. It’s not a precise entry, to be honest, but it does give the opportunity to link to a good map with other opportunities to find additional information related to Afghanistan’s agriculture. I also wanted to choose something relatively uncontroversial, like geographical features rather than historical or person-based topics, for this particular campaign.

— Yasmeen Shorish

Edited area delineated by red box.


The Library as Research Partner

As I typed the title for this post, I couldn’t help but think “Well, yeah. What else would the library be?” Instead of changing the title, however, I want to actually unpack what we mean when we say “research partner,” especially in the context of research data management support. In the most traditional sense, libraries provide materials and space that support the research endeavor, whether it be in the physical form (books, special collections materials, study carrels) or the virtual (digital collections, online exhibits, electronic resources). Moreover, librarians are frequently involved in aiding researchers as they navigate those spaces and materials. This aid is often at the information seeking stage, when researchers have difficulty tracking down references, or need expert help formulating search strategies. Libraries and librarians have less often been involved at the most upstream point in the research process: the start of the experimental design or research question. As one considers the role of the Library in the scholarly life-cycle, one should consider the ways in which the Library can be a partner with other stakeholders in that life-cycle. With respect to research data management, what is the appropriate role for the Library?

In order to achieve effective research data management (RDM), planning for the life-cycle of the data should occur before any data are actually collected. In circumstances where there is a grant application requirement that triggers a call to the Library for data management plan (DMP) assistance, this may be possible. But why are researchers calling the Library? Ostensibly, it is because the Library has marketed itself (read: its people) as an expert in the domain of data management. It has most likely done this in coordination with the Research Office on campus. Even more likely, it did this because no one else was. It may have done this as a response to the National Science Foundation (NSF) DMP requirement in 2011, or it may have just started doing this because of perceived need on campus, or because it seems like the thing to do (which can lead to poorly executed hiring practices). But unlike monographic collecting or electronic resource acquisition, comprehensive RDM requires much more coordination with partners outside the Library.

Steven Van Tuyl has written about the common coordination model of the Library, the Research Office, and Central Computing with respect to RDM services. The Research Office has expertise in compliance and Central Computing can provide technical infrastructure, but he posits that there could be more effective partners in the RDM game than the Library. That perhaps the Library is only there because no one else was stepping up when DMP mandates came down. Perhaps enough time has passed, and RDM and data services have evolved enough that the Library doesn’t have to fill that void any longer. Perhaps the Library is actually the *wrong* partner in the model. If we acknowledge that communities of practice drive change, and intentional RDM is a change for many of the researchers, then wouldn’t ceding this work to the communities of practice be the most effective way to stimulate long lasting change? The Library has planted some starter seeds within departments and now the departments could go forth and carry the practice forward, right?

Well, yes. That would be ideal for many aspects of RDM. I personally would very much like to see the intentional planning for, and management of, research data more seamlessly integrated into standard experimental methodology. But I don’t think that by accomplishing that, the Library should be removed as a research partner in the data services model. I say this for two reasons:

  1. The data/information landscape is still changing. In addition to the fact that more funders are requiring DMPs, more research can benefit from using openly available (and well described – please make it understandable) data. While researchers are experts in their domain, the Library is still the expert in the information game. At its simplest, data sources are another information source. The Library has always been there to help researchers find sources; this is another facet of that aid. More holistically, the Library is increasingly positioning itself to be an advocate for effective scholarly communication at all points of the scholarship life-cycle. This is a logical move as the products of scholarship take on more diverse and “nontraditional” forms.

Some may propose that librarians who have cultivated RDM expertise can still provide data seeking services, but perhaps they should not reside in the Library. Would it not be better to have them collocated with the researchers in the college or department? Truly embedded in the local environment? I think this is a very interesting model that I have heard some large institutions may want to explore more fully. But I think my second point is a reason to explore this option with some caution:

2. Preservation and access. Libraries are the experts in the preservation and access of materials. Central Computing is a critical institutional partner in terms of infrastructure and determining institutional needs for storage, porting, computing power, and bandwidth but – in my experience – is happy to let the long-term preservation and access service fall to another entity. Libraries (and archives) have been leading the development of digital preservation best practices for some time now, with keen attention to complex objects. While not all institutions can provide repository services for research data, the Library perspective and expertise are important to have at the table. Moreover, because the Library is a discipline-agnostic entity, librarians may be able to imagine diverse interest in research data more easily than the data producer can. This can increase the potential vehicles for data sharing, depending on the discipline.

Yes, RDM and data services are reaching a place of maturity in academic institutions where many Libraries are evaluating, or re-evaluating, their role as a research partner. While many researchers and departments may be taking a more proactive or interested position with RDM, it is not appropriate for Libraries to be removed from the coordinated work that is required. Libraries should assert their expertise, while recognizing the expertise of other partners, in order to determine effective outreach strategies and resource needs. Above all, Libraries must set scope for this work. Do not be deterred by the increased interest from other campus entities to join in this work. Rather, embrace that interest and determine how we all can support and strengthen the partnerships that facilitate the innovative and exciting research and scholarship at an institution.


Data, data everywhere…but do we want to drink?

The role of data, digital curation, and scholarly communication in academic libraries.

Ask around and you’ll hear that data is the new bacon (or turkey bacon, in my case. Sorry, vegetarians). It’s the hot thing that everyone wants a piece of. It is another medium with which we interact and from which we derive meaning. It is information[1]; potentially valuable and abundant. But much like [turkey] bacon, un-moderated gorging, without balance or diversity of content, can raise blood pressure and give you a heart attack. To understand how best to interact with the data landscape, it is important to look beyond it.

What do academic libraries need to know about data? A lot, but in order to separate the signal from the noise, it is imperative to look at the entire environment. To do this, one can look to job postings as a measure of engagement. The data curation positions, research data services departments, and data management specializations focus almost exclusively on digital data. However, these positions, which are often catch-alls for many other things, do not place the data management and curation activities within the larger frame of digital curation, let alone scholarly communication. Missing from job descriptions is an awareness of digital preservation or archival theory as it relates to data management or curation. In some cases, this omission could be because a fully staffed digital collections department has purview over these areas. Nonetheless, it is important to articulate the need to communicate with those stakeholders in the job description. It may be said that if the job ad discusses data curation, digital preservation should be an assumed skill, yet, given the tendency to have these positions “do-all-the-things,” it is negligent not to explicitly mention it.

Digital curation is an area that has wide appeal for those working in academic and research libraries. The ACRL Digital Curation Interest Group (DCIG) has one of the largest memberships within ACRL, with 1075 members as of March 2015. The interest group was intentionally named “digital curation” rather than “data curation” because the founders (Patricia Hswe and Marisa Ramirez) understood the interconnectivity of the domains and that the work in one area, like archives, could influence the work in another, like data management. For example, the work from Digital POWRR can help inform digital collection platform decisions or workflows, including data repository concerns. This Big Tent philosophy can help frame the data conversations within libraries in a holistic, unified manner, where the various library stakeholders work collaboratively to meet the needs of the community.

The absence of a holistic approach to data can result in a propensity to separate data from the corpus of information for which librarians already provide stewardship. Academic libraries may recognize the need to provide leadership in the area of data management, but balk when asked to consider data a special collection or to ingest data into the institutional repository. While librarians should be working to help the campus community become critical users and responsible producers of data, the library institution must empower that work by recognizing this as an extension of the scholarly communication guidance currently in place. This means that academic libraries must incorporate the work of data information literacy into their existing information literacy and scholarly communication missions, or else risk excluding these data librarian positions from the natural cohort of colleagues doing that work, or risk overextending the work of the library.

This overextension is most obvious in the positions that seek a librarian to do instruction in data management, reference, and outreach, and also provide expertise in all areas of data analysis, statistics, visualization, and other data manipulation. There are some academic libraries where this level of support is reasonable, given the mission, focus, and resourcing of the specific institution. However, considering the diversity of scope across academic libraries, I am skeptical that the prevalence of job ads that describe this suite of services is justified. Most “general” science librarians would scoff if a job ad asked for experience with interpreting spectra. The science librarian should know where to direct the person who needs help with reading the spectra, or finding comparative spectra, but it should not be a core competency to have expertise in that domain. Yet experience with SPSS, R, Python, statistics and statistical literacy, and/or data visualization software finds its way into librarian position descriptions, some more specialized than others.

For some institutions this is not an overextension, but just an extension of the suite of specialized services offered, and that is well and good. My concern is that academic libraries, feeling the rush of an approved line for all things data, begin to think this is a normal role for a librarian. Do not mistake me: I do not write from the perspective that libraries should not evolve services or that librarians should not develop specialized areas of expertise. Rather, I raise a concern that too often these extensions are made without the strategic planning and commitment from the institution to fully support the work that this would entail.

Framing data management and curation within the construct of scholarly communication, and its intersections with information literacy, creates the opportunity to build more of this content delivery across the organization, enfranchising all librarians in the conversation. A team approach can help with sustainability and message penetration, and moves the organization away from the single-position skill and knowledge-sink trap. Subject expertise is critical in the fast-moving realm of data management and curation, but it is an expertise that can be shared and that must be strategically supported. For example, with sufficient cross-training, liaison librarians can work with their constituents to advise on meeting federal data sharing requirements, without requiring an immediate punt to the “data person” in the library (if such a person exists). In cases where there is no data point person, creating a data working group is a good approach to distribute across the organization both the knowledge and the responsibility for seeking out additional information.

Data specialization cuts across disciplinary bounds and concerns both public services and technical services. It is no easy task, but I posit that institutions must take a simultaneously expansive yet well-scoped approach to data engagement – mindful of the larger context of digital curation and scholarly communication, while limiting responsibilities to those most appropriate for a particular institution.

[1] Lest the “data-information-knowledge-wisdom” hierarchy (DIKW) torpedo the rest of this post, let me encourage readers to allow for an expansive definition of data. One that allows for the discrete bits of data that have no meaning without context, such as a series of numbers in a .csv file, and the data that is described and organized, such as those exact same numbers in a .csv file, but with column and row descriptors and perhaps an associated data dictionary file. Undoubtedly, the second .csv file is more useful and could be classified as information, but most people will continue to call it data.

Yasmeen Shorish is assistant professor and Physical & Life Sciences librarian at James Madison University. She is a past convener of the ACRL Digital Curation Interest Group, and her research focuses on data information literacy and scholarly communication.