Data Refuge and the Role of LibrariesPosted: February 1, 2017 | Author: Yasmeen Shorish | Filed under: academic librarianship, data, hacking, library, technology, web | Leave a comment »
Society is always changing. For some, the change can seem slow and frustrating, while others may feel as though the change occurred in a blink of an eye. What is this change that I speak of? It can be anything…civil rights, autonomous cars, or national leaders. One change that no one ever seems particularly prepared for, however, is when a website link becomes broken. One day, you could click a link and get to a site and the next day you get a 404 error. Sometimes this occurs because a site was migrated to a new server and the link was not redirected. Sometimes this occurs because the owner ceased to maintain the site. And sometimes, this occurs for less benign reasons.
Information access via the Internet is an activity that many (but not all) of us do everyday, in sometimes unconscious fashion: checking the weather, reading email, receiving news alerts. We also use the Internet to make datasets and other sources of information widely available. Individuals, universities, corporations, and governments share data and information in this way. In the Obama administration, the Open Government Initiative led to the development of Project Open Data and data.gov. Federal agencies started looking at ways to make information sharing easier, especially in areas where the data are unique.
One area of unique data is in climate science. Since climate data is captured on a specific day, time, and under certain conditions, it can never be truly reproduced. It will never be January XX, 2017 again. With these constraints, climate data can be thought of as fragile. The copies that we have are the only records that we have. Much of our nation’s climate data has been captured by research groups at institutes, universities, and government labs and agencies. During the election, much of the rhetoric from Donald Trump was rooted in the belief that climate change is a hoax. Upon his election, Trump tapped Scott Pruitt, who has fought much of the EPA’s attempts to regulate pollution, to lead the EPA. This, along with other messages from the new administration, has raised alarms within the scientific community that the United States may repeat the actions of the Harper administration in Canada, which literally threw away thousands of items from federal libraries that were deemed outside scope, through a process that was criticized as not transparent.
In an effort to safeguard and preserve this data, the Penn Program of Environmental Humanities (PPEH) helped organize a collaborative project called Data Refuge. This project requires the expertise of scientists, librarians, archivists, and programmers to organize, document, and back-up data that is distributed across federal agencies’ websites. Maintaining the integrity of the data, while ensuring the re-usability of it, are paramount concerns and areas where librarians and archivists must work hand in glove with the programmers (sometimes one and the same) who are writing the code to pull, duplicate, and push content. Wired magazine recently covered one of the Data Refuge events and detailed the way that the group worked together, while much of the process is driven by individual actions.
In order to capture as much of this data as possible, the Data Refuge project relies on groups of people organizing around this topic across the country. The PPEH site details the requirements to host a successful DataRescue event and has a Toolkit to help promote and document the event. There is also a survey that you can use to nominate climate or environmental data to be part of the Data Refuge. Not in a position to organize an event? Don’t like people? You can also work on your own! An interesting observation from the work on your own page is the option to nominate any “downloadable data that is vulnerable and valuable.” This means that Internet Archive and the End of Term Harvest Team (a project to preserve government websites from the Obama administration) is interested in any data that you have reason to believe may be in jeopardy under the current administration.
A quick note about politics. Politics are messy and it can seem odd that people are organizing in this way, when administrations change every four or eight years and, when there is a party change in the presidency, it is almost a certainty that there will be major departures in policy and prioritizations from administration to administration. What is important to recognize is that our data holdings are increasingly solely digital, and therefore fragile. The positions on issues like climate, environment, civil rights, and many, many others are so diametrically opposite from the Obama to Trump offices, that we – the public – have no assurances that the data will be retained or made widely available for sharing. This administration speaks of “alternative facts” and “disagree[ing] with the facts” and this makes people charged with preserving facts wary.
Many questions about the sustainability and longevity of the project remain. Will End of Term or Data Refuge be able to/need to expand the scope of these DataRescue efforts? How much resourcing can people donate to these events? What is the role of institutions in these efforts? This is a fantastic way for libraries to build partnerships with entities across campus and across a community, but some may view the political nature of these actions as incongruous with the library mission.
I would argue that policies and political actions are not inert abstractions. There is a difference between promoting a political party and calling attention to policies that are in conflict with human rights and freedom to information. Loathe as I am to make this comparison, would anyone truly claim that burning books is protected political speech, and that opposing such burning is “playing politics?” Yet, these were the actions of a political party – in living memory – hosted at university towns across Germany. Considering the initial attempt to silence the USDA and the temporary freeze on the EPA, libraries should strongly support the efforts of PPEH, Data Refuge, End of Term, and concerned citizens across the country.