Data Librarianship: A Path and an Ethic

Vicky Steeves is the Librarian for Research Data Management and Reproducibility at New York University – a dual appointment between NYU’s Division of Libraries and Center for Data Science. Vicky contributes to ReproZip, is a co-founder of LIS Scholarship Archive, and developed Women Working in Openness – an effort initiated by April Hathcock. Vicky holds a BS in Computer Science and an MS in Library and Information Science from Simmons College.  

Thomas: I take it that you have a dual appointment in the libraries and an external center. Can you tell us more about your current work? Is your work novel? Might it suggest a model?

Vicky: Yes, I have a dual appointment between the Libraries and the Center for Data Science. It’s exciting because I can flex my computer science muscles, working half time with the Center for Data Science supervised by one of the most badass women in computing around, Juliana Freire (first female chair of ACM’s Special Interest Group on Management of Data). I contribute to tools like ReproZip that make an immediate impact on researchers. Working with Juliana, Remi Rampin, and Fernando Chirigati, I learn a lot about reproducibility, with a particular emphasis on computational reproducibility (for an introduction, check out Ben Marwick’s How Computers Broke Science). This work challenges me to think of scholarship more holistically, not just as an article and accompanying data, but as research materials and computational environment.

The other half of my time at NYU is spent building the libraries’ data management service with the wonderful Nicholas Wolf. We teach classes in the library and embedded classes for faculty upon request. We also build collections, create resources, and provide consultations for the NYU community. Nick and I only teach with freely available open source tools. This was a purposeful choice. We want students to be able to take what we teach them and use it whether or not they are at NYU. We presented this model at the 2016 LITA Forum.

I am certainly not the first librarian to have a dual appointment with an outside institute but I think the responsibilities of my job are novel. My job description explicitly requires me to support reproducibility initiatives. I receive a lot of questions from colleagues at other institutions about my role – my successes and failures. I wrote Reproducibility Librarianship, a “from the field” report, for Collaborative Librarianship that describes my day-to-day. The report is meant to be a resource for colleagues who want to fight for resources to support openness, reproducibility, and data management at their institutions. I think having well-resourced staff supporting reproducibility is important for enhancing and preserving the scholarly record.

Thomas: While you have a background in computer science and information science, I’d venture a guess that understanding of these areas doesn’t immediately resolve to some of your current areas of focus. Could you tell us about the path that took you to a career in libraries with this particular area of focus?

Vicky: I knew I wanted to be a librarian when I started college. I went into undergrad assuming I would major in English, thinking it was the best path into librarianship. Nanette Veilleux, my advisor, convinced me that I should take at least one Computer Science course, and that it might be more beneficial to librarianship than an English degree. I took one class with her, and I was hooked. The same professor approached me later that year and asked if I wanted to be “Student Zero” of the newly formed 3 + 1 program. This program would have me complete my Computer Science (CS) degree in three years, and my Library and Information Science (LIS) degree in one year. I jumped on the opportunity, as paying for college was tough. I had to work four jobs on top of school all four years, so I was grateful for the chance to pay less, finish up early, and get going in my chosen field.

After I finished up my LIS degree and started looking for work, I saw the National Digital Stewardship Residency (NDSR) opportunity on the Simmons job list. I thought it would be a valuable chance to get more hands-on experience with digital preservation. I applied to NDSR NY and got into my first pick host institution – the American Museum of Natural History.

This was my first run-in with research data. My project entailed interviewing all the curators and some of their staff and students to better understand their data storage, curation, and preservation needs. In the course of conducting these interviews I ended up answering questions from researchers like: “Why is the library doing this?”, “Isn’t this an IT job?”, and “What do you mean by data documentation?” In the course of answering I realized that I was explaining what archivists would call pre-custodial intervention: “acting to influence the arrangement, description, and appraisal of the materials by the creators before they are transferred to [a] repository” (Tatum 2010). Getting documentation together, migrating digital materials to open and preservable formats, making sure materials are stored and backed up securely: all of this was just digital preservation basics. I was explaining archiving and digital preservation to researchers and calling it research data management. It was a big lightbulb-over-the-head moment for me. I took the results of these interviews and recommended strategies for preserving digital assets.

Thomas: Where do you think the Computer Science degree factors into your work, past and present, if at all?

Vicky: Well, I’d begin by highlighting the fact that very few librarians in the reproducibility and/or data management space have or need a computer science degree. For me, the most helpful part of my CS degree was the emphasis on learning how to learn technology instead of focusing on a few specific tools or programming languages. I’ve been able to adapt to different tools very well because of this and that has been immensely helpful.

My digital preservation classes introduced fundamental issues in managing and preserving all types of digital materials. The CS degree helped me understand the more technical aspects, e.g. how operating systems work, how file formats work, and so forth. I think computing or coding is seen as magical sometimes, and it’s really not.

My CS degree required two philosophy classes and one ethics class. These were huge in shaping my professional identity. Challenging and interesting questions were presented about privacy, sharing, and privilege that are important for any information professional. How are our systems privileging some users/patrons over others? How are we protecting user/patron data?

When I did my first degree, there were only two full-time computer science professors at Simmons. I graduated with just five other people. We were a very tight-knit group. During graduation we all took The Pledge of the Computing Professional together and received a pin from the faculty that says ‘HONOR’ in ASCII.

It ended up being my second tattoo! The lines from the oath that have stuck with me are:

My work as a Computing Professional affects people’s lives, both now and into the future.

As a result, I bear moral and ethical responsibilities to society.

Thomas: Can you talk about a specific situation where your professional ethics came into play? As you thought about a way to work through that situation were there other people or examples of work that inspired you?

Vicky: It affects my day-to-day: how I approach building services, how I recommend resources to patrons, how I do my research, what I choose to spend time on. One side project I decided to work on was the Women Working in Openness site. The website itself is open source and uses CC0 self-reported data. It’s basically a searchable, sortable list of women who do work in the field of openness: open access, open science, open scholarship, open source code, open data, open educational resources – anything open. The list started on April Hathcock’s Google Doc. I just transformed it into a list and a map on the web to encourage folks to quit it with the all-male panels on openness.

Women Working in Openness

When I started building services at NYU, I chose to only support freely available open source tools in my service area. This choice is guided by my ethics and is meant to help undermine lock-in with exorbitantly priced academic tools. Thinking back on The Pledge of the Computer Professional, this choice was made thinking about the students who come to my classes and workshops. Overwhelmingly I see graduate students and I really don’t think it’s right to train them on software that costs upwards of $200 for a standard license. Just because NYU has it isn’t a good enough reason. The students will leave someday, lose access (most likely), and have to learn something freely available anyway. It’s an ethical and an efficient choice to use FOSS tools.

I think a lot about the corporate capture of the scholarly record, and how my work in data management and reproducibility can either contribute to or disrupt that. With the rise of reproducibility as a buzzword, there are plenty of commercial entities ready to profit from so-called ‘reproducibility platforms’. This represents yet another corporate capture of scholarship. I try to disrupt this by advocating for community-run, open source software for reproducibility, such as ReproZip (which I work on), o2r, and Binder. The same goes for data management platforms. We’re seeing a lot of new data services springing up from major publishers and this is also something I am actively trying to combat.


With respect to ethics more broadly I often refer to Dorothea Salo, April Hathcock, Amy Buckland, Jeff Spies, Megan Wacha, Sara Mannheimer, Jessica Schomberg, and Yasmeen Shorish. Some have been influential in writing, others in face-to-face meetings and talks, and others I go to directly when in an ethical crisis myself.

Thomas: Lastly, whose work would you like people to know more about?

Vicky: In addition to all the folks I listed above:

Shirley Zhao, the Data Science Librarian at Eccles Health Sciences Library at the University of Utah, is doing some excellent work around community building in data management and reproducibility. She’s currently running a course for librarians on data management and reproducibility via the National Library of Medicine. She also helps organize events like the Research Reproducibility Conference at the University of Utah, and short courses like Principles and Practices for Reproducible Science.

Cynthia Hudson-Vitale, the Data Services Coordinator and Research Transparency Librarian in Data & GIS Services at Washington University in St. Louis (WU) Libraries, focuses her work on community infrastructure for scholarly communication and data curation. She is a part of the core SHARE team as well as the Data Curation Network. Her work in providing open, public-goods infrastructure will help keep the scholarly record in the hands of researchers.

As for other data or data-adjacent librarians, there are too many doing great work to possibly name. I especially follow the work of Jenny Muilenburg at the University of Washington, Kristin Briney at the University of Wisconsin, Amy Riegelman at the University of Minnesota, Renaine Julian at Florida State University, and Natalie Meyers at the University of Notre Dame & the Center for Open Science. But again, there are many folks doing excellent work around open infrastructure and databrarianship! I recommend readers follow the #datalibs hashtag on Twitter to find more of us and engage there.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Thomas Padilla

Thomas Padilla is Visiting Digital Research Services Librarian at the University of Nevada Las Vegas. He publishes, presents, and teaches widely on digital scholarship, digital collections, Humanities data, data curation, and data information literacy. He is Principal Investigator of the Institute of Museum and Library Services-supported Collections as Data. Thomas is a member of the Association for Computers and the Humanities Executive Council (2017-2021), the Global Outlook::Digital Humanities Executive Council, the Integrating digital humanities into the web of scholarship with SHARE Advisory Board, and the ARL Fellowship for Digital and Inclusive Excellence Advisory Group.