DH Box and Access in the Digital Humanities

Stephen Zweibel is Digital Scholarship Librarian at the Graduate Center, CUNY. Patrick Smyth is a doctoral student in English and Digital Fellow, also at the Graduate Center. Both work on the NEH-funded project, DH Box, designed to help faculty integrate digital humanities tools into their curriculum.

Patrick: So we both work on DH Box, but could you briefly describe the project for our readers?

Stephen: We call it a computer lab in the cloud, but it’s really a web app that ties into a tool called Docker. Users create an account, and when they sign in, a new DH Box instance is created: a kind of virtual computer that has a number of programming and digital humanities relevant tools installed on it, built individually for the person who signed up for it. So they can play with, say, IPython Notebook or RStudio, or Omeka, or a number of other tools, or run scripts, or break it, or do whatever they want. It’s a cloud computer that you get personal access to, that has DH tools already provided for you.
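As a rough illustration of the "one instance per user" idea Stephen describes, here is a minimal sketch of how a web app might assemble a `docker run` command for each account. It is not DH Box's actual code: the image name, the port scheme, and the helper names `container_name` and `docker_run_command` are all assumptions made for the example. Only the `docker run` flags (`-d`, `--name`, `-p`) are real Docker CLI options.

```python
import re

# Hypothetical values: the interview does not say which image or ports DH Box uses.
IMAGE = "example/dhbox:latest"
BASE_PORT = 8000


def container_name(username: str) -> str:
    """Derive a Docker-safe container name from a username."""
    # Replace anything outside letters, digits, '_', '.', '-' with '-'.
    safe = re.sub(r"[^a-zA-Z0-9_.-]", "-", username)
    return f"dhbox-{safe}"


def docker_run_command(username: str, user_index: int) -> list[str]:
    """Build (but do not run) the command that starts one user's container."""
    # Give each user a distinct host port mapped to port 80 in the container.
    host_port = BASE_PORT + user_index
    return [
        "docker", "run",
        "-d",                               # run in the background
        "--name", container_name(username),  # one named container per user
        "-p", f"{host_port}:80",            # host_port -> container port 80
        IMAGE,
    ]
```

On a machine with Docker installed, the returned list could be handed to `subprocess.run(cmd, check=True)` at sign-in. Building the command separately from running it keeps the sketch easy to inspect without a Docker daemon.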

Patrick: How did DH Box get started? Where did the project originate?

Stephen: At the time, I was a Health Sciences librarian at Hunter College, and was teaching workshops to other librarians at different CUNY campuses. I realized that it was a pain to have to install all these tools every time, and found that it was extremely frustrating to deal with the different IT policies and the different procurement requirements and the different infrastructures that were at each campus, and it was a huge barrier to teaching these kinds of skills. I had been trying to learn programming myself and was playing with all of these different types of tools like IPython Notebooks and RStudio. I was also made aware of equipment like the Raspberry Pi, and I thought it would be super cool to just deploy Raspberry Pis for everyone, so that everyone could have a little computer lab of their own.

So we started with Raspberry Pis, and then found that most people would rather click a button on the internet than buy a bunch of hardware (even if it’s cheap) and assemble it. And so we turned to the cloud.

And that’s where you came in.

Patrick: Right. In 2014, I joined the Digital Fellows, a small group of graduate students that works on digital projects at the CUNY Graduate Center, under Matt Gold and Lisa Rhody. That first summer I joined, Matt sent an email saying, hey, do any of the Fellows want to get involved with this project?

And when I came to the first meeting and heard what it was about, I recall being a little excited. I had just started playing around with Python and I was very much a novice, but I was struggling with things like package management, and where’s the REPL (read-eval-print loop)? And what’s the difference between the command line and Python? All of these very elementary things that you struggle with at the beginning when you’re learning.

Stephen: So that’s a question that has come up a few times, whether the installation of these tools is a necessary pain. The worry is that if we move further in the direction DH Box is going, our students will miss out on some essential learning by skipping the installation.

Patrick: Something that’s often brought up in DH is this idea of experimentation and play, like Stephen Ramsay’s “The Hermeneutics of Screwing Around.” I think this is part of what DH Box allows you to do, in that it reduces some of these transaction costs. The gap in difficulty between setting up DH Box and setting up a whole MALLET environment is pretty big, and you could even just use our demo to ask these questions, like “What can MALLET do?” or “What does IPython look like?” And then you know if you want to install it or not.

So I think there’s a case to be made for doing things the “hard” way, where you learn to configure everything yourself from scratch. But there are also a lot of use cases that involve skipping over the process of installation, either to test something or just to get people excited about DH.

Stephen: Sure. There is also a social justice aspect to it. We talk about accessibility and creating an on-ramp for people to use and learn these kinds of tools, engage with computational literacy. We work in the CUNY system, and it’s not always easy to find the resources of a fully kitted out computer lab, especially in the humanities departments, and especially the undergrad humanities departments. DH Box can serve as a bridge for anybody who has any kind of computer, and a browser, to use these tools that are difficult to find your way into. It gives you a friendly entrance. That’s the thing we talk about, it’s a way to get people to play with and experiment with these kinds of tools, and see whether or not they mean something to them, see whether or not they resonate with the kind of research they want to do, the kind of learning they want to engage in.

Patrick: Exactly! There are all these great open source tools out there now, but if you don’t know how to use a text editor or the command line, many are just too hard to get started with.

I was trying to explain this to someone the other day in terms of an anecdote. My dad is a carpenter and he builds all these houses, and so growing up we actually lived in a house that he had built himself. But for quite a long time, there were no doorknobs on any of the doors.

Stephen: Hmm!

Patrick: And I think that was because, as somebody who really enjoyed working on houses and building things, he’d done all of the stuff that was interesting for him. The house was built as far as he was concerned, but the…

Stephen: …the usability…

Patrick: …or getting rid of the pile of gravel outside. [laughs] Or putting doorknobs on. That just wasn’t interesting from the perspective of a builder or a creator. And I feel like I see that so often with programming, or even with myself now that I’ve gotten deeper into programming, that usually what draws you to a project is a problem, either your own problem you want to solve or there’s some interesting turn of the problem that intrigues you, and once you’ve solved the problem for yourself or you’ve solved that portion of the problem that drew you in, you tend to lose interest. So that’s the last mile problem, that the internet is filled with all these scripts that work for the person who created them, but they didn’t take the time to include the UI or the documentation or whatever it would take for an ordinary person to make use of it.

DH Box is a project that doesn’t do that, it’s a project that actually cares about the last mile. I appreciate that about working on DH Box.

Stephen: And getting to the last mile…what—or who—is the last mile?

Patrick: We’re building for someone at an institution or a library or a class who could set it up for a group of students. That’s a somewhat technical user, because DH Box still requires a certain amount of configuration to set it up for a group, and then everyone in that group can be an “end user” and just use it, without any particular technical knowledge.

Stephen: What would you say you’ve gotten out of working on DH Box, as a graduate student and as a researcher?

Patrick: I feel like I’ve gotten a ton out of working on this project. And I think this is an interesting thing about digital humanities projects in general: many of them tend to be very experimental and I feel like they really help people who work on them to grow as scholars, or as technologists. When I started working on DH Box, I’d done some technology consulting work before my Ph.D., and that was useful when working on the initial NEH grant. But I really hadn’t done much in the way of really technical work, like working with Python or working with web apps, or knowing about ports and protocols. And stuff like Linux, I use Linux all the time now, it’s one of my go-tos for…

Stephen: It is your go-to.

Patrick: [laughs] It is my go-to. And honestly it was really working on DH Box that brought me to that, because in order to work on DH Box, which uses Docker, I had to use Linux. I think nowadays you can use Docker on Windows or OS X, but Linux is really what it’s based on, and back then you had to use Linux to use Docker. And getting into Linux made me follow some other rabbit holes, learning about computers in general, learning about other programming languages like Lisp. And so I think that working on DH Box has started a bit of a landslide for me, where I kind of got brought into a whole world that I hadn’t really known existed.

Stephen: Yes…as for me, it’s a similar story, it’s taught me a lot about Docker, and containerization, and ports, and all that, which is all very useful for a lot of library applications; for instance digital scholarship centers should be using Docker, I think, more widely, and they probably will be. But also it’s taught me a lot about user experience and how difficult that is. You can do what you think users will like, and that has almost nothing to do with what users will actually do. UX is so iterative, and it’s certainly not intuitive to me, and it’s something I need to learn a lot more about.

Patrick: What’s next for DH Box?

Stephen: Well, we’re continuing work on it, we’re finding individual groups that are looking to do interesting things with the platform and building along with them. And we’ll see how it goes, and perhaps we’ll apply for another grant or two. We just had two great team members join us, Jojo Karlin for outreach and Jonathan Reeve at Columbia who is working on a corpus downloader. So we’re still full steam ahead for the project.


This work is licensed under a Creative Commons Attribution 4.0 International License.