A few members of Tech Connect attended the recent Code4Lib 2018 conference in Washington, DC. If you missed it, the full livestream of the conference is on the Code4Lib YouTube channel. We wanted to highlight some of our favorite talks and tie them into the work we’re doing.
Also, it’s worth pointing to the Code4Lib community’s Statement in Support of opening keynote speaker Chris Bourg. Chris offered some hard truths in her speech that angry men on the internet, predictably, were unhappy about, but it’s a great model that the conference organizers and attendees promptly stood in support.
One of my favorite talks at Code4lib this year was Amy Wickner’s talk, “Web Archiving and You / Web Archiving and Us.” (Video, slides) I felt this talk really captured some of the essence of what I love most about Code4lib, this being my 4th conference in the past 5 years. (And I believe this was Amy’s first!). This talk was about a technical topic relevant to collecting libraries and handled in a way that acknowledges and prioritizes the essential personal component of any technical endeavor. This is what I found so wonderful about Amy’s talk and this is what I find so refreshing about Code4lib as an inherently technical conference with intentionality behind the human aspects of it.
Web archiving seems to be something of interest but seemingly overwhelming to begin to tackle. I mean, the internet is just so big. Amy brought forth a sort of proposal for ways in which a person or institution can begin thinking about how to start a web archiving project, focusing first on the significance of appraisal. Wickner, citing Terry Cook, spoke of the “care and feeding of archives” and thinking about appraisal as storytelling. I think this is a great way to make a big internet seem smaller, understanding the importance of care in appraisal while acknowledging that for web archiving, it is an essential practice. Representation in web archives is more likely to be chosen in the appraisal of web materials than in other formats historically.
This statement resonated with me: “Much of the power that archivists wield are in how we describe or create metadata that tells a story of a collection and its subjects.”
And also: For web archives, “the narrative of how they are built is closely tied to the stories they tell and how they represent the world.”
Wickner went on to discuss how web archives are and will be used, and who they will be used by, giving some examples but emphasizing there are many more, noting that we must learn to “critically read as much as learn to critically build” web archives, while acknowledging web archives exist both within and outside of institutions. And that for personal archiving, it can be as simple as replacing links in documents with perma.cc, Wayback Machine links, or WebRecorder links.
Another topic I enjoyed in this talk was the celebration of precarious web content through community storytelling on Twitter with the hashtags #VinesWithoutVines and #GifHistory, two brief but joyous moments.
The part of this year’s Code4Lib conference that I found most interesting was the talks and the discussion at a breakout session related to machine learning and deep learning. Machine learning is a subfield of artificial intelligence and deep learning is a kind of machine learning that utilizes hidden layers between the input layer and the output layer in order to refine and produce the algorithm that best represents the result in the output. Once such algorithm is produced from the data in the training set, it can be applied to a new set of data to predict results. Deep learning has been making waves in many fields such as Go playing, autonomous driving, and radiology to name a few. There were a few different talks on this topic ranging from reference chat sentiment analysis to feature detection (such as railroads) in the map data using the convolutional neural network model.
“Deep Learning for Libraries” presented by Lauren Di Monte and Nilesh Patil from University of Rochester was the most practical one among those talks as it started with a specific problem to solve and resulted in action that will address the problem. In their talk, Di Monte and Patil showed how they applied deep learning techniques to solve a problem in their library’s space assessment. The problem that they wanted to solve is to find out how many people visit the library to use the library’s space and services and how many people are simply passing through to get to another building or to the campus bus stop that is adjacent to the library. This made it difficult for the library to decide on the appropriate staffing level or the hours that best serve the users’ needs. It also prevented the library from showing the library’s reach and impact based upon the data and advocate for needed resources or budget to the decision-makers on the campus. The goal of their project was to develop automated and scalable methods for conducting space assessment and reporting tools that support decision-making for operations, service design, and service delivery.
For this project, they chose an area bounded by four smart control access gates on the first floor. They obtained the log files (with the data at the sensor level minute by minute) from the eight bi-directional sensors on those gates. They analyzed the data in order to create a recurrent neural network model. They trained the algorithm using this model, so that they can predict the future incoming and the outgoing traffic in that area and visually present those findings as a data dashboard application. For data preparation, processing, and modeling, they used Python. The tools used included Seaborn, Matplotlib, Pandas, NumPy, SciPy, TensorFlow, and Keras. They picked the recurrent neural network with stochastic gradient descent optimization, which is less complex than the time series model. For data visualization, they used Tableau. The project code is available at the library’s GitHub repo: https://github.com/URRCL/predicting_visitors.
Their project result led to the library to install six more gates in order to get a better overview of the library space usage. As a side benefit, the library was also able to pinpoint the times when the gates malfunctioned and communicate the issue with the gate vendor. Di Monte and Patil plan to hand over this project to the library’s assessment team for ongoing monitoring and to look for ways to map the library’s traffic flow across multiple buildings as the next step.
Overall, there were a lot of interests in machine learning, deep learning, and artificial intelligence at the Code4Lib conference this year. The breakout session I led at the conference on these topics produced a lively discussion on a variety of tools, current and future projects for many different libraries, as well as the impact of rapidly developing AI technologies on society. This breakout session also generated #ai-dl-ml channel in the Code4Lib Slack Space. The growing interests in these areas are also shown in the newly formed Machine and Deep Learning Research Interest Group of the Library and Information Technology Association. I hope to see more talks and discussion on these topics in the future Code4Lib and other library technology conferences.
One of the talks which struck me the most this year was Matthew Reidsma’s Auditing Algorithms. He used examples of search suggestions in the Summon discovery layer to show biased and inaccurate results:
In 2015 my colleague Jeffrey Daniels showed me the Summon search results for his go-to search: “Stress in the workplace.” Jeff likes this search because ‘stress’ is a common engineering term as well as one common to psychology and the social sciences. The search demonstrates how well a system handles word proximities, and in this regard, Summon did well. There are no apparent results for evaluating bridge design. But Summon’s Topic Explorer, the right-hand sidebar that provides contextual information about the topic you are searching for, had an issue. It suggested that Jeff’s search for “stress in the workplace” was really a search about women in the workforce. Implying that stress at work was caused, perhaps, by women.
This sort of work is not, for me, novel or groundbreaking. Rather, it was so important to hear because of its relation to similar issues I’ve been reading about since library school. From the bias present in Library of Congress subject headings where “Homosexuality” used to be filed under “Sexual deviance”, to Safiya Noble’s work on the algorithmic bias of major search engines like Google where her queries for the term “black girls” yielded pornographic results; our systems are not neutral but reify the existing power relations of our society. They reflect the dominant, oppressive forces that constructed them. I contrast LC subject headings and Google search suggestions intentionally; this problem is as old as the organization of information itself. Whether we use hierarchical, browsable classifications developed by experts or estimated proximities generated by an AI with massive amounts of user data at its disposal, there will be oppressive misrepresentations if we don’t work to prevent them.
Reidsma’s work engaged with algorithmic bias in a way that I found relatable since I manage a discovery layer. The talk made me want to immediately implement his recording script in our instance so I can start looking for and reporting problematic results. It also touched on some of what despairs me in library work lately—our reliance on vendors and their proprietary black boxes. We’ve had a number of issues lately related to full-text linking that are confusing for end users and make me feel powerless. I submit support ticket after support ticket only to be told there’s no timeline for the fix.
On a happier note, there were many other talks at Code4Lib that I enjoyed and admired: Chris Bourg gave a rousing opening keynote featuring a rallying cry against mansplaining; Andreas Orphanides, who keynoted last year’s conference, gave yet another great talk on design and systems theory full of illuminating examples; Jason Thomale’s introduction to Pycallnumber wowed me and gave me a new tool I immediately planned to use; Becky Yoose navigated the tricky balance between using data to improve services and upholding our duty to protect patron privacy. I fear I’ve not mentioned many more excellent talks but I don’t want to ramble any further. Suffice to say, I always find Code4Lib worthwhile and this year was no exception.