Reflections on Code4Lib 2018

A few members of Tech Connect attended the recent Code4Lib 2018 conference in Washington, DC. If you missed it, the full livestream of the conference is on the Code4Lib YouTube channel. We wanted to highlight some of our favorite talks and tie them into the work we’re doing.

Also, it’s worth pointing to the Code4Lib community’s Statement in Support of opening keynote speaker Chris Bourg. Chris offered some hard truths in her speech that angry men on the internet, predictably, were unhappy about; the prompt show of support from conference organizers and attendees is a great model.


Ashley:

One of my favorite talks at Code4lib this year was Amy Wickner’s talk, “Web Archiving and You / Web Archiving and Us.” (Video, slides) I felt this talk really captured some of the essence of what I love most about Code4lib, this being my 4th conference in the past 5 years. (And I believe this was Amy’s first!) The talk addressed a technical topic relevant to collecting libraries and handled it in a way that acknowledges and prioritizes the essential personal component of any technical endeavor. This is what I found so wonderful about Amy’s talk, and it is what I find so refreshing about Code4lib: an inherently technical conference with intentionality behind the human aspects of it.

Web archiving is of interest to many but can seem overwhelming to begin to tackle. I mean, the internet is just so big. Amy brought forth a proposal for ways in which a person or institution can begin thinking about how to start a web archiving project, focusing first on the significance of appraisal. Wickner, citing Terry Cook, spoke of the “care and feeding of archives” and of thinking about appraisal as storytelling. I think this is a great way to make a big internet seem smaller: understanding the importance of care in appraisal while acknowledging that, for web archiving, it is an essential practice. Representation in web archives is more deliberately chosen through the appraisal of web materials than it has historically been for other formats.

This statement resonated with me: “Much of the power that archivists wield are in how we describe or create metadata that tells a story of a collection and its subjects.”

And also: For web archives, “the narrative of how they are built is closely tied to the stories they tell and how they represent the world.”

Wickner went on to discuss how web archives are and will be used, and who will use them, giving some examples while emphasizing that there are many more. We must learn to “critically read as much as learn to critically build” web archives, while acknowledging that web archives exist both within and outside of institutions. And personal archiving can be as simple as replacing links in documents with perma.cc, Wayback Machine, or Webrecorder links.
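If you want to try that last suggestion programmatically, here is a minimal sketch (mine, not Wickner’s) that asks the Internet Archive’s public Save Page Now endpoint to capture a URL so you can cite the snapshot instead of the fragile live link. The endpoint is real, but its rate limits and redirect behavior can change, so treat this as illustrative:

```python
# Minimal sketch: capture a URL in the Wayback Machine via Save Page Now.
import requests

def archive_url(url: str) -> str:
    """Ask the Wayback Machine to capture `url`; return the snapshot URL."""
    resp = requests.get(f"https://web.archive.org/save/{url}", timeout=120)
    resp.raise_for_status()
    # requests follows the redirect to the new snapshot, so the final URL
    # is the archived copy you can substitute into your document.
    return resp.url

print(archive_url("https://example.com"))
```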

Another topic I enjoyed in this talk was the celebration of precarious web content through community storytelling on Twitter with the hashtags #VinesWithoutVines and #GifHistory, two brief but joyous moments.


Bohyun:

The part of this year’s Code4Lib conference that I found most interesting was the talks and the discussion at a breakout session related to machine learning and deep learning. Machine learning is a subfield of artificial intelligence, and deep learning is a kind of machine learning that utilizes hidden layers between the input layer and the output layer in order to refine and produce the algorithm that best represents the result in the output. Once such an algorithm is produced from the data in the training set, it can be applied to a new set of data to predict results. Deep learning has been making waves in many fields, such as Go playing, autonomous driving, and radiology. There were a few different talks on this topic, ranging from reference chat sentiment analysis to feature detection (such as railroads) in map data using a convolutional neural network model.
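As a toy illustration of that definition (my own sketch, not from any of the talks), here is a minimal Keras network with hidden layers between the input and output layers, trained on a synthetic training set and then applied to new data:

```python
# Toy illustration of "hidden layers between input and output":
# train on a synthetic training set, then predict on unseen data.
import numpy as np
from tensorflow import keras

X_train = np.random.rand(500, 8)                 # 8 input features
y_train = (X_train.sum(axis=1) > 4).astype(int)  # binary label

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),    # hidden layer
    keras.layers.Dense(16, activation="relu"),    # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X_train, y_train, epochs=5, verbose=0)

X_new = np.random.rand(3, 8)   # "a new set of data"
print(model.predict(X_new))    # predicted probabilities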

“Deep Learning for Libraries,” presented by Lauren Di Monte and Nilesh Patil from the University of Rochester, was the most practical of those talks, as it started with a specific problem to solve and resulted in action that will address the problem. In their talk, Di Monte and Patil showed how they applied deep learning techniques to a problem in their library’s space assessment. They wanted to find out how many people visit the library to use its space and services and how many are simply passing through to get to another building or to the campus bus stop adjacent to the library. Without that distinction, it was difficult for the library to decide on the appropriate staffing level or the hours that best serve users’ needs. It also prevented the library from demonstrating its reach and impact with data and advocating for needed resources or budget to decision-makers on campus. The goal of their project was to develop automated and scalable methods for conducting space assessment, along with reporting tools that support decision-making for operations, service design, and service delivery.

For this project, they chose an area bounded by four smart control access gates on the first floor. They obtained the log files (with data at the sensor level, minute by minute) from the eight bi-directional sensors on those gates and used that data to train a recurrent neural network model that predicts future incoming and outgoing traffic in the area, presenting those findings visually in a data dashboard application. For data preparation, processing, and modeling, they used Python; the tools included Seaborn, Matplotlib, Pandas, NumPy, SciPy, TensorFlow, and Keras. They picked a recurrent neural network with stochastic gradient descent optimization, which is less complex than a time series model. For data visualization, they used Tableau. The project code is available at the library’s GitHub repo: https://github.com/URRCL/predicting_visitors.
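Their actual code is in the repo above; purely as a hedged sketch of the general shape of such a pipeline (the window size, layer sizes, and data below are invented for illustration), a recurrent model with SGD optimization in Keras might look like this:

```python
# Sketch of an RNN traffic predictor; NOT the Rochester team's code.
import numpy as np
from tensorflow import keras

WINDOW = 24    # time steps of history per sample (assumed)
FEATURES = 2   # incoming and outgoing counts per step (assumed)

# Stand-in for the gate sensor logs: (samples, time steps, features).
X = np.random.rand(1000, WINDOW, FEATURES)
y = np.random.rand(1000, FEATURES)   # next step's in/out counts

model = keras.Sequential([
    keras.Input(shape=(WINDOW, FEATURES)),
    keras.layers.SimpleRNN(32),      # recurrent layer
    keras.layers.Dense(FEATURES),    # predicted in/out counts
])

# The talk specifically mentioned stochastic gradient descent.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Predict the next interval's traffic from the most recent window.
print(model.predict(X[-1:]))
```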

The project’s results led the library to install six more gates in order to get a better overview of library space usage. As a side benefit, the library was also able to pinpoint the times when the gates malfunctioned and communicate the issue to the gate vendor. Di Monte and Patil plan to hand the project over to the library’s assessment team for ongoing monitoring and, as a next step, to look for ways to map the library’s traffic flow across multiple buildings.

Overall, there was a lot of interest in machine learning, deep learning, and artificial intelligence at the Code4Lib conference this year. The breakout session I led at the conference on these topics produced a lively discussion on a variety of tools, current and future projects at many different libraries, and the impact of rapidly developing AI technologies on society. The breakout session also generated the #ai-dl-ml channel in the Code4Lib Slack space. The growing interest in these areas is also shown in the newly formed Machine and Deep Learning Research Interest Group of the Library and Information Technology Association. I hope to see more talks and discussion on these topics at future Code4Lib and other library technology conferences.


Eric:

One of the talks that struck me the most this year was Matthew Reidsma’s Auditing Algorithms. He used examples of search suggestions in the Summon discovery layer to show biased and inaccurate results:

In 2015 my colleague Jeffrey Daniels showed me the Summon search results for his go-to search: “Stress in the workplace.” Jeff likes this search because ‘stress’ is a common engineering term as well as one common to psychology and the social sciences. The search demonstrates how well a system handles word proximities, and in this regard, Summon did well. There are no apparent results for evaluating bridge design. But Summon’s Topic Explorer, the right-hand sidebar that provides contextual information about the topic you are searching for, had an issue. It suggested that Jeff’s search for “stress in the workplace” was really a search about women in the workforce. Implying that stress at work was caused, perhaps, by women.

This sort of work is not, for me, novel or groundbreaking. Rather, it was so important to hear because of its relation to similar issues I’ve been reading about since library school. From the bias present in Library of Congress subject headings, where “Homosexuality” used to be filed under “Sexual deviance,” to Safiya Noble’s work on the algorithmic bias of major search engines like Google, where her queries for the term “black girls” yielded pornographic results: our systems are not neutral but reify the existing power relations of our society. They reflect the dominant, oppressive forces that constructed them. I contrast LC subject headings and Google search suggestions intentionally; this problem is as old as the organization of information itself. Whether we use hierarchical, browsable classifications developed by experts or estimated proximities generated by an AI with massive amounts of user data at its disposal, there will be oppressive misrepresentations if we don’t work to prevent them.

Reidsma’s work engaged with algorithmic bias in a way that I found relatable since I manage a discovery layer. The talk made me want to immediately implement his recording script in our instance so I can start looking for and reporting problematic results. It also touched on some of what has been disheartening in library work lately: our reliance on vendors and their proprietary black boxes. We’ve had a number of issues lately related to full-text linking that are confusing for end users and make me feel powerless. I submit support ticket after support ticket only to be told there’s no timeline for a fix.
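For anyone wanting to try something similar, here is a hedged sketch of that kind of audit, not Reidsma’s actual script: it logs a discovery layer’s topic suggestions for a fixed list of queries so that drift and bias become visible over time. The suggestion endpoint below is a hypothetical placeholder; substitute whatever API your discovery layer actually exposes:

```python
# Hypothetical suggestion-auditing sketch; the endpoint and response
# shape are placeholders, not a real vendor API.
import csv
import datetime
import requests

SUGGEST_API = "https://discovery.example.edu/api/suggest"  # hypothetical
QUERIES = ["stress in the workplace", "vaccines", "climate change"]

with open("topic_suggestions.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for q in QUERIES:
        resp = requests.get(SUGGEST_API, params={"q": q}, timeout=30)
        resp.raise_for_status()
        # Record each suggestion with a timestamp so results can be
        # compared run over run and problematic mappings reported.
        for suggestion in resp.json().get("suggestions", []):
            writer.writerow([datetime.datetime.now().isoformat(), q, suggestion])
```

Run it on a schedule (cron, for instance) and review the CSV for suggestions that misrepresent the query.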

On a happier note, there were many other talks at Code4Lib that I enjoyed and admired: Chris Bourg gave a rousing opening keynote featuring a rallying cry against mansplaining; Andreas Orphanides, who keynoted last year’s conference, gave yet another great talk on design and systems theory, full of illuminating examples; Jason Thomale’s introduction to Pycallnumber wowed me and gave me a new tool I immediately planned to use; and Becky Yoose navigated the tricky balance between using data to improve services and upholding our duty to protect patron privacy. I fear I’ve left out many more excellent talks, but I don’t want to ramble any further. Suffice it to say, I always find Code4Lib worthwhile, and this year was no exception.

Net Neutrality Roundup: Alternate Internets

Now that we are facing net neutrality regulation rollbacks here in the United States, what new roles could librarians play in the continued struggle to provide people with unrestricted access to information? ALA has long been dedicated to equal access to information, as clearly outlined in both the Core Values and Code of Ethics. You can read ALA’s Joint Letter to the FCC here. It emphasizes that “a non-neutral net, in which commercial providers can pay for enhanced transmission that libraries and higher education cannot afford, endangers our institutions’ ability to meet our educational mission.”

Net neutrality was discussed back in 2014 on this blog, in Margaret Heller’s post “What Should Academic Librarians Know about Net Neutrality?” We recommend you start there for background on the legal issues around net neutrality. It includes a fun trip into the physical spaces our content traverses to get onto our screens. One of the conclusions of that post was that libraries need to work on ensuring that everyone has access to broadband networks in the first place, and that more varied access ensures that no company has a monopoly over internet service in a location. There have been a number of projects along these lines over the past decade and more, and we encourage you to find one in your area and get involved.

Library-based initiatives

Equal access to information starts with having access at all. Several libraries have kicked off initiatives like loaning out wi-fi hotspots for several-month periods in New York City, Brooklyn, and Chicago.

Ideally, everyone will have secure and private internet access. The Library Freedom Project has been working for years to protect the privacy of patrons, including educating librarians about the threat of surveillance in modern digital technology, working with the Tor Project to configure Tor exit relays in library systems, and creating educational resources for teaching patrons about privacy.

These are some excellent steps towards more democratic and equal access to information, but what happens if the internet as we know it fundamentally changes? Let’s explore some “alternative internets” that rely on municipal and/or grassroots solutions.

Mesh networks

You might be familiar with wireless mesh networks for home use. You can set one up in your own house to ensure even coverage: since each node covers a certain part of the house, you don’t have to rely on how close you are to the wireless router to connect. You can also change the network around easily as your needs change.

Mesh networks are dynamically routed networks, wireless or wired, in which nodes exchange routes with their neighbors and share internet and local network access. A mesh network may not be purely a “mesh” but rather a combination of mesh technology and “point to point” links, with long-distance connections joining nodes directly and each of those nodes fanning out into its own local mesh. BMX6/BMX7, BATMAN, and Babel are some of the most popular network protocols (with highly memorable names!) for achieving a broad mesh network, but there are many more. Just as you can install devices in your home, you can cooperate with others in your community or region to create your own network. The LibreMesh project is an example of the way DIY wireless networks are being created in several European countries.
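Real mesh routing protocols like BATMAN or Babel are far more sophisticated, but a toy sketch conveys the core resilience idea: every node knows its neighbors, and traffic simply re-routes around a failed node. The topology below is invented for illustration:

```python
# Toy mesh-routing illustration: find a path to the gateway, then
# find another when a node goes down. Not a real routing protocol.
from collections import deque

LINKS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "gateway"},
    "gateway": {"D"},
}

def route(src, dst, down=frozenset()):
    """Breadth-first search for a path from src to dst, skipping failed nodes."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(LINKS[path[-1]] - seen - down):
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(route("A", "gateway"))              # ['A', 'B', 'D', 'gateway']
print(route("A", "gateway", down={"B"}))  # ['A', 'C', 'D', 'gateway']
```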

Municipal networks

Nineteen towns in Colorado are exploring alternatives like publicly owned internet service. Chattanooga offers public gigabit internet. This has some major advantages for the city, including the ability to offer free internet access to low-income residents and to ensure that anyone who pays for access gets the same level of access, which is not the case in most cities, where some areas pay a high cost for a low signal. Even just the presence and availability of municipal broadband “has radically altered the way local politicians and many ordinary Chattanoogans conceive of the Internet. They have come to think of it as a right rather than a luxury.”1 A similar initiative in Roanoke is the Roanoke Valley Broadband Authority, which in an interesting twist lobbied the Virginia legislature to reduce oversight of its activities in a bill that originally specifically stated that broadband services should focus on underserved areas, a reminder that in many ways municipalities view this as an investment in business rather than a social justice issue.2 In Detroit, the Detroit Community Technology Project is working to bring community wireless to Detroit neighborhoods. New York City’s Red Hook neighborhood relied on its mesh network during Hurricane Sandy to stay connected to the world outside New York. New York City also has the rapidly growing NYC Mesh community, with two supernodes and another coming later this year, uniting lower Manhattan with Northern and Central Brooklyn. Toronto also has an emerging mesh community with a handful of connected nodes. The Urbana-Champaign Independent Media Center developed CUWiN, which provided open wireless networks in “Champaign-Urbana, Homer, Illinois, tribal lands of the Mesa Grande Reservation, and the townships of South Africa”.3

Outside of North America, Berlin has its own mesh network, called Freifunk. Austria has Funkfeuer. Greece has the Athens Wireless Metropolitan Network, Italy has Ninux, and Argentina has AlterMundi. Villages in rural northern England are joining together to get connected via a cooperative model called B4RN, digging their own trenches for cables using their farm tractors.

Thinking big

Guifi.net is a wi-fi network that covers a large part of Spain and defines itself as “the biggest free, open and neutral network.” It was developed in 2004 in response to the lack of broadband internet in rural areas of the Catalonia region, where commercial providers offered either no connection or a very poor one. Guifi has established a Wireless Commons License, guidelines that can be adopted by other networks. At the time of posting, 34,306 nodes were active, with over 17,000 more planned.

Finally, Brooklyn Public Library was granted $50,000 from IMLS to develop a mesh network called BKLYN Link, along with a technology fellowship program for 18-24 year olds. We look forward to what emerges from this initiative!

Conclusion

The internet was started when college campuses connected to each other, first across short geographic distances and eventually much longer ones. Could we see academic and public libraries working together and leading a return to old ways of accessing the internet for a new era?

Meanwhile, it’s important to ensure that the FCC has appropriate regulatory powers over ISPs, otherwise we have no recourse if companies choose to prioritize packets. You should contact your legislators and make sure that the people at your campus who work with the government are sharing their perspectives as well. You can get some help with a letter to Congress from ALA.

Memory Labs and audiovisual digitization workflows with Lorena Ramírez-López

Hello! I’m Ashley Blewer, and I’ve recently joined the ACRL TechConnect blogging team. For my first post, I wanted to interview Lorena Ramírez-López. Lorena is working (among other places) at the D.C. Public Library on their Memory Lab initiative, which we will discuss below. Although this upcoming project targets public libraries, Lorena has a history of dedication to providing open technical workflows and documentation to support any library’s mission to set up similar “digitization stations.”

Hi Lorena! Can you please introduce yourself?

Hi! I’m Lorena Ramírez-López. I am a born and raised New Yorker from Queens. I went to New York University for Cinema Studies and Spanish, where I did an honors thesis on Paraguayan cinema with regard to sound theory. I continued my education at NYU and graduated from the Moving Image Archiving and Preservation program, where I concentrated on video and digital preservation. I was one of the National Digital Stewardship Residents for the American Archive of Public Broadcasting, doing my residency at the Howard University television station (WHUT) in Washington, D.C. from 2016 until June 2017. Along with being the project manager for the Memory Lab Network, I do contract work for the National Portrait Gallery on their time-based media artworks, have joined the Women Who Code community, and teach Spanish at Fluent City!


Tell us a little bit about DCPL’s Memory Lab and your role in it.

The DC Public Library’s Memory Lab was a National Digital Stewardship project from 2014 through 2015. It was the baby of DCPL’s National Digital Stewardship Resident, Jaime Mears, back in the day. A lot of my knowledge of how it started comes from reading the original project proposal, which you can find on the Library of Congress’s website, as well as Jaime Mears’s final report on the Memory Lab, found on the DC Library website. But to summarize its origin story: the Memory Lab was created as a local response to the fact that communities are generating a lot of digital content while still keeping many of their physical materials, like VHS, miniDVs, and photos, but might not necessarily have the equipment or knowledge to preserve their content. It has been widely accepted in the archival and preservation fields that we have an approximate 15- to 20-year window of opportunity to digitally preserve legacy audio and video recordings on magnetic tape because of the rate of degradation and the obsolescence of playback equipment. The term “video at risk” might ring a bell for some people. There are also photographs and film, particularly color slides and negatives and moving image film formats, that will fade and degrade over time. People want to save their memories as well as share them on digital platforms.

There are well-established best practices for digital preservation in archival practice, but these guidelines and documentation are generally written for a professional audience. And while there are various personal digital archiving resources for a public audience, they aren’t easy to find on the web, and a lot of them aren’t updated to reflect the changes in our technology, software, and habits.

That being the case, our communities risk massive loss of history and culture! And to quote Gabriela Redwine’s Digital Preservation Coalition report, “personal digital archives are important not just because of their potential value to future scholars, but because they are important to the people who created them.”

So the Memory Lab was the library’s local response in the Washington, D.C. area: bridging this gap in digital archiving knowledge and providing the tools and resources for library patrons to digitize their own personal content.

My role is maintaining the memory lab (the digitization rack). When hardware gets worn down or breaks, I fix it. When the software on our computers upgrades to newer systems, I update our workflows.

I am currently re-doing the website to reflect the new wiring I did and updating the instructions with more explanations and images. You can expect gifs!


You recently received funding from IMLS to create a Memory Lab Network. Can you tell us more about that?

Yes! The DC Public Library, in partnership with the Public Library Association, received a national leadership grant to expand the memory lab model.

During this project, the Memory Lab Network will partner with seven public libraries across the United States. Our partners will receive training, mentoring, and financial support to develop their own memory labs, as well as programs for their library patrons and communities to digitize and preserve their personal and family collections. A lot of focus is put on the digitization rack, mostly because it’s cool, but the memory lab model is not just about creating a digitization rack. It’s also about developing classes and online resources that help the community understand that digital preservation doesn’t end with digitizing analog formats.

By creating these memory labs, these libraries will help bridge the digital preservation divide between the professional archival community and the public community. But first we have to train and help the libraries set up the memory lab, which is why we are providing travel grants to Washington, D.C. for an in-depth digital preservation bootcamp and training for these seven partners.

If anyone wants to read the proposal, the Institute of Museum and Library Services has it here.


What are the goals of the Memory Lab Network, and how do you see this making an impact on the overall library field (outside of just the selected libraries)?

One of the main goals is to see how well the memory lab model holds up. The memory lab was a local response to a need, but it was always meant to be replicated. This funding is our chance to see how we can adapt and improve the model for other public libraries, not just our own urban library in Washington, D.C.

There are actually many institutions and organizations that have digitization stations and/or the knowledge and resources, but we just don’t realize who they are. Sometimes it feels like we keep reinventing the wheel with digital preservation. There are plenty of websites that at one time had current information on digital preservation and links to articles and other explanations. Then those websites weren’t sustained and remained stagnant, housing a series of broken links and lost PDFs. We could (and should) be better about not just creating new resources, but updating the ones we have.

The reasons why some organizations aren’t transparent or don’t update their information, or why we aren’t searching in certain places, vary, but we should be better at documenting our information and sharing it with our archival and public communities. That is why the other goal is to create a network for better communication and sharing.


What advice do you have for librarians thinking of setting up their own digitization stations? How can someone learn more about aspects of audiovisual preservation on the job?

If you are thinking of setting up your own digitization station, announce that not only to your local community but also to the larger archival community. Tell us about this amazing adventure you’re about to tackle. Let us know if you need help! Circulate and cite that article you thought was super helpful. Try to communicate not only your successes but also your problems and failures.

We need to be better at documenting and sharing what we’re doing, especially when dealing with how to handle and repair playback decks for magnetic media. Beyond the fact that the companies simply stopped supporting this equipment, a lot of the information on how to maintain and repair it could have been shared or passed down by really knowledgeable experts, but it wasn’t. Now we’re all holding our breath and pulling our hair out because the one dude who repairs U-matic decks is thinking about retiring. This lack of information and communication shouldn’t be the case in an environment where we can email and call.

We tend to freak out about audiovisual preservation because we see how other professional institutions set up their workflows and the amount of equipment they have. The great advantage libraries have is that not only can they act practically with their resources, but they also have the best kind of feedback to learn from: library patrons. We’re creating these memory lab models for the general public, so gathering practical experience, feedback, and concerns is a great way to learn which aspects of audiovisual preservation really need to be fleshed out and researched.

And for fun, try creating and archiving your own audiovisual media! You technically already do this when you take photos and videos on your phone. Getting to know your equipment and where your media goes is very helpful.


Thanks very much, Lorena!

For more information on how to set up a “digitization station” at your library, I recommend Dorothea Salo’s robust website detailing how to build an “audio/video or digital data rescue kit”, available here.