The Tools Behind Doge Decimal Classification

Four months ago, I made Doge Decimal Classification, a little website that translates Dewey Decimal class names into “Doge” speak à la the meme. It’s a silly site, for sure, but I took the opportunity to learn several new tools. This post will briefly detail each and provide resources for learning more.

Node & NPM

Because I’m curious about the framework, I chose to write Doge Decimal Classification in Node.js. Node is a relatively recent programming framework that expands the capabilities of JavaScript; rather than running in a browser, Node lets you write server-side software or command line utilities.

For Doge Decimal Classification (henceforth just DDC because, let’s face it, it’s pretty much obsoleted Dewey at this point), I wanted to write two pieces: a module and a website. A module is a reusable piece of code which provides some functionality; in this case, doge-speak Dewey class names. My website is a consumer of that module: it takes the module’s output and dresses it up in Comic Sans and bright colors. By keeping the two pieces separate, I can work on them independently and perhaps reuse the DDC module elsewhere, for instance as a command-line tool.

For the module, I needed to write an NPM package, which turned out to be fairly straightforward. NPM is Node’s package manager; it provides a central repository for thousands of modules. Skipping over the boring part where I write code that actually does something, an NPM package is defined by a package.json file that contains metadata. That metadata tells NPM how users can find, install, and use my module. You can read my module’s package.json and some of its meaning may be self-evident, but here are a few pieces worth explaining:

  • the main field determines what the entry point of my module is, so when another app uses it (which, in Node, is done via a line like “var ddc = require('dogedc');”) Node knows which file to load and execute
  • the dependencies field lists any packages which my package uses (it just so happens I didn’t use any) while devDependencies contains packages which anyone who wanted to work on developing the dogedc module itself would need (e.g. to run tests or automate other tedious tasks)
  • the repository field tells NPM what version control software I’m using and where to download the source code for my package (in this case, on GitHub)

Once I had my running code and package.json file in place, all I needed to do was run npm publish inside my project and it was published on NPM for anyone to install. Magic! When I update dogedc, either by adding new features or squashing bugs, all I need to do is run npm publish once again.

One final nicety of NPM that’s worth describing is the npm link command. While I’m working on my module, I want to be able to use it in the website, but also develop new features and fiddle with it in other contexts. It doesn’t make much sense to repeatedly install it from NPM everywhere that I use it, especially when I’m debugging new code which I don’t want to publish yet. Instead, npm link tells NPM to use my local copy of dogedc as if it were installed globally from NPM. This lets me preview the experience of someone consuming the latest version of my module without actually pushing that version out to the world.

Learn more: What is Node.js & why do I care?, How to create a NodeJS NPM package, NPM Developer Guide, & How do I get started with Node.js

Doge-ification

Now for perhaps the only amusing part of this post: how I went about translating Dewey Decimal classes into doge. The doge meme consists almost entirely of two-word phrases along the lines of “Much zombies. Such death. So amaze” (to quote from a Walking Dead-themed meme). The only common one-word phrase is “wow”, which is sprinkled liberally in most Doge images.

Reading through a few memes, the approach to turning a class name into doge becomes clear: split the name into a series of two-word doge phrases with a random adjective inserted before each noun. So “Extraterrestrial Worlds” (999) becomes something like “Many extraterrestrial. Much worlds.” Because the meme ignores most grammar conventions, we don’t need to worry about pairing count and noncount nouns with the appropriate adjectives. So, for instance, even though “worlds” is a count noun (you can say “one world, two worlds”), we can use the noncount adjective “much” to modify it.

There are a few more steps we need to take to create proper doge phrases. First of all, how do we know what adjectives to use? After viewing a bunch of memes, I decided on a list of the most commonly used ones: many, much, so, such, and very. Secondly, our simple procedure might work on “Extraterrestrial Worlds” just fine, but what about “Library and Information Sciences”? The resulting “Many library. Much and. So information. Such sciences.” doesn’t look quite right. We need to strip out small words like conjunctions because they’re not used in doge.

So the final algorithm resembles:

  • Make the entire class name lowercase
  • Strip out stop words and punctuation
  • Split the string into an array on spaces (in JavaScript this is “string.split(" ")”)
  • Loop over the array, each time adding a Doge word in front of the current word and then a period
  • Flip a coin to decide whether or not to add “Wow” on the end when we’re done
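The steps above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the actual dogedc source. The stop word list is a hypothetical stand-in, and punctuation stripping is simplified to keeping only lowercase ASCII letters and whitespace. The random function is a parameter so the output can be made predictable.

```javascript
// Hypothetical stop word list; the real dogedc module may strip different words.
const STOP_WORDS = new Set(["a", "an", "and", "for", "in", "of", "on", "or", "the", "to"]);
// The doge adjectives named above.
const DOGE_WORDS = ["many", "much", "so", "such", "very"];

// Turn a Dewey class name into doge speak. `random` defaults to Math.random
// but can be swapped out (e.g. in tests) to get repeatable output.
function dogeify(className, random = Math.random) {
  const words = className
    .toLowerCase()
    .replace(/[^a-z\s]/g, "") // simplified: drop anything that isn't a letter or whitespace
    .split(" ")
    .filter(word => word !== "" && !STOP_WORDS.has(word));

  const phrases = words.map(word => {
    const adjective = DOGE_WORDS[Math.floor(random() * DOGE_WORDS.length)];
    // Capitalize the adjective so each two-word phrase reads like a sentence.
    return adjective[0].toUpperCase() + adjective.slice(1) + " " + word + ".";
  });

  // Flip a coin for the closing "Wow".
  if (random() < 0.5) phrases.push("Wow.");
  return phrases.join(" ");
}

console.log(dogeify("Library and Information Sciences"));
```

Passing a fixed function such as `() => 0` in place of `Math.random` makes the output deterministic, which is handy for unit tests.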

There are still a few weaknesses here. Most notably, if we use an unassigned class number like 992 the phrase “Not assigned or no longer used” comes out terribly (“Much not. Many assigned. So no. Much longer. Very used.”) simply because it’s a more traditional sentence, not a noun-laden class name. To write this algorithm in a robust way, one would have to incorporate natural language processing to identify which words should be stripped out, or perhaps be more thoughtful about which adjectives modify which nouns. That seems like a fun project for another day, but it was too much for my DDC module.

Learn more: The article A Linguist Explains the Grammar of Doge. Wow was useful in understanding doge.

Express

With my module in hand, I chose Express as the foundation of my site because it’s the most popular web framework for Node right now. Express is similar to Ruby on Rails or Python’s Django.[1] Express isn’t a CMS like WordPress or Drupal; you can’t just install it and have a blog running in a few minutes. Instead, Express provides tools and conventions for writing a web app. It takes care of some of the busywork involved but does not do everything for you.

To get started with Express, you can generate a basic template for a site with a command line tool courtesy of NPM:

$ # first we install this tool globally (the -g flag)
$ npm install -g express-generator
$ # creates a ddc folder & puts the site template in it
$ express ddc

   create : ddc
   create : ddc/package.json
   create : ddc/app.js
   …a whole bunch more of this…

   install dependencies:
     $ cd ddc && npm install

   run the app:
     $ DEBUG=ddc ./bin/www
$ # just following the instructions above…
$ cd ddc && npm install

Since the Doge Decimal Classification site is simplistic, the initial template that Express provides created about 80% of the final code for the site. Mostly by looking through the generated project’s structure, I figured out what I needed to change and where I should make edits, such as the stark CSS that comes with a new Express site.

There’s a lot to Express so I won’t cover the framework in detail, but to give a short example of how it works I’ll discuss routing. Routing is the practice of determining what content to serve when someone visits a particular URL. Express organizes routing into two parts, one of which occurs in an “app.js” file and the other inside a “routes” directory. Here’s an example:

app.js:

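// in app.js, load the route handlers defined in routes/index.js
// (app itself is the Express application created earlier in app.js)
var routes = require("./routes");

// tell Express which routes to use for certain requests.
// Below means "for GET requests to the web site's root, use index"
app.get("/", routes.index);
// for GET requests to any other single-segment path (e.g. /020),
// use the "number" route; ":number" becomes a special token I can use later
app.get("/:number", routes.number);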

routes/index.js:

// inside my routes/index.js file, I define the routes themselves.
// The "request" and "response" parameters below
// correspond to the user's request and what I choose
// to send back to them
exports.index = function(request, response){
    response.render("index", {
        // a random number from 0 to 999
        number: Math.floor(Math.random() * 1000)
    });
};

exports.number = function(request, response){
    // here request.params.number is that ":number"
    // from app.js above
    response.render("index", {
        number: request.params.number
    });
};

Routing is how information from a user’s request (such as visiting dogedc.herokuapp.com/020) gets passed into my website and used to generate specific content. I use it to render the class number that someone puts in the URL, but you can imagine better use cases, like delivering a user’s profile with a route like “app.get('/users/:user', routes.profile)”. There’s another piece here in how that “number” information gets used to generate HTML, but we’ve talked about Express enough.
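To make the “:number” token a little less magical, here is a toy, dependency-free sketch of how a route pattern can be matched against a URL path and its parameters pulled out. It illustrates the idea only; it is not how Express implements routing internally. The route table and the handler strings are hypothetical.

```javascript
// Toy illustration of parameterized routing; NOT how Express works internally.
// Returns an object of extracted parameters, or null if the path doesn't match.
function matchRoute(pattern, path) {
  const patternParts = pattern.split("/").filter(Boolean);
  const pathParts = path.split("/").filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.startsWith(":")) {
      // A ":name" segment captures whatever is in that position of the URL.
      params[part.slice(1)] = decodeURIComponent(pathParts[i]);
    } else if (part !== pathParts[i]) {
      return null;
    }
  }
  return params;
}

// Hypothetical route table mirroring the app.js example; routes are tried
// in the order they were registered and the first match wins.
const routeTable = [
  { pattern: "/", handler: () => "index: random class number" },
  { pattern: "/:number", handler: params => "class " + params.number },
  { pattern: "/users/:user", handler: params => "profile for " + params.user },
];

function dispatch(path) {
  for (const route of routeTable) {
    const params = matchRoute(route.pattern, path);
    if (params !== null) return route.handler(params);
  }
  return "404 Not Found";
}

console.log(dispatch("/020"));         // → class 020
console.log(dispatch("/users/shibe")); // → profile for shibe
```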

Learn more: The Dead-Simple Step-by-Step Guide for Front-End Developers to Getting Up and Running with Node.JS, Express, Jade, and MongoDB takes you through a more involved example of using Express with the popular MongoDB database. Introduction to Express by the ever-excellent NetTuts is also a decent intro.[2]

Social Media <meta>s

In my website, I also included Facebook Open Graph and Twitter Card metadata using HTML meta tags. These give me greater control over how Facebook, Twitter, and other social media platforms present DDC. For instance, someone could share a link to the site on Twitter without including my username. With Twitter Cards, I can associate the site with my account, which causes my name to display beside it. In general, the link looks better; an image and tagline I specify also appear. Compare the two tweets in the screenshot below, one which links to a Tumblr (which doesn’t have my <meta> tags on it) and one where Twitter Cards are in effect:

[Screenshot: two tweets about DDC]
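As a rough sketch of what those tags look like, the helper below generates Open Graph and Twitter Card <meta> elements from one config object. The property names (og:title, twitter:card, and so on) are the standard ones. The helper itself, the tagline, the image path and the Twitter handle are hypothetical placeholders, not what the DDC site actually uses.

```javascript
// Escape characters that would otherwise break out of an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build Open Graph (property="og:...") and Twitter Card (name="twitter:...")
// meta tags from one config object.
function socialMetaTags(page) {
  const tags = [
    ["property", "og:title", page.title],
    ["property", "og:description", page.description],
    ["property", "og:url", page.url],
    ["property", "og:image", page.image],
    ["name", "twitter:card", "summary"],
    ["name", "twitter:creator", page.twitterHandle],
    ["name", "twitter:title", page.title],
    ["name", "twitter:description", page.description],
    ["name", "twitter:image", page.image],
  ];
  return tags
    .map(([attribute, key, value]) => `<meta ${attribute}="${key}" content="${escapeAttr(value)}">`)
    .join("\n");
}

// Example values; the description, image path, and handle are placeholders.
console.log(socialMetaTags({
  title: "Doge Decimal Classification",
  description: "Dewey Decimal class names in doge. Wow.",
  url: "http://dogedc.herokuapp.com/",
  image: "http://dogedc.herokuapp.com/images/doge.png",
  twitterHandle: "@your_handle",
}));
```

Open Graph tags use the `property` attribute and Twitter Card tags use `name`. That is why the helper carries the attribute name alongside each key.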

Learn more: Gearing Up Your Sites for Sharing with Twitter & Facebook Meta Tags

Heroku

There’s one final piece to the DDC site: deployment and hosting. I’ve written the web app and thanks to Express I can even run it locally with a simple npm start command from inside its directory. But how do I share it with the world?

Since I chose to write Doge Decimal Classification in Node, my hosting options were limited. Node is a new technology and it’s not available on every web host. If I had really wanted something easy to publish, I would’ve used PHP or pure client-side JavaScript. But I purposefully set out to learn new tools, and that’s in large part why I chose to host my app on Heroku.

Heroku is a popular cloud hosting platform. It’s oriented around a suite of command line tools which push new versions of your website to the cloud, as opposed to a traditional web host which might use FTP, a web-based administrative dashboard like Plesk or cPanel, or SSH to provide access to a remote server. Instead, Heroku builds upon the fact that most developers are already versioning their apps with git. Rather than add another new technology to the stack that you have to learn, Heroku works via a smart integration with git.

Once you have the Heroku command line tools installed, all you need to do is write a short “Procfile” which tells Heroku how to start your app (my Procfile is a single line, “web: npm start”). From there, you can run heroku create to create your app on Heroku and then git push heroku master to push your code live. This makes publishing an app a cinch, and Heroku has a free tier which has suited my simple site just fine. If I wanted to scale up, however, Heroku makes increasing the power of my server a matter of running a single command.
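One detail worth knowing when deploying a Node app this way: Heroku tells each web process which port to listen on through the PORT environment variable, so the app should read it rather than hard-code a port. The bin/www script that the Express generator creates typically handles this already. The sketch below shows the same idea with Node’s built-in http module. The fallback port of 3000 and the function names are assumptions for illustration.

```javascript
const http = require("http");

// Heroku assigns the port via the PORT environment variable; fall back to
// 3000 (an arbitrary choice) when running locally without it.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 3000;
}

// Hypothetical minimal server. An Express app is itself a request handler,
// so it could be passed to http.createServer in place of this function.
function startServer(env = process.env) {
  const port = resolvePort(env);
  const server = http.createServer((request, response) => {
    response.writeHead(200, { "Content-Type": "text/plain" });
    response.end("Wow. Such deploy.\n");
  });
  server.listen(port, () => console.log(`Listening on port ${port}`));
  return server;
}
```

Running `startServer()` locally would listen on 3000, while on Heroku the same code would pick up whatever port the platform assigns.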

Learn more: Getting Started with Heroku; there’s a more specific article for starting a Node project

In Conclusion

There are actually a couple more things I learned in writing Doge Decimal Classification, such as writing unit tests with Nodeunit and using Jade for templates, but this post has gone on long enough. While building DDC, there were many different technologies I could’ve chosen to utilize rather than the ones I selected here. Why Node and Express over Ruby and Sinatra? Because I felt like it. There are entirely too many programming languages, frameworks, and tools for any one person to become familiar with them all. What’s more, I’m not sure when I’ll get a chance to use these tools again, as they’re not common in any of the library technology I work with. But I found using a fun side project to level up in new areas much worthwhile, very enjoy.

Notes

  1. Technically, Rails and Django are more full-featured than Express, which is perhaps more similar to the smaller Sinatra or Flask frameworks for Ruby and Python respectively. The general idea of all these frameworks is the same, though; they provide useful functionality for writing web apps in a particular programming language. Some are just more “batteries included” than others.
  2. There are many more fine Express tutorials out there, but try to stick to recent ones; Express is being developed at a rapid pace and chances are if you run npm install express at the beginning of a tutorial, you’ll end up with a more current version than the one being covered, with subtle but important differences. Be cognizant of that possibility and install the appropriate version by running npm install express@3.3.3 (if 3.3.3 is the version you want).

What Should Academic Librarians Know about Net Neutrality?

John Oliver describes net neutrality as the most boring important issue. More than that, it’s a complex idea that can be difficult to understand without a strong grasp of the architecture of the internet, which is not at all intuitive. An additional barrier to a measured response is that most public discussions about net neutrality conflate it with negotiations over peering agreements (more on that later), which ultimately rest in contracts with unknown terms. The hyperbole surrounding net neutrality may be useful in riling up public sentiment, but the truth seems far more subtle. I want to approach a definition and an understanding of the issues surrounding net neutrality, though this post will only scratch the surface. Despite the technical and legal complexities, this is something worth understanding, since as academic librarians our daily lives and work revolve around internet access for us and for our students.

The most current public debate about net neutrality surrounds the Federal Communications Commission’s (FCC) ability to regulate internet service providers after a January 2014 court decision struck down the FCC’s 2010 Open Internet Order (PDF). The FCC is currently in an open comment period on a new plan to promote and protect the open internet.

The Communications Act of 1934 (PDF) created the FCC to regulate wire and radio communication. It classified phone companies and similar services as “common carriers”, which means that they must be open to all equally. If internet service providers were classified in the same way, this would ensure equal access, but for various reasons they are not considered common carriers, a position the Supreme Court affirmed in 2005. The FCC is now seeking to use section 706 of the 1996 Telecommunications Act (PDF) to regulate internet service providers. Section 706 gave the FCC regulatory authority to expand broadband access, particularly to elementary and high schools, and this piece of it is included in the current rulemaking process.

The legal part of this is confusing to everyone, not least the FCC. We’ll return to that later. But for now, let’s turn our attention to the technical part of net neutrality, starting with one of the most visible spats.

A Tour Through the Internet

I am a Comcast customer for my home internet. Let’s say I want to watch Netflix. How do I get there from my home computer? First, here is the traceroute showing how the request from my computer travels over the physical lines that make up the internet.

C:\Users\MargaretEveryday>tracert netflix.com

Tracing route to netflix.com [69.53.236.17]
over a maximum of 30 hops:

  1     1 ms    <1 ms    <1 ms  10.0.1.1
  2    24 ms    30 ms    37 ms  98.213.176.1
  3    43 ms    40 ms    29 ms  te-0-4-0-17-sur04.chicago302.il.chicago.comcast.net [68.86.115.41]
  4    20 ms    32 ms    36 ms  te-2-6-0-11-ar01.area4.il.chicago.comcast.net [68.86.197.133]
  5    33 ms    30 ms    37 ms  he-3-14-0-0-cr01.350ecermak.il.ibone.comcast.net [68.86.94.125]
  6    27 ms    34 ms    30 ms  pos-1-4-0-0-pe01.350ecermak.il.ibone.comcast.net [68.86.86.162]
  7    30 ms    41 ms    54 ms  chp-edge-01.inet.qwest.net [216.207.8.189]
  8     *        *        *     Request timed out.
  9    73 ms    69 ms    69 ms  63.145.225.58
 10    65 ms    77 ms    96 ms  te1-8.csrt-agg01.prod1.netflix.com [69.53.225.6]
 11    80 ms    81 ms    74 ms  www.netflix.com [69.53.236.17]

Trace complete.

Step 1. My computer sends data to this wireless router, which is hooked to my cable modem, which is wired out to the telephone pole in front of my apartment.

2-4. The cables travel through the city underground, accessed through manholes like this one.

5-6. Eventually my request to go to Netflix makes it to 350 E. Cermak, which is a major colocation and internet exchange site. If you’ve ever taken the shuttle bus at ALA in Chicago, you’ve gone right past this building. Image © 2014 Google.

7-9. Now the request leaves Comcast and goes out to a Tier 1 internet provider, which owns cables that cross the country. In this case, the cables belong to CenturyLink (which recently purchased Qwest).

10. My request has now made it to Grand Forks, ND, where Netflix buys space from Amazon Web Services. All this happened in less than a second. Image © 2014 Google.

Why should Comcast ask Netflix to pay to transmit their data over Comcast’s networks? Understanding this requires a few additional concepts.

Peering

Peering is an important concept in the structure of the internet. Peering is a physical link of hardware to hardware between networks in internet exchanges, which are (as pictured above) huge buildings filled with routers connected to each other.[1] Facebook Peering is an example of a very open peering policy. Companies and internet service providers can use internet exchange centers to plug their equipment together directly, and so make their connections faster and more reliable. For websites such as Facebook, which have an enormous amount of upload and download traffic, it’s well worth the effort for a small internet service provider to peer with Facebook.[2]

Peering relies on some equality of traffic, as the name implies. The various tiers of internet service providers you may have heard of are based on with whom they “peer”. Tier 1 ISPs are large enough that they all peer with each other, and thus form what is usually called the backbone of the internet.

Academic institutions created the internet originally–computer science departments at major universities literally had the switches in their buildings. In the US this was ARPANET, but a variety of networks at academic institutions existed throughout the world. Groups such as Internet2 allow educational, research, and government networks to connect and peer with each other and commercial entities (including Facebook, if the traceroute from my workstation is any indication). Smaller or isolated institutions may rely on a consumer ISP, and what bandwidth is available to them may be limited by geography.

The Last Mile

Consumers, by contrast, are really at the mercy of whatever company dominates in their neighborhoods. Consumers obviously do not have the resources to lay their own fiber optic cables directly to all the websites they use most frequently. They rely on an internet service provider to do the heavy lifting, just as most of us rely on utility companies to get electricity, water, and sewage service (though of course it’s quite possible to live off the grid to a certain extent on all those services depending on where you live). We also don’t build our own roads, and we expect that certain spaces are open for anyone to travel through. This idea of roads open to all, leading from the wider world to arterial streets to local neighborhoods, is thus used as an analogy for the internet: if internet service providers (like phone companies) must be common carriers, this ensures the middle and last miles aren’t jammed.

When Peering Goes Bad

Think about how peering works: it requires a roughly equal amount of traffic being sent and received through peered networks, or at least an amount of traffic to which both parties can agree. This is the problem with Netflix. Unlike big companies such as Facebook, and especially Google, Netflix is not trying to build its own network. It relies on content delivery services and internet backbone providers to get content from its servers (all hosted on Amazon Web Services) to consumers. But Netflix only sends traffic; it doesn’t take traffic, and this is the basis of most of the legal battles going on with internet service providers that serve the “last mile”.

The Netflix/Comcast trouble started in 2010, when Netflix contracted with Level 3 for content delivery. Comcast claimed that Level 3 was relying on a peering relationship that was no longer valid with this increase in traffic, no matter who was sending it. (See this article for complete details.) Level 3, incidentally, accused another Tier 1 provider, Cogent, of overstepping their settlement-free peering agreement back in 2005 and cut Cogent off for a short time, which temporarily cut pieces of the internet off from each other.

Netflix tried various arrangements, but ultimately negotiated with Comcast to pay for direct access to their last mile networks through internet exchanges, one of which is illustrated above in steps 4-6. This seems to be the most reasonable course of action for Netflix to get their outbound content over networks, since they really don’t have the ability to do settlement-free peering. Of course, Reed Hastings, the CEO of Netflix, didn’t see it that way. But in most cases, settlement-free peering is still the only way the internet can actually work, and while we may not see the agreements that make this happen, it won’t be going anywhere. In this case, Comcast was not offering Netflix paid prioritization of its content; it was negotiating over delivery of the content at all. This might seem equally wrong, but someone has to pay for the bandwidth, and why shouldn’t Netflix pay for it?

What Should We Do?

If companies want to connect with each other or build their own network connections, they can do so under whatever terms work best for them. The problem would be if certain companies were using the same lines that everyone was using but their packets got preferential treatment. The imperfect road analogy works well enough for these purposes. When a firetruck, police car, and ambulance are racing through traffic with sirens blazing, we are usually okay with the resulting traffic jam since we can see that an emergency requires that speed. But how do we feel when we suspect a single police car has turned on a siren just to cut in line to get to lunch faster? Or a funeral procession blocks traffic? Or an elected official has a motorcade? Or a block party? These situations are regulated by government authorities, but we may or may not like that these uses of public ways are being allowed and causing our own travel to slow down. Going further, it is clearly illegal for a private company to block a public road and charge a high rate for faster travel, but imagine if no governmental agency had the power to regulate this. The FCC is attempting to make sure it has those regulatory powers.

That said, it doesn’t seem like anyone is actually planning to offer paid prioritization. Even Comcast claims “no company has had a stronger commitment to openness of the Internet…” and says that it has no plans to offer such a service. I find it unlikely that we will face a situation that Barbara Stripling describes as “prioritizing Mickey Mouse and Jennifer Lawrence over William Shakespeare and Teddy Roosevelt.”

I certainly won’t advocate against treating ISPs as common carriers; my impression is that this is what the 1996 Telecommunications Act was trying to get at, though the legal issues are confounding. However, a larger problem facing libraries (not so much large academics, but smaller academics and publics) is the digital divide. If there’s no fiber optic line to a town, there isn’t going to be broadband access, and an internet service provider has no business incentive to build a line to a small town that may not generate a lot of revenue. I think we need to remain vigilant about ensuring that everyone has access to the internet, both at all and at a fast speed, and not get too sidetracked by theoretical future malfeasance by internet service providers. These points are included in the FCC’s proposal but are not receiving most of the attention, despite the fact that the FCC has explicit regulatory authority to address them.

Public comments are open at the FCC’s website until July 15, so take the opportunity to leave a comment about Protecting and Promoting the Open Internet, and also consider comments on E-rate and broadband access, which is another topic the FCC is currently considering. (You can read ALA’s proposal about this here (PDF).)

  1. Blum, Andrew. Tubes: a Journey to the Center of the Internet. New York: Ecco, 2012, 80.
  2. Blum, 125-126.

Websockets For Real-time And Interactive Interfaces

TL;DR WebSockets allows the server to push up-to-date information to the browser without the browser making a new request. Watch the videos below to see the cool things WebSockets enables.

Real-Time Technologies

You are on a Web page. You click on a link and you wait for a new page to load. Then you click on another link and wait again. It may only be a second or a few seconds before the new page loads after each click, but it still feels like it takes way too long for each page to load. The browser always has to make a request and the server gives a response. This client-server architecture is part of what has made the Web such a success, but it is also a limitation of how HTTP works. Browser request, server response, browser request, server response….

But what if you need a page to provide up-to-the-moment information? Reloading the page for new information is not very efficient. What if you need to create a chat interface or to collaborate on a document in real-time? HTTP alone does not work so well in these cases. When a server gets updated information, HTTP provides no mechanism to push that message to clients that need it. This is a problem because you want to get information about a change in chat or a document as soon as it happens. Any kind of lag can disrupt the flow of the conversation or slow down the editing process.

Think about when you are tracking a package you are waiting for. You may have to keep reloading the page for some time until there is any updated information. You are basically manually polling the server for updates. Using XMLHttpRequest (XHR), also commonly known as Ajax, has been a popular way to partially work around these limitations of HTTP. After the initial page load, JavaScript can be used to poll the server for any updated information without user intervention.

Using JavaScript in this way you can still use normal HTTP and almost simulate getting a real-time feed of data from the server. After the initial request for the page, JavaScript can repeatedly ask the server for updated information. The browser client still makes a request and the server responds, and the request can be repeated. Because this cycle is all done with JavaScript it does not require user input, does not result in a full page reload, and the amount of data which is returned from the server can be minimal. In the case where there is no new data to return, the server can just respond with something like, “Sorry. No new data. Try again.” And then the browser repeats the polling–tries again and again until there is some new data to update the page. And then goes back to polling again.
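The polling cycle just described can be sketched in a few lines. `fetchUpdate` is a hypothetical async function that resolves to new data, or to `null` when the server says there is nothing new. In a browser it might wrap an XHR or `fetch` call to some update URL. The `maxAttempts` limit exists only so the sketch can stop; real polling would usually keep going.

```javascript
// Repeatedly ask the server for new data without any user interaction.
// fetchUpdate is a hypothetical async function resolving to new data, or to
// null when the server answers "No new data. Try again."
// Browser example (hypothetical endpoint and 204-means-nothing-new convention):
//   poll(() => fetch("/updates").then(r => (r.status === 204 ? null : r.json())), showUpdate);
function poll(fetchUpdate, onData, intervalMs = 2000, maxAttempts = Infinity) {
  let attempts = 0;
  return new Promise((resolve, reject) => {
    async function tick() {
      attempts += 1;
      try {
        const data = await fetchUpdate();
        if (data !== null) onData(data); // update the page with the new data
      } catch (error) {
        reject(error);
        return;
      }
      // Whether or not there was new data, go back to polling.
      if (attempts < maxAttempts) setTimeout(tick, intervalMs);
      else resolve(attempts);
    }
    tick();
  });
}
```

Every request here is still a fresh browser request. Most of them come back empty, and the browser only learns about an update when its next scheduled poll happens to arrive.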

This kind of polling has been implemented in many different ways, but all polling methods still have some queuing latency. Queuing latency is the time a message has to wait on the server before it can be delivered to the client. Until recently there has not been a standardized, widely implemented way for the server to send messages to a browser client as soon as an event happens. The server would always have to sit on the information until the client made a request. But there are a couple of standards that do allow the server to send messages to the browser without having to wait for the client to make a new request.
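A quick back-of-the-envelope calculation shows why this latency never goes away with simple polling. Assume the client polls every T seconds, an update becomes available at a uniformly random moment t between two polls, and network time is ignored. The update then sits on the server until the next poll:

```latex
\text{wait}(t) = T - t, \qquad t \sim \mathrm{Uniform}[0, T)

\mathbb{E}[\text{wait}] = \frac{1}{T}\int_0^T (T - t)\,dt = \frac{T}{2},
\qquad \sup_t \text{wait}(t) = T
```

So polling every 2 seconds adds about 1 second of queuing latency on average and up to 2 seconds in the worst case. Polling more often shrinks the wait but multiplies the number of mostly empty requests.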

Server Sent Events (aka EventSource) is one such standard. Once the client initiates the connection with a handshake, Server Sent Events allows the server to continue to stream data to the browser. This is a true push technology. The limitation is that only the server can send data over this channel. In order for the browser to send any data to the server, the browser would still need to make an Ajax/XHR request. EventSource also lacks support even in some recent browsers like IE11.
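The Server Sent Events wire format is simple enough to show in a minimal Node sketch. Messages are plain text sent with the `text/event-stream` content type. Each message is a few `field: value` lines followed by a blank line. The "tick" event name, the one-second timer, and the server function are assumptions for illustration. In the browser, the matching client would create `new EventSource(url)` for the stream and call `addEventListener("tick", ...)` on it.

```javascript
const http = require("http");

// Format one message in the text/event-stream format used by Server Sent Events:
// optional "id:" and "event:" lines, one "data:" line per line of payload,
// and a blank line to mark the end of the message.
function formatServerSentEvent({ data, event, id }) {
  let message = "";
  if (id !== undefined) message += `id: ${id}\n`;
  if (event !== undefined) message += `event: ${event}\n`;
  for (const line of String(data).split("\n")) {
    message += `data: ${line}\n`;
  }
  return message + "\n";
}

// Hypothetical server that pushes a timestamp once a second over one long-lived
// response. The browser only receives on this channel; to send anything back it
// would still need a separate Ajax/XHR request.
function createEventStreamServer() {
  return http.createServer((request, response) => {
    response.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    let count = 0;
    const timer = setInterval(() => {
      count += 1;
      response.write(formatServerSentEvent({ id: count, event: "tick", data: new Date().toISOString() }));
    }, 1000);
    // Stop pushing when the browser closes the connection.
    request.on("close", () => clearInterval(timer));
  });
}
```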

WebSockets allows for full-duplex communication between the client and the server. The client does not have to open up a new connection to send a message to the server which saves on some overhead. When the server has new data it does not have to wait for a request from the client and can send messages immediately to the client over the same connection. Client and server can even be sending messages to each other at the same time. WebSockets is a better option for applications like chat or collaborative editing because the communication channel is bidirectional and always open. While there are other kinds of latency involved here, WebSockets solves the problem of queuing latency. Removing this latency concern is what is meant by WebSockets being a real-time technology. Current browsers have good support for WebSockets.
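A WebSocket connection starts life as an ordinary HTTP request. That request carries an `Upgrade: websocket` header and a random `Sec-WebSocket-Key`. The server agrees to switch protocols by answering with a `Sec-WebSocket-Accept` value derived from that key, as defined in RFC 6455. After that, both sides exchange messages over the same open connection. The sketch below computes only that handshake value, using Node's crypto module, and checks it against the example given in the RFC. It is not a full WebSocket server.

```javascript
const crypto = require("crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Compute the Sec-WebSocket-Accept header value for a client's
// Sec-WebSocket-Key: base64(SHA-1(key + GUID)).
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// The sample key from RFC 6455, section 1.3.
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

In a browser none of this is visible. Creating a `new WebSocket(url)` performs the handshake. After that, `socket.send(...)` and an `onmessage` handler cover both directions of the conversation.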

Using WebSockets solves some real problems on the Web, but how might libraries, archives, and museums use them? I am going to share details of a couple applications from my work at NCSU Libraries.

Digital Collections Now!

When Google Analytics first turned on real-time reporting it was mesmerizing. I could see what resources on the NCSU Libraries’ Rare and Unique Digital Collections site were being viewed at the exact moment they were being viewed. Or rather I could view the URL for the resource being viewed. I happened to notice that there would sometimes be multiple people viewing the same resource at the same time. This hinted that someone’s social share or forum post was getting a lot of click-throughs right now. Or sometimes there would be a story in the news and we had an image of one of the people involved. I could then follow up and see examples of where we were being effective with search engine optimization.

The Rare & Unique site has a lot of visual resources like photographs and architectural drawings. I wanted to see the actual images that were being viewed. The problem, though, was that Google Analytics does not provide an easy way to click through from a URL to the resource on your site. I would have to retype the URL, copy and paste part of the URL path, or search for the resource identifier. I just wanted to see the images now. (OK, this first use case was admittedly driven by one of the great virtues of a programmer: laziness.)

My first attempt at this was to create a page that would show the resources which had been viewed most frequently in the past day and past week. To enable this functionality, I added some custom logging that is saved to a database. Every view of every resource would just get a little tick mark that would be tallied up occasionally. These pages showing the popular resources of the moment are then regenerated every hour.
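The hourly "most viewed" pages boil down to counting tick marks inside a time window. Here is a sketch in plain JavaScript, assuming each logged view is a record with a resource identifier and a timestamp; the `{ id, at }` record shape, the top-20 cutoff, and the in-memory array are hypothetical stand-ins for whatever database table the real logging uses.

```javascript
// Count views per resource within a trailing time window and return the top N.
// The record shape { id, at } is a hypothetical stand-in for the logging table.
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;
const WEEK = 7 * DAY;

function mostViewed(views, windowMs, topN, now = Date.now()) {
  const counts = new Map();
  for (const { id, at } of views) {
    const age = now - at;
    if (age >= 0 && age <= windowMs) counts.set(id, (counts.get(id) || 0) + 1);
  }
  // Sort by count (descending), breaking ties by id so the output is stable.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN)
    .map(([id, count]) => ({ id, count }));
}

// A scheduled job like this could rerun every hour to regenerate both pages.
function regeneratePopularPages(views, now = Date.now()) {
  return {
    day: mostViewed(views, DAY, 20, now),
    week: mostViewed(views, WEEK, 20, now),
  };
}
```

Precomputing the lists once an hour keeps page views cheap, at the cost of the lists being up to an hour stale, which is exactly the trade-off that made this version "not a real-time view."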

It was not a real-time view of activity, but it was easy to implement, and it answered a lot of my questions about what was most popular. Some images regularly appear among the most viewed. I learned that people often visit the image of the roster of the 1983 NC State men’s basketball team, which went on to win the NCAA tournament. People also seem to really like the indoor pool at the Biltmore Estate.

Really Real-Time

With this logging in place, I set out to make it really real-time. I wanted to see the actual images being viewed by real users at that moment. I wanted to serve up a single page that would update in real time with whatever was being viewed. This is where the persistent communication channel of WebSockets came in: WebSockets allows the server to send each update to the page immediately for display.
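One way to connect the existing logging to the live page is to have the logger emit an event for every view and let the WebSocket layer subscribe to it. The sketch below uses Node's built-in `EventEmitter`; the `ViewLog` class, the `"view"` event name, the message fields, and the `broadcast` callback (which would write a string to every open WebSocket) are all hypothetical.

```javascript
// Sketch: record each view (for the hourly reports) and immediately notify
// subscribers (for the real-time page). All names here are hypothetical.
const { EventEmitter } = require("events");

class ViewLog extends EventEmitter {
  constructor() {
    super();
    this.views = []; // stand-in for the database table of tick marks
  }

  record(id, imageUrl, at = Date.now()) {
    const view = { id, imageUrl, at };
    this.views.push(view); // the tally used by the hourly pages
    this.emit("view", view); // the push used by the real-time page
    return view;
  }
}

// The WebSocket layer subscribes once and forwards every view to all pages.
// `broadcast` is whatever function sends a string to each connected socket.
function connectLiveFeed(log, broadcast) {
  const listener = (view) =>
    broadcast(JSON.stringify({ type: "view", id: view.id, image: view.imageUrl }));
  log.on("view", listener);
  return () => log.off("view", listener); // call this to unsubscribe
}
```

Keeping the logger unaware of WebSockets means the hourly reports and the live page share one source of truth without depending on each other.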

People have told me they find this real-time view addictive. I found it useful. I have discovered images I never would have seen or even known how to search for. At least for me, this has been an effective form of serendipitous discovery. I also have a better sense of what different traffic volumes actually feel like on a good day. You too can see what folks are viewing in real time. I have also written up more details on how it is all wired together.

The Hunt Library Video Walls

I also used WebSockets to create interactive interfaces on the Hunt Library video walls. The Hunt Library has five large video walls created with Cristie MicroTiles. These very large displays each have their own affordances based on the technologies in the space and the architecture. The Art Wall is above the single service point just inside the entrance of the library and is visible from outside the doors on that level. The Commons Wall is in front of a set of stairs that also function as coliseum-like seating. The Game Lab is within a closed space and already set up with various game consoles.

Listen to Wikipedia

When I saw and heard Listen to Wikipedia, a visualization and sonification of Wikipedia edits, I thought it would be perfect for the iPearl Immersion Theater. Listen to Wikipedia visualizes and sonifies the stream of edits on Wikipedia. The size of each bubble is determined by the size of the change to an entry, and the pitch of the sound also varies with the size of the edit. Green circles show edits from unregistered contributors, and purple circles mark edits performed by automated bots. (These automated bots are sometimes used to integrate library data into Wikipedia.) A bell signals an addition to an entry. A string pluck signals a subtraction. New users are announced with a string swell.
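The mapping described above can be written down as a small function from an incoming event to its bubble and sound. Only the color and sound categories come from the description; the event field names, the neutral default color, the square-root radius, and the logarithmic pitch formula are hypothetical placeholders, not Listen to Wikipedia's actual code.

```javascript
// Map one incoming event to how it is drawn and played. The categories follow
// the description; field names and scaling formulas are hypothetical.
function renderEvent(event) {
  // New accounts get their own sound rather than a bubble.
  if (event.type === "newuser") return { sound: "string-swell" };

  // event: { type: "edit", bytesChanged: number, isBot: boolean, isAnonymous: boolean }
  const size = Math.abs(event.bytesChanged);

  let color = "white"; // hypothetical default for registered human editors
  if (event.isBot) color = "purple";
  else if (event.isAnonymous) color = "green";

  // Additions ring a bell; removals pluck a string (zero-byte edits count as additions here).
  const sound = event.bytesChanged >= 0 ? "bell" : "string-pluck";

  // Hypothetical scaling: radius grows with the square root of the change, and
  // pitch (0..1) falls as edits get bigger, on a log scale so huge edits stay in range.
  const radius = Math.max(3, Math.sqrt(size));
  const pitch = Math.max(0, 1 - Math.log10(size + 1) / 6);

  return { color, sound, radius, pitch };
}
```

A square-root radius makes a bubble's area, rather than its width, proportional to the size of the edit, which keeps very large edits from swamping the display.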

The original Listen to Wikipedia (L2W) is a good example of the use of WebSockets for real-time displays. Wikipedia publishes all edits for every language into IRC channels. A bot called wikimon monitors each of the Wikipedia IRC channels and watches for edits. The bot then forwards the information about the edits over WebSockets to the browser clients on the Listen to Wikipedia page. The browser then takes those WebSocket messages and uses the data to create the visualization and sonification.
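The bot's job is essentially a translation step: read a line from an IRC channel, pull out the interesting fields, and forward them as JSON. The sketch below parses a simplified edit line after stripping mIRC color codes. The line layout is an approximation of the recent-changes feed, the sample line in the usage note is made up, and the JSON field names are hypothetical rather than wikimon's actual message format.

```javascript
// Strip mIRC formatting: \x03 color codes (with optional fg,bg numbers),
// plus bold (\x02), reset (\x0f), reverse (\x16), italic (\x1d), underline (\x1f).
function stripIrcFormatting(line) {
  return line
    .replace(/\x03(\d{1,2}(,\d{1,2})?)?/g, "")
    .replace(/[\x02\x0f\x16\x1d\x1f]/g, "");
}

// Approximate layout once formatting is removed (an assumption, not a spec):
//   [[Title]] FLAGS URL * User * (+123) summary
const EDIT_LINE = /^\[\[(.+?)\]\]\s+(\S*)\s+(\S+)\s+\*\s+(.+?)\s+\*\s+\(([+-]?\d+)\)\s*(.*)$/;

// Returns a JSON string ready to forward over WebSockets, or null for lines
// that don't look like edits. Field names are hypothetical.
function ircLineToMessage(rawLine, language) {
  const match = EDIT_LINE.exec(stripIrcFormatting(rawLine));
  if (!match) return null;
  const [, title, flags, url, user, delta, summary] = match;
  return JSON.stringify({
    language,
    title,
    url,
    user,
    isBot: flags.includes("B"),
    isNew: flags.includes("N"),
    bytesChanged: Number(delta),
    summary,
  });
}
```

For a made-up line such as `[[Home Alone]] M https://en.wikipedia.org/w/index.php?diff=1&oldid=0 * ExampleUser * (-42) fix typo`, this would produce a message with `title` "Home Alone" and `bytesChanged` -42, which a browser client could turn into a bubble and a sound.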

Almost all traffic into the Hunt Library passes the iPearl Immersion Theater. The one feature that made this space perfect for Listen to Wikipedia was that it has sound, and, depending on your tastes, L2W can create pleasant ambient sounds.[1] I began by adjusting the CSS styling so that the page would fit the large display. Besides setting the width and height, I adjusted the font sizes. I added a panel on the right with text explaining what folks are seeing and hearing. The left side now holds text asking passersby to interact with the wall, along with the list of languages currently being watched for updates.

One feature of the original L2W that we wanted to keep was the ability to change which languages are monitored and visualized. Each language can be turned on or off individually. During peak times the English Wikipedia alone can sound cacophonous, and an active bot can make lots of edits of roughly similar sizes. You can also turn Wikidata changes on or off; Wikidata collects structured data that can support Wikipedia entries. Having only a few of the less frequently edited languages on can result in moments of silence punctuated by a single little dot and a small bell sound.

We wanted to keep the ability to change the experience, so that visitors could get a feel for the torrent or trickle of Wikipedia edits and explore what that might mean. We currently have no input device for interacting directly with the Immersion Theater wall. For L2W, the solution was to let folks bring their own devices to act as a remote control. A prominent message on the wall encourages passersby to interact with it. The wall shows the URL of the remote control, along with a QR code version of the URL. To prevent someone in New Zealand from controlling the Hunt Library wall in Raleigh, NC, we use a short-lived, three-character token.
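A short-lived token of this kind can be sketched as a small store that issues a random code, remembers when it expires, and checks the codes that remote controls send. The alphabet, the five-minute default lifetime, and the `TokenStore` name are hypothetical; the post says only that the token is three characters and short-lived. Characters are picked with Node's built-in `crypto.randomInt`.

```javascript
// Issue and check short-lived three-character tokens for the remote control.
// The alphabet and lifetime are hypothetical choices.
const crypto = require("crypto");

// Leave out look-alike characters (0/O, 1/I/L) so the code is easy to type.
const ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";

class TokenStore {
  constructor(ttlMs = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.current = null; // { token, expiresAt }
  }

  // Create a fresh token, replacing any earlier one; the wall displays it.
  issue(now = Date.now()) {
    let token = "";
    for (let i = 0; i < 3; i++) token += ALPHABET[crypto.randomInt(ALPHABET.length)];
    this.current = { token, expiresAt: now + this.ttlMs };
    return token;
  }

  // A remote may send commands only while its token matches and hasn't expired.
  isValid(token, now = Date.now()) {
    return (
      this.current !== null &&
      now < this.current.expiresAt &&
      String(token).toUpperCase() === this.current.token
    );
  }
}
```

With 31 characters, three positions give 31³ = 29,791 possible codes. That is a small space, so it is the short lifetime, and the fact that the code is only visible in the room, that does most of the work of keeping remote visitors out.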

Because we were uncertain how best to let a visitor start an interaction, we included both a URL and a QR code. Each uses a slightly different URL so that we can track use. We were surprised to find that most interactions began with scanning the QR code: currently 78% of interactions start that way. We suspect that we could increase the number of visitors interacting with the wall if there were other, simpler ways to begin. For bring-your-own-device remote controls, we are interested in how we might use technologies like Bluetooth Low Energy within the building for a variety of interactions with the surroundings and our services.
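One way to make that kind of percentage countable is to tag each entry point with a query parameter and tally the logged entry URLs. This is a minimal sketch, assuming a hypothetical `src` parameter with the values `qr` and `typed`; the base address in the usage note is illustrative only.

```javascript
// Build tagged entry URLs and compute what share of sessions began with each.
// The "src" parameter and its values are hypothetical.
function entryUrl(base, source) {
  const url = new URL(base);
  url.searchParams.set("src", source); // e.g. "qr" or "typed"
  return url.toString();
}

function shareBySource(loggedUrls) {
  const counts = {};
  for (const u of loggedUrls) {
    const source = new URL(u).searchParams.get("src") || "unknown";
    counts[source] = (counts[source] || 0) + 1;
  }
  const shares = {};
  for (const [source, n] of Object.entries(counts)) {
    shares[source] = Math.round((100 * n) / loggedUrls.length);
  }
  return shares; // percentages, rounded to whole numbers
}
```

For example, `entryUrl("https://example.org/remote", "qr")` yields `https://example.org/remote?src=qr`, and the server only needs to log the URL each remote session starts from.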

The remote control Web page is a list of big checkboxes next to each of the languages. Clicking on one of the languages turns its stream on or off on the wall (connects or disconnects one of the WebSockets channels the wall is listening on). The change happens almost immediately with the wall showing a message and removing or adding the name of the language from a side panel. We wanted this to be at least as quick as the remote control on your TV at home.
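On the wall, each checkbox corresponds to one stream subscription that is opened or closed on demand. The sketch below keeps that bookkeeping in a `Map`; the `LanguageStreams` name, its return values, and the `openStream` factory (which in a browser might wrap `new WebSocket(url)` and return an object with a `close()` method) are hypothetical.

```javascript
// Wall-side bookkeeping: one open stream per enabled language.
// `openStream(lang)` is a hypothetical factory returning an object with close().
class LanguageStreams {
  constructor(openStream) {
    this.openStream = openStream;
    this.streams = new Map(); // language code -> open stream
  }

  // Turn a language on or off; the result tells the wall what to show in its side panel.
  toggle(lang) {
    if (this.streams.has(lang)) {
      this.streams.get(lang).close();
      this.streams.delete(lang);
      return { lang, enabled: false };
    }
    this.streams.set(lang, this.openStream(lang));
    return { lang, enabled: true };
  }

  // Languages currently listed in the side panel, in a stable order.
  enabled() {
    return [...this.streams.keys()].sort();
  }
}
```

Closing a language's stream, rather than just hiding its bubbles, means the wall stops receiving that traffic at all.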

The quick interaction is possible because of WebSockets. Both the browser page on the wall and the remote control client listen on another WebSockets channel for such messages, so as soon as the remote control sends a message to the server, it can be relayed to the wall and the change reflected immediately. If the wall used polling to get changes, there could be more latency before a change registered. The remote control client also listens on a WebSockets channel for updates, which lets it show the user feedback once the change has actually been made. This whole feedback loop runs over WebSockets.
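On the remote control, the feedback loop can be modelled as a checkbox that goes into a pending state when the visitor taps it and settles only when the wall's confirmation comes back over the socket. This sketch covers that client-side logic with the socket abstracted as a `send` function; the `RemoteState` name and the `toggle`/`changed` message shapes are hypothetical.

```javascript
// Remote-control state: a tap sends a request, and the checkbox shows "pending"
// until the wall confirms the change. Message shapes are hypothetical.
class RemoteState {
  constructor(send) {
    this.send = send; // e.g. (msg) => socket.send(JSON.stringify(msg))
    this.enabled = new Set(); // languages the wall has confirmed are on
    this.pending = new Set(); // languages with a request in flight
  }

  tap(lang) {
    if (this.pending.has(lang)) return; // ignore repeat taps until the wall answers
    this.pending.add(lang);
    this.send({ type: "toggle", lang });
  }

  // Called for every message the server relays back from the wall.
  onMessage(msg) {
    if (msg.type !== "changed") return;
    this.pending.delete(msg.lang);
    if (msg.enabled) this.enabled.add(msg.lang);
    else this.enabled.delete(msg.lang);
  }

  // What the checkbox for `lang` should display.
  status(lang) {
    if (this.pending.has(lang)) return "pending";
    return this.enabled.has(lang) ? "on" : "off";
  }
}
```

Because the checkbox reflects only what the wall confirmed, the remote never claims a change that did not happen, and confirmations triggered by other people's remotes are handled the same way.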

Having the remote control listen for messages from the server also serves another purpose. If more than one person enters the space to control the wall, what is the correct way to handle that situation? With two users, how do you accurately represent the wall's current state for both? Maybe once the first user begins controlling the wall, it locks out other users. This would work, but then how long do you lock others out? It could be frustrating for a user to launch their QR code reader, line up the QR code in their camera, and scan it, only to find that they are locked out and unable to control the wall. Instead, I chose to send every message about every change via WebSockets to every connected remote control. This makes it easy to keep the remote controls synchronized: every change on one remote control is quickly reflected on every other instance, which prevents most cases where the remotes might get out of sync. A race condition is still possible, but the real-time connection makes it less likely, and it is harmless. Besides not having to lock anyone out, it also seems like a lot more fun to notice that others are controlling things as well; maybe it even makes the experience a bit more social. (Although, can you imagine how awful it would be if everyone had their own TV remote at home?)
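On the server, this "tell everyone" approach needs little more than a set of connected remotes, the wall's last confirmed state, and a broadcast loop. In the sketch below a newly connected remote receives a snapshot of that state, and every change the wall confirms is forwarded to every remote. Connections are modelled as objects with a `send(string)` method, which an open WebSocket object also has; `ControlHub` and the message types are hypothetical.

```javascript
// Server-side relay: remote requests go to the wall, and wall confirmations
// go to every remote. Names and message types are hypothetical.
class ControlHub {
  constructor(wall) {
    this.wall = wall; // connection to the wall's browser page
    this.remotes = new Set();
    this.enabled = new Set(); // last state the wall confirmed
  }

  addRemote(conn) {
    this.remotes.add(conn);
    // Bring a latecomer up to date instead of locking anyone out.
    conn.send(JSON.stringify({ type: "snapshot", enabled: [...this.enabled].sort() }));
  }

  removeRemote(conn) {
    this.remotes.delete(conn);
  }

  // A remote asked for a change: pass it straight to the wall.
  fromRemote(msg) {
    if (msg.type === "toggle") this.wall.send(JSON.stringify(msg));
  }

  // The wall applied a change: remember it and tell every remote, so all
  // checkboxes converge on the wall's actual state.
  fromWall(msg) {
    if (msg.type !== "changed") return;
    if (msg.enabled) this.enabled.add(msg.lang);
    else this.enabled.delete(msg.lang);
    const text = JSON.stringify(msg);
    for (const conn of this.remotes) conn.send(text);
  }
}
```

If two visitors tap the same language at nearly the same moment, the wall applies both toggles in order and every remote ends up showing whatever the wall confirmed last, which is why the race is harmless.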

I also thought it was important for an interactive exhibit built around Wikipedia data to give users some way to read the entries. From the remote control, the user can get to a page that lists the same stream of edits shown on the wall. The title of the most recently edited entry appears at the top of the page, pushing older ones down. Each title links to the current revision of that page. This page listens to the same WebSockets channels as the wall, so changes appear on the wall and the remote control at the same time. Sometimes the stream of edits is so fast that it is impossible to click on an interesting entry, so a button lets the user pause the stream. When an intriguing title appears on the wall or there is a large edit to a page, the viewer can pause the stream, find the title, and click through to the article.
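The pause button suggests a small buffer: while paused, incoming edits are held instead of being pushed onto the list, so the title the visitor is reaching for stays put. The post doesn't say what happens to edits that arrive during a pause; this sketch assumes they are added when the stream resumes, and the list-length cap and the `EditFeed` name are hypothetical.

```javascript
// Newest-first list of edited titles with a pause button.
// Assumptions: edits that arrive while paused are shown on resume; the list is capped.
class EditFeed {
  constructor(maxItems = 50) {
    this.maxItems = maxItems;
    this.items = []; // newest first; this is what the page renders
    this.held = []; // arrivals during a pause, oldest first
    this.paused = false;
  }

  // Called for every edit message arriving on the WebSocket.
  add(edit) {
    if (this.paused) this.held.push(edit);
    else this.prepend([edit]);
  }

  pause() {
    this.paused = true;
  }

  resume() {
    this.paused = false;
    this.prepend(this.held);
    this.held = [];
  }

  // Put edits (given oldest first) at the top with the newest first, then trim the tail.
  prepend(edits) {
    this.items = [...edits].reverse().concat(this.items).slice(0, this.maxItems);
  }
}
```

Holding arrivals rather than dropping them means nothing is lost while the visitor reads, at the cost of a burst of new titles when the stream resumes.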

The reaction from students and visitors has been fun to watch, and the enthusiasm has had unexpected consequences. For instance, one day we were testing L2W on the wall and noting what design adjustments we wanted to make. A student came in and sat down to watch. At one point they opened their laptop and deleted a large portion of a Wikipedia article just to see how large the bubble on the wall would be. Fortunately, the edit was quickly reverted.

We have also seen the L2W exhibit pop up on social media. This Instagram video was posted with the comment, “Reasons why I should come to the library more often. #huntlibrary.”

“This is people editing... oh, someone just edited Home Alone... editing Wikipedia in this exact moment.”

The original Listen to Wikipedia is open source. I have also made the source code for the Listen to Wikipedia exhibit and remote control application available. You would likely need to change the styling to fit whatever display you have.

Other Examples

I have also used WebSockets for some other fun projects. The Hunt Library Visualization Wall has a unique columnar design, and I used it to present images and video from our digital special collections in a way that allows users to change the exhibit. For the Code4Lib talk this post is based on, I developed a template for creating slide decks that include audience participation and synchronized notes via WebSockets.

Conclusion

The Web is now a better development platform for creating real-time and interactive interfaces. WebSockets provides the means for sending real-time messages between servers, browser clients, and other devices. This opens up new possibilities for what libraries, archives, and museums can do to provide up-to-the-moment data feeds and to create engaging interactive interfaces using Web technologies.

If you would like more technical information about WebSockets and these projects, please see the materials from my Code4Lib 2014 talk (including speaker notes) and some notes on the services and libraries I have used. There you will also find a post with answers to the (serious) questions I was asked during the Code4Lib presentation. I’ve also posted some thoughts on designing for large video walls.

Thanks: Special thanks to Mike Nutt, Brian Dietz, Yairon Martinez, Alisa Katz, Brent Brafford, and Shirley Rodgers for their help with making these projects a reality.


About Our Guest Author: Jason Ronallo is the Associate Head of Digital Library Initiatives at NCSU Libraries and a Web developer. He has worked on lots of other interesting projects. He occasionally writes for his own blog Preliminary Inventory of Digital Collections.

Notes

  1. Though honestly, Listen to Wikipedia drove me crazy because I listened to it so much while developing the Immersion Theater display.

Open Sourcing Ideas: Sharing and Recreating a Library Instruction Program

Overture: We’ve Got a Theory

In April 2013 at the ACRL Conference in Indianapolis, IN, Char Booth, Lia Friedman, Adrienne Lai, and Alice Whiteside presented a panel entitled “Love your library: Building goodwill from the inside out and the outside in,” in which they highlighted examples of non-traditional marketing in academic libraries at Claremont, the University of California San Diego, Mount Holyoke, and North Carolina State University. The panelists freely encouraged audience members to recreate and adapt the ideas at other institutions, saying “Here is something that worked for us. Maybe it will work for you!” One of the ideas Alice shared was a food-themed citation help event she developed with colleagues Chrissa Godbout and Kathleen Norton at Mount Holyoke. John Jackson recreated the event at Whittier College a year later. From opposite coasts, we’ve joined forces here to discuss the development of the ExCITING Food workshop, its reiterations, and the importance of sharing ideas among academic library communities.

Borrowing each other’s ideas is common in our field, and the “Love your library” panel celebrated and encouraged this practice. When we “steal” each other’s trade secrets (with proper credit, of course), everyone benefits. The advantages of “open-sourcing” instructional programming are probably obvious to readers of TechConnect. The information literacy needs of most undergraduates, especially first-year students, are roughly the same: they arrive at college with little to no experience of scholarly communication practices and limited knowledge of the breadth of information resources, and they feel overwhelmed by the complex requirements (e.g., format, tone, structure, citations) of their assignments. Even accounting for the idiosyncrasies of each institution, librarians can quickly adapt events that succeeded at other libraries to their own communities. Sharing successful attempts at “sneaky teaching” with the professional community at large saves time, reduces the stress of preparation, and ultimately meets a recognized information need for our users.

In our experience, everyone benefits more if the first round of sharing isn’t the end of it. We have many methods and modes of learning about “stealable” ideas: professional literature, conference presentations, the Web, and word of mouth. Databases like PRIMO and LOEX Instructional Resources, personal blogs, Slideshare, and LibGuides all facilitate this type of sharing. Rarer is the ability to provide public feedback on how a programming event succeeded or failed in a different context and how it was adapted. How can we more actively create an open-source mindset around instructional development? We hope this post is a step in that direction.

Going Through the Motions

Creating the Event at Mount Holyoke College

Alice: At Mount Holyoke College, we hatched the idea for ExCITING Food when the Dean of Students Office asked if the library could provide a workshop on citing sources during fall 2012 orientation. The planning group consisted of myself, Chrissa Godbout, Kathleen Norton, and our MLS intern Lilly Sundell-Thomas. We felt strongly that orientation, when new students are concerned with getting their bearings, meeting new friends, and struggling to stay afloat amid information overload, was the wrong time to discuss the ethical use of resources in their future research papers. That said, we appreciated that the Dean of Students Office turned to the library with this request, and we began to think about other ways to address this clearly identified need.

Historically, we haven’t had great success with drop-in workshops in the library, and we knew we wanted to try something different. Our goal was to help students understand the why, when, and how of citing sources. For the greatest impact, we wanted to reach them at the point in the semester when they were thinking about their bibliographies. We hoped our “not-a-workshop” would be informative but also low-threshold and engaging. At Mount Holyoke, the surest path to engaging students usually involves food, and thus the brainstorming began. When we pitched the idea to our department head, he was skeptical: a fun drop-in citation help event? Persuaded by our enthusiasm, he fortunately agreed to support our modest $50 food budget. As we figured out the details, we ran the idea by our student workers and reached out to the Speaking, Arguing, and Writing (SAW) Center. The SAW Center agreed to join the effort, helping to advertise and staff the event.

Students attending the ExCITING Food workshop at Mount Holyoke

One of the central ideas for ExCITING Food is citing the recipes for each snack provided; we used the snacks themselves to illustrate different citation styles, and we selected snacks to showcase a range of recipe sources (book, website, archival material, etc.). Mount Holyoke College has a strong sense of its own history, and everyone on campus knows about Mary Lyon, the school’s founder, and her vision for women’s education. The Archives have a few recipes written out in her own hand, including one for gingerbread, a variation on her molasses cake. This was a clear winner for ExCITING Food.

 M. Lyon, ca. 1845, Molasses Cake with Plums, unpublished manuscript, Mount Holyoke College Archives and Special Collections, South Hadley, MA.

After that, we got a little carried away with picking snacks that had a connection to MHC: the infamous “Chef Jeff” cookies from Dining Services, caramel corn with an image from the Archives of students shucking corn ca. 1917. We even wrote to the President’s office asking for a recipe; she graciously replied but misunderstood our intention, sending a favorite recipe for a hearty stew. Instead of stew, we went with a recipe from the library director for mulled cider (C. Patriquin, personal communication, Nov 16, 2012).

Setup for the Mount Holyoke ExCITING Food event

We chose to host ExCITING Food two weeks before exams, when many students were in the thick of working on final papers. We promoted the event via social media, posters, and personal emails to First Year Seminar faculty asking them to encourage their students to attend; SAW Center writing mentors, who are current students, distributed flyers and helped spread the word. Late in the afternoon on a Wednesday, we set up tables in the library atrium, a high traffic area in front of the main entrance, and wheeled out piles of handouts, platters of cookies, and crock pots full of mulled cider on book trucks. Our handouts included sample bibliographies (with the snack recipes) in different citation styles, RefWorks information, and DIY stickers (printed on mailing labels) with friendly URLs for the library’s Citing Guide and the SAW Center. Six librarians and two SAW Center writing mentors staffed the event, distributing snacks and handouts and answering questions.

Through the combination of thoughtful timing, delicious food, and a bit of silliness, we pulled off an extremely successful session. In one hour, we distributed handouts, snacks, and our elevator pitch to over a hundred students, provided 21 citation consultations, and received abundant positive feedback from students and from our partners in SAW.

Recreating the Event at Whittier College

John: Impressed by the creativity and accessibility of the outreach events presented during Alice’s ACRL 2013 session, I have since tried to reproduce many of them in spirit, if not in detail. In April 2014, Wardman Library hosted its iteration of the ExCITING Food workshop in the week leading up to finals. We thought the day before finals began would be the best time, since students would be thinking about the requirements of their final projects but not yet overwhelmed by details and deadlines.

We promoted the event on the faculty and student email lists, our social media pages, via flyers and posters, and additionally contacted faculty who we knew had assigned bibliographies as final projects. We know that students and faculty struggle to manage the incredible amount of email they receive daily and so it was important to send frequent reminders via our (less intrusive) social media and to speak with teaching faculty directly about our plans, especially ways in which students could benefit from the information presented in our posters and handouts.

Setup for the ExCITING Food event at Whittier College

One of my primary goals for the event was to highlight the helpfulness and creativity of library staff. Accordingly, I asked each staff member to contribute a dish. This was perhaps my greatest source of anxiety: getting enough staff buy-in to make and bring the food the event needed to succeed. Wardman Library is staffed by 13 employees, many of whom are extremely busy during the final weeks of the semester (especially our circulation and media staff). I was hesitant to ask my colleagues to take time outside of work to find an appropriate recipe (we needed enough variety in the sources) and make it on the designated day. However, my colleagues were incredibly supportive, and we produced enough food to stretch the scheduled 2-hour event into a 4-hour one.

Originally, we planned to host the event outside the library in order to reach the portion of our student population that does not visit the library regularly, but the southern California heat forced us indoors (coincidentally, to our benefit). We set up inside the library near an area that we thought would be unobtrusive and wouldn’t interfere with students trying to study for finals. To our surprise, students were reluctant to approach, thinking the event was invitation-only or RSVP-only. So we waited for an appropriate moment and moved the event to a more central location near the main stairwell, between the library entrance and access to the bookstacks, one of the most heavily trafficked areas of the library. This immediately increased the number of students who approached the tables unreservedly.

At the event, we provided a number of dishes, including a brownie recipe from Katharine Hepburn, cornbread from a late nineteenth-century college cookbook, and cookies made from recipes on various websites to illustrate citing material that lacks an author or publication date.

Henderson, H. (2003, July 6). Straight Talk From Miss Hepburn: Plus the Actress’s Own Brownie Recipe. New York Times, p. CY9. New York, N.Y., United States.

Clayton, H. J. (1883). Clayton’s Quaker cook-book: being a practical treatise on the culinary art. San Francisco: Women’s Co-operative Printing Office.

Easy OREO Truffles. (n.d.). Allrecipes.com. Retrieved April 25, 2014, from http://allrecipes.com/Recipe/Easy-OREO-Truffles/Detail.aspx.

In addition to the food, we provided two-sided half-sheet handouts that contained the recipe for each dish on one side and how to cite it in MLA, APA, and Chicago style formats on the other side. We also created three 20 in. x 30 in. posters outlining the when, why, and how of citations and placed these behind the food table. We made sure at least one librarian and one additional staff member were present at the table at all times and encouraged all library staff to stop by during the event to meet and talk with students.

Students attending the ExCITING Food event at Whittier College

At the ACRL conference presentation, the panelists introduced the idea of “camogogy”: the combination of pedagogy and camouflage, or “sneaky teaching.” Ultimately, this was the spirit I endeavored to recreate at our iteration of the event, and I even went so far as to downplay its educational aspect. Most surprising to me, however, was how little camouflage or “sneakiness” was required. The students loved the idea of citing recipes and seemed genuinely excited at the prospect of improving their own citations. A number of students returned later to ask specific questions about citing sources and, most importantly to me, identified librarians as a resource for finalizing their bibliographies.

Once More, With Feeling: Future Adaptations

Alice: At Mount Holyoke, this event is on its way to becoming a library tradition. In November 2013 we hosted ExCITING Snacks, with essentially the same components as the first iteration and equal success. Our major change in the second year was that we simplified our snacks: just popcorn instead of caramel corn, cold cider instead of mulled cider. We also refined our publicity approach and were thrilled to get assistance from the Academic Deans Office, which sent an email blast to all first years, sophomores, and juniors about the event. Looking ahead, our favorite question is “Who else can we collaborate with?” While ExCITING Food is a fun event, the instructional component is very clear, and I think that has helped us find allies.

Learning about the details of Whittier College’s implementation of ExCITING Food has also helped us rethink our approach and consider new elements. Next time, we will definitely create large posters to help students identify at a glance what the event is about. While we haven’t yet explored taking ExCITING Food out of the library, this is now on our list as we brainstorm new ways to collaborate with offices across campus.

John: The success of our first attempt at the ExCITING Food workshop and the enthusiasm it generated among the faculty at Whittier College have all but guaranteed that we will attempt it again next year. However, there are certainly improvements to be made. For instance, we would like to capture a “non-library” audience, students who do not regularly visit the library, and we may consider moving the event to a more central location on campus. This could open up opportunities for new collaborations with, say, catering services (if we hosted the workshop near the student cafeteria) or the Center for Advising and Academic Success (if we hosted it near the tutoring center). Additionally, we would like to find a way to involve faculty and student peer mentors, not only in promoting the event, but also in providing on-site help with creating citations (or even providing additional food!).

Where Do We Go From Here?

Talking about how we adapted an idea in this forum is the first step. What other ways can we publish, adapt, and improve instruction content as a community? Between 2006 and 2008, the Oregon Library Association’s Library Instruction Roundtable maintained a Library Instruction Wiki (since expired). Some academic faculty are using GitHub to post their course syllabi so that other teachers can fork and modify them under version control. Public librarians like Ben Bizzle of the Craighead County Jonesboro Public Library use Dropbox to share marketing content with other librarians. And who among us has not used Google Drive to collaborate?

The tools for open-source development of instruction material exist: we simply need to make a concerted effort to develop this content on a large scale.

References

Booth, C., Friedman, L., Lai, A., & Whiteside, A. (2013, April). Love your library: Building goodwill from the inside out and the outside in. Panel presented at the meeting of the Association of College and Research Libraries, Indianapolis, IN. (slides | recording)

About our Guest Authors:

Alice Whiteside is a Librarian & Instructional Technology Consultant at Mount Holyoke College. An active member of the Art Libraries Society of North America (ARLIS/NA), she currently serves as chair of the Professional Development Committee-Education Subcommittee. Alice holds an MSLS from the University of North Carolina at Chapel Hill and a BA in Art History from Bard College.

John Jackson is the Reference & Instruction Librarian at the Wardman Library of Whittier College, a private liberal arts college outside Los Angeles. John holds an MLIS from San Jose State University and an MA in Medieval Studies from the University of Virginia.


Future? Libraries? What Now? – After the ALA Summit on the Future of Libraries

I attended the ALA Summit on the Future of Libraries a few weeks ago.

[Let's give it a minute for that to sink in.]

ALA President Barbara Stripling at the ALA Summit on the Future of Libraries at the Library of Congress. (Photo by the author)

Yes, that was the controversial Summit much talked about on Twitter under the #libfuturesummit hashtag. This Summit and other similarly themed summits held around the same time (“The Future of Libraries Survival Summit” hosted by Information Today Inc. and “The Future of Libraries: Do We Have Five Years to Live?” hosted by Ken Haycock Associates Inc. and Dysart & Jones Associates) seemed to bring out the sentiment that Andy Woodworth aptly named ‘Library Future Fatigue.’ It was an impressive experience to see how actively librarians, both ALA members and non-members, provided real-time comments and feedback about these summits while I was attending one of them in person. I thought ALA was lucky to have such engaged members and librarians to work with.

A few days ago, ALA released the official Summit report.[1] The report captured all the talks and many table discussions in great detail. In this post, I will focus on some of my thoughts and takeaways prompted by the talks and the table discussions at the Summit.

A. The Draw

Here is an interesting fact. The invitation to this Summit sat in my inbox for over a month because, from the email subject, I thought it was just another advertisement for a fee-based webinar or workshop. It was only after I got another email from the ALA office asking about the previous one that I realized it was something different.

What drew me to this Summit was that (a) I had never been to a formal event organized just to discuss the future of libraries, (b) the event was to include a good number of people from outside libraries, and (c) the Summit would be kept relatively small.

For those curious, the Summit had 51 attendees plus 6 speakers and a dozen discussion table facilitators, all of whom fit into the Members’ Room in the Library of Congress. Of those 51 attendees, 9 were from the non-library sector, from organizations such as the Knight Foundation, PBS, Rosen Publishing, and the Aspen Institute. Another 33 ranged from academic librarians to public, school, federal, and corporate librarians, library consultants, museum and archive folks, an LIS professor, and library vendors. Then there were 3 ALA presidents (current, past, and president-elect) and 6 officers from ALA. You can see the list of participants here.

B. Two Words (or Phrases)

At the beginning of the Summit, the participants were asked to come up with two words or short phrases that capture what they think about libraries “from now on.” We wrote these on ribbons and attached them right under our name tags. We were then encouraged to keep or change them as we moved through the Summit.

My two phrases were “Capital and Labor” and “Peer-to-Peer,” and I kept both until the end of the Summit. I picked “Capital and Labor” because I have recently been thinking more about the socioeconomic background behind the expansion of post-secondary education (i.e. higher ed) and how it affects the changes in higher education and academic libraries.2 And of course, the fact that Thomas Piketty’s book, Capital in the Twenty-First Century, was being widely reviewed and discussed in the mass media contributed to that choice of words as well. In my opinion, libraries “from now on” will be driven more closely by the demands of capital and the labor market and will be asked to support more and more of the peer-to-peer learning activities that have become widespread with the advent of the Internet.

Other words and phrases I saw from other participants included “From infrastructure to engagement,” “Sanctuary for learning,” “Universally accessible,” “Nimble and Flexible,” “From Missionary to Mercenary,” “Ideas into Action,” and “Here, Now.” The official report also lists some of the words participants used most. If you were to choose two words or phrases that capture what you think about libraries “from now on,” what would they be?

C. The Set-up

The Summit organizers filled the room with round tables. On the first day’s morning and afternoon and on the second day’s morning, participants sat at the table whose number was assigned on the back of their name badges. This arrangement let participants talk with different groups of people throughout the Summit.

As the Summit agenda shows, the program began with a talk by a speaker. Participants were then asked to reflect individually on the talk and afterwards to discuss it at their tables. Facilitators captured each table discussion on large poster-size sheets, which the event organizers collected. The sheets with our personal reflections were collected the same way, along with the ribbons on which we had written our two words or phrases. These were probably used to produce the official Summit report.

One thing I liked about the set-up was that every participant sat at a round table, including the speakers and all three ALA presidents (past, current, and president-elect). Throughout the Summit, I had a chance to talk to Lorcan Dempsey from OCLC; Corinne Hill, the director of the Chattanooga Public Library; Courtney Young, the ALA president-elect; and Thomas Frey, a well-known futurist at the DaVinci Institute, which was neat.

What struck me most during the Summit was that participants from outside libraries took the guiding questions and the ensuing discussions much more seriously than those of us inside the library world. Maybe we librarians really are suffering from ‘library future fatigue.’ And/or maybe outsiders have more trust in libraries as institutions than we librarians do, because they are less familiar with our daily struggles and challenges in running libraries. Either way, the Summit seemed to give them an opportunity to seriously consider the future of libraries. Ideally, this will lead to more policymakers, thought leaders, and industry leaders who are well informed about today’s libraries and who will articulate, support, and promote, in their own areas, the significant work libraries do for society.

D. Talks, Table Discussion, and Some of My Thoughts and Take-aways

These were the talks given during the two days of the Summit:

  • “How to Think Like a Freak” – Stephen Dubner, Journalist
  • “What Are Libraries Good For?” – Joel Garreau, Journalist
  • “Education in the Future: Anywhere, Anytime” – Dr. Renu Khator, Chancellor and President at the University of Houston
  • “From an Internet of Things to a Library of Things” – Thomas Frey, Futurist
  • A Table Discussion of Choice:
    • Open – group decides the topic to discuss
    • Empowering individuals and families
    • Promoting literacy, particularly in children and youth
    • Building communities the library serves
    • Protecting and empowering access to information
    • Advancing research and scholarship at all levels
    • Preserving and/or creating cultural heritage
    • Supporting economic development and good government
  • “What Happened at the Summit?” – Joan Frye Williams, Library consultant

(0) Official Report, Liveblogging Posts, and Tweets

As I mentioned earlier, ALA released the 15-page official report of the Summit, which provides a detailed description of each talk and table discussion. Carolyn Foote, a school librarian and one of the Summit participants, also live-blogged all of these talks in detail. I highly recommend reading her notes on Day 1, Day 2, and Closing in addition to the official report. The tweets from Summit participants with the official hashtag, #libfuturesummit, will also give you an idea of what participants found exciting at the Summit.

(1) Redefining a Problem

The most fascinating story in Dubner’s talk was about Takeru Kobayashi, the hot dog eating contest champion from Japan. According to Dubner, the secret of Kobayashi’s success was redefining the problem and rethinking artificial limits that were widely accepted but never challenged. Kobayashi redefined the problem from ‘How can I eat more hot dogs?’ to ‘How can I eat one hot dog faster?’ and then discarded conventions such as holding a hot dog in your hand and eating it from top to bottom. He experimented with breaking the hot dog into two pieces so he could feed himself faster with both hands, and he further refined his technique by eating the frankfurter and the bun separately to speed things up even more.

So where can libraries apply this lesson? One place I can think of is the low attendance at some library programs. What if we asked what barriers we could remove instead of asking what kind of program would draw more people? Chattanooga Public Library did exactly this. Recently, they targeted parents who would want to attend the library’s author talk and designed the event to address child care specifically. The library scheduled an evening story time for kids and fun activities for tweens and teens at the same time as the author talk. Parents were then invited to bring their children to the library, let them take part in the children’s programs, and enjoy the author talk without worrying about them.

Another library service I learned about at my table was the Zip Books service of the Yolo County Library in California. What if libraries asked what the fastest way would be to deliver a book the library doesn’t have to a patron’s door, instead of asking how quickly the cataloging department can catalog a newly acquired book and get it ready for circulation? The Yolo County Library’s Zip Books service came from exactly that kind of redefinition of a problem. When a library user requests a book that the library doesn’t own but that meets certain requirements, the Yolo County Library purchases the book from a bookseller and has it shipped directly to the patron’s home without processing it. Cataloging and processing are done when the book is returned to the library after its first use.

(2) What Can Happen to Higher Education

My favorite talk at the Summit was Dr. Khator’s, because she had deep insight into higher education and I have worked in university libraries for a long time. The two most interesting observations she made concerned the possibility of (a) decoupling content development from content delivery and (b) decoupling teaching from credentialing in higher education.

The upside of (a) is that a wonderful course created by a world-class scholar could be taught by other instructors at institutions where that scholar is not available. The downside of (a) is, of course, that it could be used to set a cookie-cutter, lowest-common-denominator baseline for quality in higher education (one participant at my table mentioned the University of Phoenix as an example), instead of exposing college and university students to courses developed and taught by their own institutions’ faculty members.

I have to admit that (b) was a completely mind-blowing idea to me. Imagine colleges and universities with no credentialing authority. Your degree would no longer be tied to the particular institution you were admitted to and graduated from. Just consider what this might entail if it ever became reality. If (a) and (b) happened at the same time, the impact would be even more significant. What kind of role could an academic library play in such a scenario?

(3) Futurizing Libraries

Joel Garreau observed that nowadays the need for face-to-face contact, more than anything else, is what drives people to make physical trips. He then pointed out that as technology allows more people to telework, people are flocking to smaller cities where they can have more meaningful contact with their community. If this is indeed the case, libraries that make their space a catalyst for face-to-face contact in a community will prosper. The last speaker, Thomas Frey, spoke mostly about the Internet of Things (IoT).

While IoT is certainly an important trend to note, what I liked most about Frey’s talk was his statement that the vision of the future we hold today will change the decisions we make (toward that future). After Garreau’s talk, I had a chance to ask him about his somewhat idealized vision of the future, in which people live and work in small but closely connected communities within a highly technological and collaborative society. He called this ‘human evolution.’

But in my opinion, the reality we see today is not so idyllic.3 The current economy is highly volatile. It no longer offers job security, keeps reducing the number of jobs, and yields stagnant or declining income for those whose skills are not in high demand in the era of the digital revolution.4 As a result, today’s college students, who are preparing to become tomorrow’s knowledge workers, perceive their education and their lives after graduation quite differently than their parents did.5

Garreau’s answer to my question was that my concern may stem from a kind of techno-determinism. While this may be a fair critique, I felt that his portrayal of human evolution may be just as techno-deterministic. (To be fair, he mentioned that he does not make predictions and that this is one of the future scenarios he sees.)

Regarding the Internet of Things, the main topic of Frey’s talk, I think the real barrier to implementing IoT on a large scale will be privacy and the proper protection of the massive amount of data generated by the many sensors that make IoT possible. After his talk, I had a chance to chat with him briefly about this. (There was no Q&A because Frey’s talk ran over its allotted time.) He mentioned the possibility of some kind of international gathering on the scale of the Geneva Conventions to address the issue. While the likelihood of that is hard to assess, the idea seemed appropriate to the problem in question.

(4) What If…?

One of the slides from Thomas Frey’s Talk at the ALA Summit. (Photo by the author)

Some of the shiny things shown at the talk, whose value to library users may seem dubious and distant, prompted Eli Neiburger of the Ann Arbor District Library to ask what useful services libraries could offer now that would significantly benefit the public. For example, he wondered what it would be like if many libraries ran Tor exit nodes to help protect the privacy and anonymity of web traffic.

For those who are unfamiliar, Tor (the Onion Router) is “free software and an open network that helps you defend against traffic analysis, a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security.” Tor is not foolproof, but it is still the best tool for privacy and anonymity on the Web.

Eli’s idea is a truly wild one, considering how many libraries there are in the US and how precarious the state of the public’s privacy in the US is.6 Running a Tor exit node is not a walk in the park, as this post by someone who set up a Tor exit node on a hosted virtual server in Germany attests. But libraries have long been serious and dedicated advocates of privacy as essential to people’s intellectual freedom, and they have a strong network of allies. Tor also provides useful guidelines and tips on its website.

Just pause a minute and imagine what kind of impact such a project by libraries could have on the public’s privacy. What if?

(5) Leadership and Sustainability

For the “Table Discussion of Choice” session, I opted for the “Open” table because I was curious about what other topics people were interested in. Two discussions at this session were most memorable to me. One was the great advice I got from Corinne Hill about leading people. A while ago, I read her interview, in which she commented that “the staff are just getting comfortable with making decisions.” As a relatively new manager, I have also found it a challenge to empower my team members to be more autonomous decision makers. Corinne particularly cautioned leaders against being over-critical when staff take the initiative but make a bad decision. Being over-critical in that case can discourage staff from making their own decisions in their areas of expertise, she said. Hearing her describe how she relies on the different strengths of her staff to move her library toward innovation was also illuminating. (Lorcan Dempsey, who was also at our table, mentioned the “Birkman Quadrants,” a set of useful theoretical constructs, in relation to Corinne’s description. He also brought up the term ‘Normcore’ at another session. I forget the exact context, but the term was interesting enough that I wrote it down.) We also talked for a while about current LIS education and how it is not sufficiently aligned with the skills needed in everyday library operations.

The other interesting discussion started with a question from Amy Garmer of the Aspen Institute about the sustainability of future libraries. (She has been working on a library-related project with various policymakers, and PLA has a program related to this project at the upcoming 2014 ALA Annual Conference if you are interested.) One thought that always comes to my mind whenever I think about the future of libraries is that, while in the past the difference between small and large libraries was mostly quantitative (how many books and other resources were available), in the present and future the difference is and will be more qualitative. What the New York Public Library offers its patrons, for example a whole suite of digital library products from NYPL Labs, cannot be easily replicated by a small rural library. Needless to say, this has significant implications for the core mission of the library, which is to equalize the public’s access to information and knowledge. What can we do to close that gap? Or will different types of libraries perhaps need different strategies for the future, as Lorcan Dempsey asked at our table? The two approaches are not incompatible and can be worked out at the same time.

(6) Nimble and Media-Savvy

In her Summit summary, Joan Frye Williams, who moved around to observe the discussions at all the tables during the Summit, mentioned that one of the themes that surfaced was thinking of a library as a developing enterprise rather than a stable organization. This means that a library’s modus operandi should become more nimble and flexible so that the library can keep pace with the changes its community goes through.

Another thread of discussion among the Summit participants was that not all library supporters have to be active users of library services. As long as those supporters know that the presence and services of libraries make their communities strong, libraries are in a good place. Libraries often make the mistake of trying to reach all of their potential patrons and convert them into active library users. While this is admirable, it is not always practical or beneficial to library operations. What is more needed and useful is well-managed, strategic media relations that effectively publicize the library’s services and programs, along with their benefits and impact on the community. (On a related note, one journalist at the Summit mentioned that she had noticed recent coverage of libraries shifting from “Are libraries going to be extinct?” to “No, libraries are not going to be extinct. And do you know libraries offer way more than books, such as … ?”, which is fantastic.)

E. What Now? Library Futurizing vs. Library Grounding

What all the discussion at the Summit reminded me is that the time and effort we spend trying to foresee what the future holds and raising concerns about it may be better directed at refining a positive vision of a desirable future for libraries and taking well-calculated, decisive actions toward realizing that vision.

Technology is just a tool. It can be used to free people to engage in more meaningful work and creative pursuits. Or it can be used to create a large number of unemployed people, who struggle to make ends meet and to retool themselves with the fast-changing skills the labor market demands, alongside a very rich top 1 or 0.1 percent. What we do now gives us the power to influence which of these paths we will be on.

Certainly, there are trends that we need to heed. For example, the economy’s shift toward a bigger role for entrepreneurship than ever before calls for more entrepreneurship education and support for students at universities and colleges. The growing tendency of businesses to look for potential employees based on their specific skill sets rather than their majors and grades has led universities and colleges to adopt digital badging systems (such as Purdue’s Passport) or other ways for students to record and prove the job-related skills they gained during their studies.

But when we talk about the future, many of us tend to assume that there are some inevitable trends that we either catch or miss and that those trends will determine our future. We forget that it is not trends but (i) what we intend to achieve in the future and (ii) the actions we take today to realize that intention that really determine our future. (Also, always critically reflect on whatever is trendy; you may be in for a surprise.7) The fact that people will no longer need to physically visit a library to check out books or access library resources does not automatically mean that the library of the future will cease to have a building. The question is whether we will let that happen. Suppose we decide that we want the library to be and remain a vibrant hub for a community’s freedom of inquiry and right to access human knowledge, no matter how much change takes place in society. Realizing this vision ‘IS’ within our power. We only reach the future by walking through the present.

Notes

  1. Stripling, Barbara. “Report on the Summit on the Future of Libraries.” ALA Connect, May 19, 2014. http://connect.ala.org/node/223667.
  2. Kim, Bohyun. “Higher ‘Professional’ Ed, Lifelong Learning to Stay Employed, Quantified Self, and Libraries.” ACRL TechConnect Blog, March 23, 2014. http://acrl.ala.org/techconnect/?p=4180.
  3. Ibid.
  4. For a short, clear description of this phenomenon, see Brynjolfsson, Erik, and Andrew McAfee. Race against the Machine: How the Digital Revolution Is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy. Lexington: Digital Frontier Press, 2012.
  5. Brooks, David. “The Streamlined Life.” The New York Times, May 5, 2014. http://www.nytimes.com/2014/05/06/opinion/brooks-the-streamlined-life.html.
  6. See Timm, Trevor. “Everyone Should Know Just How Much the Government Lied to Defend the NSA.” The Guardian, May 17, 2014. http://www.theguardian.com/commentisfree/2014/may/17/government-lies-nsa-justice-department-supreme-court.
  7. For example, see this article about what the wide adoption of 3D-printing may mean to the public. Sadowski, Jathan, and Paul Manson. “3-D Print Your Way to Freedom and Prosperity.” Al Jazeera America, May 17, 2014. http://america.aljazeera.com/opinions/2014/5/3d-printing-politics.html.

Library & Academic Tech Conferences Roundup

Here we present a summary of various library technology conferences that ACRL TechConnect authors have attended. There are a lot of them, and some are fairly niche, so we hope this guide helps neophytes and veterans alike decide how to spend their limited professional development funds. Do you attend one of these conferences every year because it’s awesome? Did we miss your favorite conference? Let us know in the comments!

The lisevents.com website might be of interest, as it compiles LIS conferences of all types. Also, one might be able to get a sense of the content of a conference by searching for its hashtag on Twitter. Most conferences list their hashtag on their website.

Access

  • Time: late in the year, typically September or October
  • Place: Canada
  • Website: http://accessconference.ca/
  • Access is Canada’s annual library technology conference. Although the focus is primarily on technology, librarians from various specializations address a wide variety of topics, from linked data, innovation, and makerspaces to digital archiving. (See the past conferences’ schedules: http://accessconference.ca/about/past-conferences/) Access provides an excellent opportunity to get an international perspective without traveling too far. Access is also a single-track conference, offers great opportunities to network, and starts with preconferences and a hackathon that welcomes all types of librarians, not just library coders. Both the preconferences and the hackathon are optional but highly recommended. (p.s. One of the ACRL TechConnect authors thinks that this is the conference with the best lunch and snacks.)

Code4Lib

  • Time: early in the year, typically February but this year in late March
  • Place: varies
  • Website: http://code4lib.org/conference/
  • Code4Lib is unique in that it is organized by a group of volunteers and not supported by any formal organization. While it does cover some more general technology concepts, the conference tends to focus on coding, naturally. Preconferences in past years have covered the Railsbridge curriculum for learning Ruby on Rails, as well as Blacklight, the open source discovery interface. Code4Lib moves quickly: talks are short (20 minutes), with even shorter lightning talks thrown in. But it is also single-track, held in one room, so attendees can see every presentation.

Computers in Libraries

  • Time: Late March or early April
  • Place: Washington, DC
  • Website: http://www.infotoday.com/conferences.asp
  • Computers in Libraries is a for-profit conference hosted by Information Today. Its use of tracks, which organize presentations around a topic or group of topics, is a useful way to navigate a conference, and its overall size is more conducive to networking, socializing, and talking with vendors in the exhibit hall than many other conferences. However, because consultants, as opposed to people who work in libraries, are involved in panel and presentation selection and conference management, there is occasionally a focus on trends that are popular at the moment but don’t pan out, as well as language more suited to an MBA than an MLIS. The conference also lacks a code of conduct, and given its corporate nature, its website is surprisingly antiquated.
  • They also run Internet Librarian, which meets in Monterey, California, every fall.
    — Jacob Berg, Library Director, Trinity Washington University

Digital Library Federation Forum

  • Time: later in the year, October or November
  • Place: varies
  • Website: http://www.diglib.org/
  • We couldn’t find someone who attended this. If you have, please add your review of this conference in the comments section!

edUI

  • Time: late in the year, typically November
  • Place: Richmond, VA
  • Website: http://eduiconf.org/
  • Not a library conference, edUI is aimed at web professionals working in higher education but draws a fair number of librarians. The conference tends to draw excellent speakers, both from within higher education and the web industry at large. Sessions cover user experience, design, social media, and current tools of the trade. The talks suit a broad range of specialties, from programmers to people who work on the web but aren’t technologists foremost.

Electronic Resources & Libraries

  • Time: generally early in the year, late February to mid-March
  • Place: Austin, TX
  • Website: http://www.electroniclibrarian.com/
  • The main focus of this conference is workflows and issues surrounding electronic resources (such as licensed databases and online journals), and understanding these is crucial to anyone working with library technology, whether or not they manage e-resources on a daily basis. In recent years the conference has expanded greatly into areas such as open access and user experience, with tracks specifically dedicated to those areas. This year there were also some overlapping programs and themes with SXSW and the Leadership, Technology, Gender Summit.

Handheld Librarian

  • Time: held a few times throughout the year
  • Place: online
  • Website: http://handheldlibrarian.org
  • An online conference devoted specifically to mobile technologies. The advantage of this conference is that you can get a glimpse of current developments and applications of mobile technologies in libraries without traveling. It started in 2009 as an annual one-day online conference built around presentation proposals submitted and accepted in advance. The conference has changed in recent years: it now offers a separate day of workshops and focuses on a different theme in mobile technologies in libraries each time. All conference presentations and workshops are recorded. If you are interested in attending, it is a good idea to check out the presentations and speakers in advance.

Internet Librarian

  • Time: October
  • Place: Monterey, CA
  • Website: http://www.infotoday.com/conferences.asp
  • Internet Librarian is a for-profit conference hosted by Information Today. It is quite similar to Information Today’s Computers in Libraries, using tracks to organize a large number of presentations covering a broad swath of library information technology topics. Internet Librarian also hosts the Internet @ Schools track, which focuses on the IT needs of the K12 library community. IL is held annually in Monterey, California, in October. The speaker list is deep and varied, and one can expect keynote speakers to be prominent and established names in the field. The conference is well attended and provides a good opportunity to network with library technology peers. As with Computers in Libraries, there is no conference code of conduct.

KohaCon

  • Time: varies, typically in the second half of the year
  • Place: varies, international
  • Website: http://koha-community.org/kohacon/
  • The annual conference devoted to the Koha open source ILS.

Library Technology Conference

  • Time: mid-March
  • Place: St. Paul, MN
  • Website: http://libtechconf.org/
  • LTC is an annual library conference that takes place in March. It is both organized by and held at Macalester College in St. Paul. Not as tech-heavy as Code4Lib or even Access, LTC features talks spanning a whole range of technical aptitudes. Given its time and location, LTC has historically been primarily of regional interest, but it has seen rising levels of national and international participation.
    — John Fink, Digital Scholarship Librarian, McMaster University
  • We asked Twitter for a short overview of Library Technology Conference, and Matthew Reidsma offered up this description:

LITA Forum

  • Time: Late in the year, typically November
  • Place: varies
  • Website: http://www.ala.org/lita/conferences
  • A general library technology conference that’s moderately sized, with some 300 attendees most years. One of LITA’s nice aspects is that, because of the smaller size of the conference and the arranged networking dinners, it’s very easy to meet other librarians. You need not be involved with LITA to attend and there are no committee or business meetings.

Open Repositories

  • Time: mid-summer, June or July
  • Place: varies, international
  • Website: changes each year, here are the 2013 and 2014 sites
  • A mid-sized conference focused specifically on institutional repositories.

Online NorthWest

  • Time: February
  • Place: Corvallis, OR
  • Website: http://onlinenorthwest.org/
  • A small library technology conference in the Pacific Northwest. It is hosted by the Oregon University System but invites content from public, medical, special, legal, and academic libraries.

THATcamps

  • Time: all the time
  • Place: varies, international
  • Website: http://thatcamp.org/
  • Every THATCamp is different, but all revolve around technology and the humanities (i.e. The Technology And Humanities Camp). They are unconferences with “no spectators,” so each reflects the interests of its participants. Some have specific themes such as digital pedagogy, others are attached to conferences as pre- or post-conference events, and some are more general regional events. Librarians are important participants in THATCamps, and if there is one in your area or at a conference you’re attending, you should go. They cost under $30 and are a great networking and education opportunity. Sign up for the THATCamp mailing list or subscribe to the RSS feed to find out about new THATCamps. They have an attendee limit and usually fill up quickly.

Analyzing Usage Logs with OpenRefine

Background

Like a lot of librarians, I have access to a lot of data, and sometimes no idea how to analyze it. When I learned about linked data and the ability to search against data sources with a piece of software called OpenRefine, I wondered if it would be possible to match our users’ discovery layer queries against the Library of Congress Subject Headings. From there I could use the linking in LCSH to find the Library of Congress Classification, and then get an overall picture of the subjects our users were searching for. As with many research projects, it didn’t really turn out like I anticipated, but it did open further areas of research.

At California State University, Fullerton, we use an open source application called Xerxes, developed by David Walker at the CSU Chancellor’s Office, in combination with the Summon API. Xerxes acts as an interface for any number of search tools, including Solr, federated search engines, and most of the major discovery service vendors. We call it the Basic Search, and it’s incredibly popular with students, with over 100,000 searches a month and growing. It’s also well-liked – in a survey, about 90% of users said they found what they were looking for. We have monthly files of our users’ queries, so I had all of the data I needed to go exploring with OpenRefine.

OpenRefine

OpenRefine is an open source tool that deals with data in a very different way than typical spreadsheets. It has been mentioned in TechConnect before, and Margaret Heller’s post, “A Librarian’s Guide to OpenRefine” provides an excellent summary and introduction. More resources are also available on Github.

One of the most powerful things OpenRefine does is to allow queries against open data sets through a function called reconciliation. In the open data world, reconciliation refers to matching the same concept among different data sets, although in this case we are matching unknown entities against “a well-known set of reference identifiers” (Re-using Cool URIs: Entity Reconciliation Against LOD Hubs).

Reconciling Against LCSH

In this case, we’re reconciling our discovery layer search queries with LCSH. This means OpenRefine tries to match the entire user query (e.g. “artist” or “cost of assisted suicide”) against what’s included in the LCSH linked open data. According to the LCSH website this includes “all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children’s (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms.”

I followed the directions at Free Your Metadata. One note: the steps below apply to OpenRefine 2.5 and version 0.8 of the RDF extension; OpenRefine 2.6 requires version 0.9 of the RDF extension. Alternatively, you could use LODRefine, which bundles several major extensions; I hear it is great, but I haven’t tried it personally. The basic process shouldn’t change much.

(1) Import your data

OpenRefine has quite a few file type options, so your format is likely already supported.

 Screenshot of importing data

(2) Clean your data

In my case, this involved deduplicating by timestamp and removing leading and trailing whitespace. You can also remove stray punctuation, numbers, and even extremely short queries (fewer than 2 characters).
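If you would rather pre-clean the log before importing it, the cleaning steps can be sketched in Python. This is a minimal sketch, not the author's OpenRefine workflow: it assumes the log has already been read into hypothetical `(timestamp, query)` pairs, and it treats "duplicate" as the same query (ignoring case) at the same timestamp.

```python
import re

def clean_queries(rows, min_length=2):
    """Clean (timestamp, query) pairs from a search log (hypothetical format)."""
    seen = set()
    cleaned = []
    for timestamp, query in rows:
        # Replace every run of non-letters (punctuation, digits, underscores) with a space.
        text = re.sub(r"[\W\d_]+", " ", query)
        # Collapse internal whitespace and trim leading/trailing whitespace.
        text = " ".join(text.split())
        # Drop extremely short queries.
        if len(text) < min_length:
            continue
        # Treat the same query (ignoring case) at the same timestamp as a duplicate.
        key = (timestamp, text.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((timestamp, text))
    return cleaned


rows = [
    ("2014-03-01 10:00:01", "  nursing ethics "),
    ("2014-03-01 10:00:01", "nursing ethics"),
    ("2014-03-01 10:05:00", "a"),
    ("2014-03-02 09:00:00", "cost of assisted suicide!"),
]
print(clean_queries(rows))
```

Because `\W` is Unicode-aware for `str` patterns, accented letters survive the punctuation stripping.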

(3) Add the RDF extension.

If you’ve done it correctly, you should see an RDF dropdown next to Freebase.

Screenshot of correctly installed RDF extension

(4) Decide which data you’d like to search on.

In this example, I’ve decided to use only queries of four words or fewer, and I removed duplicate search queries. (Xerxes handles facet clicks as if they were separate searches, so there are many duplicates. I don’t usually remove duplicates, though, unless they happen at nearly the same time.) I’ve also experimented with limiting to 10 or 15 characters, but there were not many more matches with 15 characters than with 10, even though the data set was much larger. It depends on how much computing time you want to spend; it’s really a personal choice. In this case, I chose 4 words because of my experience with 15 characters: longer does not necessarily translate into more matches. A cursory glance at LCSH left me with the impression that the vast majority of headings (not including subdivisions, since they’d be searched individually) were 4 words or less. This, of course, means that queries with more than 4 words can’t be used; more on that later.
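The same filter is easy to express outside OpenRefine. Here is a sketch, assuming queries arrive as plain strings and that "word" simply means a whitespace-separated token. Like the example above, it drops repeats even when they are far apart in time. The function name is hypothetical.

```python
def short_unique_queries(queries, max_words=4):
    """Keep queries of max_words words or fewer, dropping repeats (case-insensitive)."""
    seen = set()
    kept = []
    for query in queries:
        words = query.split()
        if not words or len(words) > max_words:
            continue  # empty, or too long to try against headings
        key = " ".join(words).lower()
        if key in seen:
            continue  # duplicate, even if it occurred much later
        seen.add(key)
        kept.append(query)
    return kept


queries = [
    "psychology",
    "cost of assisted suicide",
    "effects of social media on teenage depression",
    "Psychology",
]
print(short_unique_queries(queries))
```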

Screenshot of adding a column based on word count using ngrams

(5) Go!

Shows OpenRefine reconciling

(6) Now you have your queries that were reconciled against LCSH, so you can limit to just those.

Screenshot of limiting to reconciled queries

Finding LC Classification

First, you’ll need to extract the cell.recon.match.id – the ID for the matched query that in the case of LCSH is the URI of the concept.

Screenshot of using cell.recon.match.id to get URI of concept

At this point you can choose whether to grab the HTML or the JSON, and create a new column based on this one by fetching URLs. I’ve never been able to get the parseJson() function to work correctly with LC’s JSON outputs, so for both HTML and JSON I’ve just regexed the raw output to isolate the classification. For more on regex see Bohyun Kim’s previous TechConnect post, “Fear No Longer Regular Expressions.”

On the raw HTML, the easiest way to do it is to transform the cells or create a new column with:

replace(partition(value,/<li property="madsrdf:classification">(<[^>]+>)*([A-Z]{1,2})/)[1],/<li property="madsrdf:classification">(<[^>]+>)*([A-Z]{1,2})/,"$2")

Screenshot of using regex to get classification

You’ll note this will only pull out the first classification given, even if some have multiple classifications. That was a conscious choice for me, but obviously your needs may vary.
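To check what the GREL expression is doing, here is the same regular expression in Python. This is a sketch under assumptions: the HTML snippet is invented for illustration, and real id.loc.gov markup may differ. As with the GREL version, only the first classification's one or two capital letters are returned.

```python
import re

# Same pattern as the GREL expression: the classification <li>, any nested
# tags, then the first one or two capital letters of the class number.
CLASS_PATTERN = re.compile(
    r'<li property="madsrdf:classification">(?:<[^>]+>)*([A-Z]{1,2})'
)

def first_class_letters(html):
    """Return the class letters of the first classification, or None."""
    match = CLASS_PATTERN.search(html)
    return match.group(1) if match else None


# Invented snippet for illustration only.
sample = (
    '<ul>'
    '<li property="madsrdf:classification"><span>BF636</span></li>'
    '<li property="madsrdf:classification"><span>RC480</span></li>'
    '</ul>'
)
print(first_class_letters(sample))  # BF
```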

(Also, although I’m only concentrating on classification for this project, there’s a huge amount of data that you could work with – the example URI for Acting shows all of the different fields.)

Once you have the classifications, you can export to Excel and create a pivot table to count the instances of each, and you get a pretty table.
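If you would rather skip the Excel step, the pivot-table count can be sketched with Python's `collections.Counter`. The class letters below are made-up sample values. Queries with no classification are skipped, mirroring the blank rows you would filter out of a pivot table.

```python
from collections import Counter

def classification_counts(classes):
    """Count class letters, skipping queries with no classification."""
    counts = Counter(c for c in classes if c)
    return sorted(counts.items())  # alphabetical, like a pivot table's rows


classes = ["BF", "RT", "BF", "LB", None, "RT", "BF"]
for cls, n in classification_counts(classes):
    print(cls, n)
```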

Table of LC Classifications

Caveats & Further Explorations

As you can guess from the y-axis in the table above, the number of matches is a very small percentage of actual searches. First I limited to keyword searches (as opposed to title/subject), then of those only ones that were 4 or fewer words long (about 65% of keyword searches). Of those, only about 1000 of the 26000 queries matched, and resulted in about 360 actual LC Classifications. Most months I average around 500, but in this example I took out duplicates even if they were far apart in time, just to experiment.

One thing I haven’t done but am considering is allowing matches that aren’t 100%. From my example above, there are another 600 or so queries that matched at 50-99%. This could significantly increase the number of matches and thus give us more classifications to work with.
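Loosening the match requirement amounts to lowering a score threshold. A sketch, assuming each reconciled query has been exported with a match score expressed as a percentage. The pairs below are hypothetical, and `None` stands for "no candidate found".

```python
def reconciled(results, threshold=100):
    """Return queries whose match score is at or above threshold (percent)."""
    return [query for query, score in results if score is not None and score >= threshold]


# Hypothetical (query, score) pairs, not real reconciliation output.
results = [
    ("psychology", 100),
    ("nursing ethics", 100),
    ("artist", 87),
    ("gender roles", 52),
    ("climate", 30),
    ("cost of assisted suicide", None),
]
print(len(reconciled(results)), len(reconciled(results, threshold=50)))  # 2 4
```

Lower thresholds trade precision for coverage, so a sample of the 50–99% matches would need to be spot-checked by hand.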

Some of this is related to the types of searches that students are doing (see Michael J DeMars’ and my presentation “Making Data Less Daunting” at Electronic Resources & Libraries 2014, which this article grew out of, for some crazy examples) and some to the way that LCSH is structured. I chose LCSH because I could get linked to the LC Classification and thus get a sense of the subjects, but I’m definitely open to ideas. If you know of a better linked data source, I’m all ears.

I must also note that this is a pretty inefficient way of matching against LCSH. If you know of a way I could download the entire set, I’m interested in investigating that way as well.

Another approach that I will explore is moving away from reconciliation with LCSH (which is really more appropriate for a controlled vocabulary) to named-entity extraction, which takes natural language inputs and tries to recognize or extract common concepts (name, place, etc.). Here I would use it as a first step before trying to match against LCSH. Free Your Metadata has a new named-entity extraction extension for OpenRefine, so I’ll definitely explore that option.

Planned Research

In the end, although this is interesting, does it actually mean anything? My next step with this dataset is to take a subset of the search queries and assign classification numbers. Over the course of several months, I hope to see if what I’ve pulled in automatically resembles the hand-classified data, and then draw conclusions.

So far, most of the peaks are expected – psychology and nursing are quite strong departments. There are some surprises, though: education has been consistently underrepresented, relative to both our enrollment numbers and the word counts of our queries (see our presentation for one month’s top word counts). Education students have a robust information literacy program. Does this mean that education students do complex searches that don’t match LCSH? Do they mostly use subject databases? Once again, an area for future research, should these automatic results match the classifications I do by hand.

What do you think? I’d love to hear your feedback or suggestions.

About Our Guest Author

Jaclyn Bedoya has lived and worked on three continents, although currently she’s an ER Librarian at CSU Fullerton. It turns out that growing up in Southern California spoils you, and she’s happiest being back where there are 300 days of sunshine a year. Also Disneyland. Reach her @spamgirl on Twitter or jaclynbedoya@gmail.com


Getting Started with APIs

There has been a lot of discussion in the library community regarding the use of web service APIs over the past few years.  While APIs can be very powerful and provide awesome new ways to share, promote, manipulate and mashup your library’s data, getting started using APIs can be overwhelming.  This post is intended to provide a very basic overview of the technologies and terminology involved with web service APIs, and provides a brief example to get started using the Twitter API.

What is an API?

First, some definitions.  One of the steepest learning curves with APIs involves navigating the terminology, which unfortunately can be rather dense – but understanding a few key concepts makes a huge difference:

  • API stands for Application Programming Interface, which is a specification used by software components to communicate with each other.  If (when?) computers become self-aware, they could use APIs to retrieve information, tweet, post status updates, and essentially run most day-to-day functions for the machine uprising. There is no single API “standard”, though one of the most common methods of interacting with APIs involves RESTful requests.
  • REST / RESTful APIs  – Discussions regarding APIs often make references to “REST” or “RESTful” architecture.  REST stands for Representational State Transfer, and you probably utilize RESTful requests every day when browsing the web. Web browsing is enabled by HTTP (Hypertext Transfer Protocol) – as in http://example.org.  The exchange of information that occurs when you browse the web uses a set of HTTP methods to retrieve information, submit web forms, etc.  APIs that use these common HTTP methods (sometimes referred to as HTTP verbs) are considered to be RESTful.  RESTful APIs are simply APIs that leverage the existing architecture of the web to enable communication between machines via HTTP methods.

HTTP Methods used by RESTful APIs

Most web service APIs you will encounter rely, at their core, on the following HTTP methods for creating, retrieving, updating, and deleting information through that web service.1  Not all APIs allow each method (at least without authentication), but some common methods for interacting with APIs include:

    • GET – You can think of GET as a way to “read” or retrieve information via an API.  GET is a good starting point for interacting with an API you are unfamiliar with.  Many APIs utilize GET, and GET requests can often be used without complex authentication.  A common example of a GET request that you’ve probably used when browsing the web is the use of query strings in URLs (e.g., www.example.org/search?query=ebooks).
    • POST – POST can be used to “write” data over the web.  You have probably generated  POST requests through your browser when submitting data on a web form or making a comment on a forum.  In an API context, POST can be used to request that an API’s server accept some data contained in the POST request – Tweets, status updates, and other data that is added to a web service often utilize the POST method.
    • PUT – PUT is similar to POST, but a PUT request is sent to a specific uniform resource identifier (URI), such as a URL, that names the resource being written.  Like POST, it can be used to create and update information, but PUT (in a sense) is a little more aggressive: it can replace an existing resource at that URI or create one if there isn’t one.
    • DELETE – DELETE, well, deletes – it removes information at the URI specified by the request.  For example, consider an API web service that could interact with your catalog records by barcode.2 During a weeding project, an application could be built with DELETE that would delete the catalog records as you scanned barcodes.3
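To make the four methods concrete, here is a sketch that builds one request of each kind with Python's standard `urllib.request.Request`, without sending anything. The endpoint `https://api.example.org/items` and the item ID are hypothetical stand-ins (think of the ID as a barcode in the weeding example). A real API would also require the authentication described below. Passing a request to `urllib.request.urlopen` would actually send it.

```python
from urllib.request import Request

BASE_URL = "https://api.example.org/items"  # hypothetical API endpoint

def build_request(method, item_id=None, body=None):
    """Build (but do not send) an HTTP request for a RESTful API."""
    url = BASE_URL if item_id is None else f"{BASE_URL}/{item_id}"
    data = body.encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


examples = [
    build_request("GET", "39001"),                            # retrieve (read)
    build_request("POST", body='{"title": "New item"}'),       # create; server picks the URI
    build_request("PUT", "39001", body='{"title": "Fixed"}'),  # create/replace at a known URI
    build_request("DELETE", "39001"),                         # delete, e.g. by barcode
]
for req in examples:
    print(req.get_method(), req.full_url)
```

Note how POST targets the collection URL while PUT and DELETE target one specific item's URI.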

Understanding API Authentication Methods

To me, one of the trickiest parts of getting started with APIs is understanding authentication. When an API is made available, the publishers of that API are essentially creating a door to their application’s data.  This can be risky:  imagine opening that door up to bad people who might wish to maliciously manipulate or delete your data.  So often APIs will require a key (or a set of keys) to access data through an API.

One helpful way to contextualize how an API is secured is to consider access in terms of identification, authentication, and authorization.4  Some APIs only want to know where the request is coming from (identification), while others require you to have a valid account (authentication) to access data.  Beyond authentication, an API may also want to ensure your account has permission to do certain functions (authorization).  For example, you may be an authenticated user of an API that allows you to make GET requests of data, but your account may still not be authorized to make POST, PUT, or DELETE requests.
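The three layers can be modeled as three successive checks. This toy sketch is not how any particular API is implemented. The key names, the `needs_login` flag, and the permission sets are all hypothetical.

```python
# Hypothetical registry of API keys; a real service keeps this server-side.
ACCOUNTS = {
    "public-key": {"needs_login": False, "allowed": {"GET"}},
    "staff-key": {"needs_login": True, "allowed": {"GET", "POST", "PUT", "DELETE"}},
}

def check_access(api_key, method, logged_in=False):
    account = ACCOUNTS.get(api_key)
    if account is None:
        return "rejected: caller not identified"        # identification
    if account["needs_login"] and not logged_in:
        return "rejected: not authenticated"            # authentication
    if method not in account["allowed"]:
        return "rejected: not authorized for " + method  # authorization
    return "allowed"


print(check_access("public-key", "GET"))    # allowed
print(check_access("public-key", "POST"))   # rejected: not authorized for POST
print(check_access("staff-key", "DELETE"))  # rejected: not authenticated
```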

Some common methods used by APIs to handle authentication and authorization include OAuth and WSKey:

  • OAuth - OAuth is a widely used open standard for authorizing access to HTTP services like APIs.5  If you have ever sent a tweet from an interface that’s not Twitter (like sharing a photo directly from your mobile phone) you’ve utilized the OAuth framework.  Applications that already store authentication data in the form of user accounts (like Twitter and Google) can utilize their existing authentication structures to assign authorization for API access.  API Keys, Secrets, and Tokens can be assigned to authorized users, and those variables can be used by 3rd party applications without requiring the sharing of passwords with those 3rd parties.
  • WSKey (Web Services Key) – This is an example from OCLC that is conceptually very similar to OAuth.  If you have an OCLC account (either a worldcat.org or an oclc.org account), you can request key access.  Your authorization – in other words, what services and REST requests you are permitted to access – may be dependent upon your relationship with an OCLC member organization.

Keys, Secrets, Tokens?  HMAC?!

API authorization mechanisms often require multiple values in order to successfully interact with the API.  For example, with the Twitter API, you may be assigned an API Key and a corresponding Secret.  The topic of secret key authentication can be fairly complex,6 but fundamentally a Key and its corresponding Secret are used to authenticate requests in a secure encrypted fashion that would be difficult to guess or decrypt by malicious third-parties.  Multiple keys may be required to perform particular requests – for example, the Twitter API requires a key and secret to access the API itself, as well as a token and secret for OAuth authorization.

Probably the most important thing to remember about secrets is to keep them secret.  Do not share them or post them anywhere, and definitely do not store secret values in code uploaded to Github 7 (.gitignore – a method to exclude files from a git repository – is your friend here). 8  To that end, one strategy that is used by RESTful APIs to further secure secret key values is an HMAC header (hash-based message authentication code).  When requests are sent, HMAC uses your secret key to sign the request without actually passing the secret key value in the request itself. 9
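The idea behind HMAC signing can be shown with Python's standard `hmac` module. This sketch is generic, not any specific API's scheme. The "canonical request" string and the header that would carry the signature are hypothetical, and real APIs define exactly which parts of a request get signed. Only the signature travels with the request. The server, which also knows the secret, recomputes it and compares the two.

```python
import hashlib
import hmac

def sign(secret, message):
    """Return a hex HMAC-SHA256 signature of message using secret."""
    return hmac.new(secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(secret, message, signature):
    """Server side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret, message), signature)


secret = "YOUR_API_KEY_SECRET"                     # never sent over the wire
message = "GET /items/39001 2014-04-20T19:37:53Z"  # hypothetical canonical request
signature = sign(secret, message)                  # sent, e.g., in a request header

print(verify(secret, message, signature))                            # True
print(verify(secret, message.replace("39001", "39002"), signature))  # False
```

Any change to the signed request (here, a different item ID) invalidates the signature, which is what makes tampering detectable.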

Case Study:  The Twitter API

It’s easier to understand how APIs work when you can see them in action.  To do these steps yourself, you’ll need a Twitter account.  I strongly recommend creating a Twitter account separate from your personal or organizational accounts for initial experimentation with the API.  This code example is a very simple walkthrough, and does not cover securing your application’s server (and thus securing the keys that may be stored on that server).  Any time you authorize API access to a Twitter account, you may be exposing it to some level of vulnerability.  At the end of the walkthrough, I’ll list the steps you would need to take if your account does get compromised.

1.  Activate a developer account

Visit dev.twitter.com and click the sign in area in the upper right corner.  Sign in with your Twitter account. Once signed in, click on your account icon (again in the upper right corner of the page) and then select the My Applications option from the drop-down menu.

Screenshot of the Twitter Developer Network login screen

2.  Get authorization

In the My applications area, click the Create New App button, and then fill out the required fields (Name, Description, and Website where the app code will be stored).  If you don’t have a web server, don’t worry, you can still get started testing out the API without actually writing any code.

3.  Get your keys

After you’ve created the application and are looking at its settings, click on the API Keys tab.  Here’s where you’ll get the values you need.  Your API Access Level is probably limited to read access only.  Click the “modify app permissions” link to set up read and write access, which will allow you to post through the API.  You will have to associate a mobile phone number with your Twitter account to get this level of authorization.

Screenshot of Twitter API options that allow for configuring API read and write access.

Scroll down and note that in addition to an API Key and Secret, you also have an access token associated with OAuth access.  This Token Key and Secret are required to authorize account activity associated with your Twitter user account.
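The Test OAuth tool in the next step combines all four values into a request signature for you. For the curious, here is a sketch of that computation following the OAuth 1.0a HMAC-SHA1 method (RFC 5849), which is what Twitter's 1.1 API used. The keys, nonce, and timestamp are placeholders. The URL and status text are the ones used later in this post. Assembling the final `Authorization` header is not shown.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def pct(value):
    """RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ stay literal."""
    return quote(str(value), safe="")

def base_string(method, url, params):
    """Build the OAuth 1.0a signature base string."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(param_string)])

def oauth_signature(method, url, params, consumer_secret, token_secret):
    """HMAC-SHA1 signature keyed with both secrets, base64-encoded."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    text = base_string(method, url, params)
    digest = hmac.new(key.encode("utf-8"), text.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


params = {
    "status": "This is my first API test post!",
    "oauth_consumer_key": "YOUR_API_KEY",
    "oauth_token": "YOUR_ACCESS_TOKEN",
    "oauth_nonce": "abc123",           # normally random for every request
    "oauth_timestamp": "1398022673",   # normally the current Unix time
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}
url = "https://api.twitter.com/1.1/statuses/update.json"
print(oauth_signature("POST", url, params, "YOUR_API_KEY_SECRET", "YOUR_ACCESS_TOKEN_SECRET"))
```

This is why both secrets matter. Neither is sent, but a signature made without the right pair will be rejected.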

4.  Test OAuth Access / Make a GET call

From the application API Key page, click the Test OAuth button.  This is a good way to get a sense of the API calls.  Leave the key values as they are on the page, and scroll down to the Request Settings Area.  Let’s do a call to return the most recent tweet from our account.

With the GET request checked, enter the following values:

Request URI:

Request Query (obviously replace yourtwitterhandle with… your actual Twitter handle):

  • screen_name=yourtwitterhandle&count=1

For example, my GET request looks like this:

Screenshot of the GET request setup screen for OAuth testing.
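The Request Query field is an ordinary URL query string. If you later build such requests in code, Python's `urllib.parse` can produce and parse these strings for you. In this sketch, `yourtwitterhandle` is the same placeholder used above.

```python
from urllib.parse import parse_qs, urlencode

params = {"screen_name": "yourtwitterhandle", "count": 1}
query = urlencode(params)
print(query)  # screen_name=yourtwitterhandle&count=1

# Going the other way: parse a query string back into a dict of lists.
print(parse_qs(query))  # {'screen_name': ['yourtwitterhandle'], 'count': ['1']}

# Spaces and other special characters are escaped automatically.
print(urlencode({"query": "cost of assisted suicide"}))  # query=cost+of+assisted+suicide
```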

Click “See OAuth signature for this request”.  On the next page, look for the cURL request.  You can copy and paste this into a terminal or console window to execute the GET request and see the response (there will be a lot more response text than what’s posted here):

* SSLv3, TLS alert, Client hello (1):
[{"created_at":"Sun Apr 20 19:37:53 +0000 2014","id":457966401483845632,
"id_str":"457966401483845632",
"text":"Just Added: The Fault in Our Stars by John Green; 
2nd Floor PZ7.G8233 Fau 2012","

As you can see, the above response to my cURL request includes the text of my account’s last tweet.
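Once the response is in hand, pulling out the fields you care about is a matter of parsing JSON. The string below is a trimmed and completed version of the excerpt above, keeping only the fields shown there. This sketch assumes, as the leading `[` suggests, that the response is a JSON array of tweet objects.

```python
import json
from datetime import datetime

# Trimmed, completed version of the response excerpt above.
raw = (
    '[{"created_at":"Sun Apr 20 19:37:53 +0000 2014",'
    '"id":457966401483845632,"id_str":"457966401483845632",'
    '"text":"Just Added: The Fault in Our Stars by John Green; 2nd Floor PZ7.G8233 Fau 2012"}]'
)

tweets = json.loads(raw)  # a list of tweet objects
latest = tweets[0]
posted = datetime.strptime(latest["created_at"], "%a %b %d %H:%M:%S %z %Y")

print(latest["text"])
print(posted.isoformat())  # 2014-04-20T19:37:53+00:00
```

Python integers hold the full `id`, but `id_str` is the safer field in languages such as JavaScript, whose numbers cannot represent IDs this large exactly.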

What to do if your Twitter API Key or OAuth Security is Compromised

If your Twitter account suddenly starts tweeting out spammy “secrets to weight loss success” that you did not authorize (or other tweets that you didn’t write), your account has been compromised.  If you can still login with your username and password, it’s likely that your OAuth Keys have been compromised.  If you can’t log in, your account has probably been hacked.10  Your account can be compromised if you’ve authorized a third party app to tweet, but if your Twitter account has an active developer application on dev.twitter.com, it could be your own application’s key storage that’s been compromised.

Here are the immediate steps to take to stop the spam:

  1. Revoke access to third party apps under Settings –> Apps.  You may want to re-authorize them later – but you’ll probably want to reset the password for the third-party accounts that you had authorized.
  2. If you have generated API keys, log into dev.twitter.com and re-generate your API Keys and Secrets and your OAuth Keys and Secrets.  You’ll have to update any apps using the keys with the new key and secret information – but only if you have verified the server running the app hasn’t also been compromised.
  3. Reset your Twitter account password.11
5.  Taking it further:  Posting a new titles Twitter feed

So now you know a lot about the Twitter API – what now?  One way to take this further might involve writing an application to post new books that are added to your library’s collection.  Maybe you want to highlight a particular subject or collection – you can use some text output from your library catalog to post the title, author, and call number of new books.
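Before anything is posted, each catalog record has to be turned into a tweet-sized string. This sketch mirrors the format of the example tweet shown earlier. The record's field names are hypothetical, since every ILS export looks different. Titles are shortened when necessary so that the call number always survives within the 140-character limit Twitter had at the time.

```python
MAX_TWEET = 140  # Twitter's character limit at the time of writing

def new_title_tweet(record, limit=MAX_TWEET):
    """Format a hypothetical catalog record as a 'Just Added' tweet."""
    prefix = "Just Added: "
    suffix = f" by {record['author']}; {record['location']} {record['call_number']}"
    room = limit - len(prefix) - len(suffix)
    if room < 4:
        raise ValueError("author, location, and call number leave no room for a title")
    title = record["title"]
    if len(title) > room:
        # Shorten long titles so the call number is never cut off.
        title = title[: room - 3].rstrip() + "..."
    return prefix + title + suffix


record = {
    "title": "The Fault in Our Stars",
    "author": "John Green",
    "location": "2nd Floor",
    "call_number": "PZ7.G8233 Fau 2012",
}
print(new_title_tweet(record))
```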

The first step to such an application could involve creating an app that can post to the Twitter API.  If you have access to a  server that can run PHP, you can easily get started by downloading this incredibly helpful PHP wrapper.

Then in the same directory create two new files:

  • settings.php, which contains the following code (replace all the values in quotes with your actual Twitter API Key information):
<?php

// Credentials read by TwitterAPIExchange; keep this file out of any public repository.
$settings = array(
    'oauth_access_token' => "YOUR_ACCESS_TOKEN",
    'oauth_access_token_secret' => "YOUR_ACCESS_TOKEN_SECRET",
    'consumer_key' => "YOUR_API_KEY",
    'consumer_secret' => "YOUR_API_KEY_SECRET"
);

?>
  • and twitterpost.php, which has the following code, but swap out the values of ‘screen_name’ with your Twitter handle, and change the ‘status’ value if desired:
<?php

//call the PHP wrapper and your API values
require_once('TwitterAPIExchange.php');
include 'settings.php';

//define the request URL and REST request type
$url = "https://api.twitter.com/1.1/statuses/update.json";
$requestMethod = "POST";

//define your account and what you want to tweet
$postfields = array(
  'screen_name' => 'YOUR_TWITTER_HANDLE',
  'status' => 'This is my first API test post!'
);

//put it all together and build the request
$twitter = new TwitterAPIExchange($settings);
echo $twitter->buildOauth($url, $requestMethod)
->setPostfields($postfields)
->performRequest();

?>

Save the files and run the twitterpost.php page in your browser. Check the Twitter account referenced by the screen_name variable.  There should now be a new post with the contents of the ‘status’ value.

This is just a start – you would still need to get data out of your ILS and feed it to this application in some way – which brings me to one final point.

Is there an API for your ILS?  Should there be? (Answer:  Yes!)

Getting data out of traditional, legacy ILS systems can be a challenge.  Extending or adding on to traditional ILS software can be impossible (and in some cases may have been prohibited by license agreements).  One of the reasons for this might be that the architecture of such systems was designed for a world where the kind of data exchange facilitated by RESTful APIs didn’t yet exist.  However, there is definitely a major trend by ILS developers to move toward allowing access to library data within ILS systems via APIs.

It can be difficult to articulate exactly why this kind of access is necessary – especially when looking toward the future of rich functionality in emerging web-based library service platforms.  Why should we have to build custom applications using APIs – shouldn’t our ILS systems be built with all the functionality we need?

While libraries should certainly promote comprehensive and flexible architecture in the ILS solutions they purchase, there will almost certainly come a time when no matter how comprehensive your ILS is, you’re going to wonder, “wouldn’t it be nice if our system did X”?  Moreover, consider how your patrons might use your library’s APIs; for example, integrating your library’s web services into other apps and services they already use, or building their own applications with your library web services. If you have web service API access to your data – bibliographic, circulation, acquisition data, etc. – you have the opportunity to meet those needs and to innovate collaboratively.  Without access to your data, you’re limited to the development cycle of your ILS vendor, and it may be years before you see the functionality you really need to do something cool with your data.  (It may still be years before you can find the time to develop your own app with an API, but that’s an entirely different problem.)

Examples of Library Applications built using APIs and ILS API Resources

Further Reading

Michel, Jason P. Web Service APIs and Libraries. Chicago, IL:  ALA Editions, 2013. Print.

Richardson, Leonard, and Michael Amundsen. RESTful Web APIs. Sebastopol, Calif.: O’Reilly, 2013.


About our Guest Author:

Lauren Magnuson is Systems & Emerging Technologies Librarian at California State University, Northridge and a Systems Coordinator for the Private Academic Library Network of Indiana (PALNI).  She can be reached at lauren.magnuson@csun.edu or on Twitter @lpmagnuson.


Notes

  1. Create, retrieve, update, and delete are sometimes referred to by the acronym CRUD.
  2. For example, via the OCLC Collection Management API: http://www.oclc.org/developer/develop/web-services/wms-collection-management-api.en.html
  3. For more detail on these and other HTTP verbs, http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
  4. https://blog.apigee.com/detail/do_you_need_api_keys_api_identity_vs._authorization
  5. Google, for example: https://developers.google.com/accounts/docs/OAuth2
  6. To learn a lot more about this, check out this web series: http://www.youtube.com/playlist?list=PLB4D701646DAF0817
  7. http://www.securityweek.com/github-search-makes-easy-discovery-encryption-keys-passwords-source-code
  8. Learn more about .gitignore here:  https://help.github.com/articles/ignoring-files
  9. A nice overview of HMAC is here: http://www.wolfe.id.au/2012/10/20/what-is-hmac-and-why-is-it-useful
  10. Here’s what to do if your account is hacked and you can’t log in:  https://support.twitter.com/articles/185703-my-account-has-been-hacked
  11. More information, and further steps you can take are here:  https://support.twitter.com/articles/31796-my-account-has-been-compromised

Lightweight Project Management Tools in the Real World

My life got extra complicated in the last few months. I gave birth to my first child in January, and between the stress of a new baby, unexpected hospital visits, and the worst winter in 35 years, it was a trying time. While I was able to step back from many commitments during my 8-week maternity leave, I didn’t want to be completely out of the loop, and since I would return to three back-to-back conferences, I needed to be able to jump back in and monitor collaborative projects from wherever I was. All of us have times in our lives that are this hectic or even more so, but even in the regular busy thrum of our professional lives it’s too easy to let ongoing commitments like committee work disappear entirely from our mental landscapes, apart from the nagging feeling that we are missing something.

There are various methods and tools to enhance productivity, which we’ve looked at before. Some basic collaboration tools such as Google Docs are always good to have any time you are working on a group project that builds into something like a presentation or report. But for committee work or everyday work in a department, something more specialized can be even better. I want to look at some real-life examples of using lightweight project management tools to keep projects that you work on with others going strong—or not so strong, depending on how they are used. Over the past 4-5 months I’ve gotten experience using Trello for committee work and Asana for work projects. Both of them have some great features, but as always the implementation doesn’t depend entirely on the software’s functionality. Beyond my experience with these two implementations I’ll address a few other tools and my experience with effective usage of them.

Asana

Asana screenshot

I have the great fortune of having an entire wall of my office painted with whiteboard paint, and I use it to sketch out ideas and projects. For that to be useful, I need to be physically in the office. So before I went on maternity leave, I knew I needed to get all my projects at work organized in a way that I could give tasks I would normally do to others, as well as monitor what was happening on large ongoing projects. I had used Asana before in another context, so I decided to give it a try for this purpose. Asana has projects, tasks, and due dates that anyone in a workspace can follow and assign. It’s a pretty flexible system: the screenshot shows one potential way of setting it up, but we use different models for different projects, and there are many ideas out there. My favorite feature is project templates, which I use in another workspace that I share with my graduate assistant. This allows you to create a new project based on a standard series of steps, which means that she could create new projects while I was away based on the normal workflow we follow and I could work on them when I returned. All of this requires strict attention to keeping projects organized, however, and if you don’t have an agreed-upon system for naming and organizing tasks they can get out of hand very quickly.

We also use Asana as part of our help request system. We wanted to set up a system to track requests from all the library staff not only for my maternity leave but in general. I looked at many different systems, but they were almost all too heavy-duty for what we needed. I made our own very lightweight system using the Webform module in Drupal on our intranet. Staff submits requests through that form, which sends an email using a departmental email address to our Issue Tracking queue in Asana. Once the task is completed we explain the problem in an Asana comment (or just mark completed if it’s a normal request such as new user account), and then send a reply to the requestor through the intranet. They can see all the requests they’ve made plus the replies through that system. The nice thing about doing it this way is that everything is in one place–trouble tickets become projects with tasks very easily.

Trello

Trello screenshot

Trello is designed to mimic the experience of using index cards or sticky notes on a wall to track ideas and figure out what is going on at a glance. This is particularly useful for ongoing work where you have multiple projects in a set of pipelines divvied up among various people. You can easily see how many ideas you have in the inception stage and how many are closer to completion, which can be a good motivator to move items along. Another use is to store detailed project ideas and notes and then sort them into lists once you figure out a structure.

Trello starts with a virtual board, which is divided into lists of cards. Trello cards can be assigned to specific people, and anyone can follow a card to get notifications. Clicking on a card brings up a whole set of additional options, including who is working on the project, attachments, due dates, color coding, and anything else you might want. The screenshot shows how the LITA Education Committee uses Trello to plan educational offerings. The white areas with small boxes indicate cards (we use one card per program/potential idea) that are active and assigned, the gray areas indicate cards which haven’t been touched in a while and so probably need followup. Not surprisingly, there are many more cards, many of which are inactive, at the beginning of the pipeline than at the end with programs already set up. This is a good visual reminder that we need to keep things moving along.

In this case I didn’t set up Trello, and I am not always the best user of it. Using it for committee work has been useful, but there are a few things to keep in mind if it is actually going to keep projects moving. First, and this goes for everything, including analog cards or sticky notes, everyone working on the project needs to check in on a regular basis and use it consistently. I found that turning on email notifications was important for getting it into a regular workflow. While it would be nice to stay out of email more, most of us are used to work showing up there, and if you have a sane relationship with your inbox (i.e., you don’t use it to store work in progress), a notification can be a helpful prompt to log in and work on something. I haven’t used the mobile app yet, but that is another option for notifications.

Other Tools

While I have started using Asana and Trello more heavily recently, there are a number of other tools out there that you may need to use in your job or professional life. Here are a few:

Box

Many institutions now have some sort of “cloud” file system such as Box or Google Drive. My work uses Box, and I find it very useful for parts of projects where I need many people (a slightly different set each time) to collaborate on a single task. For example, I upload a spreadsheet that everyone needs to look at, use to do something, and then update with additional information. Organizations often use a shared drive for this very common scenario, but that approach has a number of problems. If you’ve ever been confronted with the filename “Spring2014_report-Copy-Copy-DRAFT.xlsx” or been unable to open a file because someone else left it open on her desktop and went to lunch, you know what I mean. Instead, I upload the file to Box and assign a task to the usernames of all the people who need to look at the document. They can use a tool called Box Edit to open the file in Excel, and any changes they make are immediately saved back to the shared document, just as they would be in a Google Doc. They can then mark the task complete, and the system sends email reminders only to people who haven’t yet finished the task.

ALA Connect

This section is only relevant to people working on projects with an American Library Association group, whether a committee or an interest group. Since this happens to most people working in academic libraries at some point, I think it’s worth considering. If it doesn’t apply to you, skip to the conclusion. ALA Connect is the central repository for institutional memory and documents for work around ALA, including committees and interest groups. It can also be a good place to work on projects collaboratively, but it takes some setup. As a committee chair, I freely admit that I need to organize my own ALA Connect page much better. My normal approach was to use an online document (that is, something everyone can edit) for each project and file each document under a subcommittee heading, but in practice I find it far too hard to find the right document to see what each subcommittee is working on. I am going to experiment with a new approach: I will create “groups” for each project and use the Group Headings sidebar to organize them. If you’re on a committee but not the chair, you don’t have access to reorganize the sidebar or posts, but if you can’t find anything in “General News & Discussions,” suggest this approach to your chair. Also, try to document the approach you’ve taken so future chairs will know what you did, and let other chairs know what works for your committee.

You also need to make a firm commitment as chair to hold certain types of discussions on your committee mailing list and others on ALA Connect, and then to document any pertinent mailing list discussions on ALA Connect. That way you won’t lose track of where you are on the project because half your work is in email and half on ALA Connect. (This obviously applies to any tool other than email as well.)

Conclusion

With all the tools above, you really have no excuse for running projects through email, which is not very effective unless everyone you are working with is very strict about email filing and reply times. (Hint: they aren’t; see above about a sane relationship with your inbox.) But any tool requires a good plan to understand how its strengths mesh with the work you have to accomplish. If your project is to complete a document by a certain date, a combination of Google Docs or Box (or ALA Connect for ALA work) and automated reminders might be best. If you want to throw a lot of ideas around and then organize them, Trello or Asana might work. Since these are all free to try, explore a few tools before starting a big project to see what works for you and your collaborators. Once you pick one, dedicate a bit of time on a weekly or monthly basis to keeping your virtual workspace organized. If you find it’s no longer working, figure out why. Has the scope of your project changed over time so that a different tool is now more effective? This can happen when you move from implementing something to ongoing work using the new system. Or maybe people have gotten complacent about checking in on work to do. Explore different types of notifications or mobile apps to reinvigorate your team.

I would love to hear about your own approach to lightweight project management with these tools or others in the comments.

 


One Shocking Tool Plus Two Simple Ideas That Will Forever Change How You Share Links

The Click Economy

The economy of the web runs on clicks and page views. The way web sites turn traffic into profit is complex, but I think we can get away with a broad gloss of the link economy as long as we acknowledge that greater underlying complexity exists. Basically speaking, traffic (measured in clicks, views, unique visitors, length of visit, etc.) leads to ad revenue. Web sites benefit when viewers click on links to their pages and when these viewers see and click on ads. The scale of the click economy is difficult to visualize. Direct benefits from a single click or page view are minuscule, and profits tend to be nonexistent or trivial at anything short of an unbelievably massive scale. This makes individual clicks relatively meaningless, but systems that can magnify and aggregate clicks are extremely valuable.

What this means for the individual person on the web is that unless we are Arianna Huffington, Sheryl Sandberg, Larry Page, or Mark Zuckerberg, we probably aren’t going to get rich off of clicks. However, we do have impact, and our online reputations can significantly influence which articles and posts go viral. If we understand how the click economy works, we can use our reputation and influence responsibly. If we are linking to content we think is good and virtuous, then there is no problem with spreading “link juice” indiscriminately. However, if we want to draw someone’s attention to content we object to, we can take steps to link responsibly and keep our outrage from fueling profits for the content’s author. 1

We’ve seen that links benefit a site’s owners in two ways: directly through ad revenue and indirectly through “link juice,” the positive effect that inbound links have on search engine rankings and social network trend lists. If our goal is to link without benefiting the owner of the page we are linking to, we will need a separate technique for each of these two benefits.

For two excellent pieces on the click economy, see Robinson Meyer’s “Why are Upworthy Headlines Suddenly Everywhere?” in the Atlantic Monthly 2 and Clay Johnson’s book The Information Diet, especially “The New Journalists” section of chapter three. 3

Page Rank

Page Rank is the name of a key algorithm Google uses to rank the web pages it returns. 4 It counts inbound links to a page and keeps track of the relative importance of the sites the links come from. A site’s Page Rank score is a significant part of how Google decides to rank search results. 5 Search engines like Google recognize that there would be a massive problem if all inbound links were counted as votes for a site’s quality. 6 Without some mechanism to communicate “I’m linking to this site as an example of awful thinking,” there really would be no such thing as bad publicity, and a website with thousands of complaints and zero positive reviews would shoot to the top of search engine rankings. For example, every time a librarian used martinlutherking.org (a malicious propaganda site run by the white supremacist group Stormfront) as an example in a lesson about web site evaluation, the page would rise in Google’s ranking, and more people would find it in the course of natural searches for information on Dr. King. When linking to malicious content, we can avoid increasing its Page Rank score by adding the rel=“nofollow” attribute to the anchor tag. A normal link is written like this:

<a href="http://www.horriblesite.com/horriblecontent/" target="_blank">This is a horrible page.</a>

This link would add the referring page’s reputation or “link juice” to the horrible site’s Page Rank. To fix that, we need to add the rel=“nofollow” attribute.

<a href="http://www.horriblesite.com/horriblecontent/" target="_blank" rel="nofollow">This is a horrible page.</a>

This addition communicates to the search engine that the link should not count as a vote for the site’s value or reputation. Of course, not all linking happens in HTML we write ourselves anymore. What happens if we want to share this link on Facebook or Twitter? Both Facebook and Twitter automatically add rel=“nofollow” to their links (you can see this by viewing the page source), but we should not rely on that alone. Social networks aggregate links and provide their own link juice, much as search engines do. When sharing links on social networks, we’ll want a tool that keeps control of the link’s power in our own hands. donotlink.com is a very interesting tool for this purpose.
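To see why nofollow matters, here is a toy Page Rank calculation in JavaScript (Node.js). It is a minimal sketch of the textbook power-iteration version of the algorithm, not Google’s production ranking, which combines many other signals. The damping factor of 0.85 is the value commonly used in descriptions of the original algorithm, and the page names and link graph are made up for illustration. Links flagged as nofollow are simply dropped from the graph, which models a search engine honoring the attribute.

```javascript
// Toy Page Rank by power iteration (a textbook sketch, not Google's system).
// Each link is [from, to, nofollow]; nofollow links are ignored, so they
// pass no "vote" (link juice) to their target.
function pageRank(pages, links, { damping = 0.85, iterations = 50 } = {}) {
  const n = pages.length;
  const outLinks = new Map(pages.map((p) => [p, []]));
  for (const [from, to, nofollow] of links) {
    if (!nofollow) outLinks.get(from).push(to);
  }

  let rank = new Map(pages.map((p) => [p, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    // Every page starts each round with the baseline "random jump" share.
    const next = new Map(pages.map((p) => [p, (1 - damping) / n]));
    for (const page of pages) {
      const targets = outLinks.get(page);
      const share = damping * rank.get(page);
      if (targets.length === 0) {
        // A page with no followed links spreads its rank over all pages.
        for (const p of pages) next.set(p, next.get(p) + share / n);
      } else {
        // Otherwise its rank is split evenly among the pages it links to.
        for (const t of targets) next.set(t, next.get(t) + share / targets.length);
      }
    }
    rank = next;
  }
  return rank;
}

const pages = ["library", "blogA", "blogB", "horrible"];
const followed = [
  ["library", "blogA", false],
  ["blogA", "library", false],
  ["blogB", "library", false],
  ["library", "horrible", false],
  ["blogA", "horrible", false],
  ["blogB", "horrible", false],
];
// Same graph, but every link to the horrible page carries rel="nofollow".
const nofollowed = followed.map(([from, to]) => [from, to, to === "horrible"]);

const before = pageRank(pages, followed).get("horrible");
const after = pageRank(pages, nofollowed).get("horrible");
console.log(before.toFixed(3), after.toFixed(3));
```

In the second graph the three links to the horrible page carry nofollow, so that page ends up with little more than the baseline share every page receives, while in the first graph it collects votes from all three linking sites.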

donotlink.com

donotlink.com is a service that creates safe links that don’t pass on any reputation or link juice. It is ideal for sharing links to sites we object to. On one level, it works like a URL shortener such as bit.ly or tinyurl.com: it creates a new URL customized for sharing on social networks. At a deeper level, it does some very clever things to make sure no link juice dribbles to the site being linked, and the site explains the what, why, and how very well. Basically speaking, donotlink.com passes the link through a new URL that uses JavaScript, a robots.txt file, and the nofollow and noindex directives both to ask search engines and social networks not to pass on link juice and to make it structurally difficult for them to ignore these requests. 7 This makes donotlink.com’s link masking service an excellent solution to the problem of web sites indirectly profiting from negative attention.
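The techniques described in footnote 7 can be sketched in a few lines of JavaScript (Node.js). This is a hypothetical illustration of the general approach, not donotlink.com’s actual code: a robots.txt that asks every crawler to stay away, plus an intermediate page that carries a noindex, nofollow robots meta tag and sends visitors on with JavaScript rather than an ordinary link. The respond function, the links map, and the short ID abc123 are invented for the example; a real service would plug respond into an HTTP server.

```javascript
// Hypothetical link-masking sketch (NOT donotlink.com's implementation).

// robots.txt asking all crawlers not to fetch anything on this site.
function robotsTxt() {
  return "User-agent: *\nDisallow: /\n";
}

// Escape text for safe display inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Intermediate page: marked noindex/nofollow, no <a href> for bots to follow,
// and a JavaScript redirect that sends human visitors on to the target.
function maskedPage(targetUrl) {
  // JSON.stringify yields a valid JS string literal; escaping "<" stops a
  // URL containing "</script>" from closing the script element early.
  const jsUrl = JSON.stringify(targetUrl).replace(/</g, "\\u003c");
  return [
    "<!DOCTYPE html>",
    "<html><head>",
    '<meta name="robots" content="noindex, nofollow">',
    "<title>Redirecting</title>",
    "</head><body>",
    `<p>You are being sent to ${escapeHtml(targetUrl)}</p>`,
    `<script>window.location.replace(${jsUrl});</script>`,
    "</body></html>",
  ].join("\n");
}

// Route a request path; `links` maps hypothetical short IDs to target URLs.
function respond(path, links) {
  if (path === "/robots.txt") {
    return { status: 200, type: "text/plain", body: robotsTxt() };
  }
  const target = links.get(path.slice(1));
  if (!target) return { status: 404, type: "text/plain", body: "Not found" };
  return { status: 200, type: "text/html", body: maskedPage(target) };
}

const links = new Map([["abc123", "http://www.horriblesite.com/horriblecontent/"]]);
console.log(respond("/abc123", links).body);
```

As footnote 7 notes, each layer is a request rather than a guarantee; stacking them is what makes it structurally awkward for a crawler to credit the target page.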

Page Views & Traffic

All of the techniques listed above will deny a linked site the indirect benefits of link juice. They will not, however, deny the site the direct benefits of increased traffic, or of views and clicks on the page’s advertisements. There are ways to share content without generating any traffic or advertising revenue, but these involve capturing the content and posting it somewhere else, so they raise ethical questions about respect for intellectual property. I suggest using them only with both caution and intentionality. A quick and easy way to direct traffic to content without benefiting the hosting site is to link to Google’s cache of the page. If you can find a page in a Google search, clicking the green arrow next to the URL (see image) will give the option of viewing the cached page. Then just copy the full URL and share that link instead of the original. Viewers can read the text without generating page views for the original site. Not all pages are visible on Google, so the Wayback Machine from the Internet Archive is a great alternative. The Wayback Machine provides access to archived versions of web pages and also has a mechanism (see the image on the right) for adding new pages to the archive.


Screengrab of Google Cache


Caching a site at the wayback machine
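Building a Wayback Machine link by hand is mostly string assembly. This JavaScript (Node.js) sketch assumes the Internet Archive’s public URL patterns as I understand them: web.archive.org/web/ followed by a page URL for the most recent capture (optionally with a YYYYMMDDhhmmss timestamp for the capture closest to that time), and web.archive.org/save/ followed by a page URL for the Save Page Now feature. Check those patterns against the Internet Archive’s own documentation before relying on them; the waybackUrls helper name is hypothetical.

```javascript
// Hypothetical helper; the URL patterns are assumptions about the
// Internet Archive's public Wayback Machine addresses.
function waybackUrls(pageUrl, timestamp = null) {
  const base = "https://web.archive.org";
  return {
    // Latest capture, or the capture closest to a YYYYMMDDhhmmss timestamp.
    view: `${base}/web/${timestamp ? timestamp + "/" : ""}${pageUrl}`,
    // "Save Page Now": asks the archive to capture the page immediately.
    save: `${base}/save/${pageUrl}`,
  };
}

const archived = waybackUrls("http://www.horriblesite.com/horriblecontent/");
console.log(archived.view); // https://web.archive.org/web/http://www.horriblesite.com/horriblecontent/
console.log(archived.save); // https://web.archive.org/save/http://www.horriblesite.com/horriblecontent/
```

Sharing the view address instead of the original sends readers to the archive’s copy, so the original site collects no page view or ad impression from them.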

Both of these solutions rely on external hosts and if the owner of the content is serious about erasing a page, there are processes for removing content from both Google’s cache and the Wayback Machine archives. To be certain of archiving content, the simplest solution is to capture a screenshot and share the image file. This gives you control over the image, but may be unwieldy for larger documents. In these cases saving as a PDF may be a useful workaround. (Personally, I prefer to use the Clearly browser plugin with Evernote, but I have a paid Evernote account and am already invested in the Evernote infrastructure.)

Summing up

In conclusion, there are a number of steps we can take when we want to be responsible with how we distribute link juice. If we want to share information without donating our online reputation to the information’s owner, we can use donotlink.com to generate a link that does not improve their search engine ranking. If we want to go a step further, we can link to a cached version of the page or share a screenshot.

Notes

  1. Using outrageous or objectionable content to generate web traffic is a black-hat SEO technique known as “evil hooks.” There is a lot of profit in “You won’t believe what this person said!” links.
  2. http://www.theatlantic.com/technology/archive/2013/12/why-are-upworthy-headlines-suddenly-everywhere/282048/
  3. The Information Diet, pages 35-41.
  4. https://en.wikipedia.org/wiki/PageRank
  5. Matt Cutts, “How Search Works” video.
  6. I’ve used this article http://www.nytimes.com/2010/11/28/business/28borker.html to explain this concept to my students. It is also referenced by donotlink.com in their documentation.
  7. JavaScript is slightly less transparent to search engines and social networks than HTML is. robots.txt is a file on a web server that tells search engine bots which pages they may crawl (it works more like a no-trespassing sign than a locked gate). noindex tells bots not to add the page to their index.