Bootstrap Responsibly

Bootstrap is the most popular front-end framework used for websites. An estimate by meanpath several months ago put it firmly behind 1% of the web - for good reason: Bootstrap makes it relatively painless to puzzle together a pretty awesome plug-and-play, component-rich site. Its modularity is its key feature, developed so Twitter could rapidly spin up internal microsites and dashboards.

Oh, and it’s responsive. This is kind of a thing. There’s not a library conference today that doesn’t showcase at least one talk about responsive web design. There’s a book, countless webinars, courses, whole blogs dedicated to it (ahem), and more. The pressure for libraries to have responsive, usable websites can seem to come more from the likes of us than from the patron base itself, but don’t let that discredit it. The trend is clear, and it is only a matter of time before our libraries have their mobile moment.

Library websites that aren’t responsive feel dated, and more importantly they are missing an opportunity to reach a bevy of mobile-only users, who in 2012 already made up more than a quarter of all web traffic. Library redesigns are often pulled together in a rush to meet growing demand from stakeholders, pressure from the library community, and users. The sprint makes the allure of frameworks like Bootstrap that much more appealing, but Bootstrapped library websites often suffer the cruelest of responsive ironies:

They’re not mobile-friendly at all.

Assumptions that Frameworks Make

Let’s take a step back and consider whether using a framework is the right choice at all. A front-end framework like Bootstrap is a Lego set with all the pieces conveniently packed. It comes with a series of templates, a blown-out stylesheet, and scripts tuned to the environment that let users essentially copy and paste fairly complex web machinery into being: carousels, tabs, responsive dropdown menus, all sorts of buttons, alerts for every occasion, gorgeous galleries, and very smart decisions made by a robust team of far more capable developers than us.

Except for the specific layout and the content, every Bootstrapped site is essentially a complete organism years in the making. This is also the reason that designers sometimes scoff, joking that these sites look the same. Decked-out frameworks are ideal for rapid prototyping with a limited timescale and budget because the design decisions have by and large already been made. They assume you plan to use the framework as-is, and they don’t make customization easy.

In fact, Bootstrap’s guide points out that customization is better kept cosmetic than attempted as a complete overhaul. The trade-off is that Bootstrap is otherwise complete: tried, true, usable, accessible out of the box, and only waiting for your content.

Not all Responsive Design is Created Equal

It is still common to hear that the selling point for a swanky new site is that it is “responsive down to mobile.” The phrase probably rings a bell. It describes a website that collapses its grid as the width of the browser shrinks until its layout is appropriate for whatever screen users are carrying around. This is kind of the point – and cool, as any of us with a browser-resizing obsession could tell you.

Today, “responsive down to mobile” carries a lot of baggage. Let me explain: it betrays a telling and harrowing ideology in which mobile is the afterthought for these projects, when mobile optimization should be the most important part. Library design committees don’t actually say this aloud or consciously conceive of it when researching options, but it is implicit in the choice. When mobile is an afterthought, the committee presumes users are more likely to visit from a laptop or desktop than a phone (or refrigerator). This is not true.

See, a website, responsive or not, originally laid out for a 1366×768 desktop monitor in the designer’s office, wistfully depends on visitors having that same browsing context. If it looks good in-office and loads fast, then looking good and loading fast must be the default. “Responsive down to mobile” is divorced from the reality that a similarly wide screen is not the common denominator. As such, responsive-down-to-mobile sites have a superficial layout optimized for the developers, not the users.

In a recent talk at An Event Apart–a conference–in Atlanta, Georgia, Mat Marquis stated that 72% of responsive websites send the same assets to mobile devices as they do to desktops, and this is largely contributing to the web feeling slower. While setting img { width: 100%; } will scale media to fit snugly to the container, it still sends the same high-resolution image to a 320px-wide phone as to a 720px-wide tablet. A 1.6MB page loads differently on a phone than on the machine it was designed on. The digital divide with which librarians are so familiar is certainly nowhere near closed, but while internet access is increasingly available, its ubiquity doesn’t translate to speed:

  1. 50% of users ages 12-29 are “mostly mobile” users, and you know what wireless connections are like,
  2. even so, the weight of the average website (currently 1.6MB) is increasing.

Last December, an analysis of pagespeed quantiles from an HTTP Archive crawl tried to determine how fast the web was getting slower. The fastest sites are slowing at a greater rate than the big, bloated sites, likely because the assets we send–like increasingly high-resolution images to compensate for the increasing pixel density of our devices–are getting bigger.

This wreaks havoc on the load times of “mobile friendly” responsive websites. Why does it matter? Well, we know that

  • users expect a mobile website to load as fast on their phone as it does on a desktop,
  • three-quarters of users will give up on a website if it takes longer than 4 seconds to load,
  • the optimistic average load time for just a 700kb website on 3G is more like 10-12 seconds

eep O_o.

A Better Responsive Design

So there was a big change to Bootstrap in August 2013 when it was restructured from a “responsive down to mobile” framework to “mobile-first.” It has also been given a simpler, flat design, which has 100% faster paint time – but I digress. “Mobile-first” is key. Emblazon this over the door of the library web committee. Strike “responsive down to mobile.” Suppress the record.

Technically, “mobile-first” describes the structure of the stylesheet using CSS3 Media Queries, which determine when certain styles are rendered by the browser.

.example {
  /* these styles load first */
}

@media screen and (min-width: 48em) {
  .example {
    /* these styles apply once the viewport is at least 48em wide */
  }
}

The most basic styles are loaded first. As more space becomes available, designers can assume (sort of) that the user’s device has a little extra juice, that their connection may be better, so they start adding pizzazz. One might decide that, hey, most of the devices narrower than 48em (about 768px with a base font size of 16px) are probably touch-only, so let’s not load any hover effects until the screen is wider.
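
If you need scripts to respect the same breakpoint, window.matchMedia can mirror it in JavaScript. Here is a minimal sketch, not part of Bootstrap itself; the “wide-screen” class is just an illustrative hook for hover-dependent extras:

// A sketch, not part of Bootstrap: mirror the stylesheet's 48em breakpoint in
// JavaScript so hover-dependent extras only run on wider screens.
var wideScreen = window.matchMedia('(min-width: 48em)');

function applyEnhancements(mq) {
  if (mq.matches) {
    document.documentElement.classList.add('wide-screen');
  } else {
    document.documentElement.classList.remove('wide-screen');
  }
}

applyEnhancements(wideScreen);
wideScreen.addListener(applyEnhancements); // re-run whenever the viewport crosses 48em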

Nirvana

In a literal sense, mobile-first is asset management. More than that, mobile-first is this philosophical undercurrent, an implicit zen of user-centric thinking that aligns with libraries’ missions to be accessible to all patrons. Designing mobile-first means designing to the lowest common denominator: functional and fast on a cracked Blackberry at peak time; functional and fast on a ten year old machine in the bayou, a browser with fourteen malware toolbars trudging through the mire of a dial-up connection; functional and fast [and beautiful?] on a 23″ iMac. Thinking about the mobile layout first makes design committees more selective of the content squeezed on to the front page, which makes committees more concerned with the quality of that content.

The Point

This is the important statement that Bootstrap now makes. It expects the design committee to think mobile-first. It comes with all the components you could want, but it expects you to trim the fat.

Future Friendly Bootstrapping

This is what you get in the stock Bootstrap:

  • buttons, tables, forms, icons, etc. (97kb)
  • a theme (20kb)
  • javascripts (30kb)
  • oh, and jQuery (94kb)

That’s almost 250kb of website. This is like a browser eating a brick of Mackinac Island Fudge – and this high-calorie bloat doesn’t include images. Consider that if the median load time for a 700kb page is 10-12 seconds on a phone, a big chunk of that time with out-of-the-box Bootstrap is spent loading just the framework’s assets.

As Josh Broton puts it: “While it’s not totally deal-breaking, 100kb is 5x as much CSS as an average site should have, as well as 15%-20% of what all the assets on an average page should weigh.”

To put this in context, I like to fall back on Ilya Grigorik’s example comparing load time to user reaction in his talk “Breaking the 1000ms Time to Glass Mobile Barrier.” If the site loads in just 0-100 milliseconds, it feels instant to the user. By 100-300ms, the site already begins to feel sluggish. At 300-1000ms, uh – is the machine working? After 1 second there is a mental context switch, which means that the user is impatient, distracted, or consciously aware of the load time. After 10 seconds, the user gives up.

By choosing not to pare down, your Bootstrapped library website starts off on the wrong foot.

The Temptation to Widgetize

Even though Bootstrap provides modals, tabs, carousels, autocomplete, and other modules, this doesn’t mean a website needs to use them. Bootstrap lets you tailor which jQuery plugins are included in the final script. The hardest part of any redesign is letting quality content determine the tools, rather than letting the ability to tabularize or scrollspy become an excuse to implement them. Oh, don’t Google those. I’ll touch on tabs and scrollspy in a few minutes.

I am going to be super presumptuous now and walk through the total Bootstrap package, then make recommendations for lightening the load.

Transitions

Transitions.js is a fairly lightweight CSS transition polyfill. What this means is that the script checks to see whether your user’s browser supports CSS Transitions, and if it doesn’t, it simulates those transitions with JavaScript. For instance, CSS transitions often handle the smooth, uh, transition between colors when you hover over a button. They are also a little more than just pizzazz. In a recent article, Rachel Nabors shows how transition and animation increase the usability of a site by guiding the eye.
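
The gist of that feature check is something like the following sketch. This is a simplification, not Bootstrap’s actual transition.js, which among other things also works out the right vendor-prefixed transitionend event name:

// Simplified sketch of CSS transition feature detection.
function supportsTransitions() {
  var style = document.createElement('div').style;
  return 'transition' in style ||
         'WebkitTransition' in style ||
         'MozTransition' in style ||
         'OTransition' in style;
}

if (!supportsTransitions()) {
  // A polyfill would step in here and animate with JavaScript instead.
  console.log('No CSS transition support; falling back to script-driven animation.');
}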

With that said, CSS Transitions have pretty good browser support and they probably aren’t crucial to the functionality of the library website on IE9.

Recommendation: Don’t Include.

Modals

“Modals” are popup windows. There are plenty of neat things you can do with them, and they are a pain to design consistently for every browser on your own. Let Bootstrap do that heavy lifting for you.

Recommendation: Include

Dropdown

It’s hard to come out of a library website design committee without a lot of links in your menu bar. Dropdown menus are kind of tricky to code, and Bootstrap does a really nice job of keeping them a consistent and responsive experience.

Recommendation: Include

Scrollspy

If you have a fixed sidebar or menu that follows the user as they read, scrollspy.js can highlight the section of that menu you are currently viewing. This is useful if your site has a lot of long-form articles, or if it is a one-page app that scrolls forever. I’m not sure this describes many library websites, but even if it does, you probably want more functionality than Scrollspy offers. I recommend jQuery-Waypoints - but only if you are going to do something really cool with it.

Recommendation: Don’t Include

Tabs

Tabs are a good way to break up a lot of content without actually putting it on another page. A lot of libraries use some kind of tab widget to handle the different search options. If you are writing guides or tutorials, tabs could be a nice way to display the text.

Recommendation: Include

Tooltips

Tooltips are often descriptive popup bubbles for a section, option, or icon requiring more explanation. Tooltips.js helps handle the predictable positioning of the tooltip across browsers. With that said, I don’t think tooltips are that engaging; they’re sometimes appropriate, but you definitely used to see more of them in the past. Your library’s time is better spent de-jargoning any content that would warrant a tooltip. Need a tooltip? Why not just make whatever needs the tooltip more obvious O_o?

Recommendation: Don’t Include

Popover

Even fancier tooltips.

Recommendation: Don’t Include

Alerts

Alerts.js lets your users dismiss alerts that you might put in the header of your website. It’s always a good idea to give users some kind of control over these things. Better they read and dismiss than get frustrated by the clutter.

Recommendation: Include

Collapse

The collapse plugin allows for accordion-style sections for content you might otherwise distribute across tabs. The ease-in-ease-out animation triggers motion sickness and other aaarrghs among users with vestibular disorders. You could just use tabs.

Recommendation: Don’t Include

Button

Button.js gives a little extra jolt to Bootstrap’s buttons, allowing them to communicate an action or state. By that I mean: imagine you fill out a reference form and click “submit.” Button.js will put a little loader icon in the button itself and change the text to “sending ….” This way, users are told that the process is running, and maybe they won’t feel compelled to click and click and click until the page refreshes. This is a good thing.
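
For the curious, the call in Bootstrap 3 looks roughly like the sketch below; the form and button ids are hypothetical, jQuery is already bundled with Bootstrap, and you should double-check the syntax against the docs for your version:

// Sketch of Bootstrap 3's button.js loading state.
// Markup assumed:
// <button type="submit" id="ask-btn" class="btn btn-primary"
//         data-loading-text="Sending...">Submit</button>
$('#ask-form').on('submit', function () {
  var $btn = $('#ask-btn').button('loading'); // disables the button and swaps in "Sending..."
  // ...once the request finishes, restore it:
  // $btn.button('reset');
});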

Recommendation: Include

Carousel

Carousels are the most popular design element on the web. They let a website slideshow content like upcoming events or new material. Carousels exist because design committees must be appeased. There are all sorts of reasons why you probably shouldn’t put a carousel on your website: they are largely inaccessible, have low engagement, are slooooow, and kind of imply that libraries hate their patrons.

Recommendation: Don’t Include.

Affix

I’m not exactly sure what this does. I think it’s a fixed-menu thing. You probably don’t need this. You can use CSS.

Recommendation: Don’t Include

Now, Don’t You Feel Better?

Just compare the bootstrap.js and bootstrap.min.js files between out-of-the-box Bootstrap and a build tailored to the recommendations above. Even though this doesn’t consider the differences in the CSS, the weight of the images not included in a carousel (not to mention the unquantifiable amount of pain you would have inflicted), the numbers are telling:

File               Before   After
bootstrap.js       54kb     19kb
bootstrap.min.js   29kb     10kb

So, Bootstrap Responsibly

There is more to say. When bouncing this topic around Twitter a while ago, Jeremy Prevost pointed out that Bootstrap’s minified assets can be gzipped down to about 20kb total. This is the right way to serve assets from any framework. It requires an Apache config or .htaccess rule. Here is the .htaccess file used in HTML5 Boilerplate. You’ll find it well commented and modular: go ahead and copy and paste the parts you need. You can eke out even more performance by “lazy loading” scripts only when they are needed, but that is a little out of the scope of this post.
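
As a rough sketch of the lazy-loading idea, you can inject a non-critical script only after the page has finished loading; the file name here is hypothetical:

// "Lazy load" a script: inject it only after the page has finished loading
// (or only when a feature is actually needed).
window.addEventListener('load', function () {
  var script = document.createElement('script');
  script.src = '/js/bootstrap-extras.min.js'; // hypothetical bundle of non-critical plugins
  script.async = true;
  document.body.appendChild(script);
});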

Here’s the thing: when we talk about having good library websites we’re mostly talking about the look. This is the wrong discussion. Web designs driven by anything but the content the library already has make grasping assumptions about how slick it would look to have this killer carousel, these accordions, nifty tooltips, and of course a squishy responsive design. Subsequently, these responsive sites miss the point: if anything, they’re mobile unfriendly.

Much of the time, a responsive library website is used as a marker that such-and-such site is credible and not irrelevant, but as such the website reflects a lack of purpose (e.g., “this website needs to increase library-card registration”). A superficial understanding of responsive web design and easy-to-grab frameworks leaves the patron as the lowest priority.

 

About Our Guest Author:

Michael Schofield is a front-end librarian in south Florida, where it is hot and rainy – always. He tries to do neat things there. You can hear him talk design and user experience for libraries on LibUX.


LibraryQuest Levels Up

Almost a year ago, GVSU Libraries launched LibraryQuest, our mobile quest-based game. It was designed to teach users about library spaces and services in a way that (we hoped) would be fun and engaging.  The game was released “into the wild” in the last week of August, 2013, which is the beginning of our fall semester.  It ran continuously until late November, shortly after midterms (we wanted to end early enough in the semester that we still had students on campus for post-game assessment efforts).  For details on the early development of the game, take a look at my earlier ACRL TechConnect post.  This article will focus on what happened after launch.

A screenshot from one of our simpler quests as it was being played.

Running the Game

Once the app was released, we settled on a schedule that would put out three to five new quests each month the game ran.  Designing quests is very time intensive, and 3-5 a month was all we could manage with the man and woman power we had available. We also had short-duration quests run at random intervals to encourage students to keep checking the app.  Over the course of the game, we created about 30 quests total.  Almost all quests were designed with a specific educational objective in mind, such as showing students how a specific library system worked or where something or someone was in the physical building.  Quests were chiefly designed by our Digital Initiatives Librarian (me) with help and support from our implementation team and other staff in the library as needed.

For most of the quests, we developed quest write-up sheets like this one:  Raiders of the Lost…Bin.  The sheets detailed the name of the quest, points, educational objective, steps, completion codes, and any other information that defined the quest.  These sheets proved invaluable whenever a staff member needed to know something about a quest, which was often.  Even simple quests like the one above required a fair amount of cooperation and coordination.  For the raiders quest, we needed a special cataloging record created, and we had to tag several plastic crowns and get them into our automated storage and retrieval system.

For every quest players completed, they earned points.  For every thirty points they earned, they were entered once in a drawing to win an iPad.  This was a major component of the game’s advertising, since we imagined it would be the biggest draw to play (and we may not have been right, as you’ll see).  Once the game closed in November, we held the drawing, publicized the winner, and then commenced a round of post-game assessment.

Thank You for Playing: Post-Game Assessment

When the game wrapped in mid-November, we took some time to examine the statistics the game had collected.  One of our very talented design students created a game dashboard that showed all the metrics collected by the game database in graphic form. The final tally of registered players came in at 397. That means 397 people downloaded the app and logged in at least once (in case you’re curious, the total enrollment of GVSU is 25,000 students). This number probably includes a few non-students (since anyone could download the app), but we did some passes throughout the life of the game to remove non-student players from the tally and so feel confident that the vast majority of registered players are students. During development, we set a goal of having at least 300 registered players, based mostly on the cost of the game and how much money we had spent on other outreach efforts.  So we did, technically, meet that goal, but a closer examination of the numbers paints a more nuanced picture of student participation.

A detail from our game dashboard. This part shows overall numbers and quest completion dates.

Of the registered players, 173 earned points, meaning they completed at least one quest.  That means that 224 players downloaded the app and logged in at least once, but then failed to complete any quest content.  Clearly, getting players to take the first step and get involved in the game was somehow problematic.  There are any number of explanations for this, including encounters with technical problems that may have turned players off (the embedded QR code scanner was a problem throughout the life of the game), an unwillingness to travel to locations to do physical quests, or something else entirely.  The maximum number of points you could earn was 625, which was attained by one person, although a few others came close. Players tended to cluster at the lower and middle of the point spectrum, which was entirely expected.  Getting the maximum number of points required a high degree of dedication, since it meant paying very close attention to the app for all the temporary, randomly appearing quests.

Another detail from the game dashboard, showing acceptance and completion metrics for some of the quests. We used this extensively to determine which quests to retire at the end of the month.

In general, online-only quests were more popular than quests involving physical space, and were taken and completed more often.  Of the top five most-completed quests, four are online-only.  There are a number of possible explanations for this, including the observation offered by one of our survey recipients that possibly a lot of players were stationed at our downtown campus and didn’t want to travel to our Allendale campus, which is where most of the physical quests were located.

Finally, of our 397 registered users, only 60 registered in the second semester the game ran.  The vast majority signed up soon after game launch, and registrations tapered off over time.  This reinforced data we collected from other sources that suggested the game ran for too long and the pacing needed to be sped up.

In addition to data collected from the game itself, we also put out two surveys over the course of the game. The first was a mid-game survey that asked questions about quest design (what quests students liked or didn’t like, and why).  Responses to this survey were bewilderingly contradictory. Some students would cite a quest as their favorite, while others would cite the exact same quest as their least favorite (and often for the same reasons).  The qualitative post-game evaluation we did provides some possible explanation for this (see below).  The second survey was a simple post-game questionnaire that asked whether students had enjoyed the game, whether they had learned something (and if so, what), and whether this was something we should continue doing.  90% of the respondents to this survey indicated that they had learned something about the library, that they thought this was a good idea, and that it was something we should do again.

Finally, we offered players points and free coffee to come in to the library and spend 15-20 minutes talking to us about their experience playing the game.  We kept questions short and simple to stay within the time window.  We asked about overall impressions of the game, whether the students would change anything, whether they learned anything (and if so, what), and which quests they liked or didn’t like and why.  The general tone of the feedback was very positive. Students seemed intrigued by the idea and appreciated that the library was trying to teach in nontraditional, self-directed ways.  When asked to sum up their overall impressions of the game, students said things like “Very well done, but could be improved upon” or “good but needs polish,” or my personal favorite: “an effective use of bribery to learn about the library.”

One of the things we asked people about was whether the game had changed how they thought about the library. They typically answered that it wasn’t so much that the game had changed how they thought about the library as that it changed the way they thought about themselves in relation to it. They used words like “aware,” “confident,” and “knowledgeable.”  They felt like they knew more about what they could do here and what we could do for them. Their retention of some of the quest content was remarkable, including library-specific lingo and knowledge of specific procedures (like how to use the retrieval system and how document delivery worked).

Players noted a variety of problems with the game. Some were technical in nature. The game app takes a long time to load, likely because of the way the back end is designed. Some of them didn’t like the Facebook login.  Stability on Android devices was problematic (this is no surprise, as developing the Android version was by far the more problematic part of developing the app).  Other problems were nontechnical, including quest content that didn’t work or took too long (my own lack of experience designing quests is to blame), communication issues (there’s no way to let us know when quest content doesn’t work), the flow and pacing of new quests (more content faster), and marketing issues.  These problems may in part account for the low onboarding numbers in terms of players who actually completed content.

They also had a variety of reasons for playing. While most cited the iPad grand prize as the major motivator, several of them said they wanted to learn about the library or were curious about the game, and that they thought it might be fun. This may explain differing reactions to the quest content survey that so confused me.  People who just wanted to have fun were irked by quests that had an overt educational goal.  Students who just wanted the iPad didn’t want to do lengthy or complex quests. Students who loved games for the fun wanted very hard quests that challenged them.  This diversity of desire is something all game developers struggle to cope with, and it’s a challenge for designing popular games that appeal to a wide variety of people.

Where to Go from Here

Deciding whether or not LibraryQuest has been successful depends greatly on the angle from which you look at the results.  On one hand, the game absolutely taught people things.  Students in the survey and interviews were able to list concrete things they knew how to do, often in detail and using terminology directly from the game.  One student proudly showed us a book she had gotten from ILL, which she hadn’t known how to use before she played.  On the other hand, the overall participation was low, especially when contrasted against the expense and staff time of creating and running the game.  Looking only at the money spent (approximately $14,700), it’s easy to calculate a cost of about $85 per student reached (173 with points) in development, prizes, and advertising.  The challenge is creating engaging games that appeal to a large number of students in a way that’s economical in terms of staff time and resources.

After looking at all of this data and talking to Yeti CGI, our development partners, we feel the results are promising enough that we should continue to experiment.  Both organizations feel there is still a great deal to learn about making games in physical space and that we’ve just scratched the surface of what we might be able to do.  With the lessons we have learned from this round of the game, we are looking to completely redesign the way the game app works, as well as revise the game into a shorter, leaner experience that does not require as much content or run so long.  In addition, we are seeking campus partners who would be interested in using the app in classes, as part of student life events, or in campus orientation.  Even if these events don’t directly involve the library, we can learn from the experience how to design better quest content that the library can use.  Embedding the app in smaller and more fixed events helps with marketing and cost issues.

Because the app design is so expensive, we are looking into the possibility of a research partnership with Yeti CGI. We could both benefit from learning more about how mobile gaming works in a physical space, and sharing those lessons would get us Yeti’s help rebuilding the app and working out content creation and pacing, without another huge outlay of development capital. We are also looking at ways to turn the game development itself into an educational opportunity. By working with our campus mobile app development lab, we can provide opportunities for GVSU students to learn app design.  Yeti is looking at making more of the game’s technical architecture open (for example, we are thinking about having all quest content marked up in XML) so that students can build custom interfaces and tools for the game.

Finally, we are looking at grants to support running and revising the game.  Our initial advertising and incentives budgets were very low, and we are curious to see what would happen if we put significant resources into those areas.  Would we see bigger numbers?  Would other kinds of rewards in addition to the iPad (something asked for by students) entice players into completing more quest content?  Understanding exactly how much money needs to be put into incentives and advertising can help quantify the total cost of running a large, open game for libraries, which would be valuable information for other libraries contemplating running large-scale games.

Get the LibraryQuest App: (iPhone, Android)

 

About our Guest Author:
Kyle Felker is the Digital Initiatives Librarian at Grand Valley State University Libraries, where he has worked since February of 2012.  He is also a longtime gamer.  He can be reached at felkerk@gvsu.edu, or on Twitter @gwydion9.


What Should Academic Librarians Know about Net Neutrality?

John Oliver describes net neutrality as the most boring important issue. More than that, it’s a complex idea that can be difficult to understand without a strong grasp of the architecture of the internet, which is not at all intuitive. An additional barrier to having a measured response is that most of the public discussions about net neutrality conflate it with negotiations over peering agreements (more on that later) and ultimately rest in contracts with unknown terms. The hyperbole surrounding net neutrality may be useful in riling up public sentiment, but the truth seems far more subtle. I want to approach a definition and an understanding of the issues surrounding net neutrality, but this post will only scratch the surface. Despite the technical and legal complexities, this is something worth understanding, since as academic librarians our daily lives and work revolve around internet access for us and for our students.

The most current public debate about net neutrality surrounds the Federal Communications Commission’s (FCC) ability to regulate internet service providers after a January 2014 court decision struck down the FCC’s 2010 Open Internet Order (PDF). The FCC is currently in an open comment period on a new plan to promote and protect the open internet.

The Communications Act of 1934 (PDF) created the FCC to regulate wire and radio communication. This classified phone companies and similar services as “common carriers”, which means that they are open to all equally. If internet service providers are classified in the same way, this ensures equal access, but for various reasons they are not considered common carriers, which was affirmed by the Supreme Court in 2005. The FCC is now seeking to use section 706 of the 1996 Telecommunications Act (PDF) to regulate internet service providers. Section 706 gave the FCC regulatory authority to expand broadband access, particularly to elementary and high schools, and this piece of it is included in the current rulemaking process.

The legal part of this is confusing to everyone, not least the FCC. We’ll return to that later. But for now, let’s turn our attention to the technical part of net neutrality, starting with one of the most visible spats.

A Tour Through the Internet

I am a Comcast customer for my home internet. Let’s say I want to watch Netflix. How do I get there from my home computer? First comes the traceroute that shows how the request from my computer travels over the physical lines that make up the internet.

C:\Users\MargaretEveryday>tracert netflix.com

Tracing route to netflix.com [69.53.236.17]
over a maximum of 30 hops:

  1     1 ms    <1 ms    <1 ms  10.0.1.1
  2    24 ms    30 ms    37 ms  98.213.176.1
  3    43 ms    40 ms    29 ms  te-0-4-0-17-sur04.chicago302.il.chicago.comcast.net [68.86.115.41]
  4    20 ms    32 ms    36 ms  te-2-6-0-11-ar01.area4.il.chicago.comcast.net [68.86.197.133]
  5    33 ms    30 ms    37 ms  he-3-14-0-0-cr01.350ecermak.il.ibone.comcast.net [68.86.94.125]
  6    27 ms    34 ms    30 ms  pos-1-4-0-0-pe01.350ecermak.il.ibone.comcast.net [68.86.86.162]
  7    30 ms    41 ms    54 ms  chp-edge-01.inet.qwest.net [216.207.8.189]
  8     *        *        *     Request timed out.
  9    73 ms    69 ms    69 ms  63.145.225.58
 10    65 ms    77 ms    96 ms  te1-8.csrt-agg01.prod1.netflix.com [69.53.225.6]
 11    80 ms    81 ms    74 ms  www.netflix.com [69.53.236.17]

Trace complete.

Step 1. My computer sends data to this wireless router, which is hooked to my cable modem, which is wired out to the telephone pole in front of my apartment.

2-4. The cables travel through the city underground, accessed through manholes like this one.

5-6. Eventually my request to go to Netflix makes it to 350 E. Cermak, which is a major collocation and internet exchange site. If you’ve ever taken the shuttle bus at ALA in Chicago, you’ve gone right past this building. Image © 2014 Google.

7-9. Now the request leaves Comcast, and goes out to a Tier 1 internet provider, which owns cables that cross the country. In this case, the cables belong to CenturyLink (which recently purchased Qwest).

10. My request has now made it to Grand Forks, ND, where Netflix buys space from Amazon Web Services. All this happened in less than a second. Image © 2014 Google.

Why should Comcast ask Netflix to pay to transmit their data over Comcast’s networks? Understanding this requires a few additional concepts.

Peering

Peering is an important concept in the structure of the internet. Peering is a physical link of hardware to hardware between networks in internet exchanges, which are (as pictured above) huge buildings filled with routers connected to each other.[1] Facebook Peering is an example of a very open peering policy. Companies and internet service providers can use internet exchange centers to plug their equipment together directly, and so make their connections faster and more reliable. For websites such as Facebook, which have an enormous amount of upload and download traffic, it’s well worth the effort for a small internet service provider to peer with Facebook.[2]

Peering relies on some equality of traffic, as the name implies. The various tiers of internet service providers you may have heard of are based on with whom they “peer”. Tier 1 ISPs are large enough that they all peer with each other, and thus form what is usually called the backbone of the internet.

Academic institutions created the internet originally–computer science departments at major universities literally had the switches in their buildings. In the US this was ARPANET, but a variety of networks at academic institutions existed throughout the world. Groups such as Internet2 allow educational, research, and government networks to connect and peer with each other and commercial entities (including Facebook, if the traceroute from my workstation is any indication). Smaller or isolated institutions may rely on a consumer ISP, and what bandwidth is available to them may be limited by geography.

The Last Mile

Consumers, by contrast, are really at the mercy of whatever company dominates in their neighborhoods. Consumers obviously do not have the resources to lay their own fiber optic cables directly to all the websites they use most frequently. They rely on an internet service provider to do the heavy lifting, just as most of us rely on utility companies to get electricity, water, and sewage service (though of course it’s quite possible to live off the grid to a certain extent on all those services depending on where you live). We also don’t build our own roads, and we expect that certain spaces are open for traveling through by anyone. This idea of roads open for all to get from the wider world to arterial streets to local neighborhoods is thus used as an analogy for the internet–if internet service providers (like phone companies) must be common carriers, this ensures the middle and last miles aren’t jammed.

When Peering Goes Bad

Think about how peering works–it requires a roughly equal amount of traffic being sent and received through peered networks, or at least an amount of traffic to which both parties can agree. This is the problem with Netflix. Unlike big companies such as Facebook, and especially Google, Netflix is not trying to build its own network. It relies on content delivery services and internet backbone providers to get content from its servers (all hosted on Amazon Web Services) to consumers. But Netflix only sends traffic; it doesn’t take traffic, and this is the basis of most of the legal battles going on with internet service providers that service the “last mile”.

The Netflix/Comcast trouble started in 2010, when Netflix contracted with Level 3 for content delivery. Comcast claimed that Level 3 was relying on a peering relationship that was no longer valid with this increase in traffic, no matter who was sending it. (See this article for complete details.) Level 3, incidentally, accused another Tier 1 provider, Cogent, of overstepping their settlement-free peering agreement back in 2005, and cut them off for a short time, which cut pieces of the internet off from each other.

Netflix tried various arrangements, but ultimately negotiated with Comcast to pay for direct access to their last mile networks through internet exchanges, one of which is illustrated above in steps 4-6. This seems to be the most reasonable course of action for Netflix to get their outbound content over networks, since they really don’t have the ability to do settlement-free peering. Of course, Reed Hastings, the CEO of Netflix, didn’t see it that way. But for most cases, settlement-free peering is still the only way the internet can actually work, and while we may not see the agreements that make this happen, it won’t be going anywhere. In this case, Comcast was not offering Netflix paid prioritization of its content, it was negotiating for delivery of the content at all. This might seem equally wrong, but someone has to pay for the bandwidth, and why shouldn’t Netflix pay for it?

What Should We Do?

If companies want to connect with each other or build their own network connections, they can do so under whatever terms work best for them. The problem would be if certain companies were using the same lines that everyone else was using but their packets got preferential treatment. The imperfect road analogy works well enough for these purposes. When a firetruck, police car, and ambulance are racing through traffic with sirens blazing, we are usually OK with the resulting traffic jam, since we can see this requires that speed for an emergency situation. But how do we feel when we suspect a single police car has turned on a siren just to cut in line to get to lunch faster? Or a funeral procession blocks traffic? Or an elected official has a motorcade? Or a block party? These situations are regulated by government authorities, but we may or may not like that these uses of public ways are being allowed and causing our own travel to slow down. Going further, it is clearly illegal for a private company to block a public road and charge a high rate for faster travel, but imagine if no governmental agency had the power to regulate this. The FCC is attempting to make sure it has those regulatory powers.

That said, it doesn’t seem like anyone is actually planning to offer paid prioritization. Even Comcast claims “no company has had a stronger commitment to openness of the Internet…” and that it has no plans of offering such a service. I find it unlikely that we will face a situation that Barbara Stripling describes as “prioritizing Mickey Mouse and Jennifer Lawrence over William Shakespeare and Teddy Roosevelt.”

I certainly won’t advocate against treating ISPs as common carriers–my impression is that this is what the 1996 Telecommunications Act was trying to get at, though the legal issues are confounding. However, a larger problem facing libraries (not so much large academics, but smaller academics and publics) is the digital divide. If there’s no fiber optic line to a town, there isn’t going to be broadband access, and an internet service provider has no business incentive to create a line for a small town that may not generate a lot of revenue. I think we need to remain vigilant about ensuring that everyone has access to the internet at all, and at a reasonable speed, and not get too sidetracked by theoretical future malfeasance by internet service providers. These points are included in the FCC’s proposal but are not receiving most of the attention, despite the fact that the FCC is given explicit regulatory authority to address them.

Public comments are open at the FCC’s website until July 15, so take the opportunity to leave a comment about Protecting and Promoting the Open Internet, and also consider comments on E-rate and broadband access, which is another topic the FCC is currently considering. (You can read ALA’s proposal about this here (PDF).)

  1. Blum, Andrew. Tubes: a Journey to the Center of the Internet. New York: Ecco, 2012, 80.
  2. Blum, 125-126.

Websockets For Real-time And Interactive Interfaces

TL;DR WebSockets allows the server to push up-to-date information to the browser without the browser making a new request. Watch the videos below to see the cool things WebSockets enables.

Real-Time Technologies

You are on a Web page. You click on a link and you wait for a new page to load. Then you click on another link and wait again. It may only be a second or a few seconds before the new page loads after each click, but it still feels like it takes way too long for each page to load. The browser always has to make a request and the server gives a response. This client-server architecture is part of what has made the Web such a success, but it is also a limitation of how HTTP works. Browser request, server response, browser request, server response….

But what if you need a page to provide up-to-the-moment information? Reloading the page for new information is not very efficient. What if you need to create a chat interface or to collaborate on a document in real-time? HTTP alone does not work so well in these cases. When a server gets updated information, HTTP provides no mechanism to push that message to clients that need it. This is a problem because you want to get information about a change in chat or a document as soon as it happens. Any kind of lag can disrupt the flow of the conversation or slow down the editing process.

Think about when you are tracking a package you are waiting for. You may have to keep reloading the page for some time until there is any updated information. You are basically manually polling the server for updates. Using XMLHttpRequest (XHR, also commonly known as Ajax) has been a popular way to work around some of the limitations of HTTP. After the initial page load, JavaScript can be used to poll the server for any updated information without user intervention.

Using JavaScript in this way you can still use normal HTTP and almost simulate getting a real-time feed of data from the server. After the initial request for the page, JavaScript can repeatedly ask the server for updated information. The browser client still makes a request and the server responds, and the request can be repeated. Because this cycle is all done with JavaScript it does not require user input, does not result in a full page reload, and the amount of data which is returned from the server can be minimal. In the case where there is no new data to return, the server can just respond with something like, “Sorry. No new data. Try again.” And then the browser repeats the polling–tries again and again until there is some new data to update the page. And then goes back to polling again.
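
A bare-bones polling sketch might look like this; the endpoint, response format, and the page-updating function are all made up for illustration:

// Poll the server for updates every few seconds using XMLHttpRequest.
var lastUpdate = 0;

function updatePage(data) {
  // Stand-in for whatever redraws the relevant part of the page.
  document.getElementById('status').textContent = data.statusText;
}

function poll() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/package-status?since=' + lastUpdate, true);
  xhr.onload = function () {
    var data = JSON.parse(xhr.responseText);
    if (data.updated) {
      updatePage(data);          // redraw only when something actually changed
      lastUpdate = data.timestamp;
    }
    // Otherwise the server essentially said, "Sorry. No new data. Try again."
  };
  xhr.send();
}

setInterval(poll, 5000);         // keep asking every 5 seconds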

This kind of polling has been implemented in many different ways, but all polling methods still have some queuing latency. Queuing latency is the time a message has to wait on the server before it can be delivered to the client. Until recently there has not been a standardized, widely implemented way for the server to send messages to a browser client as soon as an event happens. The server would always have to sit on the information until the client made a request. But there are a couple of standards that do allow the server to send messages to the browser without having to wait for the client to make a new request.

Server Sent Events (aka EventSource) is one such standard. Once the client initiates the connection with a handshake, Server Sent Events allows the server to continue to stream data to the browser. This is a true push technology. The limitation is that only the server can send data over this channel. In order for the browser to send any data to the server, the browser would still need to make an Ajax/XHR request. EventSource also lacks support even in some recent browsers like IE11.
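
On the browser side, consuming a Server Sent Events stream only takes a few lines. This is a generic sketch with a hypothetical URL, not code from any particular application:

// Open a Server Sent Events stream; the server can keep pushing messages
// over this one connection, but the browser cannot send anything back over it.
var source = new EventSource('/package-status/stream');

source.onmessage = function (event) {
  var data = JSON.parse(event.data);
  console.log('Server pushed an update:', data);
};

source.onerror = function () {
  console.log('Connection lost; EventSource will try to reconnect automatically.');
};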

WebSockets allows for full-duplex communication between the client and the server. The client does not have to open up a new connection to send a message to the server, which saves on some overhead. When the server has new data it does not have to wait for a request from the client and can send messages immediately to the client over the same connection. Client and server can even be sending messages to each other at the same time. WebSockets is a better option for applications like chat or collaborative editing because the communication channel is bidirectional and always open. While there are other kinds of latency involved here, WebSockets solves the problem of queuing latency. Removing this latency concern is what is meant by WebSockets being a real-time technology. Current browsers have good support for WebSockets.
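
The browser side of a WebSocket connection looks like the following sketch; the URL and the message shapes are hypothetical:

// Open a single, persistent, bidirectional connection.
var socket = new WebSocket('wss://example.org/chat');

socket.onopen = function () {
  // The same connection is used for both directions.
  socket.send(JSON.stringify({ type: 'join', room: 'reference-desk' }));
};

socket.onmessage = function (event) {
  var message = JSON.parse(event.data);
  console.log('Message pushed from the server:', message);
};

socket.onclose = function () {
  console.log('Connection closed.');
};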

Using WebSockets solves some real problems on the Web, but how might libraries, archives, and museums use them? I am going to share details of a couple applications from my work at NCSU Libraries.

Digital Collections Now!

When Google Analytics first turned on real-time reporting it was mesmerizing. I could see what resources on the NCSU Libraries’ Rare and Unique Digital Collections site were being viewed at the exact moment they were being viewed. Or rather, I could view the URL for the resource being viewed. I happened to notice that there would sometimes be multiple people viewing the same resource at the same time. This gave me some hint that someone’s social share or forum post was getting a lot of click-throughs right at that moment. Or sometimes there would be a story in the news and we had an image of one of the people involved. I could then follow up and see examples of where we were being effective with search engine optimization.

The Rare & Unique site has a lot of visual resources like photographs and architectural drawings. I wanted to see the actual images that were being viewed. The problem, though, was that Google Analytics does not have an easy way to click through from a URL to the resource on your site. I would have to retype the URL, copy and paste the part of the URL path, or do a search for the resource identifier. I just wanted to see the images now. (OK, this first use case was admittedly driven by one of the great virtues of a programmer–laziness.)

My first attempt at this was to create a page that would show the resources which had been viewed most frequently in the past day and past week. To enable this functionality, I added some custom logging that is saved to a database. Every view of every resource would just get a little tick mark that would be tallied up occasionally. These pages showing the popular resources of the moment are then regenerated every hour.
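
The idea is simple enough to sketch, though this is not the actual code behind the site. Assuming a Node/Express stack (and a real database rather than an in-memory object in production), it might look something like this:

// Sketch: every resource view adds a tick mark, and an hourly job tallies the counts.
var express = require('express');
var app = express();
var viewCounts = {};

function logView(req, res, next) {
  var id = req.params.id;
  viewCounts[id] = (viewCounts[id] || 0) + 1;  // one tick mark per view
  next();
}

function renderResourcePage(req, res) {
  res.send('Resource page for ' + req.params.id); // stand-in for the real page
}

app.get('/resource/:id', logView, renderResourcePage);

// Once an hour, regenerate the "popular right now" view from the tallies.
setInterval(function () {
  var popular = Object.keys(viewCounts)
    .sort(function (a, b) { return viewCounts[b] - viewCounts[a]; })
    .slice(0, 20);
  console.log('Most viewed resources:', popular);
  // A real version would write this out to a page and window the counts
  // by day and week rather than keeping one running total.
}, 60 * 60 * 1000);

app.listen(3000);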

It was not a real-time view of activity, but it was easy to implement and it did answer a lot of questions for me about what was most popular. Some images are regularly in the group of the most-viewed images. I learned that people often visit the image of the NC State men’s basketball 1983 team roster which went on to win the NCAA tournament. People also seem to really like the indoor pool at the Biltmore estate.

Really Real-Time

Now that I had this logging in place I set about to make it really real-time. I wanted to see the actual images being viewed at that moment by a real user. I wanted to serve up a single page and have it be updated in real-time with what is being viewed. And this is where the persistent communication channel of WebSockets came in. WebSockets allows the server to immediately send these updates to the page to be displayed.
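
The server side of that push might look something like the sketch below, here using the Node “ws” package, which is not necessarily what we used; the point is that the logging code can hand each view straight to every connected dashboard:

// Broadcast each logged view to every connected dashboard page immediately,
// instead of waiting for the next poll.
var WebSocket = require('ws');
var wss = new WebSocket.Server({ port: 8080 });

// Call this from the view-logging code whenever a resource is viewed.
function broadcastView(imageUrl) {
  wss.clients.forEach(function (client) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify({ type: 'view', image: imageUrl }));
    }
  });
}

The dashboard page then only needs an onmessage handler, like the generic WebSocket example earlier, that drops each incoming image into the page.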

People have told me they find this real-time view to be addictive. I found it to be useful. I have discovered images I never would have seen or even known how to search for before. At least for me this has been an effective form of serendipitous discovery. I also have a better sense of what different traffic volumes actually feel like on a good day. You too can see what folks are viewing in real-time now. And I have written up some more details on how this is all wired up together.

The Hunt Library Video Walls

I also used WebSockets to create interactive interfaces on the Hunt Library video walls. The Hunt Library has five large video walls created with Christie MicroTiles. These very large displays each have their own affordances based on the technologies in the space and the architecture. The Art Wall is above the single service point just inside the entrance of the library and is visible from outside the doors on that level. The Commons Wall is in front of a set of stairs that also function as coliseum-like seating. The Game Lab is within a closed space and already set up with various game consoles.


Listen to Wikipedia

When I saw and heard the visualization and sonification Listen to Wikipedia, I thought it would be perfect for the iPearl Immersion Theater. Listen to Wikipedia visualizes and sonifies data from the stream of edits on Wikipedia. The size of the bubbles is determined by the size of the change to an entry, and the sound changes in pitch based on the size of the edit. Green circles show edits from unregistered contributors, and purple circles mark edits performed by automated bots. (These automated bots are sometimes used to integrate library data into Wikipedia.) A bell signals an addition to an entry. A string pluck is a subtraction. New users are announced with a string swell.

The original Listen to Wikipedia (L2W) is a good example of the use of WebSockets for real-time displays. Wikipedia publishes all edits for every language into IRC channels. A bot called wikimon monitors each of the Wikipedia IRC channels and watches for edits. The bot then forwards the information about the edits over WebSockets to the browser clients on the Listen to Wikipedia page. The browser then takes those WebSocket messages and uses the data to create the visualization and sonification.

As you walk into the Hunt Library almost all traffic goes past the iPearl Immersion Theater. The one feature that made this space perfect for Listen to Wikipedia was that it has sound and, depending on your tastes, L2W can create pleasant ambient sounds. I began by adjusting the CSS styling so that the page would fit the large wall. Besides setting the width and height, I adjusted the size of the fonts. I added some text to a panel on the right explaining what folks are seeing and hearing. On the left is now text asking passersby to interact with the wall and the list of languages currently being watched for updates.

One feature of the original L2W that we wanted to keep was the ability to change which languages are being monitored and visualized. Each language can individually be turned off and on. During peak times the English Wikipedia alone can sound cacophonous. An active bot can make lots of edits of all roughly similar sizes. You can also turn off or on changes to Wikidata which collects structured data that can support Wikipedia entries. Having only a few of the less frequently edited languages on can result in moments of silence punctuated by a single little dot and small bell sound.

We wanted to keep the ability to change the experience and actually get a feel for the torrent or trickle of Wikipedia edits and allow folks to explore what that might mean. We currently have no input device for directly interacting with the Immersion Theater wall. For L2W the solution was to allow folks to bring their own devices to act as a remote control. We encourage passersby to interact with the wall with a prominent message. On the wall we show the URL to the remote control. We also display a QR code version of the URL. To prevent someone in New Zealand from controlling the Hunt Library wall in Raleigh, NC, we use a short-lived, three-character token.

Because we were uncertain how best to allow a visitor to kick off an interaction, we included both a URL and QR code. They each have slightly different URLs so that we can track use. We were surprised to find that most of the interactions began with scanning the QR code. Currently 78% of interactions begin with the QR code. We suspect that we could increase the number of visitors interacting with the wall if there were other simpler ways to begin the interaction. For bring-your-own-device remote controls we are interested in how we might use technologies like Bluetooth Low Energy within the building for a variety of interactions with the surroundings and our services.

The remote control Web page is a list of big checkboxes next to each of the languages. Clicking on one of the languages turns its stream on or off on the wall (connects or disconnects one of the WebSockets channels the wall is listening on). The change happens almost immediately with the wall showing a message and removing or adding the name of the language from a side panel. We wanted this to be at least as quick as the remote control on your TV at home.

The quick interaction is possible because of WebSockets. Both the browser page on the wall and the remote control client listen on another WebSockets channel for such messages. This means that as soon as the remote control sends a message to the server it can be sent immediately to the wall and the change reflected. If the wall were using polling to get changes, then there would potentially be more latency before a change registered on the wall. The remote control client also uses WebSockets to listen on a channel waiting for updates. This allows feedback to be displayed to the user once the change has actually been made. This feedback loop communication happens over WebSockets.
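
Sketched as if both ends lived in one file, that control channel might look roughly like this; the URL, token, message shape, and the connectStream/disconnectStream helpers are all made up for illustration:

// Both the wall and each remote control connect to the same control channel.
var control = new WebSocket('wss://example.org/l2w/control/ABC'); // short-lived token in the URL

// Remote control page: a checkbox change becomes a message to the server.
function toggleLanguage(language, enabled) {
  control.send(JSON.stringify({ type: 'toggle', language: language, enabled: enabled }));
}

// The wall (and every other open remote control) listens on the same channel,
// so the change shows up everywhere almost immediately.
control.onmessage = function (event) {
  var msg = JSON.parse(event.data);
  if (msg.type === 'toggle') {
    if (msg.enabled) {
      connectStream(msg.language);    // open that language's WebSocket stream
    } else {
      disconnectStream(msg.language); // close it
    }
  }
};

// Stand-ins for the wall's real stream management.
function connectStream(language) { console.log('start listening to ' + language); }
function disconnectStream(language) { console.log('stop listening to ' + language); }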

Having the remote control listen for messages from the server also serves another purpose. If more than one person enters the space to control the wall, what is the correct way to handle that situation? If there are two users, how do you accurately represent the current state on the wall for both users? Maybe once the first user begins controlling the wall it locks out other users. This would work, but then how long do you lock others out? It could be frustrating for a user to have launched their QR code reader, lined up the QR code in their camera, and scanned it only to find that they are locked out and unable to control the wall. What I chose to do instead was to have every message of every change go via WebSockets to every connected remote control. In this way it is easy to keep the remote controls synchronized. Every change on one remote control is quickly reflected on every other remote control instance. This prevents most cases where the remote controls might get out of sync. While there is still the possibility of a race condition, it becomes less likely with the real-time connection and is harmless. Besides not having to lock anyone out, it also seems like a lot more fun to notice that others are controlling things as well–maybe it even makes the experience a bit more social. (Although, can you imagine how awful it would be if everyone had their own TV remote at home?)

I also thought it was important for something like an interactive exhibit around Wikipedia data to provide the user some way to read the entries. From the remote control the user can get to a page which lists the same stream of edits that are shown on the wall. The page shows the title for the most recently edited entry at the top of the page and pushes others down the page. The titles link to the current revision for that page. This page just listens to the same WebSockets channels as the wall does, so the changes appear on the wall and remote control at the same time. Sometimes the stream of edits can be so fast that it is impossible to click on an interesting entry. A button allows the user to pause the stream. When an intriguing title appears on the wall or there is a large edit to a page, the viewer can pause the stream, find the title, and click through to the article.

The reaction from students and visitors has been fun to watch. The enthusiasm has had unexpected consequences. For instance one day we were testing L2W on the wall and noting what adjustments we would want to make to the design. A student came in and sat down to watch. At one point they opened up their laptop and deleted a large portion of a Wikipedia article just to see how large the bubble on the wall would be. Fortunately the edit was quickly reverted.

We have also seen the L2W exhibit pop up on social media. This Instagram video was posted with the comment, “Reasons why I should come to the library more often. #huntlibrary.”

This is people editing–Oh, someone just edited Home Alone–editing Wikipedia in this exact moment.

The original Listen to Wikipedia is open source. I have also made the source code for the Listen to Wikipedia exhibit and remote control application available. You would likely need to change the styling to fit whatever display you have.

Other Examples

I have also used WebSockets for some other fun projects. The Hunt Library Visualization Wall has a unique columnar design, and I used it to present images and video from our digital special collections in a way that allows users to change the exhibit. For the Code4Lib talk this post is based on, I developed a template for creating slide decks that include audience participation and synchronized notes via WebSockets.

Conclusion

The Web is now a better development platform for creating real-time and interactive interfaces. WebSockets provides the means for sending real-time messages between servers, browser clients, and other devices. This opens up new possibilities for what libraries, archives, and museums can do to provide up to the moment data feeds and to create engaging interactive interfaces using Web technologies.

If you would like more technical information about WebSockets and these projects, please see the materials from my Code4Lib 2014 talk (including speaker notes) and some notes on the services and libraries I have used. There you will also find a post with answers to the (serious) questions I was asked during the Code4Lib presentation. I’ve also posted some thoughts on designing for large video walls.

Thanks: Special thanks to Mike Nutt, Brian Dietz, Yairon Martinez, Alisa Katz, Brent Brafford, and Shirley Rodgers for their help with making these projects a reality.


About Our Guest Author: Jason Ronallo is the Associate Head of Digital Library Initiatives at NCSU Libraries and a Web developer. He has worked on lots of other interesting projects. He occasionally writes for his own blog Preliminary Inventory of Digital Collections.

Notes

  1. Though honestly Listen to Wikipedia drove me crazy listening to it so much as I was developing the Immersion Theater display.

Analyzing Usage Logs with OpenRefine

Background

Like a lot of librarians, I have access to a lot of data, and sometimes no idea how to analyze it. When I learned about linked data and the ability to search against data sources with a piece of software called OpenRefine, I wondered if it would be possible to match our users’ discovery layer queries against the Library of Congress Subject Headings. From there I could use the linking in LCSH to find the Library of Congress Classification, and then get an overall picture of the subjects our users were searching for. As with many research projects, it didn’t really turn out like I anticipated, but it did open further areas of research.

At California State University, Fullerton, we use an open source application called Xerxes, developed by David Walker at the CSU Chancellor’s Office, in combination with the Summon API. Xerxes acts as an interface for any number of search tools, including Solr, federated search engines, and most of the major discovery service vendors. We call it the Basic Search, and it’s incredibly popular with students, with over 100,000 searches a month and growing. It’s also well-liked – in a survey, about 90% of users said they found what they were looking for. We have monthly files of our users’ queries, so I had all of the data I needed to go exploring with OpenRefine.

OpenRefine

OpenRefine is an open source tool that deals with data in a very different way than typical spreadsheets. It has been mentioned in TechConnect before, and Margaret Heller’s post, “A Librarian’s Guide to OpenRefine” provides an excellent summary and introduction. More resources are also available on Github.

One of the most powerful things OpenRefine does is to allow queries against open data sets through a function called reconciliation. In the open data world, reconciliation refers to matching the same concept among different data sets, although in this case we are matching unknown entities against “a well-known set of reference identifiers” (Re-using Cool URIs: Entity Reconciliation Against LOD Hubs).

Reconciling Against LCSH

In this case, we’re reconciling our discovery layer search queries with LCSH. This basically means it’s trying to match the entire user query (e.g. “artist” or “cost of assisted suicide”) against what’s included in the LCSH linked open data. According to the LCSH website this includes “all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children’s (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms.”

I used the directions at Free Your Metadata to point me in the right direction. One note: the steps below apply to OpenRefine 2.5 and version 0.8 of the RDF extension. OpenRefine 2.6 requires version 0.9 of the RDF extension. Or you could use LODRefine, which bundles some major extensions and which I hear is great, though I haven’t tried it personally. The basic process shouldn’t change too much.

(1) Import your data

OpenRefine has quite a few file type options, so your format is likely already supported.

 Screenshot of importing data

(2) Clean your data

In my case, this involves deduplicating by timestamp and removing leading and trailing whitespace (OpenRefine’s Edit cells > Common transforms menu handles the trimming). You can also remove weird punctuation, numbers, and even extremely short queries (<2 characters).

(3) Add the RDF extension.

If you’ve done it correctly, you should see an RDF dropdown next to Freebase.

Screenshot of correctly installed RDF extension

(4) Decide which data you’d like to search on.

In this example, I’ve decided to use just queries that are four words or fewer, and I’ve removed duplicate search queries. (Xerxes handles facet clicks as if they were separate searches, so there are many duplicates; I usually only remove duplicates that occur at nearly the same time, but here I removed them all.) I’ve also experimented with limiting to 10 or 15 characters, but there were not many more matches with 15 characters than with 10, even though the data set was much larger. It depends on how much computing time you want to spend – it’s really a personal choice. In this case, I chose 4 words because of my experience with 15 characters: longer does not necessarily translate into more matches. A cursory glance at LCSH left me with the impression that the vast majority of headings (not including subdivisions, since they’d be searched individually) were 4 words or less. This, of course, means that your data with more than 4 words is unusable – more on that later.

Screenshot of adding a column based on word count using ngrams
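(As an aside, if you would rather do this filtering before the data ever reaches OpenRefine, the same idea can be sketched in a few lines of Python. This assumes, hypothetically, that a month of queries can be exported as a plain-text file with one query per line; the file name is made up.)

# Keep only deduplicated queries of four words or fewer (sketch only;
# the export format and file name are assumptions).
def short_queries(path, max_words=4):
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            query = line.strip()
            key = query.lower()
            if query and len(query.split()) <= max_words and key not in seen:
                seen.add(key)
                yield query

for query in short_queries("queries_2014_03.txt"):  # hypothetical file name
    print(query)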

(5) Go!

Shows OpenRefine reconciling

(6) Now you have your queries that were reconciled against LCSH, so you can limit to just those.

Screenshot of limiting to reconciled queries

Finding LC Classification

First, you’ll need to extract the cell.recon.match.id – the ID for the matched query that in the case of LCSH is the URI of the concept.

Screenshot of using cell.recon.match.id to get URI of concept

At this point you can choose whether to grab the HTML or the JSON, and create a new column based on this one by fetching URLs. I’ve never been able to get the parseJson() function to work correctly with LC’s JSON outputs, so for both HTML and JSON I’ve just regexed the raw output to isolate the classification. For more on regex see Bohyun Kim’s previous TechConnect post, “Fear No Longer Regular Expressions.”

On the raw HTML, the easiest way to do it is to transform the cells or create a new column with:

replace(partition(value,/<li property="madsrdf:classification">(<[^>]+>)*([A-Z]{1,2})/)[1],/<li property="madsrdf:classification">(<[^>]+>)*([A-Z]{1,2})/,"$2")

Screenshot of using regex to get classification

You’ll note this will only pull out the first classification given, even if some have multiple classifications. That was a conscious choice for me, but obviously your needs may vary.

(Also, although I’m only concentrating on classification for this project, there’s a huge amount of data that you could work with – you can see an example URI for Acting to see all of the different fields).
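For anyone who prefers to script this step outside OpenRefine, a rough Python equivalent is sketched below. It assumes you already have the reconciled LCSH URI for a row (the cell.recon.match.id value) and that the HTML at that URI contains the madsrdf:classification markup targeted by the GREL expression above; it is an illustration, not part of the workflow described here.

# Fetch the HTML for a reconciled LCSH URI and pull out the first LC
# classification, mirroring the GREL regex above (sketch only).
import re
import urllib.request

CLASSIFICATION = re.compile(
    r'<li property="madsrdf:classification">(?:<[^>]+>)*([A-Z]{1,2})'
)

def first_classification(lcsh_uri):
    html = urllib.request.urlopen(lcsh_uri).read().decode("utf-8")
    match = CLASSIFICATION.search(html)
    return match.group(1) if match else None

# lcsh_uri would come from your cell.recon.match.id column, e.g. an
# http://id.loc.gov/authorities/subjects/... URI from your own reconciled data.

Like the GREL expression, this only pulls out the first classification listed.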

Once you have the classifications, you can export to Excel and create a pivot table to count the instances of each, and you get a pretty table.

Table of LC Classifications

Caveats & Further Explorations

As you can guess by the y-axis in the table above, the number of matches is a very small percentage of actual searches. First I limited to keyword searches (as opposed to title/subject), then of those only ones that were 4 or fewer words long (about 65% of keyword searches). Of those, only about 1000 of the 26000 queries matched, and resulted in about 360 actual LC Classifications. Most months I average around 500, but in this example I took out duplicates even if they were far apart in time, just to experiment.

One thing I haven’t done but am considering is allowing matches that aren’t 100%. From my example above, there are another 600 or so queries that matched at 50-99%. This could significantly increase the number of matches and thus give us more classifications to work with.

Some of this is related to the types of searches that students are doing (see Michael J DeMars’ and my presentation “Making Data Less Daunting” at Electronic Resources & Libraries 2014, which this article grew out of, for some crazy examples) and some to the way that LCSH is structured. I chose LCSH because I could get linked to the LC Classification and thus get a sense of the subjects, but I’m definitely open to ideas. If you know of a better linked data source, I’m all ears.

I must also note that this is a pretty inefficient way of matching against LCSH. If you know of a way I could download the entire set, I’m interested in investigating that way as well.

Another approach that I will explore is moving away from reconciliation with LCSH (which is really more appropriate for a controlled vocabulary) to named-entity extraction, which takes natural language inputs and tries to recognize or extract common concepts (name, place, etc). Here I would use it as a first step before trying to match against LCSH. Free Your Metadata has a new named-entity extraction extension for OpenRefine, so I’ll definitely explore that option.

Planned Research

In the end, although this is interesting, does it actually mean anything? My next step with this dataset is to take a subset of the search queries and assign classification numbers. Over the course of several months, I hope to see if what I’ve pulled in automatically resembles the hand-classified data, and then draw conclusions.

So far, most of the peaks are expected – psychology and nursing are quite strong departments. There are some surprises though – education has been consistently underrepresented, based on both our enrollment numbers and when you do word counts (see our presentation for one month’s top word counts). Education students have a robust information literacy program. Does this mean that education students do complex searches that don’t match LCSH? Do they mostly use subject databases? Once again, an area for future research, should these automatic results match the classifications I do by hand.

What do you think? I’d love to hear your feedback or suggestions.

About Our Guest Author

Jaclyn Bedoya has lived and worked on three continents, although currently she’s an ER Librarian at CSU Fullerton. It turns out that growing up in Southern California spoils you, and she’s happiest being back where there are 300 days of sunshine a year. Also Disneyland. Reach her @spamgirl on Twitter or jaclynbedoya@gmail.com


Getting Started with APIs

There has been a lot of discussion in the library community regarding the use of web service APIs over the past few years.  While APIs can be very powerful and provide awesome new ways to share, promote, manipulate and mashup your library’s data, getting started using APIs can be overwhelming.  This post is intended to provide a very basic overview of the technologies and terminology involved with web service APIs, and provides a brief example to get started using the Twitter API.

What is an API?

First, some definitions.  One of the steepest learning curves with APIs involves navigating the terminology, which unfortunately can be rather dense – but understanding a few key concepts makes a huge difference:

  • API stands for Application Programming Interface, which is a specification used by software components to communicate with each other.  If (when?) computers become self-aware, they could use APIs to retrieve information, tweet, post status updates, and essentially run most day-to-day functions for the machine uprising. There is no single API “standard” though one of the most common methods of interacting with APIs involves RESTful requests.
  • REST / RESTful APIs  - Discussions regarding APIs often make references to “REST” or “RESTful” architecture.  REST stands for Representational State Transfer, and you probably utilize RESTful requests every day when browsing the web. Web browsing is enabled by HTTP (Hypertext Transfer Protocol) – as in http://example.org.  The exchange of information that occurs when you browse the web uses a set of HTTP methods to retrieve information, submit web forms, etc.  APIs that use these common HTTP methods (sometimes referred to as HTTP verbs) are considered to be RESTful.  RESTful APIs are simply APIs that leverage the existing architecture of the web to enable communication between machines via HTTP methods.

HTTP Methods used by RESTful APIs

Most web service APIs you will encounter utilize, at their core, the following HTTP methods for creating, retrieving, updating, and deleting information through that web service.1  Not all APIs allow each method (at least without authentication), but some common methods for interacting with APIs include the following (a brief illustration in Python follows this list):

    • GET – You can think of GET as a way to “read” or retrieve information via an API.  GET is a good starting point for interacting with an API you are unfamiliar with.  Many APIs utilize GET, and GET requests can often be used without complex authentication.  A common example of a GET request that you’ve probably used when browsing the web is the use of query strings in URLs (e.g., www.example.org/search?query=ebooks).
    • POST – POST can be used to “write” data over the web.  You have probably generated  POST requests through your browser when submitting data on a web form or making a comment on a forum.  In an API context, POST can be used to request that an API’s server accept some data contained in the POST request – Tweets, status updates, and other data that is added to a web service often utilize the POST method.
    • PUT – PUT is similar to POST, but can be used to send data to a web service that can assign that data a unique uniform resource identifier (URI) such as a URL.  Like POST, it can be used to create and update information, but PUT (in a sense) is a little more aggressive. PUT requests are designed to interact with a specific URI and can replace an existing resource at that URI or create one if there isn’t one.
    • DELETE – DELETE, well, deletes – it removes information at the URI specified by the request.  For example, consider an API web service that could interact with your catalog records by barcode.2 During a weeding project, an application could be built with DELETE that would delete the catalog records as you scanned barcodes.3
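The short illustration promised above uses Python’s requests library against httpbin.org, a public request-echoing service. Real APIs differ in their URLs, parameters, and authentication requirements, so treat this strictly as a sketch of the verbs themselves.

# The four HTTP verbs in miniature (sketch only; httpbin.org just echoes
# requests back, so nothing is actually created or deleted).
import requests

# GET: read data; params become a query string like ?query=ebooks
r = requests.get("https://httpbin.org/get", params={"query": "ebooks"})
print(r.json()["args"])

# POST: submit new data for the server to accept (a tweet, a comment, etc.)
r = requests.post("https://httpbin.org/post", data={"status": "New book added!"})

# PUT: create or replace the resource at a specific URI
r = requests.put("https://httpbin.org/put", data={"barcode": "31234000123456"})

# DELETE: remove the resource at the specified URI
r = requests.delete("https://httpbin.org/delete")
print(r.status_code)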

Understanding API Authentication Methods

To me, one of the trickiest parts of getting started with APIs is understanding authentication. When an API is made available, the publishers of that API are essentially creating a door to their application’s data.  This can be risky:  imagine opening that door up to bad people who might wish to maliciously manipulate or delete your data.  So often APIs will require a key (or a set of keys) to access data through an API.

One helpful way to contextualize how an API is secured is to consider access in terms of identification, authentication, and authorization.4  Some APIs only want to know where the request is coming from (identification), while others require you to have a valid account (authentication) to access data.  Beyond authentication, an API may also want to ensure your account has permission to do certain functions (authorization).  For example, you may be an authenticated user of an API that allows you to make GET requests of data, but your account may still not be authorized to make POST, PUT, or DELETE requests.

Some common methods used by APIs to store authentication and authorization include OAuth and WSKey:

  • OAuth - OAuth is a widely used open standard for authorization access to HTTP services like APIs.5  If you have ever sent a tweet from an interface that’s not Twitter (like sharing a photo directly from your mobile phone) you’ve utilized the OAuth framework.  Applications that already store authentication data in the form of user accounts (like Twitter and Google) can utilize their existing authentication structures to assign authorization for API access.  API Keys, Secrets, and Tokens can be assigned to authorized users, and those variables can be used by 3rd party applications without requiring the sharing of passwords with those 3rd parties.
  • WSKey (Web Services Key) – This is an example from OCLC that is conceptually very similar to OAuth.  If you have an OCLC account (either a worldcat.org or an oclc.org account) you can request key access.  Your authorization – in other words, what services and REST requests you are permitted to access – may be dependent upon your relationship with an OCLC member organization.

Keys, Secrets, Tokens?  HMAC?!

API authorization mechanisms often require multiple values in order to successfully interact with the API.  For example, with the Twitter API, you may be assigned an API Key and a corresponding Secret.  The topic of secret key authentication can be fairly complex,6 but fundamentally a Key and its corresponding Secret are used to authenticate requests in a secure encrypted fashion that would be difficult to guess or decrypt by malicious third-parties.  Multiple keys may be required to perform particular requests – for example, the Twitter API requires a key and secret to access the API itself, as well as a token and secret for OAuth authorization.

Probably the most important thing to remember about secrets is to keep them secret.  Do not share them or post them anywhere, and definitely do not store secret values in code uploaded to Github 7 (.gitignore – a method to exclude files from a git repository – is your friend here). 8  To that end, one strategy used by RESTful APIs to further secure secret key values is an HMAC header (hash-based message authentication code).  When requests are sent, HMAC uses your secret key to sign the request without actually passing the secret key value in the request itself. 9
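To make the idea concrete, here is a generic HMAC signing sketch using Python’s standard library. It illustrates the general technique only – it is not Twitter’s exact OAuth 1.0a signature algorithm, which adds its own normalization and encoding rules – and all of the values are placeholders.

# Sign a request with a secret key using HMAC-SHA1 (generic sketch). The
# secret itself never travels with the request; only the signature does,
# and the server recomputes it to verify the sender.
import base64
import hashlib
import hmac

secret = b"YOUR_API_SECRET"  # keep this out of any public code repository
request_data = b"GET&https%3A%2F%2Fapi.example.org%2Fresource&count%3D1"

signature = base64.b64encode(hmac.new(secret, request_data, hashlib.sha1).digest())
print(signature.decode())  # sent along with the request, e.g. in a header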

Case Study:  The Twitter API

It’s easier to understand how APIs work when you can see them in action.  To do these steps yourself, you’ll need a Twitter account.  I strongly recommend creating a Twitter account separate from your personal or organizational accounts for initial experimentation with the API.  This code example is a very simple walkthrough, and does not cover securing your application’s server (and thus securing the keys that may be stored on that server).  Anytime you authorize API access to a Twitter account, you may be exposing it to some level of vulnerability.  At the end of the walkthrough, I’ll list the steps you would need to take if your account does get compromised.

1.  Activate a developer account

Visit dev.twitter.com and click the sign in area in the upper right corner.  Sign in with your Twitter account. Once signed in, click on your account icon (again in the upper right corner of the page) and then select the My Applications option from the drop-down menu.

Screenshot of the Twitter Developer Network login screen

2.  Get authorization

In the My applications area, click the Create New App button, and then fill out the required fields (Name, Description, and Website where the app code will be stored).  If you don’t have a web server, don’t worry, you can still get started testing out the API without actually writing any code.

3.  Get your keys

After you’ve created the application and are looking at its settings, click on the API Keys tab.  Here’s where you’ll get the values you need.  Your API Access Level is probably limited to read access only.  Click the “modify app permissions” link to set up read and write access, which will allow you to post through the API.  You will have to associate a mobile phone number with your Twitter account to get this level of authorization.

Screenshot of Twitter API options that allow for configuring API read and write access.

Scroll down and note that in addition to an API Key and Secret, you also have an access token associated with OAuth access.  This Token Key and Secret are required to authorize account activity associated with your Twitter user account.

4.  Test Oauth Access / Make a GET call

From the application API Key page, click the Test OAuth button.  This is a good way to get a sense of the API calls.  Leave the key values as they are on the page, and scroll down to the Request Settings Area.  Let’s do a call to return the most recent tweet from our account.

With the GET request checked, enter the following values:

Request URI:

Request Query (obviously replace yourtwitterhandle with… your actual Twitter handle):

  • screen_name=yourtwitterhandle&count=1

For example, my GET request looks like this:

Screenshot of the GET request setup screen for OAuth testing.

Click “See OAuth signature for this request”.  On the next page, look for the cURL request.  You can copy and paste this into a terminal or console window to execute the GET request and see the response (there will be a lot more of response text than what’s posted here):

* SSLv3, TLS alert, Client hello (1):
[{"created_at":"Sun Apr 20 19:37:53 +0000 2014","id":457966401483845632,
"id_str":"457966401483845632",
"text":"Just Added: The Fault in Our Stars by John Green; 
2nd Floor PZ7.G8233 Fau 2012","

As you can see, the above response to my cURL request includes the text of my account’s last tweet:

Screenshot of the tweet

What to do if your Twitter API Key or OAuth Security is Compromised

If your Twitter account suddenly starts tweeting out spammy “secrets to weight loss success” that you did not authorize (or other tweets that you didn’t write), your account has been compromised.  If you can still login with your username and password, it’s likely that your OAuth Keys have been compromised.  If you can’t log in, your account has probably been hacked.10  Your account can be compromised if you’ve authorized a third party app to tweet, but if your Twitter account has an active developer application on dev.twitter.com, it could be your own application’s key storage that’s been compromised.

Here are the immediate steps to take to stop the spam:

  1. Revoke access to third party apps under Settings –> Apps.  You may want to re-authorize them later – but you’ll probably want to reset the password for the third-party accounts that you had authorized.
  2. If you have generated API keys, log into dev.twitter.com and re-generate your API Keys and Secrets and your OAuth Keys and Secrets.  You’ll have to update any apps using the keys with the new key and secret information – but only if you have verified the server running the app hasn’t also been compromised.
  3. Reset your Twitter account password.11

5.  Taking it further:  Posting a new titles Twitter feed

So now you know a lot about the Twitter API – what now?  One way to take this further might involve writing an application to post new books that are added to your library’s collection.  Maybe you want to highlight a particular subject or collection – you can use some text output from your library catalog to post the title, author, and call number of new books.

The first step to such an application could involve creating an app that can post to the Twitter API.  If you have access to a  server that can run PHP, you can easily get started by downloading this incredibly helpful PHP wrapper.

Then in the same directory create two new files:

  • settings.php, which contains the following code (replace all the values in quotes with your actual Twitter API Key information):
<?php

$settings = array(
 'oauth_access_token' => "YOUR_ACCESS_TOKEN",
 'oauth_access_token_secret' => "YOUR_ACCESS_TOKEN_SECRET",
 'consumer_key' => "YOUR_API_KEY",
 'consumer_secret' => "YOUR_API_KEY_SECRET",
);

?>
  • and twitterpost.php, which has the following code, but swap out the values of ‘screen_name’ with your Twitter handle, and change the ‘status’ value if desired:
<?php

//call the PHP wrapper and your API values
require_once('TwitterAPIExchange.php');
include 'settings.php';

//define the request URL and REST request type
$url = "https://api.twitter.com/1.1/statuses/update.json";
$requestMethod = "POST";

//define your account and what you want to tweet
$postfields = array(
  'screen_name' => 'YOUR_TWITTER_HANDLE',
  'status' => 'This is my first API test post!'
);

//put it all together and build the request
$twitter = new TwitterAPIExchange($settings);
echo $twitter->buildOauth($url, $requestMethod)
->setPostfields($postfields)
->performRequest();

?>

Save the files and run the twitterpost.php page in your browser. Check the Twitter account referenced by the screen_name variable.  There should now be a new post with the contents of the ‘status’ value.
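As a sketch of how the ‘status’ value might eventually be fed from catalog data, the snippet below builds tweet text like the example shown earlier (“Just Added: …”) from a hypothetical CSV export of new titles. The file name and column names are assumptions; your catalog’s export will differ, and the resulting text could be handed to the Twitter API from whatever language your application uses.

# Build tweet text like "Just Added: Title by Author; Call Number" from a
# hypothetical CSV export of newly added titles (sketch only).
import csv

def new_title_statuses(path):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # assumes title, author, call_number columns
            yield "Just Added: {title} by {author}; {call_number}".format(**row)

for status in new_title_statuses("new_titles.csv"):  # hypothetical file name
    print(status)  # each status could then be posted through the API as above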

This is just a start – you would still need to get data out of your ILS and feed it to this application in some way – which brings me to one final point.

Is there an API for your ILS?  Should there be? (Answer:  Yes!)

Getting data out of traditional, legacy ILS systems can be a challenge.  Extending or adding on to traditional ILS software can be impossible (and in some cases may have been prohibited by license agreements).  One of the reasons for this might be that the architecture of such systems was designed for a world where the kind of data exchange facilitated by RESTful APIs didn’t yet exist.  However, there is definitely a major trend by ILS developers to move toward allowing access to library data within ILS systems via APIs.

It can be difficult to articulate exactly why this kind of access is necessary – especially when looking toward the future of rich functionality in emerging web-based library service platforms.  Why should we have to build custom applications using APIs – shouldn’t our ILS systems be built with all the functionality we need?

While libraries should certainly promote comprehensive and flexible architecture in the ILS solutions they purchase, there will almost certainly come a time when no matter how comprehensive your ILS is, you’re going to wonder, “wouldn’t it be nice if our system did X”? Moreover, consider how your patrons might use your library’s APIs; for example, integrating your library’s web services with other apps and services they already use, or building their own applications with your library’s web services. If you have web service API access to your data – bibliographic, circulation, acquisition data, etc. – you have the opportunity to meet those needs and to innovate collaboratively.  Without access to your data, you’re limited to the development cycle of your ILS vendor, and it may be years before you see the functionality you really need to do something cool with your data.  (It may still be years before you can find the time to develop your own app with an API, but that’s an entirely different problem.)

Examples of Library Applications built using APIs and ILS API Resources

Further Reading

Michel, Jason P. Web Service APIs and Libraries. Chicago, IL:  ALA Editions, 2013. Print.

Richardson, Leonard, and Michael Amundsen. RESTful Web APIs. Sebastopol, Calif.: O’Reilly, 2013.

 

About our Guest Author:

Lauren Magnuson is Systems & Emerging Technologies Librarian at California State University, Northridge and a Systems Coordinator for the Private Academic Library Network of Indiana (PALNI).  She can be reached at lauren.magnuson@csun.edu or on Twitter @lpmagnuson.

 

Notes

  1. create, retrieve, update, and delete is sometimes referred to by acronym: CRUD
  2. For example, via the OCLC Collection Management API: http://www.oclc.org/developer/develop/web-services/wms-collection-management-api.en.html
  3. For more detail on these and other HTTP verbs, http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
  4. https://blog.apigee.com/detail/do_you_need_api_keys_api_identity_vs._authorization
  5. Google, for example: https://developers.google.com/accounts/docs/OAuth2
  6. To learn a lot more about this, check out this web series: http://www.youtube.com/playlist?list=PLB4D701646DAF0817
  7. http://www.securityweek.com/github-search-makes-easy-discovery-encryption-keys-passwords-source-code
  8. Learn more about .gitignore here:  https://help.github.com/articles/ignoring-files
  9. A nice overview of HMAC is here: http://www.wolfe.id.au/2012/10/20/what-is-hmac-and-why-is-it-useful
  10. Here’s what to do if your account is hacked and you can’t log in:  https://support.twitter.com/articles/185703-my-account-has-been-hacked
  11. More information, and further steps you can take are here:  https://support.twitter.com/articles/31796-my-account-has-been-compromised

Higher ‘Professional’ Ed, Lifelong Learning to Stay Employed, Quantified Self, and Libraries

The 2014 Horizon Report is mostly a report on emerging technologies. Many academic librarians carefully read its Higher Ed edition issued every year to learn about the upcoming technology trends. But this year’s Horizon Report Higher Ed edition was interesting to me more in terms of how the current state of higher education is reflected in the report than in terms of the technologies on the near-term (one-to-five year) horizon of adoption. Let’s take a look.

A. Higher Ed or Higher Professional Ed?

To me, the most useful section of this year’s Horizon Report was ‘Wicked Challenges.’ The significant backdrop behind the first challenge “Expanding Access” is the fact that the knowledge economy is making higher education more and more closely and directly serve the needs of the labor market. The report says, “a postsecondary education is becoming less of an option and more of an economic imperative. Universities that were once bastions for the elite need to re-examine their trajectories in light of these issues of access, and the concept of a credit-based degree is currently in question.” (p.30)

Many of today’s students enter colleges and universities with a clear goal, i.e. obtaining a competitive edge and a better earning potential in the labor market. The result, already familiar to many of us, is grade and degree inflation and the emergence of higher ed institutions that pursue profit over even education itself. When the acquisition of skills takes precedence over intellectual inquiry for its own sake, higher education comes to resemble higher professional education or intensive vocational training. As the economy almost forces people to take up the practice of lifelong learning simply to stay employed, the friction between the traditional goal of higher education – intellectual pursuit for its own sake – and its changing expectation – a creative, adaptable, and flexible workforce – will only become more prominent.

Naturally, this socioeconomic background behind the expansion of postsecondary education raises the question of where its value lies. This is the second wicked challenge listed in the report, i.e. “Keeping Education Relevant.” The report says, “As online learning and free educational content become more pervasive, institutional stakeholders must address the question of what universities can provide that other approaches cannot, and rethink the value of higher education from a student’s perspective.” (p.32)

B. Lifelong Learning to Stay Employed

Today’s economy and labor market strongly prefer employees who can be hired, retooled, or let go at the same pace as changes in technology, as technology becomes one of the greatest driving forces of the economy. Workers are expected to enter the job market with more complex skills than in the past, to be able to adjust themselves quickly as important skills at workplaces change, and increasingly to take the role of a creator/producer/entrepreneur in their thinking and work practices. Credit-based degree programs fall short in this regard. It is no surprise that the report selected “Agile Approaches to Change” and “Shift from Students as Consumers to Students as Creators” as its long-range and mid-range key trends.

A strong focus on creativity, productivity, entrepreneurship, and lifelong learning, however, puts a heavier burden on both sides of education, i.e. instructors and students (full-time, part-time, and professional). While positive in emphasizing students’ active learning, the Flipped Classroom model selected as one of the key trends in the Horizon report often means additional work for instructors. In this model, instructors not only have to prepare the study materials for students to go over before the class, such as lecture videos, but also need to plan active learning activities for students during the class time. The Flipped Classroom model also assumes that students should be able to invest enough time outside the classroom to study.

The unfortunate side effect or consequence of this is that those who cannot afford to do so – for example, those who have to work multiple jobs or have many family obligations – will suffer and fall behind. Today’s students and workers are now being asked to demonstrate their competencies with what they can produce beyond simply presenting the credit hours that they spent in the classroom. Probably as a result of this, a clear demarcation between work, learning, and personal life seems to be disappearing. “The E-Learning Predictions for 2014 Report” from EdTech Europe predicts that ‘Learning Record Stores’, which track, record, and quantify an individual’s experiences and progress in both formal and informal learning, will emerge in step with the need for continuous learning required by today’s job market. EdTech Europe also points out that learning is now being embedded in daily tasks and that we will see a significant increase in the availability and use of casual and informal learning apps both in education and in the workplace.

C. Quantified Self and Learning Analytics

Among the six emerging technologies in the 2014 Horizon Report Higher Education edition, ‘Quantified Self’ is by far the most interesting new trend. (Other technologies should be pretty familiar to those who have been following the Horizon Report every year, except maybe the 4D printing mentioned in the 3D printing section. If you are looking for the emerging technologies that are on a farther horizon of adoption, check out this article from the World Economic Forum’s Global Agenda Council on Emerging Technologies, which lists technologies such as screenless display and brain-computer interfaces.)

According to the report, “Quantified Self describes the phenomenon of consumers being able to closely track data that is relevant to their daily activities through the use of technology.” (ACRL TechConnect has covered personal data monitoring and action analytics previously.) Quantified Self is enabled by wearable technology devices, such as Fitbit or Google Glass, and the Mobile Web. Wearable technology devices automatically collect personal data. Fitbit, for example, keeps track of one’s own sleep patterns, steps taken, and calories burned. And the Mobile Web is the platform that can store and present such personal data directly transferred from those devices. Through these devices and the resulting personal data, we get to observe our own behavior in a much more extensive and detailed manner than ever before. Instead of deciding which parts of our lives to keep a record of, we can now let these devices collect almost all types of data about ourselves and then see which data is of any use to us and whether any pattern emerges that we can perhaps utilize for the purpose of self-improvement.

Quantified Self is a notable trend not because it involves an unprecedented technology but because it gives us a glimpse of what our daily lives will be like in the near future, in which many of the emerging technologies that we are just getting used to right now – the mobile, big data, wearable technology – will come together in full bloom. ‘Learning Analytics’, which the Horizon Report calls “the educational application of ‘big data’” (p.38) and which can be thought of as the application of Quantified Self in education, has already been making significant progress in higher education. By collecting and analyzing the data about student behavior in online courses, learning analytics aims at improving student engagement, providing a more personalized learning experience, detecting learning issues, and determining the behavior variables that are the significant indicators of student performance.

While privacy is a natural concern for Quantified Self, it is worth noting that we ourselves often willingly participate in personal data monitoring through gamified self-tracking apps – a kind of monitoring that might well be objectionable in other contexts. In her article, “Gamifying the Quantified Self,” Jennifer Whitson writes:

Gamified self-tracking and participatory surveillance applications are seen and embraced as play because they are entered into freely, injecting the spirit of play into otherwise monotonous activities. These gamified self-improvement apps evoke a specific agency—that of an active subject choosing to expose and disclose their otherwise secret selves, selves that can only be made penetrable via the datastreams and algorithms which pin down and make this otherwise unreachable interiority amenable to being operated on and consciously manipulated by the user and shared with others. The fact that these tools are consumer monitoring devices run by corporations that create neoliberal, responsibilized subjectivities become less salient to the user because of this freedom to quit the game at any time. These gamified applications are playthings that can be abandoned at whim, especially if they fail to pleasure, entertain and amuse. In contrast, the case of gamified workplaces exemplifies an entirely different problematic. (p.173; emphasis my own and not by the author)

If libraries and higher education institutions become active in monitoring and collecting data on students’ learning behavior, the success of that kind of endeavor will depend on how well it creates and provides a sense of play to students so that they participate willingly. It will also be important for such learning analytics projects to offer an opt-out at any time and to keep the private data confidential and as anonymous as possible.

D. Back to Libraries

The changed format of this year’s Horizon Report, with its ‘Key Trends’ and ‘Significant Challenges’, shows much more clearly the forces in play behind the emerging technologies to look out for in higher education. One big take-away from this report, I believe, is that in spite of the doubt about the unique value of higher education, demand will keep increasing because of students’ need to obtain a competitive advantage in entering or re-entering the workforce. Another is that higher ed institutions will endeavor to create means and tools – such as competency-based assessments and badge systems – that let students acquire and demonstrate skills and experience in ways that appeal to future employers, beyond credit-hour-based degrees.

Considering that the pace of change at higher education tends to be slow, this can be an opportunity for academic libraries. Both instructors and students are under constant pressure to innovate and experiment in their teaching and learning processes. Instructors designing the Flipped Classroom model may require a studio where they can record and produce their lecture videos. Students may need to compile portfolios to demonstrate their knowledge and skills for job interviews. Returning adult students may need to acquire habitual lifelong learning practices with help from librarians. Local employers and students may mutually benefit from a place where certain co-projects can be tried. As a neutral player on the campus with tech-savvy librarians and knowledgeable staff, libraries can create a place where the most palpable student needs that are yet to be satisfied by individual academic departments or student services are directly addressed. Maker labs, gamified learning or self-tracking modules, and a competency dashboard are all such examples. From the emerging technology trends in higher ed, we see that the learning activities in higher education and academic libraries will be more and more closely tied to the economic imperative of constant innovation.

Academic libraries may even go further and take up the role of leading the changes in higher education. In his blog post for Inside Higher Ed, Joshua Kim suggests exactly this and also nicely sums up the challenges that today’s higher education faces:

  • How do we increase postsecondary productivity while guarding against commodification?
  • How do we increase quality while increasing access?
  • How do we leverage technologies without sacrificing the human element essential for authentic learning?

How will academic libraries be able to lead the changes necessary for higher education to successfully meet these challenges? It is a question that will stay with academic libraries for many years to come.


Bash Scripting: automating repetitive command line tasks

Introduction

One of my current tasks is to develop workflows for digital preservation procedures. These workflows begin with the acquisition of files – either disk images or logical file transfers – both of which end up on a designated server. Once acquired, the images (or files) are checked for viruses. If clean, they are bagged using Bagit and then copied over to a different server for processing.1 This work is all done at the command line, and as you can imagine, it gets quite repetitive. It’s also a bit error-prone since our file naming conventions include a 10-digit media ID number, which is easily mistyped. So once all the individual processes were worked out, I decided to automate things a bit by placing the commands into a single script. I should mention here that I’m no Linux whiz – I use it as needed, which is sometimes daily, sometimes not. This is the first time I’ve ever tried to tie commands together in a Bash script, but I figured previous programming experience would help.

Creating a Script

To get started, I placed all the virus check commands for disk images into a script. These commands are different than logical file virus checks since the disk has to be mounted to get a read. This is a pretty simple step – first add:

#!/bin/bash

as the first line of the file (this line should not be indented or have any other whitespace in front of it). This tells the kernel which kind of interpreter to invoke, in this case, Bash. You could substitute the path to another interpreter, like Python, for other types of scripts: #!/bin/python.

Next, I changed the file permissions to make the script executable:

chmod +x myscript

I separated the virus check commands so that I could test those out and make sure they were working as expected before delving into other script functions.

Here’s what my initial script looked like (comments are preceded by a #):

#!/bin/bash

#mount disk
sudo mount -o ro,loop,nodev,noexec,nosuid,noatime /mnt/transferfiles/2013_072/2013_072_DM0000000001/2013_072_DM0000000001.iso /mnt/iso

#check to verify mount
mountpoint -q /mnt/iso && echo "mounted" || echo "not mounted"

#call the Clam AV program to run the virus check
clamscan -r /mnt/iso > "/mnt/transferfiles/2013_072/2013_072_DM0000000001/2013_072_DM0000000001_scan_test.txt"

#unmount disk
sudo umount /mnt/iso

#check disk unmounted
mountpoint -q /mnt/iso && echo "mounted" || echo "not mounted"

All those options on the mount command? They give me the peace of mind that accessing the disk will in no way alter it (or affect the host server), thus preserving its authenticity as an archival object. You may also be wondering about the use of “&&” and “||”.  These function as conditional AND and OR operators, respectively. So “&&” tells the shell to run the first command, AND if that’s successful, it will run the second command. Conversely, the “||” tells the shell to run the first command OR if that fails, run the second command. So the mount check command can be read as: check to see if the directory at /mnt/iso is a mountpoint. If the mount is successful, then echo “mounted.” If it’s not, echo “not mounted.” More on redirection.

Adding Variables

You may have noted that the script above only works on one disk image (2013_072_DM0000000001.iso), which isn’t very useful. I created variables for the accession number, the digital media number, and the file extension, since they all changed depending on the disk image information. The file naming convention we use for disk images is consistent and strict. The top level directory is the accession number. Within that, each disk image acquired from that accession is stored within its own directory, named using its assigned number. The disk image is then named by a combination of the accession number and the disk image number. Yes, it’s repetitive, but it keeps track of where things came from and links to data we have stored elsewhere. Given that these disks may not be processed for 20 years, such redundancy is preferred.

At first I thought the accession number, digital media number, and extension variables would be best entered at the initial run command; type in one line to run many commands. Each variable is separated by a space; the .iso at the end is the extension for an optical disk image file:

$ ./virus_check.sh 2013_072 DM0000000001 .iso

In Bash, scripts run with arguments are named $1 for the first variable, $2 for the second, and so on. This actually tripped me up for a day or so. I initially thought the $1, $2, etc. variable names used by the book I was referencing were for examples only, and that the first variables I referenced in the script would automatically map in order, so if 2013_072 was the first argument, and $accession was the first variable, $accession = 2013_072 (much like when you pass in a parameter to a Python function). Then I realized there was a reason that more than one reference book and/or site used the $1, $2, $3 system for variables passed in as command line arguments. I assigned each to its proper variable, and things were rolling again.

#!/bin/bash

#assign command line variables
accession=$1
digital_media=$2
extension=$3

#mount disk
sudo mount -o ro,loop,noatime /mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}${extension} /mnt/iso

Note: variable names are often presented without curly braces; it’s recommended to place them in curly braces when adjacent to other strings.2

Reading Data

After testing the script a bit, I realized I  didn’t like passing the variables in via the command line. I kept making typos, and it was annoying not to have any data validation done in a more interactive fashion. I reconfigured the script to prompt the user for input:

read -p "Please enter the accession number" accession

read -p "Please enter the digital media number" digital_media

read -p "Please enter the file extension, include the preceding period" extension

After reviewing some test options, I decided to test that $accession and $digital_media were valid directories, and that the combo of all three variables was a valid file. This test seems more conclusive than simply testing whether or not the variables fit the naming criteria, but it does mean that if the data entered is invalid, the feedback given to the user is limited. I’m considering adding tests for naming criteria as well, so that the user knows when the error is due to a typo vs. a non-existent directory or file. I also didn’t want the code to simply quit when one of the variables is invalid – that’s not very user-friendly. I decided to ask the user for input until valid input was received.

read -p "Please enter the accession number" accession
until [ -d /mnt/transferfiles/${accession} ]; do
     read -p "Invalid. Please enter the accession number." accession
done

read -p "Please enter the digital media number" digital_media
until [ -d /mnt/transferfiles/${accession}/${accession}_${digital_media} ]; do
     read -p "Invalid. Please enter the digital media number." digital_media
done

read -p  "Please enter the file extension, include the preceding period" extension
until [ -e /mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}${extension} ]; do
     read -p "Invalid. Please enter the file extension, including the preceding period" extension
done

Creating Functions

You may have noted that the command used to test if a disk is mounted or not is called twice. This is done on purpose, as it was a test I found helpful when running the virus checks; the virus check runs and outputs a report even if the disk isn’t mounted. Occasionally disks won’t mount for various reasons. In such cases, the resulting report will state that it scanned no data, which is confusing because the disk itself possibly could have contained no data. Testing if it’s mounted eliminates that confusion. The command is repeated after the disk has been unmounted, mainly because I found it easy to forget the unmount step, and testing helps reinforce good behavior. Given that the command is repeated twice, it makes sense to make it a function rather than duplicate it.

check_mount () {
     #checks to see if disk is mounted or not
     mountpoint -q /mnt/iso && echo "mounted" || echo "not mounted"
}

Lastly, I created a function for the input variables. I’m sure there are prettier, more concise ways of writing this function, but since it’s still being refined and I’m still learning Bash scripting, I decided to leave it for now. I did want it placed in its own function because I’m planning to add additional code that will notify me if the virus check is positive and exit the program, or if it’s negative, bag the disk image and corresponding files, and copy them over to another server where they’ll wait for further processing.

get_image () {
     #gets data from user, validates it    
     read -p "Please enter the accession number" accession
     until [ -d /mnt/transferfiles/${accession} ]; do
          read -p "Invalid. Please enter the accession number." accession
     done

     read -p "Please enter the digital media number" digital_media
     until [ -d /mnt/transferfiles/${accession}/${accession}_${digital_media} ]; do
          read -p "Invalid. Please enter the digital media number." digital_media
     done

     read -p  "Please enter the file extension, include the preceding period" extension
     until [ -e /mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}${extension} ]; do
          read -p "Invalid. Please enter the file extension, including the preceding period" extension
     done
}

Resulting (but not final!) Script

#!/bin/bash
#takes accession number, digital media number, and extension as variables to mount a disk image and run a virus check

check_mount () {
     #checks to see if disk is mounted or not
     mountpoint -q /mnt/iso && echo "mounted" || echo "not mounted"
}

get_image () {
     #gets disk image data from user, validates info
     read -p "Please enter the accession number" accession
     until [ -d /mnt/transferfiles/${accession} ]; do
          read -p "Invalid. Please enter the accesion number." accession
     done

     read -p "Please enter the digital media number" digital_media
     until [ -d /mnt/transferfiles/${accession}/${accession}_${digital_media} ]; do
          read -p "Invalid. Please enter the digital media number." digital_media
     done

     read -p  "Please enter the file extension, include the preceding period" extension
     until [ -e /mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}${extension} ]; do
          read -p "Invalid. Please enter the file extension, including the preceding period" extension
     done
}

get_image

#mount disk
sudo mount -o ro,loop,noatime /mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}${extension} /mnt/iso

check_mount

#run virus check
sudo clamscan -r /mnt/iso > "/mnt/transferfiles/${accession}/${accession}_${digital_media}/${accession}_${digital_media}_scan_test.txt"

#unmount disk
sudo umount /mnt/iso

check_mount

Conclusion

There’s a lot more I’d like to do with this script. In addition to what I’ve already mentioned, I’d love to enable it to run over a range of digital media numbers, since they often are sequential. It also doesn’t stop if the disk isn’t mounted, which is an issue. But I thought it served as a good example of how easy it is to take repetitive command line tasks and turn them into a script. Next time, I’ll write about the second phase of development, which will include combining this script with another one, virus scan reporting, bagging, and transfer to another server.

Suggested References

An A-Z Index of the Bash command line for Linux

The books I used, both are good for basic command line work, but they only include a small section for actual scripting:

Barrett, Daniel J., Linux Pocket Guide. O’Reilly Media, 2004.

Shotts, Jr., William E. The Linux Command Line: A complete introduction. no starch press. 2012.

The book I wished I used:

Robbins, Arnold and Beebe, Nelson H. F. Classic Shell Scripting. O’Reilly Media, 2005.

Notes

  1. Logical file transfers often arrive in bags, which are then validated and virus checked.
  2. Linux Pocket Guide

Query a Google Spreadsheet like a Database with Google Visualization API Query Language

Libraries make much use of spreadsheets. Spreadsheets are easy to create, and most library staff are familiar with how to use them. But they can quickly get unwieldy as more and more data are entered. The more rows and columns a spreadsheet has, the more difficult it is to browse and quickly identify specific information. Creating a searchable web application with a database at the back-end is a good solution since it will let users quickly perform a custom search and filter out unnecessary information. But due to the staff time and expertise it requires, creating a full-fledged searchable web database application is not always a feasible option at many libraries.

Creating a custom MS Access database or using a free service such as Zoho can be an alternative to building a searchable web database application. But providing a read-only view of an MS Access database, while possible, can be tricky. MS Access is also installed locally on individual PCs, so it is not necessarily available to library staff when they are away from the work machines on which it is installed. Zoho Creator offers a way to easily convert a spreadsheet into a database, but its free tier is quite limited: a maximum of three users, 1,000 records, and 200 MB of storage.

Google Visualization API Query Language provides a quick and easy way to query a Google spreadsheet and return and display a selective set of data without actually converting the spreadsheet into a database. You can display the query result as an HTML table, which can be served as a stand-alone webpage. All you have to do is construct a custom URL.

A free Google spreadsheet has limits on size and complexity. For example, a single free Google spreadsheet can have no more than 400,000 total cells. But you can purchase more Google Drive storage, and you can also query multiple Google spreadsheets (or even your own custom databases) by using Google Visualization API Query Language and Google Chart Libraries together. (This will be the topic of my next post. You can also see examples of using Google Chart Libraries and Google Visualization API Query Language together in my presentation slides at the end of this post.)

In this post, I will explain the parameters of Google Visualization API Query Language and how to construct a custom URL that will query, return, and display a selective set of data in the form of an HTML page.

A. Display a Google Spreadsheet as an HTML page

The first step is to identify the URL of the Google spreadsheet of your choice.

The URL below opens up the third sheet (Sheet 3) of a specific Google spreadsheet. There are two parameters inside the URL that you need to pay attention to: key and gid.

https://docs.google.com/spreadsheet/ccc?key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&usp=drive_web#gid=2

This breaks down the parameters in a way that is easier to view:

  • https://docs.google.com/spreadsheet/ccc
    ?key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE
    &usp=drive_web

    #gid=2

Key is a unique identifier for each Google spreadsheet, so you will need it later to create the custom URL that queries and displays the data in this spreadsheet. Gid specifies which sheet in the spreadsheet you are opening up. The gid for the first sheet is 0; the gid for the third sheet is 2.

[Screenshot: the example Google spreadsheet open in Google Drive]

Let’s first see how Google Visualization API returns the spreadsheet data as a DataTable object. This is only for those who are curious about what goes on behind the scenes. You can see that for this view, the URL is slightly different but the values of the key and the gid parameter stay the same.

https://spreadsheets.google.com/tq?&tq=&key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&gid=2

[Screenshot: the raw DataTable response returned by the Google Visualization API]

In order to display the same result as an independent HTML page, all you need to do is take the key and gid values of your own Google spreadsheet and construct a custom URL following the pattern shown below.

  • https://spreadsheets.google.com
    /tq?tqx=out:html&tq=
    &key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE
    &gid=2

https://spreadsheets.google.com/tq?tqx=out:html&tq=&key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&gid=2

[Screenshot: the spreadsheet rendered as an HTML table]

By the way, if the URL you created doesn’t work, it is probably because it has not been encoded properly. Try a URL encoder/decoder page to encode it by hand, or use the JavaScript encodeURIComponent() function.

Also, if you want the URL to display the query result without people logging into Google Drive first, make sure the spreadsheet’s permission setting is public. On the other hand, if you need to restrict access to the spreadsheet to a limited number of users, remind your users to go to the Google Drive webpage and log in with their Google account before clicking your URLs. Only when users are logged into Google Drive will they be able to see the query result.
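If you work from the command line, a quick way to check such a URL is to fetch it with a tool like curl and save the output. This is just a sketch, using the same example key and gid as above, and it assumes the spreadsheet is public as just described:

#fetch the whole third sheet as an HTML table and save it locally
curl "https://spreadsheets.google.com/tq?tqx=out:html&tq=&key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&gid=2" > sheet.html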

B. How to Query a Google Spreadsheet

Above, we saw how to create a URL that shows an entire sheet of a Google spreadsheet as an HTML page. Now let’s do some querying, so that we can pick and choose what data the table displays instead of the whole sheet. That’s where the Query Language comes in handy.

Here is an example spreadsheet with over 50 columns and 500 rows.

  • https://docs.google.com/spreadsheet/ccc?
    key=0AqAPbBT_k2VUdDFYamtHdkFqVHZ4VXZXSVVraGxacEE
    &usp=drive_web
    #gid=0

https://docs.google.com/spreadsheet/ccc?key=0AqAPbBT_k2VUdDFYamtHdkFqVHZ4VXZXSVVraGxacEE&usp=drive_web#gid=0

[Screenshot: the example spreadsheet with over 50 columns and 500 rows]

What I want to do is show only columns B, C, D, and F where column C contains ‘Florida.’ How do I do this? Remember the URL we created to show the entire sheet above?

  • https://spreadsheets.google.com/tq?tqx=out:html&tq=&key=___&gid=___

There we had no value for the tq parameter. This is where we insert our query.

Google Visualization API Query Language is very similar to SQL, so if you are familiar with SQL, forming a query is dead simple. If you aren’t, SQL is also easy to learn.

  • The query should be written like this:
    SELECT B, C, D, F WHERE C CONTAINS ‘Florida’
  • After encoding it properly, you get something like this:
    SELECT%20B%2C%20C%2C%20D%2C%20F%20WHERE%20C%20CONTAINS%20%27Florida%27
  • Add it to the tq parameter and don’t forget to also specify the key:
    https://spreadsheets.google.com/tq?tqx=out:html&tq=SELECT%20B%2C%20C%2C%20D%2C%20F%20WHERE%20C%20CONTAINS%20%27Florida%27
    &key=0AqAPbBT_k2VUdEtXYXdLdjM0TXY1YUVhMk9jeUQ0NkE

I am omitting the gid parameter here because there is only one sheet in this spreadsheet, but you can add it if you would like. You can also omit it if the sheet you want is the first sheet. Ta-da!

[Screenshot: the query result showing only columns B, C, D, and F where column C contains ‘Florida’]

Compare this with the original spreadsheet view. I am sure you can appreciate how the small effort of creating a URL pays off by making an unwieldy, large spreadsheet manageable to view.
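If encoding the query by hand gets tedious, curl can do the encoding for you. The sketch below is only an illustration: it passes the same SELECT statement and key as above and lets curl build the encoded URL (again, the spreadsheet must be public for this to work without logging in):

#build the query URL and let curl handle the encoding of the tq parameter
curl -G "https://spreadsheets.google.com/tq" \
     --data-urlencode "tqx=out:html" \
     --data-urlencode "tq=SELECT B, C, D, F WHERE C CONTAINS 'Florida'" \
     --data-urlencode "key=0AqAPbBT_k2VUdEtXYXdLdjM0TXY1YUVhMk9jeUQ0NkE" \
     > florida.html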

You can also easily incorporate functions such as count() or sum() into your query to get an overview of the data you have in the spreadsheet.

  • select D, F, count(C) where (B contains ‘author name’) group by D, F

For example, this query above shows how many articles a specific author published per year in each journal. The screenshot of the result is below and you can see it for yourself here: https://spreadsheets.google.com/tq?tqx=out:html&tq=select+D,F,count(C)+where+%28B+contains+%27Agoulnik%27%29+group+by+D,F&key=0AqAPbBT_k2VUdEtXYXdLdjM0TXY1YUVhMk9jeUQ0NkE

[Screenshot: the query result counting one author’s articles per year and journal]

Take this spreadsheet as another example.

[Screenshot: a sample library budget spreadsheet]

This simple query displays the library budget by year; a sketch of such a query follows the screenshot below. For those who are unfamiliar with ‘pivot,’ a pivot table is a data summarization tool. The query asks the spreadsheet to calculate the total of all the values in column B (the budget amount for each category) by the values found in column C (the years).

[Screenshot: the pivot query result showing the total budget by year]
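The exact query behind that screenshot isn’t reproduced in this post, but a pivot query along these lines would produce a similar summary. This is only a sketch under the assumptions just described (budget amounts in column B, years in column C), and YOUR_SPREADSHEET_KEY is a placeholder, not a real key:

#sketch: total the amounts in column B, with one result column per year in column C
curl -G "https://spreadsheets.google.com/tq" \
     --data-urlencode "tqx=out:html" \
     --data-urlencode "tq=select sum(B) pivot C" \
     --data-urlencode "key=YOUR_SPREADSHEET_KEY" \
     > budget_by_year.html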

This is another example, querying the spreadsheet connected to my library’s Literature Search request form. The following query asks the spreadsheet to count the number of literature search requests by research topic (column I) that were received in 2011 (column G), grouped by the values in column C, i.e., College of Medicine Faculty or College of Medicine Staff.

  • select C, count(I) where (G contains ‘2011’) group by C

[Screenshot: the query result counting literature search requests by requester type]

C. More Querying Options

There are many more things you can do with a custom query. Google has extensive documentation that is easy to follow: https://developers.google.com/chart/interactive/docs/querylanguage#Language_Syntax

These are just a few examples.

  • ORDER BY __ DESC
    : Orders the results in descending order by the column of your choice. Without ‘DESC,’ the results will be listed in ascending order.
  • LIMIT 5
    : Limits the number of results. Combined with ‘ORDER BY,’ you can quickly filter the results to the most recent or the oldest items (see the sketch after this list).
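As one combined sketch, the query below would return the five rows with the largest values in a chosen column, newest first. The column letters and YOUR_SPREADSHEET_KEY are placeholders for illustration, not values from the examples above:

#sketch: combine ORDER BY and LIMIT to get the top five rows by column D
curl -G "https://spreadsheets.google.com/tq" \
     --data-urlencode "tqx=out:html" \
     --data-urlencode "tq=select B, C, D order by D desc limit 5" \
     --data-urlencode "key=YOUR_SPREADSHEET_KEY" \
     > top_five.html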

My presentation slides from the 2013 LITA Forum, included below, contain more detailed information about Google Visualization API Query Language, its parameters, and other options, as well as how to use Google Chart Libraries in combination with Google Visualization API Query Language for data visualization, which is the topic of my next post.

Happy querying Google Spreadsheet!

 


#libtechgender: A Post in Two Parts

Conversations about gender relations, bias, and appropriate behavior have bubbled up all over the technology sector recently. We have seen conferences adopt codes of conduct that strive to create welcoming atmospheres. We have also seen cases of bias and harassment, cases that may once have been tolerated or ignored, now being identified and condemned. These conversations, like gender itself, are not simple or binary, but being able to listen respectfully and talk honestly about uncomfortable topics offers hope that positive change is possible.

On October 28th, Sarah Houghton, the director of the San Rafael Public Library, moderated a panel on gender in library technology at the Internet Librarian conference. In today’s post I’d like to share my small contributions to the panel discussion that day and also how my understanding of the issues changed after the discussion there. It is my hope that more conversations about gender issues in library technology, more talking and more listening, will be sparked by this start.

Part I: Internet Librarian Panel on Gender

Our panel’s intent was to invite librarians into a public conversation about gender issues. In the Internet Librarian program our invitation read:

Join us for a lively panel and audience discussion about the challenges of gender differences in technology librarianship. The topics of fairness and bias with both genders have appeared in articles, blogs, etc and this panel of women and men who work in libraries and gender studies briefly share personal experiences, then engage the audience about experiences and how best to increase understanding between the genders specifically in the area of technology work in librarianship. 1
Panelists: Sarah Houghton, Ryan Claringbole, Emily Clasper, Kate Kosturski, Lisa Rabey, John Bultena, Tatum Lindsay, and Nicholas Schiller

My invitation to participate on the panel stemmed from blog posts I wrote about how online conversations about gender issues can go off the rails and become disasters. I used my allotted time to share some simple suggestions I developed while observing these conversations. Coming from my personal (white cis straight male) perspective, I paid attention to things that I and my male colleagues do and say that result in unintended offense, silencing, and anger in our female colleagues. By reverse engineering these conversational disasters, I attempted to learn from unfortunate mistakes and build better conversations. Assuming honest good intentions, following these suggestions can help us avoid contention and build more empathy and trust.

  1. Listen generously. Context and perspective are vital to these discussions. If we’re actively cultivating diverse perspectives, then we are inviting ideas that conflict with our assumptions. It’s more effective to assume these ideas come from unfamiliar but valid contexts than to assume they are automatically unreasonable. By deferring judgement until after new ideas have been assimilated and understood, we can avoid silencing voices that we need to hear.
  2. Defensive responses can be more harmful than offensive responses. No one likes to feel called on the carpet, but the instinctive responses we give when we feel blamed or accused can be worse than simply giving offense. Defensive denials can lead to others feeling silenced, which is much more damaging and divisive than simple disagreement. It can be the difference between communicating “you and I disagree on this matter” and communicating “you are wrong and don’t get a voice in this conversation.”
  3. It is okay to disagree or to be wrong. Conversations about gender are full of fear. People are afraid to speak up for fear of reprisal. People are afraid to say the wrong thing and be revealed as a secret misogynist. People are afraid. The good news is that conversations where all parties feel welcome, respected, and listened to can be healing. Because context and perspective matter so much in how we address issues, once we accept the contexts and perspectives of others, we are more likely to receive acceptance of our own perspectives and contexts. Given an environment of mutual respect and inclusion, we don’t need to be afraid of holding unpopular views. These are complex issues and once trust is established, complex positions are acceptable.

This is what I presented at the panel session and I still stand behind these suggestions. They can be useful tools for building better conversations between people with good intentions. Specifically, they can help men in our field avoid all-too-common barriers to productive conversation.

That day I listened and learned a lot from the audience and from my fellow panelists, and I shifted my priorities. I still think cultivating better conversations is an important goal. I still want to learn how to be a better listener and colleague. I think these are skills that don’t just happen, but need to be intentionally cultivated. That said, I came into the panel believing that the most important gender-related issue in library technology was finding ways for well-intentioned colleagues to communicate effectively about an uncomfortable problem. Listening to my colleagues tell their stories, I learned that there are more direct and pressing gender issues in libraries.

Part II: After the Panel

As I listened to my fellow panelists tell their stories, and then as I listened to people in the audience share their experiences, no one else seemed particularly concerned about well-intentioned people having misunderstandings or about breakdowns in communication. Instead, they related a series of harrowing personal experiences in which men (and women, but mostly men) were directly harassing, intentionally abusive, and strategically cruel in ways that have a very large impact on the daily work, career paths, and quality of life of my female colleagues. I had assumed that, since this kind of harassment clearly violates standard HR policies, the problem was being adequately addressed by existing administrative policies. That assumption is incorrect.

It is easy to ignore what we don’t see: I don’t see harassment taking place in libraries, and I don’t often hear it discussed. It has been easy to underestimate its prevalence and the impact it has on many of my colleagues. Listening to librarians tell their stories changed that.

Then, one evening after the conference, a friend of mine was harassed on the street and I had another assumption challenged. It happened quickly: a stranger harassed my friend while I watched in stunned passivity. 2 I arrived at the conference feeling positive about my grasp of the issues and confident about my role as an ally. I left feeling shaken, doubting both my thoughts and my actions.

In response to the panel and its aftermath, I’ve composed three more points to reflect what I learned. These aren’t suggestions like the ones I brought to the panel; instead, they are realizations or statements. I’m obviously not an expert on the topic, and I’m not speaking from a seat of authority. I’m relating stories and experiences told by others, and they tell them much better than I do. In the tradition of geeks and hackers, now that I have learned something new, I’m sharing it with the community in hopes that my experience moves the project forward. It is my hope that better informed and more experienced voices will take this conversation farther than I am able to. These three realizations may be obvious to some, but because they were not obvious to me, it seems useful to articulate them clearly.

  1. Intentional and targeted harassment of women is a significant problem in the library technology field. While subtle microaggressions, problem conversations, and colleagues who deny that significant gender issues exist in libraries are problematic, these issues are overshadowed by direct and intentional harassing behavior targeting gender identity or sex. The clear message I heard at the panel was that workplace harassment is a very real and very current threat to women working in library technology fields.
  2. This harassment is not visible to those not targeted by it. It is easy to ignore what we do not see. Responses to the panel included many women in library technology sharing their experiences and commenting that it was good to hear others’ stories. Even though the experience of workplace harassment was common, those who spoke of it reported feelings of isolation. While legislation and human resources policies clearly state that harassment is unacceptable and unlawful, it still happens, and when it happens the target can be isolated by the experience. Those of us who participate in library conferences, journals, and online communities can help pierce this isolation by cultivating opportunities to talk about these issues openly and publicly. By publicly talking about gender issues, we can thwart isolation and make the problems more visible to those who are not direct targets of harassment.
  3. This is a cultural problem, not only an individual problem. While no one point on the gender spectrum has a monopoly on either perpetrating or being the target of workplace harassment, the predominant narrative in our panel discussion was men harassing women. Legally speaking, these need to be treated as individual acts, but as a profession, we can address the cultural aspects of the issue. Something in our library technology culture is fostering an environment where women are systematically exposed to bad behavior from men.

In the field of Library Technology, we spend a lot of our time and brain power intentionally designing user experiences and assessing how users interact with our designs. Because harassment of some of our own is pervasive and cultural, I suggest we turn the same attention and intentionality to designing a workplace culture that is responsive to the needs of all of us who work here. I look forward to reading conference presentations, journal articles, and online discussions where these problems are publicly identified and directly addressed rather than occurring in isolation or being ignored.

  1. infotoday.com/il2013/Monday.asp#TrackD
  2. I don’t advocate a macho confrontational response or take responsibility for the actions of others, but an ally has their friend’s back and that night I did not speak up.