Would you like an even more intimate glimpse into what users are actually doing on your site, instead of what you (or the library web committee) think they are doing? There are several easy-to-use web-based analytics services like ClickTale, userfly, Loop11, Crazy Egg, Inspectlet, or Optimalworkshop. These online usability services offer various ways to track what users are doing as they actually navigate your pages — all without setting up a usability lab, recruiting participants, or introducing the artificiality and anxiety of an observed user session. ClickTale and userfly record user actions that you can view later as a video; most services offer heatmaps of where users actually click on your site; some offer “eye tracking” maps based on mouse movement.
Most services allow you to sign up for one free account for a limited amount of data or time.
Most allow you to specify which pages or sections of your site you want to test at a time.
Many have monthly pricing plans that would allow for snapshots of user activity in various months of the year without having to pay for an entire year’s service.
We’re testing Inspectlet at the moment. I like it because the free account offers the two services I’m most interested in: periodic video captures of the designated site and heat maps of actual clicks. The code is a snippet added to the web pages of interest. The screen captures are fascinating — watch below as an off-campus user searches the library home page for the correct place to do an author search in the library catalog. I view it as a bit of a cautionary illustration about providing a lot of options. Follow the yellow “spotlight” to track the user’s mouse movements. As a contrast, I watched video after video of clearly experienced users taking less than two seconds to hit the “Ebsco Academic Search” link. Be prepared; watching a series of videos of unassisted users can dismantle your or your web committee’s cherished notions about how users navigate your site.
This is a Jing video of a screen capture — the actual screen captures are much sharper, and I have zoomed out for illustrative purposes. The free Inspectlet account does not support downloads of capture videos, but Rachit Gupta, the founder, wrote me that in the coming few weeks, Inspectlet is releasing a feature to allow downloads for paid accounts. Paid accounts also have access to real time analytics, so libraries would be able to get a montage of what’s happening in the lobby as it is happening. Imagine being able to walk out and announce a “pop-up library workshop” on using the library catalog effectively after seeing the twentieth person fumble through the OPAC.
Another thing I like about Inspectlet is the ability to anonymize the IP addresses in the individual screen captures to protect an individual patron’s privacy.
The chart below compares the features of a few of the most widely used web-based analytics tools.
If you are using one of these services, or a similar service, what have you learned about your users?
Testing new or alternative designs – widely used web-based usability tools
After you’ve watched your users and determined where there are problems or where you would like to try an alternative design, these services offer easy ways to test new designs and gather feedback from users without setting up a local usability lab.
Note: This is part one of a two-part series on workflow automation in Technical Services. Part one covers what workflow automation is, how to approach it, and an example of an item-level workflow automation process. Part two will discuss batch-level workflow automation and resources/tools for workflow automation.
The mysterious door at the library
A majority of you might have passed by this door many times in your library lives. Sometimes it isn’t even a door; maybe a room divider, or an invisible line that runs across the room. In any case, you may have ventured into the space called “Technical Services” (or a similar name), but do you know what goes on there? For most libraries, Technical Services staff acquire, create, and maintain access to library materials, spanning from books and a box of rocks to various electronic databases and digitized local collections. Without them, it would be hard for a library to serve its users: no physical items to borrow, no electronic journals to search for articles, and no metadata in the library discovery layer for users and staff to search for those resources. With the variety of items comes a variety of workflows to process those items, many of which are repeated at various intervals: some once a week, others multiple times a day. Staff time and resources are spoken for every time a workflow is repeated, and every time a workflow is repeated manually, less time and fewer resources can be spent on other projects or on new projects that would add value to existing collections or add new collections for library users. Technology provides a variety of strategies for workflow automation that reduce the time spent on repetitive workflows.
What is workflow automation?
The oversimplified answer to this question is that workflow automation means having the computer do the things it can be programmed to do, thereby reducing the repetitive manual actions a staff member has to perform.
There are two types of automation to consider when you look at your workflows:
Data Entry: This type of automation is fairly straightforward, and you’ve probably done it already without realizing it. For example, the automation script completes a form with data that remains the same for each form, or types out standard text in an email being sent to a vendor. It is useful for automating repetitive keystrokes, whether system codes, boilerplate text, or even the creation of new documents in certain applications, such as an item record. The automation script is hard-coded, meaning that the output of that script will be the same every time you run it.
Decision Making: This type of automation makes all the decisions for you! Okay, while it won’t make every decision for you, several automation languages and programs can handle fairly complex decision-making flowcharts using standard conditionals. For example, if bibliographic record “A” has field “B”, then do action “C”; else do action “D”. As you probably already guessed, this type of automation resembles coding to a certain extent. An automation script designed to deal with several possible outcomes is not hard-coded like the data entry script described above. A short sketch of both types follows.
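To make the distinction concrete, here is a minimal Python sketch. It is purely illustrative: the field names, codes, and vendor text are hypothetical and not taken from any particular ILS or vendor system.

```python
# Illustrative sketch only: field names, codes, and text are hypothetical.

def fill_item_record_defaults(item: dict) -> dict:
    """Data-entry automation: the same hard-coded values every time."""
    item.setdefault("item_type", "BOOK")
    item.setdefault("stat_code", "circ")
    item.setdefault("note", "Added by receipt cataloging macro")
    return item

def choose_location(order: dict) -> str:
    """Decision-making automation: the output depends on the record's data.
    If the order record has fund code 'refbooks', use location 'REF'; else..."""
    if order.get("fund_code") == "refbooks":
        return "REF"       # reference collection
    elif order.get("format") == "ebook":
        return "ONLINE"
    else:
        return "STACKS"

if __name__ == "__main__":
    order = {"fund_code": "refbooks", "format": "print"}
    item = fill_item_record_defaults({})
    item["location"] = choose_location(order)
    print(item)
```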
What can be automated?
Most Technical Services departments acquire, create, and maintain access to a variety of formats, from physical to electronic. Traditionally, workflows focus on the individual item going through the department and its various teams: acquisitions, cataloging, and processing, for example. With the changeover to electronic formats, workflows are moving toward a batch approach, processing and/or cataloging multiple items (for example, a collection of ebooks) at once.
In addition to adding materials to library collections, a library’s Technical Services staff do a fair amount of database maintenance for the library’s ILS (Integrated Library System). The term “dirty data” gets thrown around TS departments, covering database projects dealing with misspellings, outdated codes, or incorrect codes – anything that could inhibit a library user’s access to a resource.
Why should I automate my workflows?
Better quality control of workflow and data. Any time you let a human near a workflow, errors can be introduced: incorrect codes, mistyped text, or mishandled items. An automated workflow cuts down on fail points and allows for better overall consistency and accuracy.
Save staff time. You and your staff spend a good amount of time on repetitive keystrokes and decisions. Even small repetitive actions add up during the work day, consuming hours of valuable staff time and resources. By automating the repetitive actions, you free up staff time to work on more complex workflows that are not as easily automated.
How do you decide what workflows to automate?
Flowchart your workflow. A simple flowchart from the beginning of the workflow to the end might reveal several places where current manual decision-making can be relegated to a script. If a person is currently looking for a code in the order record to figure out what location code they should enter in the item record, the script could be set to do the same.
What are the patterns? In each step, what data remains constant throughout all items? What codes, phrases, or fields do you insert every time you go through the workflow? Is there a pattern of going from one application to another at the same point in every workflow? One record to another?
How will the script access the data? Working with a file of MARC records is different from working with a bibliographic record that is open in your ILS. Having a file of data is easier, but if you’re automating an item-level workflow, you will be dealing with application windows instead. Getting data from a window can be tricky; sometimes you can access the data directly, and other times you will have to scrape the screen to get at the data the script needs to work on. (A brief sketch of the file-based case follows this list.)
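As a minimal illustration of the file-based case, here is a Python sketch using the pymarc library; the filename and the fields being read are examples only. Pulling the same data out of an open ILS window usually means screen scraping instead, which is far more tool-specific.

```python
# A minimal sketch of working from a file of MARC records with pymarc.
# The filename and the fields being read are illustrative examples only.
from pymarc import MARCReader

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        title_field = record["245"]   # title statement
        call_number = record["050"]   # LC call number, if present
        if title_field is not None:
            print(title_field["a"], call_number)
```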
Example: Receipt Cataloging
At my former place of work, Technical Services had three levels of cataloging: receipt cataloging, copy cataloging, and original cataloging. All monographs would go through the receipt cataloging process, with items bumped up to the two higher levels of cataloging as needed. The majority of items that go through receipt cataloging, having met a list of 40+ criteria, are fast-tracked to physical processing, shortening the time between an item arriving at the library and being placed on the shelf — the overarching goal of receipt cataloging. The criteria range from determining whether the record is DLC (Library of Congress) to determining whether the 008, 050, and 260 ‡c dates match in the bibliographic record (if not a conference publication).
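To make a couple of those criteria concrete, here is a simplified Python sketch using pymarc. It is purely illustrative — the department’s actual checks were written in macro tools, as described below, and cover far more than these two conditions.

```python
# Simplified sketch of two receipt-cataloging checks; the real checklist has
# 40+ criteria and was implemented in macro software, not Python.
import re
from pymarc import Record

def is_dlc(record: Record) -> bool:
    """Was the record cataloged by the Library of Congress (DLC)?"""
    field_040 = record["040"]
    return field_040 is not None and field_040["a"] == "DLC"

def dates_match(record: Record) -> bool:
    """Do the 008 date and the 260 $c publication date agree?"""
    field_008 = record["008"]
    field_260 = record["260"]
    date_008 = field_008.data[7:11] if field_008 is not None else ""
    date_260 = ""
    if field_260 is not None and field_260["c"]:
        found = re.search(r"\d{4}", field_260["c"])
        date_260 = found.group(0) if found else ""
    return bool(date_008) and date_008 == date_260

def fast_track(record: Record) -> bool:
    """Send straight to physical processing only if every check passes."""
    return is_dlc(record) and dates_match(record)
```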
Given that the criteria and the decision making flowchart are fairly standard and straightforward, this workflow was built with automation in mind. My predecessor used Macro Express (ME) for the first version of the receipt cataloging macros. When we got to the point where we were bumping up against ME’s limits, I migrated the macros to AutoIt, where I was able to include many more quality control checks on the bibliographic and item records.
Below is a screencast where I walk through the receipt cataloging process. If I hadn’t been explaining what was happening, the whole process would have taken a minute and 10 seconds to complete, a couple of seconds more if the item was bumped to another team in the department. Compared to a five-minute turnaround time when staff manually check every criterion, the macros allow the department to go through more items during the day with better quality control.
Bonus Example: Ordering from GOBI
Another workflow at my former place of work involved ordering monographs from GOBI. The workflow, unlike receipt cataloging, has a much more complex decision-making flowchart and more exceptions. While I could not automate it to the level of receipt cataloging, there were still patterns and routines that I could automate, such as searching the library catalog with information supplied by GOBI and determining which codes to enter in the 949 field of the OCLC record (for exporting into our database).
Below is a screencast that shows a part of the notification ordering automation script set.
Preview for Part 2
In this post, I covered item-level workflow automation possibilities. More Technical Services workflows, however, are shifting toward dealing with many items at once. In part 2, I will discuss some examples of batch process automation and several tools (including those mentioned in this post) that can assist in making life easier in Technical Services.
Browsing Experience in the Virtual vs. the Physical Space
However entangled our lives are in virtual spaces, it is in the physical space that we exist. For this reason, human attention is most easily directed at where visual and other sensory stimuli are. The resulting sensory feedback from interacting with the source of these stimuli further enriches the experience we have in the physical space. Libraries can take advantage of this fact in order to bring users’ fleeting attention to their often-invisible online collections. So far, our experience on the Internet, where we spend so much time, is still mostly limited to one or two sensory stimuli and provides little or no sensory feedback. A library’s online resources, often touted for their 24/7 accessibility anywhere, are no exception to this limitation.
Think about new library books, for example. The print ones are usually prominently displayed in a library lobby area, attracting library visitors to walk up and browse them in the physical space. By thumbing through a new book and moving back and forth from the table of contents to different chapters, we can quickly get a sense of what kind of a book it is and decide whether we want to read it further or not. The tactile, olfactory, visual, and auditory sensory input that we get from thumbing through a newly printed book with fresh ink contributes to making this experience enjoyable and memorable at the same time.
Now compare this experience with reading a library Web page with the list of new online library books on a computer screen. Each book is reduced to a string of words and a hyperlink. It is hard to provide any engaging experience with a string of words and a hyperlink.
The Invisibility Problem of Library e-Books
Like many libraries, the Florida International University (FIU) Library started a lending program that circulates e-book readers. Each reader comes with more than one hundred titles selected by subject librarians. But how can a library get these e-books on e-book readers noticed by library users? How can a library help a user quickly figure out what books are available on, say, a library Kindle device when those are specifically what the user is looking for?
Well, if a user runs a keyword search in the library’s online catalog with, say, ‘Kindle,’ s/he will find more than sufficient information, since the library has already neatly cataloged all titles available on the Kindle device there. But many users may fail to try this or may even be unaware of the new e-book reader lending program in the first place. The e-book reader lending program offers a great service to library users. However, the library e-books offered on the e-book readers can be largely invisible to users, who tend to think that what they can see in a library is all a library has.
Giving Physical Presence to Library e-Books on e-Book Readers
The problem can be solved by giving some physical presence to e-books on the library’s e-book readers using a dummy bookmark on the stacks. This is particularly effective as it quickly captures users’ attention while they are already browsing the library stacks looking for something to read.
Users are familiar with dummy books on physical shelves, which mark a print title that is often looked for under a different name or whose location has recently changed. Applied to Kindle e-books, a dummy bookmark is just as effective. A user can walk around the space where the stacks are located and physically identify the e-books that the library makes available on an e-book reader in each subject section. As a visible cue, a dummy bookmark creates a direct sensory association between an e-book and something physical (that provides visible and tactile feedback) in a user’s mind, thereby effectively expanding a user’s idea of what is available at a library.
When you pull out the bookmark, it looks like this. The bookmark includes the book’s cover image, title, author, and call number, which help a user to locate the title record in the library’s online catalog. But in reality, users are more likely to just walk down to the Course Reserves area to check out an e-book reader after reading this sign.
I tweeted this photo a while ago when I accidentally found out the idea had been implemented while looking for a book in the stacks. (See the disclaimer below.) I was quite surprised by the many positive comments I received on Twitter. Many librarians also suggested adding a QR code to the dummy bookmark next to the call number. The addition of the QR code would be an excellent bonus on the bookmark. It would allow users to check the availability of the title on their mobile devices, so that they can avoid the situation in which the e-book and the e-book reader device have already been checked out.
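For anyone who wants to try the QR code suggestion, generating one is trivial. Here is a minimal Python sketch using the third-party qrcode library; the catalog URL is a made-up placeholder, not a real record link.

```python
# Minimal sketch: generate a QR code image pointing at a (hypothetical)
# catalog record URL, suitable for printing on a dummy bookmark.
import qrcode  # pip install qrcode[pil]

catalog_url = "https://catalog.example.edu/record=b1234567"  # placeholder URL
img = qrcode.make(catalog_url)
img.save("kindle_bookmark_qr.png")
```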
If you are running a pricy e-book reader lending program at your library, a dummy bookmark might be an inexpensive but highly effective way to make those e-books stand out to users on the library stacks. What other things do you do at your library to make your online resources and e-books more visible to users?
Disclaimer: I suggested this idea at the E-resources group meeting where all FIU libraries (including the Medical Library, where I work) are represented. But the implementation was done solely by the FIU main Library for their Kindle e-book collection on their stacks. For those who are curious, I was unable to find the exact number of dummy bookmarks on the stacks.
If the New York Times article “The Internet Gets Physical” is any indication, a sea change is approaching in just how smart everyday appliances are going to become. In theory, smart infrastructure will connect you and any appliance with an IP address to everything else.
For example: your car will talk to your phone. Appliances like your computer, chair, and desk will interact over the web. Data will be passed via standard web technologies from every Internet-capable appliance. Everyday consumer electronics will be de facto networked to the Internet. The overall effect of these smart objects is to open up the possibility of new library services and research environments.
“with the advent of the New Internet Protocol, version six, those objects can now have an IP address, enabling their information store to be accessed in the same way a webcam might be, allowing real-time access to that information from anywhere…the implications are not yet clear, but it is evident that hundreds of billions of devices — from delicate lab equipment to refrigerators to next-generation home security systems — will soon be designed to take advantage of such connections…” (p.8)
What are the implications of the physical Internet in library settings?
Your smart phone interacts with the library building
The ways in which mobile apps can interact with the library building are not yet fully realized; for example, should your phone and the building be able to tell you things such as the interrelations among your physical presence, the searches you’ve done on your home or office computer, the places you’ve driven past in your commute, or where you spend your leisure hours? Who makes the choices of suggesting resources to you based on the information in all of your life-sensors? Surely libraries will need filtering algorithms to control for allowable data referencing, but where and how will we implement such recommender services?
Smart digital shelving units
What if a future digital shelf arrangement could be responsive to your personal preferences? For example – the library building’s digital smart infrastructure could respond to your circulation history or Internet searches in a way that shelves could promote content to you in real time. What would this recommendation look like for individual research, study, or browsing? And how would libraries be able to leverage such a service?
Digital library integration with physical objects
Smart objects allow libraries to consider how to make the virtual presence (databases, e-books, ILS data) physical. Many libraries would welcome a more physical instantiation of vended software products, since to a certain extent, users believe the library’s collection consists of only the things that they can see in the library.
The 2012 NMC Horizon Report indicates that smart objects are on the far-term horizon. So it may be four to five years before they affect higher education — what is your plan for smart objects in the library environment?
Librarians, as a rule, don’t tolerate anarchy well. They like things to be organized and to follow processes. But when it comes to emerging technologies, too much reliance on planning and committees can stifle creativity and delay adoption. The open source software community can offer librarians models for how to make progress on big projects with minimal oversight.
“Lazy consensus” is one such model from which librarians can learn a lot. At the Code4Lib conference in February 2012, Bethany Nowviskie of the University of Virginia Scholar’s Lab encouraged library development teams to embrace this concept in order to create more innovative libraries. (I encourage you to watch a video or read the text of her keynote.) This goes for all sizes and types of academic libraries, whether they have a development staff or just staff with enthusiasm for learning about emerging technologies.
Lazy Consensus means that when you are convinced that you know what the community would like to see happen you can simply assume that you already have consensus and get on with the work. You don’t have to insist people discuss and/or approve your plan, and you certainly don’t need to call a vote to get approval. You just assume you have the community’s support unless someone says otherwise.
(quote from http://incubator.apache.org/odftoolkit/docs/governance/lazyConsensus.html)
Nowviskie suggests lazy consensus as a way to cope with an institutional culture where “no” is too often the default answer, since in lazy consensus the default answer is “yes.” If someone doesn’t agree with a proposal, he or she must present and defend an alternative within a reasonable amount of time (usually 72 hours). This ensures that the people who really care about a project have a chance to speak up and make sure the project is going in the right direction. By changing the default answer to YES, we make it easier to move forward on the things we really care about.
When you care about delivering the best possible experience and set of services for your library patrons, you should advocate for ways to make that happen and spend your time thinking about how to make that happen. Nowviskie points out the kinds of environments in which this is likely to thrive. Developers and technologists need time for research and development, “20% time” projects, and freedom to explore new possibilities. Even at small libraries without any development staff, librarians need time to research and understand issues of technology in libraries to make better decisions about the adoption of emerging technologies.
Implementing lazy consensus
Implementing lazy consensus in your library must be done with care. First and foremost, you must be aware of the culture you are in and be respectful of it even as you see room for change and improvement. Your first day at a new job is not the moment to implement this process across the board, but in your own work or your department’s work you can set an example and a precedent. Nowviskie provides a few guidelines for healthy lazy consensus. Emphasize working hard and with integrity while being open and friendly. Keep everyone informed about what you are working on, and keep your mission in mind as the centerpiece of your work. In libraries, this means you must keep public services involved in any project from the earliest possible stages, and always maintain a commitment to the best possible user experience. When you or your team reliably deliver good results, you will show the value of the process.
While default negativity can certainly stifle creativity, default positivity for all ideas can be equally stifling. Jonah Lehrer wrote in a recent New Yorker article that the evidence shows traditional brainstorming, where all ideas are presented to a group without criticism, doesn’t work. Creating better ideas requires critiquing wrong assumptions, which in turn helps us examine our own assumptions. In adopting lazy consensus, make sure there is authentic room for debate. Responding to a disagreement about a course of action with reasoned critique and alternate paths is more likely to result in creative ideas, and brings the discussion forward rather than ending it with a “no.”
Librarians know a lot about information and people. The open source software community knows a lot about how to run flexible and transparent organizations. Combining the two can create wonderful experiences for our users.
At the NCSU Libraries, my colleagues and I in the Research and Information Services department do a fair bit of instruction, especially to classes from the university’s First Year Writing Program. Some new initiatives and outreach have significantly increased our instruction load, to the point where it was getting more difficult for us to effectively cover all the sessions that were requested due to practical limits of our schedules. By way of a solution, we wanted to train some of our grad assistants, who (at the time of this writing) are all library/information science students from that school down the road, in the dark arts of basic library instruction, to help spread the instruction burden out a little.
This would work great, but there’s a secondary problem: since UNC is a good 40-minute drive away, our grad assistants tend to have very rigid schedules, which are fixed well in advance — so we can’t just alter our grad assistants’ schedules on short notice to have them cover a class. Meanwhile, instruction scheduling is very haphazard, due to wide variation in how course slots are configured in the weekly calendar, so it can be hard to predict when instruction requests are likely to be scheduled. What we need is a technique to maximize the likelihood that a grad student’s standing schedule will overlap with the timing of the instruction requests we do get — before the requests come in.
Searching for a Solution – Bar graph-based analysis
The obvious solution was to try to figure out when during the day and week we provided library instruction most frequently. If we could figure this out, we could work with our grad students to get their schedules to coincide with these busy periods.
Luckily, we had some accrued data on our instructional activity from previous semesters. This seemed like the obvious starting point: look at when we taught previously and see what days and times of day were most popular. The data consisted of about 80 instruction sessions given over the course of the prior two semesters; data included date, day of week, session start time, and a few other tidbits. The data was basically scraped by hand from the instruction records we maintain for annual reports; my colleague Anne Burke did the dirty work of collecting and cleaning the data, as well as the initial analysis.
Anne’s first pass at analyzing the data was to look at each day of the week in terms of courses taught in the morning, afternoon, and evening. A bit of hand-counting and spreadsheet magic produced this:
This chart was somewhat helpful — certainly it’s clear that Monday, Tuesday and Thursday are our busiest days — but it doesn’t provide a lot of clarity regarding times of day that are hot for instruction. Other than noting that Friday evening is a dead time (hardly a huge insight), we don’t really get a lot of new information on how the instruction sessions shake out throughout the week.
Let’s Get Visual – Heatmap-based visualization
The chart above gets the fundamentals right — since we’re designing weekly schedules for our grad assistants, it’s clear that the relevant dimensions are days of week and times of day. However, there are basically two problems with the stacked bar chart approach: (1) The resolution of the stacked bars — morning, afternoon and evening — is too coarse. We need to get more granular if we’re really going to see the times that are popular for instruction; (2) The stacked bar chart slices just don’t fit our mental model of a week. If we’re going to solve a calendaring problem, doesn’t it make a lot of sense to create a visualization that looks like a calendar?
What we need is a matrix — something where one dimension is the day of the week and the other dimension is the hour of the day (with proportional spacing) — just like a weekly planner. Then for any given hour, we need something to represent how “popular” that time slot is for instruction. It’d be great if we had some way for closely clustered but non-overlapping sessions to contribute “weight” to each other, since it’s not guaranteed that instruction session timing will coincide precisely.
When I thought about analyzing the data in these terms, the concept of a heatmap immediately came to mind. A heatmap is a tool commonly used to look for areas of density in spatial data. It’s often used for mapping click or eye-tracking data on websites, to develop an understanding of the areas of interest on the website. A heatmap’s density modeling works like this: each data point is mapped in two dimensions and displayed graphically as a circular “blob” with a small halo effect; in closely-packed data, the blobs overlap. Areas of overlap are drawn with more intense color, and the intensity effect is cumulative, so the regions with the most intense color correspond to the areas of highest density of points.
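To make the density modeling concrete, here is a small Python/NumPy sketch of the same idea (not the ClickHeat code itself): each point contributes a Gaussian “blob” to a grid, and overlapping blobs add up, so the brightest cells correspond to the densest clusters of points.

```python
# Toy illustration of heatmap density modeling (not ClickHeat's actual code):
# every data point adds a small Gaussian "blob" to a grid, and the blobs sum.
import numpy as np

def heatmap_grid(points, width=200, height=50, radius=5.0):
    yy, xx = np.mgrid[0:height, 0:width]
    grid = np.zeros((height, width))
    for x, y in points:
        grid += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * radius ** 2))
    return grid  # higher values = denser clusters of points

# Two nearby points reinforce each other; an isolated point stays dim.
demo = heatmap_grid([(50, 25), (53, 25), (150, 25)])
print(demo.max(), demo[25, 150])
```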
Part of my motivation for using heatmaps to solve our scheduling problem was simply to use the tools I had at hand: it seemed that it would be a simple matter to convert the instruction data into a form that would be amenable to modeling with the heatmap software I had access to. But in a lot of ways, a heatmap was a perfect tool: with a proper arrangement of the data, the heatmap’s ability to model intensity would highlight the parts of each day where the most instruction occurred, without having to worry too much about the precise timing of instruction sessions.
The heatmap generation tool that I had was a slightly modified version of the Heatmap PHP class from LabsMedia’s ClickHeat, an open-source tool for website click tracking. My modified version of the heatmap package takes in an array of (x,y) ordered pairs, corresponding to the locations of the data points to be mapped, and outputs a PNG file of the generated heatmap.
So here was the plan: I would convert each instruction session in the data to a set of (x,y) coordinates, with one coordinate representing day of week and the other representing time of day. Feeding these coordinates into the heatmap software would, I hoped, create five colorful swatches, one for each day of the week. The brightest regions in the swatches would represent the busiest times of the corresponding days.
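Here is a rough Python sketch of that conversion. The spacing constants are arbitrary stand-ins, and a real run would feed the resulting points to the modified ClickHeat Heatmap class rather than anything shown here; the actual coordinate choices I used are described below.

```python
# Rough sketch of converting instruction sessions into (x, y) points for a
# heatmap. Spacing constants are arbitrary; the points would be passed to the
# (modified) ClickHeat Heatmap class, which is not reproduced here.
DAY_ROW = {"Mon": 10, "Tue": 30, "Wed": 50, "Thu": 70, "Fri": 90}  # y-coords,
# spaced far enough apart that one day's halo doesn't bleed into the next

DAY_START, DAY_END = 8.0, 19.5       # roughly 8am to 7:30pm
PLOT_WIDTH = 200                     # pixels across one day's row

def session_to_point(day, start_hour):
    """Map a session (e.g. 'Tue', 9.25 for 9:15am) to an (x, y) pixel pair."""
    x = (start_hour - DAY_START) / (DAY_END - DAY_START) * PLOT_WIDTH
    return (x, DAY_ROW[day])

points = [session_to_point(d, h) for d, h in
          [("Mon", 9.0), ("Tue", 8.25), ("Thu", 16.5)]]
print(points)
```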
Arbitrarily, I selected the y-coordinate to represent the day of the week. So I decided that any Monday slot, for instance, would be represented by some small (but nonzero) y-coordinate, with Tuesday represented by some greater y-coordinate, etc., with the intervals between consecutive days of the week equal. The main concern in assigning these y-coordinates was for the generated swatches to be far enough apart so that the heatmap “halo” around one day of the week would not interfere with its neighbors — we’re treating the days of the week independently. Then it was a simple matter of mapping time of day to the x-coordinate in a proportional manner. The graphic below shows the output from this process.
In this graphic, days of the week are represented by the horizontal rows of blobs, with Monday as the first row and Friday as the last. The leftmost extent of each row corresponds to approximately 8am, while the rightmost extent is about 7:30pm. The key in the upper left indicates (more or less) the number of overlapping data points in a given location. A bit of labeling helps to clarify things:
Right away, we get a good sense of the shape of the instruction week. This presentation reinforces the findings of the earlier chart: that Monday, Tuesday, and Thursday are busiest, and that Friday afternoon is basically dead. But we do see a few other interesting tidbits, which are visible to us specifically through the use of the heatmap:
Monday, Tuesday and Thursday aren’t just busy, they’re consistently well-trafficked throughout the day.
Friday is really quite slow throughout.
There are a few interesting hotspots scattered here and there, notably first thing in the morning on Tuesday.
Wednesday is quite sparse overall, except for two or three prominent afternoon/evening times.
There is a block of late afternoon-early evening time-slots that are consistently busy in the first half of the week.
Using this information, we can take a much more informed approach to scheduling our graduate students, and hopefully be able to maximize their availability for instruction sessions.
“Better than it was before. Better. Stronger. Faster.” – Open questions and areas for improvement
As a proof of concept, this approach to analyzing our instruction data for the purposes of setting student schedules seems quite promising. We used our findings to inform our scheduling of graduate students this semester, but it’s hard to know whether our findings can even be validated: since this is the first semester where we’re actively assigning instruction to our graduate students, there’s no data available to compare this semester against with respect to the amount of grad student instruction performed. Nevertheless, it seems clear that knowledge of popular instruction times is a good guideline for grad student scheduling for this purpose.
There’s also plenty of work to be done as far as data collection and analysis is concerned. In particular:
Data curation by hand is burdensome and inefficient. If we can automate the data collection process at all, we’ll be in a much better position to repeat this type of analysis in future semesters.
The current data analysis completely ignores class session length, which is an important factor for scheduling (class times vary between 50 and 100 minutes). This data is recorded in our instruction spreadsheet, but there aren’t any set guidelines on how it’s entered — librarians entering their instruction data tend to round to the nearest quarter- or half-hour increment at their own preference, so a 50-minute class is sometimes listed as “.75 hours” and other times as “1 hour”. More accurate and consistent session time recording would allow us to reliably use session length in our analysis.
To make the best use of session length in the analysis, I’ll have to learn a little bit more about PHP’s image generation libraries. The current approach is basically a plug-in adaptation of ClickHeat’s existing Heatmap class, which is only designed to handle “point” data. To modify the code to treat sessions as little line segments corresponding to their duration (rather than points that correspond to their start times) would require using image processing methods that are currently beyond my ken. (A cruder workaround is sketched after this list.)
A bit better knowledge of the image libraries would also allow me to add automatic labeling to the output file. You’ll notice the prominent use of “ish” to describe the hours dimension of the labeled heatmap above: this is because I had neither the inclination nor the patience to count pixels to determine where exactly the labels should go. With better knowledge of the image libraries I would be able to add graphical text labels directly to the generated heatmap, at precisely the correct location.
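One low-tech workaround for the session-length problem, which avoids any new image-library code, would be to emit several points spaced along each session’s duration instead of a single point at its start, so longer classes naturally contribute more weight. A hedged Python sketch of the idea:

```python
# Workaround sketch: approximate each session as several points spaced along
# its duration instead of one point at its start time, so a 100-minute class
# contributes more heat than a 50-minute one.
def expand_session(day, start_hour, length_minutes, step_minutes=15):
    """Return a list of (day, hour) pseudo-points covering the session."""
    points = []
    minutes = 0
    while minutes <= length_minutes:
        points.append((day, start_hour + minutes / 60.0))
        minutes += step_minutes
    return points

# A 100-minute Tuesday class starting at 1:00pm becomes seven spaced points,
# each of which would then be converted to pixels and fed to the heatmap.
print(expand_session("Tue", 13.0, 100))
```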
There are other fundamental questions that may be worth answering — or at least experimenting against — as well. For instance, in this analysis I used data about actual instruction sessions performed. But when lecturers request library sessions, they include two or three “preferred” dates, of which we pick the one that fits our librarian and room schedules best. For the purposes of analysis, it’s not entirely clear whether we should use the actual instruction data, which takes into account real space limitations but is also skewed by librarian availability; or whether we should look strictly at what lecturers are requesting, which might allow us to schedule our grad students in a way that could accommodate lecturers’ first choices better, but which might run us up against the library’s space limitations. In previous semesters, we didn’t store the data on the requests we received; this semester we’re doing that, so I’ll likely perform two analyses, one based on our actual instruction and one based on requests. Some insight might be gained by comparing the results of the two analyses, but it’s unclear what exactly the outcome will be.
Finally, it’s hard to predict how long-term trends in the data will affect our ability to plan for future semesters. It’s unclear whether prior semesters are a good indicator of future semesters, especially as lecturers move into and out of the First Year Writing Program, the source of the vast majority of our requests. We’ll get a better sense of this, presumably, as we perform more frequent analyses — it would also make sense to examine each semester separately to look for trends in instruction scheduling from semester to semester.
In any case, there’s plenty of experimenting left to do and plenty of improvements that we could make.
Reflections and Lessons Learned
There are a few big points that I took away from this experience. A big one is simply that sometimes the right approach is a totally unexpected one. You can gain some interesting insights if you don’t limit yourself to the tools that are most familiar for a particular problem. Don’t be afraid to throw data at the wall and see what sticks.
Really, what we did in this case is not so different from creating separate histograms of instruction times for each day of the week, and comparing the histograms to each other. But using heatmaps gave us a couple of advantages over traditional histograms: first, our bin size is essentially infinitely narrow; because of the proximity effects of the heatmap calculation, nearby but non-overlapping data points still contribute weight to each other without us having to define bins as in a regular histogram. Second, histograms are typically drawn in two dimensions, which would make comparing them against each other rather a nuisance. In this case, our separate heatmap graphics for each day of the week are basically one-dimensional, which allows us to compare them side by side with little fuss. This technique could be used for side-by-side examinations of multiple sets of any histogram-like data for quick and intuitive at-a-glance comparison.
In particular, it’s important to remember — especially if your familiarity with heatmaps is already firmly entrenched in a spatial mapping context — that data doesn’t have to be spatial in order to be analyzed with heatmaps. This is really just an extension of the idea of graphical data analysis: A heatmap is just another way to look at arbitrary data represented graphically, not so different from a bar graph, pie chart, or scatter plot. Anything that you can express in two dimensions (or even just one), and where questions of frequency, density, proximity, etc., are relevant, can be analyzed using the heatmap approach.
A final point: as an analysis tool, the heatmap is really about getting a feel for how the data lies in aggregate, rather than getting a precise sense of where each point falls. Since the halo effect of a data point extends some distance away from the point, the limits of influence of that point on the final image are a bit fuzzy. If precision analysis is necessary, then heatmaps are not the right tool.
About our guest author: Andreas Orphanides is Librarian for Digital Technologies and Learning in the Research and Information Services department at NCSU Libraries. He holds an MSLS from UNC-Chapel Hill and a BA in mathematics from Oberlin College. His interests include instructional design, user interface development, devising technological solutions to problems in library instruction and public services, long walks on the beach, and kittens.
If you say “analytics” to most technology-savvy librarians, they think of Google Analytics or similar web analytics services. Many libraries are using such sophisticated data collection and analyses to improve the user experience on library-controlled sites. But the standard library analytics are retrospective: what have users done in the past? Have we designed our web platforms and pages successfully, and where do we need to change them?
Technology is enabling a different kind of future-oriented analytics. Action Analytics is evidence-based, combines data sets from different silos, and uses actions, performance, and data from the past to provide recommendations and actionable intelligence meant to influence future actions at both the institutional and the individual level. We’re familiar with these services in library-like contexts such as Amazon’s “customers who bought this item also bought” book recommendations and Netflix’s “other movies you might enjoy”.
Action Analytics in the Academic Library Landscape
It was a presentation by Mark David Milliron at Educause 2011 on “Analytics Today: Getting Smarter About Emerging Technology, Diverse Students, and the Completion Challenge” that made me think about the possibilities of the interventionist aspect of analytics for libraries. He described the complex dependencies between inter-generational poverty transmission, education as a disrupter, drop-out rates for first-generation college students, and other factors such as international competition and the job market. Then he moved on to the role of sophisticated analytics and data platforms and spoke about how they can help individual students succeed by using technology to deliver the right resource at the right time to the right student. Where do these sorts of analytics fit into the academic library landscape?
If your library is like my library, the pressure to prove your value to strategic campus initiatives such as student success and retention is increasing. But assessing services with most analytics is past-oriented; how do we add the kind of library analytics that provide a useful intervention or recommendation? These analytics could be designed to help an individual student choose a database, or trigger a recommendation to dive deeper into reference services like chat reference or individual appointments. We need to design platforms and technology that can integrate data from various campus sources, do some predictive modeling, and deliver a timely text message to an English 101 student that recommends using these databases for the first writing assignment, or suggests an individual research appointment with the appropriate subject specialist (and a link to the appointment scheduler) to every honors student a month into their thesis year.
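Purely as a thought experiment — every field name, course code, threshold, and message below is invented, and no real campus or library data feed is assumed — a crude rule-based version of such a nudge might look like this:

```python
# Thought-experiment sketch only: the data fields, course codes, and messages
# are invented; no real campus or library data integration exists here.
from typing import Optional

def library_nudge(student: dict) -> Optional[str]:
    """Return a suggested outreach message for a student, or None."""
    if "ENG 101" in student.get("courses", []) and not student.get("used_library_databases"):
        return ("Working on your first ENG 101 writing assignment? "
                "These databases are a good place to start: ...")
    if student.get("months_into_honors_thesis") == 1:
        return ("A month into your honors thesis? Book a research appointment "
                "with your subject librarian: ...")
    return None

print(library_nudge({"courses": ["ENG 101"], "used_library_databases": False}))
```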
But should we? Are these sorts of interventions creepy and stalker-ish?* Would this be seen as an invasion of privacy? Does the use of data in this way collide with the profession’s ethical obligation and historical commitment to keep individual patrons’ reading, browsing, or viewing habits private?
Every librarian I’ve discussed this with felt the same unease. I’m left with a series of questions: Have technology and online data gathering changed the context and meaning of privacy in such fundamental ways that we need to take a long hard look at our assumptions, especially in the academic environment? (Short answer — yes.) Are there ways to manage opt-in and opt-out preferences for these sorts of services so these services are only offered to those who want them? And does that miss the point? Aren’t we trying to influence the students who are unaware of library services and how the library could help them succeed?
Furthermore, are we modeling our ideas of “creepiness” and our adamant rejection of any “intervention” on the face-to-face model of the past that involved a feeling of personal surveillance and possible social judgment by live flesh persons? The phone app Mobilyze helps those with clinical depression avoid known triggers by suggesting preventative measures. The software is highly personalized and combines all kinds of data collected by the phone with self-reported mood diaries. Researcher Colin Depp observes that participants felt that the impersonal advice delivered via technology was easier to act on than “say, getting advice from their mother.”**
While I am not suggesting in any way that libraries move away from face-to-face, personalized encounters at public service desks, is there room for another model for delivering assistance? A model that some students might find less intrusive, less invasive, and more effective — precisely because it is technological and impersonal? And given the struggle that some students have to succeed in school, and the staggering debt that most of them incur, where exactly are our moral imperatives in delivering academic services in an increasingly personalized, technology-infused, data-dependent environment?
Increasingly, health services, commercial entities, and technologies such as browsers and social networking environments that are deeply embedded in most people’s lives, use these sorts of action analytics to allow the remote monitoring of our aging parents, sell us things, and match us with potential dates. Some of these uses are for the benefit of the user; some are for the benefit of the data gatherer. The moment from the Milliron presentation that really stayed with me was the poignant question that a student in a focus group asked him: “Can you use information about me…to help me?”