Learning Web Analytics from the LITA 2012 National Forum Pre-conference

Note: The 2012 LITA Forum pre-conference on Web Analytics was taught by Tabatha (Tabby) Farney and Nina McHale.  Our guest authors, Joel Richard and Kelly Sattler, were two of the people who attended the pre-conference, and they wrote this summary to share with ACRL TechConnect readers.

In advance of the conference, Tabby and Nina surveyed the participants about what we were interested in learning and solicited questions to be answered in the class.  Twenty-one participants responded, and seventeen of them were already using Google Analytics (GA).  About half of those using GA check their reports 1-2 times per month and the rest less often.  The conference opened with introductions and brief descriptions of what each of us was doing with analytics on our websites and what we hoped to learn.

Web Analytics Strategy

The overall theme of the pre-conference was the following:

A web analytics strategy is the structured process of identifying and evaluating your key performance indicators on the basis of an organization’s objectives and website goals – the desired outcomes, or what you want people to do on the website.

We learned that beyond the tool we use to measure our analytics, we need to identify what we want our website to do.  We do this by using pre-existing documentation our institutions have on their mission and purpose, as well as the mission and purpose of the website and whom it is to serve. Additionally, we need a privacy statement so our patrons understand that we will be tracking their movements on the site and what we will be collecting. We learned that there are challenges when using only IP addresses (versus cookies) for tracking purposes.  For example, does our institution’s network architecture allow us to distinguish patrons from staff by IP address, or are cookies a necessity?

Tool Options for Website Statistics

To start things off, we discussed the types of web analytics tools that are available and which ones we were using. Many of the participants were already using Google Analytics (GA), and thus most of the activities were demonstrated in GA, as we could log into our own accounts.  We were reminded that though it is free, GA keeps our data and does not allow us to delete it.  GA has us place a bit of JavaScript code on the pages we want tracked. It is easier to set up GA within a content management system, but it may not work as well for mobile devices.  Piwik is an open-source alternative to Google Analytics that uses a similar JavaScript tagging method.  Additionally, we were reminded that if we use any JavaScript tagging method, we should review our code snippets at least every two years, as they do change.

We learned about other, less common systems for tracking user activity. AWStats is installed locally, reads the website log files, and processes them into reports.  It offers the user more control and may be more useful for sites not in a content management system.  It sometimes provides more information than desired, and it cannot clearly differentiate between users based on IP address alone.  Other similar tools are Webalizer, FireStats, and Webtrends.
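
To make the log-file approach concrete, here is a minimal sketch of the general idea of turning server log lines into page view counts. It is not how AWStats itself is implemented; it assumes the server writes the Common Log Format, and the sample lines, IP addresses, and paths are invented for illustration.

```python
import re
from collections import Counter

# Assumes Common Log Format: ip ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_views(lines):
    """Count successful GET requests per path, skipping unparseable lines."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            views[match.group("path")] += 1
    return views


# Invented sample lines using documentation-only IP addresses.
sample = [
    '192.0.2.10 - - [04/Oct/2012:09:00:00 -0400] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.10 - - [04/Oct/2012:09:00:05 -0400] "GET /hours.html HTTP/1.1" 200 2048',
    '192.0.2.44 - - [04/Oct/2012:09:02:00 -0400] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.44 - - [04/Oct/2012:09:02:10 -0400] "GET /missing.html HTTP/1.1" 404 512',
]
page_views = count_page_views(sample)
```

Note that the only visitor identifier in these lines is the IP address, which is why log-based tools struggle to tell apart several people who share one address.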

A third option is to use Web Beacons, which are small, transparent GIFs embedded on every page.  These are useful when JavaScript won’t work, but they probably aren’t as applicable today as they once were.

Finally, we took a brief look at the heat mapping tool, Crazy Egg.  It focuses on visual analytics and uses Javascript tagging to provide heat maps of exactly where visitors clicked on our site offering insights as to what areas of a page receive the most attention.  Crazy Egg has a 30 day free trial and then it costs per page tracked, but there are subscriptions for under $100/month if you find the information worth the cost.  The images can really give webmasters an understanding of what the users are doing on their site and are persuasive tools when redesigning a page or analyzing specific kinds of user behavior.

Core Concepts and Metrics of Web Analytics

Next, Tabby and Nina presented a basic list of terminology used within web analytics.  Of course, different tools refer to the same concept by different names, but these were the terms we used throughout our session.

  • Visits – A visit is when someone comes to the site. A visit ends when a user has not seen a new page in 30 minutes (or when they have left the site.)
  • Visitor Types: New & Returning – A cookie is used to determine whether a visitor has been to the site in the past. If a user disables cookies or clears them regularly, they will show up as a new user each time they visit.
  • Unique Visitors – To distinguish visits by the same person, the cookie is used to track when the same person returns to the site in a given period of time (hours, days, weeks or more).
  • Page Views – More specific than “hits,” a page view is recorded when a page is loaded in a visitor’s browser.
  • User Technology – This includes information about the visitor’s operating system, browser version, mobile device or desktop computer, etc.
  • Geographic Data – A visitor’s location in the world can often be determined down to the city they are in.
  • Entry and Exit Pages – These refer to the page the visitor sees first during their visit (Entry) and the last page they see before leaving or their session expires (Exit).
  • Referral Sources – Did the visitor come from another site? If so, this will tell who is sending traffic to us.
  • Bounce Rate – A bounce is when someone comes to the site and views only one page before leaving.
  • Engagement Metrics – These indicate how engaged visitors are with our site, measured by the time they spent on the site or the number of pages they viewed.
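
As an illustration of how a few of these terms fit together, here is a small sketch that splits one visitor's page views into visits using the 30-minute rule and then computes a bounce rate. It is not how Google Analytics or any other tool is implemented; the function names and sample data are our own, and we assume a gap of exactly 30 minutes still belongs to the same visit.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_visits(page_views):
    """Group one visitor's (timestamp, url) page views into visits.

    A new visit starts when more than 30 minutes pass between page views.
    """
    visits = []
    for timestamp, url in sorted(page_views):
        last_seen = visits[-1][-1][0] if visits else None
        if last_seen is not None and timestamp - last_seen <= SESSION_TIMEOUT:
            visits[-1].append((timestamp, url))
        else:
            visits.append([(timestamp, url)])
    return visits


def bounce_rate(visits):
    """Share of visits that contain exactly one page view."""
    if not visits:
        return 0.0
    bounces = sum(1 for visit in visits if len(visit) == 1)
    return bounces / len(visits)


# Invented page views for a single visitor.
views = [
    (datetime(2012, 10, 4, 9, 0), "/"),
    (datetime(2012, 10, 4, 9, 5), "/hours"),
    (datetime(2012, 10, 4, 13, 0), "/databases"),
]
visits = split_into_visits(views)
entry_page = visits[0][0][1]   # first page of the first visit
exit_page = visits[0][-1][1]   # last page of the first visit
```

Here the morning pages form one visit and the afternoon page forms a second, single-page visit, so half of this visitor's visits count as bounces.
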
Goals/Conversion

Considering how often the terms “goals” and “conversions” are used, we learned that it is important to realize that in web analytics lingo, a goal is a metric, also referred to as a conversion, and measures whether a desired action has occurred on your site. There are four primary types of conversions:

  1. URL Destination – A visitor has reached a targeted end page.  For commercial sites, this would be the “Thank you for your purchase” page. For a library site, this is a little more challenging to classify and will include several different pages or types of pages.
  2. Visit Duration – How much time a visitor spends on our site. This is often an unclear concept. If a user is on the site for a long time, we don’t know if they were interrupted while on our site, if they had a hard time finding what they were looking for, or if they were enthralled with all the amazing information we provide and read every word twice.
  3. Pages per Visit – Indicates site engagement. Similar to Visit Duration, many pages may mean the user was interested in our content, or that they were unable to find what they were looking for.  We distinguish between these by looking at the “paths” of pages the visitor saw.  As an example, we might want to know if someone finds the page they were looking for in three pages or fewer.
  4. Events – Targets an action on the site. This can be anything and is often used to track outbound pages or links to a downloadable PDF.

Conversion rate is the percentage of visits in which the desired action occurs.

Conversion rate = Number of desired actions / Number of total (or unique) visits
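
As a quick worked example of the formula, here is a minimal sketch with invented numbers (45 completed actions out of 1,500 visits); the function name is ours and is not part of any analytics tool.

```python
def conversion_rate(desired_actions, visits):
    """Return the fraction of visits in which the desired action occurred."""
    if visits <= 0:
        raise ValueError("visits must be a positive number")
    return desired_actions / visits


# Invented numbers: 45 conversions out of 1,500 visits.
rate = conversion_rate(45, 1500)
print(f"{rate:.1%}")  # prints 3.0%
```

Whether the denominator is total visits or unique visitors changes the result, so it is worth noting which one a given report uses.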

Goal Reports, also known as Conversion Reports, are sometimes provided by the tool and include the total number of conversions and the conversion rate.  We learned that we can also assign a monetary value to take advantage of the more commerce-focused tools often used in analytics software, but the results can be challenging to interpret.  Conversion reports also show an Abandonment Rate, which reflects people leaving our site before completing the goal. To understand where they leave, we can create a “funnel” that identifies the steps needed to complete the goal. The funnel report shows us where in the steps visitors drop off and how many make it through the complete conversion.
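
To show the kind of arithmetic behind a funnel report, here is a rough sketch. The step names and visitor counts are invented, and this is only an illustration of the idea, not the way any particular analytics tool computes its report.

```python
def funnel_report(steps):
    """Summarize drop-off between consecutive funnel steps.

    steps is a list of (step_name, visitor_count) pairs in funnel order.
    Returns (step_name, visitors_lost, share_lost) for every step but the last.
    """
    report = []
    for (name, count), (_, next_count) in zip(steps, steps[1:]):
        lost = count - next_count
        share = lost / count if count else 0.0
        report.append((name, lost, share))
    return report


# Invented funnel for placing a hold through the catalog.
steps = [("Search catalog", 1000), ("View record", 400), ("Place hold", 100)]
report = funnel_report(steps)
overall_conversion = steps[-1][1] / steps[0][1]
```

In this made-up case, most visitors are lost between searching and viewing a record, which would point us toward the search results page as the place to investigate first.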

Key Performance Indicators (KPIs) were a focus of much of the conference.  They measure the outcome based on our site’s objectives/goals and are implemented via conversion rates.  KPIs are unique to each site.  Through examples, we learned that each organization’s web presence may be made up of multiple sites. For instance, an organization may have its main library pages, Libguides, the catalog, a branch site, a set of sites for digitized collections, etc. A KPI may span activities on more than one of these sites.

Segment or Filter

We then discussed the similarities and differences between Segments and Filters, both of which offer methods to narrow the data enabling us to focus on a particular point of interest.  The difference between the two is that (i) filtering will remove the data from the collection process thereby resulting in lost data; whereas (ii) segmentation hides data from the reports leaving it available for other reports. Generally, we felt that the use of Segments was preferable over Filters in Google Analytics given that it is impossible to recover data that is lost during GA’s real-time data collection.

We talked about the different kinds of segments that some of us are using. For example, in Joel’s organization, he separates the staff computers in their offices from the computers in the library branches by adding a query string to the homepage URL of the branch computers’ browsers. Using this, he can create a segment in Google Analytics to view the activity of either group of users by segmenting on the different Entry pages (with and without this special query string). Segmenting on IP address further separates his users into researchers and the general public.
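
The query-string technique can be sketched outside of Google Analytics as well. In the example below, the parameter name `src`, the value `branch`, and the example URLs are all hypothetical; the summary above does not say what query string Joel actually uses.

```python
from urllib.parse import parse_qs, urlsplit


def segment_for_entry_page(entry_url, marker="src", branch_value="branch"):
    """Classify a visit by whether its entry page carries the marker query string.

    The marker name and value are hypothetical placeholders.
    """
    query = parse_qs(urlsplit(entry_url).query)
    if branch_value in query.get(marker, []):
        return "branch computers"
    return "other visitors"


branch_visit = segment_for_entry_page("http://library.example.org/?src=branch")
other_visit = segment_for_entry_page("http://library.example.org/")
```

The same test, expressed as a condition on the Entry page, is what a segment in an analytics tool would apply.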

Benchmarking

As a step towards measuring success for our sites, we discussed benchmarking, which is used to look at the performance of our sites before and after a change. Having performance data before making changes is essential to knowing whether those changes are successful, as defined by our goals and KPIs.

Comparing a site to itself, either in a prior iteration or before making a change, is called Internal Benchmarking. Comparing a site to other similar sites on the Internet is known as External Benchmarking. Since external benchmarking requires data to make a comparison, we need to ask another website for its data or reports. An alternative is to use service sites such as Alexa, Quantcast, Hitwise and others, which will do the comparison for you.  Keep in mind that these may use e-commerce or commercial indicators, which may not make for a good comparison to humanities-oriented sites.

Event Tracking

Page views and visitor statistics are important for tracking how our site is doing, but sometimes we need to know about events that aren’t tracked through the normal means. We learned that an Event, both in the conceptual sense and in the analytics world, can be used to track actions that don’t naturally result in a page view. Events are used to track access to resources that aren’t a web page, such as videos, PDFs, dynamic page elements, and outbound links.

Tracking events doesn’t always come naturally and requires some effort to set up. Content management systems (CMS) like Drupal make event tracking easier, either via a module or plugin or simply by editing a template or function that produces the HTML pages.  If a website is not using a CMS, the webmaster will need to add event tracking code to each link or action that they wish to record in Google Analytics. Fortunately, as we saw, the event tracking code is simple and easy to add to a site, and Google’s Event Tracking Guide documents it well.

Finally, we learned that tracking events is preferable to creating “fake” pageviews as it does not inflate the statistics generated by regular pageviews due to the visitors’ usual browsing activities.

Success for our websites

Much of the second half of the conference was focused on learning about and performing some exercises to define and measure success for our sites. We started by understanding our site in terms of our Users, our Content and our Goals. These all point to the site’s purpose and circle back around to the content delivered by our site to the users in order to meet our goals. It’s all interconnected. The following questions and steps helped us to clarify the components that we need to have in hand to develop a successful website.

Content Audit – Perform an inventory that lists every page on the site. This is likely to be tedious and time-consuming. It includes finding abandoned pages, lost images, etc.  The web server is a great place to start identifying files.  Sometimes we can use automated web crawling tools to find the pages on our site.  Then we need to evaluate that content. Beyond the basic use of a page, consider recording last updated date, bounce rate, time on page, whether it is a landing page or not, and who is responsible for the content.
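
For sites served from plain files, the first pass of the inventory can be scripted. The sketch below walks a directory and records each file's path and last-modified date; the temporary directory and file names stand in for a real web root and are invented for the demonstration. Metrics like bounce rate and content ownership would still have to be added from other sources.

```python
import os
import tempfile
from datetime import datetime, timezone


def inventory(root):
    """List every file under root with its last-modified date (UTC)."""
    rows = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            rows.append((os.path.relpath(path, root), modified.date().isoformat()))
    return sorted(rows)


# Demo against a throwaway directory standing in for the web root.
with tempfile.TemporaryDirectory() as web_root:
    os.makedirs(os.path.join(web_root, "about"))
    for relative in ("index.html", os.path.join("about", "staff.html")):
        with open(os.path.join(web_root, relative), "w") as handle:
            handle.write("<html></html>")
    audit = inventory(web_root)
```

The resulting rows can be pasted into a spreadsheet as the starting point for the evaluation step.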

Identifying Related Sites – Create a list of sites that our site links to and sites that link back to our site.  Examples: parent site (e.g. our organization’s overall homepage), databases, journals, library catalog site, blog site, flickr, Twitter, Facebook, Internet Archive, etc.

Who are our users? – What is our site’s intended audience or audiences? For us at the conference, this was a variety of people: students, staff, the general public, collectors, adults, teens, parents, etc. Some of us may need to use a survey to determine this.  Some populations of users (e.g. staff) might be identified via IP Addresses. We were reminded that most sites serve one major set of users with other smaller groups of users served. For example, students might be the primary users whereas faculty and staff are secondary users.

Related Goals and plans – Use existing planning documents, strategic goals, and the library’s mission statement to set a mission statement and/or goals for the website. Who are we going to help? Who is our audience?  We must define why our site exists and its purpose on the web.  Generally we’ll have one primary purpose per site. Secondary purposes also help define what the site does and fall under the “nice to have” category, but are also very useful to our users. (For example, Amazon.com’s primary purpose is to sell products, but secondary purposes include reviews, wishlists, ratings, etc.)

When we have a new service to promote, we can use analytics and goals to track how well that goal is being met. This is an ongoing expansion of the website and the web analytics strategy.  We were reminded to make goals that are practical, simple and achievable. Priorities can change from year to year in what we will monitor and promote.

Things to do right away

Nearing the end of our conference, we discussed things that we can do to improve our analytics in the near term. These are not necessarily quick to implement, but doing these things will put us in a good place for starting our web analytics strategy. It was mentioned that if we aren’t tracking our website’s usage at all, we should install something today to at least begin collecting data!

  1. Share what we are doing with our colleagues. Educate them at a high level, so they know more about our decision making process. Be proactive and share information; don’t wait to be asked what’s going on. This will offer a sense of inclusion and transparency. What we do is not magic in any sense. We may also consider granting read-only access to some people who are interested in seeing and playing with the statistics on their own.
  2. Set a schedule for pulling and analyzing our data and statistics. On a quarterly basis, report to staff on things that we found that were interesting: important metrics, fun things, anecdotes about what is happening on our site. Also check the goals that we are tracking in analytics on a quarterly basis; do not “set and forget” our goals. On a monthly basis, we should report to IT staff on topics of concern, 404 pages, important values, and things that need attention.
  3. Test, Analyze, Edit, and Repeat. This is an ongoing, long-term effort to keep improving our sites. During a site redesign, we compare analytics data before and after we make changes. Use analytics to make certain the changes we are implementing have a positive effect. Use analytics to drive the changes in our site, not because it would be cool/fun/neat to do things a certain way. Remember that our site is meant to serve our users.
  4. Measure all content. Get tracking code installed across all of our sites. Google Analytics cross-domain tracking is tricky to set up, but once installed will track users as they move between different servers. Examples of this are our website, blog, OPAC, and other servers. For things not under our control, be sure to at least track outbound links to know when people leave our site.
  5. Measure all users. When we are reporting, segment the users into groups as much as possible to understand their different habits.
  6. Look at top mobile content. Use that information to divide the site and focus on things that mobile users are going to most often.
Summary

Spending eight hours learning about a topic and how to practically apply it to our site is a great way to get excited about taking on more responsibilities in our daily work. There is still a good deal of learning to be done since much of the expertise in web analytics comes from taking the time to experiment with the data and settings.

We, Kelly and Joel, are looking forward to working with analytics from the ground up, so to speak. We are both in an early stage of redeploying our websites under new software, which allows us to take into account the most up-to-date analytics tools and techniques available to us. Additionally, our organizations, though different in their specific missions and goals, are entering into a new round of long-term planning that will result in a new set of goals for the next three to five years. It becomes clear that the website is an important part of this planning and that the goals of our websites directly translate into actions that we take when configuring and using Google Analytics.

We both expect that we will experience a learning curve in understanding and applying web analytics and that there will be a set of long-term, ongoing tasks for us. However, after this session, we are more confident about how to understand and apply analytics effectively, both to track and achieve the goals of our organizations and to create an effective and useful set of websites.

About our Guest Authors:

Kelly Sattler is a Digital Project Librarian and Head of Web Services at Michigan State University.  She and her team are involved with migrating the Libraries’ website into Drupal 7 and are analyzing our Google Analytics data, search terms, and chat logs to identify places where we can improve our site through usability studies. Kelly spent 12 years in Information Technology at a large electrical company before becoming a librarian and has a bachelor’s degree in Computer Engineering.  She can be found on twitter at @ksattler.

Joel Richard is the lead Web Developer for the Smithsonian Libraries in Washington, DC and is currently in the process of rebuilding and migrating 15 years’ worth of content to Drupal 7. He has 18 years of experience in software development and internet technology and is a confirmed internet junkie. In his spare time, he is an enthusiastic proponent of Linked Open Data and believes it will change the way the internet works. One day. He can be found on twitter at @cajunjoel.


Security!

“I didn’t do anything! All I did was plugged in the USB stick to see if there was a name in any documents so I can return it to its owner.”

“I kept getting pop-ups on my workstation, and I keep clicking the cancel button on all of them. Why won’t they stop popping up?”

“Six people have called in the last half hour, saying they couldn’t access The Expensive Electronic Resource. I’ve called the vendor, and they said there’s been strange activity on our IP address and their system’s not allowing us to access their site…”

Have you heard any of the above, or have others to add? If so, you’re not alone. When you’re one of the techies in an academic library, you are on the front line when things go wrong. You help people get through printing, emails, and various library systems troubleshooting, and you’re good at it. How good are you, though, when it comes to dealing with library IT security?

Why bother with security? What’s at stake and what to do about it
Parody error message for an OPAC

Source: Credaro, A.B. (2002). Computer Error Messages for Library OPACs. Warrior Librarian Weekly [online]
http://www.warriorlibrarian.com/LOL/portal.html

I mean, who is that desperate to break library IT security? All we have is bibliographic records, and, really, who in the heck wants them?

The reality is that academic libraries have much to offer for those who want to break in and wreak havoc, including student and faculty data, restricted resources, and access to the campus network. And yes, there are bots out there that screen scrape MARC records from OPACs that slow systems down to a crawl when a bot is scraping away (I’ve been through my share, and they’re a pain to deal with). Though academic libraries usually have the benefit of campus IT to take care of antivirus and firewall setup and maintenance, it is up to the library staff themselves to ensure that their system is secure.

The most important thing you can do is to be proactive. Assessing loopholes in your library technology setup before an attack will not only decrease the ways that your system can be compromised but also decrease the damage done to your system if your system is compromised.

Passwords
Your campus IT has password requirements built into various campus systems including hardware; library systems usually do not see the same treatment. You also have systems that do not force password changes after a certain time. And, since these systems typically do not talk to each other, you have staff using the same password for multiple accounts. Do any of the systems or applications you use in your library have any built-in password requirements? Can the system be set to automatically require a password change after a certain amount of time? If the systems in question cannot do either of the above, you can still create password policies that will need to be manually enforced.
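
If you end up enforcing a policy by hand, a small script can at least take over the date arithmetic. Here is a minimal sketch that flags accounts whose passwords are overdue for a change. The 180-day limit, the account names, and the dates are all invented for illustration; you would need to keep the "last changed" dates yourself, for example in a spreadsheet, since the systems in question do not track them for you.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=180)  # hypothetical policy value


def overdue_accounts(last_changed_by_account, today):
    """Return the accounts whose password is older than the policy allows."""
    return sorted(
        account
        for account, changed in last_changed_by_account.items()
        if today - changed > MAX_PASSWORD_AGE
    )


# Invented shared accounts and the dates their passwords were last changed.
records = {
    "ill-system": date(2012, 1, 15),
    "ils-circ-desk": date(2012, 9, 1),
    "proxy-admin": date(2011, 6, 30),
}
overdue = overdue_accounts(records, today=date(2012, 10, 15))
```

Note that the script only needs dates, never the passwords themselves, so the record it reads is not sensitive in the way a password list would be.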

System logs and usage reports
System and report monitoring can help pinpoint suspicious activity as well as determine if a system has been compromised. Many of you may remember Aaron Swartz systematically downloading materials from JSTOR on the MIT network in 2011 1. Sometimes unauthorized access to a library resource happens with one person systematically downloading a huge number of materials; other times, as the University of Saskatchewan found out when they looked at their reports, unauthorized access may be dispersed geographically 2. In similar situations, regular monitoring of usage reports would tip off library staff to the unusual behavior, so they can contact the vendor and relay the information before the vendor’s system cuts off access for all users.
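
Both patterns, one account downloading heavily and one account appearing from many places, can be checked for with a few lines of code once you have a usage report in hand. The sketch below assumes the report can be reduced to (account, IP address) pairs, one per download; the thresholds, account names, and addresses are invented, and sensible limits will depend on your own users' normal behavior.

```python
from collections import defaultdict


def suspicious_accounts(events, max_downloads, max_ips):
    """Flag accounts with unusually many downloads or distinct IP addresses.

    events is an iterable of (account, ip_address) pairs, one per download.
    """
    downloads = defaultdict(int)
    addresses = defaultdict(set)
    for account, ip_address in events:
        downloads[account] += 1
        addresses[account].add(ip_address)

    flagged = {}
    for account, total in downloads.items():
        reasons = []
        if total > max_downloads:
            reasons.append("high download volume")
        if len(addresses[account]) > max_ips:
            reasons.append("many distinct IP addresses")
        if reasons:
            flagged[account] = reasons
    return flagged


# Invented usage data built from documentation-only IP ranges.
events = (
    [("user_a", "192.0.2.1")] * 500
    + [("user_b", f"198.51.100.{i}") for i in range(1, 21)]
    + [("user_c", "203.0.113.7")] * 12
)
flagged = suspicious_accounts(events, max_downloads=200, max_ips=5)
```

A flag here is only a prompt to look closer, not proof of misuse; a researcher on a deadline or a traveling faculty member can trip either threshold.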

Servers also need monitoring for suspicious activity. If your library is responsible for its own servers, there are many server monitoring applications to choose from, like Nagios 3. In addition to monitoring server resources through these applications, depending on the server setup you will have access to a variety of system logs for your perusal. While scanning through our system logs, I occasionally see a bot trying unsuccessfully to hack into one of our servers; the logs are the only way I would have known about those attempts. Logs and reports might be your first sign that your system has been compromised, so it’s best to check them regularly.

Human staff

A van with "free candy" painted on its side

Any email from “Campus IT” asking for your password = the above picture. Source: https://secure.flickr.com/photos/stephen_nomura/3070133859/

The biggest security loophole in any IT environment is your average human. Humans plug USB sticks left behind in the computer lab into their workstations, they download files from emails or websites, and they keep clicking on those flashing pop-ups. Humans are also too trusting – you’ve probably seen the email where “Campus IT” is asking for your password so they can increase your storage quota. Many people still email their passwords and other sensitive information because they truly believe that the email is from IT, their bank, that businessman overseas, and so on. The best way to close the human loophole is through training. Training library staff in security issues can take on many forms. For example, at our monthly library staff meeting at Grinnell College we dedicate 10-15 minutes for “Tech Topics” where we regularly cover security topics, including what to do when you think your computer is infected, passwords, and data security. Staff have access to resources covered at these meetings in our shared drive for future reference.

Unfortunately, you cannot completely close the human loophole. While you can control the staff side, you cannot prevent a student giving out their password to their friends, or a faculty member giving their password to a colleague at a different institution. Not all is lost – tightening other loopholes does help with dealing with the user loophole.

Where to start

There’s a lot to keep track of when you are tackling IT security at your library; you might feel overwhelmed, not knowing where to start. Here are a few places and resources to help you start:

Campus IT: Most likely your campus IT department already has campus-wide policies on various topics, including password changes, what standards 3rd party vendors must meet when storing institutional data on off-campus servers, and what to do when you suspect a networked computer is infected with a virus. Read the policies and talk to your campus IT staff to see how you can adapt their policies to your library’s specific needs.

SEC4LIB: Blake Carver, of LISNews fame, has created an online resource dedicated to library IT security. The website has a number of resources as well as a wiki with some outlines covering general IT security issues. If you find yourself with a library IT security question, there is a listserv where like-minded library staff can point you in the right direction.

Here!: Does your library have a security policy or action plan? Do you have a security horror story that others could learn from? Share them in the comments below.

  1. Schwartz, John. “Open-Access Advocate is Arrested for Huge Download.” New York Times, Jul 20, 2011. http://search.proquest.com/docview/878013667?accountid=7379.
  2. White, Heather Tones. “Electronic Resources Security: A look at Unauthorized Users.” Code4Lib Journal 12 (December 2010): http://journal.code4lib.org/articles/4117.
  3. Silver, T. Michael. “Monitoring Network and Service Availability with Open-Source Software.” Information Technology & Libraries 29, no. 1 (March 2010): 8-22.

A Brief Trip into Technology Planning, Brought to You By Meebo

The Day That Meebo Died

Today is the day that many librarians running reference services dreaded – Meebo discontinuing most of their products (with the exception of the Meebo Bar). Even though Meebo (or parts of it) will still live on in various Google products, that doesn’t help those libraries that have built services and applications around a product that has been around for a while (Meebo was established in 2005).

If Meebo is any indication, even established, long-running technology services can go away without much advance notice. What is a library to do about incorporating third party applications, then? There is no way to ensure that all the services and applications that you use at your library will still be in existence for any length of time. Change is about the only constant in technology, and it is up to those of us who deal with technology to plan for that change.

How to avoid backing your library into a corner with no escape route in sight

The worst has happened – the application you’re using is no longer being supported. Or, in a more positive light, there’s a new alternative out there that performs better than the application your library is currently using.  These scenarios have different priorities; migration due to discontinuation of support will probably happen on a faster timeline than upgrading to a better application. Overall, you should be prepared to survive without your current 3rd party applications with a minimal amount of content loss and service disruption. For this post I’ll be focusing on third party application support and availability. Disruptions due to natural disasters, like fire, flooding, or, in Grinnell’s case, tornadoes, are equally important, but will not be covered at length in this post.

Competition (or lack thereof)

When news broke that Google purchased Meebo, most weren’t sure about what would be next for the chat service. Soon afterwards, Meebo gave a month’s notice about the discontinuation of most of their products. Fortunately, alternative chat services were plentiful. Our library, for example, subscribes to LibraryH3lp, but we were using Meebo Messenger as well as the MeeboMe widget for some course pages to supplement LibraryH3lp’s services. After the announcement, our library quickly replaced the messenger with Pidgin, and we are working on replacing the Meebo widgets with LibraryH3lp’s widgets.

Having a diverse, healthy pool of different applications to choose from for a particular service is a good place to be when the application you use is no longer supported. Migrations are never fun, but consider the alternative. If you’re using a service or application that does not have readily available alternatives, how will your services be affected when that application is no longer supported?

The last question wasn’t rhetorical. If your answer involves a major service disruption, especially to services that your library deems mission-critical, then you’re putting yourself and the library in a precarious position. The same goes if the alternatives out there require a different technical skill set from your library staff. Applications that require a more advanced technical skill set will require more training and run a heightened risk of staff rejection if the required skill level is set too high.

Data wants to be backed up

Where’s your data right now? Can you export it out of the application? Do you even know if you can export your data or not? If not, then you’re setting yourself up for a preventable emergency. Exporting functionality and backups are especially important for services that are living outside of your direct control, like a hosted service. While most hosted services have backup servers to prevent loss of customer data, you should still have the ability to export your data and store it outside of the application. It’s best practice and gives you the peace of mind that you do not have to recreate years’ worth of work to restore data lost due to vendor error or lack of export functionality.

Another product that is widely used by academic libraries, LibGuides, provides a backup feature where you can export your guides in XML or individual guides in HTML. It will take some work to format and post the data if needed, but the important thing is that you have your data and you can either host it locally in case of emergencies or harvest the content when the time comes to move on to another application.

Some technology service audit questions

Here are some general questions to start you down the path of evaluating where your library currently stands with third party applications you rely on for providing specific library services. Don’t worry if you find yourself not as prepared as you want to be. It’s better to start now than when you learn that another application you use will be shutting down.

  • What third party applications does your library currently use to provide library services?
  • Are there other comparable services/applications available?
    • What training resources are available for alternative applications?
    • What technical skills do these applications require? Are they compatible with the technical skills found with the majority of library staff?
  • Which applications are used for mission-critical library services?
  • Can you export your data and/or settings from the application?
    • If so, how often is the data being exported?
    • Where is the backup file stored? Locally? Remotely?
  • What is the plan if the application…
    • …is no longer supported?
    • …goes offline due to a service disruption?
      • …for a couple of hours?
      • …longer than a day?
      • …during finals week/first week of the semester/midterms (high pressure/high stakes times for library users)?

While there are many potential landmines when using third party applications for library services, these applications overall help expand and provide user services in various ways. Instead of becoming a technological recluse and shunning outside applications, use these applications wisely and make sure that your library has a plan in place.