Event Tracking with Google Analytics

In a previous post by Kelly Sattler and Joel Richard, we explored using web analytics to measure a website’s success. That post provides a clear high-level picture of how to create an analytics strategy by evaluating our users, web content, and goals. This post will explore a single topic in-depth; how to set up event tracking in Google Analytics.

Why Do We Need Event Tracking?

Finding solid figures to demonstrate a library’s value and make strategic decisions is a topic of increasing importance. It can be tough to stitch together the right information from a hodgepodge of third-party services; we rely on our ILSs to report circulation totals, our databases to report usage like full-text downloads, and our web analytics software to show visitor totals. But are pageviews and bounce rates the only meaningful measure of website success? Luckily, Google Analytics provides a way to track arbitrary events which occur on web pages. Event tracking lets us define what is important. Do we want to monitor how many people hover over a carousel of book covers, but only in the first second after the page has loaded? How about how many people first hover over the carousel, then the search box, but end up clicking a link in the footer? As long as we can imagine it and JavaScript has an event for it, we can track it.

How It Works

Many people are probably familiar with Google Analytics as a snippet of JavaScript pasted into their web pages. But Analytics also exposes some of its inner workings to manipulation. We can use the _gaq.push method to execute a “_trackEvent” method which sends information about our event back to Analytics. The basic structure of a call to _trackEvent is:

_gaq.push( [ '_trackEvent', 'the category of the event', 'the action performed', 'an optional label for the event', 'an optional integer value that quantifies something about the event' ] );

Looking at the array parameter of _gaq.push is telling: we should have an idea of what our event categories, actions, labels, and quantitative details will be before we go crazy adding tracking code to all our web pages. Once events are recorded, they cannot be deleted from Analytics. Developing a firm plan helps us to avoid the danger of switching the definition of our fields after we start collecting data.

We can be a bit creative with these fields. “Action” and “label” are just Google’s way of describing them; in reality, we can set up anything we like, using category->action->label as a triple-tiered hierarchy or as three independent variables.

Example: A List of Databases

Almost every library has a web page listing third-party databases, be they subscription or open access. This is a prime opportunity for event tracking because of the numerous external links. Default metrics can be misleading on this type of page. Bounce rate—the proportion of visitors who start on one of our pages and then immediately leave without viewing another page—is typically considered a negative metric; if a page has a high bounce rate, then visitors are not engaged with its content. But the purpose of a databases page is to get visitors to their research destinations as quickly as possible; bounce rate is a positive figure. Similarly, time spent on page is typically considered a positive sign of engagement, but on a databases page it’s more likely to indicate confusion or difficulty browsing. With event tracking, we can not only track which links were clicked but we can make it so database links don’t count towards bounce rate, giving us a more realistic picture of the page’s success.

One way of structuring “database” events is:

  • The top-level Category is “database”
  • The Action is the topical category, e.g. “Social Sciences”
  • The Label is the name of the database itself, e.g. “Academic Search Premier”

The final, quantitative piece could be the position of the database in the list or the number of seconds after page load it took the user to click its link. We could report some boolean value, such as whether the database is open access or highlighted in some way.

To implement this, we set up a JavaScript function which will be called every time one of our events occur. We will store some contextual information in variables, push that information to Google Analytics, and then delay the page’s navigation so the event has a chance to be recorded. Let’s walk through the code piece by piece:

function databaseTracking  ( event ) {
    var destination = $( this )[ 0 ].href,
        resource = $( this ).text(),
        // move up from <a> to parent element, then find the nearest preceding <h2> section header
        section = $( this ).parent().prevAll( 'h2' )[ 0 ].innerText,
        highlighted = $( this ).hasClass( 'highlighted' ) ? 1 : 0;

_gaq.push( [ '_trackEvent', 'database', resource, section, highlighted ] );

The top of our function just grabs information from the page. We’re using jQuery to make our lives easier, so all the $( this ) pieces of our code refer to the element that initiated the event. In our case, that’s the link pointing to an external database which the user just clicked. So we set destination to the link’s href attribute, resource to its text (e.g. the database’s name), section to the text inside the h2 element that labels a topical set of databases, and highlighted is a boolean value equal to 1 if the element has a class of “highlighted.” Next, this data is pushed into the _gaq array which is a queue of functions and their parameters that Analytics fires asynchronously. In this instance, we’re telling Analytics to run the _trackEvent function with the parameters that follow. Analytics will then record an event of type “database” with an action of [database name], a label of [section header], and a boolean representing whether it was highlighted or not.

setTimeout( function () {
    window.location = destination;
}, 200 );
event.preventDefault();
}

Next comes perhaps the least obvious piece: we prevent the default browser behavior from occurring, which in the case of a link is navigating away from our page, but then send the user to destination 200 milliseconds later anyways. The _trackEvent function now has a chance to fire; if we let the user follow the link right away it might not complete and our event would not be recorded.1

$( document ).ready(
    // target all anchors in list of databases
    $( '#databases-list a' ).on( 'click', databaseTracking )
);

There’s one last step; merely defining the databaseTracking function won’t cause it to execute when we want it to. JavaScript uses event handlers to execute certain functions based on various user actions, such as mousing over or clicking an element. Here, we add click event handlers to all <a> elements in the list of databases. Now whenever a user clicks a link in the databases list (which has a container with id “databases-list”), databaseTracking will run and send data to Google Analytics.

There is a demo on JSFiddle which uses the code above with some sample HTML. Every time you click a link, a pop-up shows you what the _gaq.push array looks like.

Though we used jQuery in our example, any JavaScript library can be used with event tracking.2 The procedure is always the same: write a function that gathers data to send back to Google Analytics and then add that function as a handler to an appropriate event, such as click or mouseover, on an element.

For another example, complete with code samples, see the article “Discovering Digital Library User Behavior with Google Analytics” in Code4Lib Journal. In it, Kirk Hess of the University of Illinois Urbana-Champaign details how to use event tracking to see how often external links are clicked or files are downloaded. While these events are particularly meaningful to digital libraries, most libraries offer PDFs or other documents online.

Some Ideas

The true power of Event Tracking is that it does not have to be limited to the mere clicking of hyperlinks; any interaction which JavaScript knows about can be recorded and categorized. Google’s own Event Tracking Guide uses the example of a video player, recording when control buttons like play, pause, and fast forward are activated. Here are some more obvious use cases for event tracking:

  • Track video plays on particular pages; we may already know how many views a video gets, but how many come from particular embedded instances of the video?
  • Clicking to external content, such as a vendor’s database or another library’s study materials.
  • If there is a print or “download to PDF” button on our site, we can track each time it’s clicked. Unfortunately, only Internet Explorer and Firefox (versions >= 6.0) have an onbeforeprint event in JavaScript which could be used to detect when a user hits the browser’s native print command.
  • Web applications are particularly suited to event tracking. Many modern web apps have a single page architecture, so while the user is constantly clicking and interacting within the app they rarely generate typical interaction statistics like pageviews or exits.

 

Notes
  1. There is a discussion on the best way to delay outbound links enough to record them as events. A Google Analytics support page condones the setTimeout approach. For other methods, there are threads on StackOverflow and various blog posts around the web. Alternatively, we could use the onmousedown event which fires slightly earlier than onclick but also might record false positives due to click-and-drag scrolling.
  2. Below is an attempt at rewriting the jQuery tracking code in pure JavaScript. It will only work in modern browsers because of use of querySelectorAll, parentElement, and previousElementSibling. Versions of Internet Explorer prior to 9 also use a unique attachEvent syntax for event handlers. Yes, there’s a reason people use libraries to do anything the least bit sophisticated with JavaScript.
function databaseTracking  ( event ) {
        var destination = event.target.href,
            resource = event.target.innerHTML,
            section = "none",
            highlighted = event.target.className.match( /highlighted/ ) ? 1: 0;

        // getting a parent element's nearest <h2> sibling is non-trivial without a library
        var currentSibling = event.target.parentElement;
        while ( currentSibling !== null ) {
            if ( currentSibling.tagName !== "H2" ) {
                currentSibling = currentSibling.previousElementSibling;
            }
            else {
                section = currentSibling.innerHTML;
                currentSibling = null;
            }
        }

        _gaq.push( [ '_trackEvent', 'database', resource, section, highlighted ] );

        // delay navigation to ensure event is recorded
        setTimeout( function () {
            window.location = destination;
        }, 200 );
        event.preventDefault();
    }

document.addEventListener( 'DOMContentLoaded', function () {
        var dbLinks = document.querySelectorAll( '#databases-list a' ),
            len = dbLinks.length;
        for ( i = 0; i < len; i++ ) {
            dbLinks[ i ].addEventListener( 'click', databaseTracking, false );
        }
    }, false );
Association of College & Research Libraries. (n.d.). ACRL Value of Academic Libraries. Retrieved January 12, 2013, from http://www.acrl.ala.org/value/
Event Tracking – Web Tracking (ga.js) – Google Analytics — Google Developers. (n.d.). Retrieved January 12, 2013, from https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide
Hess, K. (2012). Discovering Digital Library User Behavior with Google Analytics. The Code4Lib Journal, (17). Retrieved from http://journal.code4lib.org/articles/6942
Marek, K. (2011). Using Web Analytics in the Library a Library Technology Report. Chicago, IL: ALA Editions. Retrieved from http://public.eblib.com/EBLPublic/PublicView.do?ptiID=820360
Sattler, K., & Richard, J. (2012, October 30). Learning Web Analytics from the LITA 2012 National Forum Pre-conference. ACRL TechConnect Blog. Blog. Retrieved January 18, 2013, from http://acrl.ala.org/techconnect/?p=2133
Tracking Code: Event Tracking – Google Analytics — Google Developers. (n.d.). Retrieved January 12, 2013, from https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiEventTracking
window.onbeforeprint – Document Object Model (DOM) | MDN. (n.d.). Mozilla Developer Network. Retrieved January 12, 2013, from https://developer.mozilla.org/en-US/docs/DOM/window.onbeforeprint

An Elevator Pitch for File Naming Conventions

As a curator and a coder, I know it is essential to use naming conventions.  It is important to employ a consistent approach when naming digital files or software components such as modules or variables. However, when a student assistant asked me recently why it was important not to use spaces in our image file names, I struggled to come up with an answer.  “Because I said so,” while tempting, is not really an acceptable response.  Why, in fact, is this important?  For this blog entry, I set out to answer this question and to see if, along the way, I could develop an “elevator pitch” – a short spiel on the reasoning behind file naming conventions.

The Conventions

As a habit, I implore my assistants and anyone I work with on digital collections to adhere to the following when naming files:

  • Do not use spaces or special characters (other than “-” and “_”)
  • Use descriptive file names.  Descriptive file names include date information and keywords regarding the content of the file, within a reasonable length.
  • Date information is the following format: YYYY-MM-DD.

So, 2013-01-03-SmithSculptureOSU.jpg would be an appropriate file name, whereas Smith Jan 13.jpg would not.  But, are these modern practices?  Current versions of Windows, for example, will accept a wide variety of special characters and spaces in naming files, so why is it important to restrict the use of these characters in our work?

The Search Results

A quick Google search finds support for my assertions, though often for very specific cases of file management.  For example, the University of Oregon publishes recommendations on file naming for managing research data.  A similar guide is available from the University of Illinois library, but takes a much more technical, detailed stance on the format of file names for the purposes of the library’s digital content.

The Bentley Historical Library at University of Michigan, however, provides a general guide to digital file management very much in line with my practices: use descriptive directory and file names, avoid special characters and spaces.  In addition, this page discusses avoiding personal names in the directory structure and using consistent conventions to indicate the version of a file.

The Why – Dates

The Bentley page also provides links to a couple of sources which help answer the “why” question.  First, there is the ISO date standard (or, officially, “ISO 8601:2004: Data elements and interchange formats — Information interchange — Representation of dates and times”).  This standard dictates that dates be ordered from largest term to smallest term, so instead of the month-day-year we all wrote on our grade school papers, dates should take the form year-month-day.  Further, since we have passed into a new millennium, a four digit year is necessary.  This provides a consistent format to eliminate confusion, but also allows for file systems to sort the files appropriately.  For example, let’s look at the following three files:

1960-11-01_libraryPhoto.jpg
1977-01-05_libraryPhoto.jpg
2000-05-01_libraryPhoto.jpg

If we expressed those dates in another format, say, month-day-year, they would not be listed in chronological order in a file system sorting alphabetically.  Instead, we would see:

01-05-1977_libraryPhoto.jpg
05-01-2000_libraryPhoto.jpg
11-01-1960_libraryPhoto.jpg

This may not be a problem if you are visually searching through three files, but what if there were 100?  Now, if we only used a two digit year, we would see:

00-05-01_libraryPhoto.jpg
60-11-01_libraryPhoto.jpg
77-01-05_libraryPhoto.jpg

If we did not standardize the number of digits, we might see:

00-5-1_libraryPhoto.jpg
77-1-5_libraryPhoto.jpg
60-11-1_libraryPhoto.jpg

You can try this pretty easily on your own system.  Create three text files with the names above, sort the files by name and check the order.  Imagine the problems this might create for someone trying to quickly locate a file.

You might ask, why include the date at all, when dates are also maintained by the operating system?  There are many situations where the operating system dates are unreliable.  In cases where a file moves to a new drive or computer, for example, the Date Created may reflect the date the file moved to the new system, instead of the initial creation date.  Or, consider the case where a user opens a file to view it and the application changes the Date Modified, even though the file content was not modified.  Lastly, consider our earlier example of a photograph from 1960; the Date Created is likely to reflect the date of digitization.  In each of these examples, it would be helpful to include an additional date in the file name.

The Why – Descriptive File Names

So far we have digressed down a date-specific path.  What about our other conventions?  Why are those important?  Also linked from the Bentley Library and in the Google search results are YouTube videos created by the State Library of North Carolina which answer some of these questions.  The Inform U series on file naming has four parts, and is intended to help users manage personal files.  However, the rationale described in Part 1 for descriptive file names in personal file management also applies in our libraries.

First, we want to avoid the accidental overwriting of files.  Image files can provide a good example here: many cameras use the file naming convention of IMG_1234.jpg.  If this name is unique to the system, that works ok, but in a situation where multiple cameras or scanners are generating files for a digital collection, there is potential for problems.  It is better to batch re-name image files with a more descriptive name. (Tutorials on this can be found all over the web, such as the first item in this list on using Mac’s Automator program to re-name a batch of photos).

Second, we want to avoid the loss of files due to non-descriptive names.  While many operating systems will search the text content of files, naming files appropriately makes for more efficient access.  For example, consider mynotes.docx and 2012-01-05WebMeetingNotes.docx – which file’s contents are easier to ascertain?

I should note, however, that there are cases where non-descriptive file names are appropriate.  The use of a unique identifier as a filename is sometimes a necessary approach.  However, in those cases where you must use a non-descriptive filename, be sure that the file names are unique and in a descriptive directory structure.  Overall, it is important that others in the organization have the same ability to find and use the files we currently manage, long after we have moved on to another institution.

The Why – Special Characters & Spaces

We have now covered descriptive names and reasons for including dates, which leaves us with spaces and special characters to address.  Part 3 of the Inform U video series addresses this as well.  Special characters can designate special meaning to programming languages and operating systems, and might be misinterpreted when included in file names.  For instance, the $ character designates the beginning of variable names in the php programming language and the \ character designates file path locations in the Windows operating system.

Spaces may make things easier for humans to read, but systems generally do better without the spaces.  While operating systems attempt to handle spaces gracefully and generally do so, browsers and software programs are not consistent in how they handle spaces.  For example, consider a file stored in a digital repository system with a space in the file name.  The user downloads the file and their browser truncates the file name after the first space.  This equates to the loss of any descriptive information after the first space.  Plus, the file extension is also removed, which may make it harder for less tech savvy users to use a file.

The Pitch

That example leads us to the heart of the issue: we never know where our files are going to end up, especially files disseminated to the public.  Our content is useless if our users cannot open it due to a poorly formatted file name.  And, in the case of non-public files or our personal archives, it is essential to facilitate the discovery of items in the piles and piles of digital material accumulated every day.

So, do I have my elevator pitch?  I think so.  When asked about file naming standards in the future, I think I can safely reply with the following:  “It is impossible to accurately predict all of the situations in which a file might be used.  Therefore, in the interest of preserving access to digital files, we choose file name components that are least likely to cause a problem in any environment.  File names should provide context and be easily understood by humans and computers, now and in the future.”

And, like a good file name, that is much more effective and descriptive than, “Because I said so.”