Aaron Swartz and Too-Comfortable Research Libraries

*** Update: Several references and a video added (thanks to Brett Bonfield) on Feb. 21, 2013. ***

Who was Aaron Swartz?

If you are a librarian and do not know who Aaron Swartz was, that should probably change now. He helped develop the RSS standard, co-founded Reddit, and worked on the Open Library project. He downloaded and freed 20% (2.7 million documents) of the Public Access to Court Electronic Records (PACER) database, which charges access fees for United States federal court documents; of the freed documents, about 1,600 turned out to have privacy issues. He also played a lead role in stopping the Stop Online Piracy Act (SOPA) and wrote the Guerrilla Open Access Manifesto.

Most famously, he was arrested in 2011 for the mass downloading of journal articles from JSTOR. He returned the documents to JSTOR and apologized. The Massachusetts state court dismissed the charges, and JSTOR decided not to pursue civil litigation. But MIT stayed silent, and federal prosecutors charged Swartz with wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and recklessly damaging a protected computer. If convicted on these charges, Swartz, then 26, faced up to 35 years in prison. After living under these charges for two years, he committed suicide on January 11, 2013.

Information wants to be free; Information wants to be expensive

Now, he was a controversial figure. He advocated Open Access (OA), but went so far as to encourage scholars, librarians, and students with access to copyrighted academic materials to trade passwords and circulate those materials freely, on the grounds that doing so is, as his manifesto argues, an act of civil disobedience against unjust copyright laws. He was an advocate of the open Internet, transparent government, and open access to scholarly output. But he also physically entered an MIT network wiring closet and attached his laptop to download over 4 million articles from JSTOR. Most people, including librarians, are not going to advocate trading their institutions’ subscription database passwords or breaking into a staff-only computer networking area of an institution. The actual method of OA that Swartz recommended was highly controversial even among the strongest OA advocates.

But in his Guerrilla OA Manifesto, Swartz raised one very valid point about the nature of information in the era of the World Wide Web: information is power. As power, information can either (a) be spread to and made useful to as many of us as possible, or (b) be locked up, with access restricted to those who can pay for it or who hold access privileges some other way. One thing is clear: those who do not have access to information will be at a significant disadvantage compared to those who do.

And I would like to ask what today’s academic and research libraries are doing to realize Scenario (a) rather than Scenario (b). Are they doing enough to make information available to as many people as possible?

Too-comfortable Internet, Too-comfortable academic libraries

Among the many articles I read about Aaron Swartz’s sudden death, the one that made me think most was “Aaron Swartz’s suicide shows the risk of a too-comfortable Internet.” Its author worries that we may now have a too-comfortable Internet, one slowly turning into just another platform for those who can afford to purchase information. The Internet as a place where you could freely find, use, modify, create, and share information is disappearing; paywalls and closed doors are being erected instead. Useful information on the Internet is being rapidly monetized, and access is no longer free and open. Even government documents cease to be freely accessible to the public once they are put on the Internet (likely due to digitization and online storage costs), as the case of PACER and Aaron Swartz shows. We are getting more and more used to giving up our privacy or paying for information. This may be inevitable in a capitalist society, but should the same apply to libraries as well?

The thought of a too-comfortable Internet made me wonder whether academic and research libraries are also becoming too comfortable with the status quo of licensing electronic journals and databases for patrons. When the library collection was physical, people who walked into the library were rarely turned away. The resources in the library are collected and preserved because we believe that people have the right to learn, to investigate things, and to form their own opinions, and that the knowledge of the past should be made available for that purpose. Regardless of age, gender, or social and financial status, libraries have welcomed and encouraged anyone on a quest for knowledge and information. With the increasing number of electronic resources in the library, however, this has been changing.

Many academic libraries offer computers, which are necessary for accessing the library’s own electronic resources. But how many academic libraries keep those computers open for use without a log-in? Often the library computers are locked down and require a username and password that only those affiliated with the institution possess. The same often goes for electronic resources. How many academic libraries allow on-site access to electronic resources by walk-in users? How many insist, in their licenses, on walk-in users’ access to the resources they pay for? Many academic libraries also participate in the Federal Depository Library Program, which requires them to provide the public free access to the government documents they receive. But how easy is it for the public to enter those libraries and get to that free government information?

I asked on Twitter about guest access to computers and e-resources in academic libraries. Approximately 25 academic librarians generously answered my question. (Thank you!) According to their responses, almost all of their libraries, with a few exceptions, offer on-site guest access to computers and e-resources. It is worth noting, however, that a few offer guest access to neither. Some libraries limit guests’ computer use to between 30 minutes and 4 hours, thereby restricting access to the library’s electronic resources as well. Only a few libraries offer free Wi-Fi for guests. And at some libraries, guest Wi-Fi users cannot reach the library’s e-resources even on-site, because the IP range of the guest Wi-Fi differs from that of the campus network.

I am not sure how many academic libraries consciously negotiate walk-in users’ on-site access with e-resource vendors, or whether this happens semi-automatically because many libraries ask vendors to register the library building’s IP range so that authentication can be turned off inside the building. I surmise that publishers and database vendors will not automatically permit walk-in users’ on-site access in their licenses unless libraries ask for it. Some vendors also explicitly prohibit libraries from using their materials to fill interlibrary loan requests from other libraries. Electronic resource vendors’ and publishers’ pricing has become more and more closely tied to the number of patrons who can access their products, and academic libraries have been dealing with the escalating costs of electronic resources by filtering out library patrons and limiting access to those in specific disciplines. For example, academic medical and health sciences libraries often subscribe to databases and resources with the most up-to-date information about biomedical research, diseases, medications, and treatments. These are almost always inaccessible to the general public, and often even to those affiliated with the institution: the use of these prohibitively expensive resources is limited to the small portion of the institution’s members who work in specific disciplines such as medicine and health sciences. Academic research libraries have been partially responsible for the proliferation of these access limitations by welcoming, and often preferring, them as a cost-saving measure. (By contrast, if those resources were in print, no librarian would think it acceptable to permanently limit their use to those in the medical or health sciences disciplines only.)

Too-comfortable libraries do not ask themselves whether they are serving the public good of providing access to information and knowledge for those who need it but cannot afford it. Too-comfortable libraries see their role as that of a mediator and broker in the transaction between the information seller and the information buyer. They may act as efficient and successful mediators and brokers. But I don’t believe that is why libraries exist. Ultimately, libraries exist to foster the sharing and dissemination of knowledge more than anything else, not to efficiently mediate information leasing. And this is the dangerous idea: you cannot put a price tag on knowledge; it belongs to the human race. Libraries used to be the institution that validated and confirmed this idea. But will they continue to be so in the future? Will an academic library be able to remain a sanctuary for all ideas and a place for sharing knowledge in people’s intellectual pursuits, regardless of institutional membership? Or will it be reduced to a branch of an institution that sells knowledge to its tuition-paying customers only? While public libraries are more strongly aligned than academic libraries with the mission of making information and knowledge freely and openly available to the public, they cannot be expected to cover patrons’ research needs as fully as academic libraries.

I am not denying that libraries are also making efforts to continue preservation of, and access to, information and resources through initiatives such as HathiTrust and the DPLA (Digital Public Library of America). My concern is rather whether academic research libraries are becoming too well adapted to the era of the Internet and online resources, and too comfortable serving only their most tangible patron base in the most cost-efficient way, on the assumption that the library’s mission of storing and disseminating knowledge can now be safely and neutrally relegated to the Internet and the market. But it is a fantasy to believe that the Internet will be a sanctuary for all ideas (the Internet is being censored, as the case of Tarek Mehanna shows), and the market will surely not uphold the ideal of free and open access to knowledge for the public.

If libraries do not fight and advocate for those who need information and knowledge but cannot afford it, no other institution will. Of course, it costs money to create, format, review, and package content. Authors, as well as those who work in the business of formatting, reviewing, packaging, and producing content, should be compensated for their work. But not to the extent that the content becomes completely inaccessible to those who cannot afford to purchase it but nevertheless want access for learning, inquiry, and research. This is probably why we are all moved by Swartz’s Guerrilla Open Access Manifesto, in spite of the illegality of the action it actually recommends.

Knowledge and information are not like other products for purchase. Sharing increases their value, thereby enabling innovation, further research, and new knowledge. Limiting knowledge and information to those with access privileges and/or sufficient purchasing power creates a fundamental inequality. The mission of a research institution should never be limited to serving its own members, in my opinion. And if the institution forgets this, it should be the library that first raises a red flag. The mission of an academic research institution is to promote the freedom of inquiry and research and to provide an environment that supports that mission inside and outside its walls; that is why the library is said to be the center of an academic research institution.

I don’t have any good answers to the inevitable question, “So what can an academic research library do?” Perhaps we can start by broadening on-site guest access to library computers, Wi-Fi, and electronic resources. Academic research libraries should also start asking themselves: what will libraries have to offer those who seek knowledge for learning and inquiry but cannot afford it? If the answer is nothing, we will have lost libraries.

In his talk about the Internet Archive’s Open Library project at the Code4Lib conference in 2008 (at 11:20), Swartz describes how librarians had argued about which subject headings to use for the books on the Open Library website. And he says, “We will use all of them. It’s online. We don’t have to have this kind of argument.” Online information and resources incur no additional cost per use once produced. Many resources, particularly scholarly research outputs, already have established buyers such as research libraries. Do we have to deny access to information and knowledge to those who cannot afford it but are seeking it, just so that we can have a market where information and knowledge resources are bought and sold, and where authors are compensated along with those who work with the created content? No, this is a false choice. We can have both. But libraries and librarians will have to make it so.

Videos to Watch

“Code4Lib 2008: Building the Open Library – YouTube.”

“Aaron Swartz on Picking Winners.” American Library Association Midwinter Meeting, January 12, 2008.

“Freedom to Connect: Aaron Swartz (1986-2013) on Victory to Save Open Internet, Fight Online Censors.”

REFERENCES

“Aaron Swartz.” 2013. Accessed February 10. http://www.aaronsw.com/.

“Aaron Swartz – Wikipedia, the Free Encyclopedia.” 2013. Accessed February 10. http://en.wikipedia.org/wiki/Aaron_Swartz#JSTOR.

“Aaron Swartz on Picking Winners – YouTube.” 2008. http://www.youtube.com/watch?feature=player_embedded&v=BvJqXaoO4FI.

“Aaron Swartz’s Suicide Shows the Risk of a Too-comfortable Internet – The Globe and Mail.” 2013. Accessed February 10. http://www.theglobeandmail.com/commentary/aaron-swartzs-suicide-shows-the-risk-of-a-too-comfortable-internet/article7509277/.

“Academics Remember Reddit Co-Founder With #PDFTribute.” 2013. Accessed February 10. http://www.slate.com/blogs/the_slatest/2013/01/14/aaron_swartz_death_pdftribute_hashtag_aggregates_copyrighted_articles_released.html.

“After Aaron, Reputation Metrics Startups Aim To Disrupt The Scientific Journal Industry | TechCrunch.” 2013. Accessed February 10. http://techcrunch.com/2013/02/03/the-future-of-the-scientific-journal-industry/.

American Library Association. 2013. “A Memorial Resolution Honoring Aaron Swartz.” http://connect.ala.org/files/memorial_5_aaron%20swartz.pdf.

“An Effort to Upgrade a Court Archive System to Free and Easy – NYTimes.com.” 2013. Accessed February 10. http://www.nytimes.com/2009/02/13/us/13records.html?_r=1&.

Bonfield, Brett. 2013. “Aaron Swartz.” In the Library with the Lead Pipe (February 20). http://www.inthelibrarywiththeleadpipe.org/2013/aaron-swartz/.

“Code4Lib 2008: Building the Open Library – YouTube.” 2013. Accessed February 10. http://www.youtube.com/watch?v=oV-P2uzzc4s&feature=youtu.be&t=2s.

“Daily Kos: What Aaron Swartz Did at MIT.” 2013. Accessed February 10. http://www.dailykos.com/story/2013/01/13/1178600/-What-Aaron-Swartz-did-at-MIT.

Dupuis, John. 2013a. “Around the Web: Aaron Swartz Chronological Link Roundup – Confessions of a Science Librarian.” Accessed February 10. http://scienceblogs.com/confessions/2013/01/20/around-the-web-aaron-swartz-chronological-link-roundup/.

———. 2013b. “Library Vendors, Politics, Aaron Swartz, #pdftribute – Confessions of a Science Librarian.” Accessed February 10. http://scienceblogs.com/confessions/2013/01/17/library-vendors-politics-aaron-swartz-pdftribute/.

“FDLP for PUBLIC.” 2013. Accessed February 10. http://www.gpo.gov/libraries/public/.

“Freedom to Connect: Aaron Swartz (1986-2013) on Victory to Save Open Internet, Fight Online Censors.” 2013. Accessed February 10. http://www.democracynow.org/2013/1/14/freedom_to_connect_aaron_swartz_1986.

“Full Text of ‘Guerilla Open Access Manifesto’.” 2013. Accessed February 10. http://archive.org/stream/GuerillaOpenAccessManifesto/Goamjuly2008_djvu.txt.

Groover, Myron. 2013. “British Columbia Library Association – News – The Last Days of Aaron Swartz.” Accessed February 21. http://www.bcla.bc.ca/page/news/ezlist_item_9abb44a1-4516-49f9-9e31-57685e9ca5cc.aspx#.USat2-i3pJP.

Hellman, Eric. 2013a. “Go To Hellman: Edward Tufte Was a Proto-Phreaker (#aaronswnyc Part 1).” Accessed February 21. http://go-to-hellman.blogspot.com/2013/01/edward-tufte-was-proto-phreaker.html.

———. 2013b. “Go To Hellman: The Four Crimes of Aaron Swartz (#aaronswnyc Part 2).” Accessed February 21. http://go-to-hellman.blogspot.com/2013/01/the-four-crimes-of-aaron-swartz.html.

“How M.I.T. Ensnared a Hacker, Bucking a Freewheeling Culture – NYTimes.com.” 2013. Accessed February 10. http://www.nytimes.com/2013/01/21/technology/how-mit-ensnared-a-hacker-bucking-a-freewheeling-culture.html?pagewanted=all.

March, Andrew. 2013. “A Dangerous Mind? – NYTimes.com.” Accessed February 10. http://www.nytimes.com/2012/04/22/opinion/sunday/a-dangerous-mind.html?pagewanted=all.

“MediaBerkman » Blog Archive » Aaron Swartz on The Open Library.” 2013. Accessed February 22. http://blogs.law.harvard.edu/mediaberkman/2007/10/25/aaron-swartz-on-the-open-library-2/.

Peters, Justin. 2013. “The Idealist.” Slate, February 7. http://www.slate.com/articles/technology/technology/2013/02/aaron_swartz_he_wanted_to_save_the_world_why_couldn_t_he_save_himself.html.

“Public Access to Court Electronic Records.” 2013. Accessed February 10. http://www.pacer.gov/.

“Publishers and Library Groups Spar in Appeal to Ruling on E-Reserves – Technology – The Chronicle of Higher Education.” 2013. Accessed February 10. http://chronicle.com/article/PublishersLibrary-Groups/136995/?cid=pm&utm_source=pm&utm_medium=en.

“Remember Aaron Swartz.” 2013. Celebrating Aaron Swartz. Accessed February 22. http://www.rememberaaronsw.com.

Rochkind, Jonathan. 2013. “Library Values and the Growing Scholarly Digital Divide: In Memoriam Aaron Swartz | Bibliographic Wilderness.” Accessed February 10. http://bibwild.wordpress.com/2013/01/13/library-values-and-digital-divide-in-memoriam-aaron-swartz/.

Sims, Nancy. 2013. “What Is the Government’s Interest in Copyright? Not That of the Public. – Copyright Librarian.” Accessed February 10. http://blog.lib.umn.edu/copyrightlibn/2013/02/what-is-the-governments-interest-in-copyright.html.

Stamos, Alex. 2013. “The Truth About Aaron Swartz’s ‘Crime’.” Unhandled Exception. Accessed February 22. http://unhandled.com/2013/01/12/the-truth-about-aaron-swartzs-crime/.

Summers, Ed. 2013. “Aaronsw | Inkdroid.” Accessed February 21. http://inkdroid.org/journal/2013/01/19/aaronsw/.

“The Inside Story of Aaron Swartz’s Campaign to Liberate Court Filings | Ars Technica.” 2013. Accessed February 10. http://arstechnica.com/tech-policy/2013/02/the-inside-story-of-aaron-swartzs-campaign-to-liberate-court-filings/.

“Welcome to Open Library (Open Library).” 2013. Accessed February 10. http://openlibrary.org/.

West, Jessamyn. 2013. “Librarian.net » Blog Archive » On Leadership and Remembering Aaron.” Accessed February 21. http://www.librarian.net/stax/3984/on-leadership-and-remembering-aaron/.

Learning Web Analytics from the LITA 2012 National Forum Pre-conference

Note: The 2012 LITA Forum pre-conference on web analytics was taught by Tabatha (Tabby) Farney and Nina McHale. Our guest authors, Joel Richard and Kelly Sattler, attended the pre-conference and wrote this summary to share with ACRL TechConnect readers.

In advance of the pre-conference, Tabby and Nina reached out to the participants with a survey on what we were interested in learning, and they solicited questions to be answered in the class. Twenty-one participants responded, and seventeen of them were already using Google Analytics (GA). About half of those using GA check their reports once or twice per month; the rest check less often. The pre-conference opened with introductions and a brief description of what each of us was doing with analytics on our website and what we hoped to learn.

Web Analytics Strategy

The overall theme of the pre-conference was the following:

A web analytics strategy is the structured process of identifying and evaluating your key performance indicators on the basis of an organization’s objectives and website goals – the desired outcomes, or what you want people to do on the website.

We learned that, beyond the tool we use to measure our analytics, we need to identify what we want our website to do. We do this by drawing on the pre-existing documentation our institutions have on their mission and purpose, as well as on the mission and purpose of the website and whom it is meant to serve. Additionally, we need a privacy statement so our patrons understand that we will be tracking their movements on the site and what we will be collecting. We also learned that there are challenges in using only IP addresses (versus cookies) for tracking purposes. For example, does our institution’s network architecture allow us to distinguish patrons from staff by IP address, or are cookies a necessity?

Tool Options for Website Statistics

To start things off, we discussed the types of web analytics tools available and which ones we were using. Many of the participants were already using Google Analytics (GA), so most of the activities were demonstrated in GA, where we could log into our own accounts. We were reminded that though GA is free, Google keeps our data and does not allow us to delete it. GA has us place a bit of JavaScript code on the pages we want tracked. It is easier to set up GA within a content management system, but it may not work as well for mobile devices. Piwik is an open-source alternative to Google Analytics that uses a similar JavaScript tagging method. Additionally, we were reminded that if we use any JavaScript tagging method, we should review our code snippets at least every two years, as they do change.
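
For the curious, the page tag in question looked roughly like the classic asynchronous ga.js snippet Google documented at the time. This is only a sketch: UA-XXXXX-X is a placeholder for a real GA property ID.

    // Classic asynchronous Google Analytics (ga.js) page tag, circa 2012.
    // UA-XXXXX-X is a placeholder; substitute your own property ID.
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-X']);  // identify the GA property
    _gaq.push(['_trackPageview']);             // record a pageview for this page

    (function() {
      // Load ga.js asynchronously so it does not block page rendering.
      var ga = document.createElement('script');
      ga.type = 'text/javascript';
      ga.async = true;
      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www')
               + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0];
      s.parentNode.insertBefore(ga, s);
    })();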

We learned about other, less common systems for tracking user activity. AWStats is installed locally; it reads the website’s log files and processes them into reports. It offers the user more control and may be more useful for sites not built in a content management system. Sometimes it provides more information than desired, and it cannot clearly differentiate between users on the basis of IP address alone. Other similar tools are Webalizer, FireStats, and Webtrends.

A third option is to use web beacons, which are small, invisible transparent GIFs embedded on every page. These are useful when JavaScript won’t work, but they probably aren’t as applicable today as they once were.

Finally, we took a brief look at the heat-mapping tool Crazy Egg. It focuses on visual analytics, using JavaScript tagging to provide heat maps of exactly where visitors clicked on our site and offering insight into which areas of a page receive the most attention. Crazy Egg has a 30-day free trial; after that it charges per page tracked, with subscriptions for under $100/month if you find the information worth the cost. The images can really give webmasters an understanding of what users are doing on their site, and they are persuasive tools when redesigning a page or analyzing specific kinds of user behavior.

Core Concepts and Metrics of Web Analytics

Next, Tabby and Nina presented a basic list of terminology used in web analytics. Of course, different tools refer to the same concept by different names, but these were the terms we used throughout our session.

  • Visits – A visit is when someone comes to the site. A visit ends when the user has not viewed a new page in 30 minutes or has left the site.
  • Visitor Types: New & Returning – A cookie is used to determine whether a visitor has been to the site before. If a user disables cookies or clears them regularly, they will show up as a new visitor each time.
  • Unique Visitors – To distinguish visits by the same person, the cookie is used to track when that person returns to the site within a given period of time (hours, days, weeks, or more).
  • Page Views – More specific than “hits,” a page view is recorded when a page is loaded in a visitor’s browser.
  • User Technology – Information about the visitor’s operating system, browser version, mobile device or desktop computer, etc.
  • Geographic Data – A visitor’s location in the world, which can often be determined down to the city.
  • Entry and Exit Pages – The page the visitor sees first during their visit (entry) and the last page they see before leaving or before their session expires (exit).
  • Referral Sources – Did the visitor come from another site? If so, this tells us who is sending traffic to us.
  • Bounce Rate – A bounce is when someone comes to the site and views only one page before leaving.
  • Engagement Metrics – How engaged visitors are with our site, measured by the time they spend on it or the number of pages they view.

Goals/Conversion

Considering how often the terms “goals” and “conversions” are used, it is important to realize that in web analytics lingo a goal is a metric, also referred to as a conversion, that measures whether a desired action has occurred on our site. There are four primary types of conversions:

  1. URL Destination – A visitor has reached a targeted end page. For commercial sites, this would be the “Thank you for your purchase” page. For a library site, this is a little more challenging to pin down and will likely include several different pages or types of pages.
  2. Visit Duration – How much time a visitor spends on our site. This is often an ambiguous metric: if a user is on the site for a long time, we don’t know whether they were interrupted, had a hard time finding what they were looking for, or were enthralled with all the amazing information we provide and read every word twice.
  3. Pages per Visit – Indicates site engagement. As with visit duration, many pages may mean the user was interested in our content or that they were unable to find what they were looking for. We can distinguish these cases by looking at the “paths” of pages the visitor saw. As an example, we might want to know whether someone finds the page they were looking for in three pages or fewer.
  4. Events – Targets an action on the site. This can be almost anything and is often used to track outbound links or downloads such as PDFs.

Conversion rate is the percentage of visits in which the desired action occurs:

Conversion rate = Desired action / Total or Unique visits
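
To make the equation concrete with made-up numbers: if a goal page was reached during 150 out of 5,000 visits in a month, the conversion rate is 150 / 5,000 = 3%.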

Goal reports, also known as conversion reports, are provided by some tools and include the total number of conversions and the conversion rate. We learned that we can also assign a monetary value to a goal to take advantage of the more commerce-focused features of analytics software, but the results can be challenging to interpret. Conversion reports also show an abandonment rate: the people who leave our site before completing a goal. We can investigate this by creating a “funnel” that identifies the steps needed to complete the goal. The funnel report shows us where in those steps visitors drop off and how many make it all the way through to a conversion.

Key Performance Indicators (KPIs) were a focus of much of the pre-conference. They measure outcomes based on our site’s objectives and goals and are implemented via conversion rates. KPIs are unique to each site. Through examples, we learned that each organization’s web presence may be made up of multiple sites. For instance, an organization may have its main library pages, LibGuides, the catalog, a branch site, a set of sites for digitized collections, etc. A KPI may span activities on more than one of these sites.

Segment or Filter

We then discussed the similarities and differences between segments and filters, both of which offer methods for narrowing the data so that we can focus on a particular point of interest. The difference between the two is that (i) filtering removes data during the collection process, so the filtered-out data is lost, whereas (ii) segmentation merely hides data from a report, leaving it available for other reports. Generally, we felt that segments were preferable to filters in Google Analytics, given that it is impossible to recover data lost during GA’s real-time data collection.

We talked about the different kinds of segments some of us are using. For example, in Joel’s organization, he separates the staff computers in offices from the computers in the library branches by adding a query string to the homepage URL configured in the branch computers’ browsers. With this in place, he can create a segment in Google Analytics to view the activity of either group of users by segmenting on the different entry pages (with and without the special query string). Segmenting on IP address further separates his users into researchers and the general public.
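
To illustrate with hypothetical URLs (not Joel’s actual setup): branch machines might open http://library.example.edu/?loc=branch as the browser homepage, while office machines open the same page without the query string; an advanced segment matching landing pages that contain loc=branch then isolates the branch traffic.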

Benchmarking

As a step toward measuring success for our sites, we discussed benchmarking, which is used to compare the performance of our sites before and after a change. Having performance data from before a change is essential to knowing whether the change was successful, as defined by our goals and KPIs.

Comparing a site to itself, either in a prior iteration or before a change, is called internal benchmarking. Comparing a site to other similar sites on the Internet is known as external benchmarking. Since external benchmarking requires data for the comparison, we would need to request another website’s data or reports. An alternative is to use services such as Alexa, Quantcast, Hitwise, and others, which will do the comparison for us. Keep in mind that these may rely on e-commerce or other commercial indicators that do not make for a good comparison with humanities-oriented sites.

Event Tracking

Page views and visitor statistics are important for tracking how our site is doing, but sometimes we need to know about actions that aren’t tracked through the normal means. We learned that an event, both in the conceptual sense and in the analytics world, can be used to track actions that don’t naturally result in a page view. Events are used to track access to resources that aren’t web pages, such as videos, PDFs, dynamic page elements, and outbound links.

Tracking events doesn’t always come naturally and requires some effort to set up. Content management systems (CMSes) like Drupal help make event tracking easy, whether via a module or plugin or simply by editing a template or function that produces the HTML pages. If a website is not using a CMS, the webmaster will need to add event-tracking code to each link or action they wish to record in Google Analytics. Fortunately, as we saw, the event-tracking code is simple and easy to add to a site, and there is good documentation in Google’s Event Tracking Guide.
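
As a sketch of what this looks like with the classic ga.js syntax shown earlier (the element ID and the category/action/label strings are made-up example values, not anything Google requires):

    // Record clicks on a PDF link as a GA event (classic _trackEvent call).
    // Assumes the page contains a link with id="annual-report-link".
    var link = document.getElementById('annual-report-link');
    link.addEventListener('click', function () {
      // Parameters: category, action, label (all free-form strings).
      _gaq.push(['_trackEvent', 'Downloads', 'PDF', 'annual-report.pdf']);
    });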

Finally, we learned that tracking events is preferable to creating “fake” pageviews, since events do not inflate the statistics generated by visitors’ regular browsing activity.

Success for our websites

Much of the second half of the pre-conference focused on learning about, and doing exercises in, defining and measuring success for our sites. We started by understanding our site in terms of our users, our content, and our goals. These all point to the site’s purpose and circle back to the content our site delivers to users in order to meet our goals. It’s all interconnected. The following questions and steps helped us clarify the components we need to have in hand to develop a successful website.

Content Audit – Perform an inventory that lists every page on the site. This is likely to be tedious and time-consuming; it includes finding abandoned pages, lost images, etc. The web server is a great place to start identifying files, and sometimes automated web-crawling tools can find the pages on our site. Then we need to evaluate that content. Beyond the basic purpose of a page, consider recording the last-updated date, the bounce rate, the time on page, whether it is a landing page, and who is responsible for the content.

Identifying Related Sites – Create a list of the sites our site links to and the sites that link back to our site. Examples: the parent site (e.g., our organization’s overall homepage), databases, journals, the library catalog, a blog, Flickr, Twitter, Facebook, the Internet Archive, etc.

Who are our users? – What is our site’s intended audience, or audiences? For those of us at the pre-conference, this covered a variety of people: students, staff, the general public, collectors, adults, teens, parents, etc. Some of us may need a survey to determine this, and some populations of users (e.g., staff) might be identified via IP addresses. We were reminded that most sites serve one major set of users along with other, smaller groups. For example, students might be the primary users, with faculty and staff as secondary users.

Related Goals and Plans – Use existing planning documents, strategic goals, and the library’s mission statement to set a mission statement and/or goals for the website. Who are we going to help? Who is our audience? We must define why our site exists and its purpose on the web. Generally, we’ll have one primary purpose per site. Secondary purposes also help define what the site does; they fall under the “nice to have” category but are still very useful to our users. (For example, Amazon.com’s primary purpose is to sell products, but its secondary purposes include reviews, wishlists, ratings, etc.)

When we have a new service to promote, we can use analytics and goals to track how well that promotion is working. This is an ongoing expansion of the website and of the web analytics strategy. We were reminded to set goals that are practical, simple, and achievable. Priorities for what we monitor and promote can change from year to year.

Things to do right away

Nearing the end of the pre-conference, we discussed things we can do to improve our analytics in the near term. These are not necessarily quick to implement, but doing them will put us in a good position to start a web analytics strategy. It was mentioned that if we aren’t tracking our website’s usage at all, we should install something today to at least begin collecting data!

  1. Share what we are doing with our colleagues. Educate them at a high level so they know more about our decision-making process. Be proactive and share information; don’t wait to be asked what’s going on. This offers a sense of inclusion and transparency: what we do is not magic in any sense. We may also consider granting read-only access to people who are interested in seeing and playing with the statistics on their own.
  2. Set a schedule for pulling and analyzing our data and statistics. On a quarterly basis, report to staff on interesting findings: important metrics, fun things, anecdotes about what is happening on our site. Also check the goals we are tracking in analytics quarterly; do not “set and forget” them. On a monthly basis, report to IT staff on topics of concern, 404 pages, important values, and things that need attention.
  3. Test, analyze, edit, and repeat. This is an ongoing, long-term effort to keep improving our sites. During a site redesign, compare analytics data before and after the changes. Use analytics to make certain the changes we implement have a positive effect, and to drive changes to our site, not because it would be cool/fun/neat to do things a certain way. Remember that our site is meant to serve our users.
  4. Measure all content. Get tracking code installed across all of our sites. Google Analytics cross-domain tracking is tricky to set up, but once installed it will track users as they move between different servers, for example between our website, blog, OPAC, and other servers (see the sketch after this list). For things not under our control, be sure to at least track outbound links so we know when people leave our site.
  5. Measure all users. When reporting, segment users into groups as much as possible to understand their different habits.
  6. Look at top mobile content. Use that information to decide which parts of the site mobile users go to most often and focus on those.
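
As promised in item 4, here is a sketch of the classic ga.js cross-domain setup (placeholder property ID and element ID; consult Google’s own documentation for the authoritative version):

    // On every site being tracked, use the same property ID and allow the
    // linker so visitor cookies can be handed across domain boundaries.
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-X']);
    _gaq.push(['_setDomainName', 'none']);
    _gaq.push(['_setAllowLinker', true]);
    _gaq.push(['_trackPageview']);

    // On links leading to the other domain (e.g., the OPAC), pass the
    // visitor's cookies along. Assumes a link with id="opac-link".
    var opacLink = document.getElementById('opac-link');
    opacLink.addEventListener('click', function (e) {
      e.preventDefault();               // cancel normal navigation
      _gaq.push(['_link', this.href]);  // GA appends cookie values, then navigates
    });
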
Summary

Spending eight hours learning about a topic and how to apply it practically to our sites is a great way to get excited about taking on more responsibilities in our daily work. There is still a good deal of learning to be done, since much of the expertise in web analytics comes from taking the time to experiment with the data and settings.

We, Kelly and Joel, are looking forward to working with analytics from the ground up, so to speak. We are both at an early stage of redeploying our websites under new software, which allows us to take advantage of the most up-to-date analytics tools and techniques available to us. Additionally, our organizations, though different in their specific missions and goals, are entering a new round of long-term planning, the result of which will be a new set of goals for the next three to five years. It has become clear that the website is an important part of this planning and that the goals of our websites translate directly into the actions we take when configuring and using Google Analytics.

We both expect a learning curve in understanding and applying web analytics, and a set of long-term, ongoing tasks ahead of us. After this session, however, we are more confident about how to apply analytics effectively toward tracking and achieving our organizations’ goals and creating an effective and useful set of websites.

About our Guest Authors:

Kelly Sattler is a Digital Project Librarian and Head of Web Services at Michigan State University. She and her team are migrating the Libraries’ website to Drupal 7 and are analyzing its Google Analytics data, search terms, and chat logs to identify places where they can improve the site through usability studies. Kelly spent 12 years in information technology at a large electrical company before becoming a librarian and has a bachelor’s degree in computer engineering. She can be found on Twitter at @ksattler.

Joel Richard is the lead web developer for the Smithsonian Libraries in Washington, DC, and is currently rebuilding and migrating 15 years’ worth of content to Drupal 7. He has 18 years of experience in software development and Internet technology and is a confirmed Internet junkie. In his spare time, he is an enthusiastic proponent of Linked Open Data and believes it will change the way the Internet works. One day. He can be found on Twitter at @cajunjoel.

Big Type and Readability

The Big Type

Last week, Jeffrey Zeldman published a post explaining his choice of big type on his website/blog. If you are curious about how huge the type on his site is, see my screenshot below (or visit his site: http://zeldman.com). It is pretty big. Compare it to any other website, or to this site of mine. Yeah, the type is huge.

zeldman.com

He says people either love or hate the big type and the simple, minimalist layout of his site, or just spend time processing them. I found myself loving it because, hey, it was so fr**king easy to read without any other distractions on the site. As Zeldman himself says, “It’s over the top but not unusable nor, in my opinion, unbeautiful.” And in my opinion, being fully functional counts to a great degree in favor of beauty.

Readability

The strange satisfaction I felt while reading the articles on his site, set in that big type, led me to realize how hard it is to read the main content of most web pages. It is usually so hard that the first thing I do before reading any web page is to increase the font size in the browser (thereby also pushing the top navigation and everything else on either side of the main content out of sight). Sometimes I also use the ‘Print’ preview just to read, not to print anything, since this removes all the ads, images, etc. Also handy is a plugin like Readability. Zeldman’s site was the first where none of these actions was necessary.

The web design convention, with its must-have items such as top navigation, a header image, navigation on the left, and ads and numerous links on the right, forces us to strip out those very items by manually manipulating the browser just to make the main content readable! This is an irony fully appreciated by those who build and manage websites in particular. We (the universal “we” of web workers) follow the convention as something canonical because we want to build a website that is usable and pleasant to interact with. But while interacting with any such conventional site, our own behavior reveals that we try to eliminate those very canonical elements.

It’s not that we can or should eliminate all those conventional items right away; they are useful for various purposes. But no matter how useful they are, they are also great distractions from reading. On a website or page where reading is the primary activity, the readability of the content matters more than it does elsewhere. Zeldman’s big-type experiment would be simply bizarre if applied without modification to, say, the WSJ homepage. But it probably would not be a bad idea to apply it to an individual article page on the WSJ website.

Zeldman’s experimental design with the big type reminds me of what the Flipboard app does. (See the demo video below if you are not familiar with it.) Flipboard strips out elements that distract from reading and reformats the page in a way that is attractive and functional. Where design fails to help one read a web page, an app comes to the rescue.

Now you may ask how all this relates to libraries. My questions are: (a) how much of the main function of a library website is reading, and how much is not; (b) which parts of a library website are meant to be read, and which are not; and (c) how we can balance and facilitate these different uses of a library website. Rarely is a website designed solely for reading, but reading is almost always an important part of some section of a website. So this issue is worth thinking about, and it matters not only for library websites but for any website. Just asking these questions could be a good step toward making your web pages more usable.

In the next post, I will discuss how we read on the web and how to design and serve content for the web in a user-friendly manner.