Building a Data Dashboard

Dashboard web pages that display, at a glance, a wide range of information about a library’s operations are becoming common. These dashboards are made possible by the ubiquity of web-based information visualization tools combined with the ever-increasing availability of data sources. Naturally, libraries want to take advantage of these tools to provide insight alongside some “wow” factor.

Ball State Libraries' dashboard
Ball State University Libraries’ dashboard shows many high-level figures about a variety of services. One can learn more about a particular figure by clicking the “+” or “i” icons beside it.
TADL dashboard
The Traverse Area District Library dashboard has a few sections with different types of charts. Shown here are four line graphs displaying changes in various usage statistics over time.

I’ve assembled a list of several data dashboards; if you know of one that isn’t on the list, let me know in the comments so I can add it.

It’s difficult, though, to simply piece together a display from whatever information is immediately available. The design and data collection process largely determines the success of the dashboard. I don’t pretend to be an expert on these processes; rather, my own attempt at building a dashboard was a productive failure that helped to highlight where I went wrong.

Why do we want to build a dashboard?

As with any project, the starting point for building a dashboard is to identify what we’re trying to accomplish. Having a guiding idea will determine everything that follows, from the data we collect to how we choose to display them.

There are two primary goals for dashboards: marketing and success. One seeks to advertise the excellence of the library—perhaps to secure further funding, perhaps to raise its profile on campus—while the other aims at improved daily operations, however that may be defined. These are two terrifically broad categories but they create a useful distinction when building a dashboard.

Perhaps our goal is “to make a flashy dashboard, one that’ll make spectators rubberneck as they navigate the information superhighway.”

I’m not even being sarcastic. It’s easy to dismiss designs that are flashy to the detriment of their content. The web is rife with visualizations that leave their audiences with no improved understanding of the issue at hand. But there is also value to shiny sites that show the library can create an intriguing product. Furthermore, experimental visualizations that explore the possibilities of design can be a wonderful starting point. What appeared to be superficial glitz may unearth an important pattern. Several failed, bombastic charts may ultimately lead to a balanced display.

In general, flashy displays fall into the marketing category. They’re eye candy for the library, whether through the sheer size of the figures themselves (“look how many books we circulated last year! articles downloaded from our databases! reference queries we answered!”) or loud design.

Increasing the success of the library is a very different goal. What defines “success” is more subjective than marketing and as such the types of data included will vary greatly. Assessing services calls for timelines, cross-sections of data, and charts that render divergent data representing different aspects of a library comparable. Unlike marketing figures, the charts need not be easily interpreted. Simplification may unintentionally distort what’s going on, obscuring important but subtle trends. To quote from an In the Library with the Lead Pipe post from 2009:

Libraries collect a lot of data that encompass complex networks about how users navigate through online resources, which subjects circulate the most or the least, which resources are requested via interlibrary loan, visitation patterns over periods of time, reference queries, and usage statistics of online journals and databases. Making sense of these complex networks of use and need isn’t easy. But the relationships between use and need patterns can help libraries make hard decisions (say, about which journals to cut) and creative decisions to improve user experiences, outreach, achieve efficiencies, and enhance alignment with organizational goals.
— Hilary Davis, “Not Just Another Pretty Picture” 1

If an institution or department has a narrower goal in mind than marketing or assessment, chances are it still relates to one of the two. It’s also possible to build a dashboard that accomplishes both goals, providing insight for staff while thrilling external stakeholders. However, that’s no easy feat to accomplish. What external parties find surprising may be pedestrian to staff, and vice versa. Libraries provide simple, Google-esque search fields alongside triplicate advanced search forms for the same reason. Users shouldn’t need to know SQL to find a title; staff shouldn’t be limited to keyword searches for their administrative tasks. Appealing to such varied sensibilities at once is an onerous task best avoided if possible.

Choosing Our Data Sources

The ultimate goal of a data dashboard will greatly affect what data sources are sensible. Reviewing the dashboards of other libraries, a few data sources are repeated across them:

  • Checkouts and renewals
  • Print volumes and other material holdings
  • Inter-library loan
  • Gate counts
  • Computer use
  • Reference questions

These data are commonly available, but they don’t capture the unique value that libraries provide. We can use them as a starting point but should keep in mind that our dashboard’s purpose determines what information is pertinent. The mere fact that a figure is readily available in our ILS isn’t sufficient reason to include it. On the other hand, we may find our display would benefit greatly from a figure that isn’t available, thus influencing how we collect and analyze data going forward.

Marketing displays must be persuasive. The information in them should be positive, surprising, and understandable to a broad audience.

Total circulation, article/ebook downloads, and new items digitized all might be starting points. The breadth of a library’s offerings is often surprising to external parties, too. Show circulation across many subject areas; attempt to outline the impact of the library’s many services. For instance, if our library has an active social media presence we could show our various accounts and a measure of popularity for each (followers, favorites, and the like, translated across the schemas of the specific networks).

Simply pilfering all the dashboard’s data from the typical year-end reports submitted to organizations like the National Center for Education Statistics, ALA, or our accrediting bodies is a terrible idea for marketing displays. These numbers are less important outside of the long-term, cross-institutional trends they make visible. Our students, faculty, or funders probably do not care about the exact quantity of our microfilm holdings. Since our dashboard isn’t held to any reporting standards, this is our chance to show what’s new, what those statistical warehouse organizations do not yet collect.

Marketing displays should focus on aggregate numbers that look impressive in isolation and do not require much context to apprehend. Different categories need not be comparable or normalized; each figure can have its own type of chart and style as long as the design is visually appealing. Marketing isn’t meant to generate insight, so highlighting patterns across different data sources isn’t necessary. All of the screenshot examples at the beginning of this post represent marketing-oriented dashboards because of the impressive nature of the figures presented as well as their relative lack of informative depth.

I’ll be honest, presenting data for marketing purposes makes me feel conflicted. It’s easy to skirt the truth since one’s incentives lie in producing ever more stunning figures, not in representing reality. If there is serious dysfunction within a library, marketing displays won’t disclose it. Nonetheless, there’s value to these efforts. I wouldn’t be in this profession if I didn’t believe that libraries produce an astonishing amount of value for their host institutions, through a wide range of services. Demonstrating that value is vital.

A dashboard meant to improve library success should answer pressing questions about collections and services. Its investigations might be specific and change each year, with the data sources changing accordingly: one year we aim at improving information literacy instruction, another at increasing access to electronic resources. The dashboard can differ from year to year if it’s meant to aid a focused study. Trying to represent everything a library does at once leads to clutter, information overload, and time spent culling statistics no one cares about.

For a nice end-to-end discussion of building a data warehouse and hooking it up to dashboard charts, see “Trends at a Glance: A Management Dashboard of Library Statistics” (open access via ITAL) by Emily G. Morton-Owens and Karen L. Hanson. They cover in depth the creation of a dashboard meant to give an overview of library operations, drawing on a broad variety of data sources, including external ones like Google Analytics as well as a local inter-library loan database.

GVSU Libraries status
A dashboard shows that a few minor issues affect various systems at GVSU.

While it’s not displaying the sorts of data this post discusses, the Grand Valley State University Libraries’ Status page is an example of a good dashboard that aims at improving service. Issues affecting several areas are brought together on one page, making it possible to gain a quick overview of the library’s operations in seconds. Just because GVSU Libraries Status is not a marketing display doesn’t mean design doesn’t matter; bright, easily scannable colors make the information more digestible. The familiar red-for-error, green-for-success color scheme is well applied.

There are some attributes that influence data choices regardless of the goal of our dashboard. Data sources that are easy to access and reuse will naturally be better selections than cumbersome ones. For instance, while COUNTER usage statistics for vendor databases can be both informative and persuasive, they’re often painful to collect. One must log into a number of separate administrative sites, download a CSV (possibly multiple CSVs if we need information from separate reports), trim useless rows, and combine into one master file. This process is difficult to automate and the SUSHI standard meant to address this labor has no viable software implementations. 2
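To make the pain concrete, here is a sketch of the “combine into one master file” step, assuming each downloaded report starts with a couple of metadata rows to skip and that all reports share one column layout (both simplifying assumptions; real COUNTER exports vary by vendor and report type):

```javascript
// Merge several COUNTER-style CSV exports into one table.
// Assumptions (not from the COUNTER spec): each report begins with
// `headerRows` rows of metadata to discard, followed by one row of
// column names, then data rows sharing the same column layout.
function mergeReports(reportTexts, headerRows) {
  let columns = null;
  const rows = [];
  for (const text of reportTexts) {
    const lines = text.trim().split('\n').slice(headerRows);
    const header = lines.shift();
    if (columns === null) columns = header; // keep only the first header row
    rows.push(...lines);
  }
  return [columns, ...rows].join('\n');
}
```

In practice we’d read each CSV with fs.readFileSync, pass the texts in, and write the merged output back to disk; reconciling differing column orders between vendors remains the hard, manual part.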

On the other hand, many modern third-party services offer robust APIs that make retrieving data simple. For the dashboard I built, some of the easiest pieces came from LibGuides, YouTube, and WordPress because all three surfaced usage information. Since I wanted to show the breadth of web services we were using and their popularity, it was easy to set up some simple API calls for our most popular guides, videos, and blog posts. 3
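The client-side pattern is the same for each such service: fetch a JSON list of items with usage counts, then pick the most popular. A minimal sketch, assuming each response has already been mapped into objects with title and views properties (an assumed shape, not any service’s actual schema):

```javascript
// Given a service's usage data (shape assumed for illustration --
// each item has a title and a view count), pick the top N most popular.
function topItems(items, n) {
  return items
    .slice() // copy so we don't mutate the caller's array
    .sort((a, b) => b.views - a.views)
    .slice(0, n)
    .map((item) => item.title);
}
```

Calling topItems(guideData, 5) would yield the five most-viewed guides, and likewise for videos and blog posts.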

Building the Dashboard Pt. I: Data Stores

Once we’ve chosen our data sources—ignoring, for the sake of brevity, the important part where we get staff buy-in and ensure our data quality is sufficient—we’ll need a central data store to power our dashboard. This can be simpler than it sounds: for instance, a single relational database with tables for the various data sources, imported on a periodic basis. I’ve even used a Google Spreadsheet as a data store; since the spreadsheet is already online, its data can be accessed through an API or even downloaded as a CSV.

Even if we rely on third-party APIs for much of our data, we may want to download the data into a data store. Having a full copy of our library’s information has many benefits: we don’t need to worry about service outages or sudden discontinuation, we can add summative values which may not be present in the original API, and we can reduce the total number of databases the dashboard relies upon by keeping everything centralized.

Once we have a data store, we can use it to power an API and a site that consumes the API, known as a client. At a high level, our client-API interaction looks like:

  • the client passes parameters in the URL letting the API know what kind of data it wants
  • the API receives the request, parses the parameters (in PHP this is done with the superglobal $_GET associative array, for instance)
  • the API queries a database with those parameters
  • the API takes the query response and formats it as JSON, sending it along to our client (formatting as JSON is common enough that most languages have a built-in function for it) 4
  • the client consumes the JSON and outputs charts, figures, etc.

A more code-like example of this process:

a GET HTTP request for example.com/api.php?month=02&dataset=reference asks for reference data from the month of February

the API performs query like SELECT * FROM reference WHERE month LIKE 'february' on the database

the API takes the query results and formats them into a JSON response like:

{
 "reference questions": 404,
 "directional questions": 808,
 "computer questions": 10101,
 "ready reference": 42,
 "research consultations": 42
}
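Put together in JavaScript (a PHP API would read $_GET where this uses URLSearchParams), the request-to-JSON step looks roughly like the sketch below; the in-memory “database” and its query function are stand-ins for a real SQL query:

```javascript
// Stand-in for `SELECT * FROM reference WHERE month LIKE 'february'`.
// The nested object and its figures are hypothetical illustration data.
function queryDatabase(month, dataset) {
  const db = {
    reference: {
      '02': { 'reference questions': 404, 'directional questions': 808 },
    },
  };
  return (db[dataset] || {})[month] || null;
}

// Parse the query string, run the "query," and encode the result as JSON
// (most languages have a built-in function for this last step).
function apiResponse(queryString) {
  const params = new URLSearchParams(queryString);
  const result = queryDatabase(params.get('month'), params.get('dataset'));
  return JSON.stringify(result);
}
```

A real api.php or Node server would run the SELECT against its reference table and pass the resulting rows to its JSON encoder in the same way.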

This is a basic example of library-specific data that probably lives in a database we control. However, depending on how many third-party APIs we choose to rely upon, the necessary flexibility of the client varies. If we’re normalizing and storing everything in our data store, the front end can be simple since the data arrives in a pre-processed, predictable format. For instance, aggregation operations like sums, maximums, and averages can be baked into a well-designed, comprehensive dashboard API such that the client only needs to know what URL to request. On the other end, a sophisticated client might consume multiple sources of data, normalize them into a common profile, and then run summative calculations itself. Both approaches have their relative advantages depending upon the situation.
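Those summative calculations are small either way. For instance, a client that has normalized every source into records with a label and a value (that shape is my assumption, not a standard) can compute them itself:

```javascript
// Summative operations over normalized { label, value } records,
// the same sums/maximums/averages a server-side API might bake in.
function summarize(records) {
  const values = records.map((r) => r.value);
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    sum: sum,
    max: Math.max(...values),
    mean: sum / values.length,
  };
}
```

The trade-off is mostly about where the complexity lives: in the API, so every client stays thin, or in the client, so the API can stay a dumb pipe.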

Building the Dashboard Pt. II: Charting

There are numerous choices for building graphs and charts on the web. This post will not attempt to list them. Instead, I have two recommendations. First, our front end shouldn’t be Flash. Adobe Flash doesn’t work on the mobile web and, at this point, JavaScript has a tremendous number of data visualization libraries. Second, we should use D3 or something that builds on top of D3.

While there are a lot of approaches to data visualization in JavaScript, D3 has a large user community, tremendous flexibility (it can be used for visualizations as varied as timelines, bar charts, and word clouds), and a nice design based on chained function calls that will be familiar to jQuery users.

While D3 is a knockout choice, we can’t just plug data in, pick a chart type, and watch the output instantly appear as in Excel or Numbers. D3 is more of a framework for working with data on the web, and it takes a lot of learning and code to produce high-quality charts. Unless we need that flexibility, it’s best to use a library that provides an easy-to-use wrapper around D3. Here are several choices that abstract away common difficulties:

  • C3 makes creating D3 charts full of nice features and best practices as simple as passing data to c3.generate()
  • reD3 is a set of reusable D3 charts
  • The D3 Gallery shows what the framework can do with imitable example code
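To give a sense of how thin these wrappers are, here’s a sketch of a C3 line chart. c3.generate and the columns data format are C3’s documented API; the element ID and the circulation figures are invented for illustration:

```javascript
// A C3 chart is one function call once the configuration object is built.
var config = {
  bindto: '#circulation-chart', // hypothetical element on our dashboard page
  data: {
    // Each column is a series: its label followed by its values.
    columns: [
      ['checkouts', 3200, 2950, 3400, 3100],
      ['renewals', 800, 720, 910, 850],
    ],
    type: 'line',
  },
};
// In the browser, with d3.js and c3.js loaded:
if (typeof c3 !== 'undefined') {
  c3.generate(config);
}
```

Compare that to hand-writing the equivalent D3: scales, axes, line generators, and an SVG container, each its own block of code.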

Raw is another interesting option. It’s browser-based charting software that works more like spreadsheet software: plug in our data, choose a chart type, and then take the output that’s produced (using D3). If our charts don’t change often then we don’t need to build a sophisticated client-server setup; we can take our data and use Raw to produce a web-friendly chart, then repeat every year or as often as the dashboard needs to be updated.

When I made a dashboard, I didn’t use a wrapper around D3 and instead ended up building my own, a creaky abstraction for making pie charts with a few built-in options. While it was fun, I spent time making something not on par with what was already available.

Conclusion

Figure out why you want a dashboard, choose data sources appropriate to your purposes, build a data store with an API, use D3 to display web-based charts. Easy, right?

That sounds like numerous increasingly sophisticated steps, but I think each one also offers value in its own right. If you can’t define the goals of the data you’re collecting, why exactly are you gathering them? If you’re not aggregating your data in one place, aren’t they just a messy hodgepodge that everyone dreads negotiating come annual report time? Dashboards are shiny. Shiny things are nice. But the reflective processes behind building a data dashboard are perhaps more valuable than the final product.

Notes

  1. This post also recommends the book Information Dashboard Design by Stephen Few which is probably an excellent resource for those who want to learn more about the topic.
  2. I’d love to hear from anyone who has found a good open source SUSHI client. I’ve been looking for one for years now and found a number of broken options but nothing I was capable of setting up. There are a few commercial SUSHI products that ease this pain significantly for those libraries that can afford them.
  3. I haven’t seen many good explanations of what an API is; the acronym stands for “Application Programming Interface” which I find utterly nondescript. APIs expose information about web services such that they can be easily processed by programs. They fall somewhere in between having direct access to query a database and only being able to see an HTML page intended for human consumption.
  4. There are non-JSON APIs—XML is common—but JSON is becoming a de facto standard due to its light weight and the ease with which JavaScript and other languages can parse it.

2 thoughts on “Building a Data Dashboard”

    1. Looking at Ball State’s dashboard, I’m not sure if they used any particular product. The dashboard is put together more or less the way I describe; I can see them using the jQuery JavaScript library to call an API of sorts hosted on their own server. The API provides different data as preformatted HTML for different pieces of the dashboard. I’m guessing their web team assembled this from available data sources, though the surest way to find out would be to get in touch with them directly.
