Two Free Methods for Sharing Google Analytics Data Visualizations on a Public Dashboard

At this point, Google Analytics is arguably the standard way to track website usage data.  It's easy to implement, but powerful enough to capture both broad general usage trends and very granular patterns.  However, it is not immediately obvious how to share Google Analytics data with internal staff or external stakeholders, both of whom often want up-to-date metrics about library web property performance.  While Google Analytics administrators can grant access to individual Google account-holders through the Google Analytics Admin screen, publishing data without requiring authentication takes some intermediary steps.

There are two main free methods of publishing Google Analytics visualizations to the web, and both involve using the Google Analytics API: OOCharts and the Google Analytics superProxy.  Both methods rely upon an intermediary service to retrieve data from the API and cache it, which improves retrieval time and helps avoid exceeding the API's request limits.1  The first method, OOCharts, requires much less time to get running initially.  However, OOCharts' long-standing beta status and its position as a stand-alone service give it less potential for long-term support than the second method, the Google Analytics superProxy.  For that reason, while OOCharts is certainly easier to set up, the superProxy method is worth the extra investment in time, depending on your needs.  I'll cover both methods.

OOCharts Beta

OOCharts’ service facilitates the creation and storage of Google Analytics API Keys, which are required for sending secure requests to the API in order to retrieve data.

When setting up an OOCharts account, create your account utilizing the email address you use to access Google Analytics.  For example, if you log into Google Analytics using admin@mylibrary.org, I suggest you use this email address to sign up for OOCharts.  After creating an account with OOCharts, you will be directed to Google Analytics to authorize the OOCharts service to access your Google Analytics data on your behalf.

Let OOCharts gather Google Analytics Data on your Behalf.

After authorizing the service, you will be able to generate an API key for any of the properties to which your Google Analytics account has access.

In the OOCharts interface, click your name in the upper right corner, and then click inside the Add API Key field to see a list of available Analytics Properties from your account.


Once you've selected a property, OOCharts will generate a key for your OOCharts application.  When you go back to the list of API Keys, you'll see your keys along with property IDs (shown in brackets after the URL of your properties, e.g., [9999999]).


Your API Keys now show your key values and your Google Analytics property IDs, both of which you’ll need to create visualizations with the OOCharts library.


Creating Visualizations with the OOCharts JavaScript Library

OOCharts appears to have started as a simple chart library for visualizing data returned from the Google Analytics API. After you have set up your OOCharts account, download the front-end JavaScript library (available at http://docs.oocharts.com/) and upload it to your server or localhost.  Navigate to the /examples directory and locate the 'timeline.html' file.  This file contains a simplified example that displays web visits over time in Google Analytics' familiar timeline format.

The code for this page is very simple, and contains two methods – JavaScript-only and with HTML Attributes – for creating OOCharts visualizations.  Below, I've separated out the required elements for both methods.  While either method will work on its own, using HTML attributes allows for additional customization and styling:

JavaScript-Only
		<h3>With JS</h3>
		<div id='chart'></div>
		
		<script src='../oocharts.js'></script>
		<script type="text/javascript">

			window.onload = function(){

				oo.setAPIKey("{{ YOUR API KEY }}");

				oo.load(function(){

					var timeline = new oo.Timeline("{{ YOUR PROFILE ID }}", "180d");

					timeline.addMetric("ga:visits", "Visits");

					timeline.addMetric("ga:newVisits", "New Visits");

					timeline.draw('chart');

				});
			};

		</script>
With HTML Attributes:
<h3>With HTML Attributes</h3>
	<div data-oochart='timeline' 
        data-oochart-metrics='ga:visits,Visits,ga:newVisits,New Visits' 
        data-oochart-start-date='180d' 
        data-oochart-profile='{{ YOUR PROFILE ID }}'></div>
		
			<script src='../oocharts.js'></script>
			<script type="text/javascript">

			window.onload = function(){

				oo.setAPIKey("{{ YOUR API KEY }}");

				oo.load(function(){
				});
			};

		</script>


For either method, plug in {{ YOUR API KEY }} where indicated with the API key generated in OOCharts, and replace {{ YOUR PROFILE ID }} with the associated eight-digit profile ID.  Load the page in your browser, and you get this:

With the API Key and Profile ID in place, the timeline.html example looks like this. In this example I also adjusted the date parameter (30d by default) to 180d for more data.

This example shows you two formats for the chart – one that is driven solely by JavaScript, and another that can be customized using HTML attributes.  For example, you could modify the <div> tag to include a style attribute or CSS class to change the width of the chart, e.g.:

<h3>With HTML Attributes</h3>

<div style="width:400px" data-oochart='timeline' 
data-oochart-start-date='180d' 
data-oochart-metrics='ga:visits,Visits,ga:newVisits,New Visits' 
data-oochart-profile='{{ YOUR PROFILE ID }}'></div>

Here's the same timeline.html file showing both the JavaScript-only format and the HTML-attributes format, now with a bit of styling on the HTML-attributes chart to make it smaller:

You can use styling to adjust the HTML attributes example.

Easy, right?  So what’s the catch?

OOCharts only allows 10,000 requests a month, which is even easier to exceed than the Google Analytics API's own limit of 50,000 requests per day.  Each time your page loads, you use another request.  Perhaps more importantly, your Analytics API key and profile ID are out there for the world to see if they view your page source, because those values are stored in your client-side JavaScript.2  If you're making a private intranet for your library staff, that's probably not a big deal; but if you want to publish your dashboard fully to the public, you'll want to make sure those values are secure.  You can do that with the Google Analytics superProxy.

Google Analytics superProxy

In 2013, Google released the Google Analytics superProxy, a method of accessing Google Analytics API data that doesn't require end-users to authenticate in order to view it.  Much like OOCharts, the superProxy acts as a query engine that retrieves Google Analytics statistics through the Google Analytics Core Reporting API, caches the statistics in a separate web application, and displays Google Analytics data to end users without requiring individual authentication. Caching the data has the additional benefit of ensuring that your application will not exceed the Google Core Reporting API limit of 50,000 requests each day. The superProxy can be set up to refresh a limited number of times per day, and most dashboard applications only need a daily refresh of data to stay current.
The required elements of this method are available on the superProxy Github page (Google Analytics, “Google Analytics superProxy”). There are four major parts to the setup of the superProxy:

  1. Setting up Google App Engine hosting,
  2. Preparing the development environment,
  3. Configuring and deploying the superProxy to Google’s App Engine Appspot host; and
  4. Writing and scheduling queries that will be used to populate your dashboard.
Set up Google App Engine hosting

First, use your Google Analytics account credentials to access the Google App Engine at https://appengine.google.com/start. The superProxy application you create will be hosted for free by the Google App Engine. Create your application and designate an Application Identifier, which will serve as the endpoint domain for queries to your Google Analytics data (e.g., mylibrarycharts.appspot.com).

Create your App Engine Application

You can leave the default authentication option, Open to All Google Users, selected.  This setting only reflects access to your App Engine administrative screen and does not affect end-users' ability to view the dashboard charts you create.  Only those Google users who have been authorized to access Google Analytics data will be able to access any Google Analytics information through the Google App Engine.

Ensure that API access to Google Analytics is turned on under the Services pane of the Google Developer’s Console. Under APIs and Auth for your project, visit the APIs menu and ensure that the Analytics API is turned on.

Turn on the Google Analytics API.  Make sure the name under the Projects label in the upper left corner is the same as your newly created Google App Engine project (e.g., mylibrarycharts).


Then visit the Credentials menu to set up an OAuth 2.0 Client ID. Set the Authorized JavaScript Origins value to your appspot domain (e.g., http://mylibrarycharts.appspot.com). Use the same value for the Authorized Redirect URI, but add /admin/auth to the end (e.g., http://mylibrarycharts.appspot.com/admin/auth). Note the OAuth Client ID, OAuth Client Secret, and OAuth Redirect URI that are stored here, as you will need to reference them later before you deploy your superProxy application to the Google App Engine.

Finally, visit the Consent Screen menu and choose an email address (such as your Google account email address), fill in the product name field with your Application Identifier (e.g., mylibrarycharts) and save your settings. If you do not include these settings, you may experience errors when accessing your superProxy application admin menu.

Prepare the Development Environment

In order to configure the superProxy and deploy it to Google App Engine, you will need Python 2.7 installed as well as the Google App Engine Launcher (a.k.a. the Google App Engine SDK).  Python just needs to be installed for the App Engine Launcher to run; don't worry, no Python coding is required.

Configure and Deploy the superProxy

The superProxy application is available from the superProxy Github page. Download the .zip file and extract it onto your computer in a location you can easily reference (e.g., C:/Users/yourname/Desktop/superproxy or /Applications/superproxy). Use a text editor such as Notepad or Notepad++ to edit src/app.yaml to include your Application ID (e.g., mylibrarycharts). Then edit src/config.py to include the OAuth Client ID, OAuth Client Secret, and the OAuth Redirect URI that were generated when you created the Client ID in the Google Developer's Console under the Credentials menu. Detailed instructions for editing these files are available on the superProxy Github page.

After you have edited and saved src/app.yaml and src/config.py, open the Google App Engine Launcher previously downloaded. Go to File > Add Existing Application. In the dialogue box that appears, browse to the location of your superProxy app’s /src directory.

To upload your superProxy application, use the Google App Engine Launcher and browse to the /src directory where you saved and configured your superProxy application.

Click Add, then click the Deploy button in the upper right corner of the App Engine Launcher. You may be asked to log into your Google account, and a log console may appear informing you of the deployment process. When deployment has finished, you should be able to access your superProxy application’s Admin screen at http://[yourapplicationID].appspot.com/admin, replacing [yourapplicationID] with your Application Identifier.

Creating superProxy Queries

SuperProxy queries request data from your Google Analytics account and return it to the superProxy application. The returned data is made available at an end-point that can be used to populate charts, graphs, or other data visualizations. Most data available to you through the Google Analytics native interface is available through superProxy queries.

An easy way to get started with building a query is to visit the Google Analytics Query Explorer. You will need to login with your Google Analytics account to use the Query Explorer. This tool allows you to build an example query for the Core Reporting API, which is the same API service that your superProxy application will be using.


Running example queries through the Google Analytics Query Explorer can help you to identify the metrics and dimensions you would like to use in superProxy queries. Be sure to note the metrics and dimensions you use, and also be sure to note the ids value that is populated for you when using the API Explorer.


When experimenting with the Google Analytics Query explorer, make note of all the elements you use in your query. For example, to create a query that retrieves the number of users that visited your site between July 4th and July 18th 2014, you will need to select your Google Account, Property and View from the drop-down menus, and then build a query with the following parameters:

  • ids = this is a number (usually 8 digits) that will be automatically populated for you when you choose your Google Analytics Account, Property and View. The ids value is your property ID, and you will need this value later when building your superProxy query.
  • dimensions = ga:browser
  • metrics = ga:users
  • start-date = 2014-07-04
  • end-date = 2014-07-18

You can set the max-results value to limit the number of results returned. For queries that could potentially have thousands of results (such as individual search terms entered by users), limiting to the top 10 or 50 results will retrieve data more quickly. Clicking on any of the fields will generate a menu from which you can select available options. Click Get Data to retrieve Google Analytics data and verify that your query works.

Successful Google Analytics Query Explorer query result showing visits by browser.

After building a successful query, you can replicate the query in your superProxy application. Return to your superProxy application’s admin page (e.g., http://[yourapplicationid].appspot.com/admin) and select Create Query. Name your query something to make it easy to identify later (e.g., Users by Browser). The Refresh Interval refers to how often you want the superProxy to retrieve fresh data from Google Analytics. For most queries, a daily refresh of the data will be sufficient, so if you are unsure, set the refresh interval to 86400. This will refresh your data every 86400 seconds, or once per day.

Create a superProxy Query

We can reuse all of the elements of queries built using the Google Analytics Query Explorer to build the superProxy query's Encoded URI.  Here is an example of an Encoded URI that queries the number of users (organized by browser) that have visited a web property in the last 30 days; the {today} and {30daysago} placeholders are relative dates, so the query window rolls forward automatically (you'll need to enter your own profile ID in the ids value for this to work):

https://www.googleapis.com/analytics/v3/data/ga?ids=ga:99999991
&metrics=ga:users&dimensions=ga:browser&max-results=10
&start-date={30daysago}&end-date={today}

Before saving, be sure to run Test Query to see a preview of the kind of data that your query returns. A successful query returns a JSON response, e.g.:

{"kind": "analytics#gaData", "rows": 
[["Amazon Silk", "8"], 
["Android Browser", "36"], 
["Chrome", "1456"], 
["Firefox", "1018"], 
["IE with Chrome Frame", "1"], 
["Internet Explorer", "899"], 
["Maxthon", "2"], 
["Opera", "7"], 
["Opera Mini", "2"], 
["Safari", "940"]], 
"containsSampledData": false, 
"totalsForAllResults": {"ga:users": "4398"}, 
"id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:84099180&dimensions=ga:browser&metrics=ga:users&start-date=2014-06-29&end-date=2014-07-29&max-results=10", 
"itemsPerPage": 10, "nextLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:84099180&dimensions=ga:browser&metrics=ga:users&start-date=2014-07-04&end-date=2014-07-18&start-index=11&max-results=10", 
"totalResults": 13, "query": {"max-results": 10, "dimensions": "ga:browser", "start-date": "2014-06-29", "start-index": 1, "ids": "ga:84099180", "metrics": ["ga:users"], "end-date": "2014-07-18"}, 
"profileInfo": {"webPropertyId": "UA-9999999-9", "internalWebPropertyId": "99999999", "tableId": "ga:84099180", "profileId": "9999999", "profileName": "Library Chart", "accountId": "99999999"}, 
"columnHeaders": [{"dataType": "STRING", "columnType": "DIMENSION", "name": "ga:browser"}, {"dataType": "INTEGER", "columnType": "METRIC", "name": "ga:users"}], 
"selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:84099180&dimensions=ga:browser&metrics=ga:users&start-date=2014-06-29&end-date=2014-07-29&max-results=10"}

Once you've tested a successful query, save it; saving makes the JSON response accessible to an application that can visualize the data. After saving, you will be directed to the management screen for your query, where you will need to click Activate Endpoint to begin publishing the results of the query in a retrievable form. Then click Start Scheduling so that the query data is refreshed on the schedule you determined when you built the query (e.g., once a day). Finally, click Refresh Data to return data for the first time so that you can start interacting with the data returned from your query. Return to your superProxy application's Admin page, where you will be able to manage your query and locate the public end-point needed to create a chart visualization.

Using the Google Visualization API to visualize Google Analytics data

Included with the superProxy .zip file downloaded to your computer from the Github repository is a sample .html page located under /samples/superproxy-demo.html. This file uses the Google Visualization API to generate two pie charts from data returned from superProxy queries. The Google Visualization API is a service that can ingest raw data (such as json arrays that are returned by the superProxy) and generate visual charts and graphs. Save superproxy-demo.html onto a web server or onto your computer’s localhost.  We’ll set up the first pie chart to use the data from the Users by Browser query saved in your superProxy app.

Open superproxy-demo.html and locate this section:

var browserWrapper = new google.visualization.ChartWrapper({
  // Example Browser Share Query
  "containerId": "browser",
  // Example URL: http://your-application-id.appspot.com/query?id=QUERY_ID&format=data-table-response
  "dataSourceUrl": "REPLACE WITH Google Analytics superProxy PUBLIC URL, DATA TABLE RESPONSE FORMAT",
  "refreshInterval": REPLACE_WITH_A_TIME_INTERVAL,
  "chartType": "PieChart",
  "options": {
    "showRowNumber" : true,
    "width": 630,
    "height": 440,
    "is3D": true,
    "title": "REPLACE WITH TITLE"
  }
});

Three values need to be modified to create a pie chart visualization (a filled-in sketch follows this list):

  • dataSourceUrl: This value is the public end-point of the superProxy query you have created. To get this value, navigate to your superProxy admin page and click Manage Query on the Users by Browser query you created. On this page, right-click the DataTable (JSON Response) link and copy the URL (Figure 8). Paste the copied URL into superproxy-demo.html, replacing the text REPLACE WITH Google Analytics superProxy PUBLIC URL, DATA TABLE RESPONSE FORMAT. Leave the quotes around the pasted URL.
Right-click the DataTable (JSON Response) link and copy the URL to your clipboard. The copied link will serve as the dataSourceUrl value in superproxy-demo.html.

  • refreshInterval: you can leave this the same as the Refresh Interval of your superProxy query (in seconds), e.g., 86400.
  • title: this is the title that will appear above your pie chart, and it should describe the data your users are looking at, e.g., Users by Browser.
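Here is a minimal sketch of what that block might look like with the three values filled in, assuming the mylibrarycharts Application Identifier from earlier; QUERY_ID is a placeholder for the ID in the end-point URL you copied:

var browserWrapper = new google.visualization.ChartWrapper({
  "containerId": "browser",
  // QUERY_ID is a placeholder; paste the DataTable (JSON Response) URL you copied
  "dataSourceUrl": "http://mylibrarycharts.appspot.com/query?id=QUERY_ID&format=data-table-response",
  "refreshInterval": 86400,
  "chartType": "PieChart",
  "options": {
    "showRowNumber" : true,
    "width": 630,
    "height": 440,
    "is3D": true,
    "title": "Users by Browser"
  }
});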

Save the modified file to your server or local development environment, and load the saved page in a browser.  You should see a rather lovely pie chart:

Your pie chart's data will refresh automatically based upon the Refresh Interval you specify in your superProxy query and your page's JavaScript parameters.

That probably seemed like a lot of work just to make a pie chart.  But now that your app is set up, making new charts from your Google Analytics data just involves visiting your App Engine site, scheduling a new query, and referencing it with the Google Visualization API.  To me, the Google superProxy method has three distinct advantages over the simpler OOCharts method:

  • Security – Users won’t be able to view your API Keys by viewing the source of your dashboard’s web page
  • Stability – OOCharts might not be around forever.  For that matter, Google’s free App Engine service might not be around forever, but betting on Google is [mostly] a safe bet
  • Flexibility – You can create a huge range of queries, and test them out easily using the API explorer, and the Google Visualization API has extensive documentation and a fairly active user group from whom to gather advice and examples.


Notes

  1. There is a 50,000 request per day limit on the Analytics API.  That sounds like a lot, but it’s surprisingly easy to exceed. Consider creating a dashboard with 10 charts, each making a call to the Analytics API.  Without a service that caches the data, the data is refreshed every time a user loads a page.  After just 5,000 visits to the page (which makes 10 API calls – one for each chart – each time the page is loaded), the API limit is exceeded:  5,000 page loads x 10 calls per page = 50,000 API requests.
  2. You can use pre-built OOCharts Queries – https://app.oocharts.com/mc/query/list – to hide your profile ID (but not your API Key). There are many ways to minify and obfuscate client-side JavaScript to make it harder to read, but it's still pretty much accessible to anyone who wants to get it.

Query a Google Spreadsheet like a Database with Google Visualization API Query Language

Libraries make much use of spreadsheets. Spreadsheets are easy to create, and most library staff are familiar with how to use them. But they can quickly get unwieldy as more and more data are entered. The more rows and columns a spreadsheet has, the more difficult it is to browse and quickly identify specific information. Creating a searchable web application with a database at the back end is a good solution, since it lets users quickly perform a custom search and filter out unnecessary information. But due to the staff time and expertise required, creating a full-fledged searchable web database application is not always a feasible option at many libraries.

Creating a custom MS Access database or using a free service such as Zoho can be an alternative to creating a searchable web database application. But providing a read-only view of an MS Access database, while possible, can be tricky. MS Access is also installed locally on each PC, so it is not necessarily available to library staff when they are away from the work PCs on which it is installed. Zoho Creator offers a way to easily convert a spreadsheet into a database, but its free tier is very limited: a maximum of 3 users, 1,000 records, and 200 MB of storage.

Google Visualization API Query Language provides a quick and easy way to query a Google spreadsheet and return and display a selective set of data without actually converting the spreadsheet into a database. You can display the query result as an HTML table, which can be served as a stand-alone webpage. All you have to do is construct a custom URL.

A free Google spreadsheet has limits on size and complexity. For example, one free Google spreadsheet can have no more than 400,000 total cells. But you can purchase more Google Drive storage, and you can also query multiple Google spreadsheets (or even your own custom databases) by using Google Visualization API Query Language and the Google Chart Libraries together.  (This will be the topic of my next post. You can also see examples of using Google Chart Libraries and Google Visualization API Query Language together in my presentation slides at the end of this post.)

In this post, I will explain the parameters of Google Visualization API Query Language and how to construct a custom URL that will query, return, and display a selective set of data in the form of an HTML page.

A. Display a Google Spreadsheet as an HTML page

The first step is to identify the URL of the Google spreadsheet of your choice.

The URL below opens up the third sheet (Sheet 3) of a specific Google spreadsheet. There are two parameters inside the URL you need to pay attention to: key and gid.

https://docs.google.com/spreadsheet/ccc?key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&usp=drive_web#gid=2

This breaks down the parameters in a way that is easier to view:

  • https://docs.google.com/spreadsheet/ccc
    ?key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE
    &usp=drive_web

    #gid=2

Key is a unique identifier for each Google spreadsheet, so you will use it later to create a custom URL that will query and display the data in this spreadsheet. Gid specifies which sheet in the spreadsheet you are opening; it is zero-indexed, so the gid for the first sheet is 0 and the gid for the third sheet is 2.


Let’s first see how Google Visualization API returns the spreadsheet data as a DataTable object. This is only for those who are curious about what goes on behind the scenes. You can see that for this view, the URL is slightly different but the values of the key and the gid parameter stay the same.

https://spreadsheets.google.com/tq?&tq=&key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&gid=2


In order to display the same result as an independent HTML page, all you need to do is to take the key and the gid parameter values of your own Google spreadsheet and construct the custom URL following the same pattern shown below.

  • https://spreadsheets.google.com
    /tq?tqx=out:html&tq=
    &key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE
    &gid=2

https://spreadsheets.google.com/tq?tqx=out:html&tq=&key=0AqAPbBT_k2VUdDc3aC1xS2o0c2ZmaVpOQWkyY0l1eVE&gid=2


By the way, if the URL you created doesn't work, it is probably because it has not been encoded properly. Try this handy URL encoder/decoder page to encode it by hand, or use the JavaScript encodeURIComponent() function.
Also, if you want the URL to display the query result without people logging into Google Drive first, make sure to set the spreadsheet's permission setting to public. On the other hand, if you need to restrict access to the spreadsheet to a limited number of users, you have to remind your users to go to the Google Drive webpage and log in with their Google account before clicking your URLs. Only when users are logged into Google Drive will they be able to see the query result.

B. How to Query a Google Spreadsheet

We have seen how to create a URL to show an entire sheet of a Google spreadsheet as an HTML page above. Now let’s do some querying, so that we can pick and choose what data the table is going to display instead of the whole sheet. That’s where the Query Language comes in handy.

Here is an example spreadsheet with over 50 columns and 500 rows.

  • https://docs.google.com/spreadsheet/ccc?
    key=0AqAPbBT_k2VUdDFYamtHdkFqVHZ4VXZXSVVraGxacEE
    &usp=drive_web
    #gid=0

https://docs.google.com/spreadsheet/ccc?key=0AqAPbBT_k2VUdDFYamtHdkFqVHZ4VXZXSVVraGxacEE&usp=drive_web#gid=0


What I want to do is to show only columns B, C, D, and F where column C contains 'Florida.' How do I do this? Remember the URL we created to show the entire sheet above?

  • https://spreadsheets.google.com/tq?tqx=out:html&tq=&key=___&gid=___

There we had no value for the tq parameter. This is where we insert our query.

Google Visualization API Query Language is pretty much the same as SQL. So if you are familiar with SQL, forming a query is dead simple. If you aren't, SQL is also easy to learn.

  • The query should be written like this:
    SELECT B, C, D, F WHERE C CONTAINS 'Florida'
  • After encoding it properly (in JavaScript you can automate this with encodeURIComponent(), as sketched below), you get something like this:
    SELECT%20B%2C%20C%2C%20D%2C%20F%20WHERE%20C%20CONTAINS%20%27Florida%27
  • Add it to the tq parameter and don't forget to also specify the key:
    https://spreadsheets.google.com/tq?tqx=out:html&tq=SELECT%20B%2C%20C%2C%20D%2C%20F%20WHERE%20C%20CONTAINS%20%27Florida%27
    &key=0AqAPbBT_k2VUdEtXYXdLdjM0TXY1YUVhMk9jeUQ0NkE

I am omitting the gid parameter here because there is only one sheet in this spreadsheet, but you can add it if you would like. (You can also omit it if the sheet you want is the first sheet.) Ta-da!
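If you would rather build these URLs in JavaScript, encodeURIComponent() does the encoding for you. Here is a minimal sketch (the key below is a placeholder; substitute your own spreadsheet's key):

var base  = 'https://spreadsheets.google.com/tq?tqx=out:html';
var query = "SELECT B, C, D, F WHERE C CONTAINS 'Florida'";
var key   = '{{ YOUR SPREADSHEET KEY }}';  // placeholder
var url   = base + '&tq=' + encodeURIComponent(query) + '&key=' + key;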


Compare this with the original spreadsheet view. I am sure you can appreciate how the small effort of creating a URL pays off by making an unwieldy, large spreadsheet manageable.

You can also easily incorporate functions such as count() or sum() into your query to get an overview of the data you have in the spreadsheet.

  • select D, F, count(C) where (B contains 'author name') group by D, F

For example, this query above shows how many articles a specific author published per year in each journal. The screenshot of the result is below and you can see it for yourself here: https://spreadsheets.google.com/tq?tqx=out:html&tq=select+D,F,count(C)+where+%28B+contains+%27Agoulnik%27%29+group+by+D,F&key=0AqAPbBT_k2VUdEtXYXdLdjM0TXY1YUVhMk9jeUQ0NkE


Take this spreadsheet as another example.


This simple query below displays the library budget by year. For those who are unfamiliar with 'pivot,' a pivot table is a data summarization tool. The query below asks the spreadsheet to calculate the total of all the values in the B column (the budget amount for each category) by the values found in the C column (years).
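A query of roughly this shape does that (a sketch; the exact column letters depend on your sheet):

  • select sum(B) pivot C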


This is another example of querying the spreadsheet connected to my library's Literature Search request form. The following query asks the spreadsheet to count the number of literature search requests by research topic (column I) that were received in 2011 (column G), grouped by the values in column C, i.e., College of Medicine Faculty or College of Medicine Staff.

  • select C, count(I) where (G contains '2011') group by C


C. More Querying Options

There are many more things you can do with a custom query. Google has an extensive documentation that is easy to follow: https://developers.google.com/chart/interactive/docs/querylanguage#Language_Syntax

These are just a few examples.

  • ORDER BY __ DESC
    : Order the results in descending order of the column of your choice. Without 'DESC,' the results will be listed in ascending order.
  • LIMIT 5
    : Limit the number of results. Combined with 'ORDER BY,' you can quickly filter the results down to the most recent or the oldest items (see the combined example after this list).
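For example, combining the two in one hypothetical query returns the five rows with the largest values in column D:

  • SELECT B, C, D ORDER BY D DESC LIMIT 5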

My presentation slides from the 2013 LITA Forum, below, include more detailed information about Google Visualization API Query Language, its parameters, and other options, as well as how to use the Google Chart Libraries in combination with Google Visualization API Query Language for data visualization, which is the topic of my next post.

Happy querying Google Spreadsheet!



Creating themes in Omeka 2.0

Omeka is an easy-to-use content management system for digital exhibits created by the Roy Rosenzweig Center for History and New Media. It's very modular, so you can customize it for various functions. I won't go into the details here on how to set up Omeka, but you can read documentation and see example collections at Omeka.org. If you want to experiment with Omeka without installing it on your own server, you can set up a hosted account at Omeka.net.

Earlier this year Omeka was completely rewritten and released as a 2.0 version (now 2.1). As with many open source content management systems, it took a while for the contributed plug-ins and themes to catch up to the new release. As of July, most of the crucial contributed plug-ins were available, and if you haven't yet updated your installation, this is a good time to think about doing so. In this post I'm going to focus on the process of customizing Omeka 2.0 for your institution, and specifically on creating a custom theme. While there are now several good themes available for 2.0, you will probably want to make a default theme that matches the rest of your website. One of the nice features of Omeka, quite different from other content management systems, is that it is very easy for the people who create exhibits to pick a custom theme that differs from the default theme. That said, providing a custom theme for your institution makes it easy for visitors to know where they are, and it will also make things easier on the staff who create exhibits, since you can adapt the theme to their needs.

Planning

Like any design project, you should start with a discussion with the people who use the system most. (If you are new to design, check the ACRL TechConnect posts on design.) In my case, there are two archives on campus that both use Omeka for their exhibits. Mock up what the layout should look like; you may not be able to get it perfect, but use this as a guide for future development. We came up with a rough sketch based on what the archivist liked and didn't like about the templates available, and worked together on determining the priorities for the design. (Side note: if you can get your whole wall painted with whiteboard paint, this is a very fun collaborative project.)

Rough sketch of ideas for new theme.

Development

Development is very easy to start when you are modifying an existing theme. Start with a theme (there are only a few that are 2.0 compatible) that is close to what you need. Rather than the subtheme system you may be used to with Drupal or WordPress, with Omeka you pick the theme you want to hack on, copy the entire directory, and rename it.

Here was the process I followed to build my theme. I suggest that you set up a local development environment (I used XAMPP) to do this work, but make sure that you have at least one exhibit to test, since some of the CSS is different for exhibits than for the rest of the site.

  • Pick a theme
Seasons (with the Autumn color scheme)

I started with the Seasons theme. I copied the seasons directory from the themes directory and pasted it back with a new name of luctest (which I renamed when it was time to move it to a production environment).

  • Modify theme.ini

This is what you will start with. You really only need to edit the author, title, and description unless you want to edit the rest.

[theme]
author = "Roy Rosenzweig Center for History and New Media"
title = "Seasons"
description = "A colorful theme with a configuration option to switch style sheets for a particular season, plus 'night'."
license = "GPLv3"
website = "http://omeka.org"
support_link = "http://omeka.org/forums/forum/themes-and-public-display"
omeka_minimum_version="2.0"
omeka_target_version="2.0"
version="2.1.1"
tags="yellow, blue, summer, season, fall, orange, green, dark"
  • Modify config.ini

Check which elements are set in the configuration (i.e., which ones the person creating the exhibit, such as an archivist, can set) and which you need to set in the theme. This can cause a lot of frustration when you attempt to style an element whose value is actually set by the user. If you don't want to allow the user to change anything, you can take that option out of config.ini; just make sure you've set it elsewhere.

[config]

; Style Sheet
style_sheet.type = "select"
style_sheet.options.label = "Style Sheet"
style_sheet.options.description = "Choose a style sheet"
style_sheet.options.multiOptions.spring = "Spring"
style_sheet.options.multiOptions.summer = "Summer"
style_sheet.options.multiOptions.autumn = "Autumn"
style_sheet.options.multiOptions.winter = "Winter"
style_sheet.options.multiOptions.night = "Night"
style_sheet.options.value = "winter"

logo.type = "file"
logo.options.label = "Logo File"
logo.options.description = "Choose a logo file. This will replace the site title in the header of the theme. Recommended maximum width for the logo is 500px."
logo.options.validators.count.validator = "Count"
logo.options.validators.count.options.max = "1"

display_featured_item.type = "checkbox"
display_featured_item.options.label = "Display Featured Item"
display_featured_item.options.description = "Check this box if you wish to show the featured item on the homepage."
display_featured_item.options.value = "1"

display_featured_collection.type = "checkbox"
display_featured_collection.options.label = "Display Featured Collection"
display_featured_collection.options.description = "Check this box if you wish to show the featured collection on the homepage."
display_featured_collection.options.value = "1"

display_featured_exhibit.type = "checkbox"
display_featured_exhibit.options.label = "Display Featured Exhibit"
display_featured_exhibit.options.description = "Check this box if you wish to show the featured exhibit on the homepage."
display_featured_exhibit.options.value = "1"

homepage_recent_items.type = "text"
homepage_recent_items.options.label = "Homepage Recent Items"
homepage_recent_items.options.description = "Choose a number of recent items to be displayed on the homepage."
homepage_recent_items.options.maxlength = "2"

homepage_text.type = "textarea"
homepage_text.options.label = "Homepage Text"
homepage_text.options.description = "Add some text to be displayed on your homepage."
homepage_text.options.rows = "5"
homepage_text.options.attribs.class = "html-input"

(This is just a sample of part of the config.ini file).

  • Modify CSS

Open up css/style.css and check which elements you need to modify (note that some themes may divide the style sheets up differently). Some items are obvious; for others you will have to use Firebug or another tool to determine which class styles the element. You can always ask in the Omeka themes and display forum if you can't figure it out.

The Seasons theme has different styles for each color scheme, and in the interest of time I picked the color scheme closest to the one I wanted to end up with. You could use the concept of different schemes to identify the collections and/or exhibits of different units. Make sure you read through the whole style sheet first to determine which elements are theme-wide and which are set in the color scheme.

  • Test, test, test

The 2.0 themes that I’ve experimented with are all responsive and work well with different browsers. This probably goes without saying, but if you have changed the spacing at all, make sure you test your design in multiple window sizes and devices.

  • Voila

Final version of redesigned theme.

We have a few additional items to add to this design, but it’s met our immediate needs very well, and most importantly matches the design of the Archives and Special Collections website so it’s clear to users that they are still in the right place.

Next steps

Since this was a new content management system to me, I still have a lot to learn about the best ways to do certain things. This experience was helpful not just in learning Omeka, but also as a small-scale test of planning a new theme for our entire library website, which runs on Drupal.


Fear No Longer Regular Expressions

Regex, it’s your friend

You may have heard the term "regular expressions" before. If you have, you will know that it usually comes in a notation that is quite hard to make out, like this:

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Despite its appearance, a regular expression (regex) is an extremely useful tool for cleaning up and/or manipulating textual data.  I will show you an example that is easy to understand. Don't worry if you can't make sense of the regex above. We can talk about the crazy metacharacters and other confusing regex notations later. Hopefully, this example will help you appreciate the power of regex and give you some ideas for using regex to make your everyday library work easier.

What regex can do for you – an example

I looked for the zipcodes for all cities in Miami-Dade County and found a very nice PDF online (http://www.miamifocusgroup.com/usertpl/1vg116-three-column/miami-dade-zip-codes-and-map.pdf). But when I copy and paste the text from the PDF file into my text editor (Sublime), the formatting immediately goes haywire. The county name, 'Miami-Dade,' which should be on one line, becomes three lines, listing Miami, -, Dade.

Ugh, right? I do not like this one bit.  So let’s fix this using regular expressions.




Like many text editors, Sublime offers find/replace with regex. This is a much more powerful tool than the usual find/replace in MS Word, for example, because it allows you to match many complex cases that fit a pattern. Regular expressions let you specify that pattern.

In this example, I capture the three lines each saying Miami,-,Dade with this regex:

Miami\n-\nDade

When I enter this into the ‘Find What’ field, Sublime starts highlighting all the matches.  I am sure you already guessed that \n means a new line. Now let me enter Miami-Dade in the ‘Replace With’ field and hit ‘Replace All.’


As you can see below, things are much better now. But I want each set of three lines (Miami-Dade, zipcode, and city) to be one line, with the elements separated by a comma and a space, such as 'Miami-Dade, 33010, Hialeah'. So let's do some more magic with regex.


How do I describe the pattern of the three lines: Miami-Dade, zipcode, and city? When I look at the PDF, I notice that the zipcode is a 5-digit number and the city name consists of alphabet characters and spaces. I don't see any hyphen or comma in the city names in this particular case. And the first line is always 'Miami-Dade.' So the following regular expression captures this pattern.

Miami-Dade\n\d{5}\n[A-Za-z ]+

Can you guess what this means? You already know that \n means a new line. \d{5} means a 5-digit number, so it will match 33013, 33149, 98765, or any number that consists of five digits.  [A-Za-z ] means any alphabet character, in upper or lower case, or a space (N.B. note the space at the end, right before ']').

Anything that goes inside [ ] is one character, just like \d is one digit. So I need to specify how many of those characters are to be matched. If I put {5}, as I did in \d{5}, it will only match a city name that has five characters, like 'Miami.' The pattern should match a city name of any length, as long as it is not zero. The + sign does that: [A-Za-z ]+ means that an alphabet character (upper or lower case) or a space should appear once or more. (N.B. * and ? are also quantifiers like +. See the metacharacter table below to find out what they do.)

Now I hit the “Find” button, and we can see the pattern worked as I intended. Hurrah!


Now, let’s make these three lines one line each. One great thing about regex is that you can refer back to matched items. This is really useful for text manipulation. But in order to use the backreference feature in regex, we have to group the items with parentheses. So let’s change our regex to something like this:

(Miami-Dade)\n(\d{5})\n([A-Za-z ]+)

This regex shows three groups separated by a new line (\n). You will see that Sublime still matches the same three-line sets in the text file. But now that we have grouped the units we want (county name, zipcode, and city name) we can refer back to them in the 'Replace With' field. There were three units, and each unit can be referred to by a backslash and its order of appearance. So the county name is \1, the zipcode is \2, and the city name is \3. Since we want them all on one line, separated by a comma and a space, the following expression will work for our purpose. (N.B. Usually you can have up to nine backreferences in total, from \1 to \9. So if you want to backreference a later group, you can opt not to create a backreference from a group by using (?: ) instead of ( ).)

\1, \2, \3

Do a few Replaces and then if satisfied, hit ‘Replace All’.

Ta-da! It’s magic.

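If you prefer scripting the cleanup, here is a minimal JavaScript sketch of the same two replacements; note that JavaScript writes backreferences in the replacement string as $1, $2, $3 rather than \1, \2, \3:

// Sample input: one three-line record from the pasted PDF text.
var text  = "Miami\n-\nDade\n33010\nHialeah";
var fixed = text
    .replace(/Miami\n-\nDade/g, "Miami-Dade")                        // rejoin the county name
    .replace(/(Miami-Dade)\n(\d{5})\n([A-Za-z ]+)/g, "$1, $2, $3");  // collapse each record to one line
// fixed is now "Miami-Dade, 33010, Hialeah"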

Regex Metacharacters

Regex notations look a bit funky. But it’s worth learning them since they enable you to specify a general pattern that can match many different cases that you cannot catch without the regular expression.

We have already learned four regex metacharacters: \n, \d, { }, and ( ). Not surprisingly, there are many more beyond these. Below is a pretty extensive list of regex metacharacters, which I borrowed from the regex tutorial here: http://www.hscripts.com/tutorials/regular-expression/metacharacter-list.php. I also highly recommend this one-page regex cheat sheet from MIT (http://web.mit.edu/hackl/www/lab/turkshop/slides/regex-cheatsheet.pdf).

Note that \w will match not only an alphabetical character but also an underscore and a number. For example, \w+ matches Little999Prince_1892. Also remember that a small number of regular expression notations can vary depending on which programming language you use, such as Perl, JavaScript, PHP, Ruby, or Python.
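In JavaScript, for instance, you can check this yourself:

"Little999Prince_1892".match(/\w+/)[0];  // "Little999Prince_1892": letters, digits, and the underscore all match \w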

Metacharacter Description
\ Specifies the next character as either a special character, a literal, a back reference, or an octal escape.
^ Matches the position at the beginning of the input string.
$ Matches the position at the end of the input string.
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times.
? Matches the preceding subexpression zero or one time.
{n} Matches exactly n times, where n is a non-negative integer.
{n,} Matches at least n times, n is a non-negative integer.
{n,m} Matches at least n and at most m times, where m and n are non-negative integers and n <= m.
. Matches any single character except “\n”.
[xyz] A character set. Matches any one of the enclosed characters.
x|y Matches either x or y.
[^xyz] A negative character set. Matches any character not enclosed.
[a-z] A range of characters. Matches any character in the specified range.
[^a-z] A negative range characters. Matches any character not in the specified range.
\b Matches a word boundary, that is, the position between a word and a space.
\B Matches a nonword boundary. ‘er\B’ matches the ‘er’ in “verb” but not the ‘er’ in “never”.
\d Matches a digit character.
\D Matches a non-digit character.
\f Matches a form-feed character.
\n Matches a newline character.
\r Matches a carriage return character.
\s Matches any whitespace character including space, tab, form-feed, etc.
\S Matches any non-whitespace character.
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore.
\W Matches any non-word character.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol.
Matching modes

You also need to know about regex matching modes. You write your regex as shown above, and then at the end you add one or more of these modes; the most common are g (global: find all matches, not just the first), i (case-insensitive), and m (multi-line). Note that in text editors these options often appear as checkboxes and may apply by default without you doing anything.

For example, [d]\w+[g] will match only the three lower-case words in ding DONG dang DING dong DANG. On the other hand, [d]\w+[g]/i will match all six words, whether they are in the upper or the lower case.
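In JavaScript, for instance, the modes appear as flags after the closing slash of a regex literal:

"ding DONG dang DING dong DANG".match(/[d]\w+[g]/g);   // ["ding", "dang", "dong"]
"ding DONG dang DING dong DANG".match(/[d]\w+[g]/gi);  // all six words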

Look-ahead and Look-behind

There are also the 'look-ahead' and the 'look-behind' features in regular expressions. These often cause confusion and are considered a tricky part of regex. So, let me show you a simple example of how they can be used.

Below are several lines of a person's last name, first name, and middle name, followed by his or her department name, all separated by commas. You can see that this is a snippet from a csv file. The problem is that a value in one field (the department name) also includes a comma, which is supposed to appear only between fields, not inside one. So the comma becomes an unreliable separator. One way to solve this issue is to convert the csv file into a tab-delimited file, that is, to use a tab instead of a comma as the field separator. That means I need to replace all commas with tabs except those commas that appear inside the department field.

How do I achieve that? Luckily, the commas inside the department field are all followed by a space character, whereas the separator commas between fields are not. Using the negative look-ahead regex, I can specify the pattern of a comma that is not followed by (?!) a space (\s).

,(?!\s)

Below, you can see that this regex matches all commas except those that are followed by a space.

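Here is the same fix as a minimal JavaScript sketch, using a made-up sample line:

// Replace every comma NOT followed by a space with a tab.
var line   = "Smith,John,A.,Pathology, Immunology and Laboratory Medicine";
var tabbed = line.replace(/,(?!\s)/g, "\t");
// tabbed: "Smith\tJohn\tA.\tPathology, Immunology and Laboratory Medicine"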

For another example, the positive look-ahead regex, Ham(?=burg), will match ‘Ham‘ in Hamburg when it is applied to the text:  Hamilton, Hamburg, Hamlet, Hammock.

Below are the complete look-ahead and look-behind notations both positive and negative.

  • (?=pattern) is a positive look-ahead assertion
  • (?!pattern) is a negative look-ahead assertion
  • (?<=pattern) is a positive look-behind assertion
  • (?<!pattern) is a negative look-behind assertion

Can you think of any example where you can successfully apply a look-behind regular expression? (No? Then check out this page for more examples: http://www.rexegg.com/regex-lookarounds.html)

Now that we have covered even the look-ahead and the look-behind, you should be ready to tackle the crazy-looking regex that I introduced at the beginning of this post.

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Tell me what this will match! Post in the comments below and be proud of yourself.

More tools and resources for practicing regular expressions

There are many tools and resources out there that can help you practice regular expressions. Text editors such as EditPad Pro (Windows), Sublime, TextWrangler (Mac OS), Vi, and Emacs all provide regex support. Wikipedia (https://en.wikipedia.org/wiki/Comparison_of_text_editors#Basic_features) offers a useful comparison chart of many text editors you can refer to. RegexPal.com is a convenient online JavaScript regex tester. Firefox also has a Regular Expressions add-on (https://addons.mozilla.org/en-US/firefox/addon/rext/).

For more tools and resources, check out “Regular Expressions: 30 Useful Tools and Resources” http://www.hongkiat.com/blog/regular-expression-tools-resources/.

Library problems you can solve with regex

The best way to learn regex is to start using it right away, every time you run into a problem that can be solved faster with regex. What library problem can you solve with regular expressions? What problem did you solve with regular expressions? I use regex often to clean up or manipulate large amounts of data. Suppose you have 500 links and you need to add an EZproxy prefix or suffix to each; with regex (see the sketch below), you can get this done in a matter of minutes.
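As a sketch of the EZproxy case (the proxy hostname here is made up; substitute your own prefix), a single find-and-replace over all 500 lines does it:

Find:    ^(https?://.+)$
Replace: http://ezproxy.mylibrary.org/login?url=\1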

To give you an idea, I will wrap up this post with some regex use cases several librarians generously shared with me. (Big thanks to the librarians who shared their regex use cases through Twitter! )

  • Some ebook vendors don’t alert you to new (or removed!) books in their collections but do have on their website a big A-Z list of all of their titles. For each such vendor, each month, I run a script that downloads that page’s HTML, and uses a regex to identify the lines that have ebook links in them. It uses another regex to extract the useful data from those lines, like URL and Title. I compare the resulting spreadsheet against last month’s (using a tool like diff or vimdiff) to discover what has changed, and modify the catalog accordingly. (@zemkat)
  • Sometimes when I am cross-walking data into a MARC record, I find fields that includes irregular spacing that may have been used for alignment in the old setting but just looks weird in the new one. I use a regex to convert instances of more than two spaces into just two spaces. (@zemkat)
  • Recently while loading e-resource records for government documents, we noted that many of them were items that we had duplicated in fiche: the ones with a call number of the pattern “H, S, or J, followed directly by four digits”. We are identifying those duplicates using a regex for that pattern, and making holdings for those at the same time. (@zemkat)
  • I used regex as part of crosswalking metadata schemas in XML. I changed scientific OME-XML into MODS + METS to include research images into the library catalog.  (@KristinBriney)
  • Parsing MARC just has to be one. Simple things like getting the GMD out of a 245 field require an expression like this: |h\[(.*)\]. MarcEdit supports regex search and replace, which catalogers can use. (@phette23)
  • I used regex to adjust the printed label in Millennium depending on several factors, including the material type in the MARC record. (@brianslone)
  • I strip out non-Unicode characters from the transformed finding aids daily with  regex and  replace them with Unicode equivalents. (@bryjbrown)



Local Dev Environments For Newbies Part 1: AMP on Mac OSX

There are many cases where having a local development environment is helpful and it is a relatively straightforward thing to do, even if you are new to development.  However, the blessing and the curse is that there are many, many tutorials out there attempting to show you how.  This series of posts will aim to walk through some basic steps with detail, as well as pass on some tips and tricks for setting up your own local dev box.

First, what do I mean by a local development environment?  This is a setup on your computer which allows you to code and tweak and test in a safe environment.  It’s a great way to hammer on a new application with relatively low stakes.  I am currently installing dev environments for two purposes: to test some data model changes I want to make on an existing Drupal site and to learn a new language so I can contribute to an application.  For the purposes of this series, we’re going to focus on the AMP stack – Apache, MySQL and PHP – and how to install and configure those systems for use in web application development.

Apache is the web server which will serve the pages of your website or application to a browser.  You may hear Apache in conjunction with lots of other things – Apache Tomcat, Apache Solr – but generally when someone references just Apache, it’s the web server.  The full name of the project is the Apache HTTP Server Project.

PHP is a scripting language widely used in web development.  MySQL is a database application also frequently used in web development.  “Stack” refers to the combination of several components needed to run a web application.  The AMP stack is the base for many web applications and content management systems, including Drupal and WordPress.

You may have also seen the AMP acronym preceded by an L, M or W.  This merely stands for the operating system of choice – Linux, Mac or Windows.  This can also refer to installer packages that purport to do the whole installation for you, like WAMP or MAMP.  Employing the installer packages can be useful, depending on your situation and operating system.  The XAMPP stack, distributed by Apache Friends, is another example of an installer package designed to set up the whole stack for you.  For this tutorial though, we’ll step through each element of the stack, instead of using a stack installer.

So, why do it yourself if there are installers?  To me, it takes the mystery out of how all the pieces play together and is a good way to learn what’s going on behind the scenes.  When working on Windows, I will occasionally use a .msi installer for an individual component to make sure I don’t miss something, but installing and configuring each component individually is the best way to learn how the stack fits together.

Tips

Before we begin, let’s look at some tips:

  • You will need administrative rights to the computer on which you’re installing.
  • Don’t be afraid of the command line.  There are lots of tutorials around the web on how to use the basic commands – for both Mac (based on UNIX) and Windows.  But, you don’t need to be an expert to set up a dev environment.  Most tutorials give the exact commands you need.
  • Try, if possible, to block off a chunk of time to do this.  Going through all the steps may take a while, from an hour to an afternoon, especially if you hit a snag.  Several times during my own process, I had to step away because of a crisis or because it was the end of the day.  When I came back later, I had trouble remembering where I left off and which configuration options I had chosen.  If you do have to walk away, write down the last thing you did.
  • When you’re looking for a tutorial, Google away.  Search for the elements of your stack plus your OS, like “Apache MySQL PHP Mac OSX”.  You’ll find lots, and probably end up referencing more than one.  Use your librarian skills: is the tutorial recent?  Does it appear to be from a reputable source?  If it’s a blog, are there comments on the accuracy of the tutorial?  Does it agree with the others you’ve seen?
  • Once you’ve selected one or two to follow, read through the whole tutorial one time without doing anything.  Full disclosure: I never do this and it always bites me.

Let’s get going with Recipe 1 – Install the AMP Stack on Mac OS X

Install the Xcode Developer Tools

First, we install the developer tools for Xcode.  If you have Mac OS X 10.7 or above, you can download the Xcode application from the App Store.  To enable the developer tools, open Xcode, go to the Xcode menu > Preferences > Downloads tab, and then click on “Install” next to the command line tools.  This tutorial on installing Ruby by Moncef Belyamani has good screenshots of the Xcode process.

If you have Snow Leopard (10.6) or below, you’ll need to track down the tools on the Apple Developer Downloads page.  You will need to register as a developer, but it’s free.  Note: you can get pretty far in this process without the Xcode command line tools, but down the road, as you build more complicated stacks, you’ll want to have them.

Configure Apache and PHP

Next we need to configure Apache and PHP.  Note that I said “configure”, not “install”.  Apache and PHP both come with OS X; we just need to configure them to work together.

Here’s where we open the Terminal to access the command line by going to Applications > Utilities > Terminal.


Once Terminal is open, a prompt appears where you can type in commands.  The “~” character indicates that you are in your user’s “home” directory, which is where you’ll do a lot of your work.  The “$” character marks the end of the prompt and the beginning of your command.
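Before changing anything, it’s worth confirming that Apache and PHP really do ship with the system.  Both will report their versions from the command line:

httpd -v
php -v

Each should print a version string; if both do, you’re ready to wire them together.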


Type in the following command:

cd /etc/apache2

“cd” stands for “change directory”.  This is the equivalent of double-clicking on etc, then apache2, if you were in the Finder (but etc is a hidden folder in the Finder).  From here, we want to open the necessary file in an editor.  Enter the following command:

sudo nano httpd.conf

“sudo” elevates your permission to administrator, so that you can edit the config file for Apache, which is httpd.conf.  You will need to type in your administrator password.  The “nano” command opens a text editor in the Terminal window.  (If you’re familiar with vi or emacs, you can use those instead.)


The bottom of your window shows the available commands; the “^” stands for the Control key.  So, to search for the part we need to change, press Control + W, type php, and press Enter.  We are looking for this line:

#LoadModule php5_module        libexec/apache2/libphp5.so

The “#” at the beginning of this line makes it a comment, so Apache ignores the line.  We want Apache to see the line and load the PHP module.  So, change the text by removing the #:

LoadModule php5_module        libexec/apache2/libphp5.so

Save the file by pressing Control + O (nano calls this “WriteOut”) and press Enter to confirm the file name.  The number of lines written displays at the bottom of the window.  Press Control + X to exit nano.
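If you’d like to double-check the edit without reopening nano, grep can show the line; it should now appear without the leading #:

grep "php5_module" /etc/apache2/httpd.conf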

Next, we need to start the Apache server.  Type in the following command:

sudo apachectl start

Now, go to your browser and type in http://localhost.  You should see “It works!”
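If you prefer to stay in the Terminal, curl can stand in for the browser here; it should print a snippet of HTML containing the same “It works!” message:

curl http://localhost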

Apache, as mentioned before, serves web files from a location we designate.  By default, this is /Library/WebServer/Documents.  If you have Snow Leopard (10.6) or below, Apache also automatically looks in /Users/username/Sites, which is a convenient place to store and work with files.  If you have OS X 10.7 or above, creating the Sites folder takes a few steps.  On 10.7, go to System Preferences > Sharing and click on Web Sharing.  If there’s a button that says “Create Personal Web folder”, the folder has not been created yet; go ahead and click that button.  If it says “Open Personal Website folder”, you’re good to go.

On 10.8, the process is a little more involved.  First, go to the Finder, click on your user name, and create your Sites folder.


Next, we need to open the command line again and create a .conf file for that directory, so that Apache knows where to find it.  Type in these commands:

cd /etc/apache2/users
ls

The ls at the end will list the directory contents.  If you see a file named yourusername.conf (e.g., mfrazer.conf) in this directory, you’re good to go.  If you don’t, it’s easy to create one.  Type the following command:

sudo nano yourusername.conf

So, mine would be sudo nano mfrazer.conf.  This will create the file and take you into a text editor.  Copy and paste the following, making sure to change YOURUSERNAME to your user name.

<Directory "/Users/YOURUSERNAME/Sites/">
  Options Indexes MultiViews
  AllowOverride None
  Deny from all
  Allow from localhost
</Directory>

The first directive, Options, can have lots of different…well, options.  The ones we have here are Indexes and MultiViews.  Indexes means that if a browser requests a directory and there’s no index.html or index.php file, Apache will serve a directory listing.  MultiViews means that browsers can request the content in a different format if it exists in the directory (e.g., in a different language).  AllowOverride determines whether an .htaccess file elsewhere can override the configuration settings; for now, None indicates that nothing can be overridden.  For Drupal or other content management systems, it’s possible we’ll want to change these directives, but we’ll cover that later.

The last two lines indicate that traffic can only reach this directory from the local machine, by typing http://localhost/~username in the browser.  For more on Apache security, see the Apache documentation.  If you would like to set it so that other computers on your network can also access this directory, change those last two lines to:

Order allow,deny
Allow from all

Either way, press Control + O to save the file and Control + X to exit.  Restart Apache for the changes to take effect using this command:

sudo apachectl restart
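If the restart ever complains, Apache can check its own configuration syntax and point you at the offending line:

apachectl configtest

A response of “Syntax OK” means any remaining problem lies elsewhere.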

You may also be prompted at some point by OS X to accept incoming network connections for httpd (Apache); I would deny these as I only want access to my directory from my machine, but it’s up to you depending on your setup.

We’ll test this setup with php in the next step.

Test PHP

If you want to check php, you can create a new text document using your favorite text editor.  Type in:

<?php phpinfo(); ?>

Save the file as phpinfo.php in your username/sites directory (so for me, this is mfrazer > Sites)
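If you’d rather not leave the Terminal, you can create the same file with one command (this assumes your Sites folder lives at ~/Sites):

echo '<?php phpinfo(); ?>' > ~/Sites/phpinfo.php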

Then, point your browser to http://localhost/~yourUserName/phpinfo.php.  You should see a page of information regarding PHP and the web server, with a header showing the PHP version you’re running.


MySQL

Now, let’s install MySQL.  There are two ways to do this.  We could go to the MySQL downloads page and use the installers.  The fantastic tutorials at Coolest Guy on the Planet both recommend this, and it’s a fine way to go.

But we can also use Homebrew, mentioned previously on this blog, which is a really convenient way to do things as long as we’re already using the command line.

First, we need to install homebrew.  Enter this at the command prompt:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

Next, type in

brew doctor

If you receive the message “Your system is raring to brew.”, you’re ready to go.  If you get warnings, don’t lose heart; most of them tell you exactly what you need to do to move forward.  Correct the errors and run brew doctor again until you’re raring to go.  Then, type in the following command:

brew install mysql

That one’s pretty self-explanatory, no?  Homebrew will download and install MySQL, as of this writing version 5.6.10, but pay attention to the download to see the version – it’s in the URL.  After the installation succeeds, Homebrew will give some instructions on finishing the setup, including the commands we discuss below.
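You can also ask Homebrew what version it has on offer, before or after installing; the first line of the output names the package and version:

brew info mysql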

I’m going to pause for a second here and talk a little about permissions and directories.  If you get a “permission denied” error, try running the command again with “sudo” at the beginning.  Remember, this elevates your permissions to the administrator level.  Also, if you get a “directory does not exist” error, you can easily create the directory using “mkdir”.  Before we move on, let’s check for a directory you’re going to need coming up.  Enter:

cd /usr/local/var

If you are successfully able to change to that directory, great. If not, type in

sudo mkdir /usr/local/var

to create it. Then, let’s go back to our home directory by typing in

cd ~

Now, let’s continue with our procedure. First, we want to set up the databases to run with our user account.  So, we type in the following two commands:

unset TMPDIR
mysql_install_db --verbose --user=`whoami` --basedir="$(brew --prefix mysql)" --datadir=/usr/local/var/mysql --tmpdir=/tmp

The second command here installs the system databases.  The backtick-quoted `whoami` is automatically replaced with your user name, so the above command should work verbatim; it also works to substitute your user name directly, with no quotes (e.g., --user=mfrazer).
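If the backticks look mysterious, try them on their own.  The shell performs “command substitution”, replacing the backtick-quoted command with its output:

whoami
echo "--user=`whoami`"

The second command prints --user= followed by your user name, which is exactly what mysql_install_db receives.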

Next, we want to run the “secure installation” script, which walks you through setting a root password and cleaning up insecure defaults such as the anonymous user and the test database.  First we start the MySQL server, then we run the script and follow the prompts:

mysql.server start
sudo /usr/local/Cellar/mysql/5.6.10/bin/mysql_secure_installation

After the script is complete, stop the mysql server.

mysql.server stop

Next, we want to set up MySQL so it starts at login. For that, we run the following two commands:

ln -sfv /usr/local/opt/mysql/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist

The ln command, in this case, places a symbolic link to any .plist files in the mysql directory into the LaunchAgents directory.  Then, we load the plist using launchctl to start the server.
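Should you change your mind about starting MySQL at login, the same mechanism works in reverse:

launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist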

One last thing – we need to create one more link to the mysql.sock file.

cd /var/mysql/
sudo ln -s /tmp/mysql.sock

This creates a link to the mysql.sock file, which MySQL uses to communicate, but which resides by default in a tmp directory.  The first command places us in the directory where we want the link (remember, if it doesn’t exist, you can use “sudo mkdir /var/mysql/” to create it) and the second creates the link.

MySQL is ready to go!  And, so is your AMP stack.
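To convince yourself, connect as root with the password you set during the secure installation script (the launchctl step above should have left the server running):

mysql -u root -p

At the mysql> prompt, SHOW DATABASES; should list the built-in system databases; type exit to return to your shell.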

But wait, there’s more…

One optional tool to install is phpMyAdmin.  This tool allows you to interact with your database through your browser so you don’t have to continue to use the command line.  I also think it’s a good way to test if everything is working correctly.

First, let’s download the necessary files from the phpMyAdmin website.  These will have a .tar.gz extension.  Place the file in your Sites directory, and double-click to unzip the file.

Rename the folder to remove the version number and everything after it.  I’m going to place the next steps below, but the Coolest Guy on the Planet tutorial referenced earlier does a good job of this step for OS 10.8 (just scroll down to phpMyAdmin) if you need screenshots.

Go to the command line and navigate to your phpMyAdmin directory.  Make a directory called config and change its permissions so that the web-based setup script can write to it.  This should look something like:

cd ~/Sites/phpMyAdmin
mkdir config
chmod o+w config

Let’s take a look at that last command: chmod changes the permissions on a file or directory.  The o+w grants write access to “other” users (anyone who is neither the owner nor in the directory’s group), which is what lets the setup script write its config file there.
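You can verify that the change took by listing the directory; the permission string should now end with write access for the “other” set, something like drwxr-xrwx:

ls -ld config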

Now, in your browser, go to http://localhost/~username/phpmyadmin/setup and follow these steps:

  1. Click on New Server (button on bottom)
  2. Click on Authentication tab, and enter the root password in the password field.
  3. Click on Save.
  4. Click on Save again on the main page.

Once the setup is finished, go to the Finder, move the config.inc.php file from the config directory into the main phpmyadmin directory, and delete the config directory.


Now, go to http://localhost/~username/phpmyadmin in your browser and log in with the root account.

You are ready to go!  In future parts of this series, we’ll look at building the AMP stack on Windows and adding Drupal or WordPress on top of the stack.  We will also look at maintaining your environment, as the AMP stack components will need updating occasionally.  Any other recipes you’d like to see?  Do you have questions? Let us know in the comments.

Various tutorials and pages were incredibly useful in writing this post, most notably the Coolest Guy on the Planet tutorials and Moncef Belyamani’s Ruby setup guide referenced above.  While none of them are exactly the same as what we did here, they all contain useful pieces and may be helpful if you want to skip the explanation and just get to the commands.


How to Git

We have written about version control before at Tech Connect, most notably John Fink’s excellent overview of modern version control. But getting started with VC (I have to abbreviate it because the phrase comes up entirely too much in this post) is intimidating. If you are generally afraid of anything that reminds you of the DOS Prompt, you’re not alone and you’re also totally capable of learning Git.


By the end of this post, we will still not understand what’s going on here.

But why should you learn git?

Because Version Control Isn’t Just for Nerds

OK, never mind, it is, it totally is. But VC is for all kinds of nerds, not just l33t programmers lurking in windowless offices.

Are you into digital preservation and/or personal archiving? Then VC is your wildest dream. It records your changes in meaningful chunks, documenting not just the final product but all the steps it took you to get there. VC repositories show who did what, too. If you care about nerdy things like provenance, then you care about VC. If co-authors always used VC for their writing, we’d know all the answers to the truly pressing questions, like whether Gilles Deleuze or Félix Guattari wrote the passage “A concept is a brick. It can be used to build a courthouse of reason. Or it can be thrown through the window.”

Are you a web developer? Then knowing Git can get you on GitHub, and GitHub is an immense warehouse of awesomeness. Sure, you can always just download .zip files of other people’s projects, but GitHub also provides more valuable opportunities: you can showcase your awesome tools, your brilliant tweaks to other people’s projects, and you can give back to the community at whatever level you’re comfortable with, from filing bug reports to submitting actual code fixes.

Are you an instruction librarian? Have you ever shared lesson plans, or edited other people’s lesson plans, or inherited poorly documented lesson plans? Basically, have you been an instruction librarian in the past century? Well, I have good news for you: Git can track any text file, so your lessons can easily be versioned and collaborated upon just like software programs are. Did you forget that fun intro activity you used two years ago? Look through your repository’s previous commits to find it. Want to maintain several similar but slightly different lesson plans for different professors teaching the same class? You’ve just described branching, something that Git happens to be great at. The folks over at ProfHacker have written a series of articles on using Git and GitHub for collaborative writing and syllabus design.

Are you a cataloger? Versioning bibliographic records makes a lot of sense. A presentation at last year’s Code4Lib conference talked not only about versioning metadata but data in general, concluding that the approach had both strengths and weaknesses. It’s been proposed that putting bibliographic records under VC solves some of the issues with multiple libraries creating and reusing them.

As an added bonus, having a record’s history can enable interesting analyses of how metadata changes over time. There are powerful tools that take a Git repository’s history and create animated visualizations; to see this in action, take a look at the visualization of Penn State’s ScholarSphere application. Files are represented as nodes in a network map while small orbs which represent individual developers fly around shooting lasers at them. If we want to be a small orb that shoots lasers at nodes, and we definitely do, we need to learn Git.
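As an aside, the orbs-and-lasers animation style described above looks like the output of Gource; if you’d like to try generating one yourself (on a Mac with Homebrew, for instance, and with your-repository standing in for a real project folder), something like this should work:

brew install gource
cd your-repository
gource --seconds-per-day 0.5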

Alright, so now we know Git is great, but how do we learn it?

It’s As Easy As git rebase -i 97c9d7d

Actually, it’s a lot easier. The author doesn’t even know what git rebase does, and yet here he is lecturing to you about Git.

First off, we need to install Git like any other piece of software. Head over to the official Git website’s downloads page and grab the version for your operating system. The process is pretty straight-forward but if you get stuck, there’s also a nice “Getting Started – Installing Git” chapter of the excellent Pro Git book which is hosted on the official site.

Alright, now that you’ve got Git installed it’s time to start VCing the heck out of some text files. It’s worth noting that there are software packages that put a graphical interface on top of Git, such as Tower and GitHub’s apps for Windows and Mac. There’s a very comprehensive list of graphical Git software on the official Git website. But the most cross-platform and surefire way to understand Git and be able to access all of its features is with the command line so that’s what we’ll be using.

So enough rambling, let’s pop open a terminal (Mac and Linux both have apps simply called “Terminal” and Windows users can try the Git Bash terminal that comes with the Git installer) and make it happen.

$ git clone https://github.com/LibraryCodeYearIG/Codeyear-IG-Github-Project.git
Cloning into 'Codeyear-IG-Github-Project'...
remote: Counting objects: 115, done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 115 (delta 49), reused 108 (delta 42)
Receiving objects: 100% (115/115), 34.38 KiB, done.
Resolving deltas: 100% (49/49), done.
$ cd Codeyear-IG-Github-Project/

 

The $ above is meant to indicate our command prompt, so anything beginning with a $ is something we’re typing. With the first command, we “cloned” a project from a Git repository existing on the web, which caused Git to give us a little information in return. All Git commands begin with git and most provide useful info about their usage or results. The final line moves us inside the project’s folder with a “change directory” command.

We now have a Git repository on our computer, if you peek inside the folder you’ll see some text (specifically Markdown) files and an image or two. But what’s more: we have the project’s entire history too, pretty much every state that any file has been in since the beginning of time.

OK, since the beginning of the project, but still, is that not awesome? Oh, you’re not convinced? Let’s look at the project’s history.

$ git log
commit b006c1afb9acf78b90452b284a111aed4daee4ca
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Mar 1 15:27:47 2013 -0500

    a couple more links, write Getting Setup section

commit 83d92e4a1be0fdca571012cb39f84d86b21121c6
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Feb 22 01:04:24 2013 -0500

    link up the YouTube video

 

We can hit Q to exit the log. In the log, we see the author, date, and a brief description of each change. The terrifying random gibberish which follows the word “commit” is a hash, which is computer science speak for terrifying random gibberish. Think of it as a unique ID for each change in the project’s history.
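Incidentally, if the full log is more detail than you want, the --oneline flag condenses each commit to an abbreviated hash and its description:

$ git log --oneline
b006c1a a couple more links, write Getting Setup section
83d92e4 link up the YouTube video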

OK, so we can see previous changes (“commits” in VC-speak, which is like Newspeak but less user friendly), we can even revert back to previous states, but we won’t do that for now. Instead, let’s add a new change to the project’s history. First, we open up the “List of People.mdown” file in the Getting Started folder and add our name to the list. Now the magic sauce.

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   Getting Started/List of People.mdown
#
no changes added to commit (use "git add" and/or "git commit -a")
$ git add "Getting Started/List of People.mdown"
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   Getting Started/List of People.mdown
#
$ git commit -m "adding my name"
$ git status
# On branch master
nothing to commit, working directory clean
$ git log
commit wTf1984doES8th1s3v3Nm34NWtf2666bAaAaAaAa
Author: Awesome Sauce <awesome@sau.ce>
Date:   Wed Mar 13 12:30:35 2013 -0500

    adding my name

commit b006c1afb9acf78b90452b284a111aed4daee4ca
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Mar 1 15:27:47 2013 -0500

    a couple more links, write Getting Setup section

 

Our change is in the project’s history! Isn’t it better than seeing your name on the Hollywood Walk of Fame? Here’s precisely what we did:

First we asked for the status of the repository, which is an easy way of seeing what changes you’re working on and how close they are to being added to the history. We ran status throughout this procedure to watch how it changes. Then we added our changes; this tells Git “hey, these are a deliberate set of changes and we’re ready to put them in the project’s history.” It may seem like an unnecessary step, but adding select sets of files can help you segment your changes into meaningful, isolated chunks that make sense when viewing the log later. Finally, we committed our change with a short description inside quotes. This finalizes the change, which we can see in the log command’s results.
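One related command worth knowing: git diff. Run before git add, it shows line by line exactly what you’ve changed; once changes are staged, they move out of plain git diff and appear under git diff --staged instead. Here’s where it fits into the sequence above:

$ git diff
$ git add "Getting Started/List of People.mdown"
$ git diff --staged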

I’m Lonely, So Lonely

Playing around with Git on our local computer can be fun, but it sure gets lonely. Yeah, we can roll back to previous versions or use branches to keep similar but separate versions of our files, but really we’re missing the best part of VC: collaboration. VC as a class of software was specifically designed to help multiple programmers work on the same project. The power and brilliance of Git shines best when we can selectively “merge” changes from multiple people into one master project.

Fortunately, we will cover this in a future post. For now, we can visit the LITA/ALCTS Library Code Year’s GitHub Project—it’s the very same Git project we cloned earlier, so we already have a copy on our computer!—to learn more about collaboration and GitHub. GitHub is a website where people can share and cooperate on Git repositories. It’s been described as “the Facebook of code” because of its popularity and slick user interface. If that doesn’t convince you that GitHub is worth checking out, the site also has a sweet mascot that’s a cross between an octopus and a cat (an octocat). And that’s really all you need to know.


This is an Octocat. It is Awesome.