Local Dev Environments For Newbies Part 2: AMP on Windows 7

Previously, we discussed the benefits of installing a local AMP stack (Apache, MySQL & PHP) for the purposes of development and testing, and walked through installing a stack in the Mac environment.  In this post, we will turn our attention to Windows.  (If you have not read Local Dev Environments for Newbies Part 1, and you are new to the AMP stack, you might want to go read the Introduction and Tips sections before continuing with this tutorial.)

Much like with the Mac stack, there are Windows stack installers that will do all of this for you.  For example, if you are looking to develop for Drupal, there’s an install package called Acquia that comes with a stack installer.   There’s also WAMPserver and XAMPP.  If you opt to go this route, you should do some research and decide which option is the best for you.  This article contains reviews of many of the main players, though it is a year old.

However, we are going to walk through each component manually so that we can see how it all works together.

So, let’s get going with Recipe 2 – Install the AMP Stack on Windows 7.

Prerequisites:

Notepad and Wordpad come with most Windows systems, but you may want to install a more robust code editor to edit configuration files and eventually, your code.  I prefer Notepad++, which is open source and provides much of the basic functionality needed in a code editor.  The examples here will reference Notepad++ but feel free to use whichever code editor works for you.

For our purposes, we are not going to allow traffic from outside the machine to access our test server.  If you need this functionality, you will need to open port 80 in your firewall.  Be very careful with this option.

As a prerequisite to installing Apache, we need to install the Visual C++ 2010 SP1 Redistributable Package x86.  As a prerequisite to installing PHP, we need to install the Visual C++ 2008 SP1 Redistributable Package x86.

I create a directory called opt\local in my C drive to house all of the stack pieces.  I do this because it’s easier to find things on the command line when I need to and I like keeping development environment applications separate from Program Files.  I also create a directory called sites to house my web files.

The last two prerequisites are more like common gotchas.  The first is that while you are manipulating configuration and initialization files throughout this process, you may find the Windows default view settings are getting in your way.  If this is the case, you can change it by going to Organize > Folder and search options > View tab.

This will bring up a dialog which allows you to set preferences for the folder you are currently viewing.  You can select the option to “show hidden files” and uncheck the “hide file extensions” option, both of which make developing easier.

The other thing to know is that in our example, we will work with a Windows 7 installation – a 64-bit operating system.  However, when we get to PHP, you’ll notice that their website does not provide a 64-bit installer.  I have seen errors in the past when a 32-bit PHP installer and a 64-bit Apache version were both used, so we will install the 32-bit versions for both components.

Ok, I think we’re all set.  Let’s install Apache.

Apache

We want to download the .zip file for the latest version.  For Windows binaries, I use Apache Lounge, which provides prebuilt Windows packages of Apache.  For this example we’ll download httpd-2.4.4-win32.zip to the Desktop of our Windows machine.

Next, we want to extract the files into the location we chose for the Apache directory, e.g. c:\opt\local\Apache24.  You can accomplish this in a variety of ways, but if you have WinZip, you can follow these steps:

  1. Copy the .zip file to c:\opt\local
  2. Right-click and select “Extract all files”.
  3. Open the extracted folder, right-click on the Apache24 folder and select Cut.
  4. Go back up one directory and right-click to Paste the Apache24 folder, so that it now resides inside c:\opt\local.

No matter what unzip program you use, this is the layout we are shooting for: the Apache24 folder sits directly inside c:\opt\local.

This extraction “installs” Apache; there is no installer to run, but we will need to configure a few things.

We want to open httpd.conf: this file contains all of the configuration settings for our web server.  If you followed the directions above, you can find the file in C:\opt\local\Apache24\conf\httpd.conf – we want to open it with our code editor and make the following changes:

1.  Find this line (in my copy, it’s line 37):

ServerRoot "c:/Apache24"

Change it to match the directory where you installed Apache.  In my case, it reads:

ServerRoot "c:/opt/local/Apache24"

You might notice that our slashes slant in the opposite direction from the usual Windows syntax.  In Windows, backslash ( \ ) separates directories, but in Unix, it’s forward slash ( / ).  Apache reads the configuration file in the Unix manner, even though we are working in Windows.  If you get a “directory not found” error at any point, check your slashes.

2.  At Line 58, we are going to change the listen command to just listen to our machine.  Change

Listen 80

to

Listen localhost:80

3.  There are 100 lines around 72-172 that all start with LoadModule.  Some of these are comments (they begin with a “#”).  Later on, you may need to uncomment some of these for a certain web program to work, like SSL.  For now, though, we’ll leave these as is.

4.  Next, we want to change our Document Root and the Directory directive to the directory which has the web files.  These lines (beginning on line 237 in my copy) read:

DocumentRoot "c:/Apache24/htdocs"
<Directory "c:/Apache24/htdocs">

Later, we’ll want to change this to our “sites” folder we created earlier.  For now, we’re just going to change this to the Apache installation directory for testing.  So, it should read:

DocumentRoot "c:/opt/local/Apache24/htdocs"
<Directory "c:/opt/local/Apache24/htdocs">

Save the httpd.conf file.  (In two of our test cases, after saving the file, closing and re-opening, the file appeared unchanged.  If you are having issues, try doing Save As and save the file to your desktop, then drag it into c:\opt\local\Apache24\conf).
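If you would rather not flip the slashes by hand, the forward-slash rule from step 1 can be sketched in a few lines of Python.  This is only an illustration: the helper name to_apache_path is made up for this example, and the path is the install directory used in this tutorial.

```python
from pathlib import PureWindowsPath


def to_apache_path(windows_path: str) -> str:
    """Return a Windows path written with forward slashes, as httpd.conf expects."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    # The install directory used in this tutorial; substitute your own.
    print(to_apache_path(r"c:\opt\local\Apache24"))
```

The result is the same string you would type into ServerRoot or DocumentRoot, minus the surrounding quotes.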

Next, we want to test our Apache configuration.  To do this, we open the command line.  In Windows, you can do this by going to the Start Menu, and typing

cmd.exe

in the Search box.  Then, press Enter.  Once you’re in the command prompt, type in

cd \opt\local\Apache24\bin

(Note that the first part of this path is the install directory I used above.  If you chose a different directory to install Apache, use that instead.)  Next, we start the web server with a “-t” flag to test it.  Type in:

httpd -t

If you get a Syntax OK, you’re golden.

Otherwise, try to resolve any errors based on the error message. If the error message does not make any sense after checking your code for typos, go back and make sure that your changes to httpd.conf did actually save.
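One way to confirm that your edits really were saved is to list the active (uncommented) directives in the file.  The sketch below is a rough helper, not an Apache tool: the function name active_directives is invented here, it only understands simple one-line directives, and the sample text stands in for the contents of your own httpd.conf.

```python
def active_directives(conf_text, names):
    """Map each directive name to the values found on uncommented lines."""
    wanted = {name.lower(): name for name in names}
    found = {name: [] for name in names}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments, which start with "#".
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key in wanted:
            value = parts[1].strip() if len(parts) > 1 else ""
            found[wanted[key]].append(value.strip('"'))
    return found


if __name__ == "__main__":
    # Sample text standing in for httpd.conf; read your real file instead.
    sample = (
        '# ServerRoot "c:/Apache24"\n'
        'ServerRoot "c:/opt/local/Apache24"\n'
        "Listen localhost:80\n"
    )
    print(active_directives(sample, ["ServerRoot", "Listen", "DocumentRoot"]))
```

To run it against the real file, read C:\opt\local\Apache24\conf\httpd.conf into a string and pass that in.  An empty list for a directive means the change did not make it into the file.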

Once you get Syntax OK, type in:

httpd

This will start the web server.  You should not get a message regarding the firewall if you changed the listen command to localhost:80.  But, if you do, decide what traffic you want to allow to your machine.  I would click “Cancel” instead of “Allow Access”, because I don’t want to allow outside access.

Now the server is running.  You’ll notice that you no longer have a C:\> prompt in the Command window.  To test our server, we open a browser and type in http://localhost  – you should get a website with text that reads “It works!”
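If the browser test is inconclusive, a quick script can at least tell you whether anything is accepting connections on port 80.  This is a minimal sketch with a made-up function name (port_is_open); it only checks that a TCP connection succeeds, not that the program answering is Apache.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    print("Port 80 open on localhost:", port_is_open("localhost", 80))
```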

Instead of starting up the server this way every time, we want to install it as a Windows service.  So, let’s go back to our command prompt and press Ctrl+C to stop the web server.  You should now have a prompt again.

To install Apache as a service, type:

httpd.exe -k install

You will most likely get an error at this point.

We need to run our command prompt as an administrator.  So, let’s close the cmd.exe window and go back to our Start menu.  Go to Start > All Programs > Accessories and right-click on Command Prompt.  Select “Run As Administrator”.

(Note: If for some reason you do not have the ability to right-click, there’s a “How-To Geek” post with a great tip.  Go to the Start menu and in the Run box, type in cmd.exe as we did before, but instead of hitting Enter, hit Ctrl+Shift+Enter.  This does the same thing as the right-click step above.)

Click on Yes at the prompt that comes up, allowing the program to make changes.  You’ll notice that instead of starting in our user directory, we are starting in Windows\system32.  So, let’s go back to our bin directory with:

cd \opt\local\Apache24\bin

Now, we can run our

httpd.exe -k install

command again, and it should succeed.  To start the service, we want to open our Services Dialog, located in the Control Panel (Start Menu > Control Panel) in the Administrative Tools section.  If you display your Control Panel by category (the default), you click on System & Security, then Administrative Tools.  If you display your control panel by small icon, Administrative Tools should be listed.

Double click on Services.

Find Apache2.4 in the list and select it.  Verify that the Startup Type is set to Automatic if you want the Service to start automatically (if you would prefer that the Service only start at certain times, change this to Manual, but remember that you have to come back in here to start it).  With Apache2.4 selected, click on Start Service in the left hand column.

Go back to the browser and hit Refresh to verify that everything is still working.  It should still say “It works!”  And with that affirmation, let’s move to PHP.

PHP

(Before installing PHP, make sure you have installed the Visual C++ 2008 Redistributable Package from the prerequisite section.)

For our purposes, we want to use the Thread Safe .zip from the PHP Downloads page.    Because we are running PHP under Apache, but not as a CGI, we use the thread safe version.  (For more on thread safe vs. non-thread safe, see this Wikipedia entry or this stackoverflow post)

PHP Download

Once you’ve downloaded the .zip file, extract it to your \opt\local directory.  Then, rename the folder to simply “php”.  As with Apache24, extracting the files does the “install”; we just need to configure everything to run properly.  Go to the directory where you installed PHP (in my case, c:\opt\local\php) and find php.ini-development.

Make a copy of the file and rename the copy php.ini (this is one of those places where you may want to set the Folder and search options if you’re having problems).

PHP ini file

Open the file in Notepad++ (or your code editor of choice).  Note that here, comments are preceded by a “;” (without quotes) and the directories are delineated using the standard Windows format, with a “\”.  Most of the document is commented out, and includes a large section on recommended settings for production and development, so if you’re not sure of the changes to make you can check in the file (in addition to the PHP documentation).  For this tutorial, we want to make the following changes:

1.  On line 708, uncomment (remove the semi-colon from) include_path under “Windows” and make sure it matches the directory where you installed PHP (if the line numbers have changed, just search for Paths and Directories).

2.  On line 730, uncomment the Windows directive for extension_dir and change extension_dir to match c:\opt\local\php\ext

3.  Beginning on line 868, in the Windows Extensions section, uncomment (remove the semi-colon from) the following lines (they are not right next to each other, they’re in a longer list, but we want these three uncommented):

extension=php_mysql.dll
extension=php_mysqli.dll
extension=php_pdo_mysql.dll

Save php.ini file.

You may want to double-check that the .dll files we enabled above are actually in the c:\opt\local\php\ext folder before trying to run php, because you will see an error if they are not there.
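That double-check can be scripted.  The sketch below assumes the three extensions enabled above and uses a made-up helper name (missing_extensions); point it at your own ext folder.

```python
from pathlib import Path

# The three extensions uncommented in php.ini in this tutorial.
REQUIRED_DLLS = ["php_mysql.dll", "php_mysqli.dll", "php_pdo_mysql.dll"]


def missing_extensions(ext_dir, required=REQUIRED_DLLS):
    """Return the required DLL names that are not present in ext_dir."""
    folder = Path(ext_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # The ext folder used in this tutorial; substitute your own.
    print(missing_extensions(r"c:\opt\local\php\ext"))
```

An empty list means all three files are where PHP will look for them.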

Next, we want to add the php directory to our path environment variables.  This section is a little tricky; be *extremely* careful when you are making changes to system settings like this.

First, we navigate to the Environment variables by opening the Control Panel and going to System & Security > System > Advanced System Settings > Environment Variables.

In the bottom scroll box, scroll until you find “Path”, click on it, then click on Edit.

Append the following to the end of the Variable Value list (the semi-colon ends the previous item, then we add our installation path).

;c:\opt\local\php

Click OK and continue to do so until you are out of the dialog.
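To see what the edit to Path amounts to, here is a small sketch of the append step.  It only builds the new string and never touches your system settings; the function name append_to_path is invented for this illustration.

```python
def append_to_path(path_value, new_dir):
    """Append new_dir to a ';'-separated Path value unless it is already listed."""
    entries = [entry for entry in path_value.split(";") if entry.strip()]
    # Compare case-insensitively and ignore a trailing backslash.
    normalized = {entry.strip().rstrip("\\").lower() for entry in entries}
    if new_dir.rstrip("\\").lower() not in normalized:
        entries.append(new_dir)
    return ";".join(entries)


if __name__ == "__main__":
    current = r"C:\Windows\system32;C:\Windows"
    print(append_to_path(current, r"c:\opt\local\php"))
```

The check for an existing entry is there because adding the same directory twice is an easy mistake to make when you revisit this dialog later.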

Lastly, we need to add some lines to the httpd.conf so that Apache will play nice with PHP.  The httpd.conf file may still be open in your text editor.  If not, go back to c:\opt\local\Apache24\conf and open it.  At the bottom of this file, we need to add the following:

LoadModule php5_module "c:/opt/local/php/php5apache2_4.dll"
AddHandler application/x-httpd-php .php
PHPIniDir "c:/opt/local/php"

This tells Apache where to find php and loads the module needed to work with PHP.  (Note:  php5apache2_4.dll must be installed in the directory you specified above in the LoadModule statement.  It should have been extracted with the other files, but to download the file if it is not there, you can go to the apachelounge additional downloads page.)

While we’re in this file, we also want to tell Apache to look for an index.php file.  We’ll need this for testing, but also for some content management systems.  To do this, we change the DirectoryIndex directive on line 271.  It should look like

<IfModule dir_module>
  DirectoryIndex index.html

We want to change the DirectoryIndex line so it reads

DirectoryIndex index.php index.html

Save httpd.conf.

Before we restart Apache to pick up these changes, we’re going to do one last thing.  To test our php, we want to create a file called index.php with the following text inside:

<?php phpinfo(); ?>

Save it to c:\opt\local\Apache24\htdocs

Restart Apache by going back to the Services dialog.  (If you closed it, it’s Control Panel > System & Security > Administrative Tools > Services).  Click on Apache2.4 and then click on Restart.

If you get an error, you can always go back to the command line, navigate to c:\opt\local\Apache24\bin and run httpd.exe -t again.  This will check your syntax, which is most likely to be the problem.  (This page is also helpful in troubleshooting PHP 5.4 and Apache if you are having issues.)

Open a browser window and type in http://localhost – instead of “It works!” you should see a list of configuration settings for PHP.  (In one of our test cases, the tester needed to close Internet Explorer and re-open it for this part to work.)

Now, we move to the database.

MySQL

To install MySQL, we can follow the directions at the MySQL site.  For the purposes of this tutorial, we’re going to use the most recent version as of this writing, which is 5.6.11.  To download the files we need, we go to the Community Server download page.

MySQL Downloads

Again, we can absolutely use the installer here, which is the first option.  The MySQL installers will prompt you through the setup, and this video does a great job of walking through the process.

But since the goal of this tutorial is to see all the parts, I’m going to run through the setup manually.  First, we download the .zip archive.  Choose the .zip file which matches your operating system; I will choose 64-bit (unlike Apache and PHP, MySQL does not need to match the 32-bit versions we installed earlier).  Extract the files to c:\opt\local\mysql.  We do this in the same way we did the Apache24 files above.

Since we’re installing to our opt\local directory, we need to tell MySQL to look there for the program files and the data.  We do this by setting up an option file.  We can modify a file provided for us called my-default.ini.  Change the name to my.ini and open it with your code editor.

In the MySQL config files, we use the Unix-style directory separator “/” again, and the comments are again preceded by a “#”.  So, to set our locations, we want to remove the # from the beginning of the basedir and datadir lines, and change them to point at our installation directory.

Then save my.ini.
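As a sanity check, you can read the two settings back out of the option file.  The sketch below uses Python’s standard configparser module; the sample values are assumptions based on the install directory used in this tutorial (with data living in a “data” folder underneath it), and the function name mysqld_paths is made up for the example.

```python
import configparser

# Assumed values for this tutorial's layout; adjust to your own directories.
SAMPLE_MY_INI = """\
[mysqld]
# Lines starting with "#" are comments and are ignored.
basedir = c:/opt/local/mysql
datadir = c:/opt/local/mysql/data
"""


def mysqld_paths(ini_text):
    """Return (basedir, datadir) from the [mysqld] section, or None if unset."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(ini_text)
    section = parser["mysqld"]
    return section.get("basedir"), section.get("datadir")


if __name__ == "__main__":
    print(mysqld_paths(SAMPLE_MY_INI))
```

If either value comes back as None, the line is probably still commented out.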

As with Apache, we’re going to start MySQL for the first time from the command line, to make sure everything is working ok.  If you still have the command prompt open, navigate back there.  If not, open a new one, remembering to select the Run As Administrator option.

From your command prompt, type in

cd \opt\local\mysql\bin
mysqld --console

You should see a bunch of statements scroll by as the first database is created.  You may also get a firewall popup.  I hit Cancel here, so as not to allow access from outside my computer to the MySQL databases.

Press Ctrl+C to stop the server.  Now, let’s install MySQL as a service.  To do that, we type the command:

mysqld --install

Next, we want to start the MySQL service, so we need to go back to Services.  You may have to Refresh the list in order to see the MySQL service.  You can do this by going to Action > Refresh in the menu.

Then, we start the service by clicking on MySQL and clicking Start Service on the left hand side.

One thing about installing MySQL in this manner is that the initial root user for the database will not have a password.  To see this, go back to your command line.  Type in

mysql -u root

This will open the command line MySQL client and allow you to run queries.  The -u flag sets the user, in this case, root.  Notice you are not prompted for a password.  Type in:

select user, host, password from mysql.user;

This command should show all the created user accounts, the hosts from which they can log in, and their passwords.  The semi-colon at the end is crucial – it signifies the end of a SQL command.

Notice in the output that the password column is blank.  MySQL provides documentation on how to fix this on the Securing the Initial Accounts documentation page, but we’ll also step through it here.  We want to use the SET PASSWORD command to set the password for all of the root accounts.

Substituting the password you want for newpwd (keep the single quotes in the command), type in

SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'::1' = PASSWORD('newpwd');

You should get a confirmation after every command.  Now, if you run the select user command from above, you’ll see that there are values in the password field: hashed versions of what you specified.

A note about security: I am not a security expert and for a development stack we are usually less concerned with security.  But it is generally not a good idea to type in plain text passwords in the command line, because if the commands are being logged you’ve just saved your password in a plain text file that someone can access.  In this case, we have not turned on any logging, and the SET PASSWORD should not store the password in plain text.  But, this is something to keep in mind.

As before with Mac OS X, we could stop here.  But then you would have to administer the MySQL databases using the command line.  So we’ll install phpMyAdmin to make it a little easier and test to see how our web server works with our sites folder.

phpMyAdmin

Download the phpmyadmin.zip file from the phpmyadmin page to the sites folder we created all the way at the beginning.  Note that this does *not* go into the opt folder.

Extract the files to a folder called phpmyadmin using the same methods we’ve used previously.

Since we now want to use our sites folder instead of the default htdocs folder, we will need to change the DocumentRoot and Directory directives on lines 237 and 238 of our Apache config file.  So, open httpd.conf again.

We want to change the DocumentRoot to sites, and we’re going to set up the phpMyAdmin directory.

Change Document Root and Directory
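The edit itself is a pair of one-line changes, and can be sketched as a small text rewrite.  Treat the details as assumptions: c:/sites is my guess at where the sites folder lives (the post only says it is not inside opt), the function name point_docroot_at is invented, and the sketch covers only the DocumentRoot and Directory lines, not the phpMyAdmin directory setup.

```python
import re


def point_docroot_at(conf_text, old_root, new_root):
    """Rewrite the DocumentRoot line and its matching <Directory> line."""
    old = re.escape(old_root)
    # Only uncommented lines match, because "#" cannot precede the directive.
    conf_text = re.sub(
        r'(?im)^(\s*DocumentRoot\s+)"' + old + r'"',
        lambda m: m.group(1) + '"' + new_root + '"',
        conf_text,
    )
    conf_text = re.sub(
        r'(?im)^(\s*<Directory\s+)"' + old + r'"(\s*>)',
        lambda m: m.group(1) + '"' + new_root + '"' + m.group(2),
        conf_text,
    )
    return conf_text


if __name__ == "__main__":
    before = (
        'DocumentRoot "c:/opt/local/Apache24/htdocs"\n'
        '<Directory "c:/opt/local/Apache24/htdocs">\n'
    )
    # "c:/sites" is an assumed location for the sites folder.
    print(point_docroot_at(before, "c:/opt/local/Apache24/htdocs", "c:/sites"))
```

Whether you make the change by hand or with a script, both lines need to name the same folder, with forward slashes.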

Save the httpd.conf file.  Go back to Services and Restart the Apache2.4 service.

We will complete the configuration through the browser.  First, open the browser and try to navigate to http://localhost again.  You should get a 403 error.

Instead, navigate to http://localhost/phpmyadmin/setup

Click on the New Server button to set up a connection to our MySQL databases.  Double check that under the Basic Settings tab, the Server Name is set to localhost, and then click on Authentication.  Verify that the type is “cookie”.

At the bottom of the page, click on Save.  Now, change the address in the browser to http://localhost/phpmyadmin and log in with the root user, using the password you set above.

And that’s it.  Your Windows AMP stack should be ready to go.

In the next post, we’ll talk about how to install a content management system like WordPress or Drupal on top of the base stack.  Questions, comments or other recipes you would like to see?  Let us know in the comments.

 


Coding & Collaboration on GitHub

Previously on Tech Connect we wrote about the Git version control system, walking you through “cloning” a project onto your computer, making some small changes, and committing them to the project’s history. But that post concluded on a sad note: all we could do was work by ourselves, fiddling with Git on our own computer and gaining nothing from the software’s ability to manage multiple contributors. Well, here we will return to Git to specifically cover GitHub, one of the most popular code-sharing websites around.

Git vs. GitHub

Git is open source version control software. You don’t need to rely on any third-party service to use it and you can benefit from many of its features even if you’re working on your own.

GitHub, on the other hand, is a company that hosts Git repositories on their website. If you allow your code to be publicly viewable, then you can host your repository for free. If you want to have a private repository, then you have to pay for a subscription.

GitHub layers some unique features on top of Git. There’s an Issues queue where bug reports and feature requests can be tracked and assigned to contributors. Every project has a Graphs section where interesting information, such as number of lines added and deleted over time, is charted (see the graphs for jQuery, for instance). You can create gists which are mini-repositories, great for sharing or storing snippets of useful code. There’s even a Wiki feature where a project can publish editable documentation and examples. All of these nice features build upon, but ultimately have little to do with, Git.

Collaboration

GitHub is so successful because of how well it facilitates collaboration. Hosted version control repositories are nothing new; SourceForge has been doing this since 1999, almost a decade prior to GitHub’s founding in 2008. But something about GitHub has struck a chord and it’s taken off like wildfire. Depending on how you count, it’s the most popular collection of open source code, over SourceForge and Google Code.[1] The New York Times profiled co-founder Tom Preston-Werner. It’s inspired spin-offs, like Pixelapse which has been called “GitHub for Photoshop” and Docracy which TechCrunch called “GitHub for legal documents.” In fact, just like the phrase “It’s Facebook for {{insert obscure user group}}” became a common descriptor for up-and-coming social networks, “It’s GitHub for {{insert non-code document}}” has become commonplace. There are many inventive projects which use GitHub as more than just a collection of code (more on this later).

Perhaps GitHub’s popularity is due to Git’s own popularity, though similar sites host Git repositories too.[2] Perhaps the GitHub website simply implements better features than its competitors. Whatever the reason, it’s certain that GitHub does a marvelous job of allowing multiple people to manage and work on a project.

Fork It, Bop It, Pull It

Let’s focus on two nice features of GitHub, Forking and the Pull Request [3], to see exactly why GitHub is so great for collaboration.

If you recall our prior post on Git, we cloned a public repository from GitHub and made some minor changes. Then, when reviewing the results of git log, we could see that our changes were present in the project’s history. That’s great, but how would we go about getting our changes back into the original project?

For the actual step-by-step process, see the LibCodeYear GitHub Project’s instructions. There are basically only two changes from our previous process, one at the very beginning and one at the end.

GitHub's Fork Button

First, start by forking the repository you want to work on. To do so, set up a GitHub account, sign in, visit the repository, and click the Fork button in the upper right. After a pretty sweet animation of a book being scanned, a new project (identical to the original in both name and files) will appear on your GitHub account. You can then clone this forked repository onto your local computer by running git clone on the command line and supplying the URL listed on GitHub.

Now you can do your editing. This part is the same as using Git without GitHub. As you change files and commit changes to the repository, the history of your cloned version and the one on your GitHub account diverge. By running git push you “push” your local changes up to GitHub’s remote server. Git will prompt you for your GitHub password, which can get annoying after a while so you may want to set up an SSH key on GitHub so that you don’t need to type it in each time. Once you’ve pushed, if you visit the repository on GitHub and click the “commits” tab right above the file browser, you can see that your local changes have been published to GitHub. However, they’re still not in the original repository, which is underneath someone else’s account. How do you add your changes to the original account?
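The command-line half of that workflow can be summed up as four Git commands. The sketch below just assembles them in order so you can see the sequence; it does not run anything. The function name and the repository URL are hypothetical, and pushing with "origin HEAD" is one common way to push the current branch, not the only one.

```python
import shlex


def fork_workflow_commands(fork_url, files, message):
    """Return the git commands for clone -> add -> commit -> push, in order."""
    # git clone creates a folder named after the repository.
    repo_dir = fork_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]
    return [
        ["git", "clone", fork_url],
        ["git", "-C", repo_dir, "add", *files],
        ["git", "-C", repo_dir, "commit", "-m", message],
        ["git", "-C", repo_dir, "push", "origin", "HEAD"],
    ]


if __name__ == "__main__":
    # Hypothetical fork URL; use the one listed on your own fork's page.
    url = "https://github.com/example-user/example-repo.git"
    for command in fork_workflow_commands(url, ["README.md"], "Fix typo"):
        print(shlex.join(command))
```

Everything after the push happens on the GitHub website, which is where the pull request comes in.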

GitHub's Pull Request Button

In your forked repository on GitHub, something is different: there’s a Pull Request button in the same upper right area where the Fork one is. Click that button to initiate a pull request. After you click it, you can choose which branches on your GitHub repository to push to the original GitHub repository, as well as write a note explaining your changes. When you submit the request, a message is sent to the project’s owners. Part of the beauty of GitHub is in how pull requests are implemented. When you send one, an issue is automatically opened in the receiving project’s Issues queue. Any GitHub account can comment on public pull requests, connecting them to open issues (e.g. “this fixes bug #43”) or calling upon other contributors to review the request. Then, when the request is approved, its changes are merged into the original repository.

diagram of forking & pulling on GitHub

“Pull Request” might seem like a strange term. “Push” is the name of the command that takes commits from your local computer and adds them to some remote server, such as your GitHub account. So shouldn’t it be called a “push request” since you’re essentially pushing from your GitHub account to another one? Think of it this way: you are requesting that your changes be pulled (e.g. the git pull command) into the original project. Honestly, “push request” might be just as descriptive, but for whatever reason GitHub went with “pull request.”

GitHub Applications

While hopefully we’ve convinced you that the command line is a fine way to do things, GitHub also offers Mac and Windows applications. These apps are well-designed and turn the entire process of creating and publishing a Git repository into a point-and-click affair. For instance, here is the fork-edit-pull request workflow from earlier except done entirely through a GitHub app:

  • Visit the original repository’s page, click Fork
  • On your repository’s page, select “Clone in Mac” or “Clone in Windows” depending on which OS you’re using. The repository will be cloned onto your computer
  • Make your changes and then, when you’re ready to commit, open up the GitHub app, selecting the repository from the list of your local ones
  • Type in a commit message and press Commit
    writing a commit message in GitHub for Windows
  • To sync changes with GitHub, click Sync
  • Return to the repository on GitHub, where you can click the Pull Request button and continue from there

GitHub without the command line, amazing! You can even work with local Git repositories, using the app to do commits and view previous changes, without ever pushing to GitHub. This is particularly useful on Windows, where installing Git can have a few more hurdles. Since the GitHub for Windows app comes bundled with Git, a simple installation and login can get you up-and-running. The apps also make the process of pushing a local repository to GitHub incredibly easy, whereas there are a few steps otherwise. The apps’ visual display of “diffs” (differences in a file between versions, with added and deleted lines highlighted) and handy shortcuts to revert to particular commits can appeal even to those of us that love the command line.

viewing a diff in GitHub for Windows

More than Code

In my previous post on Git, I noted that version control has applications far beyond coding. GitHub hosts a number of inventive projects that demonstrate this.

  • The Code4Lib community hosts an Antiharassment Policy on GitHub. Those in support can simply fork the repository and add their name to a text file, while the policy’s entire revision history is present online as well
  • The city of Philadelphia experimented with using GitHub for procurements with successful results
  • ProfHacker just wrapped up a series on GitHub, ending by discussing what it would mean to “fork the academy” and combine scholarly publishing with forking and pull requests
  • The Jekyll static-site generator makes it possible to generate a blog on GitHub
  • The Homebrew package manager for Mac makes extensive use of Git to manage the various formulae for its software packages. For instance, if you want to roll back to a previous version of an installed package, you run brew versions $PACKAGE where $PACKAGE is the name of the package. That command prints a list of Git commits associated with older versions of the package, so you can enter the Homebrew repository and run a Git command like git checkout 0476235 /usr/local/Library/Formula/gettext.rb to get the installation formula for version 0.17 of the gettext package.

These wonderful examples aside, GitHub is not a magic panacea for coding, collaboration, or any of the problems facing libraries. GitHub can be an impediment to those who are intimidated or simply not sold on the value of learning what’s traditionally been a software development tool. On the Code4Lib listserv, it was noted that the small number of signatories on the Antiharassment Policy might actually be due to its being hosted on GitHub. I struggle to sell people on my campus on the value of Google Docs with its collaborative editing features. So, as much as I’d like the Strategic Plan the college is producing to be on GitHub where everyone could submit pull requests and comment on commits, it’s not necessarily the best platform. It is important, however, not to think of it as limited purely to versioning code written by professional developers. GitHub has uses for amateurs and non-coders alike.

Footnotes

[1] GitHub Has Passed SourceForge, (June 2, 2011), ReadWrite.

[2] Previously-mentioned SourceForge also supports Git, as does Bitbucket.

[3] I think this would make an excellent band name, by the way.


A Librarian’s Guide to OpenRefine

Academic librarians working in technical roles may rarely see stacks of books, but they doubtless see messy digital data on a daily basis. OpenRefine is an extremely useful tool for dealing with this data without sophisticated scripting skills and with a very low learning curve. Once you learn a few tricks with it, you may never need to force a student worker to copy and paste items onto Excel spreadsheets.

As this comparison by the creator of OpenRefine shows, the best use for the tool is to explore and transform data, and it allows you to make edits to many cells and rows at once while still seeing your data. This allows you to experiment and undo mistakes easily, which is a great advantage over databases or scripting where you can’t always see what’s happening or undo the typo you made. It’s also a lot faster than editing cell by cell like you would do with a spreadsheet.

Here’s an example of a project that took me hours in a spreadsheet but took much less time when I redid it in Google Refine. One of the quickest things to do with OpenRefine is to spot words or phrases that are almost the same and possibly refer to the same thing. Recently I needed to turn a large export of data from the catalog into data that I could load into my institutional repository. Only certain values were allowed in the repository’s controlled vocabulary, so I had to modify the bibliographic data from the catalog (which was of course in more or less proper AACR2 style) to match the vocabularies available in the repository. The problem was that the data I had wasn’t consistent: there were multiple types of abbreviations, extra spaces, extra punctuation, and outright misspellings. An example is the History Department. I can look at “Department of History”, “Dep. of History”, and “Dep of Hist.” and tell these are probably all referring to the same thing, but it’s difficult to predict those potential spellings. While I could deal with much of this with regular expressions in a text editor and find and replace in Excel, I kept running into additional problems that I couldn’t spot until I got an error. It took several attempts at loading the data before I had cleared out all the errors.

In OpenRefine this is a much simpler task, since you can use it to find everything that is probably the same thing despite slight differences in spelling, punctuation, and spacing. So rather than trying to write a regular expression that accounts for all the differences between “Department of History”, “Dep. of History”, and “Dep of Hist.”, you can find all the clusters of text that include those elements and change them all in one shot to “History”. More detailed instructions on how to do this are below.

Installation and Basics

OpenRefine was called Google Refine until last October, and while the content from the Google Refine page is being moved to the OpenRefine page, you should plan to look at both sites. Documentation and video tutorials refer interchangeably to Google Refine and OpenRefine. The official and current documentation is on the OpenRefine GitHub wiki. For specific questions you will probably want to use the OpenRefine Custom Search Engine, which brings together the mix of documentation and tutorials on the web. OpenRefine is a web app that runs on your computer, so you don’t need an internet connection to run it. You can get the installation instructions on this page.

While you can jump in right away and get started playing around, it is well worth your time to watch the tutorial videos, which cover the basic actions you need to take to start working with data. As I said, the learning curve is low, but not all of the commands will make sense until you see them in action. These videos will also give you an idea of what you might be able to do with a data set you have lying around. You may also want to browse the “recipes” on the OpenRefine site, as well as search online for additional interesting things people have done. You will probably think of more ideas about what to try. The most important thing to know about OpenRefine is that you can undo anything, and go back to the beginning of the project before you messed up.

A basic understanding of the Google Refine Expression Language, or GREL, will improve your ability to work with data. There isn’t a whole lot of detailed documentation, so you should feel free to experiment and see what happens when you try different functions. You will see from the tutorial videos the basics you need to know. Another essential tool is regular expressions. So much of the data you will be starting with is structured data (even if it’s not perfectly structured) that you will need to turn into something else. Regular expressions help you find patterns that you can use to break strings apart into smaller pieces. Spending a few minutes understanding regular expression syntax will save hours of inefficient find and replace. There are many tutorials; my go-to source is this one. The good news for librarians is that if you can construct a Dewey Decimal call number, you can construct a regular expression!

Some ideas for librarians

 

(A) Typos

Above I described how you would use OpenRefine to clean up messy and inconsistent catalog data. Here’s how to do it. Load in the data, and select “Text Facet” on the column in question. OpenRefine will show clusters of text that are similar and probably the same thing.

AcademicDept Text Facet


 

Click on Cluster to get a menu for working with multiple values. You can click on the “Merge” check box and then edit the text to whatever you need it to be. You can also edit each text cluster to be the correct text.

Cluster and Edit


You can merge and re-cluster until you have fixed all the typos. Back on the first Text Facet, you can hover over any value to edit it. That way even if the automatic clustering misses some you can edit the errors, or change anything that is the same but you need to look different–for instance, change “Dept. of English” to just “English”.
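To give a feel for why near-identical spellings end up in the same cluster, here is a minimal sketch of a key-collision ("fingerprint") approach in JavaScript. It is an illustration of the general idea only, not OpenRefine's own code; the function names and sample values are assumptions made up for this example.

```javascript
// Sketch of key-collision ("fingerprint") clustering on a list of strings.
// Function names and sample values are illustrative assumptions.

// Reduce a string to a normalized key: trim, lowercase, strip punctuation,
// then sort and de-duplicate the remaining words.
function fingerprint(value) {
  const words = value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "") // drops punctuation (and, in this sketch, non-ASCII letters)
    .split(/\s+/)
    .filter(Boolean);
  return Array.from(new Set(words)).sort().join(" ");
}

// Group values whose fingerprints collide.
function cluster(values) {
  const groups = new Map();
  for (const value of values) {
    const key = fingerprint(value);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // Keep only keys shared by more than one distinct spelling.
  return Array.from(groups.values()).filter(
    (group) => new Set(group).size > 1
  );
}

const departments = [
  "Dept. of English",
  "dept of English",
  "Dept of  English ",
  "Department of History",
  "Dep. of History",
];
const clusters = cluster(departments);
console.log(clusters);
```

Note that the three “English” spellings collide, while “Department of History” and “Dep. of History” do not, because the abbreviation changes one of the words. Differences like that are the ones you fix by editing values by hand in the facet.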

(B) Bibliographies

The main thing that I have used OpenRefine for in my daily work is to change a bibliography in plain text into columns in a spreadsheet that I can run against an API. This was inspired by this article in the Code4Lib Journal: “Using XSLT and Google Scripts to Streamline Populating an Institutional Repository” by Stephen X. Flynn, Catalina Oyler, and Marsha Miles. I wanted to find a way to turn a text CV into something that would work with the SHERPA/RoMEO API, so that I could find out which past faculty publications could be posted in the institutional repository. Since CVs are lists of data presented in a structured format but with some inconsistencies, OpenRefine makes it very easy to present the data in a certain way as well as remove the inconsistencies, and then to extend the data with a web service. This is a very basic set of instructions for how to accomplish this.

The main thing to accomplish is to put the journal title in its own column. Here’s an example citation in APA format, in which I’ve colored all the “separator” punctuation in red:

Heller, M. (2011). A Review of “Strategic Planning for Social Media in Libraries”. Journal of Electronic Resources Librarianship, 24 (4), 339-240

From the drop-down menu at the top of the column, choose “Edit Column” and then “Split into several columns…”. You will get a menu like the one below. This example finds the opening parenthesis and removes it in creating a new column. The author’s name is now in its own column, and the rest of the text is in another column.

Split into columns

 

The rest of the column works the same way–find the next text, punctuation, or spacing that indicates a separation. You can then rename the column to be something that makes sense. In the end, you will end up with something like this:

Split columns
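The same separators can also be written as a single regular expression, which ties back to the earlier point about regular expressions. The following JavaScript is only a sketch: the pattern is an assumption fitted to the one citation shape shown above, and the function and field names are made up for illustration.

```javascript
// Sketch: split one APA-style citation into named fields using the same
// separators described above. The pattern is an assumption fitted to this
// one citation shape; real CVs will need variations.
const citation =
  "Heller, M. (2011). A Review of “Strategic Planning for Social Media in Libraries”. " +
  "Journal of Electronic Resources Librarianship, 24 (4), 339-240";

// author (year). title. journal, volume (issue), pages
const apaPattern = /^(.+?) \((\d{4})\)\. (.+)\. (.+?), (\d+) ?\((\d+)\), (.+)$/;

function splitCitation(text) {
  const match = apaPattern.exec(text);
  if (match === null) return null; // leave non-matching rows for hand editing
  const [, author, year, title, journal, volume, issue, pages] = match;
  return { author, year, title, journal, volume, issue, pages };
}

const columns = splitCitation(citation);
console.log(columns.journal);
```

Rows that do not fit the pattern come back as null, which is a reminder that inconsistent citations still need a manual pass.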

When you have the journal titles in a separate column, you may want to cluster the text to make sure that the journals have consistent titles, and clean up anything else in the titles. Now you are ready to build on this data by fetching data from a web service. The third video tutorial posted above will explain the basic idea, and this tutorial is also helpful. Use the pull-down menu at the top of the journal column to select “Edit column” and then “Add column by fetching URLs…”. You will get a box that will help you construct the right URL. You need to format your URL in the way required by SHERPA/RoMEO, and you will need a free API key. For the purposes of this example, you can use 'http://www.sherpa.ac.uk/romeo/api29.php?ak=[YOUR API KEY HERE]&qtype=starts&jtitle=' + escape(value,'url'). Note that the box gives you a preview so you can see whether the URL is formatted the way you expect. Give your column a name, and set the throttle delay, which spaces out the requests so that the service does not reject them for arriving too quickly. I found that 1000 (the value is in milliseconds) worked fine.
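If you want to check a URL outside OpenRefine before running it against a whole column, the same string can be assembled in a few lines of JavaScript. This is a sketch only: the base URL and parameter names are copied from the expression above, while the function name and the placeholder key are assumptions. Here encodeURIComponent plays the role that escape(value,'url') plays in GREL; the two may encode a space differently (%20 versus +).

```javascript
// Sketch: build the journal-title lookup URL outside OpenRefine.
// The base URL and parameter names are copied from the GREL expression
// above; the function name and placeholder key are assumptions.
const ROMEO_BASE = "http://www.sherpa.ac.uk/romeo/api29.php";

function buildRomeoUrl(apiKey, journalTitle) {
  return (
    ROMEO_BASE +
    "?ak=" + encodeURIComponent(apiKey) +
    "&qtype=starts" +
    "&jtitle=" + encodeURIComponent(journalTitle)
  );
}

const exampleUrl = buildRomeoUrl(
  "YOUR-API-KEY", // placeholder, not a real key
  "Journal of Electronic Resources Librarianship"
);
console.log(exampleUrl);
```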


After this runs, you will get a new column with the XML returned by SHERPA/RoMEO. You can use this to pull out anything you need, but for this example I want to get the pre-archiving and post-archiving policies, as well as the conditions. A quick way to do this is to use the Google Refine Expression Language parseHtml function. To use this, click on “Add column based on this column” from the “Edit Column” menu, and you will get a menu to fill in an expression.


In this example I use the code value.parseHtml().select("prearchiving")[0].htmlText(), which selects just the text from within the prearchiving element. Conditions are a little different, since there are multiple conditions for each journal. In that case, you would use the following syntax (after join you can put whatever separator you want): forEach(value.parseHtml().select("condition"),v,v.htmlText()).join(". ")
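For comparison, here is roughly what those two expressions do, sketched in JavaScript. The sample XML is a hypothetical, heavily trimmed stand-in for a response (only the prearchiving and condition element names come from the text above; everything else, including the sample wording, is assumed), and the helper name is made up.

```javascript
// Sketch: pull the text of named elements out of an XML string.
// The sample response is a hypothetical, trimmed stand-in; only the
// "prearchiving" and "condition" element names come from the text above.
const sampleXml =
  "<publisher>" +
  "<prearchiving>can</prearchiving>" +
  "<conditions>" +
  "<condition>Publisher's version/PDF cannot be used</condition>" +
  "<condition>Must link to publisher version</condition>" +
  "</conditions>" +
  "</publisher>";

// Return the text content of every <tagName>...</tagName> element.
// A regular expression is enough for flat elements like these; use a real
// XML parser for anything nested or attribute-heavy.
function elementTexts(xml, tagName) {
  const pattern = new RegExp("<" + tagName + ">([^<]*)</" + tagName + ">", "g");
  return Array.from(xml.matchAll(pattern), (match) => match[1]);
}

// Equivalent of select("prearchiving")[0].htmlText()
const prearchiving = elementTexts(sampleXml, "prearchiving")[0];
// Equivalent of forEach(select("condition"), v, v.htmlText()).join(". ")
const conditions = elementTexts(sampleXml, "condition").join(". ");
console.log(prearchiving, "|", conditions);
```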

In the end, you will have a neatly structured spreadsheet built from your original CV, with each piece of bibliographic information in its own column and the publisher conditions listed. You can imagine the possibilities for additional APIs to use. For instance, the WorldCat API could help you determine which faculty-published books the library owns.

Once you find a set of actions that gets your desired result, you can save them for the future or to share with others. Click on Undo/Redo and then the Extract option. You will get a description of the actions you took, plus those actions represented in JSON.


Unselect the checkboxes next to any mistakes you made, and then copy and paste the text somewhere you can find it again. I have the full JSON for the example above in a Gist here. Make sure that if you save your JSON publicly you remove your personal API key! When you want to run the same recipe in the future, click on the Undo/Redo tab and then choose Apply. It will run through the steps for you. Note that if you have a mistake in your data you won’t catch it until it’s all finished, so make sure that you check the formatting of the data before running this script.

Learning More and Giving Back

Hopefully this quick tutorial got you excited about OpenRefine and thinking about what you can do. I encourage you to read through the list of External Resources to get additional ideas, some of which are library related. There is lots more to learn and lots of recipes you can create to share with the library community.

Have you used OpenRefine? Share how you’ve used it, and post your recipes.

 


Playing with JavaScript and JQuery – the Ebook link HTML string generator and the EZproxy bookmarklet generator

In this post, I will describe two cases in which I solved a practical problem with a little bit of JavaScript and JQuery. Check them out first here before reading the rest of the post which will explain how the code works.

  1. Library ebook link HTML string generator
  2. EZproxy bookmarklet generator – Longer version (with EZproxy Suffix)
  3. EZproxy bookmarklet generator – Shorter version (with EZproxy Prefix)

Source: http://www.flickr.com/photos/albaum/448573998/

1. Library ebook link HTML string generator

If you are managing a library website, you will be using a lot of hyperlinks for library resources. Sometimes you can distribute some of the repetitive tasks to student workers, but they are usually not familiar with HTML. I had a situation in which I needed to add a number of links for library e-books to my library’s Course E-book LibGuide page while I was swamped with many other projects. So I wondered if I could still use the library’s student assistant’s help by creating and providing a simple tool, so that the student only needs to input the link title and url and send me the result as HTML. This way, I can still delegate some work when other projects require my attention. (N.B. Needless to say, this doesn’t mean that what I did was the best way to use student assistance for this type of work. I didn’t want them to edit the page directly because this page had tabs, and a student using the WYSIWYG editor might inadvertently remove part of the tabbed box code.)

The following code does exactly that.

This HTML form takes an e-book title and the link to the book as input and spits out a hyperlink as a list item. For example, if you fill in the title field with ‘Bradley’s Neurology in Clinical Practice’ and the link field with its url, http://ezproxy.fiu.edu/login?url=http://www.mdconsult.com/public/book/view?title=Daroff:+Bradley’s+Neurology+in+Clinical+Practice, then the result shown in the text area would be: <li><a href="http://ezproxy.fiu.edu/login?url=http://www.mdconsult.com/public/book/view?title=Daroff:+Bradley’s+Neurology+in+Clinical+Practice">Bradley’s Neurology in Clinical Practice</a></li>

I also wanted the library student assistant to be able to do this for many e-books at once and just send me the whole set of HTML snippets covering all the e-books. So if, after running the form once, one fills out the title and link fields with another e-book’s information, the result is added to the end of the previous HTML string and displayed in the text area. The result would look like this: <li><a href="http://ezproxy.fiu.edu/login?url=http://www.mdconsult.com/public/book/view?title=Daroff:+Bradley’s+Neurology+in+Clinical+Practice">Bradley’s Neurology in Clinical Practice</a></li><li><a href="http://ezproxy.fiu.edu/login?url=http://www.accessmedicine.com/resourceTOC.aspx?resourceID=64">Cardiovascular Physiology</a></li>. Since the code is in the text area, the student can also edit it there, after clicking the ‘Send to Text Area’ button, to correct any error made while filling out the form.

Now, let’s take a look at what is going on behind the scenes. This is the entire html file. The Javascript/JQuery code that generates the html string in the text area is on lines 22-32.

<html>
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
</head>
<body>
	<p>
		This page creates the html-friendly string for a link with a title and a url. 
		<ul>
		<li>Fill out the form below and click the button. </li>
		<li>If the link is messy, clean up first by using the <a href="http://meyerweb.com/eric/tools/dencoder/">URL decoder</a>.</li>
		<li>Copy and paste the result inside the text area after you are done.</li>
		</ul>	
	</p>
	<p>
		Title: <input type="text" id="title1" size="100"/><br/>
		Link: <input type="text" id="link1" size="100"/><br/>
		<button id="b1">Send to Text Area</button>
	</p>
	<p id="result1"></p>
	<textarea rows="20" cols="80"></textarea>

	<script type="text/javascript">
	$(document).ready(function(){	
	  $("#b1").click(function(){
	  //  alert('<a href="'+$("#link").val()+'">' + $("#title").val()+"</a>");
	    $('#result1').text('<li><a href="'+$("#link1").val()+'">' + $("#title1").val()+"</a></li>");
	    var res1='<li><a href="'+$("#link1").val()+'">' + $("#title1").val()+"</a></li>";
	    $('textarea').val($('textarea').val()+res1);
	 //   $('textarea').text(res1);
	  });
	}); // doc ready
	</script>

</body>
</html>

The script is repeated on its own below, and the line numbers in this paragraph refer to that excerpt unless noted otherwise. Since I am using JQuery, I start with the obligatory $(document).ready in line 2. In line 3, I register a callback function that runs when #b1 (the button with the id of b1, line 17 of the full file above) is clicked. Line 4 is commented out; I used it initially, with a JS alert, to test whether I was getting the right string from the title and link fields. Line 5 fills the p tag with the id of result1 (line 19 of the full file) with the newly built string. The same string is saved in the variable res1 in line 6 and appended to the existing content of the textarea in line 7. Line 8 is commented out. If you use line 8 instead of line 7, any existing content in the textarea will be removed and only the new string created from the title and link fields will show up in the text area.

<script type="text/javascript">
	$(document).ready(function(){	
	  $("#b1").click(function(){
	  //  alert('<a href="'+$("#link").val()+'">' + $("#title").val()+"</a>");
	    $('#result1').text('<li><a href="'+$("#link1").val()+'">' + $("#title1").val()+"</a></li>");
	    var res1='<li><a href="'+$("#link1").val()+'">' + $("#title1").val()+"</a></li>";
	    $('textarea').val($('textarea').val()+res1);
	 //   $('textarea').text(res1);
	  });
	}); // doc ready
</script>

You do not have to know a lot about scripting to solve a simple task like this!
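One thing the form does not do is escape special characters, so a title containing & or < would produce broken HTML. Below is a hedged sketch of the string-building step as a plain function with escaping added. The function names are my own, and the second sample title and URL are made up for illustration.

```javascript
// Sketch: the string-building step as a plain function, with HTML escaping
// added. Function names are assumptions; the original form does not escape.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildListItem(title, url) {
  return '<li><a href="' + escapeHtml(url) + '">' + escapeHtml(title) + "</a></li>";
}

// Appending works the same way as in the textarea: concatenate the results.
let snippet = "";
snippet += buildListItem(
  "Cardiovascular Physiology",
  "http://ezproxy.fiu.edu/login?url=http://www.accessmedicine.com/resourceTOC.aspx?resourceID=64"
);
// Hypothetical second entry, chosen to show the escaping.
snippet += buildListItem("Q&A Review", "http://example.org/book?id=1&ch=2");
console.log(snippet);
```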

2. EZproxy bookmarklet generator – Longer version

My second example is a bookmarklet that reloads the current page through a specific EZproxy system.

If you think that this bookmarklet reinvents the wheel since the LibX toolbar already does this, you are correct. And also if you are a librarian working with e-resources, you already know to add the EZproxy suffix at the end of the domain name of the url when a patron asks if a certain article on a web page is available through the library or not. But I found that no matter how many times I explain this trick of adding the EZproxy suffix to patrons, the trick doesn’t seem to stick in their busy minds. Also, many doctors and medical students, who are the primary patrons of my library, work at the computers in hospitals and they do not have the necessary privilege to install a toolbar on those computers. But they can create a bookmark.

Similarly, many students asked me why there is no LibX toolbar for their mobile devices, unlike on their school laptops. (In the medical school where I work, all students are required to purchase a school-designated laptop; this laptop is pre-configured with all the necessary programs including the library LibX toolbar.) Well, mobile devices are not exactly computers, and so the browser toolbar doesn’t work. But students want an alternative, and they can create a bookmark on their tablets and smartphones. So the proxy bookmarklet is still a worthwhile tool for mobile device users.

This is where the bookmarklet is: http://htmlpreview.github.com/?https://github.com/bohyunkim/examples/blob/master/bkmklt.html. To test, drag the link on the top that says Full-Text from FIU Library to your bookmark toolbar. Then go to http://jama.jamanetwork.com/Issue.aspx?journalid=67&issueID=4452&direction=P. Click the new bookmarklet you got on your toolbar. The page will reload and you will be asked to log in through the Florida International University EZproxy system. When you are authenticated, you will be seeing the page proxied: http://jama.jamanetwork.com.ezproxy.fiu.edu/Issue.aspx?journalid=67&issueID=4452&direction=P.

You will be surprised to see how simple the bookmarklet is (and there is an even shorter version, which I will show in the next section). It is a JavaScript function wrapped inside a hyperlink. Lines 2-4 take the domain name, the path name, and any search string after the url path from the current window location object. So in the case of http://jama.jamanetwork.com/Issue.aspx?journalid=67&issueID=4452&direction=P, location.host is jama.jamanetwork.com and location.pathname is /Issue.aspx. The rest of the url, ?journalid=67&issueID=4452&direction=P, is location.search. In line 5, I put my institution’s EZproxy suffix between the host and the path, and in line 6, I ask the browser to open this new link.

<a href="javascript:(function(){
	var host=location.host;
	var path=location.pathname;
	var srch=location.search;
	var newlink='http://'+host+'.ezproxy.fiu.edu'+path+srch;
	window.open(newlink);
})();">Full-Text from FIU Library</a>
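To try the rewrite without a browser, the same steps can be sketched as a plain function that uses the standard URL class in place of window.location. The function name is an assumption for illustration; the logic mirrors the bookmarklet above.

```javascript
// Sketch: the suffix rewrite as a plain function, using the standard URL
// class in place of window.location. The function name is an assumption.
function addProxySuffix(pageUrl, proxySuffix) {
  const { host, pathname, search } = new URL(pageUrl);
  // Same concatenation as the bookmarklet: host + "." + suffix + path + query
  return "http://" + host + "." + proxySuffix + pathname + search;
}

const proxied = addProxySuffix(
  "http://jama.jamanetwork.com/Issue.aspx?journalid=67&issueID=4452&direction=P",
  "ezproxy.fiu.edu"
);
console.log(proxied);
```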

Now let’s take a look at the whole form. I created this form for those who want a ready-made bookmarklet recipe. All they need is their institution’s EZproxy suffix and whatever name they want to give to the bookmarklet. Once one fills out those two fields and clicks the ‘Customize’ button, one gets the full HTML page code with the bookmarklet as a link in it.

<html>
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
</head>
<body>
<h1>EZproxy Bookmarklet</h1>	
<p><ul><li>
	<a href="javascript:(function(){
		var host=location.host;
		var path=location.pathname;
                var srch=location.search;
                var newlink='http://'+host+'.ezproxy.fiu.edu'+path+srch;
		window.open(newlink);
	})();">Full-Text from FIU Library</a> 
	(Drag the link to the bookmark toolbar!)
</li></ul></p>
<p>This is a bookmarklet that will reroute the current page through FIU EZproxy.
	<br/>
If you have the LibX toolbar installed, you do not need this bookmarklet. Simply select 'Reload Page via XYZ EZproxy' on the menu that appears when you right click.
	<br/>
Created by Bohyun Kim in March 2013. 	
</p>
<h2>How to Test</h2>
<ul>
	<li>Drag the link above to the bookmark toolbar of your web browser.</li>
	<li>Click the bookmark when you are on a webpage that has an academic article.</li>
	<li>It will ask you to log in through the FIU EZproxy.</li>
	<li>Once you are authenticated and the library has a subscription to the journal, you will be able to get the full-text article on the page.</li>
	<li>Look at the url and see if it contains .ezproxy.fiu.edu. If it does, the bookmarklet is working.</li>
</ul>	

<h2>Make One for Your Library</h2>
	<p>
		Bookmark title: <input type="text" id="title1" size="40" placeholder="e.g. Full-Text ABC Library"/>
		<em>e.g. Full-text ABC Library</em>
		<br/>
		Library EZproxy Suffix: <input type="text" id="link1" size="31" placeholder="e.g. ezproxy.abc.edu"/>
		<em>e.g. ezproxy.abc.edu</em>
		<br/>
		<button id="b1">Customize</button>
	</p>
	<p><strong>Copy the following into a text editor and save it as an html file.</strong></p>
	<ul>
		<li>Open the file in a web browser and drag your link to the bookmark toolbar.</li>
	</ul>	
	<p id="result1" style="color:#F7704F;">**Customized code will appear here.**</p>
	<p><strong>If you want to make any changes to the code:</strong></p>
	<textarea rows="10" cols="60"></textarea>

	<script type="text/javascript">
	$(document).ready(function(){	
	  $("#b1").click(function(){
		var pre="&lt;html&gt; &lt;head&gt; &lt;script type=&quot;text/javascript&quot; src=&quot;http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js&quot;&gt;&lt;/script&gt; &lt;/head&gt; &lt;body&gt; &lt;a href=&quot;javascript:(function(){ var hst=location.host; var path=location.pathname; var srch=location.search; var newlink='http://'+hst";
		var link1=$('#link1').val();
		var post="'+path+srch; window.open(newlink); })();&quot;&gt;";
	  	var title=$("#title1").val();
	  	var end="&lt;/a&gt; &lt;/body&gt; &lt;/html&gt;";
	  	var final=$('<div />').html(pre+"+'."+link1+post+title+end).text()
	  	$('#result1').text(final);
	    $('textarea').val(final);
	  });
	}); // doc ready
	</script>

</body>
</html>

That ‘customize’ part is done here with several lines of JQuery. You will notice that the process is quite similar to what I did in my first example. In lines 4-8, I am just stitching together a bunch of text strings so that the whole code can be written out to the text area when the ‘Customize’ button is clicked. All special characters used in HTML tags such as ‘<’ and ‘>’ have been changed to HTML entities. In line 9, I concatenate those strings, set the result as the inner HTML of an empty div, and read it back with the .text() method, saving it in the variable final (I hope you name your variables a little more carefully than I do!). The result is the HTML code with the HTML entities decoded.

<script type="text/javascript">
	$(document).ready(function(){	
	  $("#b1").click(function(){
		var pre="&lt;html&gt; &lt;head&gt; &lt;script type=&quot;text/javascript&quot; src=&quot;http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js&quot;&gt;&lt;/script&gt; &lt;/head&gt; &lt;body&gt; &lt;a href=&quot;javascript:(function(){ var hst=location.host; alert(hst); var path=location.pathname; var srch=location.search; var newlink='http://'+hst";
		var link1=$('#link1').val();
		var post="'+path+srch; window.open(newlink); })();&quot;&gt;";
	  	var title=$("#title1").val();
	  	var end="&lt;/a&gt; &lt;/body&gt; &lt;/html&gt;";
	  	var final=$('<div />').html(pre+"+'."+link1+post+title+end).text()
	  	$('#result1').text(final);
	    $('textarea').val(final);
	  });
	}); // doc ready
</script>

Not too bad, right? I hope these examples show how just a few or several lines of code can be used to solve your practical problems at work. Coding is there for you to automate time-consuming and/or repetitive tasks.
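The div trick relies on the browser to decode the entities. To show what that decoding step amounts to, here is a small DOM-free sketch in JavaScript. It handles only the four entities that appear in strings like the ones above, so treat it as an illustration rather than a general-purpose decoder; the names and the sample string are assumptions.

```javascript
// Sketch: decode the handful of entities used in the generator strings
// without a DOM. Only these four entities are handled; this is not a
// general-purpose decoder.
const ENTITIES = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&" };

function decodeEntities(text) {
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(/&(?:lt|gt|quot|amp);/g, (entity) => ENTITIES[entity]);
}

// Hypothetical sample string in the same style as the generator's pieces.
const encoded = "&lt;a href=&quot;http://example.org/&quot;&gt;Example&lt;/a&gt;";
const decoded = decodeEntities(encoded);
console.log(decoded);
```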

3. EZproxy bookmarklet generator – Shorter version

There is a simpler way to create an EZproxy bookmarklet than the one sketched above. If you simply add the EZproxy prefix in front of the entire url of the page where a user is, you achieve the same result. In this case, you do not have to break the url into host, pathname, search string, etc.

<a href="javascript:void(location.href=%22http://ezproxy.fiu.edu/login?url=%22+location.href)">Full-Text from FIU Library</a>

Here is the code for this much simpler EZproxy bookmarklet and the bookmarklet generator. If you know your library’s EZproxy prefix, you can make one for your library.
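As a rough idea of what a generator for the prefix version has to do, here is a minimal sketch that builds the bookmarklet link from a prefix and a link title. This is not the code of the linked generator; the function name is an assumption, and the output format simply follows the bookmarklet shown above.

```javascript
// Sketch: build the prefix-style bookmarklet link from a library's EZproxy
// prefix and a link title. The function name is an assumption; the output
// follows the format of the bookmarklet shown above.
function buildPrefixBookmarklet(proxyPrefix, linkTitle) {
  // %22 is a URL-encoded double quote, so the href attribute stays intact.
  const href = "javascript:void(location.href=%22" + proxyPrefix + "%22+location.href)";
  return '<a href="' + href + '">' + linkTitle + "</a>";
}

const fiuLink = buildPrefixBookmarklet(
  "http://ezproxy.fiu.edu/login?url=",
  "Full-Text from FIU Library"
);
console.log(fiuLink);
```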

So there are many ways to get the same thing done. Some are more elegant and some are less so. Usually a shorter one is a more elegant solution. The lesson is that you usually get to the most elegant solution after coming up with many less elegant solutions first.

Do you have a problem that can be solved by creating several lines of code? Have you ever solved a practical problem using a bit of code? Share your experience in the comments!

 


Local Dev Environments For Newbies Part 1: AMP on Mac OSX

There are many cases where having a local development environment is helpful and it is a relatively straightforward thing to do, even if you are new to development.  However, the blessing and the curse is that there are many, many tutorials out there attempting to show you how.  This series of posts will aim to walk through some basic steps with detail, as well as pass on some tips and tricks for setting up your own local dev box.

First, what do I mean by a local development environment?  This is a setup on your computer which allows you to code and tweak and test in a safe environment.  It’s a great way to hammer on a new application with relatively low stakes.  I am currently installing dev environments for two purposes: to test some data model changes I want to make on an existing Drupal site and to learn a new language so I can contribute to an application.  For the purposes of this series, we’re going to focus on the AMP stack – Apache, MySQL and PHP – and how to install and configure those systems for use in web application development.

Apache is the web server which will serve the pages of your website or application to a browser.  You may hear Apache in conjunction with lots of other things – Apache Tomcat, Apache Solr – but generally when someone references just Apache, it’s the web server.  The full name of the project is the Apache HTTP Server Project.

PHP is a scripting language widely used in web development.  MySQL is a database application also frequently used in web development.  ”Stack” refers to the combination of several components needed to run a web application.  The AMP stack is the base for many web applications and content management systems, including Drupal and WordPress.

You may have also seen the AMP acronym preceded by an L, M or W.  This merely stands for the operating system of choice – Linux, Mac or Windows.  This can also refer to installer packages that purport to do the whole installation for you, like WAMP or MAMP.  Employing the installer packages can be useful, depending on your situation and operating system.  The XAMPP stack, distributed by Apache Friends, is another example of an installer package designed to set up the whole stack for you.  For this tutorial though, we’ll step through each element of the stack, instead of using a stack installer.

So, why do it yourself if there are installers?  To me, it takes out the mystery of how all the pieces play together and is a good way to learn about what’s going on behind the scenes.  When working on Windows, I will occasionally use a .msi installer for an individual component to make sure I don’t miss something.  But installing and configuring each component individually is actually helpful.

Tips

Before we begin, let’s look at some tips:

  • You will need administrative rights to the computer on which you’re installing.
  • Don’t be afraid of the command line.  There are lots of tutorials around the web on how to use the basic commands – for both Mac (based on UNIX) and Windows.  But, you don’t need to be an expert to set up a dev environment.  Most tutorials give the exact commands you need.
  • Try, if possible, to block off a chunk of time to do this.  Going through all the steps may take a while, from an hour to an afternoon, especially if you hit a snag.  Several times during my own process, I had to step away from it because of a crisis or because it was the end of the day.  When I was able to come back later, I had some trouble remembering where I left off or the configuration options I had chosen.  If you do have to walk away, write down the last thing you did.
  • When you’re looking for a tutorial, Google away.  Search for the elements of your stack plus your OS, like “Apache MySQL PHP Mac OSX”.  You’ll find lots, and probably end up referencing more than one.  Use your librarian skills: is the tutorial recent?  Does it appear to be from a reputable source?  If it’s a blog, are there comments on the accuracy of the tutorial?  Does it agree with the others you’ve seen?
  • Once you’ve selected one or two to follow, read through the whole tutorial one time without doing anything.  Full disclosure: I never do this and it always bites me.

Let’s get going with Recipe 1 – Install the AMP Stack on Mac OS X

Install the XCode Developer Tools

First, we install the developer tools for XCode.  If you have Mac 10.7 and above, you can download the XCode application from the App Store.  To enable the developer tools, open XCode, go to the XCode menu > Preferences > Downloads tab, and then click on “Install” next to the developer tools.  This tutorial on installing Ruby by Moncef Belyamani has good screenshots of the XCode process.

If you have Snow Leopard (10.6) or below, you’ll need to track down the tools on the Apple Developer Downloads Page.  You will need to register as a developer, but it’s free.  Note:  you can get pretty far in this process without using the XCode command line tools, but down the road as you build more complicated stacks, you’ll want to have them.

Configure Apache and PHP

Next we need to configure Apache and PHP.  Note that I said “configure”, not “install”.  Apache and PHP both come with OS X; we just need to configure them to work together.

Here’s where we open the Terminal to access the command line by going to Applications > Utilities > Terminal.

Open Terminal

Once Terminal is open, a prompt appears where you can type in commands.  The “~” character indicates that you are at the “home” directory for your user.  This is where you’ll do a lot of your work.  The “$” character delineates the end of the prompt and the beginning of your command.

Type in the following command:

cd /etc/apache2

“cd” stands for “change directory”.  This is the equivalent of double-clicking on etc, then apache2, if you were in the Finder (but etc is a hidden folder in the Finder).  From here, we want to open the necessary file in an editor.  Enter the following command:

sudo nano httpd.conf

“sudo” elevates your permission to administrator, so that you can edit the config file for Apache, which is httpd.conf.  You will need to type in your administrator password.  The “nano” command opens a text editor in the Terminal window.  (If you’re familiar with vi or emacs, you can use those instead.)

The bottom of your window will show the available commands.  The “^” stands for the Control key.  To search for the part we want to change, press Control + W, type php, and press Enter.  We are looking for this line:

#LoadModule php5_module        libexec/apache2/libphp5.so

The “#” at the beginning of this line marks it as a comment, so Apache ignores the line.  We want Apache to see the line and load the PHP module.  So, change the text by removing the #:

LoadModule php5_module        libexec/apache2/libphp5.so

Save the file by pressing Control + O (nano calls this “WriteOut”), then press Enter to confirm the file name.  The number of lines written displays at the bottom of the window.  Press Control + X to exit nano.
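If you would rather not hunt for the line in nano, the same edit can be sketched with sed.  This is only an illustration, not the method used in this tutorial: enable_php_module is a made-up helper name, the perl_module line is just sample input, and the demo runs on a throwaway file rather than on the real /etc/apache2/httpd.conf.

```shell
# Sketch only: enable_php_module is a hypothetical helper, not an Apache tool.
# It prints the config with the leading "#" removed from the php5_module line
# and leaves every other line (including other comments) untouched.
enable_php_module() {
  sed 's/^#\(LoadModule php5_module\)/\1/' "$1"
}

# Try it on a throwaway file instead of the real httpd.conf.
demo=$(mktemp)
printf '%s\n' \
  '#LoadModule perl_module        libexec/apache2/mod_perl.so' \
  '#LoadModule php5_module        libexec/apache2/libphp5.so' > "$demo"
enable_php_module "$demo"
rm -f "$demo"
```

The helper only prints the result.  To apply it for real, you would back up httpd.conf first, save the output to a temporary file, and copy that over the original with sudo.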

Next, we need to start the Apache server.  Type in the following command:

sudo apachectl start

Now, go to your browser and type in http://localhost.  You should see “It works!”

Apache, as mentioned before, serves web files from a location we designate.  By default, this is /Library/WebServer/Documents.  If you have Snow Leopard (10.6) or below, Apache also automatically looks in the Sites folder of your home directory (username > Sites), which is a convenient place to store and work with files.  If you have OS X 10.7 or above, creating the Sites folder takes a few steps.  On 10.7, go to System Preferences > Sharing and click on Web Sharing.  If there’s a button that says “Create Personal Web folder”, the folder has not been created yet, so go ahead and click that button.  If it says “Open Personal Website folder”, you’re good to go.

On 10.8, the process is a little more involved.  First, go to the Finder, click on your user name, and create a folder named Sites.

Next, we need to open the command line again and create a .conf file for that directory, so that Apache knows where to find it.  Type in these commands:

cd /etc/apache2/users
ls

The ls at the end will list the directory contents.  If you see a file named yourusername.conf (e.g., mfrazer.conf) in this directory, you’re good to go.  If you don’t, it’s easy to create one.  Type the following command:

sudo nano yourusername.conf

So, mine would be sudo nano mfrazer.conf.  This will create the file and take you into a text editor.  Copy and paste the following, making sure to change YOURUSERNAME to your user name.

<Directory "/Users/YOURUSERNAME/Sites/">
  Options Indexes MultiViews
  AllowOverride None
  Deny from all
  Allow from localhost
</Directory>

The first directive, Options, can have lots of different…well, options.  The ones we have here are Indexes and MultiViews.  Indexes means that if a browser requests a directory and there’s no index.html or index.php file, Apache will serve a directory listing.  MultiViews means that browsers can request the content in a different format if it exists in the directory (e.g., in a different language).  AllowOverride determines whether an .htaccess file can override the configuration settings.  For now, None indicates that nothing can be overridden.  For Drupal or other content management systems, it’s possible we’ll want to change these directives, but we’ll cover that later.

The last two lines indicate that traffic can only reach this directory from the local machine, by typing http://localhost/~username in the browser.  For more on Apache security, see the Apache documentation.  If you would like to set it so that other computers on your network can also access this directory, change those last two lines to:

Order allow,deny
Allow from all

Either way, press Control + O to save the file and Control + X to exit.  Restart Apache for the changes to take effect using this command:

sudo apachectl restart
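If you set this up on more than one machine, the user configuration can also be generated from the shell instead of pasted into nano.  The following is a minimal sketch under that assumption; user_conf is a hypothetical helper name, and it simply prints the same Directory block shown above with the user name filled in.

```shell
# Sketch only: user_conf is a hypothetical helper name.
# It prints the <Directory> block from above for the user name given as $1.
user_conf() {
  cat <<EOF
<Directory "/Users/$1/Sites/">
  Options Indexes MultiViews
  AllowOverride None
  Deny from all
  Allow from localhost
</Directory>
EOF
}

# Preview the result for the example user mfrazer.
user_conf mfrazer
```

To write the file rather than preview it, the output could be piped through sudo tee, for example user_conf mfrazer | sudo tee /etc/apache2/users/mfrazer.conf, followed by the same restart command.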

You may also be prompted at some point by OS X to accept incoming network connections for httpd (Apache); I would deny these as I only want access to my directory from my machine, but it’s up to you depending on your setup.

We’ll test this setup with php in the next step.

Test PHP

To check PHP, create a new text document using your favorite text editor.  Type in:

<?php phpinfo(); ?>

Save the file as phpinfo.php in the Sites folder of your home directory (so for me, this is mfrazer > Sites).
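If you are already in the Terminal, the test file can be created there as well.  This is a small sketch, not a required step: write_phpinfo is a hypothetical helper name, and the demo writes into a scratch directory, whereas on the setup described here the target would be ~/Sites.

```shell
# Sketch only: write_phpinfo is a hypothetical helper name.
# It writes the one-line test page into the directory given as $1.
write_phpinfo() {
  printf '%s\n' '<?php phpinfo(); ?>' > "$1/phpinfo.php"
}

# Demo in a scratch directory; on the Mac setup above the target would be ~/Sites.
scratch=$(mktemp -d)
write_phpinfo "$scratch"
cat "$scratch/phpinfo.php"
rm -rf "$scratch"
```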

Then, point your browser to http://localhost/~yourUserName/phpinfo.php.  You should see a page of information regarding PHP and the web server, with a header that looks like this:

PHP Info Header

MySQL

Now, let’s install MySQL.  There are two ways to do this.  We could go to the MySQL downloads page and use the installers.  The fantastic tutorials at Coolest Guy on the Planet both recommend this, and it’s a fine way to go.

But we can also use Homebrew, mentioned previously on this blog, which is a really convenient way to do things as long as we’re already using the command line.

First, we need to install Homebrew.  Enter this at the command prompt:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

Next, type in

brew doctor

If you receive the message “Your system is raring to brew.”, you’re ready to go.  If you get warnings, don’t lose heart.  Most of them tell you exactly what you need to do to move forward.  Correct the errors and type in brew doctor again until you’re raring to go.  Then, type in the following command:

brew install mysql

That one’s pretty self-explanatory, no?  Homebrew will download and install MySQL, as of this writing version 5.6.10, but pay attention to the download to see the version – it’s in the URL.  After the installation succeeds, Homebrew will give some instructions on finishing the setup, including the commands we discuss below.

I’m going to pause for a second here and talk a little about permissions and directories.  If you get a “permission denied” error, try running the command again with “sudo” at the beginning.  Remember, this elevates your permission to the administrator level.  Also, if you get a “directory does not exist” error, you can easily create the directory using “mkdir”.  Before we move on, let’s check for a directory you’re going to need coming up.  Enter:

cd /usr/local/var

If you are successfully able to change to that directory, great. If not, type in

sudo mkdir /usr/local/var

to create it. Then, let’s go back to our home directory by typing in

cd ~
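The check-then-create routine above can be folded into one step.  Here is a minimal sketch; ensure_dir is a hypothetical helper name, and the demo uses a scratch location because creating /usr/local/var itself may need sudo.

```shell
# Sketch only: ensure_dir is a hypothetical helper name.
# It reports whether the directory in $1 already exists and creates it if not.
ensure_dir() {
  if [ -d "$1" ]; then
    echo "exists: $1"
  else
    mkdir -p "$1" && echo "created: $1"
  fi
}

# Demo in a scratch location; for /usr/local/var the mkdir would need sudo.
scratch=$(mktemp -d)
ensure_dir "$scratch/var"   # first call creates the directory
ensure_dir "$scratch/var"   # second call finds it
rm -rf "$scratch"
```

The -p option tells mkdir to create any missing parent directories too, so the helper also works when more than one level of the path is absent.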

Now, let’s continue with our procedure. First, we want to set up the databases to run with our user account.  So, we type in the following two commands:

unset TMPDIR
mysql_install_db --verbose --user=`whoami` --basedir="$(brew --prefix mysql)" --datadir=/usr/local/var/mysql --tmpdir=/tmp

The second command here installs the system databases.  The shell runs the whoami command inside the backquotes and substitutes your user name, so the above command should work verbatim.  It also works to type your user name directly, with no quotes (e.g., --user=mfrazer).

Next, we want to run the “secure installation” script. This helps you set root passwords without leaving the password in plain text in your editor. First we start the mysql server, then we run the installation scripts and follow the prompts to set your root password, etc:

mysql.server start
sudo /usr/local/Cellar/mysql/5.6.10/bin/mysql_secure_installation

After the script is complete, stop the mysql server.

mysql.server stop

Next, we want to set up MySQL so it starts at login. For that, we run the following two commands:

ln -sfv /usr/local/opt/mysql/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist

The ln command, in this case, places a symbolic link to any .plist files in the mysql directory into the LaunchAgents directory.  Then, we load the plist using launchctl to start the server.

One last thing – we need to create one more link to the mysql.sock file.

cd /var/mysql/
sudo ln -s /tmp/mysql.sock

This creates a link to the mysql.sock file, which MySQL uses to communicate, but which resides by default in a tmp directory.  The first command places us in the directory where we want the link (remember, if it doesn’t exist, you can use “sudo mkdir /var/mysql/” to create it) and the second creates the link.
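If symbolic links are new to you, it can help to try the same pattern somewhere harmless first.  The sketch below assumes nothing about your MySQL install: link_here is a hypothetical helper name, and scratch paths stand in for /var/mysql and /tmp/mysql.sock.

```shell
# Sketch only: link_here is a hypothetical helper name.
# It makes sure directory $1 exists, then creates a symbolic link inside it
# that points at $2, the same pattern as "cd /var/mysql/" plus "ln -s".
link_here() {
  mkdir -p "$1" && ( cd "$1" && ln -s "$2" . )
}

# Demo with scratch paths standing in for /var/mysql and /tmp/mysql.sock.
scratch=$(mktemp -d)
touch "$scratch/mysql.sock"
link_here "$scratch/var-mysql" "$scratch/mysql.sock"
ls -l "$scratch/var-mysql"
rm -rf "$scratch"
```

In the ls -l listing, the link shows up with an arrow pointing at the original file, which is a quick way to confirm where a link leads.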

MySQL is ready to go!  And, so is your AMP stack.

But wait, there’s more…

One optional tool to install is phpMyAdmin.  This tool allows you to interact with your database through your browser so you don’t have to continue to use the command line.  I also think it’s a good way to test if everything is working correctly.

First, let’s download the necessary files from the phpMyAdmin website.  These will have a .tar.gz extension.  Place the file in your Sites directory, and double-click to unzip the file.

Rename the folder to remove the version number and everything after it.  I’m going to place the next steps below, but the Coolest Guy on the Planet tutorial referenced earlier does a good job of this step for OS 10.8 (just scroll down to phpMyAdmin) if you need screenshots.

Go to the command line and navigate to your phpMyAdmin directory.  Make a directory called config and change the permissions so that the installer can write to it.  This should look something like:

cd ~/Sites/phpMyAdmin
mkdir config
chmod o+w config

Let’s take a look at that last command: chmod changes the permissions on a file or directory.  The o+w adds write permission for “others”, meaning users who are neither the directory’s owner nor members of its group.
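To see what o+w actually changes, you can compare the permission string before and after.  This is a sketch in a scratch directory, and perms_of is a hypothetical helper name.

```shell
# Sketch only: perms_of is a hypothetical helper name.
# It prints the ten-character type-and-permission string that ls shows.
perms_of() {
  ls -ld "$1" | cut -c1-10
}

scratch=$(mktemp -d)
mkdir "$scratch/config"
chmod 755 "$scratch/config"   # a known starting point: rwx for owner, r-x for the rest
perms_of "$scratch/config"    # drwxr-xr-x
chmod o+w "$scratch/config"   # add write permission for "others"
perms_of "$scratch/config"    # drwxr-xrwx
rm -rf "$scratch"
```

The last three characters of the string describe “others”; after chmod o+w they read rwx instead of r-x.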

Now, in your browser, go to http://localhost/~username/phpmyadmin/setup and follow these steps:

  1. Click on New Server (button on bottom)
  2. Click on Authentication tab, and enter the root password in the password field.
  3. Click on Save.
  4. Click on Save again on the main page.

Once the setup is finished, go to the Finder and move the config.inc.php file from the config directory into the main phpmyadmin directory and delete the config directory.  So in the end, it looks like this:

phpmyadminlast

Now, go to http://localhost/~username/phpmyadmin in your browser and log in with the root account.

You are ready to go!  In future parts of this series, we’ll look at building the AMP stack on Windows and adding Drupal or WordPress on top of the stack.  We will also look at maintaining your environment, as the AMP stack components will need updating occasionally.  Any other recipes you’d like to see?  Do you have questions? Let us know in the comments.

The following tutorials and pages were incredibly useful in writing this post.  While none of these tutorials are exactly the same as what we did here, they all contain useful pieces and may be helpful if you want to skip the explanation and just get to the commands:


How to Git

We have written about version control before at Tech Connect, most notably John Fink’s excellent overview of modern version control. But getting started with VC (I have to abbreviate it because the phrase comes up entirely too much in this post) is intimidating. If you are generally afraid of anything that reminds you of the DOS Prompt, you’re not alone and you’re also totally capable of learning Git.

DOS prompt madness

By the end of this post, we will still not understand what’s going on here.

But why should you learn git?

Because Version Control Isn’t Just for Nerds

OK, never mind, it is, it totally is. But VC is for all kinds of nerds, not just l33t programmers lurking in windowless offices.

Are you into digital preservation and/or personal archiving? Then VC is your wildest dream. It records your changes in meaningful chunks, documenting not just the final product but all the steps it took you to get there. VC repositories show who did what, too. If you care about nerdy things like provenance, then you care about VC. If co-authors would always use VC for their writing, we’d know all the answers to the truly pressing questions, like whether Gilles Deleuze or Félix Guattari wrote the passage “A concept is a brick. It can be used to build a courthouse of reason. Or it can be thrown through the window.”

Are you a web developer? Then knowing Git can get you on GitHub, and GitHub is an immense warehouse of awesomeness. Sure, you can always just download .zip files of other people’s projects, but GitHub also provides more valuable opportunities: you can showcase your awesome tools, your brilliant tweaks to other people’s projects, and you can give back to the community at whatever level you’re comfortable with, from filing bug reports to submitting actual code fixes.

Are you an instruction librarian? Have you ever shared lesson plans, or edited other people’s lesson plans, or inherited poorly documented lesson plans? Basically, have you been an instruction librarian in the past century? Well, I have good news for you: Git can track any text file, so your lessons can easily be versioned and collaborated upon just like software programs are. Did you forget that fun intro activity you used two years ago? Look through your repository’s previous commits to find it. Want to maintain several similar but slightly different lesson plans for different professors teaching the same class? You’ve just described branching, something that Git happens to be great at. The folks over at ProfHacker have written a series of articles on using Git and GitHub for collaborative writing and syllabus design.

Are you a cataloger? Versioning bibliographic records makes a lot of sense. A presentation at last year’s Code4Lib conference talked not only about versioning metadata but data in general, concluding that the approach had both strengths and weaknesses. It’s been proposed that putting bibliographic records under VC solves some of the issues with multiple libraries creating and reusing them.

As an added bonus, having a record’s history can enable interesting analyses of how metadata changes over time. There are powerful tools that take a Git repository’s history and create animated visualizations; to see this in action, take a look at the visualization of Penn State’s ScholarSphere application. Files are represented as nodes in a network map while small orbs which represent individual developers fly around shooting lasers at them. If we want to be a small orb that shoots lasers at nodes, and we definitely do, we need to learn Git.

Alright, so now we know Git is great, but how do we learn it?

It’s As Easy As git rebase -i 97c9d7d

Actually, it’s a lot easier. The author doesn’t even know what git rebase does, and yet here he is lecturing to you about Git.

First off, we need to install Git like any other piece of software. Head over to the official Git website’s downloads page and grab the version for your operating system. The process is pretty straightforward, but if you get stuck, there’s also a nice “Getting Started – Installing Git” chapter of the excellent Pro Git book, which is hosted on the official site.

Alright, now that you’ve got Git installed it’s time to start VCing the heck out of some text files. It’s worth noting that there are software packages that put a graphical interface on top of Git, such as Tower and GitHub’s apps for Windows and Mac. There’s a very comprehensive list of graphical Git software on the official Git website. But the most cross-platform and surefire way to understand Git and be able to access all of its features is with the command line, so that’s what we’ll be using.

So enough rambling, let’s pop open a terminal (Mac and Linux both have apps simply called “Terminal” and Windows users can try the Git Bash terminal that comes with the Git installer) and make it happen.

$ git clone https://github.com/LibraryCodeYearIG/Codeyear-IG-Github-Project.git
Cloning into 'Codeyear-IG-Github-Project'...
remote: Counting objects: 115, done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 115 (delta 49), reused 108 (delta 42)
Receiving objects: 100% (115/115), 34.38 KiB, done.
Resolving deltas: 100% (49/49), done.
$ cd Codeyear-IG-Github-Project/

The $ above is meant to indicate our command prompt, so anything beginning with a $ is something we’re typing. With the first command, we “cloned” a project from a Git repository existing on the web, which caused Git to give us a little information in return. All Git commands begin with git and most provide useful info about their usage or results. With the second command, we moved inside the project’s folder using a “change directory” command.

We now have a Git repository on our computer; if you peek inside the folder, you’ll see some text (specifically Markdown) files and an image or two. But what’s more: we have the project’s entire history too, pretty much every state that any file has been in since the beginning of time.

OK, since the beginning of the project, but still, is that not awesome? Oh, you’re not convinced? Let’s look at the project’s history.

$ git log
commit b006c1afb9acf78b90452b284a111aed4daee4ca
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Mar 1 15:27:47 2013 -0500

    a couple more links, write Getting Setup section

commit 83d92e4a1be0fdca571012cb39f84d86b21121c6
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Feb 22 01:04:24 2013 -0500

    link up the YouTube video

 

We can hit Q to exit the log. In the log, we see the author, date, and a brief description of each change. The terrifying random gibberish which follows the word “commit” is a hash, which is computer science speak for terrifying random gibberish. Think of it as a unique ID for each change in the project’s history.

OK, so we can see previous changes (“commits” in VC-speak, which is like Newspeak but less user friendly), we can even revert back to previous states, but we won’t do that for now. Instead, let’s add a new change to the project’s history. First, we open up the “List of People.mdown” file in the Getting Started folder and add our name to the list. Now the magic sauce.

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   Getting Started/List of People.mdown
#
no changes added to commit (use "git add" and/or "git commit -a")
$ git add "Getting Started/List of People.mdown"
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   Getting Started/List of People.mdown
#
$ git commit -m "adding my name"
$ git status
# On branch master
nothing to commit, working directory clean
$ git log
commit wTf1984doES8th1s3v3Nm34NWtf2666bAaAaAaAa
Author: Awesome Sauce <awesome@sau.ce>
Date:   Wed Mar 13 12:30:35 2013 -0500

    adding my name

commit b006c1afb9acf78b90452b284a111aed4daee4ca
Author: Eric Phetteplace <phette23@gmail.com>
Date:   Fri Mar 1 15:27:47 2013 -0500

    a couple more links, write Getting Setup section

 

Our change is in the project’s history! Isn’t it better than seeing your name on the Hollywood Walk of Fame? Here’s precisely what we did:

First we asked for the status of the repository, which is an easy way of seeing what changes you’re working on and how far along they are to being added to the history. We ran status throughout this procedure to watch how it changed. Then we added our changes; this tells Git “hey, these are a deliberate set of changes and we’re ready to put them in the project’s history.” It may seem like an unnecessary step, but adding select sets of files can help you segment your changes into meaningful, isolated chunks that make sense when viewing the log later. Finally, we committed our change and added a short description inside quotes. This finalized the change, which we can see in the log command’s results.
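To see why adding is a separate step, here is a small sketch that records two files as two separate changes in a throwaway repository. Everything in it is illustrative: staging_demo, the file names, and the commit messages are made up for this example, and the identity is the placeholder from the log above. It assumes Git is installed.

```shell
# Sketch only: staging_demo and the file names are hypothetical.
# The function body runs in a subshell ( ... ) so the cd does not affect the caller.
staging_demo() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q .
  # Placeholder identity, stored only in this scratch repository.
  git config user.name "Awesome Sauce"
  git config user.email "awesome@sau.ce"

  echo "intro activity" > lesson.txt
  echo "reading list" > syllabus.txt

  # Stage and commit one file at a time, so each commit is one meaningful chunk.
  git add lesson.txt
  git commit -q -m "add lesson plan"
  git add syllabus.txt
  git commit -q -m "add syllabus"

  # Print only the commit messages, newest first.
  git log --format=%s
)
```

Running staging_demo should print the two messages, newest first. Because each file was added on its own, the log records two separate steps rather than one lump, which is exactly the kind of segmenting described above.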

I’m Lonely, So Lonely

Playing around with Git on our local computer can be fun, but it sure gets lonely. Yeah, we can roll back to previous versions or use branches to keep similar but separate versions of our files, but really we’re missing the best part of VC: collaboration. VC as a class of software was specifically designed to help multiple programmers work on the same project. The power and brilliance of Git shines best when we can selectively “merge” changes from multiple people into one master project.

Fortunately, we will cover this in a future post. For now, we can visit the LITA/ALCTS Library Code Year’s GitHub Project (it’s the very same Git project we cloned earlier, so we already have a copy on our computer!) to learn more about collaboration and GitHub. GitHub is a website where people can share and cooperate on Git repositories. It’s been described as “the Facebook of code” because of its popularity and slick user interface. If that doesn’t convince you that GitHub is worth checking out, the site also has a sweet mascot that’s a cross between an octopus and a cat (an octocat). And that’s really all you need to know.

Gangam Octocat

This is an Octocat. It is Awesome.


The Mobile App Design Process: A Tube Map Infographic

Last June I had a great experience team-teaching a week-long seminar on designing mobile apps at the Digital Humanities Summer Institute (DHSI). Along with my colleagues from WSU Vancouver’s Creative Media and Digital Culture (CMDC) program, I’ll be returning this June to the beautiful University of Victoria in British Columbia to teach the course again [1]. As part of the course, I created a visual overview of the process we use for app making. I hope you’ll find it a useful perspective on the work involved in crafting mobile apps and an aid to the process of creating your own.

topological map of the mobile app design process

A visual guide to the process of designing and building mobile apps. Start with Requirements Analysis in the upper-left and follow the tracks to Public Release. (Click for full-sized image.)

Creating the Tube Map:

I’m fond of the tube-map infographic style, also known as the topological map [2], because of its ability to highlight relationships between systems and especially because of how it distinguishes between linear (do once) and recursive (do over and over) processes. The linear nature of text in a book or images in slide-deck presentations can artificially impose a linearity that does not mirror the creative process we want to impart. In this example, the design and prototyping loops on the tube-map help communicate that a prototype model is an aid to modeling the design process and not a separate step completed only when the design has been finalized.

These maps are also fun and help spur the creative process. There are other tools for process mapping, such as flowcharts or mind-maps, but in this case I found the topological map has a couple of advantages. First and foremost, I associate the other two with our strategic planning process, so the tube map immediately seems more open, fun, and creative. This is, of course, rooted in my own experience and your experiences will vary, but if you are looking for a new perspective on process mapping or a new way to display interconnected systems that is vibrant, fun, and shakes things up a bit, the tube map may be just the thing.

I created the map using the open source vector-graphics program Inkscape (http://inkscape.org/), which can be compared to Adobe Illustrator and Corel Draw. Inkscape is free (both gratis and libre) and is powerful, but there is a bit of a learning curve. Being unfamiliar with vector graphics or the software tools to create them, I worked with an excellent tutorial provided by Wikipedia on creating vector graphic topological maps [3]. It took me a few days of struggling and slowly becoming familiar with the toolset before I felt comfortable creating with Inkscape. I count this as time well spent, as many graphics used in mobile apps and the icon sets required by app stores can be made with vector graphic editors. The Inkscape skills I picked up while making the map have come in very handy on multiple occasions since then.

Reading the Mobile App Map:

Our process through the map begins with a requirements analysis or needs assessment. We ask: what does the client want the app to do? What do we know about our end users? How do the affordances of the device affect this? Performing case studies helps us learn about our users before we start designing to meet their needs. In the design stage we want people to make intentional choices about the conceptual and aesthetic aspects of their app design. Prototype models like wireframe mock-ups, storyboards, or Keynotopia [4] prototypes help us visualize these choices, eventually resulting in a working prototype of our app. Stakeholders can test and request modifications to the prototype, avoiding potentially expensive and labor intensive code revisions later in the process.

Once both the designers and clients are satisfied with the prototype and we’ve seen how potential users interact with it, we’re ready to commit our vision to code. Our favored code platform uses HTML 5, CSS 3, jQuery Mobile [5], and PhoneGap [6] to make hybrid web apps. Hybrid apps are written as web apps (HTML/JavaScript web sites that look and perform like apps), then a tool like PhoneGap translates this code into a format that works with the device’s native programming environment. This provides more direct and thus faster access to device hardware and also enables us to place our app in official app stores. Hybrid apps are not the only available choice and aren’t perfect for every use case. They can be slower than native apps and may have some issues accessing device hardware, but the familiar coding language, multi-device compatibility, and ease of making updates across multiple platforms make them an ideal first step for mobile app design. LITA has an upcoming webinar on creating web apps that employs this system [7].

Once the prototype has been coded into a hybrid app, we have another opportunity for evaluation and usability testing. We teach a pervasive approach that includes evaluation and testing all throughout the process, but this stage is very important as it is a last chance to make changes before sending the code to an app marketplace. After the app has been submitted, opportunities to make updates, fix bugs, and add features can be limited, sometimes significantly, by the app store’s administrative processes.

After you have spent some time following the lines of the tube map and reading this very brief description, I hope you can see this infographic as an aid to designing mobile web apps. I find it particularly helpful for identifying the source of a particular problem I’m having and also suggesting tools and techniques that can help resolve it. As a personal example, I am often tempted to start writing code before I’ve completely made up my mind what I want the code to do, which leads to frustration. I use the map to remind me to look at my wireframe and use that to guide the structure of my code. I hope you all find it useful as well.


Reflections on Code4Lib 2013

Disclaimer: I was on the planning committee for Code4Lib 2013, but this is my own opinion and does not reflect the views of the other organizers of the conference.

We have mentioned Code4Lib before on this blog, but for those who are unfamiliar, it is a loose collective of programmers working in libraries, librarians, and others interested in code and libraries. (You can read more about it on the website.) The Code4Lib conference has emerged as a venue to share very new technology and have discussions with a wide variety of people who might not attend conferences more geared to librarians. Presentations at the conference are decided by the votes of anyone interested in selecting the program, and additionally lightning talks and breakout sessions allow wide participation and exposure to extremely new projects that have not made it into the literature or to conferences with a longer lead time. The Code4Lib 2013 conference ran February 11-14 at University of Illinois Chicago. You can see a list of all programs here, which includes links to the video archive of the conference.

While there were many types of projects presented, I want to focus on those talks which illustrated what I saw as a thread running through the conference: care and emotion. This is perhaps unexpected for a technical conference. Yet those themes underlie a great deal of the work that takes place in academic library technology and the types of projects presented at Code4Lib. We tend to work in academic libraries because we care about the collections and the people using those collections. That intrinsic motivation focuses our work.

Caring about the best way to display collections is central to successful projects. Most (though not all) of the presenters and topics came out of academic libraries, and many of the presentations dealt with creating platforms for library and archival metadata and collections. To highlight a few: Penn State University has developed their own institutional repository application called ScholarSphere that provides a better user experience for researchers and managers of the repository. The libraries and archives of the Rock and Roll Hall of Fame dealt with the increasingly common problem of wanting to present digital content alongside more traditional finding aids, and so developed a system for doing so. Corey Harper from New York University presented an extremely interesting and still experimental project to use linked data to enrich interfaces for interacting with library collections. Note that all these projects combined various pieces of open source software and library/web standards to create solutions that solve a problem facing academic or research libraries for a particular setting. I think an important lesson for most academic librarians looking at descriptions of projects like this is that it takes more than development staff to make them happen. It takes purpose, vision, and dedication to collecting and preserving content; in other words, emotion and care. A great example of this was the presentation about DIYHistory from the University of Iowa. This project started out as an extremely low-tech solution for crowdsourcing archival transcription, but got so popular that it required a more robust solution. They were able to adapt open source tools to meet their needs, still keeping the project well within the means of most libraries (the code is here).

Another view of emotion and care came from Mark Matienzo, who did a lightning talk (his blog post gives a longer version with more details). His talk discussed the difficulties of acknowledging and dealing with the emotional content of archives, even though emotion drives interactions with materials and collections. The records provided are emotionless and affectless, despite the fact that they represent important moments in history and lives. The type of sharing of what someone “likes” on Facebook does not satisfactorily answer the question of what they care about, or represent the emotion in their lives. Mark suggested that a tool like Twine, which allows writing interactive stories, could approach the difficult question of bringing together the real with the emotional narrative that makes up experience.

One of the ways we express care for our work and for our colleagues is by taking time to be organized and consistent in code. Naomi Dushay of Stanford University Library presented best practices for code handoffs, which described some excellent practices for documenting and clarifying code and processes. One of the major takeaways is that being clear, concise, and straightforward is always preferable, even as much as we want to create cute names for our servers and classes. To preserve a spirit of fun, you can use the cute name and attach a description of what the item actually does.

Originally Bess Sadler, also from Stanford, was going to present with Naomi, but she ended up giving a different talk, the last one of the conference, on Creating a Commons (the full text is available here). This was a very moving look at what motivates her to create open source software and how to create better open source software projects. She used the framework of the Creative Commons licenses to discuss open source software–that it needs to be “[m]achine readable, human readable, and lawyer readable.” Machine readable means that code needs to be properly structured and allow for contributions from multiple people without breaking; lawyer readable means that the project should have the correct structure and licensing to collaborate across institutions. Bess focused particularly on the “human readable” aspect of creating communities and understanding the “hacker epistemology,” which, as she so eloquently put it, holds that “[t]he truth is what works.” Part of understanding that requires being willing to reshape default expectations–for instance, the Code4Lib community developed a Code of Conduct at Bess’s urging to underline the fact that the community aims at inclusion and creating a safe space. She encouraged everyone to keep working to do better and “file bug reports” about open source communities.

This year’s Code4Lib conference was a reminder to me about why I do the work I do as an academic librarian working in a technical role. Even though I may spend a lot of time sitting in front of a computer looking at code, or workflows, or processes, I know it makes access to the collections and exploration of those collections better.


Event Tracking with Google Analytics

In a previous post by Kelly Sattler and Joel Richard, we explored using web analytics to measure a website’s success. That post provides a clear high-level picture of how to create an analytics strategy by evaluating our users, web content, and goals. This post will explore a single topic in depth: how to set up event tracking in Google Analytics.

Why Do We Need Event Tracking?

Finding solid figures to demonstrate a library’s value and make strategic decisions is a topic of increasing importance. It can be tough to stitch together the right information from a hodgepodge of third-party services; we rely on our ILSs to report circulation totals, our databases to report usage like full-text downloads, and our web analytics software to show visitor totals. But are pageviews and bounce rates the only meaningful measure of website success? Luckily, Google Analytics provides a way to track arbitrary events which occur on web pages. Event tracking lets us define what is important. Do we want to monitor how many people hover over a carousel of book covers, but only in the first second after the page has loaded? How about how many people first hover over the carousel, then the search box, but end up clicking a link in the footer? As long as we can imagine it and JavaScript has an event for it, we can track it.

How It Works

Many people are probably familiar with Google Analytics as a snippet of JavaScript pasted into their web pages. But Analytics also exposes some of its inner workings to manipulation. We can use the _gaq.push method to execute a “_trackEvent” method which sends information about our event back to Analytics. The basic structure of a call to _trackEvent is:

_gaq.push( [ '_trackEvent', 'the category of the event', 'the action performed', 'an optional label for the event', 'an optional integer value that quantifies something about the event' ] );

Looking at the array parameter of _gaq.push is telling: we should have an idea of what our event categories, actions, labels, and quantitative details will be before we go crazy adding tracking code to all our web pages. Once events are recorded, they cannot be deleted from Analytics. Developing a firm plan helps us to avoid the danger of switching the definition of our fields after we start collecting data.

We can be a bit creative with these fields. “Action” and “label” are just Google’s way of describing them; in reality, we can set up anything we like, using category->action->label as a triple-tiered hierarchy or as three independent variables.
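
Because recorded events cannot be deleted, it can help to route every call through one small function that enforces the plan. What follows is a minimal sketch, not part of Google Analytics: buildTrackEvent is a hypothetical helper name, the sample strings are placeholders, and the helper deliberately requires a label even though Analytics itself treats the label as optional.

```javascript
// Hypothetical helper (not part of ga.js): builds the array handed to _gaq.push
// and rejects values that do not fit the plan before anything is sent.
function buildTrackEvent( category, action, label, value ) {
    var fields = [ category, action, label ],
        names = [ 'category', 'action', 'label' ],
        command,
        i;
    for ( i = 0; i < fields.length; i++ ) {
        if ( typeof fields[ i ] !== 'string' || fields[ i ] === '' ) {
            throw new Error( names[ i ] + ' must be a non-empty string' );
        }
    }
    command = [ '_trackEvent', category, action, label ];
    if ( value !== undefined ) {
        // the value field quantifies the event and has to be an integer
        if ( typeof value !== 'number' || !isFinite( value ) || Math.floor( value ) !== value ) {
            throw new Error( 'value must be an integer' );
        }
        command.push( value );
    }
    return command;
}

// _gaq is normally created by the standard Analytics snippet
var _gaq = _gaq || [];
_gaq.push( buildTrackEvent( 'some category', 'some action', 'some label', 3 ) );
```

Catching a malformed event in our own code is cheaper than discovering it later in reports that can no longer be cleaned up.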

Example: A List of Databases

Almost every library has a web page listing third-party databases, be they subscription or open access. This is a prime opportunity for event tracking because of the numerous external links. Default metrics can be misleading on this type of page. Bounce rate (the proportion of visitors who start on one of our pages and then immediately leave without viewing another page) is typically considered a negative metric; if a page has a high bounce rate, then visitors are not engaged with its content. But the purpose of a databases page is to get visitors to their research destinations as quickly as possible, so here a high bounce rate is a good sign. Similarly, time spent on page is typically considered a positive sign of engagement, but on a databases page it’s more likely to indicate confusion or difficulty browsing. With event tracking, we can not only track which links were clicked but also keep visits that end in a database click from being counted as bounces, giving us a more realistic picture of the page’s success.
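
How an event interacts with bounce rate is governed by the optional fifth argument to _trackEvent, opt_noninteraction. A rough sketch of the two cases follows; the category, action, and label strings are illustrative placeholders rather than a recommended scheme.

```javascript
// _gaq is normally created by the standard Analytics snippet
var _gaq = _gaq || [];

// Default behavior: the event counts as an interaction, so a visit made up of
// one pageview plus this click is not reported as a bounce.
_gaq.push( [ '_trackEvent', 'database', 'Social Sciences', 'Academic Search Premier', 1 ] );

// With opt_noninteraction set to true the event is still recorded, but it is
// left out of the bounce rate calculation (useful for passive events such as hovers).
_gaq.push( [ '_trackEvent', 'carousel', 'hover', 'book covers', 0, true ] );
```

For a databases page the default is probably what we want, since a click on a database link is exactly the interaction the page exists to produce.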

One way of structuring “database” events is:

  • The top-level Category is “database”
  • The Action is the topical category, e.g. “Social Sciences”
  • The Label is the name of the database itself, e.g. “Academic Search Premier”

The final, quantitative piece could be the position of the database in the list or the number of seconds after page load it took the user to click its link. Alternatively, we could report a boolean-like value (1 or 0), such as whether the database is open access or highlighted in some way.
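
Either quantitative option reduces to a small calculation. The sketch below uses two hypothetical helper names, positionInList and secondsSince; the timestamps are assumed to be millisecond values such as those returned by new Date().getTime(), captured once at page load and again at click time, and the database names are sample data.

```javascript
// Hypothetical helper: 1-based position of an item within a list (0 if absent).
// Array.prototype.indexOf is borrowed so array-like objects such as a NodeList also work.
function positionInList( item, list ) {
    return Array.prototype.indexOf.call( list, item ) + 1;
}

// Hypothetical helper: whole seconds elapsed between two millisecond timestamps.
// _trackEvent expects an integer value, hence the rounding.
function secondsSince( startMs, nowMs ) {
    return Math.round( ( nowMs - startMs ) / 1000 );
}

var databases = [ 'Academic Search Premier', 'JSTOR', 'PsycINFO' ];
var position = positionInList( 'JSTOR', databases );   // 2
var elapsed = secondsSince( 1000, 4400 );              // 3
```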

To implement this, we set up a JavaScript function which will be called every time one of our events occurs. We will store some contextual information in variables, push that information to Google Analytics, and then delay the page’s navigation so the event has a chance to be recorded. Let’s walk through the code piece by piece:

function databaseTracking ( event ) {
    var destination = $( this )[ 0 ].href,
        resource = $( this ).text(),
        // move up from <a> to parent element, then find the nearest preceding <h2> section header;
        // .text() is used rather than innerText, which not every browser supports
        section = $( this ).parent().prevAll( 'h2' ).first().text(),
        // 1 if the link is highlighted, 0 otherwise (the event value must be an integer)
        highlighted = $( this ).hasClass( 'highlighted' ) ? 1 : 0;

    _gaq.push( [ '_trackEvent', 'database', resource, section, highlighted ] );

The top of our function just grabs information from the page. We’re using jQuery to make our lives easier, so all the $( this ) pieces of our code refer to the element that initiated the event. In our case, that’s the link pointing to an external database which the user just clicked. So we set destination to the link’s href attribute, resource to its text (i.e. the database’s name), section to the text inside the h2 element that labels a topical set of databases, and highlighted to 1 if the element has a class of “highlighted” and 0 otherwise. Next, this data is pushed into the _gaq array, which is a queue of functions and their parameters that Analytics fires asynchronously. In this instance, we’re telling Analytics to run the _trackEvent function with the parameters that follow. Analytics will then record an event of category “database” with an action of [database name], a label of [section header], and a value of 1 or 0 representing whether it was highlighted or not. Note that this code sends the database name as the action and the section header as the label, the reverse of the structure outlined earlier; either order works as long as it stays the same once data collection begins.

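    // stop the browser from following the link right away...
    event.preventDefault();
    // ...then navigate 200 milliseconds later so the event has time to be recorded
    setTimeout( function () {
        window.location = destination;
    }, 200 );
}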

Next comes perhaps the least obvious piece: we prevent the default browser behavior from occurring, which in the case of a link is navigating away from our page, but then send the user to destination 200 milliseconds later anyway. This gives the _trackEvent function a chance to fire; if we let the user follow the link right away, the tracking request might not complete and our event would not be recorded.1
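
One side effect of this technique is that cancelling the click also cancels behaviors such as opening the link in a new tab. A possible refinement, sketched here under the assumption of a standards-compliant browser (older versions of Internet Explorer report mouse buttons differently), is to intercept only plain primary-button clicks on links that open in the same window. The helper name shouldDelayNavigation is hypothetical.

```javascript
// Hypothetical helper: true only when the click would navigate the current window.
function shouldDelayNavigation( event, link ) {
    var modified = Boolean( event.ctrlKey || event.metaKey || event.shiftKey || event.altKey ),
        // button 0 is the primary (usually left) button in the DOM standard
        primary = event.button === undefined || event.button === 0,
        // conservatively treat any target other than _self as "opens elsewhere"
        otherTarget = Boolean( link.target ) && link.target !== '_self';
    return !modified && primary && !otherTarget;
}

// Inside a click handler it might be used like this:
//     if ( shouldDelayNavigation( event, this ) ) {
//         event.preventDefault();
//         setTimeout( function () { window.location = destination; }, 200 );
//     }
```

When the helper returns false the event can still be pushed to _gaq as usual; the current page stays open in those cases, so no delay should be needed.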

$( document ).ready( function () {
    // target all anchors in list of databases
    $( '#databases-list a' ).on( 'click', databaseTracking );
} );

There’s one last step; merely defining the databaseTracking function won’t cause it to execute when we want it to. JavaScript uses event handlers to execute certain functions based on various user actions, such as mousing over or clicking an element. Here, we add click event handlers to all <a> elements in the list of databases. Now whenever a user clicks a link in the databases list (which has a container with id “databases-list”), databaseTracking will run and send data to Google Analytics.

There is a demo on JSFiddle which uses the code above with some sample HTML. Every time you click a link, a pop-up shows you what the _gaq.push array looks like.

Though we used jQuery in our example, any JavaScript library can be used with event tracking.2 The procedure is always the same: write a function that gathers data to send back to Google Analytics and then add that function as a handler to an appropriate event, such as click or mouseover, on an element.
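
That two-step procedure can be captured in a library-agnostic way. The following sketch assumes nothing beyond plain JavaScript; makeTracker, the shape of the object returned by the gather function (action, label, value), and the “book-carousel” id are inventions for this example, not part of any Analytics API.

```javascript
// Hypothetical factory: wraps a data-gathering function in an event handler
// that pushes a _trackEvent command onto the given queue (normally _gaq).
function makeTracker( queue, category, gather ) {
    return function ( event ) {
        // "this" is the element the handler was attached to
        var data = gather.call( this, event );
        queue.push( [ '_trackEvent', category, data.action, data.label, data.value ] );
    };
}

// _gaq is normally created by the standard Analytics snippet
var _gaq = _gaq || [];
var trackHover = makeTracker( _gaq, 'carousel', function () {
    return { action: 'hover', label: this.id, value: 1 };
} );

// In a browser, attach the handler to an element and an event of our choosing.
if ( typeof document !== 'undefined' && document.getElementById( 'book-carousel' ) ) {
    document.getElementById( 'book-carousel' ).addEventListener( 'mouseover', trackHover, false );
}
```

Keeping the data gathering separate from the push means the same factory can serve clicks, hovers, or any other event without repeating the Analytics call.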

For another example, complete with code samples, see the article “Discovering Digital Library User Behavior with Google Analytics” in Code4Lib Journal. In it, Kirk Hess of the University of Illinois Urbana-Champaign details how to use event tracking to see how often external links are clicked or files are downloaded. While these events are particularly meaningful to digital libraries, most libraries offer PDFs or other documents online.

Some Ideas

The true power of Event Tracking is that it does not have to be limited to the mere clicking of hyperlinks; any interaction which JavaScript knows about can be recorded and categorized. Google’s own Event Tracking Guide uses the example of a video player, recording when control buttons like play, pause, and fast forward are activated. Here are some more obvious use cases for event tracking:

  • Track video plays on particular pages; we may already know how many views a video gets, but how many come from particular embedded instances of the video?
  • Track clicks to external content, such as a vendor’s database or another library’s study materials.
  • If there is a print or “download to PDF” button on our site, we can track each time it’s clicked. Unfortunately, only Internet Explorer and Firefox (versions >= 6.0) have an onbeforeprint event in JavaScript which could be used to detect when a user hits the browser’s native print command.
  • Web applications are particularly suited to event tracking. Many modern web apps have a single page architecture, so while the user is constantly clicking and interacting within the app they rarely generate typical interaction statistics like pageviews or exits.
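
As a rough sketch of the first idea, the function below records play, pause, and ended events and puts the page path in the label so that each embedded instance can be told apart. It assumes an HTML5 video element hosted on our own page; a third-party embedded player would need that player’s own API instead. The function name trackVideo and the sample title are hypothetical.

```javascript
// Hypothetical helper: queue a _trackEvent command for each playback event.
function trackVideo( video, title, pagePath, queue ) {
    var names = [ 'play', 'pause', 'ended' ];
    names.forEach( function ( name ) {
        video.addEventListener( name, function () {
            // action is the playback event; label identifies video and embedding page
            queue.push( [ '_trackEvent', 'video', name, title + ' on ' + pagePath ] );
        }, false );
    } );
}

// _gaq is normally created by the standard Analytics snippet
var _gaq = _gaq || [];
if ( typeof document !== 'undefined' ) {
    var player = document.querySelector( 'video' );
    if ( player ) {
        trackVideo( player, 'Library Tour', window.location.pathname, _gaq );
    }
}
```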

 

Notes
  1. There is ongoing discussion about the best way to delay outbound links long enough to record them as events. A Google Analytics support page endorses the setTimeout approach. For other methods, there are threads on StackOverflow and various blog posts around the web. Alternatively, we could use the onmousedown event, which fires slightly earlier than onclick but also might record false positives due to click-and-drag scrolling.
  2. Below is an attempt at rewriting the jQuery tracking code in pure JavaScript. It will only work in modern browsers because of use of querySelectorAll, parentElement, and previousElementSibling. Versions of Internet Explorer prior to 9 also use a unique attachEvent syntax for event handlers. Yes, there’s a reason people use libraries to do anything the least bit sophisticated with JavaScript.
function databaseTracking ( event ) {
    // currentTarget is the <a> the handler is attached to, even if a child element was clicked
    var link = event.currentTarget,
        destination = link.href,
        resource = link.textContent,
        section = "none",
        highlighted = link.className.match( /highlighted/ ) ? 1 : 0;

    // getting a parent element's nearest <h2> sibling is non-trivial without a library
    var currentSibling = link.parentElement;
    while ( currentSibling !== null ) {
        if ( currentSibling.tagName !== "H2" ) {
            currentSibling = currentSibling.previousElementSibling;
        }
        else {
            section = currentSibling.textContent;
            currentSibling = null;
        }
    }

    _gaq.push( [ '_trackEvent', 'database', resource, section, highlighted ] );

    // delay navigation to ensure event is recorded
    setTimeout( function () {
        window.location = destination;
    }, 200 );
    event.preventDefault();
}

document.addEventListener( 'DOMContentLoaded', function () {
    var dbLinks = document.querySelectorAll( '#databases-list a' ),
        len = dbLinks.length,
        i; // declared here so the loop counter does not leak into the global scope
    for ( i = 0; i < len; i++ ) {
        dbLinks[ i ].addEventListener( 'click', databaseTracking, false );
    }
}, false );
Association of College & Research Libraries. (n.d.). ACRL Value of Academic Libraries. Retrieved January 12, 2013, from http://www.acrl.ala.org/value/
Event Tracking – Web Tracking (ga.js) – Google Analytics — Google Developers. (n.d.). Retrieved January 12, 2013, from https://developers.google.com/analytics/devguides/collection/gajs/eventTrackerGuide
Hess, K. (2012). Discovering Digital Library User Behavior with Google Analytics. The Code4Lib Journal, (17). Retrieved from http://journal.code4lib.org/articles/6942
Marek, K. (2011). Using Web Analytics in the Library: A Library Technology Report. Chicago, IL: ALA Editions. Retrieved from http://public.eblib.com/EBLPublic/PublicView.do?ptiID=820360
Sattler, K., & Richard, J. (2012, October 30). Learning Web Analytics from the LITA 2012 National Forum Pre-conference. ACRL TechConnect Blog. Retrieved January 18, 2013, from http://acrl.ala.org/techconnect/?p=2133
Tracking Code: Event Tracking – Google Analytics — Google Developers. (n.d.). Retrieved January 12, 2013, from https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiEventTracking
window.onbeforeprint – Document Object Model (DOM) | MDN. (n.d.). Mozilla Developer Network. Retrieved January 12, 2013, from https://developer.mozilla.org/en-US/docs/DOM/window.onbeforeprint