Citation Manager Roundup

In April of this year, the two most popular free citation managers, Mendeley and Zotero, both underwent some big changes. On April 8th, TechCrunch announced that Elsevier had purchased Mendeley, a deal that had been rumored since January.1 Just a few days later, Zotero announced the release of version 4, with a number of new features.2 Much as with the sunsetting of Google Reader, this has prompted many people to take stock of the citation manager they have been using and to consider switching tools or changing practices. I will not address subscription or paid products like RefWorks and EndNote specifically, though there are certainly many reasons you might prefer one of those products.

Mendeley: a new Star Wars movie in the making?

The rhetoric surrounding Elsevier's acquisition of Mendeley was generally alarmist, and the hashtag "#mendelete" that popped up immediately after the announcement suggests that many people's first instinct was to abandon Mendeley. Elsevier has been held up as the archetypal opponent of open access, and Mendeley as a model of openness. Yet Mendeley has always been a for-profit company, and, like Google, it benefits itself and its users (particularly the science community) by knowing what they are reading and sharing. After all, the social features of Mendeley wouldn't have any value if there were no public sharing. Institutional Mendeley accounts allow librarians to see what their users in aggregate are reading and saving, which helps them make collection development decisions, a service beyond what the average institutional citation manager product offers. Victor Henning promises on the Mendeley blog that nothing will change, and that the acquisition will give the team more freedom to develop new features.3 As for Elsevier, Oliver Dumon promises that Mendeley will remain independent and be allowed to follow its own course, and that bringing it together with ScienceDirect and Scopus will create a "central workflow and collaboration site for authors."4

There are two questions to be answered here. First, is it realistic to assume that the Mendeley team will have the creative freedom they say they will have? And second, are users comfortable with their data being available to Elsevier? For many, the answers to both of these questions seem to be "no" and "no." A more optimistic point of view is that if Elsevier must placate Mendeley users who are open access advocates, it will allow more openness than before.

It's too early to say, but I remain hopeful that Mendeley can continue to foster a more open spirit in academic publishing. Jason Hoyt (a former employee of Mendeley and co-founder of PeerJ) suggests that much of the work he oversaw to open up Mendeley was being stymied by Elsevier specifically. This went against his personal ethos, so he was unable to stay at Mendeley, but he is confident in the character and ability of the people remaining there.5 I have never been a heavy user of Mendeley, but I have maintained a free account for the past few years. I use it mainly to create a list of my publications on my personal website, using a WordPress plug-in that calls the Mendeley API.

What’s new with Zotero

Zotero is a very different product from Mendeley. First, it is open-source software, with lots of ways to participate in development. Zotero was developed by the Roy Rosenzweig Center for History and New Media at George Mason University, with foundation and user support, and it was built specifically to support the research work of humanists. Originally a Firefox plug-in, Zotero now works as a standalone piece of software that interacts with Firefox, Chrome, and Safari to recognize bibliographic data on websites and pull it into a database that can be synced across computers (and even some third-party mobile software). The newest version of Zotero includes several improvements. The one I am most excited about is the detailed download display, which tells you which folder you are saving a reference into, something that is crucial for my workflow. Zotero is the citation manager I use on a daily basis, and I rely on it for formatting the footnotes you see on ACRL TechConnect posts and other research articles I produce. Since much of my research draws on the open web, books, and other non-journal-article resources, I find Zotero's ability to pick up library catalog records and similar metadata more useful than the Mendeley import bookmarklet.

Both Zotero and Mendeley offer free storage for metadata and PDFs, with a cost for storage above the free level. (It is also possible to use a WebDAV server for syncing Zotero files).

Zotero                      Mendeley
300 MB     Free
2 GB       $20 / year       2 GB         Free
6 GB       $60 / year       5 GB         $55 / year
10 GB      $100 / year      10 GB        $110 / year
25 GB      $240 / year      Unlimited    $165 / year
Some concluding thoughts

Several graduate students in the sciences6 have written blog posts about switching from Mendeley to Zotero. But the two tools are not interchangeable: given the backgrounds of their creators, Mendeley skews toward the sciences and Zotero toward the humanities.

Nor, as I like to point out, must they be mutually exclusive. I use Zotero for my daily citation management since I much prefer it for grabbing citations online, but sync my Zotero library with Mendeley to use the social and API features in Mendeley. I can choose to do this as an individual, but consider carefully the implications of your choice if you are considering an institutional subscription or requiring students or members of a research group to use a particular service.

  1. Lunden, Ingrid. "Confirmed: Elsevier Has Bought Mendeley For $69M-$100M To Expand Its Open, Social Education Data Efforts." TechCrunch, April 8, 2013. http://techcrunch.com/2013/04/08/confirmed-elsevier-has-bought-mendeley-for-69m-100m-to-expand-open-social-education-data-efforts/.
  2. Takats, Sean. “Zotero 4.0 Launches.” Zotero, April 11, 2013. http://www.zotero.org/blog/zotero-4-0-launches/.
  3. Henning, Victor. "Mendeley and Elsevier – Here's More Info." Mendeley Blog, April 19, 2013. http://blog.mendeley.com/community-relations/mendeley-and-elsevier-heres-more-info/.
  4. Dumon, Oliver. “Elsevier Welcomes Mendeley.” Elsevier Connect, April 8, 2013. http://elsevierconnect.com/elsevier-welcomes-mendeley/.
  5. Hoyt, Jason. “My Thoughts on Mendeley/Elsevier & Why I Left to Start PeerJ,” April 9, 2013. http://enjoythedisruption.com/post/47527556151/my-thoughts-on-mendeley-elsevier-why-i-left-to-start.
  6. For one, see “Mendeley Sells Out; I’m Moving to Zotero.” LJ Villanueva’s Research Blog. Accessed May 20, 2013. http://research.coquipr.com/archives/492.

Local Dev Environments For Newbies Part 2: AMP on Windows 7

Previously, we discussed the benefits of installing a local AMP stack (Apache, MySQL & PHP) for the purposes of development and testing, and walked through installing a stack in the Mac environment.  In this post, we will turn our attention to Windows.  (If you have not read Local Dev Environments for Newbies Part 1, and you are new to the AMP stack, you might want to go read the Introduction and Tips sections before continuing with this tutorial.)

Much like with the Mac stack, there are Windows stack installers that will do all of this for you.  For example, if you are looking to develop for Drupal, there’s an install package called Acquia that comes with a stack installer.   There’s also WAMPserver and XAMPP.  If you opt to go this route, you should do some research and decide which option is the best for you.  This article contains reviews of many of the main players, though it is a year old.

However, we are going to walk through each component manually so that we can see how it all works together.

So, let’s get going with Recipe 2 – Install the AMP Stack on Windows 7.

Prerequisites:

Notepad and Wordpad come with most Windows systems, but you may want to install a more robust code editor to edit configuration files and eventually, your code.  I prefer Notepad++, which is open source and provides much of the basic functionality needed in a code editor.  The examples here will reference Notepad++ but feel free to use whichever code editor works for you.

For our purposes, we are not going to allow traffic from outside the machine to access our test server.  If you need that functionality, you will need to open port 80 in your firewall.  Be very careful with this option.

As a prerequisite to installing Apache, we need to install the Visual C++ 2010 SP1 Redistributable Package x86.  As a prerequisite to installing PHP, we need to install the Visual C++ 2008 SP1 Redistributable Package x86.

I create a directory called opt\local on my C: drive to house all of the stack pieces.  I do this because it's easier to find things on the command line when I need to, and I like keeping development environment applications separate from Program Files.  I also create a directory called sites to house my web files.

wamp-opt copy

The last two prerequisites are more like common gotchas.  The first is that while you are manipulating configuration and initialization files throughout this process, you may find the Windows default view settings are getting in your way.  If this is the case, you can change it by going to Organize > Folder and search options > View tab.

wamp-windows

This will bring up a dialog which allows you to set preferences for the folder you are currently viewing.  You can select the option to “show hidden files” and uncheck the “hide file extensions” option, both of which make developing easier.

The other thing to know is that in our example, we will work with a Windows 7 installation – a 64-bit operating system.  However, when we get to PHP, you’ll notice that their website does not provide a 64-bit installer.  I have seen errors in the past when a 32-bit PHP installer and a 64-bit Apache version were both used, so we will install the 32-bit versions for both components.

Ok, I think we’re all set.  Let’s install Apache.

Apache

We want to download the .zip file for the latest version.  For Windows binaries, I use apachelounge, which provides Windows builds of Apache.  For this example we'll download httpd-2.4.4-win32.zip to the Desktop of our Windows machine.

wamp-apache1

Next, we want to extract the files into our chosen location for the Apache directory, e.g., c:\opt\local\Apache24.  You can accomplish this in a variety of ways, but if you have WinZip, you can follow these steps:

  1. Copy the .zip folder to c:\opt\local
  2. Right-click and select “Extract all files”.
  3. Open the extracted folder, right-click on the Apache24 folder and select Cut.
  4. Go back up one directory and right-click to Paste the Apache24 folder, so that it now resides inside c:\opt\local.

No matter what unzip program you use, this is the configuration we are shooting for:

wamp-apachedir

This extraction “installs” Apache; there is no installer to run, but we will need to configure a few things.

We want to open httpd.conf: this file contains all of the configuration settings for our web server.  If you followed the directions above, you can find the file in C:\opt\local\Apache24\conf\httpd.conf – we want to open it with our code editor and make the following changes:

1.  Find this line (in my copy, it’s line 37):

ServerRoot “c:/Apache24”

Change it to match the directory where you installed Apache.  In my case, it reads:

ServerRoot “c:/opt/local/Apache24”

You might notice that our slashes slant in the opposite direction from the usual Windows syntax.  In Windows, the backslash ( \ ) separates directories, but in Unix, it's the forward slash ( / ).  Apache reads the configuration file in the Unix manner, even though we are working in Windows.  If you get a "directory not found" error at any point, check your slashes.
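For example, the same install location is written two different ways depending on where you are typing it (the path here is just the example directory used in this tutorial):

In Windows Explorer or the command prompt:  c:\opt\local\Apache24
In httpd.conf:                              "c:/opt/local/Apache24"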

2.  At Line 58, we are going to change the listen command to just listen to our machine.  Change

Listen 80

to

Listen localhost:80

3.  There are about 100 lines, roughly lines 72-172, that all start with LoadModule.  Some of these are commented out (they begin with a "#").  Later on, you may need to uncomment some of them for a certain feature to work, such as SSL; an example of what that looks like is shown below.  For now, though, we'll leave these as is.
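For instance, if you later need SSL, uncommenting the relevant module line just means deleting the leading "#".  As a rough illustration (this module line is what ships in the stock apachelounge build; check your own httpd.conf for the exact wording):

#LoadModule ssl_module modules/mod_ssl.so

becomes

LoadModule ssl_module modules/mod_ssl.so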

4.  Next, we want to change the DocumentRoot and the Directory directive to point to the directory that holds our web files.  These lines (beginning on line 237 in my copy) read:

DocumentRoot “c:/Apache24/htdocs”

Later, we’ll want to change this to our “sites” folder we created earlier.  For now, we’re just going to change this to the Apache installation directory for testing.  So, it should read:

DocumentRoot “c:/opt/local/Apache24/htdocs”

Save the httpd.conf file.  (In two of our test cases, after saving, closing, and re-opening the file, it appeared unchanged.  If you are having issues, try doing Save As and save the file to your desktop, then drag it into c:\opt\local\Apache24.)

Next, we want to test our Apache configuration.  To do this, we open the command line.  In Windows, you can do this by going to the Start Menu, and typing

cmd.exe

in the Search box.  Then, press Enter.  Once you’re in the command prompt, type in

cd \opt\local\Apache24\bin

(Note that the first part of this path is the install directory I used above.  If you chose a different directory to install Apache, use that instead.)  Next, we run the web server binary with a "-t" flag to test the configuration.  Type in:

httpd -t

If you get a Syntax OK, you’re golden.

wamp-apache4

Otherwise, try to resolve any errors based on the error message. If the error message does not make any sense after checking your code for typos, go back and make sure that your changes to httpd.conf did actually save.

Once you get Syntax OK, type in:

httpd

This will start the web server.  You should not get a message regarding the firewall if you changed the listen command to localhost:80.  But, if you do, decide what traffic you want to allow to your machine.  I would click “Cancel” instead of “Allow Access”, because I don’t want to allow outside access.

Now the server is running.  You’ll notice that you no longer have a C:\> prompt in the Command window.  To test our server, we open a browser and type in http://localhost  – you should get a website with text that reads “It works!”

wamp-apache5

Instead of starting up the server this way every time, we want to install it as a Windows service.  So, let's go back to our command prompt and press Ctrl+C to stop the web server.  You should now have a prompt again.

To install Apache as a service, type:

httpd.exe -k install

You will most likely get an error that looks like this:

wamp-apache6

We need to run our command prompt as an administrator.  So, let’s close the cmd.exe window and go back to our Start menu.  Go to Start > All Programs > Accessories and right-click on Command Prompt.  Select “Run As Administrator”.

(Note: If for some reason you do not have the ability to right-click, there’s a “How-To Geek” post with a great tip.  Go to the Start menu and in the Run box, type in cmd.exe as we did before, but instead of hitting Enter, hit Ctrl+Shift+Enter.  This does the same thing as the right-click step above.)

Click on Yes at the prompt that comes up, allowing the program to make changes.  You'll notice that instead of starting in our user directory, we are starting in Windows\system32.  So, let's go back to our bin directory with:

cd \opt\local\Apache24\bin

Now, we can run our

httpd.exe -k install

command again, and it should succeed.  To start the service, we want to open the Services dialog, located in the Control Panel (Start Menu > Control Panel) in the Administrative Tools section.  If you display your Control Panel by category (the default), click on System & Security, then Administrative Tools.  If you display your Control Panel by small icons, Administrative Tools should be listed.

Double click on Services.

wamp-apache8

Find Apache2.4 in the list and select it.  Verify that the Startup Type is set to Automatic if you want the Service to start automatically (if you would prefer that the Service only start at certain times, change this to Manual, but remember that you have to come back in here to start it).  With Apache2.4 selected, click on Start Service in the left hand column.

wamp-apacheserv

Go back to the browser and hit Refresh to verify that everything is still working.  It should still say “It Works!”  And with that affirmation, let’s move to PHP.

 PHP

(Before installing PHP, make sure you have installed the Visual C++ 2008 Redistributable Package from the prerequisite section.)

For our purposes, we want to use the Thread Safe .zip from the PHP Downloads page.  Because we are running PHP under Apache, but not as a CGI, we use the thread safe version.  (For more on thread safe vs. non-thread safe, see this Wikipedia entry or this Stack Overflow post.)

PHP Download

Once you've downloaded the .zip file, extract it to your \opt\local directory.  Then, rename the folder to simply "php".  As with Apache24, extracting the files does the "install"; we just need to configure everything to run properly.  Go to the directory where you installed PHP (in my case, c:\opt\local\php) and find php.ini-development.

Make a copy of the file and rename the copy php.ini (this is one of those places where you may want to set the Folder and search options if you’re having problems).

PHP ini file

Open the file in Notepad++ (or your code editor of choice).  Note that here, comments are preceded by a ";" (without quotes) and directories are written in the standard Windows format, with a "\".  Most of the document is commented out, and it includes a large section of recommended settings for production and development, so if you're not sure which changes to make you can check in the file itself (in addition to the PHP documentation).  For this tutorial, we want to make the following changes:

1.  On line 708, uncomment (remove the semi-colon from) the include_path line under "Windows" and make sure it matches the directory where you installed PHP (if the line numbers have changed, just search for "Paths and Directories").

wamp-php2
2.  On line 730, uncomment the Windows directive for extension_dir and change it to match c:\opt\local\php\ext (both edited lines are sketched below the screenshot).

wamp-php3
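If the screenshots are hard to make out, the two edited lines would end up looking roughly like this; this is one reasonable reading of the instructions above, assuming the c:\opt\local\php install directory, so adjust to your own path:

include_path = ".;c:\opt\local\php"
extension_dir = "c:\opt\local\php\ext"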
3.  Beginning on line 868, in the Windows Extensions section, uncomment (remove the semi-colon from) the following lines (they are not right next to each other; they're in a longer list, but we want these three uncommented):

extension=php_mysql.dll
extension=php_mysqli.dll
extension=php_pdo_mysql.dll

Save the php.ini file.

You may want to double-check that the .dll files we enabled above are actually in the c:\opt\local\php\ext folder before trying to run php, because you will see an error if they are not there.

Next, we want to add the php directory to our path environment variables.  This section is a little tricky; be *extremely* careful when you are making changes to system settings like this.

First, we navigate to the Environment variables by opening the Control Panel and going to System & Security > System > Advanced System Settings > Environment Variables.

In the bottom scroll box, scroll until you find “Path”, click on it, then click on Edit.

wamp-php6

Append the following to the end of the Variable Value list (the semi-colon ends the previous item, then we add our installation path).

;c:\opt\local\php

wamp-php7

Click OK and continue to do so until you are out of the dialog.

Lastly, we need to add some lines to the httpd.conf so that Apache will play nice with PHP.  The httpd.conf file may still be open in your text editor.  If not, go back to c:\opt\local\Apache24\conf and open it.  At the bottom of this file, we need to add the following:

LoadModule php5_module "c:/opt/local/php/php5apache2_4.dll"
AddHandler application/x-httpd-php .php
PHPIniDir "c:/opt/local/php"

This tells Apache where to find PHP and loads the module needed to work with it.  (Note: php5apache2_4.dll must be present in the directory you specified above in the LoadModule statement.  It should have been extracted with the other files, but if it is not there, you can download it from the apachelounge additional downloads page.)

While we’re in this file, we also want to tell Apache to look for an index.php file.  We’ll need this for testing, but also for some content management systems.  To do this, we change the DirectoryIndex directive on line 271.  It should look like

<IfModule dir_module>
  DirectoryIndex index.html

We want to change the DirectoryIndex line so it reads

DirectoryIndex index.php index.html

Save httpd.conf.

Before we restart Apache to pick up these changes, we’re going to do one last thing.  To test our php, we want to create a file called index.php with the following text inside:

<?php phpinfo(); ?>

Save it to c:\opt\local\Apache24\htdocs

wamp-php5

Restart Apache by going back to the Services dialog.  (If you closed it, it’s Control Panel > System & Security > Administrative Tools > Services).  Click on Apache2.4 and then click on Restart.

wamp-php8

If you get an error, you can always go back to the command line, navigate to c:\opt\local\Apache24\bin and run httpd.exe -t again.  This will check your syntax, which is the most likely source of the problem.  (This page is also helpful in troubleshooting PHP 5.4 and Apache if you are having issues.)

Open a browser window and type in http://localhost – instead of "It Works!" you should see a list of configuration settings for PHP.  (In one of our test cases, the tester needed to close Internet Explorer and re-open it for this part to work.)

wamp-php9

Now, we move to the database.

MySQL

To install MySQL, we can follow the directions at the MySQL site.  For the purposes of this tutorial, we’re going to use the most recent version as of this writing, which is 5.6.11.  To download the files we need, we go to the Community Server download page.

MySQL Downloads

Again, we can absolutely use the installer here, which is the first option.  The MySQL installers will prompt you through the setup, and this video does a great job of walking through the process.

But, since the goal of this tutorial is to see all the parts, I'm going to run through the setup manually.  First, we download the .zip archive.  Choose the .zip file which matches your operating system; I will choose 64-bit (unlike with Apache and PHP, there is no need for this to match the other components).  Extract the files to c:\opt\local\mysql, in the same way we extracted the Apache24 files above.

Since we're installing to our opt\local directory, we need to tell MySQL to look there for the program files and the data.  We do this by setting up an option file.  We can modify a file provided for us called my-default.ini: change its name to my.ini and open it with your code editor.

wamp-mysql2

In the MySQL config files, we use the Unix directory separator "/" again, and comments are preceded by a "#".  So, to set our locations, we want to remove the # from the beginning of the basedir and datadir lines and change them to our installation directory, as shown below.

wamp-mysql3
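Spelled out, the uncommented lines in my.ini would look something like this, assuming the c:\opt\local\mysql directory used in this tutorial (the [mysqld] section header is already in the file):

[mysqld]
basedir = c:/opt/local/mysql
datadir = c:/opt/local/mysql/data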

Then save my.ini.

As with Apache, we're going to start MySQL for the first time from the command line to make sure everything is working OK.  If you still have the administrator command prompt open, navigate back to it.  If not, open a new one, remembering to select the Run As Administrator option.

From your command prompt, type in

cd \opt\local\mysql\bin
mysqld --console

You should see a bunch of statements scroll by as the first database is created.  You may also get a firewall popup.  I hit Cancel here, so as not to allow access from outside my computer to the MySQL databases.

Press Ctrl+C to stop the server.  Now, let's install MySQL as a service.  To do that, we type the command:

mysqld --install

wamp-mysql4

Next, we want to start the MySQL service, so we need to go back to Services.  You may have to Refresh the list in order to see the MySQL service.  You can do this by going to Action > Refresh in the menu.

wamp-mysql5

Then, we start the service by clicking on MySQL and clicking Start Service on the left hand side.

wamp-mysql6

 

One thing about installing MySQL in this manner is that the initial root user for the database will not have a password.  To see this, go back to your command line.  Type in

mysql -u root

This will open the command line MySQL client and allow you to run queries.  The -u flag sets the user, in this case, root.  Notice you are not prompted for a password.  Type in:

select user, host, password from mysql.user;

This command should show all the created user accounts, the hosts from which they can log in, and their passwords.  The semi-colon at the end is crucial – it signifies the end of a SQL command.

wamp-mysql8

Notice in the output that the password column is blank.  MySQL provides documentation on how to fix this on the Securing the Initial Accounts documentation page, but we’ll also step through it here.  We want to use the SET PASSWORD command to set the password for all of the root accounts.

Substituting the password you want for newpwd (keep the single quotes in the command), type in

SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('newpwd');
SET PASSWORD FOR 'root'@'::1' = PASSWORD('newpwd');

You should get a confirmation after every command.  Now, if you run the select user command from above, you'll see that there are values in the password field: hashed versions of the passwords you specified.

A note about security: I am not a security expert and for a development stack we are usually less concerned with security.  But it is generally not a good idea to type in plain text passwords in the command line, because if the commands are being logged you’ve just saved your password in a plain text file that someone can access.  In this case, we have not turned on any logging, and the SET PASSWORD should not store the password in plain text.  But, this is something to keep in mind.

As before with Mac OS X, we could stop here.  But then you would have to administer the MySQL databases using the command line.  So we’ll install phpMyAdmin to make it a little easier and test to see how our web server works with our sites folder.

phpMyAdmin

Download the phpmyadmin.zip file from the phpmyadmin page to the sites folder we created all the way at the beginning.  Note that this does *not* go into the opt folder.

Extract the files to a folder called phpmyadmin using the same methods we’ve used previously.

wamp-phpmyadmin

Since we now want to use our sites folder instead of the default htdocs folder, we will need to change the DocumentRoot and Directory directives on lines 237 and 238 of our Apache config file.  So, open httpd.conf again.

We want to change the DocumentRoot to our sites folder and update the Directory directive to match, which is what will make the phpMyAdmin folder inside it available (a sketch of the edited lines appears below the screenshot).

Change Document Root and Directory
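If the screenshot is hard to read, the two edited lines would look something like the following sketch, which assumes the sites folder was created at c:\sites; if you put your sites folder somewhere else, use that path instead (the rest of the Directory block stays as it was):

DocumentRoot "c:/sites"
<Directory "c:/sites">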

Save the httpd.conf file.  Go back to Services and Restart the Apache2.4 service.

We will complete the configuration through the browser.  First, open the browser and try to navigate to http://localhost again.  You should get a 403 error.

wamp-phpmyadmin4

Instead, navigate to http://localhost/phpmyadmin/setup

wamp-phpmyadmin5

Click on the New Server button to set up a connection to our MySQL databases.  Double check that under the Basic Settings tab, the Server Name is set to localhost, and then click on Authentication.  Verify that the type is “cookie”.

At the bottom of the page, click on Save.  Now, change the address in the browser to http://localhost/phpmyadmin and log in with the root user, using the password you set above.

And that’s it.  Your Windows AMP stack should be ready to go.

In the next post, we’ll talk about how to install a content management system like WordPress or Drupal on top of the base stack.  Questions, comments or other recipes you would like to see?  Let us know in the comments.

 


Coding & Collaboration on GitHub

Previously on Tech Connect we wrote about the Git version control system, walking you through "cloning" a project onto your computer, making some small changes, and committing them to the project's history. But that post concluded on a sad note: all we could do was work by ourselves, fiddling with Git on our own computer and gaining nothing from the software's ability to manage multiple contributors. Well, here we will return to Git to specifically cover GitHub, one of the most popular code-sharing websites around.

Git vs. GitHub

Git is open source version control software. You don’t need to rely on any third-party service to use it and you can benefit from many of its features even if you’re working on your own.

GitHub, on the other hand, is a company that hosts Git repositories on their website. If you allow your code to be publicly viewable, then you can host your repository for free. If you want to have a private repository, then you have to pay for a subscription.

GitHub layers some unique features on top of Git. There’s an Issues queue where bug reports and feature requests can be tracked and assigned to contributors. Every project has a Graphs section where interesting information, such as number of lines added and deleted over time, is charted (see the graphs for jQuery, for instance). You can create gists which are mini-repositories, great for sharing or storing snippets of useful code. There’s even a Wiki feature where a project can publish editable documentation and examples. All of these nice features build upon, but ultimately have little to do with, Git.

Collaboration

GitHub is so successful because of how well it facilitates collaboration. Hosted version control repositories are nothing new; SourceForge has been doing this since 1999, almost a decade prior to GitHub’s founding in 2008. But something about GitHub has struck a chord and it’s taken off like wildfire. Depending on how you count, it’s the most popular collection of open source code, over SourceForge and Google Code.[1] The New York Times profiled co-founder Tom Preston-Werner. It’s inspired spin-offs, like Pixelapse which has been called “GitHub for Photoshop” and Docracy which TechCrunch called “GitHub for legal documents.” In fact, just like the phrase “It’s Facebook for {{insert obscure user group}}” became a common descriptor for up-and-coming social networks, “It’s GitHub for {{insert non-code document}}” has become commonplace. There are many inventive projects which use GitHub as more than just a collection of code (more on this later).

Perhaps GitHub’s popularity is due to Git’s own popularity, though similar sites host Git repositories too.[2] Perhaps the GitHub website simply implements better features than its competitors. Whatever the reason, it’s certain that GitHub does a marvelous job of allowing multiple people to manage and work on a project.

Fork It, Bop It, Pull It

Let's focus on two nice features of GitHub, Forking and the Pull Request,[3] to see exactly why GitHub is so great for collaboration.

If you recall our prior post on Git, we cloned a public repository from GitHub and made some minor changes. Then, when reviewing the results of git log, we could see that our changes were present in the project’s history. That’s great, but how would we go about getting our changes back into the original project?

For the actual step-by-step process, see the LibCodeYear GitHub Project’s instructions. There are basically only two changes from our previous process, one at the very beginning and one at the end.

GitHub's Fork Button

First, start by forking the repository you want to work on. To do so, set up a GitHub account, sign in, visit the repository, and click the Fork button in the upper right. After a pretty sweet animation of a book being scanned, a new project (identical to the original in both name and files) will appear on your GitHub account. You can then clone this forked repository onto your local computer by running git clone on the command line and supplying the URL listed on GitHub.
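For instance, the clone command would look something like this (the account and repository names here are placeholders for your own fork):

git clone https://github.com/yourusername/forked-project.git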

Now you can do your editing. This part is the same as using Git without GitHub. As you change files and commit changes to the repository, the history of your cloned version and the one on your GitHub account diverge. By running git push you "push" your local changes up to GitHub's remote server. Git will prompt you for your GitHub password, which can get annoying after a while, so you may want to set up an SSH key on GitHub to avoid typing it in each time. Once you've pushed, if you visit the repository on GitHub and click the "commits" tab right above the file browser, you can see that your local changes have been published to GitHub. However, they're still not in the original repository, which is under someone else's account. How do you add your changes to the original account?
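As a quick recap of the local side before we answer that, the edit-commit-push cycle above might look like this on the command line (the file name and commit message are made up for illustration):

git add README.md
git commit -m "Fix a typo in the README"
git push origin master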

GitHub's Pull Request Button

In your forked repository on GitHub, something is different: there's a Pull Request button in the same upper right area where the Fork one is. Click that button to initiate a pull request. After you click it, you can choose which branches on your GitHub repository to push to the original GitHub repository, as well as write a note explaining your changes. When you submit the request, a message is sent to the project's owners. Part of the beauty of GitHub is in how pull requests are implemented. When you send one, an issue is automatically opened in the receiving project's Issues queue. Anyone with a GitHub account can comment on public pull requests, connecting them to open issues (e.g. "this fixes bug #43") or calling upon other contributors to review the request. Then, when the request is approved, its changes are merged into the original repository.

diagram of forking & pulling on GitHub

“Pull Request” might seem like a strange term. “Push” is the name of the command that takes commits from your local computer and adds them to some remote server, such as your GitHub account. So shouldn’t it be called a “push request” since you’re essentially pushing from your GitHub account to another one? Think of it this way: you are requesting that your changes be pulled (e.g. the git pull command) into the original project. Honestly, “push request” might be just as descriptive, but for whatever reason GitHub went with “pull request.”

GitHub Applications

While hopefully we’ve convinced you that the command line is a fine way to do things, GitHub also offers Mac and Windows applications. These apps are well-designed and turn the entire process of creating and publishing a Git repository into a point-and-click affair. For instance, here is the fork-edit-pull request workflow from earlier except done entirely through a GitHub app:

  • Visit the original repository’s page, click Fork
  • On your repository’s page, select “Clone in Mac” or “Clone in Windows” depending on which OS you’re using. The repository will be cloned onto your computer
  • Make your changes and then, when you’re ready to commit, open up the GitHub app, selecting the repository from the list of your local ones
  • Type in a commit message and press Commit
    writing a commit message in GitHub for Windows
  • To sync changes with GitHub, click Sync
  • Return to the repository on GitHub, where you can click the Pull Request button and continue from there

GitHub without the command line, amazing! You can even work with local Git repositories, using the app to do commits and view previous changes, without ever pushing to GitHub. This is particularly useful on Windows, where installing Git can involve a few more hurdles. Since the GitHub for Windows app comes bundled with Git, a simple installation and login can get you up and running. The apps also make the process of pushing a local repository to GitHub incredibly easy, whereas there are a few steps otherwise. The apps' visual display of "diffs" (differences in a file between versions, with added and deleted lines highlighted) and handy shortcuts for reverting to particular commits can appeal even to those of us who love the command line.

viewing a diff in GitHub for Windows

More than Code

In my previous post on Git, I noted that version control has applications far beyond coding. GitHub hosts a number of inventive projects that demonstrate this.

  • The Code4Lib community hosts an Antiharassment Policy on GitHub. Those in support can simply fork the repository and add their name to a text file, while the policy’s entire revision history is present online as well
  • The city of Philadelphia experimented with using GitHub for procurements with successful results
  • ProfHacker just wrapped up a series on GitHub, ending by discussing what it would mean to “fork the academy” and combine scholarly publishing with forking and pull requests
  • The Jekyll static-site generator makes it possible to generate a blog on GitHub
  • The Homebrew package manager for Mac makes extensive use of Git to manage the various formulae for its software packages. For instance, if you want to roll back to a previous version of an installed package, you run brew versions $PACKAGE where $PACKAGE is the name of the package. That command prints a list of Git commits associated with older versions of the package, so you can enter the Homebrew repository and run a Git command like git checkout 0476235 /usr/local/Library/Formula/gettext.rb to get the installation formula for version 0.17 of the gettext package.

These wonderful examples aside, GitHub is not a panacea for coding, collaboration, or any of the problems facing libraries. GitHub can be an impediment to those who are intimidated by, or simply not sold on the value of learning, what has traditionally been a software development tool. On the Code4Lib listserv, it was noted that the small number of signatories on the Antiharassment Policy might actually be due to its being hosted on GitHub. I struggle to sell people on my campus on the value of Google Docs with its collaborative editing features. So, as much as I'd like the Strategic Plan the college is producing to be on GitHub, where everyone could submit pull requests and comment on commits, it's not necessarily the best platform. It is important, however, not to think of GitHub as limited purely to versioning code written by professional developers. It has uses for amateurs and non-coders alike.

Footnotes

[1]^ "GitHub Has Passed SourceForge," ReadWrite, June 2, 2011.

[2]^ Previously-mentioned SourceForge also supports Git, as does Bitbucket.

[3]^ I think this would make an excellent band name, by the way.


Taking a trek with SCVNGR: Developing asynchronous, mobile orientations and instruction for campus

Embedding the library in campus-wide orientations, as well as developing standalone library orientations, is often part of outreach and first-year experience work. Reaching all students can be a challenge, so finding opportunities to engage the campus more effectively helps to promote the library and increase student awareness. Using a mobile app for orientations can provide many benefits, such as increasing interactivity and offering an asynchronous option for students to learn about the library on their own time. We have been trying out SCVNGR at the University of Arizona (UA) Libraries and are finding it a more fun and engaging way to deliver orientations and instruction to students.

Why use game design for library orientations and instruction?

Game-based learning can be a good match for orientations, just as it can be for instruction (I have explored this previously on ACRL TechConnect, looking at badges). Rather than just presenting a large amount of information to students or having them fill out a paper-based scavenger hunt activity, using something like SCVNGR can get students interacting more with the library in a way that offers more engagement, in real time and with feedback. However, simply adding a layer of points and badges or other game mechanics to a non-game situation doesn't automatically make it fun and engaging for students. In fact, doing this ineffectively can cause more harm than good (Nicholson, 2012). The common goal in using game design for orientations and instruction is to motivate participants beyond simply acquiring points. Thinking of the WIIFM (What's In It For Me) principle from a student's perspective can help, and in the game we designed with SCVNGR at the University of Arizona for a class orientation, we created activities based on common questions and concerns of students.

Why SCVNGR?
scvngr home screen


 

SCVNGR is a mobile app game for iPhone and Android where players can complete challenges in specific locations. Rather than getting clues and hints like in a traditional scavenger hunt, this game is more focused on activities within a location instead of finding the location. Although this takes some of the mystery away, it works very well for simply informing people about locations that are new to them and having them interact with the space.

Students need to be physically in the location for the app to work. There, they use the app to search for "challenges" (single activities to complete) or "treks" (a series of single activities that make up the full experience for a location), and then complete the challenges or treks to earn points, badges, and recognition.

Some libraries have made their own mobile scavenger hunt activities without the aid of a paid app. For example, North Carolina State University uses the NCSU Libraries' Mobile Scavenger Hunt, which is a combination of students recording responses in Evernote, real-time interaction, and tracking by librarians.  One of the reasons we went with SCVNGR, however, is that this sort of mobile orientation requires a good amount of librarian time and is synchronous, whereas SCVNGR does not require as much face-to-face librarian time and allows for asynchronous student participation. Although we do use synchronous instruction for some of our classes, we also wanted the option of asynchronous activities, particularly for the large-scale orientations where many different groups come in at many different times. Although SCVNGR is not free for us, the app is free to students. SCVNGR offers 24/7 support, and other academic institutions share insight and ideas in a community for universities.

Other academic libraries have used SCVNGR for orientations and even library instruction. A few examples are:

 

How did the UA Libraries use SCVNGR?

Because a lot of instruction has moved online and there are so many students to reach, we are working on SCVNGR treks for both instruction and basic orientations at the University of Arizona (UA). We are in the process of setting up treks for large-scale campus orientations (New Student Orientation, UA Up Close for both parents and students, etc.) that take place during the summer, and we have tested SCVNGR out on a smaller scale as a pilot for individual classes. There tends to be greater success and engagement if the trek is tied to something, such as a class assignment or a required portion of an orientation session. One concern for an app-based activity is that not all students will have smartphones. We alleviated this by putting students into groups ahead of time, ensuring that at least one person in each group had a device compatible with SCVNGR. However, we do lend technology at the UA Libraries, so if a group was without a smartphone or tablet, they would be able to check one out from the library.

trek page for ais197b at ua libraries


We first piloted a trek with an American Indian Studies student success course (AIS197b). This course for freshmen introduces students to services on campus that will be useful to them while they are at the UA. Last year, we presented a quick information session on library services and then had the students complete a pencil-and-paper scavenger hunt throughout the library for a class grade (participation points). Although they seemed glad to be able to get out and move around, it didn't seem particularly fun and engaging. On top of that, every time the students got stuck or had a question, they had to come back to the main floor to find librarians and get help.  In contrast, when students get an answer wrong in SCVNGR, feedback is programmed in to guide them to the correct information. And, because they don't need clues to make it to the next step (they just go back and select the next challenge in the trek), one mistake does not prevent them from moving on to the next activity. This semester, we first presented a brief instruction session (approximately 15-20 minutes) and then let students get started on SCVNGR.

You can see in the screenshot below how question design works: you can select the location, the number of points that count toward the activity, and the type of activity (taking a photo, answering with text, or scanning a QR code), and then provide feedback. If a student answers a question incorrectly, as I mention above, they will receive feedback to help them figure out the correct answer. I really like that when students get answers right, they know instantly. This is positive reinforcement for them to continue.

scvngr answer feedback


The activities designed for students in this class focused on photo- and text-based challenges. We stayed away from QR codes because they can be finicky with some phones, and because simply taking a picture of the QR code meets the challenge requirement for that type of activity. Our challenges included:

  • Meet the reference desk (above): Students meet desk staff and ask how they can get in touch for reference assistance; answers are by text and students type in which method they think they would use the most: email, chat, phone, or in person.
  • Prints for a day: Students find out about printing (a frequent question of new students), and text in how to pay for printing after finding the information at the Express Documents Center.
  • Playing favorites: Students wander around the library and find their favorite study spot. Taking a picture completes the challenge, and all images are collected in the Trek’s statistics.
  • Found in the stacks: After learning how to use the catalog (we provided a brief instruction session to this class before setting them loose), students search the catalog for books on a topic they are interested in, then locate the book on the shelf and take a picture. One student used this time to find books for another class and was really glad he got some practice.
  • A room of one's own: The UA Libraries implemented online study room reservations a year ago. To introduce this new option, this challenge had students use their smartphones to go to the mobile reservation page, find the maximum number of hours a study room can be reserved for, and text that in.

SCVNGR worked great with this class for simple tasks, such as meeting people at the reference desk, finding a book, or taking a picture of a favorite study spot, but it would not be the best platform for tasks that require more critical thinking or more intricate work. SCVNGR's options for assessing how students respond to questions or complete an activity are limited: texting in detailed answers or engaging in tasks like searching a database would be much harder to record. Likewise, instruction tied to critical thinking (evaluating a source or exploring copyright issues, for example) is not really location-based, so it would be hard to tie those tasks and the acquisition of those skills to a location-based activity that SCVNGR could track. One instance of this came up with the Found in the Stacks challenge: students were supposed to search for a book in the catalog and then locate it on the shelf, but there was nothing stopping them from just grabbing a random book off the shelf and taking a picture of it to complete the challenge. SCVNGR provides a style guide to help with game design, and the overall message of that document is that simplicity is most effective on this platform.

Another feature that works well is being able to choose whether the trek is competitive or not, and to use "SmartRoute," which has challenges show up for participants based on distance and least-crowded areas. This is wonderful, particularly because students tend to get congested at certain points in a scavenger hunt: they all crowd around the same materials or locations simultaneously because they're making the same progress through the activity. We chose to use SmartRoute for this class so students would be spread out during the game.

scvngr trek settings


When trying to assess student effort and impact of the trek, you can look at stats and rankings. It’s possible to view specific student progress, all activity by all participants, and rankings organized by points.

scvngr statistics


Another feature is the ability to collect items submitted for challenges (particularly pictures). One of our challenges is for students to find their favorite study spot in the library and take a picture of it. This should be fun for them to think about and is fairly easy, and it helps us do some space assessment. It’s then possible to collect pictures like the following (student’s privacy protected via purple blob).

student images of ua main library via scvngr


On the topic of privacy, students enter their name to set up an account, but only their first name and the first initial of their last name appear as their username. Although last names are thus hidden, SCVNGR data is viewable by anyone who is within geographical range of the challenge: it is not closed to an institution. If students choose to take pictures of themselves, their identity may be revealed, but it is possible to maintain some privacy by not sharing images of specific individuals or any personal information in text responses. The flip side of not associating individual students with their specific activities is that things get trickier when an instructor plans to award points for participation. In that case, it's possible to request reports from SCVNGR for instructors so they can see which students participated and how much. In a large class of over 100 students, the data can get messier, particularly if students share the same first name and last initial. Because of this issue, SCVNGR might be better suited to large-scale orientations where participation does not need to be tracked, and to small classes where instructors can easily tell who is who in the activity data.

Lessons learned

Both student and instructor feedback was very positive. Students seemed to be having fun and laughing, and they did not get stuck nearly as much as with the previous year's pencil-and-paper hunt. The instructor noted it seemed a lot more streamlined and engaging for the class. When students checked in with us at the end before heading out, they said they enjoyed the activity; although there were a couple of hiccups with the software and/or how we designed the trek, it was a good experience and they felt more comfortable using the library.

Next time, I would be more careful about using text responses. I had gone down to our printing center to tell the student worker on duty what answers students in the class would be looking for so she could provide them, but the students wound up speaking with someone else and getting different answers. Otherwise, the level of questions seemed appropriate for this class, and it was a good way to pilot how SCVNGR works, whether students might like it, and how long different types of questions take, before bringing this to campus on a larger scale. I would also be cautious about using SCVNGR too heavily for instruction, since it doesn't have the capabilities for more complex tasks or a great deal of critical thinking. It is better suited to basic instruction and getting students more comfortable using the library.

Pros

  • Ability to reach many students, and to do so asynchronously
  • Anyone can complete challenges and treks; this is great for prospective students and families, community groups, and any programs doing outreach or partnerships outside of campus since a university login is not required.
  • Can be coordinated with campus treks if other units have accounts or a university-wide license is purchased.
  • WYSIWYG interface, no programming skills necessary
  • The order of challenges in a trek can be staggered so that not everyone is competing for the same resources at the same time.
  • Can collect useful data through users submitting photos and comments (for example, we can examine library space and student use by seeing where students’ favorite spots to study are).

Cons

  • SCVNGR is not free to use; an annual fee applies (in the $900 range for a library-only license, which is not institution-wide).
  • Privacy is a concern since anyone can see activity in a location; it’s not possible to close this to campus.
  • When completing a trek, users do not get automatic prompts to proceed to the next challenge; instead, they must go back to the home location screen and choose the next challenge (this can get a little confusing for students).
  • SCVNGR is more difficult to use for instruction, especially when looking to incorporate critical thinking and more complex activities.
  • Instructors might have a harder time figuring out how to grade participation because treks are open to anyone; only students' first names and last initials appear, so if either a large class completes a trek for an assignment or an orientation trek for the public is used, a special report must be requested from SCVNGR that the library can send to the instructor for grading purposes.

 

Conclusion

SCVNGR is a good way to increase awareness and get students and other groups comfortable using the library. One of the main benefits is that it's asynchronous, so a great deal of library staff time is not required to get people interacting with services, collections, and space. Although this platform is not perfect for more in-depth instruction, it does work at the basic orientation level, and the students and instructor in the course we piloted it with had a good experience.

 

References

Nicholson, S. (2012). A user-centered theoretical framework for meaningful gamification. Paper Presented at Games+Learning+Society 8.0, Madison, WI. Retrieved from http://scottnicholson.com/pubs/meaningfulframework.pdf.

—-

About Our Guest Author: Nicole Pagowsky is an Instructional Services Librarian at the University of Arizona where she explores game-based learning, student retention, and UX. You can find her on Twitter, @pumpedlibrarian.


A Librarian’s Guide to OpenRefine

Academic librarians working in technical roles may rarely see stacks of books, but they doubtless see messy digital data on a daily basis. OpenRefine is an extremely useful tool for dealing with this data without sophisticated scripting skills and with a very low learning curve. Once you learn a few tricks with it, you may never need to force a student worker to copy and paste items onto Excel spreadsheets.

As this comparison by the creator of OpenRefine shows, the best use for the tool is to explore and transform data: it lets you make edits to many cells and rows at once while still seeing your data. This allows you to experiment and undo mistakes easily, which is a great advantage over databases or scripting, where you can't always see what's happening or undo the typo you made. It's also a lot faster than editing cell by cell as you would do in a spreadsheet.

Here's an example of a project that took me hours in a spreadsheet but far less time when I redid it in Google Refine. One of the quickest things to do with OpenRefine is to spot words or phrases that are almost the same, and possibly are the same thing. Recently I needed to turn a large export of data from the catalog into data that I could load into my institutional repository. Only certain values were allowed in the repository's controlled vocabulary, so I had to modify the bibliographic data from the catalog (which was of course in more or less proper AACR2 style) to match the vocabularies available in the repository. The problem was that the data I had wasn't consistent: there were multiple types of abbreviations, extra spaces, extra punctuation, and outright misspellings. An example is the History Department. I can look at "Department of History", "Dep. of History", and "Dep of Hist." and tell these are probably all referring to the same thing, but it's difficult to predict those potential spellings. While I could deal with much of this with regular expressions in a text editor and find-and-replace in Excel, I kept running into additional problems that I couldn't spot until I got an error. It took several attempts at loading the data until I had cleared out all the errors.

In OpenRefine this is a much simpler task, since the tool can find everything that is probably the same thing despite slight differences in spelling, punctuation, and abbreviation. Rather than trying to write a regular expression that accounts for every difference between "Department of History", "Dep. of History", and "Dep of Hist.", you can find the clusters of text that include those variants and change them all in one shot to "History". More detailed instructions on how to do this appear below.

Installation and Basics

Until last October, OpenRefine was called Google Refine, and while content from the Google Refine site is still being moved over to the OpenRefine site, you should plan to look at both. Documentation and video tutorials refer to Google Refine and OpenRefine interchangeably. The official, current documentation is on the OpenRefine GitHub wiki. For specific questions you will probably want to use the OpenRefine Custom Search Engine, which brings together the mix of documentation and tutorials scattered around the web. OpenRefine is a web application that runs locally on your own computer, so you don't need an internet connection to use it. You can get the installation instructions on this page.

While you can jump right in and start playing around, it is well worth your time to watch the tutorial videos, which cover the basic actions you need to start working with data. As I said, the learning curve is low, but not all of the commands will make sense until you see them in action. The videos will also give you ideas about what you might do with a data set you have lying around. You may also want to browse the "recipes" on the OpenRefine site, as well as search online for other interesting things people have done; you will probably come away with more ideas to try. The most important thing to know about OpenRefine is that you can undo anything and go back to the state of the project before you messed up.

A basic understanding of the Google Refine Expression Language, or GREL, will improve your ability to work with data. There isn't a great deal of detailed documentation, so feel free to experiment and see what happens when you try different functions; the tutorial videos cover the basics you need to know. Another essential tool is regular expressions. Much of the data you start with is structured (even if imperfectly structured) and needs to be turned into something else, and regular expressions help you find the patterns you can use to break strings apart. Spending a few minutes learning regular expression syntax will save hours of inefficient find and replace. There are many tutorials; my go-to source is this one. The good news for librarians is that if you can construct a Dewey Decimal call number, you can construct a regular expression!
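To make that call number analogy concrete, here is a minimal, hypothetical Python sketch (my own illustration, not something from OpenRefine or required by it) of a regular expression that recognizes a simple Dewey-style call number; the pattern and the sample strings are assumptions made up for the example.

import re

# A hypothetical pattern for a simple Dewey-style call number such as "025.04 HEL":
# three digits, an optional decimal portion, whitespace, then a short Cutter-like code.
dewey = re.compile(r"^(\d{3})(\.\d+)?\s+([A-Z]{2,3})$")

for raw in ["025.04 HEL", "813.54 MOR", "not a call number"]:
    match = dewey.match(raw)
    if match:
        print(raw, "->", match.groups())   # ('025', '.04', 'HEL') and so on
    else:
        print(raw, "-> no match")

The same kind of pattern, whether used in a GREL expression or in a text editor's find and replace, is what lets you pull structured pieces out of messy strings.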

Some ideas for librarians


(A) Typos

Above I described how to use OpenRefine to clean up messy and inconsistent catalog data; here's how to do it. Load in the data, and select "Text Facet" on the column in question. OpenRefine will show clusters of text that are similar and probably the same thing.

AcademicDept Text Facet


Click on Cluster to get a dialog for working with multiple values at once. You can check the "Merge" box for a cluster and then edit the text to whatever you need it to be, or edit each text cluster individually to the correct text.

Cluster and Edit

You can merge and re-cluster until you have fixed all the typos. Back on the original Text Facet, you can hover over any value to edit it, so even if the automatic clustering misses something you can correct the errors by hand, or change values that are consistent but need to look different; for instance, change "Dept. of English" to just "English".
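If you are curious what the clustering is doing behind the scenes, OpenRefine's default method, called "fingerprint," is a key-collision technique: values that reduce to the same normalized key are grouped together. Below is a rough, hypothetical Python sketch of that idea (not OpenRefine's actual code), with made-up department names; note that this simple version would not catch abbreviations like "Dep of Hist.", which is where the other clustering methods in the dialog come in.

from collections import defaultdict
import string

def fingerprint(value):
    # Rough approximation of a key-collision "fingerprint":
    # lowercase, strip punctuation, split into tokens, de-duplicate, sort, and rejoin.
    value = value.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(value.split()))
    return " ".join(tokens)

values = ["Department of History", "department of history.", "History, Department of"]
clusters = defaultdict(list)
for v in values:
    clusters[fingerprint(v)].append(v)

for key, members in clusters.items():
    print(key, "->", members)   # all three values collapse into one cluster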

(B) Bibliographies

The main thing I have used OpenRefine for in my daily work is to turn a plain-text bibliography into columns in a spreadsheet that I can run against an API. This was inspired by this article in the Code4Lib Journal: "Using XSLT and Google Scripts to Streamline Populating an Institutional Repository" by Stephen X. Flynn, Catalina Oyler, and Marsha Miles. I wanted to find a way to turn a text CV into something that would work with the SHERPA/RoMEO API, so that I could find out which past faculty publications could be posted in the institutional repository. Because a CV is a list of data in a structured but somewhat inconsistent format, OpenRefine makes it easy to reshape the data, remove the inconsistencies, and then extend it with a web service. What follows is a very basic set of instructions for accomplishing this.

The main goal is to get the journal title into its own column. Here's an example citation in APA format; the "separator" punctuation between the pieces is what we will split on:

Heller, M. (2011). A Review of "Strategic Planning for Social Media in Libraries". Journal of Electronic Resources Librarianship, 24 (4), 339-240.

From the drop-down menu at the top of the column, choose "Edit column" and then "Split into several columns…". You will get a dialog like the one below. This example splits on the opening parenthesis and removes it when creating the new column, so the author's name ends up in its own column and the rest of the text in another.

Split into columns


The rest of the columns work the same way: find the next text, punctuation, or spacing that indicates a separation, then rename each new column to something that makes sense. In the end, you will have something like this:

Split columns
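To make the separators concrete, here is a small, hypothetical Python re-creation of those successive splits using the example citation above; in OpenRefine you do this through the "Split into several columns…" dialog rather than with code, and the exact separator strings below are my own approximation.

# Hypothetical re-creation of the successive column splits on the example citation.
citation = ('Heller, M. (2011). A Review of "Strategic Planning for Social Media '
            'in Libraries". Journal of Electronic Resources Librarianship, 24 (4), 339-240')

# The opening parenthesis separates the author from the rest.
author, rest = citation.split("(", 1)
# The "). " after the year, the '". ' after the article title, and the comma
# before the volume number separate the remaining pieces.
year, rest = rest.split("). ", 1)
title, rest = rest.split('". ', 1)
journal, rest = rest.split(",", 1)

print([author.strip(), year, title, journal.strip(), rest.strip()])
# ['Heller, M.', '2011', 'A Review of "Strategic Planning for Social Media in Libraries',
#  'Journal of Electronic Resources Librarianship', '24 (4), 339-240']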

Once the journal titles are in their own column, you may want to cluster the text to make sure the titles are consistent, and do any other cleanup they need. Now you are ready to build on this data by fetching data from a web service. The third video tutorial posted above explains the basic idea, and this tutorial is also helpful. Use the pull-down menu at the top of the journal column to select "Edit column" and then "Add column by fetching URLs…". You will get a box that helps you construct the right URL. You need to format the URL the way SHERPA/RoMEO requires, and you will need a free API key. For the purposes of this example, you can use 'http://www.sherpa.ac.uk/romeo/api29.php?ak=[YOUR API KEY HERE]&qtype=starts&jtitle=' + escape(value,'url'). A preview shows whether the URL is formatted the way you expect. Give your column a name, and set the Throttle delay (in milliseconds), which keeps the service from rejecting too many requests in a short time; I found 1000 worked fine.

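If you ever want to sanity-check the same lookup outside OpenRefine, a rough Python sketch of the request might look like the following; the query parameters come from the URL above, but the code itself is hypothetical and only illustrative, and you still need your own free API key.

import time
import urllib.parse
import urllib.request

API_KEY = "[YOUR API KEY HERE]"  # a free key from SHERPA/RoMEO
journals = ["Journal of Electronic Resources Librarianship"]

for title in journals:
    # The same query OpenRefine builds: a starts-with search on the journal title.
    url = ("http://www.sherpa.ac.uk/romeo/api29.php?ak=" + API_KEY
           + "&qtype=starts&jtitle=" + urllib.parse.quote(title))
    with urllib.request.urlopen(url) as response:
        xml_text = response.read().decode("utf-8", errors="replace")
    print(xml_text[:200])  # the raw XML that OpenRefine would store in the new column
    time.sleep(1)          # mirror the 1000 ms throttle delay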

After this runs, you will get a new column containing the XML returned by SHERPA/RoMEO. You can use this to pull out anything you need; for this example I want the pre-archiving and post-archiving policies, as well as the conditions. A quick way to do this is to use the Google Refine Expression Language parseHtml function. Click on "Add column based on this column" from the "Edit column" menu, and you will get a box to fill in an expression.


In this example I use the expression value.parseHtml().select("prearchiving")[0].htmlText(), which selects just the text from within the prearchiving element. Conditions are a little different, since there can be multiple conditions for each journal. In that case, you would use the following syntax (inside the join you can put whatever separator you want): forEach(value.parseHtml().select("condition"),v,v.htmlText()).join(". ")
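For comparison, the same extraction could be sketched outside GREL with Python's standard XML parser. This is purely a hypothetical illustration: the sample fragment below is made up, using only the prearchiving and condition element names mentioned above, and the real SHERPA/RoMEO response has more structure around them.

import xml.etree.ElementTree as ET

# Made-up fragment for illustration; only the element names come from the text above.
sample_xml = """<response>
  <prearchiving>can</prearchiving>
  <conditions>
    <condition>Publisher copyright and source must be acknowledged</condition>
    <condition>Must link to publisher version</condition>
  </conditions>
</response>"""

root = ET.fromstring(sample_xml)

# Equivalent of value.parseHtml().select("prearchiving")[0].htmlText()
prearchiving = root.find(".//prearchiving")
print(prearchiving.text if prearchiving is not None else "not found")

# Equivalent of forEach(value.parseHtml().select("condition"), v, v.htmlText()).join(". ")
conditions = [c.text for c in root.findall(".//condition") if c.text]
print(". ".join(conditions))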

In the end, you will have a neatly structured spreadsheet built from the original CV, with each piece of bibliographic information in its own column and the publisher conditions listed. You can imagine the possibilities for other APIs to use; for instance, the WorldCat API could help you determine which of the books published by your faculty the library owns.

Once you find a set of actions that gets the result you want, you can save it for future use or to share with others. Click on the Undo/Redo tab and then the Extract option. You will get a description of the actions you took, along with those actions represented in JSON.


Uncheck the boxes next to any mistakes you made, and then copy and paste the text somewhere you can find it again. I have the full JSON for the example above in a Gist here. If you save your JSON publicly, make sure you remove your personal API key! When you want to run the same recipe in the future, click on the Undo/Redo tab and choose Apply; it will run through the steps for you. Note that if there is a mistake in your data you won't catch it until everything has finished, so check the formatting of your data before running the recipe.

Learning More and Giving Back

Hopefully this quick tutorial has gotten you excited about OpenRefine and thinking about what you can do with it. I encourage you to read through the list of External Resources for additional ideas, some of which are library related. There is a lot more to learn, and plenty of recipes you can create and share with the library community.

Have you used OpenRefine? Share how you’ve used it, and post your recipes.