Fear No Longer Regular Expressions

Regex, it’s your friend

You may have heard the term “regular expressions” before. If you have, you will know that it usually comes in a notation that is quite hard to make out, like this:

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Despite their appearance, regular expressions (regex) are an extremely useful tool for cleaning up and manipulating textual data. I will show you an example that is easy to understand. Don’t worry if you can’t make sense of the regex above. We can talk about the crazy metacharacters and other confusing regex notations later. But hopefully, this example will help you appreciate the power of regex and give you some ideas of how to use regex to make your everyday library work easier.

What regex can do for you – an example

I looked for the zipcodes for all cities in Miami-Dade County and found a very nice PDF online (http://www.miamifocusgroup.com/usertpl/1vg116-three-column/miami-dade-zip-codes-and-map.pdf). But when I copy and paste the text from the PDF file into my text editor (Sublime), the format immediately goes haywire. The county name, ‘Miami-Dade,’ which should be on one line, is split across three lines: Miami, -, and Dade.

Ugh, right? I do not like this one bit.  So let’s fix this using regular expressions.


Like many text editors, Sublime offers a find/replace feature with regex support. This is a much more powerful tool than the usual find/replace in MS Word, for example, because it allows you to match many complex cases that fit a pattern. Regular expressions let you specify that pattern.

In this example, I capture the three lines each saying Miami,-,Dade with this regex:

Miami\n-\nDade

When I enter this into the ‘Find What’ field, Sublime starts highlighting all the matches.  I am sure you already guessed that \n means a new line. Now let me enter Miami-Dade in the ‘Replace With’ field and hit ‘Replace All.’
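The same find-and-replace step can also be done in code. Here is a minimal sketch in JavaScript, assuming the pasted text is held in a string. The sample lines and the variable names (pasted, joined) are made up for illustration.

```javascript
// Sample text as it might look after pasting from the PDF (made up for illustration).
const pasted = "Miami\n-\nDade\n33010\nHialeah\nMiami\n-\nDade\n33149\nKey Biscayne\n";

// The g flag plays the role of 'Replace All': every match is replaced, not just the first one.
const joined = pasted.replace(/Miami\n-\nDade/g, "Miami-Dade");

console.log(joined);
```

Without the g flag, only the first occurrence would be replaced, which is closer to hitting ‘Replace’ once.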


Things are much better now. But I want each set of three lines – Miami-Dade, zipcode, and city – to be on one line, with the elements separated by a comma and a space, such as ‘Miami-Dade, 33010, Hialeah’. So let’s do some more magic with regex.


How do I describe the pattern of three lines – Miami-Dade, zipcode, and city? When I look at the PDF, I notice that the zipcode is a 5-digit number and the city name consists of alphabet characters and spaces. I don’t see any hyphen or comma in the city name in this particular case. And the first line is always ‘Miami-Dade.’ So the following regular expression captures this pattern.

Miami-Dade\n\d{5}\n[A-Za-z ]+

Can you guess what this means? You already know that \n means a new line. \d{5} means a 5-digit number. So it will match 33013, 33149, 98765, or any number that consists of five digits. [A-Za-z ] means any alphabet character, in either upper or lower case, or a space (N.B. There is a space at the end, right before ‘]’).

Anything that goes inside [ ] is one character, just like \d is one digit. So I need to specify how many of the characters are to be matched. If I put {5}, as I did in \d{5}, it will only match a city name that has five characters, like ‘Miami.’ The pattern should match a city name of any length as long as it is not zero. The + sign does that. [A-Za-z ]+ means that any alphabet character, in either upper or lower case, or a space should appear at least once. (N.B. * and ? are also quantifiers like +. See the metacharacter table below to find out what they do.)
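To see the difference between {5} and + in action, here is a small JavaScript sketch. The ^ and $ anchors are my addition so that each city name is tested as a whole string, and the list of names is only sample data.

```javascript
// {5} demands exactly five characters from the set; + demands one or more.
const exactlyFive = /^[A-Za-z ]{5}$/;
const oneOrMore = /^[A-Za-z ]+$/;

// Sample city names, including an empty string to show that + rejects zero characters.
const cities = ["Miami", "Hialeah", "Miami Beach", ""];

const results = cities.map(function (city) {
  return { city: city, five: exactlyFive.test(city), plus: oneOrMore.test(city) };
});

console.log(results);
```

Only ‘Miami’ passes the {5} version, while every non-empty name passes the + version.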

Now I hit the “Find” button, and we can see the pattern worked as I intended. Hurrah!


Now, let’s make these three lines one line each. One great thing about regex is that you can refer back to matched items. This is really useful for text manipulation. But in order to use the backreference feature in regex, we have to group the items with parentheses. So let’s change our regex to something like this:

(Miami-Dade)\n(\d{5})\n([A-Za-z ]+)

This regex shows three groups separated by a new line (\n). You will see that Sublime still matches the same three-line sets in the text file. But now that we have grouped the units we want – county name, zipcode, and city name – we can refer back to them in the ‘Replace With’ field. There are three units, and each unit can be referred to by a backslash and its order of appearance. So the county name is \1, the zipcode is \2, and the city name is \3. (N.B. Usually you can have up to nine backreferences in total, from \1 to \9. So if you want to backreference a later group, you can opt not to create a backreference from an earlier group by using (?: ) instead of ( ).) Since we want them all on one line, separated by a comma and a space, the following expression will work for our purpose.

\1, \2, \3

Do a few Replaces and then if satisfied, hit ‘Replace All’.

Ta-da! It’s magic.
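For comparison, here is how the same grouped replace might look in JavaScript. This is a sketch with made-up sample text. Note that JavaScript writes backreferences in the replacement string as $1, $2, $3 rather than \1, \2, \3, which is one of those small notation differences between tools.

```javascript
// Sample text after the county name has been joined onto one line (made up for illustration).
const text = "Miami-Dade\n33010\nHialeah\nMiami-Dade\n33149\nKey Biscayne\n";

// Three capturing groups: county name, 5-digit zipcode, city name.
const pattern = /(Miami-Dade)\n(\d{5})\n([A-Za-z ]+)/g;

// $1, $2, $3 refer back to the three groups in order of appearance.
const oneLineEach = text.replace(pattern, "$1, $2, $3");

console.log(oneLineEach);
```

If the county name were wrapped in (?:Miami-Dade) instead, it would no longer be numbered, and the zipcode would become $1.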


Regex Metacharacters

Regex notations look a bit funky. But it’s worth learning them since they enable you to specify a general pattern that can match many different cases that you cannot catch without the regular expression.

We have already learned the four regex metacharacters: \n, \d, { }, (). Not surprisingly, there are many more beyond these. Below is a pretty extensive list of regex metacharacters, which I borrowed from the regex tutorial here: http://www.hscripts.com/tutorials/regular-expression/metacharacter-list.php . I also highly recommend this one-page Regex cheat sheet from MIT (http://web.mit.edu/hackl/www/lab/turkshop/slides/regex-cheatsheet.pdf).

Note that \w will match not only an alphabetical character but also an underscore and a number. For example, \w+ matches Little999Prince_1892. Also remember that a small number of regular expression notations can vary depending on what programming language you use, such as Perl, JavaScript, PHP, Ruby, or Python.
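A quick way to convince yourself of what \w covers is to test it. The sketch below uses JavaScript; the ^ and $ anchors are added so that the whole string has to consist of word characters.

```javascript
// \w matches letters, digits, and the underscore.
const wordCharsOnly = /^\w+$/;

const withDigitsAndUnderscore = wordCharsOnly.test("Little999Prince_1892");
const withSpace = wordCharsOnly.test("Little Prince"); // a space is not a word character

// By contrast, [A-Za-z]+ stops at the digits and the underscore.
const lettersOnly = "Little999Prince_1892".match(/[A-Za-z]+/g);

console.log(withDigitsAndUnderscore, withSpace, lettersOnly);
```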

Metacharacter Description
\ Specifies the next character as either a special character, a literal, a back reference, or an octal escape.
^ Matches the position at the beginning of the input string.
$ Matches the position at the end of the input string.
* Matches the preceding subexpression zero or more times.
+ Matches the preceding subexpression one or more times.
? Matches the preceding subexpression zero or one time.
{n} Matches exactly n times, where n is a non-negative integer.
{n,} Matches at least n times, n is a non-negative integer.
{n,m} Matches at least n and at most m times, where m and n are non-negative integers and n <= m.
. Matches any single character except “\n”.
[xyz] A character set. Matches any one of the enclosed characters.
x|y Matches either x or y.
[^xyz] A negative character set. Matches any character not enclosed.
[a-z] A range of characters. Matches any character in the specified range.
[^a-z] A negative range characters. Matches any character not in the specified range.
\b Matches a word boundary, that is, the position between a word and a space.
\B Matches a nonword boundary. ‘er\B’ matches the ‘er’ in “verb” but not the ‘er’ in “never”.
\d Matches a digit character.
\D Matches a non-digit character.
\f Matches a form-feed character.
\n Matches a newline character.
\r Matches a carriage return character.
\s Matches any whitespace character including space, tab, form-feed, etc.
\S Matches any non-whitespace character.
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore.
\W Matches any non-word character.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol.

Matching modes

You also need to know about the regex matching modes. In order to use these modes, you write your regex as shown above, and then at the end you add one or more of these modes. Note that in text editors, these options often appear as checkboxes and may be applied by default without your doing anything.

For example, [d]\w+[g] will match only the three lower case words in ding DONG dang DING dong DANG. On the other hand, [d]\w+[g]/i will match all six words whether they are in the upper or the lower case.
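In JavaScript, the mode letters go right after the closing slash of the regex. The sketch below runs the example with and without the i (ignore case) mode; the g mode is added in both cases so that all matches are returned rather than only the first.

```javascript
const text = "ding DONG dang DING dong DANG";

// Case-sensitive: only the lower case words match.
const caseSensitive = text.match(/[d]\w+[g]/g);

// With the i mode, upper and lower case are treated alike.
const caseInsensitive = text.match(/[d]\w+[g]/gi);

console.log(caseSensitive);
console.log(caseInsensitive);
```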

Look-ahead and Look-behind

There are also the ‘look-ahead’ and the ‘look-behind’ features in regular expressions. These often cause confusion and are considered to be a tricky part of regex. So, let me show you a simple example of how they can be used.

Below are several lines, each listing a person’s last name, first name, and middle name, followed by his or her department name. You can see that this is a snippet from a csv file. The problem is that a value in one field – the department name – also includes a comma, which is supposed to appear only between different fields, not inside a field. So the comma becomes an unreliable separator. One way to solve this issue is to convert this csv file into a tab-delimited file, that is, to use a tab instead of a comma as the field separator. That means that I need to replace all commas with tabs ‘except those commas that appear inside a department field.’

How do I achieve that? Luckily, the commas inside the department field value are all followed by a space character, whereas the separator commas between different fields are not. Using the negative look-ahead regex, I can specify the pattern of a comma that is not followed by (?!) a space (\s).

,(?!\s)

Below, you can see that this regex matches all commas except those that are followed by a space.
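Here is a minimal JavaScript sketch of that conversion. The csv row is invented for illustration (it is not from the original file), but it has the same shape: separator commas with no space after them, and one comma inside the department name that is followed by a space.

```javascript
// A made-up csv row: last name, first name, middle initial, department.
const row = "Doe,Jane,Q,Department of Medicine, Cardiology";

// Replace only the commas that are NOT followed by a whitespace character.
const tabDelimited = row.replace(/,(?!\s)/g, "\t");

// Splitting on tabs now yields four clean fields.
const fields = tabDelimited.split("\t");

console.log(fields);
```

The look-ahead only checks what follows the comma. It does not consume the next character, so nothing besides the comma itself gets replaced.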


For another example, the positive look-ahead regex, Ham(?=burg), will match ‘Ham‘ in Hamburg when it is applied to the text:  Hamilton, Hamburg, Hamlet, Hammock.
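The Hamburg example can be checked with a few lines of JavaScript. This sketch shows both where the match is found and the fact that ‘burg’ itself is not part of the match.

```javascript
const text = "Hamilton, Hamburg, Hamlet, Hammock";

// Match 'Ham' only where it is immediately followed by 'burg'.
const match = /Ham(?=burg)/.exec(text);

// Replacing the match leaves 'burg' untouched, because the look-ahead does not consume it.
const replaced = text.replace(/Ham(?=burg)/, "HAM");

console.log(match[0], match.index);
console.log(replaced);
```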

Below are the complete look-ahead and look-behind notations both positive and negative.

  • (?=pattern) is a positive look-ahead assertion
  • (?!pattern) is a negative look-ahead assertion
  • (?<=pattern) is a positive look-behind assertion
  • (?<!pattern) is a negative look-behind assertion

Can you think of any example where you can successfully apply a look-behind regular expression? (No? Then check out this page for more examples: http://www.rexegg.com/regex-lookarounds.html)
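If you would like a starting point, here is one possible look-behind example, sketched in JavaScript. It assumes a JavaScript engine that supports look-behind (current versions of Node.js and the major browsers do), and the sample text is made up.

```javascript
// Made-up text mixing dollar amounts and plain counts.
const text = "Fine: $25, copies: 10, fee: $3";

// Positive look-behind: digits that come right after a dollar sign.
const amounts = text.match(/(?<=\$)\d+/g);

// Negative look-behind: whole numbers that are NOT preceded by a dollar sign.
const counts = text.match(/(?<!\$)\b\d+/g);

console.log(amounts);
console.log(counts);
```

The \b in the second pattern keeps the regex from matching the ‘5’ inside ‘$25’, which is not directly preceded by a dollar sign.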

Now that we have covered even the look-ahead and the look-behind, you should be ready to tackle the very first crazy-looking regex that I introduced at the beginning of this post.

(?=^[0-5\- ]+$)(?!.*0123)\d{3}-\d{3,4}-\d{4}

Tell me what this will match! Post in the comment below and be proud of yourself.

More tools and resources for practicing regular expressions

There are many tools and resources out there that can help you practice regular expressions. Text editors such as EditPad Pro (Windows), Sublime, TextWrangler (Mac OS), Vi, and Emacs all provide regex support. Wikipedia (https://en.wikipedia.org/wiki/Comparison_of_text_editors#Basic_features) offers a useful comparison chart of many text editors you can refer to. RegexPal.com is a convenient online JavaScript regex tester. Firefox also has a Regular Expressions add-on (https://addons.mozilla.org/en-US/firefox/addon/rext/).

For more tools and resources, check out “Regular Expressions: 30 Useful Tools and Resources” http://www.hongkiat.com/blog/regular-expression-tools-resources/.

Library problems you can solve with regex

The best way to learn regex is to start using it right away, every time you run into a problem that can be solved faster with regex. What library problem can you solve with regular expressions? What problem did you solve with regular expressions? I often use regex to clean up or manipulate large sets of data. Suppose you have 500 links and you need to add an EZproxy prefix or suffix to each. With regex, you can get this done in about a minute.
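To make the EZproxy case concrete, here is a rough JavaScript sketch. The proxy prefix and the links are hypothetical placeholders; substitute your own institution’s prefix and your real list of links.

```javascript
// Hypothetical EZproxy prefix; replace with your library's actual prefix.
const proxyPrefix = "https://ezproxy.example.edu/login?url=";

// Made-up list of links, one per line.
const links = "http://example.org/journal/article1\nhttps://example.org/ebook?id=7\n";

// The m mode makes ^ and $ match at the start and end of every line,
// and $1 refers back to the captured link.
const proxied = links.replace(/^(https?:\/\/\S+)$/gm, proxyPrefix + "$1");

console.log(proxied);
```

The same pattern and replacement would work in a text editor’s find/replace box, with the backreference written as \1 or $1 depending on the editor.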

To give you an idea, I will wrap up this post with some regex use cases several librarians generously shared with me. (Big thanks to the librarians who shared their regex use cases through Twitter! )

  • Some ebook vendors don’t alert you to new (or removed!) books in their collections but do have on their website a big A-Z list of all of their titles. For each such vendor, each month, I run a script that downloads that page’s HTML, and uses a regex to identify the lines that have ebook links in them. It uses another regex to extract the useful data from those lines, like URL and Title. I compare the resulting spreadsheet against last month’s (using a tool like diff or vimdiff) to discover what has changed, and modify the catalog accordingly. (@zemkat)
  • Sometimes when I am cross-walking data into a MARC record, I find fields that include irregular spacing that may have been used for alignment in the old setting but just looks weird in the new one. I use a regex to convert instances of more than two spaces into just two spaces. (@zemkat)
  • Recently while loading e-resource records for government documents, we noted that many of them were items that we had duplicated in fiche: the ones with a call number of the pattern “H, S, or J, followed directly by four digits”. We are identifying those duplicates using a regex for that pattern, and making holdings for those at the same time. (@zemkat)
  • I used regex as part of crosswalking metadata schemas in XML. I changed scientific OME-XML into MODS + METS to include research images into the library catalog.  (@KristinBriney)
  • Parsing MARC just has to be one. Simple things like getting the GMD out of a 245 field require an expression like this: |h\[(.*)\]  MarcEdit supports regex search and replace, which catalogers can use. (@phette23)
  • I used regex to adjust the printed label in Millennium depending on several factors, including the material type in the MARC record. (@brianslone )
  • I strip out non-Unicode characters from the transformed finding aids daily with  regex and  replace them with Unicode equivalents. (@bryjbrown)
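A few of the use cases above are small enough to sketch in JavaScript. These are my own guesses at what such patterns could look like, not the contributors’ actual code, and all the sample values are invented. In the GMD pattern, the | is escaped because an unescaped | means ‘or’ in JavaScript regex.

```javascript
// 1. Convert runs of more than two spaces into just two spaces.
const field = "Title      Statement   here";
const collapsed = field.replace(/ {3,}/g, "  ");

// 2. Find call numbers that start with H, S, or J followed directly by four digits.
//    Anchoring at the start of the string is an assumption.
const callNumbers = ["H1234", "Y 4.J 89", "J0042", "S12"];
const inFiche = callNumbers.filter(function (c) { return /^[HSJ]\d{4}/.test(c); });

// 3. Pull the GMD out of a 245 field that uses | as the subfield delimiter.
const field245 = "Casablanca |h[videorecording] /|cWarner Bros.";
const gmd = /\|h\[(.*)\]/.exec(field245);

console.log(collapsed, inFiche, gmd[1]);
```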

 

More APIs: writing your own code (2)

My last post, “The simplest AJAX: writing your own code (1),” discussed a few Javascript and JQuery examples that make use of the Flickr API. In this post, I try out APIs from providers other than Flickr. The examples will look plain to you since I didn’t add any CSS to dress them up. But remember that the focus here is not on presentation but on getting the data out and re-displaying it on your own. Once you get comfortable with this process, you can start thinking about creative and useful ways in which you can present and mash up the same data. We will go through 5 examples I created with three different APIs. Before taking a look at the code, check out the results below first.

 

I. Pinboard API

The first example is Pinboard. Many libraries moved their bookmarks in Del.icio.us to a different site when there was a rumor that Del.icio.us might be shut down by Yahoo. One of those sites was Pinboard. By getting your bookmark feeds from Pinboard and manipulating them, you can easily present a subset of your bookmarks as part of your website.

(a) Display bookmarks in Pinboard using its API

The following page uses JQuery to access the JSONP feed of my public bookmarks in Pinboard. The $.ajax() method is invoked on line 13. Line 15, jsonp:"cb", sets the name of the url parameter through which JQuery tells Pinboard which callback function should wrap the JSON feed data. Note line 18, where I print the received data to the console. This way, you can check in the console of Firebug whether you are receiving the JSONP feed. Lines 19-22 use the $.each() function to access each element in the JSONP feed and the .append() method to add each bookmark’s title and url to the “pinboard” div. The JQuery documentation has detailed explanations and examples for its functions and methods, so make sure to check it out if you have any questions about a JQuery function or method.

Pinboard API – example 1

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Pinboard: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
</head>
<body>
<p> This page takes the JSON feed from <a href="http://pinboard.in/u:bohyunkim/">my public links in Pinboard</a> and re-presents them here. See <a href="http://feeds.pinboard.in/json/u:bohyunkim/">the Pinboard's JSON feed of my links</a>.</p>
<div id="pinboard"><h1>All my links in Pinboard</h1></div>
<script>
$(document).ready(function(){	
$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
  	console.log(data); //dumps the data to the console to check if the callback is made successfully
    $.each(data, function(index, item){
      $('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
      }); //each
    } //success
  }); //ajax

});//ready
</script>
</body>
</html>

Here is the screenshot of the page. I opened up the Console window of Firebug (since I dumped the received data in line 18), and you can see the incoming data here.

But it is much more convenient to see the organization and hierarchy of the JSONP feed in the Net panel of Firebug.

And each element of the JSONP feed can be drilled down for further details by clicking the object in each row.

(b) Display only a certain number of bookmarks

Now, let’s display only a certain number of bookmarks. In order to do this, one more line is needed. Line 9 checks the position of each element and breaks the loop when the 5th element is processed.

Pinboard API – example 2

$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
    $.each(data, function(index, item){
      	$('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
    	if (index == 4) return false; //only display 5 items
      }); //each
    } //success
  }); //ajax

(c) Display bookmarks with a certain tag

Often libraries want to display bookmarks with a particular tag. Here I add a line using the JQuery method $.inArray() to display only bookmarks tagged with ‘fiu.’ The $.inArray() method takes a value and an array as parameters and returns the index of the value if it is found in the array, and -1 otherwise. Line 7 checks whether the tag array of a bookmark (item.t) includes ‘fiu,’ and displays the bookmark only in that case. As a result, only the bookmarks with the tag ‘fiu’ are shown on the page.

Pinboard API – example 3

$.ajax({
  url:'http://feeds.pinboard.in/json/u:bohyunkim',
  jsonp:"cb",
  dataType:'jsonp',
  success: function(data) {
    $.each(data, function(index, item){
    	if ($.inArray("fiu", item.t)!==-1) // if the tag is 'fiu'
      		$('#pinboard').append('<div><a href="' + item.u + '">' + item.d
+ '</a></div>');
	}); //each
    } //success
  }); //ajax
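
The tag check can be tried out without JQuery or a live feed. The sketch below uses the plain JavaScript indexOf() method on arrays, which follows the same convention as $.inArray(): the position of the value if it is found, and -1 if it is not. The bookmark objects are made up, using the same property names as the feed (u, d, t).

```javascript
// Made-up bookmarks shaped like items in the Pinboard feed.
const bookmarks = [
  { u: "http://example.org/a", d: "First link", t: ["fiu", "library"] },
  { u: "http://example.org/b", d: "Second link", t: ["regex"] },
  { u: "http://example.org/c", d: "Third link", t: ["news", "fiu"] }
];

// Position of 'fiu' in each tag array, or -1 when the tag is absent.
const positions = bookmarks.map(function (item) { return item.t.indexOf("fiu"); });

// Keep only the bookmarks whose tag array includes 'fiu'.
const tagged = bookmarks.filter(function (item) { return item.t.indexOf("fiu") !== -1; });

console.log(positions, tagged.length);
```

This is why the check compares against -1 rather than testing for a true or false value: a tag found in the first position returns 0.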

II. Reddit API

My second example uses the Reddit API. Reddit is a site where people comment on news items of interest. Here I used $.getJSON() instead of $.ajax() in order to process the JSONP feed from the Science section of Reddit. In the case of the Pinboard API, I could not find a way to construct a link that includes a callback function in the url. Some parameters, such as jsonp:"cb" and dataType:'jsonp', had to be specified, and for this reason I needed to use the $.ajax() function. In Reddit, on the other hand, getting the JSONP feed url was straightforward: http://www.reddit.com/r/science/.json?jsonp=cb.

Line 19 adds a title to the page. Lines 20-22 extract the title of and the link to the news article that is being commented on and display them. Under the news item, the link to the comments for that article in Reddit is added as a bullet item. You can see that, in lines 17 and 18, I used the console to check whether I was getting the right data and targeting the element I wanted, and then commented those lines out later.

This is just an example, and for that reason, the result is a rather simplified version of the original Reddit page with less information. But as long as you are comfortable accessing and manipulating data at different levels of the JSONP feed sent from an API, you can slice and dice the data in a way that suits your purpose best. So in order to make a clever mash-up, you need not only technical coding skills but also creative ideas about which sets of data and information to connect and present in order to offer something new that has not been available or apparent before.

Reddit API – example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Reddit-Science: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
</head>

<body>
<p> This page takes the JSONP feed from <a href="http://www.reddit.com/r/science/">Reddit's Science section</a> and presents the link to the original article and the comments in Reddit to the article as a pair. See <a href="http://www.reddit.com/r/science/.json?jsonp=?">JSONP feed</a> from Reddit.</p>

<div id="feed"> </div>
<script type="text/javascript">
	//run function to parse json response, grab title, link, and media values, and then place in html tags
$(document).ready(function(){		
	$.getJSON('http://www.reddit.com/r/science/.json?jsonp=?', function(rd){
		//console.log(rd);
		//console.log(rd.data.children[0].data.title);
		$('#feed').html('<h1>*Subredditum Scientiae*</h1>');
		$.each(rd.data.children, function(k, v){
      		$('#feed').append('<div><p>'+(k+1)+': <a href="' + v.data.url + '">' + v.data.title+'</a></p><ul><li style="font-variant:small-caps;font-size:small"><a href="http://www.reddit.com'+v.data.permalink+'">Comments from Reddit</a></li></ul></div>');
      }); //each
	}); //getJSON
});//ready	
</script>

</body>
</html>

The structure of a JSON feed can be particularly confusing to make out. So make sure to use the Firebug Net window to figure out the organization of the feed content and the property name for the value you want.
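To show what drilling down into the feed looks like, here is a small JavaScript sketch that works on a hand-made object instead of a live feed. The object is trimmed to the properties used in the example above (data.children, and each child’s data.title, data.url, and data.permalink), and its values are invented.

```javascript
// A made-up, heavily trimmed stand-in for the Reddit feed.
const rd = {
  data: {
    children: [
      { data: { title: "Sample article one", url: "http://example.org/one", permalink: "/r/science/comments/abc123/sample_one/" } },
      { data: { title: "Sample article two", url: "http://example.org/two", permalink: "/r/science/comments/def456/sample_two/" } }
    ]
  }
};

// Walk down to each child's data object and pick out the values to display.
const rows = rd.data.children.map(function (child, k) {
  return {
    number: k + 1,
    title: child.data.title,
    article: child.data.url,
    comments: "http://www.reddit.com" + child.data.permalink
  };
});

console.log(rows);
```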

But what if the site from which you would like to get data doesn’t offer JSONP feed? Fortunately you can convert any RSS or XML feed into JSONP feed. Let’s take a look!

III. PubMed Feed with Yahoo Pipes API

Consider this PubMed search. This is a simple search that looks for items in PubMed that have to do with the Florida International University College of Medicine, where I work. You may want to access the data feed of this search result, manipulate it, and display it on your library website. So far, we have performed a similar task with the Pinboard and the Reddit APIs using JQuery. But unfortunately PubMed does not offer a JSON feed. We only get an RSS feed from PubMed.

This is OK, however. You can either manipulate the RSS feed directly or convert the RSS feed into JSON, which you are more familiar with now. Yahoo Pipes is a handy tool for just that purpose. You can do the following tasks with Yahoo Pipes:

  • combine many feeds into one, then sort, filter and translate it.
  • geocode your favorite feeds and browse the items on an interactive map.
  • power widgets/badges on your web site.
  • grab the output of any Pipes as RSS, JSON, KML, and other formats.

Furthermore, there may be a pipe that has been already created for exactly what you want to do by someone else. PubMed is a popular resource. As I expected, I found a pipe for PubMed search. I tested, copied the pipe, and changed the search term. Here is the screenshot of my Yahoo Pipe.

If you want to change the pipe, you can click “View Source” and make further changes. Here I just changed the search terms and saved the pipe.

After that, you want to get the results of the Pipe as JSON. If you hover over the “Get as JSON” link in the first screenshot above, you will get a link: http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=100&term=”Florida International University” and “College of Medicine” But this returns JSON, not JSONP.

In order to get that JSON feed wrapped into a callback function, you need to add this bit, &_callback=myCallback, at the end of the url: http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback. Now the JSON feed appears wrapped in a callback function like this: myCallback( ). See the difference?
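The difference between JSON and JSONP can be simulated without any network request. In this JavaScript sketch, the JSON body is made up but shaped like the feed used in this post (value.items), and new Function() stands in for what the browser does when it loads the response through a script tag.

```javascript
// A made-up JSON body shaped like the feed (value.items).
const jsonBody = '{"count":1,"value":{"items":[{"title":"Sample article","link":"http://example.org/1"}]}}';

// Plain JSON is only data: a script has to parse it explicitly.
const parsed = JSON.parse(jsonBody);

// JSONP wraps the same data in a function call, so the response is itself a runnable script.
const jsonpBody = "myCallback(" + jsonBody + ")";

let received = null;
function myCallback(data) {
  received = data; // the data arrives as an ordinary JS object
}

// Stand-in for the browser executing the response loaded through a script tag.
new Function("myCallback", jsonpBody)(myCallback);

console.log(parsed.count, received.value.items[0].title);
```

Because the response is executed as a script, the callback function has to be defined before the feed is loaded.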

Line 25 brings in this JSONP feed, which invokes the callback function named “myCallback.” Lines 14-23 define this callback function to process the received feed. Lines 18-20 take the JSON data received at the level of data.value.items and print out each item’s title (item.title) with a link (item.link). Here I am giving a number to each item with (index+1). If you don’t add 1, the numbering will begin from 0 instead of 1. Line 21 stops the process once five items have been processed.

Yahoo Pipes API/PubMed – example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PubMed and Yahoo Pipes: JSONP-AJAX Example</title>
<script type="text/javascript" src="http://jqueryjs.googlecode.com/files/jquery-1.3.2.min.js"></script>
</head>
<body>
<p> This page takes the JSONP feed from <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=e176c4da7ae8574bfa5c452f9bb0da92"> a Yahoo Pipe</a>, which creates JSONP feed out of a PubMed search results and re-presents them here. 
<br/>See <a href="http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback">the Yahoo Pipe's JSON feed</a> and <a href="http://www.ncbi.nlm.nih.gov/pubmed?term=%22Florida%20International%20University%22%20and%20%22College%20of%20Medicine%22">the original search results in PubMed</a>.</p>
<div id="feed"></div>
<script type="text/javascript">
	//run function to parse json response, grab title, link, and media values - place in html tags
	function myCallback(data) {
		//console.log(data);
		//console.log(data.count);
		$("#feed").html('<h1>The most recent 5 publications from <br/>Florida International University College of Medicine</h1><h2>Results from PubMed</h2>');
		$.each(data.value.items, function(index, item){
      		$('#feed').append('<p>'+(index+1)+': <a href="' + item.link + '">' + item.title
+ '</a></p>');
        if (index == 4) return false; //display the most recent five items
      }); //each
	} //function
	</script>
<script type="text/javascript" src="http://pipes.yahoo.com/pipes/pipe.run?_id=e176c4da7ae8574bfa5c452f9bb0da92&_render=json&limit=10&term=%22Florida+International+University%22+and+%22College+of+Medicine%22&_callback=myCallback"></script>
</body>
</html>

Do you feel more comfortable now with APIs? With a little bit of JQuery and JSON, I was able to make good use of third-party APIs. Next time, I will try the Worldcat Search API, which is closer to the library world, and see how that works.

 

The simplest AJAX: writing your own code (1)

It has been 8 months since the Code Year project started. Back in January, I provided some tips. Now I want to check in to see how well you have been following along. Falling behind? You are not alone. Luckily there are still 3-4 months left.

Teaching oneself how to code is not easy. One of the many challenges is keeping at it on a regular basis. Both at home and at work, there always seem to be a dozen things higher in priority than code lessons. Another problem is that often we start a learning project by reading a book with some chosen examples. The Code Year project is somewhat better since it provides interactive tutorials. But at the end of many tutorials, you may have experienced the nagging feeling of doubt about whether you can now go out to the real world and make something that works. Have you done any real-life project yet?

If you are like me, the biggest obstacle in starting your own first small coding project is not so much the lack of knowledge as the fantasy that you still have yet more to learn before trying any such real-life-ish project. I call this ‘fantasy’ because there is never such a time when you are in full possession of knowledge before jumping into a project. In most cases, you discover what you need to learn only after you start a project and run into a problem that you need to solve.

So for this blog post, I tried building something very small. During the process, I had to fight constantly with the feeling that I should go back to the Code Year Project and take those comforting lessons in Javascript and JQuery that I hadn’t had time to work through yet. But I also knew that I would be so much more motivated to keep learning if I could just see myself making something on my own. I decided to try some very simple AJAX stuff and started by looking at two examples on the Web. Here I will share those examples and the review process that enabled me to write my own bit of code. After looking at these, I was able to use different APIs to get the same result. My explanation below is intentionally detailed for beginners. But if you can understand the examples without my line-by-line explanation, feel free to skip it and go directly to the last section, where the challenge is.

For what would your AJAX skill be useful? There is a lot of useful data in the cloud. Using AJAX, you can dynamically display your library’s photos stored in Flickr on your library’s website or generate a custom bibliography on the fly using the tags in Pinboard or MESH (Medical Subject Heading) and other filters in PubMed. You can mash up data feeds from multiple providers and create something completely new and interesting, such as HealthMap, iSpecies, and Housing Maps.

Warm-up 1: Jason’s Flickr API example

I found this example, “Flickr API – Display Photos (JSON)” quite useful. This example is at Jason Clark’s website. Jason has many cool code examples and working programs under the Code & Files page. You can see the source of the whole HTML page here . But let’s see the JS part below.

<script type="text/javascript">
//run function to parse json response, grab title, link, and media values - place in html tags
function jsonFlickrFeed(fr) {
    var container = document.getElementById("feed");
    var markup = '<h1>' + '<a href="' + fr.link+ '">' + fr.title + '</a>'+ '</h1>';
    for (var i = 0; i < fr.items.length; i++) {
        markup += '<a title="' + fr.items[i].title + '" href="' + fr.items[i].link + '"><img src="' + fr.items[i].media.m + '" alt="' + fr.items[i].title + '"></a>';
    }
    container.innerHTML = markup;
}
</script>
<script type="text/javascript" src="http://api.flickr.com/services/feeds/photos_public.gne?tags=cil2009&format=json">
</script>

After spending a few minutes looking at the source of the page, you can figure out the following:

  • Line 12 imports data formatted in JSON from Flickr, and the JSON data is wrapped in a JS function called jsonFlickrFeed. You can usually find these data source urls in the API documentation. But API documentation is often hard to decipher. In this case, this MashupGuide page by Raymond Yee was quite helpful.
  • Lines 3-10 define the jsonFlickrFeed function that processes the JSON data.

You can think of JSON as a JS object or an associative array of them. Can you also figure out what is going on inside the jsonFlickrFeed function? Let’s go through it line by line.

  • Line 4 creates a variable, container, and sets it to the empty div with the id “feed.”
  • Line 5 creates another variable, markup, which holds a heading with the link and the title of “fr,” an arbitrary name that refers to the JSON data passed into the jsonFlickrFeed function.
  • Lines 6-8 are a for-loop that goes through every object in the items array and extracts its title and link as well as the image source link. The loop also adds the resulting HTML string to the markup variable.
  • Line 9 assigns the content of the markup variable as the HTML content of the variable container. Since the empty div with the “feed” id was assigned to the variable container, the feed div now has the content of markup as its HTML content.
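
To make the walkthrough above concrete, here is a minimal sketch of the kind of object that arrives as fr and of the string-building step, written so that it runs without a browser. Only the property names (title, link, items, media.m) come from Jason’s code. The sample values are made up, and buildMarkup is a hypothetical helper name of my own.

```javascript
// Made-up sample shaped like the feed object the walkthrough calls "fr".
// Property names follow the example above; every value is invented.
const sampleFeed = {
  title: "Recent Uploads tagged example",
  link: "http://www.example.com/photos/tags/example/",
  items: [
    {
      title: "First photo",
      link: "http://www.example.com/photos/1/",
      media: { m: "http://www.example.com/img/1_m.jpg" }
    },
    {
      title: "Second photo",
      link: "http://www.example.com/photos/2/",
      media: { m: "http://www.example.com/img/2_m.jpg" }
    }
  ]
};

// Hypothetical helper: the same string-building steps as lines 5-8 of the
// example, but returning the markup instead of writing it into the page.
function buildMarkup(fr) {
  let markup = '<h1><a href="' + fr.link + '">' + fr.title + '</a></h1>';
  for (let i = 0; i < fr.items.length; i++) {
    const item = fr.items[i];
    markup += '<a title="' + item.title + '" href="' + item.link + '">' +
      '<img src="' + item.media.m + '" alt="' + item.title + '"></a>';
  }
  return markup;
}

const feedMarkup = buildMarkup(sampleFeed);
console.log(feedMarkup);
```

In a browser, the returned string is what would be assigned to the innerHTML of the feed div.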

So these two JS snippets take an empty div like this:

<div id="feed"></div>

Then they dynamically generate the content inside it from the source data from Flickr, following some minimal presentation specified in the JS itself. The dynamically generated content for the feed div looks like this.

<div id="feed">
<h1>
<a href="http://www.flickr.com/photos/tags/cil2009/">Recent Uploads tagged cil2009</a>
</h1>
<a href="http://www.flickr.com/photos/matthew_francis/3458100856/" title="Waiting at Vienna metro (cropped)">
<img alt="Waiting at Vienna metro (cropped)" src="http://farm4.staticflickr.com/3608/3458100856_d01b26cf1b_m.jpg">
</a>
<a href="http://www.flickr.com/photos/libraryman/3448484629/" title="Laptop right before CIL2009 session">
<img alt="Laptop right before CIL2009 session" src="http://farm4.staticflickr.com/3389/3448484629_9874f4ab92_m.jpg">
</a>
<a href="http://www.flickr.com/photos/christajoy42/4814625142/" title="Computers in Libraries 2009">
<img alt="Computers in Libraries 2009" src="http://farm5.staticflickr.com/4082/4814625142_f9d9f90118_m.jpg">
</a>
<a href="http://www.flickr.com/photos/librarianinblack/3613111168/" title="David Lee King">
<img alt="David Lee King" src="http://farm4.staticflickr.com/3354/3613111168_02299f2b53_m.jpg">
</a>
<a href="http://www.flickr.com/photos/librarianinblack/3613111084/" title="Aaron Schmidt">
<img alt="Aaron Schmidt" src="http://farm4.staticflickr.com/3331/3613111084_b5ba9e70bd_m.jpg">
</a>
<a href="http://www.flickr.com/photos/librarianinblack/3612296027/" block"="" libraries"="" in="" computers="" title="The Kids on the ">
<img block"="" libraries"="" in="" computers="" alt="The Kids on the " src="http://farm3.staticflickr.com/2426/3612296027_6f4043077d_m.jpg">
</a>
<a href="http://www.flickr.com/photos/pegasuslibrarian/3460426841/" title="Dave and Greg look down at CarpetCon">
<img alt="Dave and Greg look down at CarpetCon" src="http://farm4.staticflickr.com/3576/3460426841_ef2e57ab49_m.jpg">
</a>
<a href="http://www.flickr.com/photos/pegasuslibrarian/3460425549/" title="Jason and Krista at CarpetCon">
<img alt="Jason and Krista at CarpetCon" src="http://farm4.staticflickr.com/3600/3460425549_55443c5ddb_m.jpg">
</a>
<a href="http://www.flickr.com/photos/pegasuslibrarian/3460422979/" title="Lunch with Dave, Laura, and Matt">
<img alt="Lunch with Dave, Laura, and Matt" src="http://farm4.staticflickr.com/3530/3460422979_96c020a440_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436564507/" title="IMG_0532">
<img alt="IMG_0532" src="http://farm4.staticflickr.com/3556/3436564507_551c7c5c0d_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436566975/" title="IMG_0529">
<img alt="IMG_0529" src="http://farm4.staticflickr.com/3328/3436566975_c8bfe9b081_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436556645/" title="IMG_0518">
<img alt="IMG_0518" src="http://farm4.staticflickr.com/3579/3436556645_9b01df7f93_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436569429/" title="IMG_0530">
<img alt="IMG_0530" src="http://farm4.staticflickr.com/3371/3436569429_92d0797719_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436558817/" title="IMG_0524">
<img alt="IMG_0524" src="http://farm4.staticflickr.com/3331/3436558817_3ff88a60be_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3437361826/" title="IMG_0521">
<img alt="IMG_0521" src="http://farm4.staticflickr.com/3371/3437361826_29a38e0609_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3437356988/" title="IMG_0516">
<img alt="IMG_0516" src="http://farm4.staticflickr.com/3298/3437356988_5aaa94452c_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3437369906/" title="IMG_0528">
<img alt="IMG_0528" src="http://farm4.staticflickr.com/3315/3437369906_01015ce018_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436560613/" title="IMG_0526">
<img alt="IMG_0526" src="http://farm4.staticflickr.com/3579/3436560613_98775afc79_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3437359398/" title="IMG_0517">
<img alt="IMG_0517" src="http://farm4.staticflickr.com/3131/3437359398_7e339cf161_m.jpg">
</a>
<a href="http://www.flickr.com/photos/jezmynne/3436535739/" title="IMG_0506">
<img alt="IMG_0506" src="http://farm4.staticflickr.com/3646/3436535739_c164062d6b_m.jpg">
</a>
</div>

Strictly speaking, Flickr is returning data in JSONP rather than JSON here. You will see what JSONP means in a little bit. But for now, don’t worry about that distinction. What is cool is that you can grab the data from a third party like Flickr and then you can remix and represent them in your own page.
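
One detail in the generated markup above deserves a second look: the entry titled “The Kids on the ” carries stray attributes such as block"="". That is consistent with a photo title containing double quotes being concatenated straight into an attribute value. Here is a minimal sketch of one way to guard against that. Note that escapeAttribute is a hypothetical helper of my own, not part of Jason’s example, and the sample title is invented.

```javascript
// Hypothetical helper (not in the original example): escape the characters
// that would otherwise end an attribute value or start a tag.
function escapeAttribute(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Invented title with embedded double quotes.
const trickyTitle = 'Kids on the "library" block';
const safeAnchor = '<a title="' + escapeAttribute(trickyTitle) + '" href="#">';
console.log(safeAnchor);
```

Wrapping each title in a helper like this before adding it to the markup string should keep the title and alt attributes in one piece.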

Warm-up 2: Doing the same with JQuery using $.getJSON()

Since I had figured out how to display data from Flickr using Javascript (thanks to Jason’s code example), the next thing I wanted to try was to do the same with JQuery. After some googling, I discovered that there is a convenient JQuery method called $.getJSON(). The official JQuery page on the $.getJSON() method includes not only an explanation of JSONP (which allows you to load data from a domain other than yours in your browser and manipulate it, unlike JSON, which is restricted by the same origin policy) but also a JQuery example that processes the same Flickr JSONP data. This is the example from the JQuery website.

$.getJSON("http://api.flickr.com/services/feeds/photos_public.gne?jsoncallback=?",
  {
    tags: "mount rainier",
    tagmode: "any",
    format: "json"
  },
  function(data) {
    $.each(data.items, function(i,item){
      $("<img/>").attr("src", item.media.m).appendTo("#images");
      if ( i == 3 ) return false;
    });
  });

As you can see in the first line, the data feed URLs for JSONP responses have a part similar to jsoncallback=? in them. The parameter name can vary, and the API documentation of a data provider provides that bit of information. Let’s go through the code line by line:

  • Lines 1-6 request and take in the data feed from the specified URL in JSONP format.
  • Once the data is received and ready, the script invokes the anonymous function on lines 7-12. This function makes use of the JQuery method $.each().
  • For each of data.items, the anonymous function applies another anonymous function, whose body is on lines 9-10.
  • Line 9 creates an image tag with $("<img/>"), attaches each item’s media.m element as the source attribute to the image tag with .attr("src", item.media.m), and lastly appends the resulting element to the empty div with the id of “images” with .appendTo("#images").
  • Line 10 makes sure that no more than 4 items in data.items are processed. The index i starts at 0, so i == 3 is the fourth item, and returning false from the $.each() callback stops the loop.
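
If the JSONP part still feels like magic, the following sketch walks through the idea in miniature, without JQuery and without a network connection. It is only an illustration under a few assumptions: handlePhotos is a made-up callback name (JQuery generates one for you in place of the “?”), the response body is invented, and new Function stands in for the script tag that a browser would use.

```javascript
// Step 1: the request URL names the function the response should call.
// "handlePhotos" is a made-up name; JQuery generates one in place of the "?".
const query = new URLSearchParams({
  tags: "mount rainier",
  tagmode: "any",
  format: "json",
  jsoncallback: "handlePhotos"
});
const requestUrl =
  "http://api.flickr.com/services/feeds/photos_public.gne?" + query.toString();

// Step 2: the page defines that function before the response arrives.
const receivedTitles = [];
function handlePhotos(data) {
  data.items.forEach(function (item) {
    receivedTitles.push(item.title);
  });
}

// Step 3: a JSONP response is JavaScript source that calls the function.
// This body is invented; a real one would come from the data provider.
const responseBody = 'handlePhotos({"items":[{"title":"A"},{"title":"B"}]})';

// In a browser a <script> tag would run the response. Here new Function
// stands in for the script tag so the sketch runs without a network.
new Function("handlePhotos", responseBody)(handlePhotos);

console.log(requestUrl);
console.log(receivedTitles);
```

Because the response is executed as a script rather than read as data, it is not subject to the same origin policy in the way a plain JSON request is, which is also why you should only use JSONP with providers you trust.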

You can see the code for the entire HTML page on the JQuery website’s $.getJSON() page.

Your Turn: Try out an API other than Flickr

So far we have looked through two examples.  Not too bad, right? To keep the post at a reasonable length, I will get to the little bit of code that I wrote in the next post.  This means that you can try the same and we can compare the result next time. Now here is the challenge. Both examples we saw used the Flickr API. Could you write code for a different API provider that does the same thing? Remember that you have to pick a data provider that offers feeds in JSONP if you want to avoid dealing with the same origin policy.

Here are a few providers you might want to check out. They all offer their data feeds in JSONP.

First, find out what data URLs you can use to get JSONP responses. Then write several lines of code in JS and JQuery to process and display the data the way you like in your own webpage. You may end up doing some googling and research while you are at it.

Here are a few tips that will help you along the way:

  • Verify the data feed URL to see if you are getting the right JSONP responses. Just type the source URL into the browser window and see if you get something like this.
  • Get Firebug for debugging if you don’t already have it.
  • Use Firebug’s Net panel to see if you are receiving the data OK.
  • Use the Console panel for debugging. The part of the data that you want to pick up may be several levels deep. So it is useful to check that you are getting the right item first before trying to manipulate it.
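
For the last tip, here is a small sketch of how you might check a deeply nested item in the console before building any HTML from it. The getPath helper is hypothetical (it is not part of Firebug or JQuery), and the sample response is invented to resemble the Flickr feed from the warm-ups.

```javascript
// Hypothetical helper: follow a list of keys into a nested object and
// return undefined instead of throwing when a level is missing.
function getPath(data, keys) {
  let current = data;
  for (const key of keys) {
    if (current === null || current === undefined) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Invented response shaped like the Flickr feed from the warm-ups.
const response = {
  items: [{ title: "Sample", media: { m: "http://www.example.com/a_m.jpg" } }]
};

// Log each level before building any HTML from it.
console.log(getPath(response, ["items"]).length);
console.log(getPath(response, ["items", 0, "media", "m"]));
console.log(getPath(response, ["items", 5, "media", "m"]));
```

The last call asks for an item that does not exist, so it logs undefined rather than stopping the script with an error.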

Happy Coding! See the following screenshots for the Firebug NET panel and Console panel. (Click the images to see the bigger and clearer version.) Don’t forget to share your completed project in the comments section as well as any questions, comments, advice, suggestions!

Net panel in Firebug

Console panel in Firebug

Library Code Year IG Meeting at ALA Annual Conference 2012

The LITA/ALCTS Library Code Year Interest Group was born from the widespread interest in computer programming among librarians, which coincided with Codecademy’s Code Year program. The Library Code Year IG is active on both ALA Connect and the Catcode wiki and held its inaugural meeting at ALA Annual last month.

The meeting started with introductions, which gave the membership an opportunity to share our goals for the group while also learning about common problems and frustrations that people have encountered while learning to code. The group came together over shared concerns and frustrations ranging from getting stuck on problems that can’t be solved alone to finding the lessons too dry when there is no real life application. Members also discussed the sense of frustration that comes from knowing that you need to know more about computer programming to keep up-to-date and simultaneously feeling guilty about time spent on computer programming lessons that aren’t directly related to a specific job duty.

Participants discussed techniques that they found helpful in teaching themselves to program, including:

  • reviewing lesson walkthroughs or keys (though some avoid this because it feels like “cheating”),
  • working through the problem with another student/mentor,
  • setting aside an allotted time daily or weekly to practice coding skills,
  • saving up multiple lessons or projects to work through in a single day of non-stop coding, and
  • finding code online that you can learn from and adapt for your own purposes.

These suggestions highlighted the importance of learning style and schedule flexibility when it comes to successfully teaching oneself to program. Just as importantly, the conversation showed that for most participants, committing to a long-term practice of regularly using these skills was key to success. This discussion provided an excellent foundation for the topics covered in the rest of the meeting.

The second portion of the meeting was devoted to lightning talks. Eric Phetteplace offered an introduction to bookmarklets. These relatively simple programs can be created with just a small amount of Javascript and can allow users to exert a powerful effect on websites through their browser. Bookmarklets run the gamut from the fun, such as Kick Ass, a bookmarklet which allows you to play Asteroids on any website, to Instapaper, which allows you to save and reformat webpages for future reading. Eric discussed some of the possible uses for libraries, including data harvesting or adding proxy server information to all links on a page. Any data on the web page can be accessed and changed with a simple script.

For those inspired to get started writing their own bookmarklets, Eric also provided concrete information on how to get started. He advocates using a template found online, echoing the meeting’s recurring theme that coders, particularly beginners, shouldn’t feel the need to reinvent the wheel for every project. Instead, finding templates online that can be adapted for your purposes is often a much more efficient way to start a project and a great way to learn from the work that other coders have already done. Eric also discussed tricks and tips for bookmarklets, such as having the bookmarklet point to code hosted elsewhere for easy updates, the importance of not making assumptions about the types of websites on which the bookmarklet will be used and the difficulty (to the point of virtual impossibility) of using bookmarklets on mobile browsers.
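
As a rough illustration of the proxy idea mentioned in the talk, here is a minimal sketch of the logic such a bookmarklet might run. This is not Eric’s code: the proxy prefix is a made-up placeholder, and addProxy and proxyAllLinks are hypothetical names chosen for this example.

```javascript
// Made-up proxy prefix; substitute whatever your library's proxy uses.
const PROXY_PREFIX = "https://proxy.example.edu/login?url=";

// Hypothetical helper: prefix an http(s) link unless it is already proxied.
function addProxy(href, prefix) {
  const isWebLink = /^https?:\/\//.test(href);
  const alreadyProxied = href.indexOf(prefix) === 0;
  return isWebLink && !alreadyProxied ? prefix + href : href;
}

// Rewrite every link on a page. document.links is the browser's built-in
// collection of <a> and <area> elements that have an href.
function proxyAllLinks(doc, prefix) {
  for (const link of Array.from(doc.links)) {
    link.href = addProxy(link.href, prefix);
  }
}

// Only touch the page when running in a browser, for example from a bookmarklet.
if (typeof document !== "undefined") {
  proxyAllLinks(document, PROXY_PREFIX);
}
```

To use this as a bookmarklet, the code would need to be squeezed onto one line and saved as a bookmark whose URL starts with javascript:, or, following the tip about easy updates, hosted elsewhere and loaded by a very short bookmarklet.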

I gave the second lightning talk, which covered resources that can be used for learning or teaching programming. As was evident from our introductions, members of the group have a wide range of different interests and approaches to learning. While Code Year has worked for some people who want to learn more about Javascript, JQuery and web programming, my talk highlighted other tools that can be used to learn Python, Ruby, Java and other languages through tutorials, videos and exercises. I also discussed options for finding in-person programming classes locally for those who prefer to work with a group in person. Those interested in finding these alternative tools can refer to the handout I prepared for this talk or to my Pearltree on the topic.

The final, and arguably most important, agenda item for the meeting was discussing plans for the future. The group brainstormed and settled on focusing our efforts on a number of different types of how-to projects including:

  • A Python preconference event for beginners based on the curriculum developed by Boston Python Users Group,
  • A project based on OCLC’s APIs,
  • A Git and GitHub how-to session,
  • An IRC how-to session, and
  • A collection of resources to support those who want to host a Hackathon.

You can see the full list of volunteers for these projects on our ALA Connect space, but we are definitely looking for more helpers for these and other projects, so let us know if you want to help out! We also hope to maintain a list of members’ areas of expertise to facilitate helping each other out. If you want to coordinate this project, or if you would just like to be included on the list, add a comment on our ALA Connect space.

This first meeting is just the first step in what we hope will be a long history for this interest group. Even if you weren’t able to attend the meeting, we want you to be able to get as much as possible out of our activities. Be sure to stay in touch and please think about getting involved with us!

About Our Guest Author: Carli Spina is the Emerging Technologies and Research Librarian at the Harvard Law School Library. She has an MSLIS from Simmons College and a JD from the University of Chicago Law School and she is one of the co-chairs of the LITA/ALCTS Library Code Year Interest Group. Her interests include emerging technologies, innovation in libraries and coding. She can be found on Twitter @CarliSpina.

How to make peace with error messages

At last, after all those hours toiling under the glow of the computer screen, your first script is completed. All those hours learning to code have finally paid off. Holding your breath, you enter the command to execute the script… only to have an error message appear on the screen. You shake your fist to the sky and curse whatever deities you believe in, but the error message remains unchanged, almost like it is staring into your soul…

Error messages != fun, or the Art of Debugging

Error messages happen to everyone. The causes of error messages vary; sometimes the error is caused by a bug hidden in the system, other times the error stems from the human typing at the keyboard. In every case, error messages can be frustrating and time consuming (For those of you who have hunted for hours to find that missing closing bracket in your script, you know the pain that I speak of). Error messages are here to stay no matter how thorough you are with your code, so you’ll need to know how to deal with them in an efficient manner.

Making bugs visible

The first step of debugging your script is to make sure that you’re actually receiving error messages when things go wrong. Depending on the language you use, you can change your error settings to output errors in a log, in the command window, on a web page, etc. You can also change how detailed the error messages are or even create custom error messages to help pinpoint where in the script the code is failing. If your script is not running and you don’t see any errors pop up, your best bet is to look at the documentation for that programming language for error reporting.
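
As one small illustration of a custom error message, here is a minimal Javascript sketch. It assumes nothing beyond the language’s built-in try/catch and console.error; runStep is a hypothetical helper name, and the two steps are invented so that one succeeds and one fails on purpose.

```javascript
// Hypothetical helper: run one step of a script and, if it throws, report
// which step failed along with the original error message.
function runStep(stepName, step) {
  try {
    return { ok: true, value: step() };
  } catch (error) {
    const message = "Step '" + stepName + "' failed: " + error.message;
    console.error(message); // shows up in the console or terminal
    return { ok: false, message: message };
  }
}

// A step that works, and one that fails on purpose with malformed JSON.
const good = runStep("parse settings", function () {
  return JSON.parse('{"limit": 4}');
});
const bad = runStep("parse feed", function () {
  return JSON.parse("{not valid json");
});
```

Naming each step in the message is the point of the exercise: when something breaks, the log tells you where to start looking instead of leaving you with a bare error.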

Speaking of documentation…

RTFM

For the purposes of this post, RTFM means “Read the Friendly Manual”. Most programming languages have documentation available in both online and print formats. The documentation is a good place to start when you suspect that the error is caused by a syntax mistake or by a function that is not being used properly.

Example: You have a Python script that returns the following error when you run it [1]:

>>> while True print 'Hello world'
  File "<stdin>", line 1, in ?
    while True print 'Hello world'
                   ^
SyntaxError: invalid syntax

The “SyntaxError: invalid syntax” message tells you that your code is not formatted correctly, and a little caret points to the last letter of print, so one would assume that the error involves the use of print in the while statement in some way. A quick search on the Python website shows the proper syntax for constructing the while statement:

while True: print 'Hello world'

Note that we were missing a colon between the expression (while True) and the suite (print ‘Hello world’).

Reading the documentation is also a good way to reduce errors, so it’s worth your time to seek out good resources about the language you’re using and study them before and while you’re building the script.

GIYF

GIYF – Google is your friend. Librarians have a love-hate relationship with Google for reasons that have been covered extensively elsewhere. Google, however, is a staple in the programmer’s life. Copy the error message, paste it in the search box, and you’ll receive a multitude of hits of varying quality. If I do an exact phrase search on the (very common) PHP error “Parse error: syntax error, unexpected T_VARIABLE”, Google comes back to me with around 82,200 results. That’s a lot of results to wade through. A good thing to remember is that the same criteria that many librarians teach their users for evaluating sites for research can be applied when seeking help with troubleshooting errors. By applying what is taught in many Information Literacy sessions, you will quickly narrow down the number of sites to use in your bug squashing process.

Once you have done a few searches for errors in one language, you hopefully will have found a few web sites that are consistent in providing detailed, accurate information about troubleshooting errors.  Some sites and resources that have been useful when I tracked down various error messages:

Some library-related resources for those dealing with errors while working on various library modules/script libraries in various programming languages:

There are many other places to search for help, which you will discover as you spend time tracking down errors.

Now for some audience participation:

  • Do you have a resource that helped you with errors and bugs?
  • What errors have you come across with coding for library related projects, and how did you find a fix?

Please share in the comments below. Happy bug squashing!

[1] Error example from http://docs.python.org/tutorial/errors.html#syntax-errors

I Want to Learn to Code, but…

"Coder Rage" by librarykitty, some rights reserved.

You may have seen people posting that they are learning to code with CodeYear, mentioned in our earlier blog post “Tips for Everyone Doing the #codeyear”.  While CodeYear and Codecademy are not the first sites to teach programming, CodeYear has seen quite a bit of marketing and notice, especially in the library world (#libcodeyear and #catcode).

Many find themselves, however, in a familiar situation when dealing with learning to code. And it starts with the person saying or thinking “I want to learn to code, but…”

Do you fall under any of these categories?

1. “I don’t have enough time to learn coding.”

You can work through the time issue in two ways. The first way is to block off time. You have to look at your schedule and decide, for example, “ok, I’m working on my coding lesson between 1 and 2pm.” Once you have made that decision, tell the rest of the world, so that they know that you’re working on learning something during that time.

For some folks, though, blocking off an hour may be impossible due to disruptions from work or personal life. When you’re in a situation where frequent disruptions are a fact of life, documentation is your friend. Keep notes of what you learned, what questions you have, what issues you ran across, and so on – this will make sure that you do not end up having to repeat a lesson, or losing track of your thoughts during a lesson.

2.  “This is too hard.”

Here I must stress one of the key survival traits for people learning to code: ask questions! Find people who are taking the same lesson and ask. Find coders and ask. Find an online forum and ask. Post your question on Twitter, Facebook, blog, or any other broadcasting medium. Just ASK.

More often than not your question will be answered, or you will be pointed in the right direction in answering your question. The overused saying “there is no such thing as a stupid question” applies here. Coding is a community activity, and it’s to your benefit to approach it as such.

3. “I don’t like the tutorial/course.”

It’s OK to say “hey, this course isn’t what I thought it would be” or “hey, I’m not finding this course useful.” Ask yourself, “in which environment do I feel like I learn the most?” Is it a physical classroom? A virtual classroom? Do you like learning on your own? With a small group of friends? With a large group?  There are various formats and venues where you can find courses in coding, from credit-earning classes to how-to books. For example, the Catcode wiki lists a variety of coding lessons or learning opportunities at various levels of coding knowledge. Choose the one (or a few) that will fit best with you, and go for it. It might take a few tries, but you will find something that works for you.

So, if you find yourself saying “I want to learn code, but…,” there is hope for you yet.

Find what’s holding you back, tackle it, and work out a possible solution. If you don’t get it the first time, that’s OK. It’s OK to fail, as long as you learn and understand why it failed, and apply what you learned in future endeavors. For now, we are stuck in learning coding the hard way: practice, practice, practice.

Learning code the hard way, on the other hand, is not too hard once you have taken the first few steps.