Web Scraping: Creating APIs Where There Were None

Websites are human-readable. That’s great for us; we’re humans. It’s not so great for computer programs, which tend to be better at navigating structured data than visuals.

Web scraping is the practice of “scraping” information from a website’s HTML. At its core, web scraping lets programs visit and manipulate a website much like people do. The advantage to this is that, while programs aren’t great at navigating the web on their own, they’re really good at repeating things over and over. Once a web scraping script is set up, it can run an operation thousands of times over without breaking a sweat. Compare that to the time and tedium of clicking through a thousand websites to copy-paste the information you’re interested in and you can see the appeal of automation.

Why web scraping?

Why would anybody use web scraping? There are a few good reasons which are, unfortunately, all too common in libraries.

You need an API where there is none.

Many of the web services we subscribe to don’t expose their inner workings via an API. It’s worth taking a moment to explain the term API, which is used frequently but rarely given a definition more informative than “Application Programming Interface”.

Let’s consider a common type of API, a search API. When you visit Worldcat and search, the site checks an enormous database of millions of metadata records and returns a nice, visually formatted list of ones relevant to your query. Again, this is great for humans. We can read through the results and pick out the ones we’re interested in. But what happens when we want to repurpose this data elsewhere? What if we want to build a bento search box, displaying results from our databases and Worldcat alongside each other?1 The answer is that we can’t easily accomplish this without an API.

For example, the human-readable results of a search engine may look like this:

1. Instant PHP Web Scraping

by Jacob Ward

Publisher: Packt Publishing 2013

2. Scraping by: wage labor, slavery, and survival in early Baltimore

by Seth Rockman

Publisher: Johns Hopkins University Press 2009

That’s fine for human eyes, but for our search application it’s a pain in the butt. Even if we could embed a result like this using an iframe, the styling might not match what we want and the metadata fields might not display in a manner consistent with our other records (e.g. why is the publication year included with publisher?). What an API returns, on the other hand, may look like this:

[
  {
    "title": "Instant PHP Web Scraping",
    "author": "Jacob Ward",
    "publisher": "Packt Publishing",
    "publication_date": "2013"
  },
  {
    "title": "Scraping by: wage labor, slavery, and survival in early Baltimore",
    "author": "Seth Rockman",
    "publisher": "Johns Hopkins University Press",
    "publication_date": "2009"
  }
]

Unless you really love curly braces and quotation marks, that looks awful. But it’s very easy to manipulate in many programming languages. Here’s an incomplete example in Python:

import json

results = json.loads( data )
for result in results:
  print( result['title'] + ' - ' + result['author'] )

Here “data” is our search results from above, as a string of JSON, and the json.loads function parses that data into a list of dictionaries. The script then loops over each search result and prints out its title and author in a format like “Instant PHP Web Scraping – Jacob Ward”.

An API is hard to use or doesn’t have the data you need.

Sometimes services do expose their data via an API, but the API has limitations that the human interface of the website doesn’t. Perhaps it doesn’t expose all the metadata which is visible in search results. Fellow Tech Connect author Margaret Heller mentioned that Ulrich’s API doesn’t include subject information, though it’s present in the search results presented to human users.

Some APIs can also be more difficult to use than web scraping. The ILS at my place of work is like this; you have to pay extra to get the API activated and it requires server configuration on a shared server I don’t have access to. The API imposes strict authentication requirements even on read-only calls (e.g. when I’m just accessing publicly-viewable data, not making account changes). The boilerplate code the vendor provides doesn’t work, or rather only works for trivial examples. All these hurdles combine to make scraping the catalog appealing.

As a side effect, how you repurpose a site’s data might inspire its own API. Are you missing a feature so badly that you need to hack around its absence? Writing a nice proof of concept with web scraping might demonstrate that there’s a use case for a particular API feature.

How?

More or less all web scraping works the same way:

  • Use a scripting language to get the HTML of a particular page
  • Find the interesting pieces of a page using CSS, XPath, or DOM traversal—any means of identifying specific HTML elements
  • Manipulate those pieces, extracting the data you need
  • Pipe the data somewhere else, e.g. into another web page, spreadsheet, or script
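The four steps above can be sketched in Python using nothing but the standard library. Note that the markup here is a made-up toy page standing in for step one’s download (a real script would fetch live HTML), with class names loosely modeled on the DOAJ markup we’ll see later:

```python
from html.parser import HTMLParser

# Toy markup standing in for HTML we would fetch in step one.
PAGE = """
<div class="record"><div class="data"><b>First Title</b></div></div>
<div class="recordColored"><div class="data"><b>Second Title</b></div></div>
"""

class TitleScraper(HTMLParser):
    """Steps two and three: find <b> tags inside result records
    and extract their text."""
    def __init__(self):
        super().__init__()
        self.in_record = False
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class', '')
        if tag == 'div' and cls in ('record', 'recordColored'):
            self.in_record = True
        elif tag == 'b' and self.in_record:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_record = False
        elif tag == 'b':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

scraper = TitleScraper()
scraper.feed(PAGE)
# Step four: pipe the data somewhere else -- here, just print it.
print(scraper.titles)
```

Real scraping libraries hide most of this boilerplate behind CSS selector or XPath queries, but the underlying flow is the same.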

Let’s go through an example using the Directory of Open Access Journals. Now, the DOAJ has an API of sorts; it supports retrieving metadata via the OAI-PMH verbs. This means a request for a URL like http://www.doaj.org/oai?verb=GetRecord&identifier=18343147&metadataPrefix=oai_dc will return XML with information about one of the DOAJ journals. But OAI-PMH doesn’t support searching; we can use standard identifiers and other means of looking up specific articles or publications, but we can’t run a traditional keyword search.

Libraries, of the code persuasion

Before we get too far, let’s lean on those who came before us. Scraping a website is both a common task and a complex one. Remember last month, when I said that we don’t need to reinvent the wheel in our programming because reusable modules exist for most common tasks? Please let’s not write our own web scraping library from scratch.

Code libraries, which go by different names depending on the language (most amusingly, they’re called “eggs” in Python and “gems” in Ruby), are pre-written chunks of code which help you complete common tasks. Any task which several people have had to do before probably has a library devoted to it. Google searches for “best [insert task] module for [insert language]” typically turn up useful guidance on where to start.

While each language has its own means of incorporating others’ code into your own, they all basically have two steps: 1) download the external library somewhere onto your hard drive or server, often using a command-line tool, and 2) import the code into your script. The external library should have some documentation on how to use its features once you’ve imported it.

What does this look like in PHP, the language our example will be in? First, we visit the Simple HTML DOM website on Sourceforge to download a single PHP file. Then, we place that file in the same directory that our scraping script will live. In our scraping script, we write a single line up at the top:

<?php
require_once( 'simple_html_dom.php' );
?>

Now it’s as if the whole contents of the simple_html_dom.php file were in our script. We can use functions and classes which were defined in the other file, such as the file_get_html function which is not otherwise available. PHP actually has a few functions which are used to import code in different ways; the documentation page for the include function describes the basic mechanics.

Web scraping a DOAJ search

While the DOAJ doesn’t have a search API, it does have a search bar which we can manipulate in our scraping. Let’s run a test search, view the HTML source of the result, and identify the elements we’re interested in. First, we visit doaj.org and type in a search. Note the URL:

doaj.org/doaj?func=search&template=&uiLanguage=en&query=librarianship

The query string of this URL is a series of key-value pairs. Here our search term was “librarianship,” which is the value associated with the appropriately-named “query” key. If we change the word “librarianship” to a different search term and visit the new URL, we see results for the new term, predictably. With easily hackable URLs like this, it’s easy for us to write a web scraping script. Here’s the first half of our example in PHP:
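If you want to see those key-value pairs programmatically, Python’s standard library can split a query string apart (shown in Python rather than PHP simply because the parsing functions are built in):

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://doaj.org/doaj?func=search&template=&uiLanguage=en&query=librarianship'
# parse_qs maps each key to a list of values; keep_blank_values
# retains empty parameters like "template="
pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(pairs['query'])
print(pairs['func'])
```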

<?php
// see http://simplehtmldom.sourceforge.net/manual_api.htm for documentation
require_once( 'simple_html_dom.php' );

$base = 'http://www.doaj.org/doaj?func=search&template=&uiLanguage=en&query=';
$query = urlencode( 'librarianship' );

$html = file_get_html( $base . $query );
// to be continued...
?>

So far, everything is straightforward. We insert the web scraping library we’re using, then use what we’ve figured out about the DOAJ URL structure: it has a base which won’t change and a query which we want to change according to our interests. You could have the query come from command-line arguments or web form data like the $_GET array in PHP, but let’s just keep it as a simple string.

We urlencode the string because we don’t want spaces or other illegal characters sneaking their way in there; while the script still works with $query = 'new librarianship' for example, using unencoded text in URLs is a bad habit to get into. Other functions, such as file_get_contents, will produce errors if passed a URL with spaces in it. On the other hand, urlencode( 'new librarianship' ) returns the appropriately encoded string “new+librarianship”. If you do take user input, remember to sanitize it before using it elsewhere.
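Python offers the same safeguard; its quote_plus function behaves much like PHP’s urlencode here:

```python
from urllib.parse import quote_plus, urlencode

# Spaces become '+' so the URL stays legal
print(quote_plus('new librarianship'))

# urlencode goes a step further, building an entire query string
# from a dictionary of key-value pairs
print(urlencode({'func': 'search', 'query': 'new librarianship'}))
```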

For the second part, we need to investigate the HTML source of DOAJ’s search results page. Here’s a screenshot and a simplified example of what it looks like:


A couple search results from DOAJ for the term “librarianship”

<div id="result">
  <div class="record" id="record1">
    <div class="imageDiv">
      <img src="/doajImages/journal.gif"><br><span><small>Journal</small></span>
    </div><!-- END imageDiv -->
    <div class="data">
      <a href="/doaj?func=further&amp;passMe=http://www.collaborativelibrarianship.org">
        <b>Collaborative Librarianship</b>
      </a>
      <strong>ISSN/EISSN</strong>: 19437528
      <br><strong>Publisher</strong>: Regis University
      <br><strong>Subject</strong>:
      <a href="/doaj?func=subject&amp;cpId=129&amp;uiLanguage=en">Library and Information Science</a>
      <br><b>Country</b>: United States
      <b>Language</b>: English<br>
      <b>Start year</b> 2009<br>
      <b>Publication fee</b>:
    </div> <!-- END data -->
    <!-- ...more markup -->
  </div> <!-- END record -->
  <div class="recordColored" id="record2">
    <div class="imageDiv">
      <img src="/doajImages/article.png"><br><span><small>Article</small></span>
    </div><!-- END imageDiv -->
    <div class="data">
       <b>Mentoring for Emerging Careers in eScience Librarianship: An iSchool – Academic Library Partnership </b>
      <div style="color: #585858">
        <!-- author (s) -->
         <strong>Authors</strong>:
          <a href="/doaj?func=search&amp;query=au:&quot;Gail Steinhart&quot;">Gail Steinhart</a>
          ---
          <a href="/doaj?func=search&amp;query=au:&quot;Jian Qin&quot;">Jian Qin</a><br>
        <strong>Journal</strong>: <a href="/doaj?func=issues&amp;jId=88616">Journal of eScience Librarianship</a>
        <strong>ISSN/EISSN</strong>: 21613974
        <strong>Year</strong>: 2012
        <strong>Volume</strong>: 1
        <strong>Issue</strong>: 3
        <strong>Pages</strong>: 120-133
        <br><strong>Publisher</strong>: University of Massachusetts Medical School
      </div><!-- End color #585858 -->
    </div> <!-- END data -->
    <!-- ...more markup -->
   </div> <!-- END record -->
   <!-- more records -->
</div> <!-- END results list -->

Even with much markup removed, there’s a lot going on here. We need to zero in on what’s interesting and find patterns in the markup that help us retrieve it. While it may not be obvious from the example above, the title of each search result is contained in a <b> tag towards the beginning of each record’s “data” <div>.

Here’s a sketch of the element hierarchy leading to the title: a <div> with id=”result” > a <div> with a class of either “record” or “recordColored” > a <div> with a class of “data” > possibly an <a> tag (present in the first example, absent in the second) > the <b> tag containing the title. Noting the conditional parts of this hierarchy is important; if we didn’t note that sometimes an <a> tag is present and that the class can be either “record” or “recordColored”, we wouldn’t be getting all the items we want.

Let’s try to return the titles of all search results on the first page. We can use Simple HTML DOM’s find method to extract the content of specific elements using CSS selectors. Now that we know how the results are structured, we can write a more complete example:

<?php
require_once( 'simple_html_dom.php' );

$base = 'http://www.doaj.org/doaj?func=search&template=&uiLanguage=en&query=';
$query = urlencode( 'librarianship' );

$html = file_get_html( $base . $query );

// using our knowledge of the DOAJ results page
$records = $html->find( '.record .data, .recordColored .data' );

foreach( $records as $record ) {
  echo $record->getElementsByTagName( 'b', 0 )->plaintext . PHP_EOL;
}
?>

The beginning remains the same, but this time we actually do something with the HTML. We use find to pull the records which have class “data.” Then we echo the first <b> tag’s text content. The getElementsByTagName method typically returns an array, but if you pass a second integer parameter it returns the array element at that index (0 being the first element in the array, because computer scientists count from zero). The ->plaintext property simply contains the text found in the element; if we echoed the element itself, we would see opening and closing <b> tags wrapped around the title. Finally, we append an “end-of-line” (EOL) character just to make the output easier to read.

To see our results, we can run our script on the command line. For Linux or Mac users, that likely means merely opening a terminal (in Applications/Utilities on a Mac) since they come with PHP pre-installed. On Windows, you may need to use WAMP or XAMPP to run PHP scripts. XAMPP gives you a “shell” button to open a terminal, while you can put the PHP executable in your Windows environment variables if you’re using WAMP.

Once you have a terminal open, the php command will execute whatever PHP script you pass it as a parameter. If we run php name-of-our-script.php in the same directory as our script, we see ten search result titles printed to the terminal:

> php doaj-search.php
Collaborative Librarianship
Mentoring for Emerging Careers in eScience Librarianship: An iSchool – Academic Library Partnership
Education for Librarianship in Turkey Education for Librarianship in Turkey
Turkish Librarianship: A Selected Bibliography Turkish Librarianship: A Selected Bibliography
Journal of eScience Librarianship
Editorial: Our Philosophies of Librarianship
Embedded Academic Librarianship: A Review of the Literature
Model Curriculum for 'Oriental Librarianship' in India
A General Outlook on Turkish Librarianship and Libraries
The understanding of subject headings among students of librarianship

This is a simple, not-too-useful example. But it could be expanded in many ways. Try copying the script above and attempting some of the following:

  • Make the script return more than the ten items on the first page of results
  • Use some of DOAJ’s advanced search functions, for instance a date limiter
  • Only return journals or articles, not both
  • Return more than just the title of results, for instance the author(s), URLs, or publication date

Accomplishing these tasks involves learning more about DOAJ’s URL and markup structure, but also learning more about the scraping library you’re using.

Common Problems

There are a couple possible hangups when web scraping. First of all, many websites employ user-agent sniffing to serve different versions of themselves to different devices. A user agent is a hideous string of text which web browsers and other HTTP clients use to identify themselves.2 If a site misinterprets our script’s user agent, we may end up on a mobile or other version of a site instead of the desktop one we were expecting. Worse yet, some sites try to prevent scraping by blacklisting certain user agents.

Luckily, most web scraping libraries have tools built in to work around this problem. A nice example is Ruby’s Mechanize, which has an agent.user_agent_alias property which can be set to a number of popular web browsers. When using an alias, our script essentially tells the responding web server that it’s a common desktop browser and thus is more likely to get a standard response.
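The equivalent move with Python’s built-in urllib is to set the User-Agent header on the request ourselves; the browser string below is just an example alias, and any common desktop browser’s string would do:

```python
import urllib.request

# Without this header, urllib identifies itself as "Python-urllib/3.x",
# which some sites sniff and then block or redirect.
req = urllib.request.Request(
    'http://www.doaj.org/',
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) '
                           'Chrome/29.0.1547.76 Safari/537.36'},
)
print(req.get_header('User-agent'))
```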

It’s also routine that we’ll want to scrape something behind authentication. While IP authentication can be circumvented by running scripts from an on-campus connection, other sites may require login credentials. Again, most web scraping libraries already have built-in tools for handling authentication. We can find which form controls on the page we need to fill in, insert our username and password into the form, and then submit it programmatically. Storing a login in a plain-text script is never a good idea, though, so be careful.
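Submitting a login form programmatically amounts to POSTing the form fields; in this sketch the URL and field names are hypothetical placeholders you would read off the real login page:

```python
import urllib.parse
import urllib.request

# Hypothetical form fields -- inspect the real login page's <input>
# names before writing this, and never hard-code real credentials.
fields = urllib.parse.urlencode({'username': 'me', 'password': 'example'})
req = urllib.request.Request(
    'http://example.org/login',
    data=fields.encode('ascii'),  # supplying data makes the request a POST
)
print(req.get_method())
```

A full-featured library like Mechanize wraps this up further, locating the form on the page and filling in its fields by name.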

Considerations

Not all web scraping is legitimate. Taking data which is copyrighted and merely re-displaying it on our site without proper attribution is not only illegal, it’s just not being a good citizen of the web. The Wikipedia article on web scraping has a lengthy section on legal issues with a few historical cases from various countries.

It’s worth noting that web scraping can be very brittle, meaning it breaks often and easily. Scraping typically relies on other people’s markup to remain consistent. If just a little piece of HTML changes, our entire script might be thrown off, looking for elements that no longer exist.

One way to counteract this is to write selectors which are as broad as possible. For instance, let’s return to the DOAJ search results markup. Why did we use such a concise CSS selector to find the title when we could have been much more specific? Here’s a more explicit way of getting the same data:

$html->find( 'div#result > div.record > div.data, div#result > div.recordColored > div.data' );

What’s wrong with these selectors? We’re relying on so much more to stay the same. We need: the result wrapper to be a <div>, the result wrapper to have an id of “result”, the record to be a <div>, and the data inside the record to be a <div>. Our use of the child selector “>” means we need the element hierarchy to stay precisely the same. If any of these properties of the DOAJ markup changed, our selector wouldn’t find anything and our script would need to be updated. Meanwhile, our much more generic line still grabs the right information because it doesn’t depend on particular tags or other aspects of the markup remaining constant:

$html->find( '.record .data, .recordColored .data' );

We’re still relying on a few things—we have to, there’s no getting around that in web scraping—but a lot could change and we’d be set. If the DOAJ upgraded to HTML5 tags, swapping out <div> for <article> or <section>, we would be OK. If the wrapping <div> was removed, or had its id change, we’d be OK. If a new wrapper was inserted in between the “data” and “record” <div>, we’d be OK. Our approach is more resilient.

If you did try running our PHP script, you probably noticed it was rather slow. It’s not like typing a query into Google and seeing results immediately. We have to request a page from an external site, which then queries its backend database, processes the results, and displays HTML which we ultimately don’t use, at least not as intended. This highlights that web scraping isn’t a great option for user-facing searches; it can take too long to return results. One option is to cache searches, for instance storing results of previous scrapings in a database and then checking to see if the database has something relevant before resorting to pulling content off an external site.
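A cache along those lines can be as simple as a dictionary keyed by query, with a timestamp so stale results expire. This is only a sketch with a stand-in scrape function; a real version might store results in a database:

```python
import time

cache = {}        # query -> (timestamp, results)
CACHE_TTL = 3600  # reuse results scraped within the last hour

def cached_search(query, scrape):
    """Return cached results when fresh; otherwise call scrape() and store."""
    now = time.time()
    if query in cache:
        stamp, results = cache[query]
        if now - stamp < CACHE_TTL:
            return results
    results = scrape(query)
    cache[query] = (now, results)
    return results

# Stand-in for an actual scraping function, counting external requests.
calls = []
def fake_scrape(q):
    calls.append(q)
    return ['Title for ' + q]

print(cached_search('librarianship', fake_scrape))
print(cached_search('librarianship', fake_scrape))  # served from the cache
print(len(calls))  # the "external site" was only hit once
```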

It’s also worth noting that web scraping projects should try to be reasonable about the number of times they request an external resource. Every time our script pulls in a site’s HTML, it’s another request that site’s server has to process. A site may not have an API because it cannot handle the amount of traffic one would attract. If our web scraping project is going to be sending thousands of requests per hour, we should consider how reasonable that is. A simple email to the third party explaining what we’re doing and the amount of traffic it may generate is a nice courtesy.

Overall, web scraping is handy in certain situations (see below) or for scripts which are run seldom or only once. For instance, if we’re doing an analysis of faculty citations at our institution, we might not have access to a raw list of citations. But faculty may have university web pages where they list all their publications in a consistent format. We could write a script which only needs to run once, culling a large list of citations for analysis. Once we’ve scraped that information, we could use OpenRefine or other power tools to extract particular journal titles or whatever else we’re interested in.

How is web scraping used in libraries?

I asked Twitter what other libraries are using web scraping for and got a few replies:

@phette23 Pulling working papers off a departmental website for the inst repo. Had to web scrape for metadata.
— Ondatra libskoolicus (@LibSkrat) September 25, 2013

Matthew Reidsma of Grand Valley State University also had several examples:

To fuel a live laptop/iPad availability site by scraping holdings information from the catalog. See the availability site as well as the availability charts for the last few days and the underlying code which does the scraping. This uses the same Simple HTML Dom library as our example above.

It’s also used to create a staff API by scraping the GVSU Library’s Staff Directory and reformatting it; see the code and the result. The result may not look very readable—it’s JSON, a common data format that’s particularly easy to reuse in some languages such as JavaScript—but remember that APIs are for machine-readable data which can be easily reused by programs, not people.

Jacqueline Hettel of Stanford University has a great blog post that describes using a Google Chrome extension and XPath queries to scrape acknowledgments from humanities monographs in Google Books; no coding required! She and Chris Bourg are presenting their results at the Digital Library Federation in November.

Finally, I use web scraping to pull hours information from our main library site into our mobile version. I got tired of updating the hours in two places every time they changed, so now I pull them in using a PHP script. It’s worth noting that this dual-maintenance annoyance is one major reason websites can and should be done in responsive designs.

Most of these library examples are good uses of web scraping because they involve simply transporting our data from one system to another; scraping information from the catalog to display it elsewhere is a prime use case. We own the data, so there are no intellectual property issues, and they’re our own servers so we’re responsible for keeping them up.

Code Libraries

While we’ve used PHP above, there’s no need to limit ourselves to a particular programming language. Popular web scraping choices include Simple HTML DOM for PHP, Mechanize for Ruby, and Beautiful Soup for Python.

To provide a sense of how the different tools above work, I’ve written a series of gists which use each to scrape titles from the first page of a DOAJ search.

Notes
  1. See the NCSU or Stanford library websites for examples of this search style. Essentially, results from several different search engines—a catalog, databases, the library website, study guides—are all displayed on the same page in separate “bento” compartments.
  2. The browser I’m in right now, Chrome, has this beauty for a user agent string: “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36”. Yes, that’s right: Mozilla, Mac, AppleWebKit, KHTML, Gecko, Chrome, & Safari all make an appearance.

A Brief Look at Cryptography for Librarians

You may not think much about cryptography on a daily basis, but it underpins your daily work and personal existence. In this post I want to talk about a few realms of cryptography that affect the work of academic librarians, and about some interesting facets you may never have considered. I won’t discuss the math or computer science basis of cryptography, but look at it from a historical and philosophical point of view. If you are interested in the math and computer science, I have a few resources listed at the end in addition to a bibliography.

Note that while I will discuss some illegal activities in this post, neither I nor anyone connected with the ACRL TechConnect blog is suggesting that you actually do anything illegal. I think you’ll find the intellectual part of it stimulating enough.

What is cryptography?

Keeping information secret is as simple as hiding it from view in, say, an envelope, and trusting that only the person to whom it is addressed will read that information and then not tell anyone else. But we all know that this doesn’t actually work. A better system would only allow a person with secret credentials to open the envelope, and then for the information inside to be in a code that only she could know.

The idea of codes to keep important information secret goes back thousands of years, but for the purposes of computer science, most of the major advances have been made since the 1970s. In the 1960s, with the advent of computing for business and military uses, it became necessary to come up with ways to encrypt data. In 1976, the concept of public-key cryptography was developed, but it wasn’t realized practically until 1978 with the paper by Rivest, Shamir, and Adleman–if you’ve ever wondered what RSA stands for, there’s the answer. There were some advancements to this system, which resulted in the digital signature algorithm as the standard used by the federal government.1 Public-key systems work basically by creating a private and a public key–the private one is known only to each individual user, and the public key is shared. Anything locked with the public key, however, can only be unlocked with the matching private key. See the resources below for more on the math that makes up these algorithms.
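The relationship between the two keys can be illustrated with a toy RSA example using tiny primes. This is purely illustrative: real keys use primes hundreds of digits long, and numbers this small offer no security at all.

```python
# Toy RSA with tiny primes -- never use numbers this small in practice.
p, q = 61, 53
n = p * q                 # modulus, part of both keys
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, shared openly
d = pow(e, -1, phi)       # private exponent, kept secret

message = 42
ciphertext = pow(message, e, n)   # anyone can encrypt with (e, n)
decrypted = pow(ciphertext, d, n) # only the holder of d can decrypt
print(decrypted)  # 42
```

Without knowing d (which requires factoring n into p and q), the ciphertext stays locked; that asymmetry is the whole trick.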

Another important piece of cryptography is the cryptographic hash function, first developed in the late 1980s. Hash functions condense a block of data into a short, fixed-size digest–for instance, passwords stored in databases should be hashed with one of these functions rather than kept in plain text. This ensures that even if someone unauthorized gets access to sensitive data, they cannot read it. Hashes can also be used to verify the identity of a piece of digital content, which is probably how most librarians think about these functions, particularly if you work with a digital repository of any kind.
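Python’s standard library exposes several of these functions in its hashlib module; SHA-256 is used below as an example (for password storage specifically, you would also add a salt and use a deliberately slow function):

```python
import hashlib

# The same input always yields the same digest...
a = hashlib.sha256(b'my secret password').hexdigest()
b = hashlib.sha256(b'my secret password').hexdigest()
print(a == b)

# ...but even a one-character change produces a completely different
# digest, which is what lets a repository detect an altered file.
c = hashlib.sha256(b'my secret passwore').hexdigest()
print(a == c)
```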

Why do you care?

You probably send emails, log into servers, and otherwise transmit all kinds of confidential information over a network (whether a local network or the internet). Encrypted access to these services and the data being transmitted is the only way that anybody can trust that any of the information is secret. Anyone who has had a credit card number stolen and had to deal with fraudulent purchases knows first-hand how upsetting it can be when these systems fail. Without cryptography, the modern economy could not work.

Of course, we all know a recent example of cryptography not working as intended. It’s no secret (see above where keeping something a secret requires that no one who knows the information tells anyone else) by now that the National Security Agency (NSA) has sophisticated ways of breaking codes or getting around cryptography through other methods.2 Continuing with our envelope analogy from above, the NSA coerced companies to allow them to view the content of messages before the envelopes were sealed. If the messages were encoded, they got the keys to decode the data, or broke the code using their vast resources. While these practices were supposedly limited to potential threats, there’s no denying that this makes it more difficult to trust any online communications.

Librarians certainly have a professional obligation to keep data about their patrons confidential, and so this is one area in which cryptography is on our side. But let’s now consider an example in which it is not so much.

Breaking DRM: e-books and DVDs

Librarians are exquisitely aware of the digital rights management realm of cryptography (for more on this from the ALA, see the ALA Copyright Office page on digital rights). These are algorithms that encode media in such a way that you are unable to copy or modify the material. Of course, like any code, once you break it, you can extract the material and do whatever you like with it. As I covered in a recent post, if you purchase a book from Amazon or Apple, you aren’t purchasing the content itself, but a license to use it in certain prescribed ways, so legally you have no recourse to break the DRM to get at the content. That said, you might have an argument under fair use, or some other legitimate reason to break the DRM. It’s quite simple to do once you have the tools. For e-books in proprietary formats, you can download a plug-in for the Calibre program and follow step-by-step instructions on this site. This allows you to convert proprietary formats into more open ones.

As above, you shouldn’t use software like that if you don’t have the rights to convert formats, and you certainly shouldn’t use it to pirate media. But just because it can be used for illegal purposes, does that make the software itself illegal? Breaking DVD DRM offers a fascinating example of this (for a lengthy list of CD and DVD copy protection schemes, see here, and for a list of DRM-breaking software, see here). The case of CSS (Content Scramble System) descramblers illustrates some of the strange philosophical territory into which this can end up. The original code was developed in 1999 and distributed widely, which was initially ruled to be illegal. This was protested in a variety of ways; the Gallery of CSS Descramblers has a lot more on this.3 One of my favorite protest CSS descramblers is the “illegal” prime number, which is a prime number that contains the entire code for breaking the CSS DRM. The first illegal prime number was discovered in 2001 by Phil Carmody (see his description here).4 This number is, of course, only illegal inasmuch as the information it represents is illegal–in this case it was a secret code that helped break another secret code.

In 2004, after years of court hearings, the California Court of Appeal overturned one of the major injunctions against posting the code, based on the fact that source code is protected speech under the first amendment and that the CSS was no longer a trade secret. So you’re no longer likely to get in trouble for posting this code–but again, using it should only be done for reasons protected under fair use. (“DVDCCA v Bunner and DVDCCA v Pavlovich.” Electronic Frontier Foundation. Accessed September 23, 2013. https://www.eff.org/cases/dvdcca-v-bunner-and-dvdcca-v-pavlovich.) One of the major reasons you might legitimately need to break the DRM on a DVD is to play DVDs on computers running the Linux operating system, which still has no free legal software that will play DVDs (there is legal software with the appropriate license for $25, however). Given that DVDs are physical media and subject to the first sale doctrine, it is unfair that they are manufactured with limitations on how they may be played, and therefore this is a code that seems reasonable for the end consumer to break. That said, as more and more media is streamed or otherwise licensed, that argument no longer applies, and the situation becomes analogous to e-book DRM.

Learning More

The Gambling With Secrets video series explains the basic concepts of cryptography, illustrating the mathematical proofs with colors and other visual concepts that are easy to grasp. It comes highly recommended by all the ACRL TechConnect writers.

Since cryptography is a fairly basic part of computer science, you will not be surprised to learn that there are a few large open courses available on the subject. This Coursera class from Stanford is currently running, and this Udacity class from the University of Virginia is self-paced. Neither requires much computer science or math to get started, though of course you will need a great deal of math to really get anywhere with cryptography.

A surprising but fun way to learn a bit about cryptography is the NSA’s Kids website. I discovered this years ago when I was looking for content for my X-Files fan website, and it is worth a look if only to see how the NSA markets itself to children. Here you can play games to learn the basics of codes and codebreaking.

  1. Menezes, A., P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. http://cacr.uwaterloo.ca/hac/. pp. 1-2.
  2. See the New York Times and The Guardian for complete details.
  3. Touretzky, D. S. “Gallery of CSS Descramblers.” 2000. Accessed September 18, 2013. http://www.cs.cmu.edu/~dst/DeCSS/Gallery.
  4. Caldwell, Chris. “The Prime Glossary: Illegal Prime.” Accessed September 17, 2013. http://primes.utm.edu/glossary/xpage/Illegal.html.

10 Practical Tips for Compiling Your Promotion or Tenure File

 

Flickr image by Frederic Bisson http://www.flickr.com/photos/38712296@N07/3604417507/

If you work at an academic library, you may count as faculty. Whether or not that faculty status comes with a tenure track, it usually entails a more complicated promotion procedure than professional staff status. At some libraries, the promotion policy and procedure are well documented, and a lot of help and guidance is given to those who are new to the process. At other libraries, there may be less help available, and the procedure documentation may be unclear. I recently had the experience of compiling my promotion file. I thought that creating it would not be too difficult, since I had been collecting records of most of my academic and professional activities. But this was not the case at all. Looking back, there are many things I would have done differently to make the process less stressful.

While this post does not cover a technology topic of the kind we at ACRL TechConnect usually write about, applying for promotion and/or tenure is something that many academic librarians go through. So I wanted to share some lessons I learned from my first-time experience of creating a promotion binder as a non-tenure-track faculty member.

Please bear in mind that the actual process of assembling your promotion or tenure file can differ depending on your institution. At my university, everything has to be printed and filed in a binder and multiple copies of such a binder are required for the use of the tenure and promotion committee. At some places, librarians may only need to print all documentation but don’t need to actually create a binder. At other places, you may do everything online using a system such as Digital Measures, Sedona or Interfolio, and you do not have to deal with papers or binders at all. Be aware that if you do have to deal with actual photocopying, filing, and creating a binder, there will be some additional challenges.

Also, my experience described here was for promotion, not tenure. If you are applying for tenure, see these posts that may be helpful:

1. Get a copy of the promotion and tenure policy manual of your library and institution.

In my case, this was not possible, since neither my library nor the College of Medicine, to which the library belongs, had a promotion policy until very recently. But if you work at an established academic library, there will be a promotion/tenure procedure and policy manual for librarians. The manual may refer to the institution’s faculty promotion and tenure policy manual as well, so get a copy of both and make sure to find out under which category librarians fall. You may count as non-tenured faculty, tenure-track faculty, or simply professionals. You may also belong to an academic department and a specific college, or simply to your library, which counts as a college with a library dean.

You do not have to read the manual as soon as you start working. It will certainly not be a gripping read. But do get a copy and file it in your binder. (It is good to have a binder for promotion-related records even if you do not actually have to create a promotion binder yourself or everything can be filed electronically.)

2. Know when you become eligible for the application for promotion/tenure and what the criteria are.

Once you obtain a copy of your library’s promotion/tenure policy, take a quick look at the section that specifies how many years of work are required before you can apply for promotion or tenure and what the criteria are. One example of the rankings of non-tenure-track librarians at an academic library is: Instructor, Assistant, Associate, and University Librarian. This mirrors the academic faculty rankings of Instructor, Assistant, Associate, and Full Professor. But again, your institution may have a different system. Each level of promotion will have a minimum number of years required, such as two years for promotion from Instructor to Assistant Librarian, and specific criteria applied to that type of promotion. This is good to know early in your career, so that you can organize your academic and professional activities to match, as much as possible, what your institution expects its librarians to do.

3. Ask those who went through the same process already.

Needless to say, the most helpful advice comes from librarians who have already been through the process. They have a wealth of knowledge to share, so don’t hesitate to ask them what good preparatory steps you can take for a future application. Even if you have only a very general question, they can point out what to pay attention to in advance.

Also, at some libraries the promotion and tenure committee holds an annual workshop for those who are interested in submitting an application. Even if you are not yet planning to apply and it seems far too early to even consider such a thing, it may be a good idea to attend one just to get an overview. The committee consists of librarians experienced in promotion and tenure and is very knowledgeable about the whole process.

4. Collect and gather documentation under the same categories that your application file requires.

The promotion file can require a lot of documentation that you may neglect to collect on a daily basis. For example, I never bothered to keep track of committee appointment notification e-mails, and the only reason I saved conference program booklets was a colleague’s advice to keep them as proof of attendance for the future promotion binder. (It would never have dawned on me, and even then, I lost some program booklets for conferences I attended.) This is not a good thing.

Since there was no official promotion policy for my library when I started, I simply created a binder and filed anything and everything that might someday be relevant to the promotion file. However, over the last five years, this binder got extremely fat. This is also not a good practice. When I needed that documentation to actually create and organize my promotion file, it was a mess. It was good that I had at least saved quite a bit of documentation, but I had to look through all of it again because the items belonged to different categories and the dates were all mixed up.

So I highly recommend checking the categories your library or institution requires in the application file before creating a binder. Do not just throw things into a binder or a drawer if you can avoid it. Make separate binders or drawers for the same categories that your application file requires, such as publications, presentations, university service, community service, professional service, etc. Also organize the documentation by year and keep a list of the items in each category, adding to the list every time you file something. Pretend that you are doing this for your work, not for your promotion, to motivate yourself.

Depending on your preference and the way your institution handles the documentation, it may make better sense to scan and organize everything in digital form, as long as the original documents are not required. You can use a citation management system such as Mendeley or RefWorks to keep copies of all your publications, for example; these will easily generate an up-to-date bibliography of your publications for your CV. If your institution uses a system that keeps track of faculty research, grant, publication, teaching, and service activities, such as Digital Measures or Sedona, those systems may suit you better, as they can track more types of activities than just publications. You can also keep a personal digital archive of everything that will go into your application file, either on your local computer or in your Dropbox, Google Drive, or SkyDrive account. The key is to save and organize right away, whenever you have something in hand that would count toward promotion and tenure.

One more thing: if you publish a book chapter, depending on the situation, you may not get a copy of the book or the final version of your chapter as a PDF from your editor or publisher. This is no big deal until you have to ask for a rush ILL. So take time in advance to obtain at least one hard copy or the finished PDF version of your publication, particularly in the case of book chapters.

5. What do I put into my promotion file or tenure dossier?

There are common items such as the personal statement, CV, publications, and services, which are specified in the promotion/tenure policy manual. But some items may not fit these categories, or may make you wonder whether they are worth putting into your promotion file. It really depends on what else you have in your application file. If your file is strong enough, you may skip things like miscellaneous talks you gave or newsletter articles you wrote for a regional professional organization. But ask a colleague for advice first, and check whether your file looks balanced in all areas.

6. Make sure to keep documentation for projects that only lived a short life.

Another thing to keep in mind is to keep track of all the projects you worked on. As time goes on, you may forget some of the work you did. If you create a website, a LibGuide, a database application, a section of the staff intranet, etc., some of these may last a long time, but others may get used only for a while and then disappear or be removed. Projects that have disappeared are hard to present in your file as part of your work and achievements unless you documented the final result while it was up, running, and being used. So take screenshots, print color copies of those screenshots, and keep a record of the dates during which you worked on the project and of the date on which the result was released or implemented.

If you work in technology, you may have more of this type of work than academic publications. Check your library’s promotion and tenure policy manual to see if it has the category of ‘Creative Works’ or something similar under which you can add these items.

If you are assembling your binder right now and some projects you worked on are completely gone, check the Wayback Machine from the Internet Archive to see if you can find an archived copy. It is not always available, but if you have nothing else, it may be the only way to find some documentable evidence of your work.

7. Update your CV and the list of Continuing Education activities on a regular basis.

Ideally, you will do this every year when you do your performance review, but it may not be required. Updating your CV is certainly not the most exciting thing to do, but it must be done. Over the last five years, I updated my CV only when it was required for accreditation purposes (which call for current CVs of all faculty). This was certainly better than not updating it at all, but since I did not update it with the promotion application in mind, when I needed a CV for the application file I had to redo it, moving items around and organizing them into different categories. So make sure to check your library’s or your institution’s faculty promotion/tenure policy manual: it includes the CV format that the dossier needs to follow. Use that format for your CV and update it every year. (For me, the Christmas holidays may be a good time for this kind of task from now on.)

Some people keep the most up-to-date CV in their Dropbox’s public folder, and that is also a good idea if you have a website and share your CV there.

Some of the systems I mentioned earlier, Digital Measures and Sedona, also allow you to create a custom template, which you can use for the promotion/tenure application. If the system has been in use for many years at your institution, there may already be a pre-made template for promotion and tenure purposes.

8. Make sure to collect all appointment e-mails to committees and other types of services you do.

Keeping records of all your services is tricky, as we tend to pay little attention to the appointment e-mails for committees or other types of services we perform for universities or professional organizations. I assumed that they were all in my inbox somewhere and did not properly organize them. As a result, I had to spend hours looking for them when I was compiling my binder.

This can easily be avoided if you keep a well-organized e-mail archive where you file messages as they come in. Sometimes I found that I had either lost the appointment e-mail or never received one. You can file other e-mail correspondence as documentation for that service, but the official appointment e-mail is certainly better.

This also reminded me that I should write thank-you e-mails to the members of the committees I chaired and to the committee chairs I worked with as a board member of the ALA New Members Round Table. It is always nicer to file a letter of appreciation than a letter of appointment, and as a committee chair or a board member, sending one should be something you do without being asked. With these thank-you e-mails on file, your committee members or chairs can use them whenever they need them for performance, promotion, or tenure review, without having to request them and wait.

9. Check the timeline dates for the application.

Universities and colleges usually have a set of deadlines you have to meet in order to be considered for promotion or tenure. For example, you may have to meet with your supervisor or library dean by a certain date and get the green light to go up for promotion. Your supervisor may have to file an official memorandum with the dean’s office by a certain date as a formal notification. Your department chair (if you are appointed to an academic department) may have to receive a memo about your application by a certain date. Your promotion file may have to be submitted to your academic department’s Promotion and Tenure committee some time before it is forwarded to the college’s Promotion and Tenure committee. The list goes on and on. These deadlines are hard to keep tabs on, but they have to be tracked carefully so you do not miss them.

10. Plan ahead.

I had to compile my promotion binder and three copies of it on a week’s notice, but that was a very unusual case due to special circumstances. Something like this is unlikely to happen to you, but remember that creating the whole application file will take much more time than you imagine. I could have done some of the sorting and organizing of documentation in advance, but I delayed it because of a web application I was developing for my library. Looking back, I should have at least started working on the promotion file even though things were unclear and I had little time to spare outside of my ongoing work projects. It would have given me a much more accurate sense of how much time I would eventually have to spend on the whole dossier.

Also remember to request evaluation letters in advance. This was the craziest part for me, because I was literally given one week to request and receive letters from internal and external reviewers. Asking people for a letter on a week’s notice is close to asking for the impossible, particularly if the reviewers are outside your institution and have to be contacted not by you but by a third party. I was very lucky to get all the signed PDF letters in time, but I do not recommend this kind of experience to anyone.

Plan ahead, and plan well in advance. Find out whether you need letters from internal or external reviewers, how many, and what the letters need to cover. Make sure to create a list of colleagues familiar with your work whom you can ask for a letter. When you make a request, highlight the promotion or tenure criteria and what the letter needs to address, so that letter writers can quickly see what to focus on when they review your work. If there are any supplementary materials, such as publications, book chapters, or presentations, make sure to forward them as well, along with your CV and statement.

Lastly, If Your Application File Must Be in Print…

You are lucky if you have the option to submit everything electronically, or to simply hand the documentation to someone who will do the rest of the work: photocopying, filing, making binders, etc. But your institution may require the application file in print, sometimes in multiple copies, and you may be responsible for creating those binders and copies yourself. I had to submit four identical binders, and I was the one who did all the photocopying, hole-punching, and filing. I can tell you that photocopying and punching holes for the documents that fill a very thick binder, multiple times over, is not exactly inspiring work. If this is your situation as well, I recommend creating one binder as a master copy and using a professional photocopying/binding service to produce the rest; it would have been so much better for my sanity. In my case, there was too little time to create a master copy and then bring it to an outside service for additional copies. So plan ahead and make sure you have time to use an outside service. I highly recommend not doing the photocopying and filing yourself.

*       *       *

If you have any extra tips or experience to share about the promotion or tenure process at an academic library, please share them in the comments section. Hopefully, in the future, all institutions will allow people to file their documentation electronically. There are also tools such as Interfolio (http://www.interfolio.com/) that you can use, which is particularly convenient for those who need external letters sent directly to the tenure and promotion committee.

Are there any other tools? Please share them in the comments section as well. Best of luck to all librarians going for promotion and tenure!

 


Library Quest: Developing a Mobile Game App for a Library

This is the story of Library Quest (iPhone, Android), the App That (Almost) Wasn’t. It’s a (somewhat) cautionary tale of one library’s effort to leverage gamification and mobile devices to create a new and different way of orienting students to library services and collections. Many libraries are interested in the possibilities offered by both games and mobile devices, and they should be. But developing for mobile platforms is new and largely uncharted territory for libraries, and while there have been some encouraging developments in creating games for library instruction, other avenues of game creation are mostly unexplored. This is what we learned developing our first mobile app and our first large-scale game…at the same time!

Login Screen

The login screen for the completed game. We use integrated Facebook login for a host of technical reasons.

Development of the Concept: Questing for Knowledge

The saga of Library Quest began in February of 2012, when I came on board at Grand Valley State University Libraries as Digital Initiatives Librarian. I had been reading some books on gamification and was interested in finding a problem that the concept might solve. I found two. First, we were about to open a new $65 million library building, and we needed ways to take advantage of the upsurge of interest we knew this would create. How could we get people curious about the building to learn more about our services, and strengthen that curiosity into a connection with us? Second, GVSU Libraries, like many other libraries, was struggling with service awareness. Comments by our users in the service dimension of our latest LibQUAL+ survey indicated that many patrons missed out on services like interlibrary loan because they were unaware that they existed. Students often are not interested in engaging with the library until they need something specific from us, and when that need is filled, their interest declines sharply. How could we orient students to library services and create more awareness of what we could do for them?

We designed a very simple game to address both problems. It would be a quest- or task-based game, in which students actively engaged with our services and spaces, earning points and rewards as they did so. The game app would offer tasks to students and verify their progress through multistep tasks by asking them to input alphanumeric codes or scan QR codes (which we ended up putting on decals that could be stuck to any flat surface). Because this was an active game, it seemed natural to target mobile devices, so that people could play as they explored. The mobile marketplace is more or less evenly split between iOS and Android devices, so we knew we wanted the game available on both platforms. This became the core concept for Library Quest. Library administration gave the idea their blessing and approved the use of our technology development budget, around $12,000, to develop the game. Back up and read that sentence over if you need to; yes, that entire budget was for one mobile app. The expense of building apps is the first thing to wrap your mind around if you want to create one. While people often think of apps as somehow smaller and simpler than desktop programs, the reality is very different.
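The code-entry mechanic described above (players typing an alphanumeric code to prove they completed a step) can be sketched in a few lines. This is a hypothetical illustration, not Yeti’s actual implementation; the quest names and codes below are invented:

```python
# Hypothetical sketch of quest-code verification, the mechanic Library Quest
# used to confirm a player visited a location or finished a step. The quest
# IDs and codes here are invented for illustration only.

QUEST_CODES = {
    "visit-special-collections": "QX7-43B",
    "ask-a-librarian": "ILL-2021",
}

def verify_code(quest_id: str, submitted: str) -> bool:
    """Return True if the submitted code matches the quest's expected code.

    The comparison ignores case and surrounding whitespace, since players
    type codes by hand and a QR scan may include stray characters.
    """
    expected = QUEST_CODES.get(quest_id)
    if expected is None:
        return False
    return submitted.strip().upper() == expected.upper()

print(verify_code("ask-a-librarian", " ill-2021 "))  # True
print(verify_code("ask-a-librarian", "WRONG"))       # False
```

Forgiving comparison like this matters in practice: a code that fails because of a stray space or lowercase letter turns a fun game moment into a frustrating one.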


The main game screen. We found a tabbed view worked best, with quests that are available in one tab, quests that have been accepted but not completed in another, and finished quests in the third.

We contracted with Yeti CGI, an outside game development firm, to do the coding. This was essential: app development is complicated, and we didn’t have the necessary skills or experience in-house. If we hadn’t used an outside developer, the game app would never have gotten off the ground. We had never worked with a game development company before, and Yeti had never worked with a library, although they had ties to higher education and were enthusiastic about the project. Working with an outside developer always carries certain risks and advantages, and communication is always an issue.

One thing we could have done more of at this stage was spend time working on the game concept and paper-prototyping it. In his book Game Design Workshop, author Tracy Fullerton stresses two key components of designing a good game: defining the experience you want the player to have, and doing paper prototyping. Defining the game experience from the player’s perspective forces the designer to ask questions about how the game will play that might not otherwise occur to them. Will this be a group or a solo experience? Where will the fun come from? How will the player negotiate the rules structure of the game? What choices will they have, and at what points? As author Jane McGonigal notes, educational games often fail because they do not put the fun first, which is another way of saying that they haven’t fully thought through the player’s experience. Everything in the game (rules, rewards, format, etc.) should be shaped by the experience the designer wants to give the player. Early concepts can and should be tested with paper prototyping. It’s a lot easier (and a lot less expensive) to change the rules structure of a game made with paper, scissors, and glue than one made with code and developers. In retrospect, we could have spent more time talking about experience and doing paper prototypes before we had Yeti start writing code. While our game is pretty solid, we may have missed opportunities to be more innovative or provide a stronger gameplay experience.

Concept to conception: Wireframing and Usability Testing

The first few months of development were spent creating, approving, and testing paper wireframes of the interface and art concepts.  While we perhaps should have done more concept prototyping, we did do plenty of usability testing of the game interface as it developed, starting with the paper prototypes and continuing into the initial beta version of the game.  That is certainly something I would recommend that anyone else do as well.  Like a website or anything else that people are expected to use, a mobile app interface needs to be intuitive and conform to user expectations about how it should operate, and just as in website design, the only way to create an interface that does so is to engage in cycles of iterative testing with actual users.  For games, this is particularly important because they are supposed to be fun, and nothing is less fun than struggling with poor interface design.

A side note related to usability: one thing that surfaced in prototype testing was that giving players tasks involving library resources, and watching them try to accomplish those tasks, turns out to be an excellent way of testing space and service design as well. There were times when students were struggling not with the interface, but with the library! Insufficient signage, space layouts that were not clear, and assumed knowledge of (or access to) information the students had no way of knowing were all things that became apparent in watching students try to do tasks that should have been simple. It serves as a reminder that usability concepts apply to the physical world as much as they do to the web, and that we can and should test services in the real world the same way we test them in virtual spaces.


A quest in progress. We can insert images and links into quest screens, which allows us to use webpages and images as clues.

Development: Where the Rubber Meets the Phone

Involving an outside developer made the game possible, but it also meant that we had to temper our expectations about the scale of app development. This became much more apparent once we’d gotten past paper prototyping and began testing beta versions of the game. Several ideas we developed early on, such as notifications of new quests and an elaborate title system, had to be put aside as the game evolved, because of cost and because other features more central to gameplay turned out to be more difficult to build than anticipated. For example, one of the core concepts of the game was that students would be able to scan QR codes to verify that they had visited specific locations. Because mobile phone users do not typically have QR code reader software installed, Yeti built QR code reading functionality into the game app. This made scanning a more seamless part of gameplay, but getting the scanner to work well on both the Android and iOS versions proved a major challenge (and one that’s still vexing us somewhat at launch). Tweaks to improve stability and performance on iOS threw off the Android version, and vice versa. Despite the existence of tools like PhoneGap and Adobe AIR, which will supposedly produce versions of the software that run on both platforms, there can still be a significant amount of work involved in tuning the different versions to get them to work well.

Developing apps that work on the Android platform is particularly difficult and expensive. While Apple has been accused of having a fetish for control, their proprietary approach to their mobile operating system produces a development environment that is, compared to Android, easy to navigate. Android, by contrast, is usually heavily modified by specific carriers and manufacturers to run on their hardware, which means that if you want to ensure that your app runs well on an Android device, the app must be tested and debugged on that specific combination of Android version and hardware. Multiply the 12 major versions of Android still commonly used by the hundreds of devices that run it, and you begin to have an idea of the scope of the problem facing a developer. While Android accounts for only 50% of our potential player base, it easily took up 80% of the time we spent with Yeti debugging, and the result is an app that we are sure works on only a small selection of the Android devices out there. By contrast, it works perfectly well on all but the very oldest versions of iOS.

Publishing a Mobile App: (Almost) Failure to Launch

When work began on Library Quest, our campus had no formal approval process for mobile apps, and the campus store accounts were controlled by our student mobile app development lab. In the year and a half we spent building the game, control of the campus store accounts moved to our campus IT department, and formal guidelines and a process for publishing mobile apps started to materialize. All of which made perfect sense: more and more campus entities were starting to develop mobile apps, and the campus was rightly concerned about branding and quality, as well as ensuring that any published apps furthered the university’s teaching and research mission. However, this left us navigating an approval process as it materialized around us very late in development, with requests coming in for changes to bring the game’s appearance into line with new branding standards when the game was almost complete.

It was here the game almost foundered as it was being launched. During some of the discussions, it surfaced that one of the commercial apps being used by the university for campus orientation bore some superficial resemblance to Library Quest in terms of functionality, and the concern was raised that our app might be viewed as a copy.  University counsel got involved.  For a while, it seemed the app might be scrapped entirely, before it ever got out to the students!  If there had been a clear approval process when we began the app, we could have dealt with this at the outset, when the game was still in the conceptual phase.  We could have either modified the concept, or addressed the concern before any development was done.  Fortunately, it was decided that the risk was minimal and we were allowed to proceed.

A quest completion screen for one of our test quests.  These screens stick around when the quest is done, forming a kind of personalized FAQ about library services and spaces.

Post-Launch: Game On!

As I write this, it’s over a year since Library Quest was conceived and it has just been released “into the wild” on the Apple and Google Play stores.  We’ve yet to begin the major advertising push for the game, but it already has over 50 registered users.  While we’ve learned a great deal, some of the most important questions about this project are still up in the air.  Can we orient students using a game?  Will they learn anything?  How will they react to an attempt to engage with them on mobile devices?  There are not really a lot of established ways to measure success for this kind of project, since very few libraries have done anything remotely like it.  We projected early on in development that we wanted to see at least 300 registered users, and that we wanted at least 50 of them to earn the maximum number of points the game offered.  Other metrics for success are “squishier,” and involve doing surveys and focus groups once the game wraps to see what reactions students had to the game.  If we aren’t satisfied with performance at the end of the year, either because we didn’t have enough users or because the response was not positive, then we will look for ways to repurpose the app, perhaps as part of classroom teaching in our information literacy program, or as part of more focused and smaller scale campus orientation activities.

Even if it’s wildly successful, the game will eventually need to wind down, at least temporarily.  While the effort-reward cycle that games create can stimulate engagement, keeping that cycle going requires effort and resources.  In the case of Library Quest, this would include the money we’ve spent on our prizes and the effort and time we spend developing quests and promoting the game.  If Library Quest endures, we see it having a cyclical life that’s dependent on the academic year.  We would start it anew each fall, promoting it to incoming freshmen, and then wrap it up near the end of our winter semester, using the summers to assess and re-engineer quests and tweak the app.

Lessons Learned:  How to Avoid Being a Cautionary Tale
  1. Check to see if your campus has an approval process and a set of guidelines for publishing mobile apps. If it doesn’t, do not proceed until they exist. Lack of such a process until very late in development almost killed our game. Volunteer to help draft these guidelines and help create the process, if you need to.  There should be some identified campus app experts for you to talk to before you begin work, so you can ask about apps already in use and about any licensing agreements campus may have. There should be a mechanism to get your concept approved at the outset, as well as the finished product.
  2. Do not underestimate the power of paper.  Define your game’s concept early, and test it intensively with paper prototypes and actual users.  Think about the experience you want the players to have, as well as what you want to teach them.  That’s a long way of saying “think about how to make it fun.”  Do all of this before you touch a line of code.
  3. Keep testing throughout development.  Test your wireframes, test your beta version, test, test, test with actual players.  And pay attention to anything your testing might be telling you about things outside the game, especially if the game interfaces with the physical world at all.
  4. Be aware that mobile app development is hard, complex, and expensive.  Apps seem smaller because they’re on small devices, but in terms of complexity, they are anything but.  Developing cross-platform will be difficult (but probably necessary), and supporting Android will be an ongoing challenge.  Wherever possible, keep it simple.  Define your core functionality (what does the app *have* to do to accomplish its mission) and classify everything else you’d like it to do as potentially droppable features.
  5. Consider your game’s life-cycle at the outset.  How long do you need it to run to do what you want it to do?  How much effort and money will you need to spend to keep it going for that long?  When will it wind down?

References

Fullerton, Tracy.  Game Design Workshop, 4th ed.  Amsterdam: Morgan Kaufmann, 2008.

McGonigal, Jane.  Reality Is Broken: Why Games Make Us Better and How They Can Change the World.  New York: Penguin Press, 2011.

About our Guest Author:
Kyle Felker is the Digital Initiatives Librarian at Grand Valley State University Libraries, where he has worked since February of 2012.  He is also a longtime gamer.  He can be reached at felkerk@gvsu.edu, or on Twitter at @gwydion9.

Local Dev Environments for Newbies Part 3: Installing Drupal and WordPress

Once you have built a local development environment using an AMP stack, the next logical question is, “now what?”  And the answer is, truly, whatever you want.  As an example, in this blog post we will walk through installing Drupal and WordPress on your local machine so that you can develop and test in a low-risk environment.  However, you can substitute other content management systems or development platforms and the goal is the same: we want to mimic our web server environment on our local machine.

The only prerequisite for these recipes is a working AMP stack (see our tutorials for Mac and Windows), and administrative rights to your computer.  The two sets of steps are very similar.  We need to download and unpack the files to our web root, create a database and point to it from a configuration file, and run an install script from the browser.

There are tutorials around the web on how to do both things, but I think there are two likely gotchas for newbies:

  • There’s no “installer” that installs the platform to your system.  You unzip and copy the files to the correct place.  The “install” script is really a “setup” script, and is run after you can access the site through a browser.
  • Setting up and linking the database must be done correctly, or the site won’t work.

So, we’ll step through each process with some extra explanation.

Drupal

Drupal is an open source content management platform.  Many libraries use it for their website because it is free and it allows for granular user permissions.  So as the site administrator, I can provide access for staff to edit certain pages (e.g., the reference desk schedule) but not others (say, colleagues’ user profiles).  In our digital library, my curatorial users can edit content, authorized student users can see content just for our students, and anonymous users can see public collections.  The platform has its downsides, but there is a large and active user community.  A problem’s solution is usually only a Google search (or a few) away.

The Drupal installation guide is a little more technical, so feel free to head there if you’re comfortable on the command line.

First, download the Drupal files from the Drupal core page.  The top set of downloads (green background) are stable versions of the platform.  The lower set are versions still in development.  For our purposes we want the green download, and because I am on my Mac, I will download the tar.gz file for the most recent version (at the time of this writing, 7.23).  If you are on a Windows machine, and have 7zip installed, you can also use the .tar.gz file.  If you do not have 7zip installed, use the .zip file.

dcore

Now, we need to create the database we’re going to use for Drupal.  In building the AMP stack, we also installed phpMyAdmin, and we’ll use it now.  Open a browser and navigate to the phpMyAdmin installation (if you followed the earlier tutorials, this will be http://localhost/~yourusername/phpmyadmin on Mac and http://localhost/phpmyadmin on Windows).  Log in with the root user you created when you installed MySQL.

The Drupal installation instructions suggest creating a user first, and through that process, creating the database we will use.  So, start by clicking on Users.

dusers

Look for the Add user button.

dadduser

Next, we want to create a username – which will serve as the user login as well as the name of the database.  Create a password and select “Local” from the Host dropdown.  This will only allow traffic from the local machine.  Under the “Database for user”, we want to select “Create database with same name and grant all privileges.”

ddb

Next, let’s copy the Drupal files and configure the settings.  Locate the file you downloaded in the first step above and move it to your web root folder.  This is the folder you previously used to test Apache and install phpMyAdmin so you could access files through your browser.  For example, on my Mac this is in mfrazer/sites.

dstructure

You may want to change the folder name from drupal-7.23 to something a little more user friendly, e.g. drupal without the version number.  Generally, it’s bad practice to have periods in file or folder names.  However, for the purposes of this tutorial, I’m going to leave the example unchanged.

Now, we want to create our settings file.  Inside your Drupal folder, look for the sites folder. We want to navigate to sites/default and create a copy of the file called default.settings.php.  Rename the copy to settings.php and open in your code editor.

dsettings

Each section of this file contains extensive directions on how to configure its settings.  At the end of the Database Settings section (line 213 as of this writing), we want to replace this

$databases = array();

with this

$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'sampledrupal',
  'username' => 'sampledrupal',
  'password' => 'samplepassword',
  'host' => 'localhost',
  'prefix' => '',
);

Remember, if you followed the steps above, ‘database’ and ‘username’ should have the same value.  Save the file.

Go back to your unpacked folder and create a directory called “files” in the same directory as our settings.php file.

dfiles

Now we can navigate to the setup script in our browser.  The URL is composed of the web root, the name of the folder you extracted the Drupal files into, and then install.php.  So, in my case this is:

localhost/~mfrazer/drupal-7.23/install.php

If I were on a Windows machine, and had changed the name of the folder to mydrupal, then the path would be

localhost/mydrupal/install.php

Either way, you should get something that looks like this:

dinstall

For your first installation, I would choose Standard, so you can see what the Standard install looks like.  I use Minimal for many of my sites, but if it’s your first pass at Drupal it is good to see what’s there.

Next, pick a language and click Save and Continue.  Now, the script is going to attempt to verify your requirements.  You may run into an error that looks like this:

derror

We need to make our files directory writable by the web server user.  We can do this in a number of different ways.  It’s important to think about what you’re doing, because it involves file permissions, especially if you ever allow users in from outside your system.

On my Mac, I choose to make _www (which is the hidden web server user) the owner of the folder.  To do this, I open Terminal and type in

cd ~/sites/drupal-7.23/sites/default
sudo chown _www files

Remember, sudo will elevate us to administrator.  Type in your password when prompted.  The next command is chown, followed by the new owner and then the folder in question.  So this command changes the owner of the folder “files” to _www.

In Windows, I did not see this error.  However, if needed, I would handle the permissions through the user interface, by navigating to the files folder, right-clicking and selecting Properties.  Click on the Security tab, then click on Edit.  In this case, we are just going to grant permissions to the users of this machine, which will include the web server user.

dwperma

Click on Users, then scroll down to click the check box under “Allow” and next to “Write.”  Click on Apply and then Ok.  Click OK again to close the Properties window.

dwperm

On my Windows machine, I got a PHP error instead of the file permissions error.

phperror

This is an easy fix: we just need to enable the gd2 and mbstring extensions in our php.ini file and restart Apache to pick up the changes.

To do this, open your php.ini file (if you followed our tutorials, this will be in your c:\opt\local directory).  Beginning on line 868, in the Windows Extensions section, uncomment (remove the semi-colon from) the following lines (they are not right next to each other; they’re in a longer list, but we want both uncommented):

extension=php_gd2.dll
extension=php_mbstring.dll

Save the php.ini file.  Restart Apache by going to Services, clicking on Apache 2.4, and clicking Restart.

Once you think you’ve fixed the issues, go back to your browser and click on Refresh.  The Verify Requirements should pass and you should see a progress bar as Drupal installs.

Next you are taken to the Configure Site page, where you fill in the site name and your email address, and create the first user.  This is important, as there are a couple of functions restricted only to this user, so remember the user name and password that you choose.  I usually leave the Server Settings alone and uncheck the Notification options.

dconfigsite

Click Save and Continue.  You should be congratulated and provided a link to your new site.

dfinal

WordPress

WordPress is a very common blogging platform; we use it at ACRL TechConnect.  It can also be modified to be used as a content management platform or an image gallery.

Full disclosure: Until writing this post, I have never done a local install of WordPress.  Fortunately, I can report that it’s very straightforward.  So, let’s get started.

The MAMP instructions for WordPress advise creating the database first, and using the root credentials.  I am not wild about this solution, because I prefer to have separate users for my databases and I do not want my root credentials sitting in a file somewhere.  So we will set up the database the same way we did above:  create a user and a database at the same time.

Open a browser and navigate to the phpMyAdmin installation (if you followed the earlier tutorials, this will be http://localhost/~yourusername/phpmyadmin on Mac and http://localhost/phpmyadmin on Windows).  Log in with the root user you created when you installed MySQL and click on Users.

dusers

Look for the Add user button.

dadduser

Next, we want to create a username – which will serve as the user login as well as the name of the database.  Create a password and select “Local” from the Host dropdown.  This will only allow traffic from the local machine.  Under the “Database for user”, we want to select “Create database with same name and grant all privileges.”

wpdb

Now, let’s download our files.  Go to http://wordpress.org/download and click on Download WordPress.  Move the .zip file to your web root folder and unzip it.  This is the folder you previously used to test Apache and install phpMyAdmin so you could access files through your browser.  For example, on my Mac this is in mfrazer/sites.  If you followed our tutorial for Windows, it would be c:\sites.

wpunpack

Next, we need to create a config file.  WordPress comes with a wp-config-sample.php file.  Make a copy of it, rename it to wp-config.php, and open it with your code editor.

wpconfig

Enter the database name, user name and password we just created.  Remember, if you followed the steps above, the database name and user name should be the same.  Verify that the host is set to localhost and save the file.

Navigate in your browser to the WordPress folder.  The URL is composed of the web root and the name of the folder where you extracted the WordPress files.  So, in my case this is:

localhost/~mfrazer/wordpress

If I were on a Windows machine, and had changed the name of the folder to wordpress-dev, then the path would be

localhost/wordpress-dev

Either way, you should get something that looks like this:

wpform

Fill in the form and click on Install WordPress.  It might take a few minutes, but you should get a success message and a Log In button.  Log in to your site using the credentials you just created in the form.

You’re ready to start coding and testing.  The next step is to think about what you want to do.  You might take a look at the theming resources provided by both WordPress and Drupal.  You might want to go all out and write a module.  No matter what, though, you now have an environment that will help you abide by the cardinal rule of development: Thou shalt not mess around in production.

Let us know how it’s going in the comments!


Demystifying Programming

We talk quite a bit about code here at Tech Connect and it’s not unusual to see snippets of it pasted into a post. But most of us, indeed most librarians, aren’t professional programmers or full-time developers; we had to learn like everyone else. Depending on your background, some parts of coding will be easy to pick up while others won’t make sense for years. Here’s an attempt to explain the fundamental building blocks of programming languages.

The Languages

There are a number of popular programming languages: C, C#, C++, Java, JavaScript, Objective-C, Perl, PHP, Python, and Ruby. There are numerous others, but this semi-arbitrary selection covers the ones most commonly in use. It’s important to know that each programming language requires its own software to run. You can write Python code in a text file on a machine that doesn’t have the Python interpreter installed, but you can’t execute it and see the results.

A lot of learners stress unnecessarily over which language to learn first. Once you’ve picked up one language, you’ll understand all of the foundational pieces listed below. Then you’ll be able to transition quickly to another language by understanding a few syntax changes: Oh, in JavaScript I write function myFunction(x) to define a function, while in Python I write def myFunction(x). Programming languages differ in other ways too, but knowing the basics of one provides a huge head start on learning the basics of any other.

Finally, it’s worth briefly distinguishing compiled versus interpreted languages. Code written in a compiled language, such as all the capital C languages and Java, must first be passed to a compiler program which then spits out an executable—think of a file ending in .exe if you’re on Windows—that will run the code. Interpreted languages, like Perl, PHP, Python, and Ruby, are quicker to program in because you just pass your code along to an interpreter program which immediately executes it. There’s one fewer step: for a compiled language you need to write code, generate an executable, and then run the executable, while interpreted languages sort of skip that middle step.

Compiled languages tend to run faster (i.e. perform more actions or computations in a given amount of time) than interpreted ones, while interpreted ones tend to be easier to learn and more lenient towards the programmer. Again, it doesn’t matter too much which you start out with.

Variables

Variables are just like variables in algebra; they’re names which stand in for some value. In algebra, you might write:

x = 10 + 3

which is also valid code in many programming languages. Later on, if you used the value of x, it would be 13.

The biggest difference between variables in math and in programming is that programming variables can be all sort of things, not just numbers. They can be strings of text, for instance. Below, we combine two pieces of text which were stored in variables:

name = 'cat'
mood = ' is laughing'
both = name + mood

In the above code, both would have a value of ‘cat is laughing’. Note that text strings have to be wrapped in quotes—often either double or single quotes is acceptable—in order to distinguish them from the rest of the code. We also see above that variables can be the product of other variables.

Comments

Comments are pieces of text inside a program which are not interpreted as code. Why would you want to do that? Well, comments are very useful for documenting what’s going on in your code. Even if your code is never going to be seen by anyone else, writing comments helps you understand what’s going on if you return to a project after not thinking about it for a while.

// This is a comment in JavaScript; code is below.
number = 5;
// And a second comment!

As seen above, comments typically work by having some special character(s) at the beginning of the line which tells the programming language that the rest of the line can be ignored. Common characters that indicate a line is a comment are # (Python, Ruby), // (C languages, Java, JavaScript, PHP), and /* (CSS, multi-line blocks of comments in many other languages).

Functions

As with variables, functions are akin to those in math: they take an input, perform some calculations with it, and return an output. In math, we might see:

f(x) = (x * 3)/4

f(8) = 6

Here, the first line is a function definition. It defines how many parameters can be passed to the function and what it will do with them. The second line is more akin to a function execution. It shows that the function returns the value 6 when passed the parameter 8. This is really, really close to programming already. Here’s the math above written in Python:

def f(x):
  return (x * 3)/4

f(8)
# which returns the number 6

Programming functions differ from mathematical ones in much the same way variables do: they’re not limited to accepting and producing numbers. They can take all sorts of data—including text—process it, and then return another sort of data. For instance, virtually all programming languages allow you to find the length of a text string using a function. This function takes text input and outputs a number. The combinations are endless! Here’s how that looks in Python:

len('how long?')
# returns the number 9

Python abbreviates the word “length” to simply “len” here, and we pass the text “how long?” to the function instead of a number.

Combining variables and functions, we might store the result of running a function in a variable, e.g. y = f(8) would store the value 6 in the variable y if f(x) is the same as above. This may seem silly—why don’t you just write y = 6 if that’s what you want!—but functions help by abstracting out blocks of code so you can reuse them over and over again.

Consider a program you’re writing to manage the e-resource URLs in your catalog, which are stored in MARC field 856 subfield U. You might have a variable named num_URLs (variable names can’t have spaces, thus the underscore) which represents the number of 856 $u subfields a record has. But as you work on records, that value is going to change; rather than manually calculate it each time and set num_URLs = 3 or num_URLs = 2 you can write a function to do this for you. Each time you pass the function a bibliographic record, it will return the number of 856 $u fields, substantially reducing how much repetitive code you have to write.
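
As a sketch of that idea, here’s what such a function might look like in Python. A plain dictionary stands in for a bibliographic record here (an invented structure for illustration; a real program would use a MARC library):

```python
# A record sketched as a dictionary; each field is a (tag, subfield-codes) pair.
def count_urls(record):
    """Return the number of 856 $u subfields in a record."""
    total = 0
    for tag, subfields in record['fields']:
        if tag == '856':
            total += subfields.count('u')
    return total

record = {'fields': [
    ('245', ['a', 'b']),    # title field, no URLs
    ('856', ['u']),         # one electronic location
    ('856', ['u', 'z']),    # another, with a public note
]}

num_URLs = count_urls(record)
# num_URLs is now 2
```

Now num_URLs can be recalculated with one line whenever the record changes, instead of being counted by hand.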

Conditionals

Many readers are probably familiar with IFTTT, the “IF This Then That” web service which can glue together various accounts, for instance “If I post a new photo to Instagram, then save it to my Dropbox backup folder.” These sorts of logical connections are essential to programming, because often whether or not you perform a particular action varies depending on some other condition.

Consider a program which counts the number of books by Virginia Woolf in your catalog. You want to count a book only if the author is Virginia Woolf. You can use Ruby code like this:

if author == 'Virginia Woolf'
  total = total + 1
end

There are three parts here: first we specify a condition, then there’s some code which runs only if the condition is true, and then we end the condition. Without some kind of indication that the block of code inside the condition has ended, the entire rest of our program would only run depending on if the variable author was set to the right string of text.

The == is definitely weird to see for the first time. Why two equals? Many programming languages use a variety of double-character comparisons because the single equals already has a meaning: single equals assigns a value to a variable (see the second line of the example above) while double-equals compares two values. There are other common comparisons:

  • != often means “is not equal to”
  • > and < are the typical greater or lesser than
  • >= and <= often mean “greater/lesser than or equal to”

Those can look weird at first, and indeed one of the more common mistakes (made by professionals and newbies alike!) is accidentally putting a single equals instead of a double.[1] While we’re on the topic of strange double-character equals signs, it’s worth pointing out that += and -= are also commonly seen in programming languages. These pairs of symbols respectively add or subtract a given number from a variable, so they do assign a value but they alter it slightly. For instance, above I could have written total += 1 which is identical in outcome to total = total + 1.
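
To see those compound assignment operators in action, here’s a tiny Python snippet:

```python
total = 10
total += 1   # shorthand for total = total + 1; total is now 11
total -= 4   # shorthand for total = total - 4; total is now 7
```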

Lastly, conditional statements can be far more sophisticated than a mere “if this do that.” You can write code that says “if blah do this, but if bleh do that, and if neither do something else.” Here’s a Ruby script that would count books by Virginia Woolf, books by Ralph Ellison, and books by someone other than those two.

total_vw = 0
total_re = 0
total_others = 0
if author == 'Virginia Woolf'
  total_vw += 1
elsif author == 'Ralph Ellison'
  total_re += 1
else
  total_others += 1
end

Here, we set all three of our totals to zero first, then check to see what the current value of author is, adding one to the appropriate total using a three-part conditional statement. The elsif is short for “else if” and that condition is only tested if the first if wasn’t true. If neither of the first two conditions is true, our else section serves as a kind of fallback.
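
For comparison, here is the same three-part conditional written in Python, where the keyword is spelled elif rather than elsif:

```python
total_vw = 0
total_re = 0
total_others = 0
author = 'Ralph Ellison'   # a sample value, just for illustration

if author == 'Virginia Woolf':
    total_vw += 1
elif author == 'Ralph Ellison':
    total_re += 1
else:
    total_others += 1
# total_re is now 1; the other two totals are still 0
```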

Arrays

An array is simply a list of values; in fact, the Python language has an array-like data type named “list.” They’re commonly denoted with square brackets, e.g. in Python a list looks like

stuff = ["dog", "cat", "tree"]

Later, if I want to retrieve a single piece of the array, I just access it using its index wrapped in square brackets, starting from the number zero. Extending the Python example above:

stuff[0]
# returns "dog"
stuff[2]
# returns "tree"

Many programming languages also support associative arrays, in which the index values are strings instead of numbers. For instance, here’s an associative array in PHP:

$stuff = array(
  "awesome" => "sauce",
  "moderate" => "spice",
  "mediocre" => "condiment",
);
echo $stuff["mediocre"];
// prints out "condiment"
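
In Python, the equivalent of an associative array is called a dictionary. The same data as the PHP example looks like this:

```python
stuff = {
    'awesome': 'sauce',
    'moderate': 'spice',
    'mediocre': 'condiment',
}
print(stuff['mediocre'])
# prints "condiment"
```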

Arrays are useful for storing large groups of like items: instead of having three separate variables, which requires more typing and remembering more names, we just have one array containing everything. While our three strings aren’t a lot to keep track of, imagine a program which deals with all the records in a library catalog, or all the search results returned from a query: having an array to store that large list of items suddenly becomes essential.

Loops

Loops repeat an action a set number of times or until a condition is met. Arrays are commonly combined with loops, since loops make it easy to repeat the same operation on each item in an array. Here’s a concise example in Python which prints every entry in the “names” array to the screen:

names = ['Joebob', 'Suebob', 'Bobob']
for name in names:
  print(name)

Without arrays and loops, we’d have to write:

name1 = 'Joebob'
name2 = 'Suebob'
name3 = 'Bobob'
print(name1)
print(name2)
print(name3)

You see how useful arrays are? As we’ve seen with both functions and arrays, programming languages like to expose tools that help you repeat lots of operations without typing too much text.

There are a few types of loops, including “for” loops and “while” loops. Our “for” loop earlier went through a whole array, printing each item out, but a “while” loop only keeps repeating while some condition is true. Here is a bit of PHP that prints out the first four natural numbers:

$counter = 1;
while ( $counter < 5 ) {
  echo $counter;
  $counter = $counter + 1;
}

Each time we go through the loop, the counter is increased by one. When it hits five, the loop stops. But be careful! If we left off the $counter = $counter + 1 line then the loop would never finish because the while condition would never be false. Infinite loops are another potential bug in a program.
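
For comparison, here is the same counting loop written in Python:

```python
counter = 1
while counter < 5:
    print(counter)
    counter += 1   # without this line, the loop would never end
# prints 1, 2, 3, and 4 on separate lines
```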

Objects & Object-Oriented Programming

Object-oriented programming (oft-abbreviated OOP) is probably the toughest item in this post to explain, which is why I’d rather people see it in action by trying out Codecademy than read about it. Unfortunately, it’s not until the end of the JavaScript track that you really get to work with OOP, but it gives you a good sense of what it looks like in practice.

In general, objects are simply a means of organizing code. You can group related variables and functions under an object. You can make an object inherit properties from another one if it needs to use all the same variables and functions but also add some of its own.

For example, let’s say we have a program that deals with a series of people, each of whom has a few properties like a name and age, but also the ability to say hi. We can create a Person class which is kind of like a template; it helps us stamp out new copies of objects without rewriting the same code over and over. Here’s an example in JavaScript:

function Person(name, age) {
  this.name = name;
  this.age = age;
  this.sayHi = function() {
    console.log("Hi, I'm " + this.name + ".");
  };
}

Joebob = new Person('Joebob', 39);
Suebob = new Person('Suebob', 40);
Bobob = new Person('Bobob', 3);
Bobob.sayHi();
// prints "Hi, I'm Bobob."
Suebob.sayHi();
// prints "Hi, I'm Suebob."

Our Person function is essentially a class here; it allows us to quickly create three people who are all objects with the same structure, yet they have unique values for their name and age.[2] The code is a bit complicated and JavaScript isn’t a great example, but basically think of this: if we wanted to do this without objects, we’d end up repeating the content of the Person block of code three times over.

The efficiency gained with objects is similar to how functions save us from writing lots of redundant code; identifying common structures and grouping them together under an object makes our code more concise and easier to maintain as we add new features. For instance, if we wanted to add a myAgeIs function that prints out the person’s age, we could just add it to the Person class and then all our people objects would be able to use it.
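
As a rough sketch of that idea, here is a Python version of the same Person class with the hypothetical age-reporting method added (renamed my_age_is to match Python naming style); every object stamped out from the class gets the new method automatically:

```python
class Person:
    def __init__(self, name, age):
        # runs whenever a new Person is created, storing its properties
        self.name = name
        self.age = age

    def say_hi(self):
        print("Hi, I'm " + self.name + ".")

    def my_age_is(self):
        # the new feature, added once and shared by every Person object
        print(self.name + " is " + str(self.age) + " years old.")

bobob = Person('Bobob', 3)
bobob.say_hi()      # prints "Hi, I'm Bobob."
bobob.my_age_is()   # prints "Bobob is 3 years old."
```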

Modules & Libraries

Lest you worry that every little detail in your programs must be written from scratch, I should mention that all popular programming languages have mechanisms which allow you to reuse others’ code. Practically, this means that most projects start out by identifying a few fundamental building blocks which already exist. For instance, parsing MARC data is a non-trivial task which takes some serious knowledge both of the data structure and the programming language you’re using. Luckily, we don’t need to write a MARC parsing program on our own, because several already exist, such as pymarc for Python, ruby-marc for Ruby, and File_MARC for PHP.

The Code4Lib wiki has an even more extensive list of options.

In general, it’s best to reuse as much prior work as possible rather than spend time working on problems that have already been solved. Complicated tasks like writing a full-fledged web application take a lot of time and expertise, but code libraries already exist for this. Particularly when you’re learning, it can be rewarding to use a major, well-developed project at first to get a sense of what’s possible with programming.

Attention to Detail

The biggest hangup for new programmers often isn’t conceptual: variables, functions, and these other constructs are all rather intuitive, especially once you’ve tried them a few times. Instead, many newcomers find out that programming languages are very literal and unyielding. They can’t read your mind and are happy to simply give up and spit out errors if they can’t understand what you’re trying to do.

For instance, earlier I mentioned that text variables are usually wrapped in quotes. What happens if I forget an end quote? Depending on the language, the program may simply tell you there's an error, or it might badly misinterpret your code, treating everything from your open quote down to the next quote mark as one big chunk of variable text. Similarly, accidentally using a single equals sign (assignment) where you meant a double (comparison), or mixing up any of the other arcane combinations of symbols, can have disastrous results.
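Here's a sketch of the single-versus-double equals mistake in JavaScript (the variable and values are invented for illustration):

```javascript
var age = 12;

if (age == 12) {          // double equals: asks "is age equal to 12?"
    console.log("twelve");
}

if (age = 21) {           // single equals: SETS age to 21, and since 21
    console.log("oops");  // counts as "truthy", this branch always runs
}

// prints "twelve" then "oops" -- and age has silently become 21
```

One stray character changes a harmless comparison into an assignment that corrupts the variable, which is exactly the kind of error that's technically valid code and therefore easy to miss.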

Once you’ve worked with code a little, you’ll start to pick up tools that ease a lot of these minor issues. Most code editors use syntax highlighting to distinguish different constructs, which helps with spotting errors. This very post uses a syntax highlighter for WordPress to color keywords like “function” and distinguish variable names. Other tools can “lint” your code for mistakes or for code which, while technically valid, can easily lead to trouble. The text editor I commonly use does wonderful little things like provide closing quotes and parentheses, highlight lines which don’t pass linting tests, and let me test-run selected snippets of code.

There’s lots more…

Code isn’t magic; coders aren’t wizards. Yes, there’s a lot to programming and one can devote a lifetime to its study and practice. There are also thousands of resources available for learning, from MOOCs to books to workshops for beginners. With just a few building blocks like the ones described in this post, you can write useful code which helps you in your work.

Footnotes

[1] True story: while writing the very next example, I made this mistake.

[2] Functions which create objects are called constructor functions, which is another bit of jargon you probably don’t need to know if you’re just getting started.