Reflections on Code4Lib 2013

Disclaimer: I was on the planning committee for Code4Lib 2013, but this post reflects my own opinions and not those of the other organizers of the conference.

We have mentioned Code4Lib before on this blog, but for those who are unfamiliar, it is a loose collective of programmers working in libraries, librarians, and others interested in code and libraries. (You can read more about it on the website.) The Code4Lib conference has emerged as a venue to share very new technology and have discussions with a wide variety of people who might not attend conferences more geared to librarians. Presentations at the conference are decided by the votes of anyone interested in selecting the program, and lightning talks and breakout sessions allow wide participation and exposure to extremely new projects that have not yet made it into the literature or to conferences with a longer lead time. The Code4Lib 2013 conference ran February 11-14 at the University of Illinois at Chicago. You can see a list of all programs here, which includes links to the video archive of the conference.

While there were many types of projects presented, I want to focus on the talks that illustrated what I saw as a thread running through the conference–care and emotion. This is perhaps unexpected for a technical conference, yet those themes underlie a great deal of the work that takes place in academic library technology and the types of projects presented at Code4Lib. We tend to work in academic libraries because we care about the collections and the people using those collections. That intrinsic motivation focuses our work.

Caring about the best way to display collections is central to successful projects. Most (though not all) of the presenters and topics came out of academic libraries, and many of the presentations dealt with creating platforms for library and archival metadata and collections. To highlight a few: Penn State University has developed its own institutional repository application called ScholarSphere that provides a better user experience for researchers and managers of the repository. The libraries and archives of the Rock and Roll Hall of Fame dealt with the increasingly common problem of wanting to present digital content alongside more traditional finding aids, and so developed a system for doing so. Corey Harper from New York University presented an extremely interesting and still experimental project that uses linked data to enrich interfaces for interacting with library collections. Note that all these projects combined various pieces of open source software and library/web standards to create solutions to a problem facing academic or research libraries in a particular setting. I think an important lesson for most academic librarians looking at descriptions of projects like this is that it takes more than development staff to make them happen. It takes purpose, vision, and dedication to collecting and preserving content–in other words, emotion and care. A great example of this was the presentation about DIYHistory from the University of Iowa. The project started out as an extremely low-tech solution for crowdsourcing archival transcription, but it became so popular that it required a more robust solution. They were able to adapt open source tools to meet their needs, keeping the project well within the means of most libraries (the code is here).

Another view of emotion and care came from Mark Matienzo, who did a lightning talk (his blog post gives a longer version with more details). His talk discussed the difficulties of acknowledging and dealing with the emotional content of archives, even though emotion drives interactions with materials and collections. The records provided are emotionless and affectless, despite the fact that they represent important moments in history and lives. The type of sharing of what someone “likes” on Facebook does not satisfactorily answer the question of what they care about, or represent the emotion in their lives. Mark suggested that a tool like Twine, which allows writing interactive stories, could approach the difficult question of bringing together the real with the emotional narrative that makes up experience.

One of the ways we express care for our work and for our colleagues is by taking time to be organized and consistent in code. Naomi Dushay of Stanford University Library presented best practices for code handoffs, describing some excellent practices for documenting and clarifying code and processes. One of the major takeaways is that being clear, concise, and straightforward is always preferable, however much we want to create cute names for our servers and classes. To preserve a spirit of fun, you can keep the cute name and attach a description of what the item actually does.

Originally Bess Sadler, also from Stanford, was going to present with Naomi, but she ended up giving a different talk–the last one of the conference–on Creating a Commons (the full text is available here). This was a very moving look at what motivates her to create open source software and how to create better open source software projects. She used the framework of the Creative Commons licenses to discuss open source software: it needs to be “[m]achine readable, human readable, and lawyer readable.” Machine readable means that code needs to be properly structured and allow for contributions from multiple people without breaking; lawyer readable means that the project should have the correct structure and licensing to collaborate across institutions. Bess focused particularly on the “human readable” aspect of creating communities and understanding the “hacker epistemology,” as she so eloquently put it: “[t]he truth is what works.” Part of that understanding requires being willing to reshape default expectations–for instance, the Code4Lib community developed a Code of Conduct at Bess’s urging to underline the fact that the community aims at inclusion and creating a safe space. She encouraged everyone to keep working to do better and “file bug reports” about open source communities.

This year’s Code4Lib conference was a reminder of why I do the work I do as an academic librarian in a technical role. Even though I may spend a lot of time sitting in front of a computer looking at code, or workflows, or processes, I know that work makes access to the collections, and exploration of those collections, better.


What Should Technology Librarians Be Doing About Alternative Metrics?

Bibliometrics–used here to mean statistical analyses of the output and citation of periodical literature–is a huge and central field of library and information science. In this post, I want to address the general controversy surrounding these metrics when they are used to evaluate scholarship, and to introduce the emerging alternative metrics (often called altmetrics) that aim to address some of these controversies, along with how they can be used in libraries. Librarians are increasingly focused on the publishing side of the scholarly communication cycle, as well as on supporting faculty in new ways (see, for instance, David Lankes’s thought experiment of the tenure librarian). What is a reasonable approach to these issues for technology-focused academic librarians? And what tools exist to help?

There have been many articles and blog posts expressing frustration with the practice of using journal impact factors to judge the quality of a journal or an individual researcher (see especially Seglen). One vivid illustration of this frustration is a recent blog post by Stephen Curry titled “Sick of Impact Factors”. Librarians have long used journal impact factors in making purchasing decisions, which is one of the less controversial uses of these metrics. 1 The essential message of all of this research about impact factors is that traditional methods of counting citations or determining journal impact do not answer questions about which articles are influential and how individual researchers contribute to the academy. For individual researchers looking to make a case for promotion and tenure, questions about the use of metrics can be all-or-nothing propositions–hence the slightly hysterical edge in some of the literature. Librarians, too, have become frustrated with attempting to prove the return on investment for decisions–see “How ROI Killed the Academic Library”–since going by metrics alone potentially makes the tools available to researchers more homogeneous and ignores niches. As the altmetrics manifesto suggests, the traditional “filters” in scholarly communication of peer review, citation metrics, and journal impact factors are becoming obsolete in their current forms.

Traditional Metrics

It would be of interest to determine, if possible, the part which men of different calibre [sic] contribute to the progress of science.

Alfred Lotka (a statistician at the Metropolitan Life Insurance Company, famous for his work in demography) wrote these words in reference to his 1926 statistical analysis of the journal output of chemists. 2 Given the tools available at the time, it was a fairly limited sample: he looked at just the first two letters of an author index covering a period of 16 years, compared with a slim 100-page volume of important works “from the beginning of history to 1900.” His analysis showed that the great majority of authors in a field publish only a single article, while a small number of prolific authors account for a large share of the output. As Per Seglen puts it, this showed the “skewness” of science. 3
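Lotka’s result is usually summarized as an inverse-square law (my gloss, not Lotka’s own wording): if a_1 is the number of authors who publish exactly one paper, then the number of authors a_n who publish n papers falls off roughly as

a_n ≈ a_1 / n^2

so doubling the number of papers cuts the number of authors who reach that level to about a quarter.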

The original journal impact factor was developed by Garfield in the 1970s and used the “mean number of citations to articles published in two preceding years” 4. Quite clearly, this is supposed to measure the general amount a journal is cited, and hence serve as a guide to how likely a researcher is to read the journal and find its contents immediately useful in his or her own work. This is helpful for librarians trying to make decisions about how to stretch a budget, but the literature has not found that a journal’s impact has much to do with an individual article’s citedness and usefulness. 5 As one researcher suggests, using it for anything other than its intended original use constitutes pseudoscience. 6 Another issue, with which those at smaller institutions are very familiar, is the cost of accessing traditional metrics. The major resources that provide these are Thomson Reuters’ Journal Citation Reports and Web of Science, and Elsevier’s Scopus, all of which are outside the price range of many schools.
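To make that definition concrete, the standard two-year impact factor for a journal in year Y works out to:

impact factor (Y) = citations received in year Y by items the journal published in years Y-1 and Y-2 ÷ number of citable items the journal published in years Y-1 and Y-2

So, roughly speaking, an impact factor of 3 means that the average recent article in that journal was cited about three times in the measurement year.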

Metrics that attempt to remedy some of these difficulties have been developed. At the journal level, the Eigenfactor® score uses network theory to estimate “the percentage of time that library users spend with that journal”, and the related Article Influence Score™ tracks the influence of the journal’s articles over five years. 7 At the researcher level, the h-index tracks the impact of a specific researcher (it was developed with physicists in mind). A researcher has an h-index of h when h of his or her papers have each been cited at least h times, so the measure combines how many papers a researcher has published with how often they are cited. 8
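Because the h-index is just a counting rule, it is easy to compute if you have a list of per-paper citation counts. Here is a minimal sketch of my own (an illustration, not tied to any particular product or API) in JavaScript:

function hIndex(citations) {
  // Sort the per-paper citation counts from highest to lowest.
  var sorted = citations.slice().sort(function (a, b) { return b - a; });
  var h = 0;
  // h is the largest number such that h papers have at least h citations each.
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}

// Example: five papers cited 10, 8, 5, 4, and 3 times give an h-index of 4.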

Both the Eigenfactor metrics and the h-index are included under the rubric of alternative metrics, since they are alternatives to the JCR, but they still rely on citations in traditional academic journals–something the “altmetric” movement wants to move beyond.

Alt Metrics

In this discussion of alt metrics I will be referring to the arguments and tools suggested by Altmetrics.org. In the altmetrics manifesto, Priem et al. point to several manifestations of scholarly communication that are unlike traditional article publications, including raw data, “nanopublication”, and self-publishing via social media (which was predicted as so-called “scholarly skywriting” at the dawn of the World Wide Web 9). Combined with the readier sharing of traditional articles through open access journals and social media, these all create new possibilities for indicating impact. Yet the manifesto also cautions that we must be sure that the numbers alt metrics collect “really reflect impact, or just empty buzz.” The research done so far is equally cautious. A 2011 study suggests that tweets about articles (“tweetations”) do correlate with citations, but that we cannot say the number of tweets about an article really measures its impact. 10

One criticism expressed in the media is that alternative metrics are no more likely to be able to judge the quality or true impact of a scientific paper than traditional metrics. 11 As Per Seglen noted in 1992, “Once the field boundaries are broken there is virtually no limit to the number of citations an article may accrue.” 12 So an article that is interdisciplinary in nature is likely to do far better in the alternative metrics realm than a specialized article within one discipline that may nonetheless be very important. Mendeley’s list of top research papers demonstrates this–many (though not all) of the top articles are about scientific publication in general rather than about specific scientific results.

What can librarians use now?

Librarians are used to questions like “What is the impact factor of Journal X?” For librarians lucky enough to have access to Journal Citation Reports, this is a matter of looking up the journal and reporting the score. They can answer “How many times has my article been cited?” by searching Web of Science or Scopus, taking some care to catch typos. Alt metrics, however, remind us that these easy answers are not telling the whole story. So what should librarians be doing?

One thing that librarians can start doing is helping their campus community get signed up for the many different services that will promote their research and provide article level citation information. Below are listed a small number (there are certainly others out there) of services that you may want to consider using yourself or having your campus community use. Some, like PubMed, won’t be relevant to all disciplines. Altmetrics.org lists several tools beyond what is listed below to provide additional ideas.

These tools offer various methods for sharing. PubMed allows one to embed “My Bibliography” in a webpage, as well as to create delegates who can help curate the bibliography. A developer can use the APIs provided by some of these services to embed data for individuals or institutions on a library website or institutional repository: ImpactStory has an API that makes this relatively easy, and Altmetric.com also has an API that is free for non-commercial use. Mendeley has many helpful apps that integrate with popular content management systems.
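As a rough sketch of what that might look like–my own hypothetical example, assuming the free Altmetric.com v1 DOI endpoint and a “score” field in its JSON response–here is a Google Apps Script function that looks up attention data for a single article by DOI:

function getAltmetricScore(doi) {
  // Assumed endpoint format for the free, non-commercial Altmetric API.
  var url = "http://api.altmetric.com/v1/doi/" + doi;
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  if (response.getResponseCode() !== 200) {
    return null; // No altmetric data found for this DOI.
  }
  var data = JSON.parse(response.getContentText());
  return data.score;
}

A script like this could run over a list of DOIs in a spreadsheet and write the scores into a column for display on a library website or in an institutional repository.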

Since this is such a new field, it’s a great time to get involved. Altmetrics.org held a hackathon in November 2012 and maintains a Google Doc with the ideas from the hackathon, which gives an interesting overview of what is going on with open source hacking on alt metrics.

Conclusion

The altmetrics manifesto calls for a complete overhaul of scholarly communication–alternative research metrics are just a part of its critique. And yet, for librarians trying to help researchers, those metrics are often the main concern. While science in general calls for a change in the use of these metrics, we can help shape the discussion by educating our communities and using alternative metrics ourselves.

 

Works Cited and Suggestions for Further Reading
Bourg, Chris. 2012. “How ROI Killed the Academic Library.” Feral Librarian. http://chrisbourg.wordpress.com/2012/12/18/how-roi-killed-the-academic-library/.
Cronin, Blaise, and Kara Overfelt. 1995. “E-Journals and Tenure.” Journal of the American Society for Information Science 46 (9) (October): 700-703.
Curry, Stephen. 2012. “Sick of Impact Factors.” Reciprocal Space. http://occamstypewriter.org/scurry/2012/08/13/sick-of-impact-factors/.
“Methods.” 2012. Eigenfactor.org.
Eysenbach, Gunther. 2011. “Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact.” Journal Of Medical Internet Research 13 (4) (December 19): e123-e123.
Gisvold, Sven-Erik. 1999. “Citation Analysis and Journal Impact Factors – Is the Tail Wagging the Dog?” Acta Anaesthesiologica Scandinavica 43 (November): 971-973.
Hirsch, J. E. “An Index to Quantify an Individual’s Scientific Research Output.” Proceedings of the National Academy of Sciences of the United States of America 102, no. 46 (November 15, 2005): 16569–16572. doi:10.1073/pnas.0507655102.
Howard, Jennifer. 2012. “Scholars Seek Better Ways to Track Impact Online.” The Chronicle of Higher Education, January 29, sec. Technology. http://chronicle.com/article/As-Scholarship-Goes-Digital/130482/.
Jump, Paul. 2012. “Alt-metrics: Fairer, Faster Impact Data?” Times Higher Education, August 23, sec. Research Intelligence. http://www.timeshighereducation.co.uk/story.asp?storycode=420926.
Lotka, Alfred J. 1926. “The Frequency Distribution of Scientific Productivity.” Journal of the Washington Academy of Sciences 26 (12) (June 16): 317-324.
Mayor, Julien. 2010. “Are Scientists Nearsighted Gamblers? The Misleading Nature of Impact Factors.” Frontiers in Quantitative Psychology and Measurement: 215. doi:10.3389/fpsyg.2010.00215.
Oransky, Ivan. 2012. “Was Elsevier’s Peer Review System Hacked to Get More Citations?” Retraction Watch. http://retractionwatch.wordpress.com/2012/12/18/was-elseviers-peer-review-system-hacked-to-get-more-citations/.
Priem, J., D. Taraborelli, P. Groth, and C. Neylon. 2010. “Altmetrics: A Manifesto.” Altmetrics.org. http://altmetrics.org/manifesto/.
Seglen, Per O. 1992. “The Skewness of Science.” Journal of the American Society for Information Science 43 (9) (October): 628-638.
———. 1994. “Causal Relationship Between Article Citedness and Journal Impact.” Journal of the American Society for Information Science 45 (1) (January): 1-11.
Vanclay, Jerome K. 2011. “Impact Factor: Outdated Artefact or Stepping-stone to Journal Certification?” Scientometrics 92 (2) (November 24): 211-238. doi:10.1007/s11192-011-0561-0.
Notes
  1. Jerome K. Vanclay,  “Impact Factor: Outdated Artefact or Stepping-stone to Journal Certification?” Scientometrics 92 (2) (2011):  212.
  2. Alfred Lotka, “The Frequency Distribution of Scientific Productivity.” Journal of the Washington Academy of Sciences 26 (12) (1926): 317.
  3. Per Seglen, “The Skewness of Science.” Journal of the American Society for Information Science 43 (9) (1992): 628.
  4. Vanclay, 212.
  5. Per Seglen, “Causal Relationship Between Article Citedness and Journal Impact.” Journal of the American Society for Information Science 45 (1) (1994): 1-11.
  6. Vanclay, 211.
  7. “Methods”, Eigenfactor.org, 2012.
  8. J.E. Hirsch, “An Index to Quantify an Individual’s Scientific Research Output.” Proceedings of the National Academy of Sciences of the United States of America 102, no. 46 (2005): 16569–16572.
  9. Blaise Cronin and Kara Overfelt, “E-Journals and Tenure.” Journal of the American Society for Information Science 46 (9) (1995): 700.
  10. Gunther Eysenbach, “Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact.” Journal Of Medical Internet Research 13 (4) (2011): e123.
  11. see in particular Jump.
  12. Seglen, 637.

Taking Google Forms to the Next Level

Many libraries use Google Forms for collecting information from patrons, particularly for functions like registering for a one-time event or filling out a survey. It’s a popular option because these forms are very easy to set up and start using with no overhead. With a little additional effort and a very small amount of code you can make these forms even more functional.

In this post, we’ll look at the process of adapting a simple library workshop registration form to send a confirmation email, and introduce you to the Google Apps Script documentation. This is adapted from a tutorial for creating a help desk application, which you can see here. I talked about the overall process of creating simple applications for free with minimal coding skills at this year’s LITA Forum, and you can see the complete presentation here. In this post I will focus on the Google Forms tricks.

A few things to keep in mind before you get started. Use a library account when you actually deploy the application, since that account will remain “owned” by the library even if the person who created it moves on. These instructions are also intended for regular “consumer” Google accounts–there are additional tools available for Google Apps business customers, which I don’t address here.

Creating Your Form

Create a form as you normally would. Here’s an example of a simple workshop registration form.

There are a few potential problems with the way this form is set up, but here’s an even bigger problem. Once the person signing up clicks submit, the form disappears, and he receives a page saying “Thank you for registering!”

If this person did not record the workshop, he now has no real idea of what he signed up for. What he intended to do and what he actually did may not be the same thing!

What comes next? You, the librarian hosting the workshop, go into the spreadsheet to see if anyone has signed up. If you want to confirm a sign-up, you can copy the patron’s email address into your email program, and then paste in a message confirming the registration. If you only have a few people signed up, this may not take long, but it adds many unnecessary steps and requires you to remember to do it.

Luckily, Google provides all the tools you need to add an email confirmation function, and it’s not hard to use as long as you know some basic JavaScript. Let’s look at an example.

Adding in an email confirmation

To access these functions, visit your spreadsheet, and click on Script Editor in the Tools menu.

You will see many options you can use, or you can simply create a script for a Blank Project (the first option). You will get this in your blank project:

function myFunction() {

}

Change the name of the function to something meaningful. Now you can fill in the details. Basically, we use the values passed in by the form submission (the e parameter in the next example) to grab the value of each column we want to include and store it in a variable. You just put in the column number–but remember that numbering starts from 0 (which is the Timestamp column in our example).

function emailConfirm(e) {
  // e.values holds the submitted row; index 0 is the Timestamp column.
  var userEmail = e.values[3];
  var firstName = e.values[1];
  var lastName = e.values[2];
  var workshopDate = e.values[4];
  // Send the confirmation message to the address submitted in the form.
  MailApp.sendEmail(userEmail, 
                    "Registration confirmation", 
                    "Thanks for registering for the library workshop on " + workshopDate + " \n\nYou will " +
                    "receive a reminder email 24 hours prior. \n\nLibrary",                    
                    {name:"Library"});
}

The MailApp class is another built-in Google Apps Script service. The sendEmail method takes the following arguments: recipient, subject, body, and an optional object of advanced arguments. You can see in the example above that the userEmail variable (the patron’s email address from the form) is the recipient, the subject is “Registration confirmation”, and the body contains a generic thank you plus the date of the workshop, which we stored in the workshopDate variable. Finally, in the advanced arguments we set the sender name to “Library”–this is optional, particularly if the message is already coming from a library email account.

Note that if a patron hits “reply” to cancel or ask a question, the email will automatically go to the account that deployed the application. If you want replies to go somewhere else, you can add the replyTo option to that last “advanced” argument. (Note that this doesn’t always work–and people can see that the email comes from elsewhere–so make sure that someone is checking the account from which the application is deployed.)

 {name:"Library", replyto:"mheller@dom.edu"});
Running the script

Once you’ve filled in your script and hit save (it will do a quick debug when you save), you have to set up when the script should run. Select “Current script’s triggers…” from the Resources menu.

Now select the trigger “On form submit”. While you’re here, also click on notifications.

The notifications will tell you any time your script fails to run. For your first script, choose “immediately” so you can see what went wrong if it didn’t work. In the future you can select daily or weekly.

Before you can save either your trigger or failure notifications, you need to authorize that Google can run the script for you.
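If you prefer to set up the trigger in code rather than through the menus, a short sketch like the following (my addition, not part of the original tutorial) does the same thing; run it once from the script editor and it will attach the form submit trigger to the emailConfirm function:

function createFormSubmitTrigger() {
  // Attach emailConfirm to this spreadsheet's form submit event.
  ScriptApp.newTrigger("emailConfirm")
    .forSpreadsheet(SpreadsheetApp.getActive())
    .onFormSubmit()
    .create();
}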

Now your script will work! Next time a patron fills out your form to register for a workshop, he will receive this email:

Doing More

After working with this very basic script you can explore the Google Apps Script documentation. If you are working with Google Forms, you will find the Spreadsheet service classes very useful. There are also some helpful tutorials you can work through to learn how to use all the features. This one will teach you how to send emails from the spreadsheet–something you can use when it’s time to remind patrons of which workshops they have signed up for!
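As a rough sketch of that reminder idea–my own illustration, not the linked tutorial, and assuming the workshop date column stores a plain text value–a function like this could be run the day before each workshop:

function sendReminders(workshopDate) {
  // Read every registration row from the active sheet.
  var rows = SpreadsheetApp.getActiveSheet().getDataRange().getValues();
  // Column positions match the earlier example:
  // 0 = Timestamp, 1 = first name, 2 = last name, 3 = email, 4 = workshop date.
  for (var i = 1; i < rows.length; i++) { // Skip the header row.
    if (rows[i][4] === workshopDate) {
      MailApp.sendEmail(rows[i][3],
                        "Workshop reminder",
                        "This is a reminder that you are registered for the " +
                        "library workshop on " + workshopDate + ".\n\nLibrary",
                        {name: "Library"});
    }
  }
}

You could call this manually with the appropriate date, or pair it with a time-driven trigger that works out which workshops are coming up.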


PeerJ: Could it Transform Open Access Publishing?

Open access publication makes access to research free for the end reader, but in many fields it is not free for the author of the article. When I told a friend in a scientific field I was working on this article, he replied “Open access is something you can only do if you have a grant.” PeerJ, a scholarly publishing venture that started up over the summer, aims to change this and make open access publication much easier for everyone involved.

While the first publication isn’t expected until December, in this post I want to examine in greater detail the variation on the “gold” open-access business model that PeerJ states will make it financially viable 1, and the open peer review that will drive it. Both of these models are still very new in the world of scholarly publishing, and require new mindsets for everyone involved. Because PeerJ comes out of funding and leadership from Silicon Valley, it can more easily break from traditional scholarly publishing and experiment with innovative practices. 2

PeerJ Basics

PeerJ is a platform that will host a scholarly journal called PeerJ and a pre-print server (similar to arXiv), publishing biological and medical scientific research. Its founders are Peter Binfield (formerly of PLoS ONE) and Jason Hoyt (formerly of Mendeley), both of whom are familiar with disruptive models in academic publishing. Though the “J” in the title stands for Journal, Jason Hoyt explains on the PeerJ blog that while the journal as such is no longer a necessary model for publication, we still hold on to it: “The journal is dead, but it’s nice to hold on to it for a little while.” 3 The project launched in June of this year, and while no major updates have been posted yet on the PeerJ website, they seem to be moving towards their goal of publishing in late 2012.

To submit a paper for consideration in PeerJ, authors must buy a “lifetime membership” starting at $99. (You can submit a paper without paying, but it costs more in the end to publish it.) The basic membership allows the author to publish one paper in the journal per year. The lifetime membership is only valid as long as you meet certain participation requirements, which at minimum means reviewing at least one article a year. Reviewing in this case can mean as little as posting a comment on a published article. Without that, the author might have to pay the $99 fee again (though it is as yet unclear how strictly PeerJ will enforce this rule). The idea behind this is to “incentivize” community participation, a practice that has met with limited success in other arenas. Each author on a paper, up to 12 authors, must pay the fee before the article can be published. The Scholarly Kitchen blog did some math and determined that for most lab setups, publication fees would come to about $1,124 4, which is comparable to other similar open access journals. Of course, some of those researchers wouldn’t have to pay the fee again; for others, it might have to be paid again if they are unable to review other articles.

Peer Review: Should it be open?

PeerJ, as the name and the lifetime membership model imply, will certainly be peer-reviewed. But, keeping with its innovative practices, it will use open peer review, a relatively new model. Peter Binfield explained in this interview PeerJ’s thinking behind open peer review.

…we believe in open peer review. That means, first, reviewer names are revealed to authors, and second, that the history of the peer review process is made public upon publication. However, we are also aware that this is a new concept. Therefore, we are initially going to encourage, but not require, open peer review. Specifically, we will be adopting a policy similar to The EMBO Journal: reviewers will be permitted to reveal their identities to authors, and authors will be given the choice of placing the peer review and revision history online when they are published. In the case of EMBO, the uptake by authors for this latter aspect has been greater than 90%, so we expect it to be well received. 5

In single blind peer review, the reviewers know the name of the author(s) of the article, but the author does not know who reviewed it. The reviewers can write whatever sorts of comments they want without the author being able to communicate with them. For obvious reasons, this lends itself to abuse, where reviewers might reject articles by people they do not know or like, or tend to accept articles from people they do like. 6 Even reviewers who are trying to be fair can accidentally fall prey to bias when they know the names of the submitters.

Double blind peer review in theory takes away the ability of reviewers to abuse the system. A link that has been passed around library conference planning circles in the past few weeks is JSConf EU 2012, which managed to improve its ratio of female presenters by going to a double-blind system. Double blind is the gold standard of peer review for many scholarly journals. Of course, it is not a perfect system either. It can be hard to obscure the identity of a researcher in a small field in which everyone is working on unique topics. It is also a much lengthier process with more steps involved in the review. For this reason, it is less than ideal for breaking medical or technology research that needs to be made public as soon as possible.

In open peer review, the reviewers and the authors are known to each other. Allowing direct communication between reviewer and researcher speeds up the process of revisions and allows for greater clarity 7. While open peer review doesn’t appear to affect the quality of the reviews or the articles negatively, it does make it more difficult to find qualified reviewers willing to participate, and it might make a less well-known researcher more likely to accept the work of a senior colleague or well-known lab. 8

Given the experience of JSConf and a great deal of anecdotal evidence from women in technical fields, it seems likely that open peer review is open to the same potential abuse as single blind peer review. While open peer review might allow a rejected author to challenge an unfair rejection, this would require that the rejected author feel empowered enough in that community to speak up. Junior scholars who know they have been rejected by senior colleagues may not want to cause a scene that could affect future employment or publication opportunities. On the other hand, if they can get useful feedback directly from respected senior colleagues, that could make all the difference in crafting a stronger article and going forward with a research agenda. Therein lies the dilemma of open peer review.

Who pays for open access?

A related problem for junior scholars exists in open access funding models, at least in STEM publishing. As open access stands now, there are a few different models that are still being fleshed out. Green open access–self-archiving a version of the article in a repository–is free to the author and free to the reader, with the costs borne by grants, institutions, or scholarly societies. Gold open access is free to the end reader but carries a publication fee charged to the author(s).

This situation is very confusing for researchers: when they are confronted with a gold open access journal, they have to be sure the journal is legitimate (Jeffrey Beall’s list of predatory open access journals can aid in this) as well as secure funding for publication. While there are many schemes in place for paying publication fees, there are no well-defined practices with demonstrated long-term viability. Often the fees are covered by grants for the research, but not always. The UK government recently approved a report suggesting that issuing “block grants” to institutions to pay these fees would ultimately cost less, due to reduced library subscription fees. As one article suggests, the practice of “block grants” or other funding strategies is likely to disadvantage junior scholars or those in more marginal fields 9. A multimillion-dollar research grant with a relatively small line item for publication fees for a well-known PI is one thing–what about the junior humanities scholar who has to scramble for a research stipend of a few thousand dollars? If an institution only gets so much money for publication fees, who gets the money?

By offering a $99 lifetime membership for the lowest level of publication, PeerJ offers hope that a junior scholar or graduate student can pursue projects on their own or with a few partners without worrying about how to pay for open access publication. Institutions could more readily afford to pay even $250 a year for highly productive researchers who were not doing peer review than a $1000+ publication fee for each of several articles a year. As noted above, some are skeptical that PeerJ can afford to publish at those rates, but if it is possible, it would help make open access more fair and equitable for everyone.

Conclusion

Open access with a low cost paid up front could be very advantageous to researchers and institutional bottom lines, but only if the quality of articles, peer reviews, and science is very good. It could provide a social model for publication that takes advantage of the web and the network effect for high quality reviewing and dissemination of information, but only if enough people participate. The network effect that made Wikipedia (for example) so successful relies on a high level of participation and engagement very early on (Davis). A community has to build around the idea of PeerJ.

Taking almost the opposite approach, but looking to achieve the same effect, the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) announced this past week that after years of negotiations it is set to convert publishing in that field to open access starting in 2014. 10 This means that researchers (and their labs) would not have to do anything special to publish open access; they would do so by default in the twelve journals in which most particle physics articles are published. The fees for publication will be paid upfront by libraries and funding agencies.

So is it better to start a whole new platform, or to work within the existing system to create open access? If open (and, through a commenting system, ongoing) peer review makes for a lively and engaging network, and low-cost open access makes publication cheaper, then PeerJ could accomplish something extraordinary in scholarly publishing. But until then, it is encouraging that organizations are working from both sides.

  1. Brantley, Peter. “Scholarly Publishing 2012: Meet PeerJ.” PublishersWeekly.com, June 12, 2012. http://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/52512-scholarly-publishing-2012-meet-peerj.html.
  2. Davis, Phil. “PeerJ: Silicon Valley Culture Enters Academic Publishing.” The Scholarly Kitchen, June 14, 2012. http://scholarlykitchen.sspnet.org/2012/06/14/peerj-silicon-valley-culture-enters-academic-publishing/.
  3. Hoyt, Jason. “What Does the ‘J’ in ‘PeerJ’ Stand For?” PeerJ Blog, August 22, 2012. http://blog.peerj.com/post/29956055704/what-does-the-j-in-peerj-stand-for.
  4. http://scholarlykitchen.sspnet.org/2012/06/14/is-peerj-membership-publishing-sustainable/
  5. Brantley
  6. Wennerås, Christine, and Agnes Wold. “Nepotism and sexism in peer-review.” Nature 387, no. 6631 (May 22, 1997): 341–3.
  7. For an ingenious way of demonstrating this, see Leek, Jeffrey T., Margaret A. Taub, and Fernando J. Pineda. “Cooperation Between Referees and Authors Increases Peer Review Accuracy.” PLoS ONE 6, no. 11 (November 9, 2011): e26895.
  8. Mainguy, Gaell, Mohammad R Motamedi, and Daniel Mietchen. “Peer Review—The Newcomers’ Perspective.” PLoS Biology 3, no. 9 (September 2005). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1201308/.
  9. Crotty, David. “Are University Block Grants the Right Way to Fund Open Access Mandates?” The Scholarly Kitchen, September 13, 2012. http://scholarlykitchen.sspnet.org/2012/09/13/are-university-block-grants-the-right-way-to-fund-open-access-mandates/.
  10. Van Noorden, Richard. “Open-access Deal for Particle Physics.” Nature 489, no. 7417 (September 24, 2012): 486–486.

The Digital Public Library of America: What Does a New Platform Mean for Academic Research?

Robert Darnton asked in the New York Review of Books blog nearly two years ago: “Can we create a National Digital Library?” 1 Anyone who recalls reference homework exercises checking bibliographic information for United States imprints versus British or French ones will certainly remember that the United States does not have a national library in the sense of a library that collects all the works of the country and creates a national bibliography. 2 Certain libraries, such as the Library of Congress, have particular prerogatives for collection and for the dissemination of standards 3, but there is no one library that creates a national bibliography. So it was for print, and so it remains, even more so, for digital. When Darnton asks that question–as he goes on to illuminate further in his article–he is asking a much larger question about libraries in the United States. European and Asian countries have created national digital libraries as part of, or in addition to, their national print libraries. The question is: if others can do it, why can’t we? Furthermore, why can’t we join those libraries with our national digital library? The DPLA has announced a collaboration with Europeana, which has already had notable successes in digitizing content and making it and its metadata freely available. This indicates that we could potentially create a useful worldwide digital library, or at least a North American/European one. The dream of Paul Otlet’s universal bibliography seems once again to be just out of reach.

In this post, I want to examine what the Digital Public Library of America claims it will do, and what approaches it is taking. It is still too new, and there are still too many unanswered questions, to give any sort of final answer to whether this will actually be the national digital library. Nonetheless, there seems to be enough traction and, perhaps more importantly, funding that we should pay close attention to what is delivered in April 2013.

Can we reach a common vision about the nature of the DPLA?

The planning for the DPLA started in the fall of 2010, when Harvard’s Berkman Center received a grant from the Sloan Foundation to begin planning the project in earnest. The initial idea was to digitize all the materials it was legal to digitize and create a platform that would be accessible to all people in the US (or nationally). Google had already proved that this was possible, so it seemed conceivable that many libraries working together could repeat Google’s successes, but with solely non-commercial motives. 4

The initial stages of planning brought out many different ideas and perspectives about the philosophical and practical components of the DPLA, many of which remain unresolved. The themes of debate that have emerged are whether the DPLA would be a true “public” library, and what in fact ought to be in such a library. David Rothman argues that the DPLA as described by Darnton would be a wonderful tool for making humanities research easy and viable for more people, but would not solve the problems of making popular e-books accessible through libraries or getting students up-to-date textbooks. The latter two aims are much more challenging than providing access to public domain or academic materials because a lot more money is at stake 5.

One of the projects for the Audience and Content workstream is to figure out how average Americans might actually use a digital public library of America. One of the potential use cases is a student who can use the DPLA alone to write a whole paper on the Iroquois Nations. Teachers and librarians posted some questions about this in the comments, including questioning whether it is appropriate to tell students to use one portal for all research. We generally counsel students to check multiple sources–and getting students used to searching one place that happens to be appropriate for one topic may not work if the DPLA has nothing available on, say, the latest computer technology.

Digital content and the DPLA

What content the DPLA will provide will surely become clearer over the coming months. They have appointed Emily Gore as Director of Content, and continue to hold working groups on content and audience. The DPLA website promises a remarkable vision for content:

The DPLA will incorporate all media types and formats including the written record—books, pamphlets, periodicals, manuscripts, and digital texts—and expanding into visual and audiovisual materials in concert with existing repositories. In order to lay a solid foundation for its collections, the DPLA will begin with works in the public domain that have already been digitized and are accessible through other initiatives. Further material will be added incrementally to this basic foundation, starting with orphan works and materials that are in copyright but out-of-print. The DPLA will also explore models for digital lending of in-copyright materials. The content that is contributed to or funded by the DPLA will be made available, including through bulk download, with no new restrictions, via a service available to libraries, museums, and archives in the United States, with use and reuse governed only by public law.  6

All of these models exist in one way or another already, however, so how is this something new?

The major purveyors of out of copyright digital book content are Google Books and HathiTrust. The potential problems with Google Books are obvious just in the name–Google is a publicly traded company with aspirations to be the hub of all world information. Privacy and availability, not to mention legality, are a few of the concerns. HathiTrust is a collective of research universities digitizing collections, many in concert with Google Books, but the full text of these books in a convenient format is generally only available to members of HathiTrust. HathiTrust faced a lawsuit from the Authors Guild about its digitization of orphan works, which is an issue the DPLA is also planning to address.

Other projects are trying to make in-copyright digital books more accessible, of which Unglue.it is probably the best known. It requires a critical mass of people willing to pay to release a book into the public domain, and so may not serve the scholar with a unique research project. Some future plans for the DPLA include obtaining funds to pay authors for use–but this may or may not include releasing books into the public domain.

The DPLA is not meant to include books alone; planning so far simply suggests that books make a logical jumping-off point. The “Concept Note” points out that “if it takes the sky as its limit, it will never get off the ground.” Despite this caution, ideally it would eventually be a portal to all types of materials already made available by cultural institutions, including datasets and government information.

Do we need another platform?

The first element of the DPLA is code–it will use open source technologies in developing a platform, and will release all code (and the tools and services built on that code) as open source software. The so-called “Beta Sprint” that took place last year invited people to “grapple, technically and creatively, with what has already been accomplished and what still need to be developed…” 7 The winning “betas” deal largely with issues of interoperability and linked data. Certainly if a platform could be developed that solved these problems, this would be a huge boon to the library world.

Getting involved with the DPLA and looking to the future

While the governance structure is becoming more formal, there are plenty of opportunities to become involved with the DPLA. Six working groups (called workstreams) were formed to discuss content, audience, legal issues, business models, governance, and technical issues. Becoming involved with the DPLA is as easy as signing up for an account on the wiki and noting your name and comments on the page of the working group in which you are interested. You can also sign up for mailing lists to stay involved in the project. Like many such projects, the work is done by the people who show up and speak up. If you read this and have an opinion on the direction the DPLA should take, it is not difficult to make sure your opinion gets heard by the right people.

Like all writing about the DPLA since the planning began, turning to a thought experiment seems the next logical rhetorical step. Let’s say that the DPLA succeeds to the point where all public domain books in the United States are digitized and available in multiple formats to any person in the country, and a significant number of in copyright works are also available. What does this mean for libraries as a whole? Does it make public libraries research libraries? How does it change the nature of research libraries? And lastly, will all this information create a new desire for knowledge among the American people?

References
  1. Darnton, Robert. “A Library Without Walls.” NYRblog, October 4, 2010. http://www.nybooks.com/blogs/nyrblog/2010/oct/04/library-without-walls/.
  2. McGowan, Ian. “National Libraries.” In Encyclopedia of Library and Information Sciences, Third Edition, 3850–3863.
  3. “Frequently Asked Questions – About the Library (Library of Congress).” n.d. http://www.loc.gov/about/faqs.html#every_book
  4. Dillon, Cy. “Planning the Digital Public Library of America.” College & Undergraduate Libraries 19, no. 1 (March 2012): 101–107.
  5. Rothman, David H. “It’s Time for a National Digital-Library System.” The Chronicle of Higher Education, February 24, 2011, sec. The Chronicle Review. http://chronicle.com/article/Its-Time-for-a-National/126489/.
  6. “Elements of the DPLA.” Digital Public Library of America, n.d. http://dp.la/about/elements-of-the-dpla/.
  7. “Digital Public Library of America Steering Committee Announces ‘Beta Sprint’ ”, May 20, 2011. http://cyber.law.harvard.edu/newsroom/Digital_Public_Library_America_Beta_Sprint.

Creating quick solutions and having fun: the joy of hackathons

Women Who Code

Photo credit Adria Richards, used CC BY-SA 2.0

Hackathons–aka “hackfests”, “codefests”, or “codeathons”–are time periods dedicated to “hacking” on a problem, or creating a quick and dirty technical solution. (They have nothing to do with “hackers” in the virus-writing or breaking-into-computers sense of the word.) Traditionally, hackathons gave developers a chance to meet in person to work on specific technologies or platforms. But increasingly, the hackathon concept is used to solve technical problems or develop new ideas using technology in fields such as law, public data, water supply, and making the world a better place. Academic librarians should be thinking about hackathons for several reasons: first, we help researchers learn about innovative tools and resources in their areas, and these days a lot of this work is happening in hackathon settings. Second, hackathons often improve library technology in open source and proprietary products alike. And third, hackathons are sometimes taking place in academic libraries (such as the University of Michigan and the University of Florida). Even non-coders can and should keep an eye on what’s going on with hackathons and start getting involved.

Origins of hackathons

People have, of course, hacked at technical problems and created innovative technical solutions since the beginning of computing. But the first known use of the term “hackathon” to describe a specific event was in June of 1999 when a group of OpenBSD developers met in Calgary to work on cryptography (see more on the record of OpenBSD hackathons). Later that same month, Sun Microsystems used the term on a Palm V project. 1 Just as in a marathon, individuals came together to accomplish a very challenging project in a short and fixed amount of time.

The term and concept became increasingly popular over the course of the first decade of the 2000s. The format can vary widely, but a hackathon is usually understood to mean a short time period (often a weekend) during which a specific problem is addressed by a group of developers working together–often individually, but in close enough proximity to meet and discuss issues. Hackathons are usually in-person events where everyone meets in one location, but they can also be distributed virtual events. Often hackathons have prizes for the best solution and give developers a chance to show off their talent to potential employers–sometimes companies sponsor them specifically to find new employees. But they can also be an opportunity for incubating new and still-learning developers (Layer 7).

Hackathons can be organized around an existing open source software community, but they also frequently take place within a company to give developers a chance to come up with innovative ideas. One notable example is Facebook. In his post, Pedram Keyani describes the excitement that regular hackathons provide for Facebook’s engineers by giving them a chance to work on an idea without worrying about whether it scales to 900 million people. After the hackathon, developers present their prototypes to the rest of the team and have two minutes to make the case that their idea should become part of Facebook. Features developed during hackathons include the “Like” button and the ability to tag users in comments–huge pieces of functionality that might not be there without hackathons.

Hackathons in library technology

The first library technology hackathon we know about happened at the Access 2002 conference, and was modeled after PyCon code sprints (Art Rhyno, email message to author, July 18, 2012). The developers at this hackathon worked on projects related to content management systems for cultural content, citation digests, and EZProxy tools. Since then, each Access conference has had a hackathon as part of the conference. The Code4Lib conference has also had elements of hackathons (often as pre-conferences) throughout the years.

Another example of hackathons is those sponsored by library vendors to promote the use of their products’ APIs. Simply put, APIs are ways that data can move between platforms or programs so that you can create new tools with pieces of data from other systems. In 2008, OCLC sponsored a hackathon in New York City where they provided special access to various pieces of WorldCat and other OCLC products. Staff from OCLC were on hand to answer questions and facilitate breakout sessions. Hacks included work with controlled vocabularies, “find more like this” recommendation services, and several other items (Morgan). Eric Morgan, one of the participants, described the event as a success partly because it was a good example of how librarians can take control of their vendor-provided tools by learning how to get the data out and use it in other ways.

How to get involved with hackathons

It’s easy to be discouraged or overwhelmed by the idea of participating in a hackathon if you are new to the open source software world. First of all, it’s important to remember that librarians who work with technology on a daily basis have a lot of ideas about how to improve the tools in their libraries. One example is the set of ideas submitted for the Access 2011 Hackfest, which included bookmarklets, augmented reality in the library, and using iPads for self-checkout, among many others. Reading that list may jog your own memory for tools you would love to see in your library but haven’t had a chance to work on yet or don’t completely understand.

But how do you take those ideas and get involved with fellow developers who can help complete those projects? Many resources exist to help with this, but there are a few specifically geared toward hackathons. First, OpenHatch is an open source project with the mission of making it easier to participate in open source software. One feature helpful to those just starting out is the “Training Missions,” which walk through basic skills you need, such as working on the command line and using version control systems. Another area of OpenHatch lists projects suitable for beginners and information on how non-coders can participate in projects. Keep an eye on the events listed there to find ones geared to beginners or people still learning. Another resource for finding out about and signing up for hackathons is Hackathon.io.

Try to participate in a hackathon at the next technical library conference you attend. You can also start small by meeting up with librarians in your area for a very informal library technology hackathon. Make sure that you document what you work on and what the results were. Don’t worry about having judges or prizes–just make it a fun and collaborative event that allows everyone to participate and learn something new. You don’t need to create something new, either. This could be a great opportunity to learn how to work all the bells and whistles of a vendor platform or a social media tool.

Don’t worry–just start hacking

You can approach hackathons in whatever way works for you. For some, hackathons provide the excitement of competing for prizes or great jobs by staying up all night coding amongst fellow developers. If the idea of staying up all night looking at a computer screen leaves you cold, don’t worry. In an April blog post, Andromeda Yelton shared her experience attending her first hackathon, and encouraged those new to this type of event to “sit at the table”, both physically and by understanding that they have something to contribute even if they are not experts. She suggests that the minimum it should take to be involved in hackathons or similar projects is “interest, aptitude… [and a] drive to contribute.” (Yelton)

There are a lot of problems out there in the library world. Hackathons show us that sometimes all it takes is a weekend to get closer to a solution. But don’t worry about solving all the problems. Just pick the one you are most concerned about, find some friends, and start hacking on it.

Works cited
Layer 7 Technologies. “How to Run a Successful Hackathon for Your Open APIs”. July 12, 2012. http://www.slideshare.net/rnewton/how-to-run-a-successful-hackathon-for-your-open-apis.
Morgan, Eric Lease. “WorldCat Hackathon « Infomotions Mini-Musings.” Infomotions Mini-Musings, November 9, 2008. http://infomotions.com/blog/2008/11/worldcat-hackathon/.
Yelton, Andromeda. “My First Hackathon; or, Gender, Status, Code, and Sitting at the Table.” Across Divided Networks, April 6, 2012. http://andromedayelton.com/blog/2012/04/06/my-first-hackathon-or-gender-status-code-and-sitting-at-the-table/.
  1. This information comes from Wikipedia, but does not have a citation and I am unable to independently verify it. This is presented as common knowledge in a variety of sources, but not cited.

Linked Data in Libraries: Getting into the W3C Library Linked Data Incubator Group

What are libraries doing (or not doing) about linked data? This was the question that the W3C Library Linked Data Incubator Group investigated between May 2010 and August 2011. In this post, I will take a look at the final report of the W3C Library Linked Data Incubator Group (October 2011) and provide an overview of its recommendations along with my own analysis of the issues. Incubator Groups were a program that the W3C ran from 2006 to 2012 to get work done quickly on innovative ideas that were not yet mature enough for work to begin on creating the web standards for which the W3C exists. (The Incubator Group program has transitioned into Community and Business Groups.)

In this report, the participants in the group made several key recommendations aimed at library leaders, library standards bodies, data and systems designers, and librarians and archivists. The recommendations indicate just how far we are from being able to implement open linked data in every library, but they also reveal the current landscape.
Library Leaders

An illustration of the VIAF authority file for Jane Austen

The report calls on library leaders to identify potentially very useful sets of data that can be exposed easily using current practices. That is, they should not try to revolutionize workflows, but evolve toward more linked data. The report mentions authority files as an example of a data set that is ideal for this purpose, since authority files are lists of real-world people with attributes that connect to real things. Having some semantic context for authority files helps–we could imagine a scenario in which you are searching for a common name, but the system recognizes that you are searching for a twentieth-century American author and so does not show you a sixteenth-century British author. Catalogers don’t necessarily have to do anything differently, either, since these authority files can link to other data to make a whole picture. VIAF (Virtual International Authority File) is a joint project of OCLC and several national libraries to create such an international authority file using linked data and bring it into the semantic web.
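
To make that scenario concrete, here is a minimal sketch using Python’s rdflib library of how structured attributes can distinguish two authors who share a name. The namespace, the identifiers, and the second author are invented for illustration and are not taken from VIAF.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical namespace and identifiers, invented for this example.
EX = Namespace("http://example.org/authority/")
SCHEMA = Namespace("http://schema.org/")

g = Graph()

# Two people who share a name, distinguished by structured attributes.
people = [
    ("austen-jane-1775", "1775", "British"),   # the novelist
    ("austen-jane-1950", "1950", "American"),  # invented modern author
]
for person_id, birth_year, nationality in people:
    person = EX[person_id]
    g.add((person, RDF.type, SCHEMA.Person))
    g.add((person, SCHEMA.name, Literal("Jane Austen")))
    g.add((person, SCHEMA.birthDate, Literal(birth_year)))
    g.add((person, SCHEMA.nationality, Literal(nationality)))

# A catalog could now filter on birthDate or nationality instead of
# asking the searcher to disambiguate a bare text string.
print(g.serialize(format="turtle"))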

Library leadership must face the issue of rights in an open data world. It is a trope that libraries hold much valuable cultural and bibliographic data. Yet in many cases we have purchased or leased this data from a vendor rather than creating it ourselves (certainly we do in the case of indexes, and often with catalog records)–and the license terms may not allow for open sharing of the data. We must be aware that exposing linked data openly is probably not going to mesh well with the way we have done things traditionally. Harvard recently released 12 million bibliographic records under a CC0 (public domain) license. Many libraries might not be in a position to release their own bibliographic records if they did not create them originally. The same goes for indexes and bibliographies, other categories of traditional library materials that seem ripe for linking semantically. Library leadership will have to address this before open linked data is truly possible.
Library Standards Bodies

The report calls on library standards bodies to attack the problem from both sides. First, librarians need to be involved with standardizing semantic web technologies in a way that meets their needs and ensures that the library world stays in line with the way the technology is moving generally. Second, creators of library data standards need to ensure that those standards are compatible with semantic web technologies. Library data, when encoded in MARC, combines meaning and structure in a single unit. This works well for people who are reading the data, but is not easy for computers to parse semantically. For instance, consider:

245 10|aPride and prejudice /|cby Jane Austen.
which, viewed in a browser or on a catalog card, displays as:

Pride and prejudice /
by Jane Austen.

The 245 tells us that this is the title statement. The first indicator 1 tells the system to make an added entry for the title, and the second indicator 0 tells us that the title begins with no nonfiling characters–that is, it does not start with an article such as “The”. The |a gives the actual title, followed by a / character, and then the |c is the statement of responsibility, followed by a period. Note that semantic meaning is mixed together with punctuation and words that are helpful for people, such as “by”, which follow the rules of AACR2. There are good reasons for these rules, but the rules were meant to serve the information needs of humans. Given the capabilities of computers to parse and present structured data meaningfully to humans, it seems vital to make library data understandable to computers so that we can use it to make something more useful to people.

You may have noticed that HTML has changed over the past few years in the same way that library data will have to change. If you, for instance, want to give emphasis to a word, you use the <em></em> tags. People know the word is emphasized because it’s in italics; the computer knows it’s emphasized because you told it so. Indicating that a word should be italicized using the <i></i> tags looks the same to a human reader, who can understand the context for the use of italics, but doesn’t tell the computer that the word is particularly important. HTML5 makes even more use of semantic tags so that the standard ways of presenting information on the web are meaningful to computers.
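
As a rough illustration of what a computer has to do with that field, here is a short Python sketch. It is not a real MARC parser (real records use control characters, and libraries such as pymarc exist for this); it simply assumes the pipe-delimited notation shown above and strips the ISBD punctuation that was added for human readers.

# Parse the simplified, pipe-delimited 245 field shown above.
field = "245 10|aPride and prejudice /|cby Jane Austen."

tag, indicators = field[:3], field[4:6]
subfields = {}
for chunk in field.split("|")[1:]:
    code, value = chunk[0], chunk[1:]
    # Strip the trailing ISBD punctuation (" /", ".") meant for human eyes.
    subfields[code] = value.rstrip(" /.").strip()

print(tag, indicators)   # 245 10
print(subfields["a"])    # Pride and prejudice
print(subfields["c"])    # by Jane Austen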

Systems Designers
The recommendations for data and systems designers are to start building tools that use linked data. Without a “killer app,” it’s hard to get excited about semantic technologies. Just after my last post went up, Google released its “Knowledge Graph”. This search takes words that traditionally would be matched as strings and matches them with “things.” For instance, if I type the search string Lincoln Hall into Google, it guesses that I probably mean a concert venue in Chicago with that name and shows me that as the first result. It also displays a map, transit directions, reviews, and an upcoming schedule in the sidebar–certainly very convenient if that’s what I was looking for. But below the results for the concert venue, I get a box stating “See results about Lincoln Hall, Climber.” When I click on this, my results change to information about the Australian climber who recently died, and the sidebar changes to information about him. Now, as a librarian, I know there would have been many ways to improve my search. But semantic web technologies allow Google’s algorithms to understand that, despite sharing a name, a concert venue and a mountaineer are very different entities, which neatly disposes of the need for sophisticated search strategies when all you want is a fact about a thing. Whether this is, indeed, revolutionary remains to be seen. But try it as a user–you might be pleasantly surprised by how it makes your search easier. It may be that web-scale discovery will do the same thing for libraries, but that is a tool that remains out of reach for many libraries.
Librarians and Archivists
Librarians and archivists have, as always, a duty to collect and preserve linked data sets. We know how valuable the earliest examples of any form of data storage are–whether a clay tablet, a book, or an index. We create bibliographies to see how knowledge changed over time or in different contexts. We need to be careful to preserve important data sets currently being produced, and to maintain them over time so they remain accessible for future needs. But there is another danger inherent in not being scrupulous about data integrity. Maintaining accurate and diverse data sets will help keep future information factual and unbiased. When a fact is one step removed from its source, it becomes even more difficult to check for accuracy. While outright falsehood or misstatement is possible to correct, it will also be important to present alternate perspectives to ensure that scholarship can progress. (For an example of the issues in presenting only the most mainstream understanding of history, see “The ‘Undue Weight’ of Truth on Wikipedia”.) If linked data doesn’t help us find out anything novel, what was the point of linking it?
Conclusion 
If you haven’t yet read it, the report is a quick read and clear to people without a technical background, so I encourage you to take a look at it, particularly the use cases and the data sets already extant. I hope you will get excited about the possibilities, and even if you are not in a position to use linked data yet, start thinking about what the future could hold. As I mentioned in my last post, the LODLAM (International Linked Open Data in Libraries, Archives, and Museums Summit) blog and the Digital Library Federation-sponsored LOD-LAM Zotero group have lots of resources. There is also an ALA Library Linked Data Interest Group, which sponsors discussions and has a mailing list.

Real World Semantic Web?: Facebook’s Open Graph Protocol

Original image available at https://developers.facebook.com/docs/opengraph/

Librarians need to understand what the semantic web is and how to use it, but this can be challenging. While the promise of the semantic web has existed for over a decade, to the uninitiated there may not seem to be many implementations that are accessible to the average person.

One implementation that most people use daily is Facebook’s Open Graph Protocol, which is Facebook’s version of the semantic web. It is a useful example for illustrating the ideas behind the semantic web and linked data. And because libraries and other cultural institutions want and need to make their data open, while Facebook’s openness is highly questionable, it also illustrates some of the potential problems with linked data that isn’t open. There is much great work being done in the library world with the semantic web and linked data, which will be addressed in more detail in future posts.

The Semantic Web and Linked Data

The “semantic web” describes a web where data is understood by computers in some of the same ways humans understand it. Tim Berners-Lee illustrates this wonderfully in his 2001 Scientific American article with a future in which the diagnosis of a family member with cancer is made easier by the smart device which can find the most appropriate specialist in a convenient location at a convenient time, with very little work on the part of the searcher. This is only possible, however, when data is semantically meaningful. Open hours for a doctor (or a library) written on a website mean something to a human, but very little to a computer. Once those hours are structured in a way that can be made meaningful, the computer can tell you if the doctor’s office is open–and if it has access to your calendar, what you have to cancel to go there.
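
As a toy illustration of the difference structure makes, here is a short Python sketch with invented hours: once open hours are stored as data rather than as a sentence on a web page, a program can answer the question “is it open right now?”

from datetime import datetime

# Invented example: opening hours stored as structured data rather than
# as prose like "Open weekdays 9 to 5" that only a human can interpret.
OPEN_HOURS = {
    0: (9, 17),  # Monday: 9:00-17:00
    1: (9, 17),  # Tuesday
    2: (9, 17),  # Wednesday
    3: (9, 17),  # Thursday
    4: (9, 17),  # Friday
    # Saturday (5) and Sunday (6): closed, so no entry
}

def is_open(when):
    hours = OPEN_HOURS.get(when.weekday())
    if hours is None:
        return False
    start, end = hours
    return start <= when.hour < end

print(is_open(datetime.now()))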

Linking data takes this a step further and makes it possible to connect data sets, to avoid, as the W3C says, “a sheer collection of datasets”. Berners-Lee outlines the steps that need to be followed to create linked data in a 2006 post: use uniform resource identifiers (URIs) as names for things, use HTTP URIs so those names can be looked up, provide useful information in a standard format such as RDF when a URI is looked up, and link to additional URIs with related information. A 2010 follow-up points out that to be linked open data, the data must be presented with a license that allows free, unimpeded use, such as the Creative Commons CC-BY license. Such data doesn’t have to be structured in any particular way as long as it’s open. He says that “…you get one (big!) star if the information has been made public at all, even if it is a photo of a scan of a fax of a table — if it has an open licence.” But “five-star” linked open data meets all of the above requirements as well.
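
To see what “looking up” an HTTP URI means in practice, here is a hedged Python sketch using the requests library. The DBpedia URI is just one example of a linked data server; which RDF formats a given server returns, and whether it redirects, will vary.

import requests

# Example resource URI; DBpedia is one well-known linked data source.
uri = "http://dbpedia.org/resource/Jane_Austen"

# Content negotiation: ask the server for RDF (Turtle) instead of HTML.
response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)

print(response.status_code)
print(response.headers.get("Content-Type"))
print(response.text[:500])  # the first few hundred characters of the RDF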

Facebook’s Open Graph Protocol

Moving into a different world, let’s consider what the semantic web and linked data look like at Facebook. First, it is interesting to consider what Facebook was before it was semantic. When Facebook started in 2004, you could make a list of things you “liked”. You might have said you “liked” the movie Clueless and “liked” running, but these were just lists that would let others in your college classes know a few facts about you the next time you saw them in class or at a party. In theory you could use these lists to find others who shared your interests, but this required a person to understand which interests matched each other.

But starting in 2010 these “likes” took on a real semantic meaning. Suddenly “liking” the movie Clueless meant that, among other things, the owners of the “Clueless” identity on Facebook could send you marketing announcements directly. In addition, you could “like” content entirely outside of Facebook, as long as the website used the correct markup to speak to Facebook, thus linking content together with people. Unlike Facebook’s earlier Beacon scheme, this made it easier to understand how you were exposing yourself to advertisers and to control privacy and sharing, though it still left people troubled.

In late 2011/early 2012 Facebook opened up this system even more to third-party developers, which went along with the new Facebook Timeline. Now any person could perform any verb with any application. So “Margaret read a book on Goodreads” or “Margaret listened to a song on Spotify”–real-world actions–turn into semantically meaningful statements on my Facebook Timeline. As long as the user authorizes the application, it can grab the information about the object from the web page and show the user’s interaction with it.

Developing for the Open Graph

The Open Graph protocol was developed based on the idea of the “social graph”, which represents the connections between people and the types of relationships they have with each other. In the Facebook universe, this includes the relationships people have with other types of entities, such as media, products, and companies. Facebook developed the protocol to give websites a quick and easy way to include semantically meaningful data. It is based on RDFa, a way of embedding the RDF linked data standard in web page markup, and includes basic and optional metadata, as well as different types of structured data about objects, of which music and videos are the most well-defined.
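
To get a rough sense of what that markup looks like from the consuming side, the following Python sketch (using the requests and BeautifulSoup libraries) pulls the og: properties out of a page’s meta tags. The URL is just a placeholder for any page that publishes Open Graph markup, and some sites may block automated requests or require a User-Agent header.

import requests
from bs4 import BeautifulSoup

# Substitute any page that publishes Open Graph markup.
url = "http://www.example.com/some-page"

html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Open Graph data lives in <meta property="og:..." content="..."> tags.
og_data = {
    tag["property"]: tag.get("content", "")
    for tag in soup.find_all("meta")
    if tag.get("property", "").startswith("og:")
}

for prop, value in og_data.items():
    print(prop, value)   # e.g. og:title, og:type, og:url, og:image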

To see the Open Graph in action, simply replace “www” with “graph” at the beginning of any Facebook page. For instance, let’s take a look at my own library’s information at http://graph.facebook.com/rebeccacrownlibrary. You can see that this page describes a library, and get our phone number, physical location, and open hours. Most important, a computer viewing this page can understand this information. For complete details, see the Graph API documentation–even for non-developers this is interesting; for instance, find out how to get the URL for your current profile picture to embed in other sites. To get access to this information, you can use various methods, including the Facebook Query Language.
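
Here is a minimal sketch of reading that public information programmatically with Python’s requests library. Facebook has tightened access over time, and many Graph API calls now require an access token, so treat this as illustrative rather than guaranteed to work as written.

import requests

# The same lookup described above: swap "www" for "graph".
url = "http://graph.facebook.com/rebeccacrownlibrary"

response = requests.get(url, timeout=30)
data = response.json()   # the Graph API returns JSON

# Which fields appear depends on what the page makes public
# (and on Facebook's current API rules).
for field in ("name", "category", "phone", "hours", "location"):
    if field in data:
        print(field, data[field])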

Of course, you only get access to this information if it’s explicitly made public by the page. For anything beyond that, applications must use authentication in order to access more. Linking information from outside of Facebook is one way only–you can’t pull very much at all out of Facebook into the open web. Note that, for instance, Google searches will pull up only basic information from a Facebook page rather than any content that page has posted.

Outside of Facebook–How “Open” is the Open Graph?

It is precisely this closed effect that has a lot of people worried about Facebook’s implementation of the semantic web. In 2007, Brad Fitzpatrick described the problems inherent in implementations of the “social graph” on the web: standards were quirky, non-interoperable, and usually completely walled off. The solution he proposed was a Social Graph API that would create a social graph outside of any one company, belonging to all. This would allow people to find friends and connections without signing up for additional services or relying on Facebook or any other company. Fitzpatrick did later create a Social Graph API, which Google recently pulled out of its products. Some of the problems of an open social graph are familiar to librarians: people are hesitant to share too much information with just anyone about with whom they associate, what they like, and what they think (Prodromou). The great boon for advertisers in social networking services is that, inside walled gardens with reasonable privacy controls, people are willing to share much more information. Thus the walled garden of Facebook, inaccessible to Google, means that valuable social data is inaccessible. It is perhaps not coincidental that around the same time Google stopped supporting the open Social Graph API, it released an API for its own social networking service, Google Plus.

Concerns remain that the Open Graph is not actually open, and in particular that it uses the open standard of RDF to ingest content but not to share it (Turenhout). The Open Graph Protocol website states that a variety of large websites publish pages with Open Graph markup, which is ingested by Facebook (of course), Google, and mixi. It remains unclear how widely this particular standard will be adopted outside of Facebook.

Conclusion

Whether or not you think you have any idea what linked data is, any time you click a “like” button on a website or sign up for a social sharing app in Facebook, you are participating in the semantic web. But every time that data link goes behind a Facebook wall, it fails to be open linked data. Just as librarians have always worked to keep the world’s knowledge available to all, we must continue to ensure that potentially important linked data is kept open as well–and with no commercial motive. The LODLAM Summit has outlined, and continues to work on, what linked open data looks like for libraries, archives, and museums. The W3C Library Linked Data Incubator Group released its final report in fall 2011, which provides a thorough overview of the roles and responsibilities of libraries in the world of linked open data. There is a lot of possibility in this area, and the future openness of the world wide web may very well depend on the actions taken right now.

In a future post, we will examine some specific examples of work being done in the library world around the semantic web and linked data.

Works Cited

Axon, Samuel. “Facebook’s Open Graph Personalizes the Web.” Mashable, April 21, 2010. http://mashable.com/2010/04/21/facebook-open-graph/.
Berners-Lee, Tim, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American 284, no. 5 (May 2001): 34. doi: 10.1038/scientificamerican0501-34
Berners-Lee, Tim. “Linked Data.” Design Issues, July 27, 2006. http://www.w3.org/DesignIssues/LinkedData.html.
Fitzpatrick, Brad, and David Recordon. “Thoughts on the Social Graph.” Bradfitz.com, August 17, 2007. http://bradfitz.com/social-graph-problem/.
Geron, Tomio. “Facebook Expands Open Graph To 60 New Apps, Many More Coming.” Forbes.com (January 18, 2011): 20.
Giles, Jim. “If Facebook Likes the Semantic Web, You’ll Love It.” New Scientist, July 31, 2010.
Iskold, Alex. “Social Graph: Concepts and Issues.” Read Write Web, September 11, 2007. http://www.readwriteweb.com/archives/social_graph_concepts_and_issues.php.
Mitchell, Jon. “Google Plus Releases APIs for Search, +1s and Comments.” Read Write Web, October 4, 2011. http://www.readwriteweb.com/archives/google_plus_releases_apis_for_search_1s_and_commen.php.
Prodromou, Evan. “On the Social Graph API.” Evan Prodromou: His Life and Times, February 21, 2012. http://evanprodromou.name/2012/02/21/on-the-social-graph-api/.
Turenhout, Ryanne. “Harry Halpin on the Hidden History of the ‘Like’ Button.” Institute of Network Cultures, March 10, 2012. http://networkcultures.org/wpmu/unlikeus/2012/03/10/harry-halpin-on-the-hidden-history-of-the-like-button/.

Personal Data Monitoring: Gamifying Yourself

The academic world has been talking about gamification of learning for some time now. The 2012 Horizon Report says gamification of learning will become mainstream in two to three years. Gamification taps into the innate human love of narrative and of displaying accomplishments. Anyone working through Code Year is personally familiar with the lure of the green bar that tells you how close you are to your next badge. In this post I want to address a related but slightly different topic: personal data capture and analytics.

Where does the library fit into this? One of the roles of the academic library is to help educate and facilitate the work of researchers. Effective research requires collecting a wide variety of relevant sources, reading them, and saving the important information for the future. The 2010 book Too Much to Know by Ann Blair describes the note-taking and indexing habits taught to scholars in early modern Europe. Keeping a list of topics and sources was a major focus of scholars, and the resulting notes and indexes were published in their own right. Nowadays maintaining a list of sources is easier than ever with the many tools to collect and store references–but challenges remain, due among other things to the abundance of sources and the pressure to publish.

New Approaches and Tools in Personal Data Monitoring

Tracking one’s daily habits, reading lists, and other personal information is a very old human practice. Understanding what you are currently doing is the first step in creating better habits, and technology makes it easier to collect this data. Stephen Wolfram has been using technology to collect data about himself for nearly 25 years, and he posted some visual examples of this a few weeks ago. These include items such as how many emails he has sent and received, keystrokes made, and file types created. The Feltron Report, produced by Nick Felton, is a gorgeously designed book of personal data about himself and his family. But you don’t have to be a data or design whiz to collect and display personal information. For instance, to display your data in a visually compelling way you can use a service such as Daytum to create a personal data dashboard.

Hours of Activity recorded by Fitbit

In the realm of fitness and health, there are many products that will help capture, store, and analyze personal data. Devices like the Fitbit clip or strap to your body and count steps taken, floors climbed, and hours slept. Pedometers and GPS-enabled sport watches help those trying to get in shape, but the new fields of personal genetic monitoring and behavior analytics promise to make it possible to know very specific information about your health and to understand potential future choices. 23andMe will map your personal genome and provide a portal for analyzing and understanding your genetic profile, allowing unprecedented insight into your health (though there is doubt about whether this can accurately predict disease). For the behavioral and lifestyle aspects of health, a new service called Ginger.io will help collect daily data for health professionals.

Number of readers recorded by Mendeley

Visual cues such as graphs of accomplishments and green progress bars can help in keeping up research and monitoring one’s research habits just as much as they help in learning to code or training for a marathon. One such feature is the personal reading challenge on Goodreads, which lets you set a goal of how many books to read in the year, tracks what you’ve read, and lets you know how far behind or ahead you are at your current reading pace. Each book listed as in progress has a progress bar indicating how far along in the book you are–a simple but effective visual cue. Another popular tool, Mendeley, provides a convenient way to store PDFs and track references of all kinds. Built into it is a small green icon that indicates a reference is unread. You can sort references by read/unread; by marking a reference as “read,” the article appears as read in the Mendeley research database. Academia.edu provides another way for scholars to share research papers and see how many readers they have.

Libraries and Personal Data

How can libraries facilitate this type of personal data monitoring and make it easy for researchers to keep track of what they have done and help them set goals for the future? Last November the Academic Book Writing Month (#acbowrimo) Twitter hashtag community spun off of National Novel Writing Month and challenged participants to complete the first draft of an academic book or other lengthy work. Participants tracked daily word counts and research goals and encouraged each other to complete the work. Librarians could work with researchers at their institutions, both faculty and students, on this type of peer encouragement. We already do this type of activity, but tools like Twitter make it easier to share with a community who might not come to the library often.

The recent furor over the change in Google’s privacy settings prompted many people to delete their Google search histories. Considered another way, that history is a treasure trove of past interests to mine for a researcher trying to remember a book he or she was searching for some years ago–information that may not be available anywhere else. Librarians have certain professional ethics that make collecting and analyzing that type of personal data extremely complex. While we collect all types of data and avidly analyze it, we are careful not to keep track of what individuals read, borrowed, or asked of a librarian. This keeps individual researchers’ privacy safe; the major disadvantage is that it puts the onus on individuals to collect their own data. For people who might read hundreds or thousands of books and articles, it can be a challenge to track all those individual items. Library catalogs are not great at facilitating this type of recordkeeping. Some next-generation catalogs provide better listing and sharing features, but the user has to know how to add each item. Even if we can’t provide users a historical list of all items they’ve ever borrowed, we can help educate them on how to create such lists. In fact, unless we do help researchers create lists like this, we lose out on an important piece of the historical record, such as the library borrowing history in Dissenting Academies Online.

Conclusion

What are some types of data we can ethically and legally share to help our researchers track their personal data? We could share statistics on the average numbers of books checked out by students and faculty, articles downloaded, articles ordered, and other numbers that help people understand where they fall along a continuum of research. Of course, all libraries already collect this information–it’s just a matter of sharing it in a way that makes it easy to use. People want to collect and analyze data about what they do to help them reach their goals. Now that this is so easy, we must consider how we can help them.
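
As a sketch of what that kind of sharing could look like, the following Python snippet uses invented numbers to compare a researcher’s self-reported checkout count against an anonymized aggregate distribution the library might publish, without the library ever tracking an individual.

# Invented aggregate data a library might publish: yearly checkout counts,
# anonymized so that no individual can be identified.
aggregate_checkouts = [2, 5, 8, 12, 15, 20, 25, 31, 40, 55, 72, 90]

def percentile_rank(my_count, population):
    # Share of the aggregate population at or below my_count.
    at_or_below = sum(1 for n in population if n <= my_count)
    return 100 * at_or_below / len(population)

# A researcher tracking his or her own borrowing can see where it falls.
my_checkouts = 35
rank = percentile_rank(my_checkouts, aggregate_checkouts)
print("Your checkouts are at or above those of about "
      + str(round(rank)) + "% of library users this year.")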

 

Works Cited
Blair, Ann. Too Much to Know: Managing Scholarly Information Before the Modern Age. New Haven: Yale University Press, 2010.