A Short and Irreverently Non-Expert Guide to CSS Preprocessors and Frameworks

It took me a long time to wrap my head around what a CSS preprocessor was and why you might want to use one. I knew everyone was doing it, but for the amount of CSS I was writing, it seemed like overkill. And one more thing to learn! When you are a solo web developer/librarian, it’s always easier to clutch desperately at the things you know well than to add more complexity. So this post is for those of you who are in my old position. If you’re already an expert, you can skip the post and go straight to the comments to sell us on your own favorite tools.

The idea, by the way, is that you will be able to get one of these up and running today. I promise it’s that easy (assuming you have access to install software on your computer).

Ok, So Why Should I Learn This?

Creating a modern and responsive website requires so many calculations, CSS adjustments, and other considerations that it’s not really possible to do it all (especially when you are a solo web developer) without building on work that lots of other people have done. Even if you aren’t doing anything all that sophisticated in your web design, you probably want at minimum a nicely proportioned columnar layout with colors and typefaces that are easy to adapt as needed. Bonus points if your site is responsive, so it looks good on tablets and phones. All of that takes a lot of math and careful planning to accomplish, and requires much attention to documentation if you want others to be able to maintain the site.

Frameworks and preprocessors take care of many development challenges. They do things like provide responsive columnar layouts that are customizable to your design needs, offer “mixins” of code others have written, allow you to nest selectors rather than writing CSS with selectors piled on selectors, and let you create variables for any values you repeat, such as typefaces and colors. Once you get the hang of it, it’s pretty addictive to find places where you can be more efficient in your CSS. Then if your institution switches the shade of red used for active links, H1s, and footer text, you only have to change one variable rather than hunting down every place that color appears in your stylesheets.
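To make that concrete, here is a tiny SCSS sketch (the color values, font, and mixin name are just made up for illustration) showing a variable, a mixin, and nesting in action:

// Variables: change the brand color or font in one place.
$brand-red: #b22222;
$body-font: Georgia, serif;

// A mixin bundles up declarations you want to reuse.
@mixin rounded($radius: 4px) {
  border-radius: $radius;
}

// Nesting: the footer links are styled inside the footer rule,
// which compiles to the usual "footer a" selector in plain CSS.
footer {
  font-family: $body-font;
  a {
    color: $brand-red;
    @include rounded(2px);
  }
}

If the institutional red ever changes, editing $brand-red once updates every rule that uses it.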

What You Should Learn

If I’ve convinced you this is something worth putting time into, now you have to figure out where to spend that time. This post will outline one set of choices, but you don’t need to stick with these choices as you learn more.

Sometimes these choices are made for you by whatever language or content management system you are using, and usually one will dictate another. For instance, if you choose a framework that uses Sass, you should probably learn Sass. If the theme you’ve chosen for your content management system already includes preprocessor files, you’ll save yourself lots of time just by going with what that theme has chosen. It’s also important to note that you can use a preprocessor with any project, even just a series of flat HTML files, and in that case definitely use whatever you prefer.

I’m going to point out just a few of each here, since these are the tools I’ve used and am familiar with. There are lots and lots of all of these things out there, and I would welcome people with specific experience to describe it in the comments.

Don’t be too scared about the languages these tools are written in; for beginners it only matters insofar as you need one of these languages installed on your system to compile your Sass or Less files into CSS.

Preprocessors

Sass was developed in 2006. It can be written with Sass or SCSS syntax. Sass runs on Ruby.

Less was developed in 2009. It was originally written in Ruby, but is now written in JavaScript.

A Completely Non-Comprehensive List of CSS Frameworks

Note! You don’t have to use a framework for every project. If you have a two column layout with a simple breakpoint and don’t need all the overhead, feel free to not use one. But for most projects, frameworks provide things like responsive layouts and useful features (for instance tabs, menus, image galleries, and various CSS tricks) you might want to build in your site without a lot of thought.

  • Compass (runs on Sass)
  • Bootstrap (runs on Less)
  • Foundation (runs on Sass)
  • There are a million other ones out there too. A Google search for “css frameworks” gets “About 4,860,000 results,” so I’m not kidding about needing to choose based on the needs of your project and which preprocessor you prefer.

A Quick and Dirty Tutorial for Getting These Up and Running

Let’s try out Sass and Compass. I started working with them when I was theming Omeka, and I run them on my standard issue work Windows PC, so I know you can do it too!

Ingredients:
  • Ruby (the language Sass runs on)
  • Sass
  • Compass
  • A command line window (cmd on Windows or Terminal on a Mac)

Stir to combine.

Ok it’s not quite that easy. But it’s not a lot harder.

  1. Sass has great installation instructions. I’ve always run it from the command line without a fuss, and I don’t even particularly care for the command line. So feel free to follow those instructions.
  2. If you don’t have Ruby installed already (likely on a standard issue work Windows PC–it is already installed on Macs), install it. Sass suggests RubyInstaller, and I second that suggestion. There are some potentially confusing or annoying things about doing it this way, but don’t worry about them until one pops up.
  3. If you are on Windows, open up a new command line window by typing cmd in your start search bar. If you are on a Mac, open up your Terminal app.
  4. Now all you have to do is install Sass with one line of code in your command line client/terminal window: gem install sass. If you are on a Mac, you might get an error message and have to use the command sudo gem install sass.
  5. Next install Compass. You’ve got it, just type gem install compass.
  6. Read through this tutorial on Getting Started with Sass and Compass. It’s basically what I just said, but will link you out to a bunch of useful tutorials.
  7. If you decide the command line isn’t for you, you might want to look into other options for installing and compiling your files. The Sass installation instructions linked above give a few suggestions.

Working on a Project With These Tools

When you start a project, Compass will create a series of Sass/SCSS files (you choose which syntax) that you will edit and then compile into CSS files. The easiest way to get started is to head over to the Compass installation page and use their menu to generate the command line text you’ll need to get started.

For this example, I’ll create a new project called acrltest by typing compass create acrltest. In the image below you’ll see what this looks like. The output also provides some information about what to do next, so you may want to copy it and save it for later.

[Screenshot: output of running compass create acrltest]

You’ll now have a new directory with several recommended SCSS files, but you are by no means limited to these files. Sass provides the @import command, which pulls in additional SCSS files; these can be either full SCSS files or “partials,” which let you define styles without generating an additional CSS file. Partials start with an underscore. The normal practice is to have a _base.scss partial where you identify base styles, and then import it into your screen.scss file so you can use whatever you have defined there.

Here’s what this looks like.

In _base.scss, I’ve created two variables.

$headings: #345fff;
$links: #c5ff34;

Now in my screen.scss file, I’ve imported that partial and can use these variables.

@import "base";

a {
  color: $links;
}

h1 {
  color: $headings;
}

Now this doesn’t do anything yet since there are no CSS files that you can use on the web. Here’s how you actually make the CSS files.

Open up your command prompt or terminal window and change to the directory your Compass project is in. Then type compass watch, and Compass will compile your SCSS files to CSS. Every time you save your work, the CSS will be updated, which is very handy.
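Assuming the acrltest project created above, the whole compile step is just these two commands:

cd acrltest
compass watch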

[Screenshot: compass watch compiling the SCSS files into CSS]

Now you’ll have something like this in the screen.css file in the stylesheets directory:

/* line 9, ../sass/screen.scss */
a {
  color: #c5ff34;
}

/* line 13, ../sass/screen.scss */
h1 {
  color: #345fff;
}

Note that Compass inserts comments telling you which SCSS file and line each rule came from.
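Those comments are handy while developing, but if you would rather leave them out of the CSS you publish, Compass reads a config.rb file in the root of the project. Here is a minimal sketch using two standard Compass configuration options (double-check them against the Compass documentation for the version you installed):

# config.rb in the root of the Compass project
line_comments = false       # leave out the "/* line 9, ... */" comments
output_style = :compressed  # minify the compiled CSS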

Next Steps

This was an extremely basic overview, and there’s a lot more I didn’t cover–one of the major things being all the possibilities that frameworks provide. But I hope you start to get why this can ultimately make your life easier.

If you have a little experience with this stuff, please share it in the comments.


What Should Academic Librarians Know about Net Neutrality?

John Oliver describes net neutrality as the most boring important issue. More than that, it’s a complex idea that can be difficult to understand without a strong grasp of the architecture of the internet, which is not at all intuitive. An additional barrier to having a measured response is that most of the public discussions about net neutrality conflate it with negotiations over peering agreements (more on that later) and ultimately rest in contracts with unknown terms. The hyperbole surrounding net neutrality may be useful in riling up public sentiment, but the truth seems far more subtle. I want to approach a definition and an understanding of the issues surrounding net neutrality, but this post will only scratch the surface. Despite the technical and legal complexities, this is something worth understanding, since as academic librarians our daily lives and work revolve around internet access for us and for our students.

The most current public debate about net neutrality surrounds the Federal Communications Commission’s (FCC) ability to regulate internet service providers after a January 2014 court decision struck down the FCC’s 2010 Open Internet Order (PDF). The FCC is currently in an open comment period on a new plan to promote and protect the open internet.

The Communications Act of 1934 (PDF) created the FCC to regulate wire and radio communication. This classified phone companies and similar services as “common carriers”, which means that they are open to all equally. If internet service providers are classified in the same way, this ensures equal access, but for various reasons they are not considered common carriers, which was affirmed by the Supreme Court in 2005. The FCC is now seeking to use section 706 of the 1996 Telecommunications Act (PDF) to regulate internet service providers. Section 706 gave the FCC regulatory authority to expand broadband access, particularly to elementary and high schools, and this piece of it is included in the current rulemaking process.

The legal part of this is confusing to everyone, not least the FCC. We’ll return to that later. But for now, let’s turn our attention to the technical part of net neutrality, starting with one of the most visible spats.

A Tour Through the Internet

I am a Comcast customer for my home internet. Let’s say I want to watch Netflix. How do I get there from my home computer? First comes the traceroute that shows how the request from my computer travels over the physical lines that make up the internet.

C:\Users\MargaretEveryday>tracert netflix.com

Tracing route to netflix.com [69.53.236.17]
over a maximum of 30 hops:

  1     1 ms    <1 ms    <1 ms  10.0.1.1
  2    24 ms    30 ms    37 ms  98.213.176.1
  3    43 ms    40 ms    29 ms  te-0-4-0-17-sur04.chicago302.il.chicago.comcast.net [68.86.115.41]
  4    20 ms    32 ms    36 ms  te-2-6-0-11-ar01.area4.il.chicago.comcast.net [68.86.197.133]
  5    33 ms    30 ms    37 ms  he-3-14-0-0-cr01.350ecermak.il.ibone.comcast.net [68.86.94.125]
  6    27 ms    34 ms    30 ms  pos-1-4-0-0-pe01.350ecermak.il.ibone.comcast.net [68.86.86.162]
  7    30 ms    41 ms    54 ms  chp-edge-01.inet.qwest.net [216.207.8.189]
  8     *        *        *     Request timed out.
  9    73 ms    69 ms    69 ms  63.145.225.58
 10    65 ms    77 ms    96 ms  te1-8.csrt-agg01.prod1.netflix.com [69.53.225.6]
 11    80 ms    81 ms    74 ms  www.netflix.com [69.53.236.17]

Trace complete.
[Photo: AirPort wireless router]

Step 1. My computer sends data to this wireless router, which is hooked to my cable modem, which is wired out to the telephone pole in front of my apartment.

2-4. The cables travel through the city underground, accessed through manholes like this one.

5-6. Eventually my request to go to Netflix makes it to 350 E. Cermak, which is a major colocation and internet exchange site. If you’ve ever taken the shuttle bus at ALA in Chicago, you’ve gone right past this building. Image © 2014 Google.

7-9. Now the request leaves Comcast, and goes out to a Tier 1 internet provider, which owns cables that cross the country. In this case, the cables belong to CenturyLink (which recently purchased Qwest).

10. My request has now made it to Grand Forks, ND, where Netflix buys space from Amazon Web Services. All this happened in less than a second. Image © 2014 Google.

Why should Comcast ask Netflix to pay to transmit their data over Comcast’s networks? Understanding this requires a few additional concepts.

Peering

Peering is an important concept in the structure of the internet. Peering is a physical link of hardware to hardware between networks in internet exchanges, which are (as pictured above) huge buildings filled with routers connected to each other 1. Companies and internet service providers can use internet exchange centers to plug their equipment together directly, and so make their connections faster and more reliable. Facebook Peering is an example of a very open peering policy: for websites such as Facebook, which have an enormous amount of upload and download traffic, it’s well worth the effort for a small internet service provider to peer with Facebook 2.

Peering relies on some equality of traffic, as the name implies. The various tiers of internet service providers you may have heard of are based on with whom they “peer”. Tier 1 ISPs are large enough that they all peer with each other, and thus form what is usually called the backbone of the internet.

Academic institutions created the internet originally–computer science departments at major universities literally had the switches in their buildings. In the US this was ARPANET, but a variety of networks at academic institutions existed throughout the world. Groups such as Internet2 allow educational, research, and government networks to connect and peer with each other and commercial entities (including Facebook, if the traceroute from my workstation is any indication). Smaller or isolated institutions may rely on a consumer ISP, and what bandwidth is available to them may be limited by geography.

The Last Mile

Consumers, by contrast, are really at the mercy of whatever company dominates in their neighborhoods. Consumers obviously do not have the resources to lay their own fiber optic cables directly to all the websites they use most frequently. They rely on an internet service provider to do the heavy lifting, just as most of us rely on utility companies to get electricity, water, and sewage service (though of course it’s quite possible to live off the grid to a certain extent on all those services depending on where you live). We also don’t build our own roads, and we expect that certain spaces are open for traveling through by anyone. This idea of roads open for all to get from the wider world to arterial streets to local neighborhoods is thus used as an analogy for the internet–if internet service providers (like phone companies) must be common carriers, this ensures the middle and last miles aren’t jammed.

When Peering Goes Bad

Think about how peering works–it requires a roughly equal amount of traffic being sent and received through peered networks, or at least an amount of traffic to which both parties can agree. This is the problem with Netflix. Unlike big companies such as Facebook, and especially Google, Netflix is not trying to build its own network. It relies on content delivery services and internet backbone providers to get content from its servers (all hosted on Amazon Web Services) to consumers. But Netflix only sends traffic; it doesn’t take traffic, and this is the basis of most of the legal battles going on with the internet service providers that serve the “last mile”.

The Netflix/Comcast trouble started in 2010, when Netflix contracted with Level 3 for content delivery. Comcast claimed that Level 3 was relying on a peering relationship that was no longer valid with this increase in traffic, no matter who was sending it. (See this article for complete details.) Level 3, incidentally, accused another Tier 1 provider, Cogent, of overstepping their settlement-free peering agreement back in 2005, and cut them off for a short time, which cut pieces of the internet off from each other.

Netflix tried various arrangements, but ultimately negotiated with Comcast to pay for direct access to their last mile networks through internet exchanges, one of which is illustrated above in steps 4-6. This seems to be the most reasonable course of action for Netflix to get their outbound content over networks, since they really don’t have the ability to do settlement-free peering. Of course, Reed Hastings, the CEO of Netflix, didn’t see it that way. But for most cases, settlement-free peering is still the only way the internet can actually work, and while we may not see the agreements that make this happen, it won’t be going anywhere. In this case, Comcast was not offering Netflix paid prioritization of its content, it was negotiating for delivery of the content at all. This might seem equally wrong, but someone has to pay for the bandwidth, and why shouldn’t Netflix pay for it?

What Should We Do?

If companies want to connect with each other or build their own network connections, they can do so under whatever terms work best for them. The problem would be if certain companies were using the same lines that everyone was using but their packets got preferential treatment. The imperfect road analogy works well enough for these purposes. When a firetruck, police car, and ambulance are racing through traffic with sirens blaring, we are usually ok with the resulting traffic jam, since we can see that the emergency requires that speed. But how do we feel when we suspect a single police car has turned on a siren just to cut in line and get to lunch faster? Or a funeral procession blocks traffic? Or an elected official has a motorcade? Or a block party? These situations are regulated by government authorities, but we may or may not like that these uses of public ways are being allowed and causing our own travel to slow down. Going further, it is clearly illegal for a private company to block a public road and charge a high rate for faster travel, but imagine if no governmental agency had the power to regulate this. The FCC is attempting to make sure it has those regulatory powers.

That said, it doesn’t seem like anyone is actually planning to offer paid prioritization. Even Comcast claims that “no company has had a stronger commitment to openness of the Internet…” and that it has no plans to offer such a service. I find it unlikely that we will face a situation that Barbara Stripling describes as “prioritizing Mickey Mouse and Jennifer Lawrence over William Shakespeare and Teddy Roosevelt.”

I certainly won’t advocate against treating ISPs as common carriers–my impression is that this is what the 1996 Telecommunications Act was trying to get at, though the legal issues are confounding. However, a larger problem facing libraries (not so much large academic libraries, but smaller academics and publics) is the digital divide. If there’s no fiber optic line to a town, there isn’t going to be broadband access, and an internet service provider has no business incentive to build a line for a small town that may not generate a lot of revenue. I think we need to remain vigilant about ensuring that everyone has access to the internet at all, and at a reasonable speed, and not get too sidetracked by theoretical future malfeasance by internet service providers. These points are included in the FCC’s proposal, but they are not receiving most of the attention, despite the fact that the FCC is given explicit regulatory authority to address them.

Public comments are open at the FCC’s website until July 15, so take the opportunity to leave a comment about Protecting and Promoting the Open Internet, and also consider comments on E-rate and broadband access, which is another topic the FCC is currently considering. (You can read ALA’s proposal about this here (PDF).)

  1. Blum, Andrew. Tubes: a Journey to the Center of the Internet. New York: Ecco, 2012, 80.
  2. Blum, 125-126.

Lightweight Project Management Tools in the Real World

My life got extra complicated in the last few months. I gave birth to my first child in January, and between the stress of a new baby, unexpected hospital visits, and the worst winter in 35 years, it was a trying time. While I was able to step back from many commitments during my 8 week maternity leave, I didn’t want to be completely out of the loop, and since I would return to three back-to-back conferences, I needed to be able to jump back in and monitor collaborative projects from wherever I was. All of us have times in our lives that are this hectic or even more so, but even in the regular busy thrum of our professional lives it’s too easy to let ongoing commitments like committee work disappear from our mental landscapes, leaving only the nagging feeling that we are missing something.

There are various methods and tools to enhance productivity, which we’ve looked at before. Some basic collaboration tools such as Google Docs are always good to have any time you are working on a group project that builds into something like a presentation or report. But for committee work or everyday work in a department, something more specialized can be even better. I want to look at some real-life examples of using lightweight project management tools to keep projects that you work on with others going strong—or not so strong, depending on how they are used. Over the past 4-5 months I’ve gotten experience using Trello for committee work and Asana for work projects. Both of them have some great features, but as always the implementation doesn’t depend entirely on the software’s functionality. Beyond those two implementations, I’ll address a few other tools and my experience using them effectively.

Asana

I have the great fortune of having an entire wall of my office painted with white board paint, and I use it to sketch out ideas and projects. For that to be useful, though, I need to physically be in the office. So before I went on maternity leave, I knew I needed to get all my projects at work organized in a way that let me hand tasks I would normally do to others, as well as monitor what was happening on large ongoing projects. I had used Asana before in another context, so I decided to give it a try for this purpose. Asana has projects, tasks, and due dates that anyone in a workspace can follow and assign. It’s a pretty flexible system–the screenshot shows one potential way of setting it up, but we use different models for different projects, and there are many ideas out there. My favorite feature is project templates, which I use in another workspace that I share with my graduate assistant. This allows you to create a new project based on a standard series of steps, which means that she could create new projects while I was away based on the normal workflow we follow and I could work on them when I returned. All of this requires very strict attention to keeping projects organized, however, and if you don’t have an agreed upon system for naming and organizing tasks they can get out of hand very quickly.

We also use Asana as part of our help request system. We wanted to set up a system to track requests from all the library staff, not only for my maternity leave but in general. I looked at many different systems, but they were almost all too heavy-duty for what we needed, so I made our own very lightweight system using the Webform module in Drupal on our intranet. Staff submit requests through that form, which sends an email from a departmental email address to our Issue Tracking queue in Asana. Once the task is completed we explain the problem in an Asana comment (or just mark it completed if it’s a routine request such as a new user account), and then send a reply to the requestor through the intranet. They can see all the requests they’ve made plus the replies through that system. The nice thing about doing it this way is that everything is in one place–trouble tickets become projects with tasks very easily.

Trello

Trello is designed to mimic the experience of using index cards or sticky notes on a wall to track ideas and figure out what is going on at a glance. This is particularly useful for ongoing work where you have multiple projects in a set of pipelines divvied up among various people. You can easily see how many ideas you have in the inception stage and how many are closer to completion, which can be a good motivator to move items along. Another use is to store detailed project ideas and notes and then sort them into lists once you figure out a structure.

Trello starts with a virtual board, which is divided into lists of cards. Trello cards can be assigned to specific people, and anyone can follow a card to get notifications. Clicking on a card brings up a whole set of additional options, including who is working on the project, attachments, due dates, color coding, and anything else you might want. The screenshot shows how the LITA Education Committee uses Trello to plan educational offerings. The white areas with small boxes indicate cards (we use one card per program/potential idea) that are active and assigned; the gray areas indicate cards which haven’t been touched in a while and so probably need follow-up. Not surprisingly, there are many more cards, many of which are inactive, at the beginning of the pipeline than at the end with programs already set up. This is a good visual reminder that we need to keep things moving along.

In this case I didn’t set up Trello, and I am not always the best user of it. Using this for committee work has been useful, but there are a few items to keep in mind for it to actually work to keep projects going. First, and this goes for everything, including analog cards or sticky notes, all the people working on the project need to check into it on a regular basis and use it consistently. One thing that I found was important to do to get it into a regular workflow was turn on email notifications. While it would be nice to stay out of email more, most of us are used to finding work show up there, and if you have a sane relationship to your inbox (i.e. you don’t use it to store work in progress), it can be helpful to know to log in to work on something. I haven’t used the mobile app yet, but that is another option for notifications.

Other Tools

While I have started using Asana and Trello more heavily recently, there are a number of other tools out there that you may need to use in your job or professional life. Here are a few:

Box

Many institutions have some sort of “cloud” file system now such as Box or Google Drive. My work uses Box, and I find it very useful for parts of projects where I need many people (but a slightly different set each time) to collaborate on completing a single task. I upload a spreadsheet that I need everyone to look at, use the information to do something, and then add additional information to the spreadsheet. This is a very common scenario that organizations often use a shared drive to accomplish, but there are a number of problems with that approach. If you’ve ever been confronted with the filename “Spring2014_report-Copy-Copy-DRAFT.xlsx” or not been able to open a file because someone else left it open on her desktop and went to lunch, you know what I mean. Instead of that, I upload the file to Box, and assign a task to the usernames of all the people I need to look at the document. They can use a tool called Box Edit to open the file in Excel and any changes they make are immediately saved back to the shared document, just as a Google Doc would do. They can then mark the task complete, and the system only sends email reminders to people who haven’t yet finished the task.

ALA Connect

This section is only relevant to people working on projects with an American Library Association group, whether a committee or interest group. Since this happens to most people working in academic libraries at some point, I think it’s worth considering; but if not, skip to the conclusion. ALA Connect is the central repository for institutional memory and documents for work around ALA, including committees and interest groups. It can also be a good place to work on projects collaboratively, but it takes some setup. As a committee chair, I freely admit that I need to organize my own ALA Connect page much better. My normal approach was to use an online document (so something editable by everyone) for each project and file each document under a subcommittee heading, but in practice I find it way too hard to find the right document to see what each subcommittee is working on. I am going to experiment with a new approach: I will create “groups” for each project, and use the Group Headings sidebar to organize these. If you’re on a committee and not the chair, you don’t have access to reorganize the sidebar or posts, but suggest this approach to your chair if you can’t find anything in “General News & Discussions”. Also, try to document the approach you’ve taken so future chairs will know what you did, and let other chairs know what works for your committee.

You also need to make a firm commitment as a chair to hold certain types of discussions on your committee mailing list and certain discussions on ALA Connect, and then to document any pertinent mailing list discussions on ALA Connect. That way you won’t end up unable to figure out where you are on the project because half your work is in email and half on ALA Connect. (This goes for any other tool besides email as well.)

Conclusion

With all the tools above, you really have no excuse to be running projects through email, which is not very effective unless everyone you are working with is very strict with their email filing and reply times. (Hint: they aren’t—see above about a sane relationship with your inbox.) But any tool requires a good plan to understand how its strengths mesh with work you have to accomplish. If your project is to complete a document by a certain date, a combination of Google Docs or Box (or ALA Connect for ALA work) and automated reminders might be best. If you want to throw a lot of ideas around and then organize them, Trello or Asana might work. Since these are all free to try, explore a few tools before starting a big project to see what works for you and your collaborators. Once you pick one, dedicate a bit of time on a weekly or monthly basis to keeping your virtual workspace organized. If you find it’s no longer working, figure out why. Did the scope of your project change over time, and a different tool is now more effective? This can happen when you are planning to implement something and switch over from the implementation to ongoing work using the new system. Or maybe people have gotten complacent about checking in on work to do. Explore different types of notifications or mobile apps to reinvigorate your team.

I would love to hear about your own approach to lightweight project management with these tools or others in the comments.

Taking a Practical Look at the Google Books Case

Last month we got the long-awaited ruling in favor of Google in the Authors Guild vs. Google Books case, which by now has been analyzed extensively. Ultimately the judge in the case decided that Google’s digitization was transformative and thus constituted fair use. See InfoDocket for detailed coverage of the decision.

The Google Books project was part of the Google mission to index all the information available, and as such could never have taken place without libraries, which hold all those books. While most, if not all, the librarians I know use Google Books in their work, there has always been a sense that the project should not have been started by a commercial enterprise using the intellectual resources of libraries, but should have been started by libraries themselves working together.  Yet libraries are often forced to be more conservative about digitization than we might otherwise be due to rules designed to protect the college or university from litigation. This ruling has made it seem as though we could afford to be less cautious. As Eric Hellman points out, the decision seems to imply that with copyright the ends are the important part, not the means. “In Judge Chin’s analysis, copyright is concerned only with the ends, not the means. Copyright seems not to be concerned with what happens inside the black box.” 1 As long as the end use of the books was fair, which was deemed to be the case, the initial digitization was not a problem.

Looking at this from the perspective of a repository manager, I want to address a few of the theoretical and logistical issues behind such a conclusion for libraries.

What does this mean for digitization at libraries?

At the beginning of 2013 I took over an ongoing digitization project, and as a first-time manager of a large-scale long-term project, I learned a lot about the processes involved in such a project. The project I work with is extremely small-scale compared with many such projects, but even at this scale the project is expensive and time-consuming. What makes it worth it is that long-buried works of scholarship are finally being used and read, sometimes for reasons we do not quite understand. That gets at the heart of the Google Books decision—digitizing books in library stacks and making them more widely available does contribute to education and useful arts.

There are many issues that we need to address, however. Some of the most important ones are what access can and should be provided to what works, and making mass digitization more available to smaller and international cultural heritage institutions. Google Books could succeed because it had the financial and computing resources of Google matched with the cultural resources of the participating research libraries. This problem is international in scope. I encourage you to read this essay by Amelia Sanz, in which she argues that digitization efforts so far have been inherently unequal and a reflection of colonialism. 2 But is there a practical way of approaching this desire to make books available to a wider audience?

Providing Access

There are several separate issues in providing access. Books that are in the public domain are unquestionably fine to digitize, though differences in international copyright law make it difficult to determine what can be provided to whom. As Amelia Sanz points out, Google can only digitize Spanish works prior to 1870 in Spain, but may digitize the complete work in the United States. The complete work is not available to Spanish researchers, but it is available in full to US researchers.

That aside, there are several reasons why it is useful to digitize works still unquestionably under copyright. One of the major reasons is textual corpus analysis–you need to have every word of many texts available to draw conclusions about use of words and phrases in those texts. The Google Books Ngram Viewer is one such tool that comes out of mass digitization. Searching for phrases in Google and finding that phrase as a snippet in a book is an important way to find information in books that might otherwise be ignored in favor of online sources. Some argue that this means that those books will not be purchased when they might have otherwise been, but it is equally possible that this leads to greater discovery and more purchases, which research into music piracy suggests may be the case.

Another reason to digitize works still under copyright is to highlight the work of marginalized communities, though in that case it is imperative to work with those communities to ensure that the digitization is not exploitative. Many orphan works, for which a rights-holder cannot be located, fall into this category, and I know from some volunteer work that I have done that small cultural heritage institutions are eager to digitize material that represents the cultural and intellectual output of their communities.

In all the above cases, it is crucial to put into place mechanisms for ensuring that works under copyright are not abused. Google Books uses an algorithm that makes it impossible to read an entire book, which is probably beyond the abilities of most institutions. (If anyone has an idea for how to do this, I would love to hear it.) Simpler and more practical solutions to limiting access are to make only a chapter or sample of a book available for public use, which many publishers already allow. For instance, Oxford University Press allows up to 10% of a work (within certain limits) on personal websites or institutional repositories. (That is, of course, assuming you can get permission from the author.) Many institutions maintain “dark archives”, which are digitized and (usually) indexed archives of material inaccessible to the public, whether institutional or research information. For instance, the US Department of Energy Office of Scientific and Technical Information maintains a dark archive index of technical reports comprising the equivalent of 6 million pages, which makes it possible to quickly find relevant information.

In any case where an institution makes the decision to digitize and make available the full text of in-copyright materials for reasons they determine are valid, there are a few additional steps that institutions should take. Institutions should research rights-holders or at least make it widely known to potential rights-holders that a project is taking place. The Orphan Works project at the University of Michigan is an example of such a project, though it has been fraught with controversy. Another important step is to have a very good policy for taking down material when a rights-holder asks–it should be clear to the rights-holder whether any copies of the work will be maintained and for what purposes (for instance archival or textual analysis purposes).

Digitizing, Curating, Storing, Oh My!

The above considerations are only useful when it is even possible for institutions without the resources of Google to start a digitization program. There are many examples of DIY digitization by individuals, for instance see Public Collectors, which is a listing of collections held by individuals open for public access–much of it digitized by passionate individuals. Marc Fischer, the curator of Public Collectors, also digitizes important and obscure works and posts them on his site, which he funds himself. Realistically, the entire internet contains examples of digitization of various kinds and various legal statuses. Most of this takes place on cheap and widely available equipment such as flatbed scanners. But it is possible to build an overhead book scanner for large-scale digitization with individual parts and at a reasonable cost. For instance, the DIY Book Scanning project provides instructions and free software for creating a book scanner. As they say on the site, all the process involves is to “[p]oint a camera at a book and take pictures of each page. You might build a special rig to do it. Process those pictures with our free programs. Enjoy reading on the device of your choice.”

“Processing the pictures” is a key problem to solve. Turning images into PDF documents is one thing, but providing high quality optical character recognition is extremely challenging. Free tools such as FreeOCR make it possible to do OCR from image or PDF files, but this takes processing power and results vary widely, particularly if the scan quality is low. Even expensive tools like Adobe Acrobat or ABBYY FineReader have the same problems. Karen Coyle points out that uncorrected OCR text may be sufficient for searching and corpus analysis, but it does not provide a faithful reproduction of the text and thus cannot, for instance, provide access to visually impaired persons 3. This is a problem well known in the digital humanities world, and one solved by projects such as Project Gutenberg with the help of dedicated volunteer distributed proofreaders. Additionally, a great deal of material clearly in the public domain is in manuscript form or has text that modern OCR cannot recognize. In that case, crowdsourcing transcriptions is the only financially viable way for institutions to make the text of the material available 4. Examples of successful projects using volunteer transcribers or proofreaders include Ancient Lives to transcribe ancient papyri, What’s on the Menu at the New York Public Library, and DIYHistory at the University of Iowa libraries. (The latter has provided step by step instructions for building your own version using open source tools.)
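To give a sense of what the OCR step can look like in practice, here is a minimal Python sketch using the open source Tesseract engine through the pytesseract wrapper. This is just one freely available option, not one of the tools named above; Tesseract, pytesseract, and Pillow all need to be installed separately, and the file name is a placeholder.

import pytesseract
from PIL import Image

# Run a single scanned page through Tesseract's OCR engine.
page = Image.open("page_001.png")  # placeholder file name for a scanned page
text = pytesseract.image_to_string(page)

# The raw, uncorrected text is often good enough for search indexing or
# corpus analysis, but it needs human proofreading to be a faithful copy.
with open("page_001.txt", "w", encoding="utf-8") as out:
    out.write(text)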

So now you’ve built your low-cost DIY book scanner, and put together a suite of open source tools to help you process your collections for free. Now what? The whole landscape of storing and preserving digital files is far beyond the scope of this post, but the cost of accomplishing this is probably the highest of anything other than staffing a digitization project, and it is here where Google clearly has the advantage. The Internet Archive is a potential solution to storing public domain texts (though they are not immune to disaster), but if you are making in-copyright works available in any capacity you will most likely have to take the risk on your own servers. I am not a lawyer, but I have never rented server space that would allow copyrighted materials to be posted.

Conclusion: Is it Worth It?

Obviously from this post I am in favor of taking on digitization projects of both public domain and copyrighted materials when the motivations are good and the policies are well thought out. From this perspective, I think the Google Books decision was a good thing for libraries and for providing greater access to library collections. Libraries should be smart about what types of materials to digitize, but there are more possibilities for large-scale digitization, and by providing more access, the research community can determine what is useful to them.

If you have managed a DIY book scanning project, please let me know in the comments, and I can add links to your project.

  1. Hellman, Eric. “Google Books and Black-Box Copyright Jurisprudence.” Go To Hellman, November 18, 2013. http://go-to-hellman.blogspot.com/2013/11/google-books-and-black-box-copyright.html.
  2. Sanz, Amelia. “Digital Humanities or Hypercolonial Studies?” Responsible Innovation in ICT (June 26, 2013). http://responsible-innovation.org.uk/torrii/resource-detail/1249#_ftnref13.
  3. Coyle, Karen. “It’s FAIR!” Coyle’s InFormation, November 14, 2013. http://kcoyle.blogspot.com/2013/11/its-fair.html.
  4. For more on this, see Ben Brumfield’s work on crowdsourced transcription, for example Brumfield, Ben W. “Collaborative Manuscript Transcription: ‘The Landscape of Crowdsourcing and Transcription’ at Duke University.” Collaborative Manuscript Transcription, November 23, 2013. http://manuscripttranscription.blogspot.com/2013/11/the-landscape-of-crowdsourcing-and.html.

Responsibilities For Open Access

In honor of Open Access Week, I want to look at some troubling recent discussions about open access, and what academic librarians who work with technology can do. As the manager of an open access institutional repository, I strongly believe that providing greater access to academic research is a good worth pursuing. But I realize that this comes at a cost, and that we have a responsibility to ensure that open access also means integrity and quality.

On “stings” and quality

By now, the article by John Bohannon in Science has been thoroughly dissected in the blogosphere 1. This was not a study per se, but rather a piece of investigative journalism looking into the practices of open access journals. Bohannon submitted variations on an article, written under African pseudonyms from fake universities, so flawed that “any reviewer with more than a high-school knowledge of chemistry…should have spotted the paper’s short-comings immediately.” Over the course of 10 months, he submitted these articles to 304 open access journals whose names he drew from the Directory of Open Access Journals and Jeffrey Beall’s list of predatory open access publishers. Ultimately 157 of the journals accepted the article and 98 rejected it, when any real peer review would have meant that it was rejected in all cases. It is very worth noting that in an analysis of the raw data that Bohannon supplied, some publishers on Beall’s list rejected the paper immediately, which is a good reminder to take all such curation efforts with an appropriate amount of skepticism 2.

There are certainly many methodological flaws in this investigation, which Mike Taylor outlines in detail in his post 3, and which he concludes was specifically aimed at discrediting open access journals in favor of journals such as Science. As Michael Eisen outlines, Science has not been immune to publishing articles that should have been rejected after peer review–though Bohannon informed Eisen that he had intended to look at a variety of journals but that this was not practical, and that this decision was not informed by editors at Science. Eisen’s conclusion is that “peer review is a joke” and that we need to stop regarding the publication of an article in any journal as evidence that the article is worthwhile 4. Phil Davis at the Scholarly Kitchen took issue with this conclusion (among others noted above), since despite the flaws, this did turn up incontrovertible evidence that “a large number of open access publishers are willfully deceiving readers and authors that articles published in their journals passed through a peer review process…” 5. His conclusion is that open access agencies such as OASPA and DOAJ should be better at policing themselves, and that on the other side Jeffrey Beall should be cautious about suggesting a potential for guilt without evidence. I think one of the more level-headed responses to this piece comes from outside the library and scholarly publishing world in Steven Novella’s post on Neurologica, a blog focused on science and skepticism written by an academic neurologist. He is a fan of open access and wider access to information, but makes the point familiar to all librarians that the internet creates many more opportunities to distribute both good and bad information. Open access journals are one response to the opportunities of the internet, and author-pays journals in particular, like “all new ‘funding models’”, have the potential of creating perverse incentives. Traditional journals fall into the same trap when they rely on impact factor to drive subscriptions, which means they may end up publishing “sexy” studies of questionable validity or failing to publish replication studies, which are the backbone of the scientific method–and in fact the only real way to establish results no matter what type of peer review has been done 6.

More “perverse incentives”

So far the criticisms of open access have revolved around one type of “gold” open access, wherein the author (or a funding agency) pays article publication fees. “Green” open access, in which a version of the article is posted in a repository, is not susceptible to abuse in quite the same way. Yet a new analysis of embargo policies by Shan Sutton shows that some publishers are targeting green open access through new policies. Springer used to have a 12 month embargo for mandated deposit in repositories such as PubMed, but has now extended it to all institutional repositories. Emerald changed its policy so that any mandated deposit to a repository (whether by funder or institutional mandate) is subject to a 24 month embargo 7.

In both cases, paid immediate open access is available for $1,595 (Emerald) or $3,000 (Springer). It seems that the publishers are counting on a “mandate” meaning that funds are available for this sort of hybrid gold open access, but that ignores the philosophy behind such mandates. While federal open access mandates rest in part on the financial argument that the public should not have to pay twice for research, Sutton argues that open access “mandates” at institutions are actually voluntary initiatives by the faculty, and provide waivers without question 8. Additionally, while this type of open access does provide public access to the article, it does not address larger issues of reuse of the text or data in the true sense of open access.

What should a librarian do?

The issues above are complex, but there are a few trends that we can draw on to understand our responsibilities to open access. First, there is the issue of quality, both in terms of researcher experience in working with a journal, and that of being able to trust the validity of an individual article. Second, we have to be aware of the terms under which institutional policies may place authors. As with many such problems, the technological issues are relatively trivial. To actually address them meaningfully will not happen with technology alone, but with education, outreach, and network building.

The major thing we can take away from Bohannon’s work is that we have to help faculty authors to make good choices about where they submit articles. Anyone who works with faculty has stories of extremely questionable practices by journals of all types, both open access and traditional. Speaking up about those practices on an individual basis can result in lawsuits, as we saw earlier this year. Are there technical solutions that can help weed out predatory publishers and bad journals and articles? The Library Loon points out that many factors, some related to technology, have meant that both positive and negative indicators of journal quality have become less useful in recent years. The Loon suggests that “[c]reating a reporting mechanism where authors can rate and answer relatively simple questions about their experiences with various journals seems worthwhile.” 9

The comments to this post have some more suggestions, including open peer review and a forum backed by a strong editor that could be a Yelp-type site for academic publisher reputation. I wrote about open peer review earlier this year in the context of PeerJ, and participants in that system did indeed find the experience of publishing in a journal with quick turnarounds and open reviews pleasant. (Bohannon did not submit a fake article to PeerJ). This solution requires that journals have a more robust technical infrastructure as well as a new philosophy to peer review. More importantly, this is not a solution librarians can implement for our patrons–it is something that has to come from the journals.

The idea that seems to be catching on more is the “Yelp” for scholarly publishers. This seems like a good potential solution, albeit one that would require a great deal of coordinated effort to be truly useful. The technical parts of this type of solution would be relatively easy to carry out. But how to ensure that it is useful for its users? The Yelp analog may be particularly helpful here. When it launched in 2004, it asked users who were searching for some basic information about their question to provide the email addresses of additional people whom they would have traditionally asked for this information. Yelp then emailed those people, as well as others with similar searches, to get reviews of local businesses to build up its base of information. 10 Yelp took a risk in pursuing content in that way, since it could have been off-putting to potential users. But local business information was valuable enough to early users that they were willing to participate, and this seems like a perfect model for building up a base of information on journal publisher practices.

This helps address the problem of predatory publishers and shifting embargoes, but it doesn’t help as much with the issue of quality assurance for the article content. Librarians teach students how to find articles that claim to be peer reviewed, but long before Bohannon we knew that peer review quality varies greatly, and even when done well tells us nothing about the validity of the research findings. Education about the scholarly communication cycle, the scientific method, and critical thinking skills are the most essential tools to ensure that students are using appropriate articles, open access or not. However, those skills are difficult to bring to bear for even the most highly experienced researchers trying to keep up with a large volume of published research. There are a few technical solutions that may be of help here. Article level metrics, particularly alternative metrics, can aid in seeing how articles are being used. (For more on altmetrics, see this post from earlier this year).

One of the easiest options for article level metrics is the Altmetric.com bookmarklet. This provides article level metrics for many articles with a DOI, or articles from PubMed and arXiv. Altmetric.com offers an API with a free tier to develop your own app. An open source option for article level metrics is PLOS’s Article-Level Metrics, a Ruby on Rails application. These solutions do not guarantee article quality, of course, but hopefully help weed out more marginal articles.
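As a rough sketch of what the free tier makes possible, the snippet below asks Altmetric.com’s public API about a single DOI. The DOI is a placeholder, and field names, rate limits, and terms are spelled out in Altmetric’s API documentation, which should be checked before relying on this.

import json
import urllib.error
import urllib.request

doi = "10.1000/example.doi"  # placeholder: substitute the DOI you care about
url = "https://api.altmetric.com/v1/doi/" + doi

try:
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
    # The response is JSON with an attention score and counts by source.
    print(data.get("title"), "-", data.get("score"))
except urllib.error.HTTPError as err:
    # A 404 just means Altmetric has no attention data recorded for this DOI.
    print("No altmetrics found:", err.code)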

No one needs to be afraid of open access

For those working with institutional repositories or other open access issues, it sometimes seems very natural for Open Access Week to fall so near Halloween. But it does not have to be frightening. Taking responsibility for thoughtful use of technical solutions and on-going outreach and education is essential, but can lead to important changes in attitudes to open access and changes in scholarly communication.

Notes

  1. Bohannon, John. “Who’s Afraid of Peer Review?” Science 342, no. 6154 (October 4, 2013): 60–65. doi:10.1126/science.342.6154.60.
  2. “Who Is Afraid of Peer Review: Sting Operation of The Science: Some Analysis of the Metadata.” Scholarlyoadisq, October 9, 2013. http://scholarlyoadisq.wordpress.com/2013/10/09/who-is-afraid-of-peer-review-sting-operation-of-the-science-some-analysis-of-the-metadata/.
  3. Taylor, Mike. “Anti-tutorial: How to Design and Execute a Really Bad Study.” Sauropod Vertebra Picture of the Week. Accessed October 17, 2013. http://svpow.com/2013/10/07/anti-tutorial-how-to-design-and-execute-a-really-bad-study/.
  4. Eisen, Michael. “I Confess, I Wrote the Arsenic DNA Paper to Expose Flaws in Peer-review at Subscription Based Journals.” It Is NOT Junk, October 3, 2013. http://www.michaeleisen.org/blog/?p=1439.
  5. Davis, Phil. “Open Access ‘Sting’ Reveals Deception, Missed Opportunities.” The Scholarly Kitchen. Accessed October 17, 2013. http://scholarlykitchen.sspnet.org/2013/10/04/open-access-sting-reveals-deception-missed-opportunities/.
  6. Novella, Steven. “A Problem with Open Access Journals.” Neurologica Blog, October 7, 2013. http://theness.com/neurologicablog/index.php/a-problem-with-open-access-journals/.
  7. Sutton, Shan C. “Open Access, Publisher Embargoes, and the Voluntary Nature of Scholarship: An Analysis.” College & Research Libraries News 74, no. 9 (October 1, 2013): 468–472.
  8. Ibid., 469.
  9. Loon, Library. “A Veritable Sting.” Gavia Libraria, October 8, 2013. http://gavialib.com/2013/10/a-veritable-sting/.
  10. Cringely, Robert. “The Ears Have It.” I, Cringely, October 14, 2004. http://www.pbs.org/cringely/pulpit/2004/pulpit_20041014_000829.html.

A Brief Look at Cryptography for Librarians

You may not think much about cryptography on a daily basis, but it underpins your daily work and personal existence. In this post I want to talk about a few realms of cryptography that affect the work of academic librarians, and talk about some interesting facets you may never have considered. I won’t discuss the math or computer science basis of cryptography, but look at it from a historical and philosophical point of view. If you are interested in the math and computer science, I have a few resources listed at the end in addition to a bibliography.

Note that while I will discuss some illegal activities in this post, neither I nor anyone connected with the ACRL TechConnect blog is suggesting that you actually do anything illegal. I think you’ll find the intellectual part of it stimulating enough.

What is cryptography?

Keeping information secret is as simple as hiding it from view in, say, an envelope, and trusting that only the person to whom it is addressed will read that information and then not tell anyone else. But we all know that this doesn’t actually work. A better system would only allow a person with secret credentials to open the envelope, and then for the information inside to be in a code that only she could know.

The idea of codes to keep important information secret goes back thousands of years, but for the purposes of computer science, most of the major advances have been made since the 1970s. In the 1960s, with the advent of computing for business and military uses, it became necessary to come up with ways to encrypt data. In 1976 the concept of public-key cryptography was developed, but it wasn’t realized practically until 1978 with the paper by Rivest, Shamir, and Adleman–if you’ve ever wondered what RSA stood for, there’s the answer. There were some advancements to this system, which resulted in the Digital Signature Algorithm, the standard used by the federal government.1 Public-key systems work by creating a pair of keys for each user–the private key is known only to that user, while the public key is shared. Anyone can use the public key to encrypt a message, but only the holder of the private key can decrypt it. See the resources below for more on the math that makes up these algorithms.
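To make the key-pair idea concrete, here is a small sketch in Python using the third-party cryptography library (one of several libraries that implement RSA). It is an illustration of the encrypt-with-the-public-key, decrypt-with-the-private-key pattern rather than a security recommendation, so treat the specific parameters as assumptions.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Generate a key pair: the private key stays with you, the public key is shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Anyone with the public key can encrypt a message...
ciphertext = public_key.encrypt(b"meet me in the stacks at noon", oaep)

# ...but only the holder of the private key can decrypt it.
print(private_key.decrypt(ciphertext, oaep))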

Another important piece of cryptography is the cryptographic hash function, first developed in the late 1980s. These functions take a block of data and produce a fixed-length digest from which the original cannot be recovered–for instance, passwords stored in databases should be hashed with one of these functions rather than stored in plain text, so that even someone who gains unauthorized access to the data cannot read the original passwords. Hash functions can also be used to verify the identity of a piece of digital content, which is probably how most librarians think about them, particularly if you work with a digital repository of any kind.
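For the repository use case (verifying that a digital object has not changed), the usual pattern is to compute a checksum at ingest and compare it against a freshly computed one later. Here is a minimal sketch using Python's standard hashlib module; the file name is hypothetical.

import hashlib

def sha256_checksum(path, chunk_size=8192):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Store this value when you ingest the file; recompute and compare later to verify fixity.
print(sha256_checksum("master_tiff_0001.tif"))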

Why do you care?

You probably send emails, log into servers, and otherwise transmit all kinds of confidential information over a network (whether a local network or the internet). Encrypted access to these services and the data being transmitted is the only way that anybody can trust that any of the information is secret. Anyone who has had a credit card number stolen and had to deal with fraudulent purchases knows first-hand how upsetting it can be when these systems fail. Without cryptography, the modern economy could not work.

Of course, we all know a recent example of cryptography not working as intended. It’s no secret (see above where keeping something a secret requires that no one who knows the information tells anyone else) by now that the National Security Agency (NSA) has sophisticated ways of breaking codes or getting around cryptography through other methods.2 Continuing with our envelope analogy from above, the NSA coerced companies to allow them to view the content of messages before the envelopes were sealed. If the messages were encoded, they got the keys to decode the data, or broke the code using their vast resources. While these practices were supposedly limited to potential threats, there’s no denying that this makes it more difficult to trust any online communications.

Librarians certainly have a professional obligation to keep data about their patrons confidential, and so this is one area in which cryptography is on our side. But let’s now consider an example in which it is not so much.

Breaking DRM: e-books and DVDs

Librarians are exquisitely aware of the digital rights management realm of cryptography (for more on this from the ALA, see The ALA Copyright Office page on digital rights). These are algorithms that encode media in such a way that you are unable to copy or modify the material. Of course, like any code, once you break it, you can extract the material and do whatever you like with it. As I covered in a recent post, if you purchase a book from Amazon or Apple, you aren’t purchasing the content itself, but a license to use it in certain prescribed ways, so legally you have no recourse to break the DRM to get at the content. That said, you might have an argument under fair use, or some other legitimate reason to break the DRM. It’s quite simple to do once you have the tools. For e-books in proprietary formats, you can download a plug-in for the Calibre program and follow step-by-step instructions on this site. This allows you to convert proprietary formats into more open ones.

As above, you shouldn’t use software like that if you don’t have the rights to convert formats, and you certainly shouldn’t use it to pirate media. But just because it can be used for illegal purposes, does that make the software itself illegal? Breaking DVD DRM offers a fascinating example of this (for a lengthy list of CD and DVD copy protection schemes, see here, and for a list of DRM-breaking software see here). The case of CSS (Content Scramble System) descramblers illustrates some of the strange philosophical territory into which this can lead. The original code was developed in 1999 and distributed widely, which was initially ruled to be illegal. This was protested in a variety of ways; the Gallery of CSS Descramblers has a lot more on this.3 One of my favorite protest CSS descramblers is the “illegal” prime number, which is a prime number that contains the entire code for breaking the CSS DRM. The first illegal prime number was discovered in 2001 by Phil Carmody (see his description here).4 This number is, of course, only illegal inasmuch as the information it represents is illegal–in this case it was a secret code that helped break another secret code.
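The "illegal number" idea is easier to grasp once you remember that any file is, at bottom, just one very large integer. Here is a toy sketch in Python; the file name is hypothetical, and Carmody's additional feat was finding such a number that also happens to be prime.

# Any file can be read as one enormous integer and converted back again.
with open("decss_source.tar.gz", "rb") as f:   # hypothetical file name
    data = f.read()

as_number = int.from_bytes(data, "big")
print(as_number)  # the file, expressed as a single number

# Reverse the process to recover the original bytes
# (this simple round trip assumes the file does not begin with zero bytes).
recovered = as_number.to_bytes((as_number.bit_length() + 7) // 8, "big")
assert recovered == data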

In 2004, after years of court hearings, the California Court of Appeal overturned one of the major injunctions against posting the code, based on the fact that source code is protected speech under the First Amendment, and that CSS was no longer a trade secret. So you’re no longer likely to get in trouble for posting this code–but again, using it should only be done for reasons protected under fair use (“DVDCCA v Bunner and DVDCCA v Pavlovich,” Electronic Frontier Foundation, accessed September 23, 2013, https://www.eff.org/cases/dvdcca-v-bunner-and-dvdcca-v-pavlovich). One of the major reasons you might legitimately need to break the DRM on a DVD is to play DVDs on computers running the Linux operating system, which still has no free legal software that will play DVDs (there is legal software with the appropriate license for $25, however). Given that DVDs are physical media and subject to the first sale doctrine, it is unfair that they are manufactured with limitations on how they may be played, and therefore this is a code that seems reasonable for the end consumer to break. That said, as more and more media is streamed or otherwise licensed, that argument no longer applies, and the situation becomes analogous to e-book DRM.

Learning More

The Gambling With Secrets video series explains the basic concepts of cryptography, including the mathematical proofs, using colors and other visual concepts that are easy to grasp. It comes highly recommended by all the ACRL TechConnect writers.

Since it’s a fairly basic part of computer science, you will not be surprised to learn that there are a few large open courses available about cryptography. This Coursera class from Stanford is currently running, and this Udacity class from the University of Virginia is a self-paced course. These don’t require a lot of computer science or math skills to get started, though of course you will need a great deal of math to really get anywhere with cryptography.

A surprising but fun way to learn a bit about cryptography is from the NSA’s Kids website–I discovered this years ago when I was looking for content for my X-Files fan website, and it is worth a look if for nothing else than to see how the NSA markets itself to children. Here you can play games to learn basics about codes and codebreaking.

  1. Menezes, A., P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. http://cacr.uwaterloo.ca/hac/. 1-2.
  2. See the New York Times and The Guardian for complete details.
  3. Touretzky, D. S. “Gallery of CSS Descramblers.” 2000. http://www.cs.cmu.edu/~dst/DeCSS/Gallery. Accessed September 18, 2013.
  4. For more, see Caldwell, Chris. “The Prime Glossary: Illegal Prime.” Accessed September 17, 2013. http://primes.utm.edu/glossary/xpage/Illegal.html.

Creating themes in Omeka 2.0

Omeka is an easy-to-use content management system for digital exhibits created by the Roy Rosenzweig Center for History and New Media. It’s very modular, so you can customize it for various functions. I won’t go into the details here on how to set up Omeka, but you can read documentation and see example collections at Omeka.org. If you want to experiment with Omeka without installing it on your own server, you can set up a hosted account at Omeka.net.

Earlier this year Omeka was completely rewritten and released as version 2.0 (now 2.1). As with many open source content management systems, it took a while for the contributed plug-ins and themes to catch up to the new release. As of July, most of the crucial contributed plug-ins were available, and if you haven’t yet updated your installation this is a good time to think about doing so. In this post I’m going to focus on the process of customizing Omeka 2.0 for your institution, and specifically on creating a custom theme. While there are now several good themes available for 2.0, you will probably want to make a default theme that matches the rest of your website. One of the nice features of Omeka, and one quite different from other content management systems, is that it is very easy for the people who create exhibits to pick a custom theme that differs from the default theme. That said, providing a custom theme for your institution makes it easy for visitors to know where they are, and will also make it easier on the staff who are creating exhibits, since you can adapt the theme to their needs.

Planning

Like any design project, you should start with a discussion with the people who use the system most. (If you are new to design, check the ACRL TechConnect posts on design.) In my case, there are two archives on campus that both use Omeka for their exhibits. Mock up what the layout should look like–you may not be able to get it perfectly, but use this as a guide for future development. We came up with a rough sketch based on what the archivist liked and didn’t like about the available templates, and worked together on determining the priorities for the design. (Side note: if you can get your whole wall painted with whiteboard paint, this is a very fun collaborative project.)

Rough sketch of ideas for new theme.

Development

Development is very easy to start when you are modifying an existing theme. Start with a theme (there are only a few that are 2.0 compatible) that is close to what you need. Rather than the subtheme system you may be used to with Drupal or WordPress, with Omeka you can pick the theme you want to hack on and copy the entire directory and rename it.

Here was the process I followed to build my theme. I suggest that you set up a local development environment (I used XAMPP) to do this work, but make sure that you have at least one exhibit to test, since some of the CSS is different for exhibits than for the rest of the site.

  • Pick a theme

Seasons (with the Autumn color scheme)

I started with the Seasons theme. I copied the seasons directory from the themes directory and pasted it back with a new name of luctest (which I renamed when it was time to move it to a production environment).
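If you prefer to script that copy step rather than doing it in a file manager, a couple of lines of Python will do it; the paths are hypothetical and depend on where your Omeka installation lives.

import shutil

# Copy the existing Seasons theme to a new directory to use as a starting point.
shutil.copytree("/path/to/omeka/themes/seasons", "/path/to/omeka/themes/luctest")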

  • Modify theme.ini

This is what you will start with. You really only need to edit the author, title, and description, though you can change the rest if you like.

[theme]
author = "Roy Rosenzweig Center for History and New Media"
title = "Seasons"
description = "A colorful theme with a configuration option to switch style sheets for a particular season, plus 'night'."
license = "GPLv3"
website = "http://omeka.org"
support_link = "http://omeka.org/forums/forum/themes-and-public-display"
omeka_minimum_version="2.0"
omeka_target_version="2.0"
version="2.1.1"
tags="yellow, blue, summer, season, fall, orange, green, dark"
  • Modify config.ini

Check which elements are set in the configuration (i.e., elements that the person creating the exhibit, such as an archivist, can set) and which you need to set in the theme. This can cause a lot of frustration when you attempt to style an element whose value is actually set by the user. If you don’t want to allow the user to change anything, you can take that option out of config.ini; just make sure you’ve set it elsewhere.

[config]

; Style Sheet
style_sheet.type = "select"
style_sheet.options.label = "Style Sheet"
style_sheet.options.description = "Choose a style sheet"
style_sheet.options.multiOptions.spring = "Spring"
style_sheet.options.multiOptions.summer = "Summer"
style_sheet.options.multiOptions.autumn = "Autumn"
style_sheet.options.multiOptions.winter = "Winter"
style_sheet.options.multiOptions.night = "Night"
style_sheet.options.value = "winter"

logo.type = "file"
logo.options.label = "Logo File"
logo.options.description = "Choose a logo file. This will replace the site title in the header of the theme. Recommended maximum width for the logo is 500px."
logo.options.validators.count.validator = "Count"
logo.options.validators.count.options.max = "1"

display_featured_item.type = "checkbox"
display_featured_item.options.label = "Display Featured Item"
display_featured_item.options.description = "Check this box if you wish to show the featured item on the homepage."
display_featured_item.options.value = "1"

display_featured_collection.type = "checkbox"
display_featured_collection.options.label = "Display Featured Collection"
display_featured_collection.options.description = "Check this box if you wish to show the featured collection on the homepage."
display_featured_collection.options.value = "1"

display_featured_exhibit.type = "checkbox"
display_featured_exhibit.options.label = "Display Featured Exhibit"
display_featured_exhibit.options.description = "Check this box if you wish to show the featured exhibit on the homepage."
display_featured_exhibit.options.value = "1"

homepage_recent_items.type = "text"
homepage_recent_items.options.label = "Homepage Recent Items"
homepage_recent_items.options.description = "Choose a number of recent items to be displayed on the homepage."
homepage_recent_items.options.maxlength = "2"

homepage_text.type = "textarea"
homepage_text.options.label = "Homepage Text"
homepage_text.options.description = "Add some text to be displayed on your homepage."
homepage_text.options.rows = "5"
homepage_text.options.attribs.class = "html-input"

(This is just a sample of part of the config.ini file).

  • Modify CSS

Open up css/style.css and check which elements you need to modify (note that some themes may have the style sheets divided up differently). Some items are obvious; for others you will have to use Firebug or another tool to determine which class styles the element. You can always ask in the Omeka themes and display forum if you can’t figure it out.

The Seasons theme has different styles for each color scheme, and in the interests of time I picked the scheme closest to the one I wanted to end up with. You could use the concept of different schemes to identify the collections and/or exhibits of different units. Make sure you read through the whole style sheet first to determine which elements are theme-wide and which are set in the color scheme.

  • Test, test, test

The 2.0 themes that I’ve experimented with are all responsive and work well with different browsers. This probably goes without saying, but if you have changed the spacing at all, make sure you test your design in multiple window sizes and devices.

  • Voila

Final version of redesigned theme.

We have a few additional items to add to this design, but it’s met our immediate needs very well, and most importantly matches the design of the Archives and Special Collections website so it’s clear to users that they are still in the right place.

Next steps

Since this was a new content management system to me, I still have a lot to learn about the best ways to do certain things. This experience was helpful not just in learning Omeka, but also as a small-scale test of planning a new theme for our entire library website, which runs on Drupal.


Digital Content: Who May Publish? Who May Sell? Who May Access?

From small university presses focusing on niche markets to the Big Six giants looking for the next massive bestseller, the publishing industry has been struggling to come to terms with the reality of new distribution models. Those models tend to favor cheaper and faster production with a much lower threshold for access, which generally has been good news for consumers. Several recent rulings and statements have brought these issues to the forefront of conversation and perhaps indicated some common themes in publishing which are relevant to all libraries and their ability to purchase and/or provide digital content.

Academic Publishing: Dissertation == Monograph?

On July 22 the American Historical Association issued a “Statement on Policies Regarding the Embargoing of Completed History PhD Dissertations”. In this statement, the AHA recommended that all libraries and graduate programs allow dissertations to be embargoed for up to six years. This is, in theory, to allow junior scholars enough time to publish a monograph based on the dissertation in order to receive tenure, under the assumption that academic publishers will not publish a book based on a dissertation freely available online. Reactions to this statement prompted the AHA to release a Q & A page to clarify and support its position, including pointing out that publishers’ positions are too unclear to be sure there is no risk to an open access dissertation, and that, “like it or not”, junior faculty must produce a monograph to get tenure. The AHA claims that in some cases an embargo benefits junior scholars by giving them more time to revise their work before publication–while this is true, it also indicates that a dissertation is not equivalent to a published scholarly monograph. The argument from the publishers’ side appears to be that libraries (which are the main purchasers of scholarly monographs) will not purchase books based on revised dissertations freely available online, the truth of which has been debated widely. Libraries do purchase print copies of titles (both monographs and serials) which are freely available online.

From my personal experience as an institutional repository manager, I know that attitudes toward embargoing dissertations vary widely by advisor and department. Like most people making an argument about this topic, I do not have much more than anecdotes to provide. I checked the most commonly downloaded dissertations from the past year, and it appeared the most frequently downloaded title (over 2,000 downloads across 2012-2013) is also the only one that has been published as a book and purchased by at least one library. Clearly this does not control for all variables and warrants further study, but it is a useful clue that open access availability does not always affect publication and later purchase. Further, from the point of view of open access creating more equal access to resources across the world, Google Analytics for that dissertation indicates that the sessions over the past year with the most engaged users came from, in order, the UK, the United States, Mauritius, and Sri Lanka.

What Should a Digital Book Cost?

In mid-July Denise Cote, the judge in the Apple e-book price-fixing case, issued an opinion stating that Apple did collude with the publishers to set prices on ebooks. Reading the story of the negotiations in the opinion is a thrilling behind-the-scenes look at companies trying to get a handle on a fairly new market and to understand how they will make money. Below I summarize the 160-page opinion, which is well worth reading in its entirety.

The problem with ebook pricing started with Amazon, which set a price of $9.99 for new releases that normally would have had list prices of $25-$30. This was frustrating to the major publishing houses, which worried (probably rightly so) that consumers would be unwilling to pay more than $10 for books after getting used to this low price point, and that Amazon would effectively price everyone else out of the market. Even after publishers raised the wholesale price of new releases, Amazon would sell them at a loss to preserve the $9.99 price. The publishers spent 2009 developing strategies to combat Amazon, but it wasn’t until late 2009, with the entry of Apple into the ebook market, that they saw a real opportunity.

Apple agreed with the Big Six publishers that setting all books at $9.99 was too low, but was unwilling to enter a market in which it could not compete with Amazon. To accomplish this, it wanted the publishers to agree to the same terms, which included lower wholesale prices for ebooks. The negotiations that followed over late 2009 and early 2010 started positively, but ended in dissatisfaction. Because Apple was unwilling to sell anything as a loss leader, it felt that a wholesale model would leave it too vulnerable to Amazon. To address that, Apple proposed to sell books with an agency model (which several publishers had suggested). With an agency model, Apple would collect a 30% commission on sales just as it did with the App Store. To ensure that publishers did not set unrealistically high prices, Apple would set pricing caps. The other crucial move Apple made was to insist that publishers move all retailers of ebooks to the agency model, to ensure Apple would be able to compete on price across the board. Amazon had no interest in the agency model, and in early 2010 had a series of meetings with the publishers that made this clear. After all the agreements were signed with Apple (the only Big Six publisher that did not participate was Random House), the publishers needed to move Amazon to an agency model to fulfill the terms of their contracts. Macmillan was the first publisher to set up a meeting with Amazon to discuss this requirement. Amazon’s response to the meeting was to remove the “buy” button from all Macmillan books, both print and Kindle editions. Amazon eventually had to capitulate to the publishers and move to an agency model, which was complete by mid-2010, but it submitted a complaint to the Federal Trade Commission. Random House finally agreed to an agency model with Apple in early 2011, thanks to a spot of blackmail on Apple’s part (it wouldn’t allow any Random House apps without an agency deal).
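To see why the publishers were willing to accept the agency model even though it could mean less revenue per copy, it helps to run the arithmetic. The numbers below are purely illustrative (a hypothetical $26 list price and the commonly cited 50% wholesale discount), not figures from the opinion.

# Illustrative numbers only: how the two models split revenue on one new release.
list_price = 26.00                          # hypothetical hardcover list price

# Wholesale model: publisher charges roughly half of list; the retailer sets any retail price.
wholesale_to_publisher = list_price * 0.50  # about $13.00 to the publisher
amazon_retail = 9.99                        # Amazon resold below cost to hold the $9.99 price

# Agency model: publisher sets the consumer price (subject to Apple's caps);
# the retailer keeps a 30% commission.
agency_price = 12.99
agency_to_publisher = agency_price * 0.70   # about $9.09 to the publisher
agency_to_retailer = agency_price * 0.30    # about $3.90 to Apple

print(round(wholesale_to_publisher, 2), round(agency_to_publisher, 2))

In other words, the publishers traded revenue per copy for control over the consumer price, which is what made the arrangement attractive as a weapon against the $9.99 price point.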

Ultimately the court determined that Apple violated the Sherman Act by conspiring with the publishers to force all their retailers to sell books at the same prices and thus removing competition. A glance at Amazon’s Kindle store bestsellers today shows books priced from $1.99 to $13.99 for the newest Stephanie Plum mystery (the same price as it is in the Apple bookstore). For all titles priced higher than $9.99, Amazon notes that the “price is set by the publisher.” Whether this means anything to the average consumer is debatable. Compare these negotiations to the on-going struggle libraries have had with availability of ebooks for lending–publishers have a lot to learn about libraries in addition to new models for digital sales, some of which was covered at the series of talks with the Big Six publishers that Maureen Sullivan held in early 2012. Over recent months publishers have made more ebooks available to libraries. But some libraries, most notably the Douglas County, Colorado libraries, are setting their own terms for purchasing and lending ebooks.

What Can You Do With a Digital File?

The last ruling I want to address is about the music resale service ReDigi, about which Kevin Smith goes into detail. This was a service that provided a way for people to re-sell purchased MP3s, but ultimately the judge ruled that it was impossible to transfer the original file, and so this did not fit under the first sale doctrine. The first sale doctrine (17 USC § 109) holds that “the owner of a particular copy or phonorecord lawfully made … is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.” Another case that was decided in April by the Supreme Court, Kirtsaeng v. Wiley, upheld this in the case of international sales of physical items, which was an important decision for libraries. But digital materials are more complicated. First sale applies to computer programs on physical media (except in certain circumstances), but does not cover material that has been licensed rather than sold, which is how most digital files are distributed. (For how the US Attorney’s Office approaches this in criminal investigations, see this document.) So when you “buy” that Kindle book from Amazon or load a book onto your iPad, you are licensing the product for limited use on a limited number of devices and have no legal recourse for lending or getting rid of the content, even if you try hard to follow the law as ReDigi did. Librarians are well aware of this and its implications, and we license quite a bit of content that we can loan and/or distribute under limited circumstances. Libraries are safest in the long term if they can own the content outright rather than licensing it, as are consumers. But it will be a long time before there is clarity about the legal way to transfer ownership of a digital file at the consumer level.

Conclusion

Librarians and publishers have a complicated relationship. We need each other if either is to succeed, but even if our ends are ultimately the same, our means are very different. These recent events indicate that there is still much in flux and plenty of room for constructive dialog with content creators and publishers.


Collecting Data: How Much do We Really Need?

Many of us have had conversations in the past few weeks about data collection due to the reports about the NSA’s PRISM program, but ever since April and the bombings at the Boston Marathon, there has been an increased awareness of how much data is being collected about people in an attempt to track down suspects–or, increasingly, to stop potential terrorist events before they happen. A recent Nova episode about the manhunt for the Boston bombers showed one such example at the New York Police Department: a program called the Domain Awareness System, which brings live footage from almost every surveillance camera in New York City into one room, with the ability to search for features of individuals and even to detect people acting suspiciously. Add to that a demonstration of cutting-edge facial recognition software under development at Carnegie Mellon University, and reality seems to be moving ever closer to science fiction movies.

Librarians focused on technical projects love to collect data and make decisions based on that data. We try to get data collection systems as close to real-time as possible, and work hard to collect and analyze as much data as we can. The idea of a series of cameras tracking exactly what our patrons are doing in the library in real time might seem very tempting. But as librarians, we value the ability of our patrons to access information with as much privacy as possible–like other professions, we treat the interactions we have with our patrons (just as we would clients, patients, congregants, or sources) with care and discretion (see Item 3 of the Code of Ethics of the American Library Association). I will not address the national conversation about privacy versus security in this post–I want to address the issue of data collection right where most of us live on a daily basis: inside analytics programs, spreadsheets, and server logs.

What kind of data do you collect?

Let’s start with an exercise. Write a list of all the statistical reports you are expected to provide your library–for most of us, it’s probably a very long list. Now, make a list of all the tools you use to collect the data for those statistics.

Here are a few potential examples:

Website visitors and user experience

  • Google Analytics or some other web analytics tool
  • Heat map tool
  • Server logs
  • Surveys

Electronic resource access reports

  • Electronic resources management application
  • Vendor reports (COUNTER and other)
  • Link resolver click-through report
  • Proxy server logs

The next step may require a little digging. For library-created tools, do you have a privacy policy for this data? Has it gone through the Institutional Review Board? For third-party tools, is there a privacy policy? What are the terms of use or the user license? (And how many people have ever read the entire terms of service?) We will return to this exercise in a moment.

How much is enough?

Think about what type of data these tools are collecting about your users. Some of it may be very private indeed. For instance, the heat map tool I’ve recently started using (Inspectlet) not only tracks clicks, but actually records sessions as patrons use the website. This is fascinating information–we had, for instance, one session in which a patron opened the library website, clicked the Facebook icon on the page, and came back to the website nearly 7 hours later. It was fun to see that people really do visit the library’s Facebook page, but it immediately raised the question of whether it was a visit from on campus. (It was–and it wouldn’t have taken long to figure out whether it was a staff machine and who was working that day and time.) IP addresses from off campus are very easy to track, sometimes down to the block–again, easy enough to tie to an individual. We collect IP addresses for abusive or spamming behavior and block users based on IP address all the time. But what about in this case? During the screen recordings I can see exactly what the user types in the search boxes for the catalog and discovery system. Luckily, Inspectlet allows you to obscure the last two octets of the IP address (which is legally required in some places), so less information is collected. All similar tools should allow you the same ability.

Consider another case: proxy server logs. In the past when I did a lot of EZProxy troubleshooting, I found the logs extremely helpful in figuring out what went wrong when I got a report of trouble, particularly when it had occurred a day or two before. I could see the username, what time the user attempted to log in or succeeded in logging in, and which resources they accessed. Let’s say someone reported not being able to log in at midnight– I could check to see the failed logins at midnight, and then that username successfully logging in at 1:30 AM. That was a not infrequent occurrence, as usually people don’t think to write back and say they figured out what they did wrong! But I could also see everyone else’s logins and which articles they were reading, so I could tell (if I wanted) which grad students were keeping up with their readings or who was probably sharing their login with their friend or entire company. Where I currently work, we don’t keep the logs for more than a day, but I know a lot of people are out there holding on to EZProxy logs with the idea of doing “something” with them someday. Are you holding on to more than you really want to?

Let’s continue our exercise. Go through your list of tools and note all the potentially personally identifying information each tool collects, whether or not you use it. Are you surprised by anything? Make a plan to obscure unused pieces of data on a regular basis if it can’t be done automatically. Consider also what you can reasonably do with the data given your current job requirements, rather than future study possibilities. If you do think the data will be useful for a future study, make sure you are saving anonymized data sets unless it is absolutely necessary to have personally identifying information. In the latter case, you should clear your study in advance with your Institutional Review Board and follow a data management plan.
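As one example of obscuring data on a schedule, here is a rough sketch of a Python script that masks the last two octets of every IPv4 address in a log file, similar to the option Inspectlet provides. The file names are hypothetical, and you would need to adapt the pattern to whatever your proxy or web server actually writes.

import re

IP_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}\b")

def mask_ip(match):
    # Keep the first two octets and zero out the rest.
    return "{}.{}.0.0".format(match.group(1), match.group(2))

def anonymize_log(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(IP_PATTERN.sub(mask_ip, line))

anonymize_log("ezproxy.log", "ezproxy-anonymized.log")

A scheduled job like this could run nightly so that identifying details never persist longer than your retention policy allows.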

A privacy and data management policy should include at least these items:

  • A statement about what data you are collecting and why.
  • Where the data is stored and who has access to it.
  • A retention timeline.

For example, in the past I collected all virtual reference transaction logs to study the effectiveness of a new set of virtual reference services. I knew I wanted at least a year’s worth of logs, and ideally three years, to track changes over time. I was able to save the logs with anonymized IP addresses, and once I had the data I needed I was able to delete the actual transcripts. The privacy policy described the process and where the data would be stored to ensure it was secure. In this case, I used the RUSA Guidelines for Implementing and Maintaining Virtual Reference Services as a guide to creating this policy. Read through the ALA Guidelines to Drafting a Library Privacy Policy for additional specific language and items you should include.

What we can do with data

In all this I don’t mean to imply that we shouldn’t be collecting this data. In both the examples I gave above, the data is extremely useful in improving the patron experience, even though it gives identifying details away. Not collecting data has trade-offs, too. For years, libraries have declined to retain a patron’s borrowing record in order to protect his or her privacy. But now patrons who want an online record of what they’ve borrowed from the library must use third-party services with (most likely) much less stringent privacy policies than libraries. By not keeping records of what users have checked out or read through databases, we are unable to provide them with personalized, automated suggestions about what to read next. Anyone who uses Amazon regularly knows that it will try to tempt you into purchases based on your past purchases or books whose previews you were reading–even if you would rather no one know that you were reading that book, and certainly don’t want suggestions based on it popping up when you are doing a collection development project at work and are logged in on your personal account. In all the decisions we make about collecting or not collecting data, we have to consider trade-offs like these. Is the service so important that the benefits of collecting the data outweigh the risks? Or is there another way to provide the service?

We can see some examples of this trade-off in two similar projects coming out of Harvard Library Labs. One, Library Hose, was a Twitter stream with the name of every book being checked out. The service ran for part of 2010 and has been suspended since September of that year. Besides running up against daily tweet limits, it was also a potential privacy violation–even if it was a fun idea (this blog post has some discussion about it). A newer project takes the opposite approach: books that a patron thinks are “awesome” can be returned to the Awesome Box at the circulation desk, and the information about the book is collected on the Awesome Box website. This is a great tweak to the earlier project, since it advertises material that’s now available rather than checked out, and people have to opt in by putting the item in the box.

In terms of personal recommendations, librarians have the advantage of being able to form close working relationships with faculty and students so they can make personal recommendations based on their knowledge of the person’s work and interests. But how to automate this without borrowing records? One example is a project that Ian Chan at California State University San Marcos has done to use student enrollment data to personalize the website based on a student’s field of study. (Slides). This provides a great deal of value for the students, who need to log in to check their course reserves and access articles from off campus anyway. This adds on top of that basic need a list of recommended resources for students, which they can choose to star as favorites.

Conclusion

In thinking about what type of data you collect, whether on purpose or accidentally, spend some time thinking about what is strictly necessary to accomplish the work that you need to do. If you don’t need a piece of data but can’t avoid collecting it (such as full IP addresses or usernames), make sure you have a privacy policy and retention schedule, and ensure that it is not accessible to more people than absolutely necessary.

Work to educate your patrons about privacy, particularly online privacy. ALA has a Choose Privacy Week, which is always the first week in May. The site for that has a number of resources you might want to consult in planning programming. Academic librarians may find it easiest to address college students in terms of their presence on social media when it comes to future job hunting, but this is just an opening to larger conversations about data. When you ask patrons to use a third-party service (such as a social network) or recommend a service (such as a book recommendation site), make sure they are aware of what information they are sharing.

We all know that Google’s slogan is “Don’t be evil”, but it’s not always clear if they are sticking to that. Make sure that you are not being evil in your own data collection.


Citation Manager Roundup

In April of this year, the two most popular free citation managers–Mendeley and Zotero–both underwent some big changes. On April 8th, TechCrunch announced that Elsevier had purchased Mendeley, which had been surmised in January.1 Just a few days later, Zotero announced the release of version 4, with a number of new features.2 Just as with the sunsetting of Google Reader, this has prompted many to consider what citation managers they have been using and think about switching or changing practices. I will not address subscription or paid products like RefWorks and EndNote specifically, though there are certainly many reasons you might prefer one of those products.

Mendeley: a new Star Wars movie in the making?

The rhetoric surrounding Elsevier’s acquisition of Mendeley was generally alarmist in nature, and the hashtag “#mendelete” that popped up immediately after the announcement suggests that many people’s first instinct was to abandon Mendeley. Elsevier has been held up as a model of anti-open access, and Mendeley as a model for open access. Yet Mendeley has always been a for-profit company, and, like Google, it benefits itself and its users (particularly the science community) by knowing what they are reading and sharing. After all, the social features of Mendeley wouldn’t have any value if there were no public sharing. Institutional Mendeley accounts allow librarians to see what their users in aggregate are reading and saving, which helps them make collection development decisions–a service beyond what the average institutional citation manager product accomplishes. Victor Henning promises on the Mendeley blog that nothing will change, and that this will give them more freedom to develop more features.3 As for Elsevier, Oliver Dumon promises that Mendeley will remain independent and allowed to follow its own course–and that bringing it together with ScienceDirect and Scopus will create a “central workflow and collaboration site for authors”.4

There are two questions to be answered in this. First, is it realistic to assume that the Mendeley team will have the creative freedom they say they will have? And second, are users comfortable with their data being available to Elsevier? For many, the answers to both these questions seem to be “no” and “no.” A more optimistic point of view is that if Elsevier must placate Mendeley users who are open access advocates, they will allow more openness than before.

It’s too early to say, but I remain hopeful that Mendeley can continue to create a more open spirit in academic publishing. Jason Hoyt (a former employee of Mendeley and co-founder of PeerJ) suggests that much of the work he oversaw to open up Mendeley was being stymied by Elsevier specifically. For him, this went against his personal ethos, and so he was unable to stay at Mendeley–but he is confident in the character and ability of the people remaining there.5 I have never been a heavy user of Mendeley, but I have maintained a free account for the past few years. I use it mainly to create a list of my publications on my personal website, using a WordPress plug-in that uses the Mendeley API.

What’s new with Zotero

Zotero is a very different product from Mendeley. First, it is open-source software, with lots of ways to participate in development. Zotero was developed by the Roy Rosenzweig Center for History and New Media at George Mason University, with foundation and user support. It was developed specifically to support the research work of humanists. Originally a Firefox plug-in, Zotero now works as a standalone piece of software that interacts with Firefox, Chrome, and Safari to recognize bibliographic data on websites and pull it into a database that can be synced across computers (and even some third-party mobile software). The newest version of Zotero includes several improvements. The one I am most excited about is detailed download display, which tells you what folder you’re saving a reference into, which is crucial for my workflow. Zotero is the citation manager I use on a daily basis, and I rely on it for formatting the footnotes you see on ACRL TechConnect posts or other research articles I produce. Since much of my research is on the open web, books, or other non-journal article resources, I find the ability of Zotero to pick up library catalog records and similar metadata more useful than the Mendeley import bookmarklet.

Both Zotero and Mendeley offer free storage for metadata and PDFs, with a cost for storage above the free level. (It is also possible to use a WebDAV server for syncing Zotero files).

Zotero                      Mendeley
300 MB     Free             —
2 GB       $20 / year       2 GB         Free
6 GB       $60 / year       5 GB         $55 / year
10 GB      $100 / year      10 GB        $110 / year
25 GB      $240 / year      Unlimited    $165 / year
Some concluding thoughts

Several graduate students in science6 have written blog posts about switching away from Mendeley to Zotero. But the two tools aren’t the same thing at all: given the backgrounds of their creators, Mendeley skews more to the sciences, and Zotero more to the humanities.

Nor, as I like to point out, must they be mutually exclusive. I use Zotero for my daily citation management since I much prefer it for grabbing citations online, but sync my Zotero library with Mendeley to use the social and API features in Mendeley. I can choose to do this as an individual, but consider carefully the implications of your choice if you are considering an institutional subscription or requiring students or members of a research group to use a particular service.

  1. Lunden, Ingrid. “Confirmed: Elsevier Has Bought Mendeley For $69M-$100M To Expand Its Open, Social Education Data Efforts.” TechCrunch, April 8, 2013. http://techcrunch.com/2013/04/08/confirmed-elsevier-has-bought-mendeley-for-69m-100m-to-expand-open-social-education-data-efforts/.
  2. Takats, Sean. “Zotero 4.0 Launches.” Zotero, April 11, 2013. http://www.zotero.org/blog/zotero-4-0-launches/.
  3. Henning, Victor. “Mendeley and Elsevier – Here’s More Info.” Mendeley Blog, April 19, 2013. http://blog.mendeley.com/community-relations/mendeley-and-elsevier-heres-more-info/.
  4. Dumon, Oliver. “Elsevier Welcomes Mendeley.” Elsevier Connect, April 8, 2013. http://elsevierconnect.com/elsevier-welcomes-mendeley/.
  5. Hoyt, Jason. “My Thoughts on Mendeley/Elsevier & Why I Left to Start PeerJ,” April 9, 2013. http://enjoythedisruption.com/post/47527556151/my-thoughts-on-mendeley-elsevier-why-i-left-to-start.
  6. For one, see “Mendeley Sells Out; I’m Moving to Zotero.” LJ Villanueva’s Research Blog. Accessed May 20, 2013. http://research.coquipr.com/archives/492.