Last month we got the long-awaited ruling in favor of Google in the Authors Guild vs. Google Books case, which by now has been analyzed extensively. Ultimately the judge in the case decided that Google’s digitization was transformative and thus constituted fair use. See InfoDocket for detailed coverage of the decision.
The Google Books project was part of the Google mission to index all the information available, and as such could never have taken place without libraries, which hold all those books. While most, if not all, the librarians I know use Google Books in their work, there has always been a sense that the project should not have been started by a commercial enterprise using the intellectual resources of libraries, but should have been started by libraries themselves working together. Yet libraries are often forced to be more conservative about digitization than we might otherwise be due to rules designed to protect the college or university from litigation. This ruling has made it seem as though we could afford to be less cautious. As Eric Hellman points out, the decision seems to imply that with copyright the ends are the important part, not the means. “In Judge Chin’s analysis, copyright is concerned only with the ends, not the means. Copyright seems not to be concerned with what happens inside the black box.” 1 As long as the end use of the books was fair, which was deemed to be the case, the initial digitization was not a problem.
Looking at this from the perspective of repository manager, I want to address a few of the theoretical and logistical issues behind such a conclusion for libraries.
What does this mean for digitization at libraries?
At the beginning of 2013 I took over an ongoing digitization project, and as a first-time manager of a large-scale long-term project, I learned a lot about the processes involved in such a project. The project I work with is extremely small-scale compared with many such projects, but even at this scale the project is expensive and time-consuming. What makes it worth it is that long-buried works of scholarship are finally being used and read, sometimes for reasons we do not quite understand. That gets at the heart of the Google Books decision—digitizing books in library stacks and making them more widely available does contribute to education and useful arts.
There are many issues that we need to address, however. Some of the most important ones are what access can and should be provided to what works, and making mass digitization more available to smaller and international cultural heritage institutions. Google Books could succeed because it had the financial and computing resources of Google matched with the cultural resources of the participating research libraries. This problem is international in scope. I encourage you to read this essay by Amelia Sanz, in which she argues that digitization efforts so far have been inherently unequal and a reflection of colonialism. 2 But is there a practical way of approaching this desire to make books available to a wider audience?
There are several separate issues in providing access. Books that are in the public domain are unquestionably fine to digitize, though differences in international copyright law make it difficult to determine what can be provided to whom. As Amelia Sanz points out, Google can only digitize Spanish works prior to 1870 in Spain, but may digitize the complete work in the United States. The complete work is not available to Spanish researchers, but it is available in full to US researchers.
That aside, there are several reasons why it is useful to digitize works still unquestionably under copyright. One of the major reasons is textual corpus analysis–you need to have every word of many texts available to draw conclusions about use of words and phrases in those texts. Google Books ngram viewer is one such tool that comes out of mass digitization. Searching for phrases in Google and finding that phrase as a snippet in a book is an important way to find information in books that might otherwise be ignored in favor of online sources. Some argue that this means that those books will not be purchased when they might have otherwise been, but it is equally possible that this leads to greater discovery and more purchases, which research into music piracy suggests may be the case.
Another reason to digitize works still under copyright is to highlight the work of marginalized communities, though in that case it is imperative to work with those communities to ensure that the digitization is not exploitative. Many orphan works, for whom a rights-holder cannot be located, fall under this, and I know from some volunteer work that I have done that small cultural heritage institutions are eager to digitize material that represents the cultural and intellectual output of their communities.
In all the above cases, it is crucial to put into place mechanisms for ensuring that works under copyright are not abused. Google Books uses an algorithm that makes it impossible to read an entire book, which is probably beyond the abilities of most institutions. (If anyone has an idea for how to do this, I would love to hear it.) Simpler and more practical solutions to limiting access are to only make a chapter or sample of a book available for public use, which many publishers already allow. For instance, Oxford University Press allows up to 10% of a work (within certain limits) on personal websites or institutional repositories. (That is, of course, assuming you can get permission from the author). Many institutions maintain “dark archives“, which are digitized and (usually) indexed archives of material inaccessible to the public, whether institutional or research information. For instance, the US Department of Energy Office of Scientific and Technical Information maintains a dark archive index of technical reports comprising the equivalent of 6 million pages, which makes it possible to quickly find relevant information.
In any case where an institution makes the decision to digitize and make available the full text of in-copyright materials for reasons they determine are valid, there are a few additional steps that institutions should take. Institutions should research rights-holders or at least make it widely known to potential rights-holders that a project is taking place. The Orphan Works project at the University of Michigan is an example of such a project, though it has been fraught with controversy. Another important step is to have a very good policy for taking down material when a rights-holder asks–it should be clear to the rights-holder whether any copies of the work will be maintained and for what purposes (for instance archival or textual analysis purposes).
Digitizing, Curating, Storing, Oh My!
The above considerations are only useful when it is even possible for institutions without the resources of Google to start a digitization program. There are many examples of DIY digitization by individuals, for instance see Public Collectors, which is a listing of collections held by individuals open for public access–much of it digitized by passionate individuals. Marc Fischer, the curator of Public Collectors, also digitizes important and obscure works and posts them on his site, which he funds himself. Realistically, the entire internet contains examples of digitization of various kinds and various legal statuses. Most of this takes place on cheap and widely available equipment such as flatbed scanners. But it is possible to build an overhead book scanner for large-scale digitization with individual parts and at a reasonable cost. For instance, the DIY Book Scanning project provides instructions and free software for creating a book scanner. As they say on the site, all the process involves is to “[p]oint a camera at a book and take pictures of each page. You might build a special rig to do it. Process those pictures with our free programs. Enjoy reading on the device of your choice.”
“Processing the pictures” is a key problem to solve. Turning images into PDF documents is one thing, but providing high quality optical character recognition is extremely challenging. Free tools such as FreeOCR make it possible to do OCR from image or PDF files, but this takes processing power and results vary widely, particularly if the scan quality is lower. Even expensive tools like Adobe Acrobat or ABBYY FineReader have the same problems. Karen Coyle points out that uncorrected OCR text may be sufficient for searching and corpus analysis, but does not provide a faithful reproduction of the text and thus, for instance, provide access to visually impaired persons 3 This is a problem well known in the digital humanities world, and one solved by projects such as Project Gutenberg with the help of dedicated volunteer distributed proofreaders. Additionally, a great deal of material clearly in the public domain is in manuscript form or has text that modern OCR cannot recognize. In that case, crowdsourcing transcriptions is the only financially viable way for institutions to make text of the material available. 4 Examples of successful projects using volunteer transcriptors or proofreaders include Ancient Lives to transcribe ancient papyrus, What’s on the Menu at the New York Public Library, and DIYHistory at the University of Iowa libraries. (The latter has provided step by step instructions for building your own version using open source tools).
So now you’ve built your low-cost DIY book scanner, and put together a suite of open source tools to help you process your collections for free. Now what? The whole landscape of storing and preserving digital files is far beyond the scope of this post, but the cost of accomplishing this is probably the highest of anything other than staffing a digitization project, and it is here where Google clearly has the advantage. The Internet Archive is a potential solution to storing public domain texts (though they are not immune to disaster), but if you are making in-copyright works available in any capacity you will most likely have to take the risk on your own servers. I am not a lawyer, but I have never rented server space that would allow copyrighted materials to be posted.
Conclusion: Is it Worth It?
Obviously from this post I am in favor of taking on digitization projects of both public domain and copyrighted materials when the motivations are good and the policies are well thought out. From this perspective, I think the Google Books decision was a good thing for libraries and for providing greater access to library collections. Libraries should be smart about what types of materials to digitize, but there are more possibilities for large-scale digitization, and by providing more access, the research community can determine what is useful to them.
If you have managed a DIY book scanning project, please let me know in the comments, and I can add links to your project.
- Hellman, Eric. “Google Books and Black-Box Copyright Jurisprudence.” Go To Hellman, November 18, 2013. http://go-to-hellman.blogspot.com/2013/11/google-books-and-black-box-copyright.html. ↩
- Sanz, Amelia. “Digital Humanities or Hypercolonial Studies?” Responsible Innovation in ICT (June 26, 2013). http://responsible-innovation.org.uk/torrii/resource-detail/1249#_ftnref13. ↩
- Coyle, Karen. “It’s FAIR!” Coyle’s InFormation, November 14, 2013. http://kcoyle.blogspot.com/2013/11/its-fair.html. ↩
- For more on this, see Ben Brumfield’s work on crowdsourced transcription, for example Brumfield, Ben W. “Collaborative Manuscript Transcription: ‘The Landscape of Crowdsourcing and Transcription’ at Duke University.” Collaborative Manuscript Transcription, November 23, 2013. http://manuscripttranscription.blogspot.com/2013/11/the-landscape-of-crowdsourcing-and.html. ↩
In honor of Open Access Week, I want to look at some troubling recent discussions about open access, and what academic librarians who work with technology can do. As the manager of an open access institutional repository, I strongly believe that providing greater access to academic research is a good worth pursuing. But I realize that this comes at a cost, and that we have a responsibility to ensure that open access also means integrity and quality.
On “stings” and quality
By now, the article by John Bohannon in Science has been thoroughly dissected in the blogosphere 1. This was not a study per se, but rather a piece of investigative journalism looking into the practices of open access journals. Bohannon submitted variations on an article written under African pseudonyms from fake universities that “any reviewer with more than a high-school knowledge of chemistry…should have spotted the paper’s short-comings immediately.” Over the course of 10 months, he submitted these articles to 304 open access journals whose names he drew from the Directory of Open Access Journals and Jeffrey Beall’s list of predatory open access publishers. Ultimately 157 of the journals accepted the article and 98 rejected it, when any real peer review would have meant that it was rejected in all cases. It is very worth noting that in an analysis of the raw data that Bohannon supplied some publishers on Beall’s list rejected the paper immediately, which is a good reminder to take all curative efforts with an appropriate amount of skepticism 2.
There are certainly many methodological flaws in this investigation, which Mike Taylor outlines in detail in his post 3, and which he concludes was specifically aimed at discrediting open access journals in favor of journals such as Science. As Michael Eisen outlines, Science has not been immune to publishing articles that should have been rejected after peer review–though Bohannon informed Eisen that he intended to look at a variety of journals but this was not practical, and this decision was not informed by editors at Science. Eisen’s conclusion is that “peer review is a joke” and that we need to stop regarding the publication of an article in any journal as evidence that the article is worthwhile 4. Phil Davis at the Scholarly Kitchen took issue with this conclusion (among others noted above), since despite the flaws, this did turn up incontrovertible evidence that “a large number of open access publishers are willfully deceiving readers and authors that articles published in their journals passed through a peer review process…” 5. His conclusion is that open access agencies such as OASPA and DOAJ should be better at policing themselves, and that on the other side Jeffrey Beall should be cautious about suggesting a potential for guilt without evidence. I think one of the more level-headed responses to this piece comes from outside the library and scholarly publishing world in Steven Novella’s post on Neurologica, a blog focused on science and skepticism written by an academic neurologist. He is a fan of open access and wider access to information, but makes the point familiar to all librarians that the internet creates many more opportunities to distribute both good and bad information. Open access journals are one response to the opportunities of the internet, and in particular author-pays journals like “all new ‘funding models’ have the potential of creating perverse incentives.” Traditional journals fall into the same trap when they rely on impact factor to drive subscriptions, which means they may end up publishing “sexy” studies of questionable validity or failing to publish replication studies which are the backbone of the scientific method–and in fact the only real way to establish results no matter what type of peer review has been done 6.
More “perverse incentives”
So far the criticisms of open access have revolved around one type of “gold” open access, wherein the author (or a funding agency) pays article publication fees. “Green” open access, in which a version of the article is posted in a repository is not susceptible to abuse in quite the same way. Yet a new analysis of embargo policies by Shan Sutton shows that some publishers are targeting green open access through new policies. Springer used to have a 12 month embargo for mandated deposit in repositories such as PubMed, but now has extended it to all institutional repositories. Emerald changed its policy so that any mandated deposit to a repository (whether by funder or institutional mandate) was subject to a 24 month embargo 7.
In both cases, paid immediate open access is available for $1,595 (Emerald) or $3,000 (Springer). It seems that the publishers are counting that a “mandate” means that funds are available for this sort of hyrbid gold open access, but that ignores the philosophy behind such mandates. While federal open access mandates do in theory have a financial incentive that the public should not have to pay twice for research, Sutton argues that open access “mandates” at institutions are actually voluntary initiatives by the faculty, and provide waivers without question 8. Additionally, while this type of open access does provide public access to the article, it does not address larger issues of reuse of the text or data in the true sense of open access.
What should a librarian do?
The issues above are complex, but there are a few trends that we can draw on to understand our responsibilities to open access. First, there is the issue of quality, both in terms of researcher experience in working with a journal, and that of being able to trust the validity of an individual article. Second, we have to be aware of the terms under which institutional policies may place authors. As with many such problems, the technological issues are relatively trivial. To actually address them meaningfully will not happen with technology alone, but with education, outreach, and network building.
The major thing we can take away from Bohannon’s work is that we have to help faculty authors to make good choices about where they submit articles. Anyone who works with faculty has stories of extremely questionable practices by journals of all types, both open access and traditional. Speaking up about those practices on an individual basis can result in lawsuits, as we saw earlier this year. Are there technical solutions that can help weed out predatory publishers and bad journals and articles? The Library Loon points out that many factors, some related to technology, have meant that both positive and negative indicators of journal quality have become less useful in recent years. The Loon suggests that “[c]reating a reporting mechanism where authors can rate and answer relatively simple questions about their experiences with various journals seems worthwhile.” 9
The comments to this post have some more suggestions, including open peer review and a forum backed by a strong editor that could be a Yelp-type site for academic publisher reputation. I wrote about open peer review earlier this year in the context of PeerJ, and participants in that system did indeed find the experience of publishing in a journal with quick turnarounds and open reviews pleasant. (Bohannon did not submit a fake article to PeerJ). This solution requires that journals have a more robust technical infrastructure as well as a new philosophy to peer review. More importantly, this is not a solution librarians can implement for our patrons–it is something that has to come from the journals.
The idea that seems to be catching on more is the “Yelp” for scholarly publishers. This seems like a good potential solution, albeit one that would require a great deal of coordinated effort to be truly useful. The technical parts of this type of solution would be relatively easy to carry out. But how to ensure that it is useful for its users? The Yelp analog may be particularly helpful here. When it launched in 2004, it asked users who were searching for some basic information about their question, and to provide the email addresses of additional people whom they would have traditionally asked for this information. Yelp then emailed those people as well as others with similar searches to get reviews of local businesses to build up its base of information. 10 Yelp took a risk in pursuing content in that way, since it could have been off-putting to potential users. But local business information was valuable enough to early users that they were willing to participate, and this seems like a perfect model to build up a base of information on journal publisher practices.
This helps address the problem of predatory publishers and shifting embargoes, but it doesn’t help as much with the issue of quality assurance for the article content. Librarians teach students how to find articles that claim to be peer reviewed, but long before Bohannon we knew that peer review quality varies greatly, and even when done well tells us nothing about the validity of the research findings. Education about the scholarly communication cycle, the scientific method, and critical thinking skills are the most essential tools to ensure that students are using appropriate articles, open access or not. However, those skills are difficult to bring to bear for even the most highly experienced researchers trying to keep up with a large volume of published research. There are a few technical solutions that may be of help here. Article level metrics, particularly alternative metrics, can aid in seeing how articles are being used. (For more on altmetrics, see this post from earlier this year).
One of the easiest options for article level metrics is the Altmetric.com bookmarklet. This provides article level metrics for many articles with a DOI, or articles from PubMed and arXiv. Altmetric.com offers an API with a free tier to develop your own app. An open source option for article level metrics is PLOS’s Article-Level Metrics, a Ruby on Rails application. These solutions do not guarantee article quality, of course, but hopefully help weed out more marginal articles.
No one needs to be afraid of open access
For those working with institutional repositories or other open access issues, it sometimes seems very natural for Open Access Week to fall so near Halloween. But it does not have to be frightening. Taking responsibility for thoughtful use of technical solutions and on-going outreach and education is essential, but can lead to important changes in attitudes to open access and changes in scholarly communication.
- Bohannon, John. “Who’s Afraid of Peer Review?” Science 342, no. 6154 (October 4, 2013): 60–65. doi:10.1126/science.342.6154.60. ↩
- “Who Is Afraid of Peer Review: Sting Operation of The Science: Some Analysis of the Metadata.” Scholarlyoadisq, October 9, 2013. http://scholarlyoadisq.wordpress.com/2013/10/09/who-is-afraid-of-peer-review-sting-operation-of-the-science-some-analysis-of-the-metadata/. ↩
- Taylor, Mike. “Anti-tutorial: How to Design and Execute a Really Bad Study.” Sauropod Vertebra Picture of the Week. Accessed October 17, 2013. http://svpow.com/2013/10/07/anti-tutorial-how-to-design-and-execute-a-really-bad-study/. ↩
- Eisen, Michael. “I Confess, I Wrote the Arsenic DNA Paper to Expose Flaws in Peer-review at Subscription Based Journals.” It Is NOT Junk, October 3, 2013. http://www.michaeleisen.org/blog/?p=1439. ↩
- Davis, Phil. “Open Access ‘Sting’ Reveals Deception, Missed Opportunities.” The Scholarly Kitchen. Accessed October 17, 2013. http://scholarlykitchen.sspnet.org/2013/10/04/open-access-sting-reveals-deception-missed-opportunities/. ↩
- Novella, Steven. “A Problem with Open Access Journals.” Neurologica Blog, October 7, 2013. http://theness.com/neurologicablog/index.php/a-problem-with-open-access-journals/. ↩
- Sutton, Shan C. “Open Access, Publisher Embargoes, and the Voluntary Nature of Scholarship: An Analysis.” College & Research Libraries News 74, no. 9 (October 1, 2013): 468–472. ↩
- Ibid., 469 ↩
- Loon, Library. “A Veritable Sting.” Gavia Libraria, October 8, 2013. http://gavialib.com/2013/10/a-veritable-sting/. ↩
- Cringely, Robert. “The Ears Have It.” I, Cringely, October 14, 2004. http://www.pbs.org/cringely/pulpit/2004/pulpit_20041014_000829.html. ↩
Scholarly publishing, if you haven’t noticed, is nearing a crisis. Authors are questioning the value added by publishers. Open Access publications are growing in number and popularity. Peer review is being criticized and re-invented. Libraries are unable to pay price increases for subscription journals. Traditional measures of scholarly impact and journal rankings are being questioned while new ones are developed. Fresh business models or publishing platforms appear to spring up daily.1
I personally am a little frustrated with scholarly publishing, albeit for reasons not entirely related to the above. I find that most journals haven’t adapted to the digital age yet and thus are still employing editorial workflows and yielding final products suited to print.
How come I have yet to see a journal article PDF with clickable hyperlinks? For that matter, why is PDF still the dominant file format? What advantage does a fixed-width format hold over flexible, fluid-width HTML?2 Why are raw data not published alongside research papers? Why are software tools not published alongside research papers? How come I’m still submitting black-and-white charts to publications which are primarily read online? Why are digital-only publications still bound to regular publication schedules when they could publish like blogs, as soon as the material is ready? To be fair, some journals have answered some of these questions, but the issues are still all too frequent.
So, as a bit of an experiment, I recently published a short research study entirely on GitHub.3 I included the scripts used to generate data, the data, and an article-like summary of the whole process.
I analyzed 130 mobile library websites using YSlow, compared them to the Alexa Top 10, published on GitHub: http://t.co/phQVaRRb9V
— Eric Phetteplace (@phette23) October 11, 2013
What makes it possible
Unfortunately, I wouldn’t recommend my little experiment for most scholars, except perhaps for pre- or post-prints of work published elsewhere. Why? The primary reason people publish research is for tenure review, for enhancing a CV. I won’t list my study—though, arguably, I should be able to—simply because it didn’t go through the usual scholarly publishing gauntlet. It wasn’t peer-reviewed, it didn’t appear in a journal, and it wouldn’t count for much in the eyes of traditional faculty members.
However, I’m at a community college. Research and publication are not among my position’s requirements. I’m judged on my teaching and various library responsibilities, while publications are an unnecessary bonus. Would it help to have another journal article on my CV? Yes, probably. But there’s little pressure and personally I’m more interested in experimentation than in lengthening my list of publications.
Other researchers might also worry about someone stealing their ideas or data if they begin publishing an incomplete project. For me, again, publication isn’t really a competitive field. I would be happy to see someone reuse my project, even if they didn’t give proper attribution back. Openness is an advantage, not a vulnerability.
It’s ironic that being at a non-research institution frees me up to do research. It’s done mostly in my free-time, which isn’t great, but the lack of pressure means I can play with modes of publication, or not worry about the popularity of journals I submit to. To some degree, this is indicative of structural problems with scholarly publishing: there’s inertia in that, in order to stay in the game and make a name for yourself, you can’t do anything too wild. You need to publish, and publish in the recognized titles. Only tenured faculty, who after all owe at least some of their success to the current system, can risk dabbling with new publishing models and systems of peer-review.
What’s really good
GitHub, and the web more generally, are great mediums for scholarship. They address several of my prior questions.
For one, the web is just as suited to publishing data as text. There’s no limit on file format or (practically) size. Even if I was analyzing millions of data points, I could make a compressed archive available for others to download, verify, and reuse in their own research. For my project, I used a Google Spreadsheet which allows others to download the data or simply view it on the web. The article itself can be published on GitHub Pages, which provides free hosting for static websites.
While my study didn’t undergo any peer review, it is open for feedback via a pull request or the “issues” queue on GitHub. Typically, peer review is a closed process. It’s not apparent what criticisms were leveled at an article, or what the authors did to address them. Having peer review out in the open not only illuminates the history of a particular article but also makes it easier to see the value being added. Luckily, there are more and more journals with open peer review, such as PeerJ which we’ve written about previously. When I explain peer review to students, I often open up the “Peer Review history” section of a PeerJ article. Students can see that even articles written by professional researchers have flaws which the reviewing process is designed to identify and mitigate.
Another benefit of open peer review, present in publishing on GitHub too, is the ability to link to specific versions of an article. This has at least two uses. First of all, it has historical value in that one can trace the thought process of the researcher. Much like original manuscripts are a source of insight for literary analyses, merely being able to trace the evolution of a journal article enables new research projects in and of itself.
Secondly, as web content can be a moving target as it is revised over time, being able to link to specific versions aids those referencing a work. Linking to a git “commit” (think a particular point in time), possibly using perma.cc or the Internet Archive to store a copy of the project as it existed then, is an elegant way of solving this problem. For instance, at one point I manually removed some data points which were inappropriate for the study I was performing. One can inspect the very commit where I did this, seeing which lines of text were deleted and possibly identifying any mistakes which were made.
I’ve also grown tired of typical academic writing. The tendency to value erudite over straightforward language, lengthy titles with the snarky half separated from the actually descriptive half by a colon, the anxiety about the particularities of citations and style manuals; all of these I could do without. Let’s write compelling, truthful content without fetishizing consistency and losing the uniqueness of our voice. I’m not saying my little study achieves much in this regard, but it was a relief to be free to write in whatever manner I found most suitable.
Finally, and most encouraging in my mind, the time to publication of a research project can be greatly reduced with new web-based means. I wrote a paper in graduate school which took almost two years to appear in a peer-reviewed journal; by the time I was given the pre-prints to review, I’d entirely forgotten about it. On GitHub, all delays were solely my fault. While it’s true (you can see so in the project’s history) that the seeds of this project were planted nearly a year ago, I started working in earnest just a few months ago and finished the writing in early October.
What’s really bad
GitHub, while a great company which has reduced the effort needed to use version control with its clean web interface and graphical applications, is not the most universally understood platform. I have little doubt that if I were to publish a study on my blog, I would receive more commentary. For one, GitHub requires an account which only coders or technologists would be likely to have already, while many comment platforms (like Disqus) build off of common social media accounts like Twitter and Facebook. Secondly, while GitHub’s “pull requests” are more powerful than comments in that they can propose changes to the actual content of a project, they’re doubtless less understood as well. Expecting scholarly publishing to suddenly embrace software development methodologies is naive at best.
As a corollary to GitHub’s rather niche appeal, my article hasn’t undergone any semblance of peer review. I put it out there; if someone spots an inaccuracy, I’ll make note of and address it, but no relevant parties will necessarily critique the work. While peer review has its problems—many intimate with the problems of scholarly publishing at large—I still believe in the value of the process. It’s hard to argue a publication has reached an objective conclusion when only a single pair of eyes have scrutinized it.
Researchers who are afraid of having their work stolen, or of publishing incomplete work which may contain errors, will struggle to accept open publishing models using tools like GitHub. Prof Hacker, in an excellent post on “Forking the Academy”, notes many cultural challenges to moving scholarly publishing towards an open source software model. Scholars may worry that forking a repository feels like plagiarism or goes against the tradition of valuing original work. To some extent, these fears may come more from misunderstandings than genuine problems. Using version control, it’s perfectly feasible to withhold publishing a project until it’s complete and to remove erroneous missteps taken in the middle of a work. Theft is just as possible under the current scholarly publishing model; increasing the transparency and speed of one’s publishing does not give license to others to take credit for it. Unless, of course, one uses a permissive license like the Public Domain.
Convincing academics that the fears above are unwarranted or can be overcome is a challenge that cannot be overstated. In all likelihood, GitHub as a platform will never be a major player in scholarly publishing. The learning curve, both technical and cultural, is simply too great. Rather, a good starting point would be to let the appealing aspects of GitHub—versioning, pull requests, issues, granular attribution of authorship at the commit level—inform the development of new, user-friendly platforms with final products that more closely resemble traditional journals. Prof Hacker, again, goes a long way towards developing this with a wish list for a powerful collaborative writing platform.
What about the IR?
The discoverability of web publications is problematic. While I’d like to think my research holds value for others’ literature reviews, it’s never going to show up while searching in a subscription database. It seems unreasonable to ask researchers, who already look in many places to compile complete bibliographies, to add GitHub to their list of commonly consulted sources. Further fracturing the scholarly publishing environment not only inconveniences researchers but it goes against the trend of discovery layers and aggregators (e.g. Google Scholar) which aim to provide a single search across multiple databases.
On the other hand, an increasing amount of research‐from faculty and students alike—is conducted through Google, where GitHub projects will appear alongside pre-prints in institutional repositories. Simply being able to tweet out a link to my study, which is readable on a smartphone and easily saved to any read-it-later service, likely increases its readership over stodgy PDFs sitting in subscription databases.
Institutional repositories solve some, but not all, of the deficiencies of publishing on GitHub. Discoverability is increased because researchers at your institution may search the IR just like they do subscription databases. Futhermore, thanks to the Open Archives Initiative and the OAI-PMH standard, content can be aggregated from multiple IRs into larger search engines like OCLC’s OAIster. However, none of the major IR software players support versioned publication. Showing work-in-progress, linking to specific points in time of a work, and allowing for easy reuse are all lost in the IR.
Every publication in its place
As I’ve stated, publishing independently on GitHub isn’t for everyone. It’s not going to show up on your CV and it’s not necessarily going to benefit from the peer review process. But plenty of librarians are already doing something similar, albeit a bit less formal: we’re writing blog posts with original research or performing quick studies at our respective institutions. It’s not a great leap to put these investigations under version control and then publish them on the web. GitHub could be a valuable compliment to more traditional venues, reducing the delay between when data is collected and when it’s available for public consumption. Furthermore, it’s not at all mutually exclusive with article submissions. One could gain both the immediate benefit of getting one’s conclusions out there, but also produce a draft of a journal article.
As scholarly publishing continues to evolve, I hope we’ll see a plethora of publishing models rather than one monolithic process replacing traditional print-based journals. Publications hosted on GitHub, or a similar platform, would sit nicely alongside open, web-based publications like PeerJ, scholarly blog/journal hybrids like In The Library with the Lead Pipe, deposits in Institutional Repositories, and numerous other sources of quality content.
- I think a lot of these statements are fairly well-recognized in the library community, but here’s some evidence: the recent Open Access “sting” operation (which we’ll cover more in-depth in a forthcoming post) that exposed flaws in some journals’ peer review process, altmetrics, PeerJ, other experiments with open peer review (e.g. by Shakespeare Quarterly), the serials crisis (which is well-known enough to have a Wikipedia entry), predictions that all scholarship will be OA in a decade or two, and increasing demands that scholarly journals allow text mining access all come to mind. ↩
- I’m totally prejudiced in this matter because I read primarily through InstaPaper. A journal like Code4Lib, which publishes in HTML, is easy to send to read-it-later services, while PDFs aren’t. PDFs also are hard to read on smartphones, but they can preserve details like layout, tables, images, and font choices better than HTML. A nice solution is services which offer a variety of formats for the same content, such as Open Journal Systems with its ability to provide HTML, PDF, and ePub versions of articles. ↩
- For non-code uses of GitHub, see our prior Tech Connect post. ↩
No matter whether a small university press focusing on niche markets to the Big Six giants looking for the next massive bestseller, the publishing industry has been struggling to come to terms with the reality of new distribution models. Those models tends to favor cheaper and faster production with a much lower threshold for access, which generally has been good news for consumers. Several recent rulings and statements have brought the issues to the forefront of conversation and perhaps indicated some common themes in publishing which are relevant to all libraries and their ability to purchase and/or provide digital content.
Academic Publishing: Dissertation == Monograph?
On July 22 the American Historical Association issued a “Statement on Policies Regarding the Embargoing of Completed History PhD Dissertations”. In this statement, the American Historical Association recommended that all libraries and graduate programs allow dissertations to be embargoed for up to six years. This is, in theory, to allow junior scholars enough time to publish a monograph based on the dissertation in order to receive tenure. This would be under the assumption that academic publishers would not publish a book based on a dissertation freely available online. Reactions to this statement prompted the AHA to release a Q & A page to clarify and support their position, including pointing out that publishers’ positions are too unclear to be sure there is no risk to an open access dissertation, and “like it or not”, junior faculty must produce a monograph to get tenure. They claim that in some cases that this benefits junior scholars to give them more time to revise their work before publication–while this is true, it indicates that a dissertation is not equivalent to a published scholarly monograph. The argument from the publisher’s side appears to be that libraries (who are the main purchasers of scholarly monographs) will not purchase books based on revised dissertations freely available online, the truth of which has been debated widely. Libraries do purchase print copies of titles (both monographs and serials) which are freely available online.
From my personal experience as an institutional repository manager, I know the attitude to embargoing dissertations varies widely by advisor and department. Like most people making an argument about this topic, I do not have much more than anecdotes to provide. I checked the most commonly downloaded dissertations from the past year, and it appeared the most frequently downloaded title (over 2000 over 2012-2013) is also the only one that has been published as a book that has been purchased by at least one library. Clearly this does not control for all variables and warrants further study, but it is a useful clue that open access availability does not always affect publication and later purchase. Further, from the point of view of open access creating more equal access to resources across the world, Google Analytics for that dissertation indicates that the sessions over the past year with the most engaged users came from, in order, the UK, the United States, Mauritius, and Sri Lanka.
What Should a Digital Book Cost?
In mid-July Denise Cote, the judge in the Apple e-book price fixing case, issued an opinion stating that Apple did collude with the publishers to set prices on ebooks. Reading the story of the negotiations in the opinion is a thrilling behind the scenes look at companies trying to get a handle on a fairly new market and trying to understand how they will make money. Below I summarize the 160 page opinion, which is well worth reading in its entirety.
The problem with ebook pricing started with Amazon, which set a price of $9.99 for new releases that normally would have had list prices of $25-$30. This was frustrating to the major publishing houses, who worried (probably rightly so) that consumers would be unwilling to pay more than $10 for books after getting used to this low price point. Amazon would effectively price everyone else out of the market. Even after publishers raised the wholesale price of new releases, Amazon would sell them at loss to preserve the $9.99 price. The publishers spent 2009 developing strategies to combat Amazon, but it wasn’t until late 2009 with the entry of Apple into the ebook market that they saw a real opportunity.
Apple agreed with the Big Six publishers that setting all books at $9.99 was too low, but was unwilling to enter into a market in which they could not compete with Amazon. To accomplish this, they wanted the publishers to agree to the same terms, which included lower wholesale prices for ebooks. The negotiations that followed over late 2009 and early 2010 started positively, but ended in dissatisfaction. Because Apple was unwilling to sell anything as a loss leader, they felt that a wholesale model would leave them too vulnerable to Amazon. To address that, they proposed to sell books with an agency model (which several publishers had suggested). With an agency model, Apple would collect a 30% commission on sales just as they did with the App Store. To ensure that publishers did not set unrealistically high prices, Apple would set pricing caps. The other crucial move that Apple made was to insist that publishers move all retailers of ebooks to the agency model in order to ensure Apple would be able to compete on price across the board. Amazon had no interest in the agency model, and in early 2010 had a series of meeting with the publishers that made this clear. After all the agreements were signed with Apple (the only Big Six publisher who did not participate was Random House), the publishers needed to move Amazon to an agency model to fulfill the terms of their contract. Macmillan was the first publisher to set up a meeting with Amazon to discuss this requirement. The response to the meeting was for Amazon to remove the “buy” button from all Macmillan books, both print and Kindle editions. Amazon eventually had to capitulate to the publishers to move to an agency model, which was complete by mid-2010, but submitted a complaint to the Federal Trade Commission. Random House finally agreed to an agency model with Apple in early 2011, thanks to a spot of blackmail on Apple’s part (it wouldn’t allow any Random House apps without a agency deal).
Ultimately the court determined that Apple violated the Sherman Act by conspiring with the publishers to force all their retailers to sell books at the same prices and thus removing competition. A glance at Amazon’s Kindle store bestsellers today shows books priced from $1.99 to $13.99 for the newest Stephanie Plum mystery (the same price as it is in the Apple bookstore). For all titles priced higher than $9.99, Amazon notes that the “price is set by the publisher.” Whether this means anything to the average consumer is debatable. Compare these negotiations to the on-going struggle libraries have had with availability of ebooks for lending–publishers have a lot to learn about libraries in addition to new models for digital sales, some of which was covered at the series of talks with the Big Six publishers that Maureen Sullivan held in early 2012. Over recent months publishers have made more ebooks available to libraries. But some libraries, most notably the Douglas County, Colorado libraries, are setting their own terms for purchasing and lending ebooks.
What Can You Do With a Digital File?
The last ruling I want to address is about the music resale service ReDigi, about which Kevin Smith goes into detail. This was was a service that provided a way for people to re-sell purchased MP3s, but ultimately the judge ruled that it was impossible to transfer the original file and so this did not fit under the first sale doctrine. The first sale doctrine (17 USC § 109) holds that “the owner of a particular copy or phonorecord lawfully made … is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.” Another case that was decided in April by the Supreme Court, Kirtsaeng v. Wiley, upheld this in the case of international sales of physical items, which was an important decision for libraries. But digital materials are more complicated. First sale applies to computer programs on physical media (except in certain circumstances), but does not cover material that has been licensed rather than sold, which is how most digital files are distributed. (For how the US Attorney’s Office approaches this in criminal investigations, see this document.) So when you “buy” that Kindle book from Amazon or load a book onto your iPad you are licensing the product for limited use on a limited number of devices and no legal recourse for lending or getting rid of the content, even if you try hard to follow the law as ReDigi did. Librarians are well aware of this and its implications, and license quite a bit of content that we can loan and/or distribute under limited circumstances. Libraries are safest in the long term if they can own the content outright rather than licensing, as are consumers. But it will be a long time before there is clarity about the legal way to transfer owner of a digital file at the consumer level.
Librarians and publishers have a complicated relationship. We need each other if either is to succeed, but even if our ends are the ultimately the same, our means are very different. These recent events indicate that there is still much in flux and plenty of room for constructive dialog with content creators and publishers.
In April of this year, the two most popular free citation managers–Mendeley and Zotero–both underwent some big changes. On April 8th, TechCrunch announced that Elsevier had purchased Mendeley, which had been surmised in January. 1 Just a few days later, Zotero announced the release of version 4, with a number of new features. 2 Just as with the sunsetting of Google Reader, this has prompted many to consider what citation managers they have been using and think about switching or changing practices. I will not address subscription or paid products like RefWorks and EndNote specifically, though there are certainly many reasons you might prefer one of those products.
Mendeley: a new Star Wars movie in the making?
The rhetoric surrounding Elsevier’s acquisition of Mendeley was generally alarmist in nature, and the hashtag “#mendelete” that popped up immediately after the announcement suggests that many people’s first instinct was to abandon Mendeley. Elsevier has been held up as a model of anti-open access, and Mendeley as a model for open access. Yet Mendeley has always been a for-profit company, and, like Google, benefits itself and its users (particularly the science community) by knowing what they are reading and sharing. After all, the social features of Mendeley wouldn’t have any value if there was no public sharing. Institutional Mendeley accounts allow librarians to see what their users in aggregate are reading and saving, which helps them make collection development decisions– a service beyond what the average institutional citation manager product accomplishes. Victor Henning promises on the Mendeley blog that nothing will change, and that this will give them more freedom to develop more features 3. As for Elsevier, Oliver Dumon promises that Mendeley will remain independent and allowed to follow their own course–and that bringing it together with ScienceDirect and Scopus will create a “central workflow and collaboration site for authors”.4
There are two questions to be answered in this. First, is it realistic to assume that the Mendeley team will have the creative freedom they say they will have? And second, are users comfortable with their data being available to Elsevier? For many, the answers to both these questions seem to be “no” and “no.” A more optimistic point of view is that if Elsevier must placate Mendeley users who are open access advocates, they will allow more openness than before.
It’s too early to say, but I remain hopeful that Mendeley can continue to create a more open spirit in academic publishing. Peter Hoyt (a former employee of Mendeley and founder of PeerJ) suggests that much of the work that he oversaw to open up Mendeley was being stymied by Elsevier specifically. For him, this went against his personal ethos and so he was unable to stay at Mendeley–but he is confident in the character and ability of the people remaining at Mendeley. 5. I have never been a heavy user of Mendeley, but I have maintained a free account for the past few years. I use it mainly to create a list of my publications on my personal website, using a WordPress plug-in that uses the Mendeley API.
What’s new with Zotero
Zotero is a very different product than Mendeley. First, it is open-source software, with lots of ways to participate in development. Zotero was developed by the Roy Rosenzweig Center for History and New Media at George Mason University, with foundation and user support. It was developed specifically to support the research work of humanists. Originally a Firefox plug-in, Zotero now works as a standalone piece of software that interacts with Firefox, Chrome, and Safari to recognize bibliographic data on websites and pull them into a database that can be synced across computers (and even some third party mobile software). The newest version of Zotero includes several improvements. The one I am most excited about is detailed download display, which tells you what folder you’re saving a reference into, which is crucial for my workflow. Zotero is the citation manager I use on a daily basis, and I rely on it for formatting the footnotes you see on ACRL TechConnect posts or other research articles I produce. Since much of my research is on the open web, books, or other non-journal article resources, I find the ability of Zotero to pick up library catalog records and similar metadata more useful than the Mendeley import bookmarklet.
Both Zotero and Mendeley offer free storage for metadata and PDFs, with a cost for storage above the free level. (It is also possible to use a WebDAV server for syncing Zotero files).
|2 GB||$20 / year||2 GB||Free|
|6 GB||$60 / year||5 GB||$55 / year|
|10 GB||$100 / year||10 GB||$110 / year|
|25 GB||$240 / year||Unlimited||$165 / year|
Some concluding thoughts
Several graduate students in science 6 have written blog posts about switching away from Mendeley to Zotero. But they aren’t the same thing at all, and given the backgrounds of their creators, Mendeley is more skewed to the sciences, and Zotero more to the humanities.
Nor, as I like to point out, must they be mutually exclusive. I use Zotero for my daily citation management since I much prefer it for grabbing citations online, but sync my Zotero library with Mendeley to use the social and API features in Mendeley. I can choose to do this as an individual, but consider carefully the implications of your choice if you are considering an institutional subscription or requiring students or members of a research group to use a particular service.
- Lunden, Ingrid. “Confirmed: Elsevier Has Bought Mendeley For $69M-$100M To Expand Its Open, Social Education Data Efforts.” TechCrunch, April 18, 2013. http://techcrunch.com/2013/04/08/confirmed-elsevier-has-bought-mendeley-for-69m-100m-to-expand-open-social-education-data-efforts/. ↩
- Takats, Sean. “Zotero 4.0 Launches.” Zotero, April 11, 2013. http://www.zotero.org/blog/zotero-4-0-launches/. ↩
- Henning, Victor. “Mendeley and Elsevier – Here’s More Info.” Mend, April 19, 2013. http://blog.mendeley.com/community-relations/mendeley-and-elsevier-heres-more-info/ ↩
- Dumon, Oliver. “Elsevier Welcomes Mendeley.” Elsevier Connect, April 8, 2013. http://elsevierconnect.com/elsevier-welcomes-mendeley/. ↩
- Hoyt, Jason. “My Thoughts on Mendeley/Elsevier & Why I Left to Start PeerJ,” April 9, 2013. http://enjoythedisruption.com/post/47527556151/my-thoughts-on-mendeley-elsevier-why-i-left-to-start. ↩
- For one, see “Mendeley Sells Out; I’m Moving to Zotero.” LJ Villanueva’s Research Blog. Accessed May 20, 2013. http://research.coquipr.com/archives/492. ↩
A few months ago as part of a discussion on open peer review, I described the early stages of planning for a new type of journal, called PeerJ. Last month on February 12 PeerJ launched with its first 30 articles. By last week, the journal had published 53 articles. There are a number of remarkable attributes of the journal so far, so in this post I want to look at what PeerJ is actually doing, and some lessons that academic libraries can take away–particularly for those who are getting into publishing.
What PeerJ is Doing
On the opening day blog post (since there are no editorials or issues in PeerJ, communication from the editors has to be done via blog post 1), the PeerJ team outlined their mission under four headings: to make their content open and help to make that standard practice, to practice constant innovation, to “serve academia”, and to make this happen at minimal cost to researchers and no cost to the public. The list of advisory board and academic editors is impressive–it is global and diverse, and includes some big names and Nobel laureates. To someone judging the quality of the work likely to be published, this is a good indication. The members of PeerJ range in disciplines, with the majority in Molecular Biology. To submit and/or publish work requires a fee, but there is a free plan that allows one pre-print to be posted on the forthcoming PeerJ PrePrints.
PeerJ’s publication methods are based on PLoS ONE, which publishes articles based on subjective scientific and methodological soundness rather with no emphasis placed on subjective measures of novelty or interest (see more on this). Like all peer-reviewed journals, articles are sent to an academic editor in the field, who then sends the article to peer reviewers. Everything is kept confidential until the article actually is published, but authors are free to talk about their work in other venues like blogs.
Look and Feel
There are several striking dissimilarities between PeerJ and standard academic journals. The home page of the journal emphasizes striking visuals and is responsive to devices, so the large image scales to a small screen for easy reading. The “timeline” display emphasizes new and interesting content. 2 The code they used to make this all happen is available openly on the PeerJ Github account. The design of the page reflects best practices for non-profit web design, as described by the non-profit social media guide Nonprofit Tech 2.0. The page tells a story, makes it easy to get updates, works on all devices, and integrates social media. The design of the page has changed iteratively even in the first month to reflect the realities of what was actually being published and how people were accessing it. 3 PDFs of articles were designed to be readable on screens, especially tablets, so rather than trying to fit as much text as possible on one page as many PDFs are designed, they have single columns with left margins, fewer words per line, and references hyperlinked in the text. 4
How Open Peer Review Works
One of the most notable features of PeerJ is open peer review. This is not mandatory, but approximately half the reviewers and authors have chosen to participate. 5 This article is an example of open peer review in practice. You can read the original article, the (in this case anonymous) reviewer’s comments, the editors comments and the author’s rebuttal letter. Anyone who has submitted an article to a peer reviewed journal before will recognize this structure, but if you have not, this might be an exciting glimpse of something you have never seen before. As a non-scientist, I personally find this more useful as a didactic tool to show the peer review process in action, but I can imagine how helpful it would be to see this process for articles about areas of library science in which I am knowledgeable.
With only 53 articles and in existence for such a short time, it is difficult to measure what impact open peer review has on articles, or to generalize about which authors and reviewers choose an open process. So far, however, PeerJ reports that several authors have been very positive about their experience publishing with the journal. The speed of review is very fast, and reviewers have been constructive and kind in their language. One author goes into more detail in his original post, “One of the reviewers even signed his real name. Now, I’m not totally sure why they were so nice to me. They were obvious experts in the system that I studied …. But they were nice, which was refreshing and encouraging.” He also points out that the exciting thing about PeerJ for him is that all it requires are projects that were technically well-executed and carefully described, so that this encourages publication of negative or unexpected results, thus avoiding the file drawer effect.6
This last point is perhaps the most important to note. We often talk of peer-reviewed articles as being particularly significant and “high-impact.” But in the case of PeerJ, the impact is not necessarily due to the results of the research or the type of research, but that it was well done. One great example of this is the article “Significant Changes in the Skin Microbiome Mediated by the Sport of Roller Derby”. 7 This was a study about the transfer of bacteria during roller derby matches, and the study was able to prove its hypothesis that contact sports are a good environment in which to study movements of bacteria among people. The (very humorous) review history indicates that the reviewers were positive about the article, and felt that it had promise for setting a research paradigm. (Incidentally, one of the reviewers remained anonymous , since he/she felt that this could “[free] junior researchers to openly and honestly critique works by senior researchers in their field,” and signed the letter “Diligent but human postdoc reviewer”.) This article was published the beginning of March, and already has 2,307 unique visits to the page, and has been shared widely on social media. We can assume that one of the motivations for sharing this article was the potential for roller derby jokes or similar, but will this ultimately make the article’s long term impact stronger? This will be something to watch.
What Can Academic Libraries Learn?
A recent article In the Library With the Lead Pipe discussed the open ethos in two library publications, In the Library With the Lead Pipe and Code4Lib Journal. 8 This article concluded that more LIS publications need to open the peer review process, though the publications mentioned are not peer reviewed in the traditional sense. There are very few, if any, open peer reviewed publications in the nature of PeerJ outside of the sciences. Could libraries or library-related publications match this process? Would they want to?
I think we can learn a few things from PeerJ. First, the rapid publication cycle means that more work is getting published more quickly. This is partly because they have so many reviewers and so any one reviewer isn’t overburdened–and due to their membership model, it is in the best financial interests of potential future authors to be current reviewers. As In the Library With the Lead Pipe points out that a central academic library journal, College & Research Libraries, is now open access and early content is available as a pre-print, the pre-prints reflect content that will be published in some cases well over a year from now. A year is a long time to wait, particularly for work that looks at current technology. Information Technology in Libraries (ITAL), the LITA journal is also open access and provides pre-prints as well–but this page appears to be out of date.
Another thing we can learn is making reading easier and more convenient while still maintaining a professional appearance and clean visuals. Blogs like ACRL Tech Connect and In the Library with the Lead Pipe deliver quality content fairly quickly, but look like blogs. Journals like the Journal of Librarianship and Scholarly Communication have a faster turnaround time for review and publication (though still could take several months), but even this online journal is geared for a print world. Viewing the article requires downloading a PDF with text presented in two columns–hardly the ideal online reading experience. In these cases, the publication is somewhat at the mercy of the platform (WordPress in the former, BePress Digital Commons in the latter), but as libraries become publishers, they will have to develop platforms that meet the needs of modern researchers.
A question put to the ACRL Tech Connect contributors about preferred reading methods for articles suggests that there is no one right answer, and so the safest course is to release content in a variety of formats or make it flexible enough for readers to transform to a preferred format. A new journal to watch is Weave: Journal of Library User Experience, which will use the Digital Commons platform but present content in innovative ways. 9 Any libraries starting new journals or working with their campuses to create new journals should be aware of who their readers are and make sure that the solutions they choose work for those readers.
- “The Launch of PeerJ – PeerJ Blog.” Accessed February 19, 2013. http://blog.peerj.com/post/42920112598/launch-of-peerj. ↩
- “Some of the Innovations of the PeerJ Publication Platform – PeerJ Blog.” Accessed February 19, 2013. http://blog.peerj.com/post/42920094844/peerj-functionality. ↩
- http://blog.peerj.com/post/45264465544/evolution-of-timeline-design-at-peerj ↩
- “The Thinking Behind the Design of PeerJ’s PDFs.” Accessed March 18, 2013. http://blog.peerj.com/post/43558508113/the-thinking-behind-the-design-of-peerjs-pdfs. ↩
- http://blog.peerj.com/post/43139131280/the-reception-to-peerjs-open-peer-review ↩
- “PeerJ Delivers: The Review Process.” Accessed March 18, 2013. http://edaphics.blogspot.co.uk/2013/02/peerj-delivers-review-process.html. ↩
- Meadow, James F., Ashley C. Bateman, Keith M. Herkert, Timothy K. O’Connor, and Jessica L. Green. “Significant Changes in the Skin Microbiome Mediated by the Sport of Roller Derby.” PeerJ 1 (March 12, 2013): e53. doi:10.7717/peerj.53. ↩
- Ford, Emily, and Carol Bean. “Open Ethos Publishing at Code4Lib Journal and In the Library with the Lead Pipe.” In the Library with the Lead Pipe (December 12, 2012). http://www.inthelibrarywiththeleadpipe.org/2012/open-ethos-publishing/. ↩
- Personal communication with Matthew Reidsma, March 19, 2013. ↩
Bibliometrics– used here to mean statistical analyses of the output and citation of periodical literature–is a huge and central field of library and information science. In this post, I want to address the general controversy surrounding these metrics when evaluating scholarship and introduce the emerging alternative metrics (often called altmetrics) that aim to address some of these controversies and how these can be used in libraries. Librarians are increasingly becoming focused on the publishing side of the scholarly communication cycle, as well as supporting faculty in new ways (see, for instance, David Lankes’s thought experiment of the tenure librarian). What is the reasonable approach for technology-focused academic librarians to these issues? And what tools exist to help?
There have been many articles and blog posts expressing frustration with the practice of using journal impact factors for judging the quality of a journal or an individual researcher (see especially Seglen). One vivid illustration of this frustration is in a recent blog post by Stephen Curry titled “Sick of Impact Factors”. Librarians have long used journal impact factors in making purchasing decisions, which is one of the less controversial uses of these metrics 1 The essential message of all of this research about impact factors is that traditional methods of counting citations or determining journal impact do not answer questions about what articles are influential and how individual researchers contribute to the academy. For individual researchers looking to make a case for promotion and tenure, questions of use of metrics can be all or nothing propositions–hence the slightly hysterical edge in some of the literature. Librarians, too, have become frustrated with attempting to prove the return on investment for decisions–see “How ROI Killed the Academic Library”–going by metrics alone potentially makes the tools available to researchers more homogeneous and ignores niches. As the alt metrics manifesto suggests, the traditional “filters” in scholarly communication of peer review, citation metrics, and journal impact factors are becoming obsolete in their current forms.
It would be of interest to determine, if possible, the part which men of different calibre [sic] contribute to the progress of science.
Alfred Lotka (a statistician at the Metropolitan Life Insurance Company, famous for his work in demography) wrote these words in reference to his 1926 statistical analysis of the journal output of chemists 2 Given the tools available at the time, it was a fairly limited sample size, looking at just the first two letters of an author index for the period of 16 years compared with a slim 100 page volume of important works “from the beginning of history to 1900.” His analysis showed that the more articles published in a field, the less likely it is for an individual author to publish more than one article. As Per Seglen puts it, this showed the “skewness” of science 3
The original journal impact factor was developed by Garfield in the 1970s, and used the “mean number of citations to articles published in two preceding years” 4. Quite clearly, this is supposed to measure the general amount that a journal was cited, and hence a guide to how likely a researcher was to read and immediately find useful the body of work in this journal in his or her own work. This is helpful for librarians trying to make decisions about how to stretch a budget, but the literature has not found that a journal’s impact has much to do with an individual article’s citedness and usefulness 5 As one researcher suggests, using it for anything other than its intended original use constitutes pseudoscience 6 Another issue with which those at smaller institutions are very familiar is the cost of accessing traditional metrics. The major resources that provide these are Thomson Reuters’ Journal Citation Reports and Web of Science, and Elsevier’s Scopus, and both are outside the price range of many schools.
Metrics that attempt to remedy some of these difficulties have been developed. At the journal level, the Eigenfactor® and Article Influence Score™ use network theory to estimate “the percentage of time that library users spend with that journal”, and the Article Influence Score tracks the influence of the journal over five years. 7. At the researcher level, the h-index tracks the impact of specific researchers (it was developed with physicists in mind). The h-index takes into account the number of papers the researcher has published in how much time when looking at citations. 8
These are included under the rubric of alternative metrics since they are an alternative to the JCR, but rely on citations in traditional academic journals, something which the “altmetric” movement wants to move beyond.
In this discussion of alt metrics I will be referring to the arguments and tools suggested by Altmetrics.org. In the alt metrics manifesto, Priem et al. point to several manifestations of scholarly communication that are unlike traditional article publications, including raw data, “nanopublication”, and self-publishing via social media (which was predicted as so-called “scholarly skywriting” at the dawn of the World Wide Web 9). Combined with sharing of traditional articles more readily due to open access journals and social media, these all create new possibilities for indicating impact. Yet the manifesto also cautions that we must be sure that the numbers which alt metrics collect “really reflect impact, or just empty buzz.” The research done so far is equally cautious. A 2011 study suggests that tweets about articles (tweetations) do correlate with citations but that we cannot say that number of tweets about an article really measures the impact. 10
A criticism expressed in the media about alt metrics is that alternative metrics are no more likely to be able to judge the quality or true impact of a scientific paper than traditional metrics. 11 As Per Seglen noted in 1992, “Once the field boundaries are broken there is virtually no limit to the number of citations an article may accrue.” 12 So an article that is interdisciplinary in nature is likely to do far better in the alternative metrics realm than a specialized article in a discipline that still may be very important. Mendeleley’s list of top research papers demonstrates this–many (though not all) the top articles are about scientific publication in general rather than about specific scientific results.
What can librarians use now?
Librarians are used to questions like “What is the impact factor of Journal X?” For librarians lucky enough to have access to Journal Citation Reports, this is a matter of looking up the journal and reporting the score. They could answer “How many times has my article been cited?” in Web of Science or Scopus using some care in looking for typos. Alt metrics, however, remind us that these easy answers are not telling the whole story. So what should librarians be doing?
One thing that librarians can start doing is helping their campus community get signed up for the many different services that will promote their research and provide article level citation information. Below are listed a small number (there are certainly others out there) of services that you may want to consider using yourself or having your campus community use. Some, like PubMed, won’t be relevant to all disciplines. Altmetrics.org lists several tools beyond what is listed below to provide additional ideas.
- Google Scholar Metrics and Google Scholar Citations (personal research metrics)
- PubMed My Bibliography
- ORCID: Create a unique researcher ID.
These tools offer various methods for sharing. PubMed allows one to embed “My Bibliography” in a webpage, as well as to create delegates who can help curate the bibliography. A developer can use the APIs provided by some of these services to embed data for individuals or institutions on a library website or institutional repository. ImpactStory has an API that makes it relatively easy to embed data for individuals or institutions on a library website or institutional repository. Altmetric.com also has an API that is free for non-commercial use. Mendeley has many helpful apps that integrate with popular content management systems.
Since this is such a new field, it’s a great time to get involved. Altmetrics.org held a hackathon in November 2012 and has a Google Doc with the ideas for the hackathon. This is an interesting overview of what is going on with open source hacking on alt metrics.
The altmetrics manifesto program calls for a complete overhaul of scholarly communication–alternative research metrics are just a part of their critique. And yet, for librarians trying to help researchers, they are often the main concern. While science in general calls for a change to the use of these metrics, we can help to shape the discussion through educating and using alternative metrics.
Works Cited and Suggestions for Further Reading
- Jerome K. Vanclay, “Impact Factor: Outdated Artefact or Stepping-stone to Journal Certification?” Scientometrics 92 (2) (2011): 212. ↩
- Alfred Lotka, “The Frequency Distribution of Scientific Productivity.” Journal of the Washington Academy of Sciences 26 (12) (1926)): 317. ↩
- Per Seglen, “The Skewness of Science.” Journal of the American Society for Information Science 43 (9) (1992): 628. ↩
- Vanclay, 212. ↩
- Per Seglen, “Causal Relationship Between Article Citedness and Journal Impact.” Journal of the American Society for Information Science 45 (1) (1994): 1-11. ↩
- Vanclay, 211. ↩
- “Methods”, Eigenfactor.org, 2012. ↩
- J.E. Hirsch, “An Index to Quantify an Individual’s Scientific Research Output.” Proceedings of the National Academy of Sciences of the United States of America 102, no. 46 (2005): 16569–16572. ↩
- Blaise Cronin and Kara Overfelt, “E-Journals and Tenure.” Journal of the American Society for Information Science 46 (9) (1995): 700. ↩
- Gunther Eysenbach, “Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact.” Journal Of Medical Internet Research 13 (4) (2011): e123. ↩
- see in particular Jump. ↩
- Seglen, 637. ↩
Open access publication makes access to research free for the end reader, but in many fields it is not free for the author of the article. When I told a friend in a scientific field I was working on this article, he replied “Open access is something you can only do if you have a grant.” PeerJ, a scholarly publishing venture that started up over the summer, aims to change this and make open access publication much easier for everyone involved.
While the first publication isn’t expected until December, in this post I want to examine in greater detail the variation on the “gold” open-access business model that PeerJ states will make it financially viable 1, and the open peer review that will drive it. Both of these models are still very new in the world of scholarly publishing, and require new mindsets for everyone involved. Because PeerJ comes out of funding and leadership from Silicon Valley, it can more easily break from traditional scholarly publishing and experiment with innovative practices. 2
PeerJ is a platform that will host a scholarly journal called PeerJ and a pre-print server (similar to arXiv) that will publish biological and medical scientific research. Its founders are Peter Binfield (formerly of PLoS ONE) and Jason Hoyt (formerly of Mendeley), both of whom are familiar with disruptive models in academic publishing. While the “J” in the title stands for Journal, Jason Hoyt explains on the PeerJ blog that while the journal as such is no longer a necessary model for publication, we still hold on to it. “The journal is dead, but it’s nice to hold on to it for a little while.” 3. The project launched in June of this year, and while no major updates have been posted yet on the PeerJ website, they seem to be moving towards their goal of publishing in late 2012.
To submit a paper for consideration in PeerJ, authors must buy a “lifetime membership” starting at $99. (You can submit a paper without paying, but it costs more in the end to publish it). This would allow the author to publish one paper in the journal a year. The lifetime membership is only valid as long as you meet certain participation requirements, which at minimum is reviewing at least one article a year. Reviewing in this case can mean as little as posting a comment to a published article. Without that, the author might have to pay the $99 fee again (though as yet it is of course unclear how strictly PeerJ will enforce this rule). The idea behind this is to “incentivize” community participation, a practice that has met with limited success in other arenas. Each author on a paper, up to 12 authors, must pay the fee before the article can be published. The Scholarly Kitchen blog did some math and determined that for most lab setups, publication fees would come to about $1,124 4, which is equivalent to other similar open access journals. Of course, some of those researchers wouldn’t have to pay the fee again; for others, it might have to be paid again if they are unable to review other articles.
Peer Review: Should it be open?
PeerJ, as the name and the lifetime membership model imply, will certainly be peer-reviewed. But, keeping with its innovative practices, it will use open peer review, a relatively new model. Peter Binfield explained in this interview PeerJ’s thinking behind open peer review.
…we believe in open peer review. That means, first, reviewer names are revealed to authors, and second, that the history of the peer review process is made public upon publication. However, we are also aware that this is a new concept. Therefore, we are initially going to encourage, but not require, open peer review. Specifically, we will be adopting a policy similar to The EMBO Journal: reviewers will be permitted to reveal their identities to authors, and authors will be given the choice of placing the peer review and revision history online when they are published. In the case of EMBO, the uptake by authors for this latter aspect has been greater than 90%, so we expect it to be well received. 5
In single blind peer review, the reviewers would know the name of the author(s) of the article, but the author would not know who reviewed the article. The reviewers could write whatever sorts of comments they wanted to without the author being able to communicate with them. For obvious reasons, this lends itself to abuse where reviewers might not accept articles by people they did not know or like or tend to accept articles from people they did like 6 Even people who are trying to be fair can accidentally fall prey to bias when they know the names of the submitters.
Double blind peer review in theory takes away the ability for reviewers to abuse the system. A link that has been passed around library conference planning circles in the past few weeks is the JSConf EU 2012 which managed to improve its ratio of female presenters by going to a double-blind system. Double blind is the gold standard for peer review for many scholarly journals. Of course, it is not a perfect system either. It can be hard to obscure the identity of a researcher in a small field in which everyone is working on unique topics. It also is a much lengthier process with more steps involved in the review process. To this end, it is less than ideal for breaking medical or technology research that needs to be made public as soon as possible.
In open peer review, the reviewers and the authors are known to each other. By allowing for direct communication between reviewer and researcher, this speeds up the process of revisions and allows for greater clarity and speed 7. Open peer review doesn’t affect the quality of the reviews or the articles negatively, it does make it more difficult to find qualified reviewers to participate, and it might make a less well-known researcher more likely to accept the work of a senior colleague or well-known lab. 8.
Given the experience of JSConf and a great deal of anecdotal evidence from women in technical fields, it seems likely that open peer review is open to the same potential abuse of single peer review. While open peer review might make the rejected author able to challenge unfair rejections, this would require that the rejected author feels empowered enough in that community to speak up. Junior scholars who know they have been rejected by senior colleagues may not want to cause a scene that could affect future employment or publication opportunities. On the other hand, if they can get useful feedback directly from respected senior colleagues, that could make all the difference in crafting a stronger article and going forward with a research agenda. Therein lies the dilemma of open peer review.
Who pays for open access?
A related problem for junior scholars exists in open access funding models, at least in STEM publishing. As open access stands now, there are a few different models that are still being fleshed out. Green open access is free to the author and free to the reader; it is usually funded by grants, institutions, or scholarly societies. Gold open access is free to the end reader but has a publication fee charged to the author(s).
This situation is very confusing for researchers, since when they are confronted with a gold open access journal they will have to be sure the journal is legitimate (Jeffrey Beall has a list of Predatory Open Access journals to aid in this) as well as secure funding for publication. While there are many schemes in place for paying publication fees, there are no well-defined practices in place that illustrate long-term viability. Often this is accomplished by grants for the research, but not always. The UK government recently approved a report that suggests that issuing “block grants” to institutions to pay these fees would ultimately cost less due to reduced library subscription fees. As one article suggests, the practice of “block grants” or other funding strategies are likely to not be advantageous to junior scholars or those in more marginal fields 9. A large research grant for millions of dollars with the relatively small line item for publication fees for a well-known PI is one thing–what about the junior humanities scholar who has to scramble for a few thousand dollar research stipend? If an institution only gets so much money for publication fees, who gets the money?
By offering a $99 lifetime membership for the lowest level of publication, PeerJ offers hope to the junior scholar or graduate student to pursue projects on their own or with a few partners without worrying about how to pay for open access publication. Institutions could more readily afford to pay even $250 a year for highly productive researchers who were not doing peer review than the $1000+ publication fee for several articles a year. As above, some are skeptical that PeerJ can afford to publish at those rates, but if it is possible, that would help make open access more fair and equitable for everyone.
Open access with low-cost paid up front could be very advantageous to researchers and institutional bottom lines, but only if the quality of articles, peer reviews, and science is very good. It could provide a social model for publication that will take advantage of the web and the network effect for high quality reviewing and dissemination of information, but only if enough people participate. The network effect that made Wikipedia (for example) so successful relies on a high level of participation and engagement very early on to be successful [Davis]. A community has to build around the idea of PeerJ.
In almost the opposite method, but looking to achieve the same effect, this last week the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) announced that after years of negotiations they are set to convert publishing in that field to open access starting in 2014. 10 This means that researchers (and their labs) would not have to do anything special to publish open access and would do so by default in the twelve journals in which most particle physics articles are published. The fees for publication will be paid upfront by libraries and funding agencies.
So is it better to start a whole new platform, or to work within the existing system to create open access? If open (and through a commenting s system, ongoing) peer review makes for a lively and engaging network and low-cost open access makes publication cheaper, then PeerJ could accomplish something extraordinary in scholarly publishing. But until then, it is encouraging that organizations are working from both sides.
- Brantley, Peter. “Scholarly Publishing 2012: Meet PeerJ.” PublishersWeekly.com, June 12, 2012. http://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/52512-scholarly-publishing-2012-meet-peerj.html. ↩
- Davis, Phil. “PeerJ: Silicon Valley Culture Enters Academic Publishing.” The Scholarly Kitchen, June 14, 2012. http://scholarlykitchen.sspnet.org/2012/06/14/peerj-silicon-valley-culture-enters-academic-publishing/. ↩
- Hoyt, Jason. “What Does the ‘J’ in ‘PeerJ’ Stand For?” PeerJ Blog, August 22, 2012. http://blog.peerj.com/post/29956055704/what-does-the-j-in-peerj-stand-for. ↩
- http://scholarlykitchen.sspnet.org/2012/06/14/is-peerj-membership-publishing-sustainable/ ↩
- Brantley ↩
- Wennerås, Christine, and Agnes Wold. “Nepotism and sexism in peer-review.” Nature 387, no. 6631 (May 22, 1997): 341–3. ↩
- For an ingenious way of demonstrating this, see Leek, Jeffrey T., Margaret A. Taub, and Fernando J. Pineda. “Cooperation Between Referees and Authors Increases Peer Review Accuracy.” PLoS ONE 6, no. 11 (November 9, 2011): e26895. ↩
- Mainguy, Gaell, Mohammad R Motamedi, and Daniel Mietchen. “Peer Review—The Newcomers’ Perspective.” PLoS Biology 3, no. 9 (September 2005). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1201308/. ↩
- Crotty, David. “Are University Block Grants the Right Way to Fund Open Access Mandates?” The Scholarly Kitchen, September 13, 2012. http://scholarlykitchen.sspnet.org/2012/09/13/are-university-block-grants-the-right-way-to-fund-open-access-mandates/. ↩
- Van Noorden, Richard. “Open-access Deal for Particle Physics.” Nature 489, no. 7417 (September 24, 2012): 486–486. ↩