Collecting Data: How Much do We Really Need?

Many of us have had conversations in the past few weeks about data collection because of the reports about the NSA’s PRISM program, but ever since the Boston Marathon bombings in April there has been increased awareness of how much data is collected about people in the effort to track down suspects, or, increasingly, to stop potential terrorist events before they happen. A recent Nova episode about the manhunt for the Boston bombers showed one such example: the New York Police Department’s Domain Awareness System, which pulls live footage from almost every surveillance camera in New York City into one room, with the ability to search for features of individuals and even to detect people acting suspiciously. Add to that a demonstration of cutting-edge facial recognition software under development at Carnegie Mellon University, and reality seems to be moving ever closer to science fiction movies.

Librarians focused on technical projects love to collect data and make decisions based on it. We try to get our data collection systems as close to real time as possible, and we work hard to collect and analyze as much data as we can. The idea of a series of cameras tracking exactly what our patrons are doing in the library in real time might seem very tempting. But as librarians, we value our patrons’ ability to access information with as much privacy as possible; like other professions, we treat our interactions with patrons (just as we would clients, patients, congregants, or sources) with care and discretion (see Item 3 of the Code of Ethics of the American Library Association). I will not address the national conversation about privacy versus security in this post. Instead, I want to address data collection right where most of us live on a daily basis: inside analytics programs, spreadsheets, and server logs.

What kind of data do you collect?

Let’s start with an exercise. Write a list of all the statistical reports you are expected to provide your library–for most of us, it’s probably a very long list. Now, make a list of all the tools you use to collect the data for those statistics.

Here are a few potential examples:

Website visitors and user experience

  • Google Analytics or some other web analytics tool
  • Heat map tool
  • Server logs
  • Surveys

Electronic resource access reports

  • Electronic resources management application
  • Vendor reports (COUNTER and other)
  • Link resolver click-through report
  • Proxy server logs

The next step may require a little digging. For library-created tools, do you have a privacy policy for this data? Has it gone through the Institutional Review Board? For third-party tools, is there a privacy policy? What are the terms of use or the user license? (And how many people have ever read the entire terms of service?) We will return to this exercise in a moment.

How much is enough?

Think about what type of data each of these tools collects about your users. Some of it may be very private indeed. For instance, the heat map tool I’ve recently started using (Inspectlet) not only tracks clicks but actually records sessions as patrons use the website. This is fascinating information. We had, for instance, one session in which a patron opened the library website, clicked the Facebook icon on the page, and came back to the website nearly 7 hours later. It was fun to see that people really do visit the library’s Facebook page, but the question was immediately raised whether it was a visit from on campus. (It was, and it wouldn’t have taken long to figure out whether it was a staff machine and who was working that day and time.) IP addresses from off campus are very easy to track, sometimes down to the block, and again easy enough to tie to an individual. We collect IP addresses for abusive or spamming behavior and block users based on IP address all the time. But what about in this case? In the screen recordings I can see exactly what the user types in the search boxes for the catalog and discovery system. Luckily, Inspectlet allows you to obscure the last two octets of the IP address (which is legally required in some places), so less information is collected; all similar tools should offer the same ability.
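If you log or export IP addresses yourself, the same kind of masking is easy to script. Here is a minimal sketch in Python, assuming plain IPv4 addresses in a list you control; the function name and sample addresses are only illustrations, not Inspectlet’s implementation.

```python
def mask_ip(address: str, keep_octets: int = 2) -> str:
    """Zero out the trailing octets of an IPv4 address,
    e.g. 192.168.12.34 -> 192.168.0.0."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {address}")
    return ".".join(octets[:keep_octets] + ["0"] * (4 - keep_octets))

# Scrub a list of visitor IPs before storing them for later analysis.
visitors = ["128.32.12.45", "66.249.65.101"]
print([mask_ip(ip) for ip in visitors])  # ['128.32.0.0', '66.249.0.0']
```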

Consider another case: proxy server logs. In the past, when I did a lot of EZProxy troubleshooting, I found the logs extremely helpful in figuring out what went wrong when I got a report of trouble, particularly when it had occurred a day or two before. I could see the username, what time the user attempted to log in or succeeded in logging in, and which resources they accessed. Let’s say someone reported not being able to log in at midnight: I could check the failed logins at midnight and then see that username successfully logging in at 1:30 AM. That was a not infrequent occurrence, since people usually don’t think to write back and say they figured out what they did wrong! But I could also see everyone else’s logins and which articles they were reading, so I could tell (if I wanted) which grad students were keeping up with their readings or who was probably sharing their login with a friend or an entire company. Where I currently work, we don’t keep the logs for more than a day, but I know a lot of people out there are holding on to EZProxy logs with the idea of doing “something” with them someday. Are you holding on to more than you really want to?
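For troubleshooting like this, a small script can pull out just the failures without anyone having to read the rest of the log. Here is a rough sketch that assumes a simplified, common-log-style line; real EZProxy logs are configurable, so the regex and the “fail” marker would need to be adapted to your own LogFormat.

```python
import re
from collections import Counter

# Assumes a timestamp like [12/Sep/2013:00:03:41 -0500] and a URL or message
# containing "login"; adjust the pattern to match your own log layout.
LINE = re.compile(
    r"\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):\d{2}:\d{2}[^\]]*\].*login",
    re.IGNORECASE,
)

def failed_logins_by_hour(path: str) -> Counter:
    """Tally lines that look like failed login attempts, grouped by hour."""
    hours = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE.search(line)
            if match and "fail" in line.lower():
                hours[f"{match['day']} {match['hour']}:00"] += 1
    return hours

# Usage: print(failed_logins_by_hour("ezproxy.log").most_common(5))
```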

Let’s continue our exercise. Go through your list of tools and note all the potentially personally identifying information each one collects, whether or not you use it. Are you surprised by anything? Make a plan to obscure unused pieces of data on a regular basis if it can’t be done automatically. Consider also what you can reasonably do with the data within your current job requirements, rather than future study possibilities. If you do think the data will be useful for a future study, make sure you are saving anonymized data sets unless it is absolutely necessary to have personally identifying information. In the latter case, you should clear your study in advance with your Institutional Review Board and follow a data management plan.
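One low-effort way to keep a data set usable for a future study while stripping direct identifiers is to replace usernames with salted hashes before archiving. Here is a sketch along those lines; the file and column names are hypothetical.

```python
import csv
import hashlib
import secrets

# One salt per data set, stored separately from the data (or discarded entirely
# if you never need to re-link records).
SALT = secrets.token_hex(16)

def pseudonymize(username: str, salt: str = SALT) -> str:
    """Replace a username with a salted SHA-256 digest: rows stay distinct,
    but individuals are no longer directly identifiable."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()[:16]

with open("usage_raw.csv", newline="") as raw, \
     open("usage_anon.csv", "w", newline="") as anon:
    reader = csv.DictReader(raw)
    writer = csv.DictWriter(anon, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["username"] = pseudonymize(row["username"])
        writer.writerow(row)
```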

A privacy and data management policy should include at least these items:

  • A statement about what data you are collecting and why.
  • Where the data is stored and who has access to it.
  • A retention timeline.

For example, in the past I collected all virtual reference transaction logs to study the effectiveness of a new set of virtual reference services. I knew I wanted at least a year’s worth of logs, and ideally three years, to track changes over time. I was able to save the logs with anonymized IP addresses, and once I had the data I needed I deleted the actual transcripts. The privacy policy described the process and where the data would be stored to ensure it was secure. In this case, I used the RUSA Guidelines for Implementing and Maintaining Virtual Reference Services as a guide to creating this policy. Read through the ALA Guidelines to Drafting a Library Privacy Policy for additional specific language and items you should include.

What we can do with data

In all this I don’t mean to imply that we shouldn’t be collecting data at all. In both examples I gave above, the data is extremely useful in improving the patron experience, even though it gives identifying details away. Not collecting data has trade-offs too. For years, libraries have not retained patrons’ borrowing records in order to protect their privacy. But now patrons who want an online record of what they’ve borrowed from the library must use third-party services with (most likely) much less stringent privacy policies than libraries. By not keeping records of what users have checked out or read through databases, we are unable to provide them with personalized, automated suggestions about what to read next. Anyone who uses Amazon regularly knows that it will try to tempt you into purchases based on your past purchases or the books you previewed, even if you would rather no one knew you were reading that book and certainly don’t want suggestions based on it popping up when you are doing a collection development project at work and are logged in on your personal account. In all the decisions we make about collecting or not collecting data, we have to consider trade-offs like these. Is the service so important that the benefits of collecting the data outweigh the risks? Or is there another way to provide the service?

We can see some examples of this trade-off in two similar projects coming out of Harvard Library Labs. One, Library Hose, was a Twitter stream with the name of every book being checked out. The service ran for part of 2010 and has been suspended since September of that year. In addition to running into daily tweet limits, it was also a potential privacy violation, even if it was a fun idea (this blog post has some discussion about it). A newer project takes the opposite approach: books that a patron thinks are “awesome” can be returned to the Awesome Box at the circulation desk, and information about the book is collected on the Awesome Box website. This is a great tweak to the earlier project, since it advertises material that is now available rather than checked out, and people have to opt in by putting the item in the box.

In terms of personal recommendations, librarians have the advantage of being able to form close working relationships with faculty and students, so they can make recommendations based on their knowledge of a person’s work and interests. But how can we automate this without borrowing records? One example is a project by Ian Chan at California State University San Marcos that uses student enrollment data to personalize the website based on a student’s field of study (slides). This provides a great deal of value for students, who need to log in anyway to check their course reserves and access articles from off campus. On top of that basic need it adds a list of recommended resources, which students can choose to star as favorites.
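The mechanics of that kind of personalization can be quite simple once you have the enrollment data. The sketch below only illustrates the general idea, not Chan’s implementation; the major codes, database names, and fallback are made up.

```python
# Hypothetical mapping of enrollment data (major codes) to featured resources.
RECOMMENDATIONS = {
    "BIOL": ["Biological Abstracts", "Web of Science"],
    "HIST": ["America: History and Life", "JSTOR"],
    "NURS": ["CINAHL", "PubMed"],
}

def recommended_resources(major_code: str) -> list[str]:
    """Return subject databases to feature on the portal for a student's major."""
    return RECOMMENDATIONS.get(major_code, ["Academic Search Premier"])

print(recommended_resources("NURS"))  # ['CINAHL', 'PubMed']
```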

Conclusion

In thinking about what type of data you collect, whether on purpose or accidentally, spend some time thinking about what is strictly necessary to accomplish the work that you need to do. If you don’t need a piece of data but can’t avoid collecting it (such as full IP addresses or usernames), make sure you have a privacy policy and retention schedule, and ensure that it is not accessible to more people than absolutely necessary.

Work to educate your patrons about privacy, particularly online privacy. ALA has a Choose Privacy Week, which is always the first week in May; its site has a number of resources you might want to consult in planning programming. Academic librarians may find it easiest to address college students in terms of their presence on social media when it comes to future job hunting, but this is just an opening to larger conversations about data. When you ask patrons to use a third-party service (such as a social network) or recommend a service (such as a book recommendation site), make sure they are aware of what information they are sharing.

We all know that Google’s slogan is “Don’t be evil”, but it’s not always clear if they are sticking to that. Make sure that you are not being evil in your own data collection.


eBook Review – Cultivating Change in the Academy: 50+ Stories from the Digital Frontlines


This is a review of the ebook Cultivating Change in the Academy: 50+ Stories from the Digital Frontlines and also of the larger project that collected the stories that became the content of the ebook. The project collects discussions about how technology can be used to improve student success. Fifty practical examples of successful projects are the result. Academic librarians will find the book to be a highly useful addition to our reference or professional development collections. The stories collected in the ebook are valuable examples of innovative pedagogy and administration and are useful resources to librarians and faculty looking for technological innovations in the classroom. Even more valuable than the collected examples may be the model used to collect and publish them. Cultivating Change, especially in its introduction and epilogue, offers a model for getting like minds together on our campuses and sharing experiences from a diversity of campus perspectives. The results of interdisciplinary cooperation around technology and success make for interesting reading, but we can also follow their model to create our own interdisciplinary collaborations at home on our campuses. More details about the ongoing project are available on their community site. The ebook is available as a blog with comments and also as an .epub, .mobi, or .pdf file from the University of Minnesota Digital Conservancy.

The Review

Cultivating Change in the Academy: 50+ Stories from the Digital Frontlines 1

The stories that make up the ebook have been peer reviewed and organized into chapters on the following topics: Changing Pedagogies (teaching using the affordances of today’s technology), Creating Solutions (technology applied to specific problems), Providing Direction (technology applied to leadership and administration), and Extending Reach (technology employed to reach expanded audiences.) The stories follow a semi-standard format that clearly lays out each project, including the problem addressed, methodology, results, and conclusions.

Section One: Changing Pedagogies

The opening chapter focuses on applications of academic technology in the classroom that specifically address moving instruction from memorization to problem solving and interactive coaching. These efforts are often described by the term “digital pedagogy” (for an explanation of digital pedagogy, see Brian Croxall’s elegant definition.2) I’m often critical of digital pedagogy efforts because they can confuse priorities and focus on the digital at the expense of the pedagogy. The stories in this section do not make this mistake; they correctly focus on harnessing the affordances of technology (the things we can do now that were not previously possible) to achieve student success and foster learning.

One particularly impressive story, Web-Based Problem-Solving Coaches for Physics Students, explained how a physics course used digital tools to give more detailed feedback on student work using the cognitive apprenticeship model. This solution encouraged the development of problem-solving skills and has the potential to scale better than classical lecture/lab course structures.

Section Two: Creating Solutions

This section focuses on using digital technology to present content to students outside of the classroom. Technology is extending the reach of the university beyond the limits of our campus spaces, and this section addresses how innovations can make distance education more effective. A common theme here is the concept of the flipped classroom (see Salman Khan’s TED talk for a good description of flipping the classroom. 3) In a flipped classroom, the traditional structure, in which content is presented to students in lectures during class time and creative work is assigned as homework, is flipped: content is presented outside the classroom, and instructors lead students in creative projects during class time. Solutions listed in this section include podcasts, video podcasts, and screencasts. They also address synchronous and asynchronous methods of distance education and some theoretical approaches for instructors to employ as they transition from primarily face-to-face instruction to more blended instruction environments.

Of special note is the story Creating Productive Presence: A Narrative, in which the instructor assesses the steps taken to provide a distance cohort with the appropriate levels of instructor intervention and student freedom. In face-to-face instruction, students can read the instructor’s body language and other non-verbal cues. Distance students, without these familiar cues, experienced anxiety in a text-only communication environment. Using delegates from student group projects and focus groups, the instructor was able to find an appropriate classroom presence balanced between cold distance and micro-management of the group projects.

Section Three: Providing Direction

The focus of this section is on innovative tools for administration and on how leaders can provide support for embracing disruptive technologies on campus. The stories here tie the overall effort to use technology to advance student success to accreditation, often a necessary step in motivating any campus to make uncomfortable changes. Data archives, the institutional repository, clickers (class polling systems), and project management tools fall under this general category.

The University Digital Conservancy: A Platform to Publish, Share, and Preserve the University’s Scholarship is of particular interest to librarians. Written by three UM librarians, it makes a case for institutional repositories, explains their implementation, discusses tracking article-level impacts, and most importantly includes some highly useful models for assessing institutional repository impact and use.

Section Four: Extending Reach

The final section discusses ways technology can enable the university to reach wider audiences. Examples include moving courseware content to mobile platforms, using SMS messaging to gather research data, and using mobile devices to scale the collection of oral histories. Digital objects scale in ways that physical objects cannot and these projects take advantage of this scale to expand the reach of the university.

Not to be missed in this section is R U Up 4 it? Collecting Data via Texting: Developing and Testing of the Youth Ecological Momentary Assessment System (YEMAS). R U Up 4 it? is the story of using SMS (texting) to gather real-time survey data from teen populations.

Propagating the Meme

The stories and practical experiences recorded in Cultivating Change in the Academy are valuable in their own right. The book is a great resource for ideas and shared experience for anyone looking for creative ways to leverage technology to achieve educational goals. For this reader, though, the real value of this project is the format used to create it. The book is full of valuable and interesting content. However, in the digital world, content isn’t king. As Cory Doctorow tells us:

Content isn’t king. If I sent you to a desert island and gave you the choice of taking your friends or your movies, you’d choose your friends — if you chose the movies, we’d call you a sociopath. Conversation is king. Content is just something to talk about. (http://boingboing.net/2006/10/10/disney-exec-piracy-i.html)

The process the University of Minnesota followed to generate conversation around technology and student success is detailed in a white paper. 4 After reading some of the stories in Cultivating Change, if you find yourself wishing similar conversations could take place on your campus, this is the road map the University of Minnesota followed. Before they were able to publish their stories, the University of Minnesota had to bring together their faculty, staff, and administration to talk about employing innovative technological solutions to the project of increasing student success. In a time when conversation trumps content, a successful model for creating these kinds of conversations on our own campuses will also trump the written record of others’ conversations.

 

  1. Hill Duin, A. et al (eds) (2012) Cultivating Change in the Academy: 50+ Stories from the Digital Frontlines at the University of Minnesota in 2012, An Open-Source eBook. University of Minnesota. Creative Commons BY NC SA. http://digital-rights.net/wp-content/uploads/books/CC50_UMN_ebook.pdf
  2. http://www.briancroxall.net/digitalpedagogy/what-is-digital-pedagogy/
  3. http://www.ted.com/talks/salman_khan_let_s_use_video_to_reinvent_education.html
  4. http://bit.ly/Rj5AIR

What exactly does a fiber-based gigabit speed library app look like?

Mozilla and the National Science Foundation are sponsoring an open round of submissions for developers/app designers to create fiber-based gigabit apps. The detailed contest information is available at Mozilla Ignite (https://mozillaignite.org/about/). Cash prizes to fund promising start-up ideas are being awarded, totaling $500,000 over three rounds of submissions. Note: this is just the start, and these are seed projects to garner interest and momentum in the area. A recent hackathon in Chattanooga lists what some coders are envisioning for this space: http://colab.is/2012/hackers-think-big-at-hackanooga/

The video for the Mozilla Ignite Challenge 2012 on Vimeo is slightly helpful for examples of gigabit speed affordances.

If you’re still puzzled after the video, you are not alone. One of the reasons for the contest is that network designers are not quite sure what immense levels of processing in the network and next generation transfer speeds will really mean.

Consider that best-case transfer speeds on a typical broadband network are somewhere along the lines of 10 megabits per second. There are of course variances in this speed across your home line (it may hover closer to 5 Mb/s), but this is pretty much the standard that average subscribers can expect. A gigabit connection transfers data at 100 times that speed: 1,000 megabits per second. When a whole community is able to achieve 1,000 megabits per second upstream and downstream, you basically have no need for things like “streaming” video; the data pipes are that massive.
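The arithmetic is worth spelling out. Here is a quick back-of-the-envelope calculation using only the speeds mentioned above and a 1.5 GB video chosen purely as an example.

```python
# Back-of-the-envelope transfer times for a 1.5 GB video file.
# 1.5 gigabytes = 1.5 * 8 * 1000 = 12,000 megabits.
def transfer_seconds(size_megabits: float, speed_mbps: float) -> float:
    return size_megabits / speed_mbps

video_megabits = 1.5 * 8 * 1000
for speed in (5, 10, 1000):  # slow home line, best-case broadband, gigabit
    print(f"{speed:>4} Mb/s -> {transfer_seconds(video_megabits, speed):,.0f} s")
# Roughly 2,400 s, 1,200 s, and 12 s respectively.
```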

One theory is that gigabit apps could provide public benefit, solve societal issues, and usher in the next generation of the Internet. Think of gigabit speed as the difference between getting water (the Internet) through a straw and getting it through a fire hose. The practical aim of this contest is to seed startups with ideas that will in some way impact healthcare (real-time health monitoring) and environment and energy challenges. The local Champaign-Urbana municipal gigabit fiber cause is noble, as it will provide those in areas without access to broadband an awesome pipeline to the Internet. It is an intergovernmental partnership that aims to serve municipal needs as well as pave the way for research and industry start-ups.

Here are some attributes that Mozilla Ignite Challenge lists as the possible affordances of fiber based gigabit speed apps:

  • Speed
  • Video
  • Big Data
  • Programmable networks

As I read about the Mozilla Ignite open challenge, I wondered about the possibilities for libraries and as a thought experiment I list out here some ideas for library services that live on gigabit speed networks:

* Consider the video data you could provide access to. In libraries that steward any kind of video, gigabit speeds would allow you to provide in-library viewing with few bottlenecks: a fiber-based gigabit-speed video viewing app with all library video content available at once. Think about viewing every video in your collection simultaneously. You could have them playing to multiple clusters (grid videos) at multiple stations in the library, without streaming.

* Consider sensors, sensor arrays, and fiber. One idea promulgated for fiber-based gigabit networks is the ability to monitor large amounts of data in real time. Sensor networks installed around the library facility could help throttle energy consumption in real time, making the building more energy efficient and less costly to maintain. Such a facilities-based app would yield savings in the facilities budget (a toy sketch of this kind of monitoring follows after this list).

* Consider collaborations among libraries with fiber affordances. Libraries linked by fiber-based gigabit speeds would be able to transfer large amounts of data in a fraction of the time it takes now. There are implications here for data curation infrastructure (http://smartech.gatech.edu/handle/1853/28513).
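To make the sensor idea above slightly more concrete, here is a toy sketch of polling occupancy sensors and flagging zones where lighting or HVAC could be throttled. The zone names, threshold, and read_sensor function are all hypothetical stand-ins for whatever a real building-management API would expose.

```python
import random

ZONES = ["stacks-2F", "reading-room", "group-study-B"]
OCCUPANCY_THRESHOLD = 3  # people

def read_sensor(zone: str) -> int:
    """Placeholder for a real sensor read; returns a simulated head count."""
    return random.randint(0, 10)

def poll_once() -> None:
    for zone in ZONES:
        count = read_sensor(zone)
        if count < OCCUPANCY_THRESHOLD:
            print(f"{zone}: {count} people - candidate for reduced lighting/HVAC")

poll_once()  # in practice this would run on a schedule, e.g. every minute
```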

Another way to approach this problem is by asking: “What problem does gigabit-speed networking solve?” One of the problems with the current web is its de facto latency. Your browser requests a page from a server, which then sends it back to your client. We’ve gotten so accustomed to this latency that we expect it, but what if pages didn’t have to be requested? What if servers didn’t have to send pages? What if a zero-latency web meant we needed a new architecture to take advantage of data possibilities?
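One way to picture a push-oriented architecture is a server that writes updates to every connected client the moment something changes, with no request in between. Below is a minimal standard-library sketch of that pattern; the “new catalog record” payloads, host, and port are invented for illustration.

```python
import asyncio

clients: set[asyncio.StreamWriter] = set()

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    clients.add(writer)
    try:
        await reader.read()          # hold the connection open; ignore input
    finally:
        clients.discard(writer)

async def push_updates() -> None:
    n = 0
    while True:
        await asyncio.sleep(2)       # pretend new data just arrived
        n += 1
        for writer in list(clients):
            writer.write(f"new catalog record #{n}\n".encode())
            await writer.drain()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await asyncio.gather(server.serve_forever(), push_updates())

# asyncio.run(main())  # then `nc 127.0.0.1 8888` in another terminal to watch pushes
```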

Is your library poised to take advantage of increased data transfer? What apps do you want to get funding for?

 


PeerJ: Could it Transform Open Access Publishing?

Open access publication makes access to research free for the end reader, but in many fields it is not free for the author of the article. When I told a friend in a scientific field I was working on this article, he replied “Open access is something you can only do if you have a grant.” PeerJ, a scholarly publishing venture that started up over the summer, aims to change this and make open access publication much easier for everyone involved.

While the first publication isn’t expected until December, in this post I want to examine in greater detail the variation on the “gold” open-access business model that PeerJ states will make it financially viable 1, and the open peer review that will drive it. Both of these models are still very new in the world of scholarly publishing, and require new mindsets for everyone involved. Because PeerJ comes out of funding and leadership from Silicon Valley, it can more easily break from traditional scholarly publishing and experiment with innovative practices. 2

PeerJ Basics

PeerJ is a platform that will host a scholarly journal called PeerJ and a pre-print server (similar to arXiv) that will publish biological and medical scientific research. Its founders are Peter Binfield (formerly of PLoS ONE) and Jason Hoyt (formerly of Mendeley), both of whom are familiar with disruptive models in academic publishing. The “J” in the title stands for Journal, but Jason Hoyt explains on the PeerJ blog that although the journal as such is no longer a necessary model for publication, we still hold on to it: “The journal is dead, but it’s nice to hold on to it for a little while.” 3 The project launched in June of this year, and while no major updates have been posted yet on the PeerJ website, they seem to be moving toward their goal of publishing in late 2012.

To submit a paper for consideration in PeerJ, authors must buy a “lifetime membership” starting at $99. (You can submit a paper without paying, but it costs more in the end to publish it.) This allows the author to publish one paper in the journal per year. The lifetime membership is only valid as long as you meet certain participation requirements, which at minimum means reviewing at least one article a year; reviewing in this case can mean as little as posting a comment on a published article. Without that, the author might have to pay the $99 fee again (though it is as yet unclear how strictly PeerJ will enforce this rule). The idea is to “incentivize” community participation, a practice that has met with limited success in other arenas. Each author on a paper, up to 12 authors, must pay the fee before the article can be published. The Scholarly Kitchen blog did some math and determined that for most lab setups, publication fees would come to about $1,124 4, which is comparable to other open access journals. Of course, some of those researchers wouldn’t have to pay the fee again; others might have to pay it again if they are unable to review other articles.

Peer Review: Should it be open?

PeerJ, as the name and the lifetime membership model imply, will certainly be peer-reviewed. But, keeping with its innovative practices, it will use open peer review, a relatively new model. Peter Binfield explained in this interview PeerJ’s thinking behind open peer review.

…we believe in open peer review. That means, first, reviewer names are revealed to authors, and second, that the history of the peer review process is made public upon publication. However, we are also aware that this is a new concept. Therefore, we are initially going to encourage, but not require, open peer review. Specifically, we will be adopting a policy similar to The EMBO Journal: reviewers will be permitted to reveal their identities to authors, and authors will be given the choice of placing the peer review and revision history online when they are published. In the case of EMBO, the uptake by authors for this latter aspect has been greater than 90%, so we expect it to be well received. 5

In single-blind peer review, reviewers know the name of the author(s) of the article, but the author does not know who reviewed it. The reviewers can write whatever sorts of comments they want without the author being able to communicate with them. For obvious reasons, this lends itself to abuse, where reviewers might reject articles by people they do not know or like, or tend to accept articles from people they do like. 6 Even people who are trying to be fair can accidentally fall prey to bias when they know the names of the submitters.

Double-blind peer review in theory takes away reviewers’ ability to abuse the system. A link that has been passed around library conference planning circles in the past few weeks is JSConf EU 2012, which managed to improve its ratio of female presenters by going to a double-blind system. Double blind is the gold standard of peer review for many scholarly journals. Of course, it is not a perfect system either. It can be hard to obscure the identity of a researcher in a small field in which everyone is working on unique topics. It is also a much lengthier process, with more steps involved in the review. To this end, it is less than ideal for breaking medical or technology research that needs to be made public as soon as possible.

In open peer review, the reviewers and the authors are known to each other. By allowing direct communication between reviewer and researcher, this speeds up the revision process and allows for greater clarity 7. While open peer review doesn’t negatively affect the quality of the reviews or the articles, it does make it more difficult to find qualified reviewers to participate, and it might make a less well-known researcher more likely to accept the work of a senior colleague or well-known lab. 8

Given the experience of JSConf and a great deal of anecdotal evidence from women in technical fields, it seems likely that open peer review is open to the same potential abuse as single-blind review. While open peer review might allow a rejected author to challenge an unfair rejection, this requires that the rejected author feel empowered enough in that community to speak up. Junior scholars who know they have been rejected by senior colleagues may not want to cause a scene that could affect future employment or publication opportunities. On the other hand, if they can get useful feedback directly from respected senior colleagues, that could make all the difference in crafting a stronger article and going forward with a research agenda. Therein lies the dilemma of open peer review.

Who pays for open access?

A related problem for junior scholars exists in open access funding models, at least in STEM publishing. As open access stands now, there are a few different models that are still being fleshed out. Green open access is free to the author and free to the reader; it is usually funded by grants, institutions, or scholarly societies. Gold open access is free to the end reader but has a publication fee charged to the author(s).

This situation is very confusing for researchers: when confronted with a gold open access journal, they must be sure the journal is legitimate (Jeffrey Beall has a list of predatory open access journals to aid in this) as well as secure funding for publication. While there are many schemes in place for paying publication fees, there are no well-defined practices that demonstrate long-term viability. Often fees are covered by grants for the research, but not always. The UK government recently approved a report suggesting that issuing “block grants” to institutions to pay these fees would ultimately cost less because of reduced library subscription fees. As one article suggests, block grants and other funding strategies are likely to disadvantage junior scholars and those in more marginal fields 9. A large research grant for millions of dollars with a relatively small line item for publication fees for a well-known PI is one thing; what about the junior humanities scholar who has to scramble for a few-thousand-dollar research stipend? If an institution only gets so much money for publication fees, who gets the money?

By offering a $99 lifetime membership as the lowest level of publication, PeerJ offers hope to junior scholars and graduate students who want to pursue projects on their own or with a few partners without worrying about how to pay for open access publication. Institutions could more readily afford to pay even $250 a year for highly productive researchers who were not doing peer review than a $1,000+ publication fee for several articles a year. As above, some are skeptical that PeerJ can afford to publish at those rates, but if it can, that would help make open access more fair and equitable for everyone.

Conclusion

Open access with a low cost paid up front could be very advantageous to researchers and institutional bottom lines, but only if the quality of the articles, peer reviews, and science is very good. It could provide a social model for publication that takes advantage of the web and the network effect for high-quality reviewing and dissemination of information, but only if enough people participate. The network effect that made Wikipedia (for example) so successful relies on a high level of participation and engagement very early on (Davis). A community has to build around the idea of PeerJ.

In almost the opposite way, but looking to achieve the same effect, last week the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) announced that after years of negotiations it is set to convert publishing in that field to open access starting in 2014. 10 This means that researchers (and their labs) would not have to do anything special to publish open access and would do so by default in the twelve journals in which most particle physics articles are published. The fees for publication will be paid up front by libraries and funding agencies.

So is it better to start a whole new platform, or to work within the existing system to create open access? If open (and, through a commenting system, ongoing) peer review makes for a lively and engaging network, and low-cost open access makes publication cheaper, then PeerJ could accomplish something extraordinary in scholarly publishing. But until then, it is encouraging that organizations are working from both sides.

  1. Brantley, Peter. “Scholarly Publishing 2012: Meet PeerJ.” PublishersWeekly.com, June 12, 2012. http://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/52512-scholarly-publishing-2012-meet-peerj.html.
  2. Davis, Phil. “PeerJ: Silicon Valley Culture Enters Academic Publishing.” The Scholarly Kitchen, June 14, 2012. http://scholarlykitchen.sspnet.org/2012/06/14/peerj-silicon-valley-culture-enters-academic-publishing/.
  3. Hoyt, Jason. “What Does the ‘J’ in ‘PeerJ’ Stand For?” PeerJ Blog, August 22, 2012. http://blog.peerj.com/post/29956055704/what-does-the-j-in-peerj-stand-for.
  4. http://scholarlykitchen.sspnet.org/2012/06/14/is-peerj-membership-publishing-sustainable/
  5. Brantley
  6. Wennerås, Christine, and Agnes Wold. “Nepotism and sexism in peer-review.” Nature 387, no. 6631 (May 22, 1997): 341–3.
  7. For an ingenious way of demonstrating this, see Leek, Jeffrey T., Margaret A. Taub, and Fernando J. Pineda. “Cooperation Between Referees and Authors Increases Peer Review Accuracy.” PLoS ONE 6, no. 11 (November 9, 2011): e26895.
  8. Mainguy, Gaell, Mohammad R Motamedi, and Daniel Mietchen. “Peer Review—The Newcomers’ Perspective.” PLoS Biology 3, no. 9 (September 2005). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1201308/.
  9. Crotty, David. “Are University Block Grants the Right Way to Fund Open Access Mandates?” The Scholarly Kitchen, September 13, 2012. http://scholarlykitchen.sspnet.org/2012/09/13/are-university-block-grants-the-right-way-to-fund-open-access-mandates/.
  10. Van Noorden, Richard. “Open-access Deal for Particle Physics.” Nature 489, no. 7417 (September 24, 2012): 486–486.

Action Analytics

What is Action Analytics?

If you say “analytics” to most technology-savvy librarians, they think of Google Analytics or similar web analytics services. Many libraries are using such sophisticated data collection and analysis to improve the user experience on library-controlled sites. But standard library analytics are retrospective: what have users done in the past? Have we designed our web platforms and pages successfully, and where do we need to change them?

Technology is enabling a different kind of future-oriented analytics. Action Analytics is evidence-based, combines data sets from different silos, and uses actions, performance, and data from the past to provide recommendations and actionable intelligence meant to influence future actions at both the institutional and the individual level. We’re familiar with these services in library-like contexts such as Amazon’s “customers who bought this item also bought” book recommendations and Netflix’s “other movies you might enjoy”.
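To give a sense of how little machinery the basic “also bought” idea requires, here is a minimal co-occurrence sketch. The checkout data is invented for illustration, and a real deployment would raise the same privacy and opt-in questions discussed below.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented sample data: each set is one patron's (anonymized) checkouts.
checkout_sets = [
    {"Intro to GIS", "Cartography Basics", "Python for Data"},
    {"Intro to GIS", "Python for Data"},
    {"Cartography Basics", "Map Projections"},
]

cooccurs: defaultdict[str, Counter] = defaultdict(Counter)
for items in checkout_sets:
    for a, b in combinations(sorted(items), 2):
        cooccurs[a][b] += 1
        cooccurs[b][a] += 1

def also_borrowed(title: str, n: int = 3) -> list[str]:
    """Titles most often checked out alongside the given title."""
    return [t for t, _ in cooccurs[title].most_common(n)]

print(also_borrowed("Intro to GIS"))  # ['Python for Data', 'Cartography Basics']
```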

[Image: BookSeer β by Apt]

Action Analytics in the Academic Library Landscape

It was a presentation by Mark David Milliron at Educause 2011, “Analytics Today: Getting Smarter About Emerging Technology, Diverse Students, and the Completion Challenge,” that made me think about the possibilities of the interventionist aspect of analytics for libraries. He described the complex dependencies between inter-generational poverty transmission, education as a disrupter, drop-out rates for first-generation college students, and other factors such as international competition and the job market. Then he moved on to the role of sophisticated analytics and data platforms, and spoke about how they can help individual students succeed by using technology to deliver the right resource at the right time to the right student. Where do these sorts of analytics fit into the academic library landscape?

If your library is like my library, the pressure to prove your value to strategic campus initiatives such as student success and retention is increasing. But assessing services with most analytics is past-oriented; how do we add the kind of library analytics that provide a useful intervention or recommendation? These analytics could be designed to help an individual student choose a database, or to trigger a recommendation to dive deeper into reference services like chat reference or individual appointments. We need to design platforms and technology that can integrate data from various campus sources, do some predictive modeling, and deliver a timely text message to an English 101 student recommending specific databases for the first writing assignment, or suggest an individual research appointment with the appropriate subject specialist (and a link to the appointment scheduler) to every honors student a month into their thesis year.
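As a thought experiment, the trigger logic itself could start out as simple rules long before any real predictive modeling is involved. Here is a sketch in that spirit; every field name, threshold, and message is invented for illustration, and a real system would also need clear opt-in and a privacy policy.

```python
from dataclasses import dataclass

@dataclass
class StudentSignals:
    course: str
    weeks_into_term: int
    has_used_databases: bool
    is_honors_thesis: bool

def suggest_intervention(s: StudentSignals) -> str | None:
    """Return a nudge to send (by text or email), or None if no rule fires."""
    if s.course == "ENGL101" and s.weeks_into_term >= 4 and not s.has_used_databases:
        return "Working on your first essay? Try these databases: <link>"
    if s.is_honors_thesis and s.weeks_into_term == 4:
        return "Book a research appointment with your subject librarian: <scheduler link>"
    return None

print(suggest_intervention(StudentSignals("ENGL101", 5, False, False)))
```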

[Image: Ethyl Blum, librarian]

Privacy Implications

But should we? Are these sorts of interventions creepy and stalker-ish?* Would this be seen as an invasion of privacy? Does the use of data in this way collide with the profession’s ethical obligation and historical commitment to keep individual patrons’ reading, browsing, or viewing habits private?

Every librarian I’ve discussed this with felt the same unease. I’m left with a series of questions: Have technology and online data gathering changed the context and meaning of privacy in such fundamental ways that we need to take a long, hard look at our assumptions, especially in the academic environment? (Short answer: yes.) Are there ways to manage opt-in and opt-out preferences so these services are only offered to those who want them? And does that miss the point? Aren’t we trying to reach the students who are unaware of library services and how the library could help them succeed?

Furthermore, are we modeling our ideas of “creepiness” and our adamant rejection of any “intervention” on the face-to-face model of the past, which involved a feeling of personal surveillance and possible social judgment by flesh-and-blood persons? The phone app Mobilyze helps those with clinical depression avoid known triggers by suggesting preventative measures. The software is highly personalized and combines all kinds of data collected by the phone with self-reported mood diaries. Researcher Colin Depp observes that participants felt the impersonal advice delivered via technology was easier to act on than “say, getting advice from their mother.”**

While I am not suggesting in any way that libraries move away from face-to-face, personalized encounters at public service desks, is there room for another model of delivering assistance? A model that some students might find less intrusive, less invasive, and more effective, precisely because it is technological and impersonal? And given the struggle that some students have to succeed in school, and the staggering debt that most of them incur, where exactly do our moral imperatives lie in delivering academic services in an increasingly personalized, technology-infused, data-dependent environment?

Increasingly, health services, commercial entities, and technologies deeply embedded in most people’s lives, such as browsers and social networking environments, use these sorts of action analytics to allow remote monitoring of our aging parents, sell us things, and match us with potential dates. Some of these uses are for the benefit of the user; some are for the benefit of the data gatherer. The moment from the Milliron presentation that really stayed with me was the poignant question a student in a focus group asked him: “Can you use information about me…to help me?”

Can we? What do you think?

* For a recent article on academic libraries and Facebook that addresses some of these issues, see Nancy Kim Phillips, Academic Library Use of Facebook: Building Relationships with Students, The Journal of Academic Librarianship, Volume 37, Issue 6, December 2011, Pages 512-522, ISSN 0099-1333, 10.1016/j.acalib.2011.07.008. See also a very recent New York Times article on use of analytics by companies which discusses the creepiness factor.

 


The Start-Up Library

“Here’s an analogy. The invention of calculus was shocking because for a long time it had simply been presumed that you couldn’t divide by zero. The integrity of math itself seemed to depend on the presumption. Then some genius titans came along and said, “Yeah, maybe you can’t divide by zero, but what would happen if you “could”? We’re going to come as close to doing it as we can, to see what happens.” - David Foster Wallace*

What if a library operated more like an Internet start-up and less like a library?

To be a library in the digital era is to steward legacy systems and practices of an era long past. Contemporary librarianship is at its worst when it accepts poorly crafted vended services and offers poorly thought-through service models simply because this is the way we have always operated.

Internet start-ups in the 2010s heavily feature software as a service. The online presence of an Internet start-up is of foundational concern, since it isn’t simply a “presence”: the online environment is the only environment the start-up has.

Search services would act and look contemporary

If we were an Internet start-up, we wouldn’t use instructional services as a crutch to correct poor design in our catalogs or other discovery layers. We wouldn’t accept the poorly designed vendor databases we currently accept. We would ask for interfaces that act and look contemporary, and if vendors did not deliver, we would make our own. And we would do this on 30-day timelines, not the six months or even years that is the current, lamentable rollout pace for library software services.

Students in the current era will look at a traditional library catalog search box and say, “That looks very ’90s.” We shouldn’t be amused by that comment, unless of course we are trying to look 20 years out of date.

We would embrace perpetual beta.

If the library thought of its software services more the way Internet start-ups do, we would not be so cautious; we would perpetually improve and innovate in our software offerings. Think of the technology giants Google and Apple: they are never content to rest on their laurels; every day they get up and invent like their lives depended on it. Do we?

We wouldn’t settle.

For years we’ve accepted legacy ILS systems. We need to move away from accepting the status quo; the way things have always been done is not the way we should always work. If the information environment has changed, shouldn’t that be reflected in the library’s software services?

We would be bold.

We need to look at massive re-wiring in the way we think about software as a service in libraries; we are smarter and better than mediocrity.

The notion of software services in libraries might be dramatically improved if we thought of our gateways and virtual experiences more the way Internet start-ups conceptualize their do-or-die services, which are seemingly made more effective and efficient every thirty to sixty days.

If Internet start-ups ran their web services the way libraries contentedly run legacy systems, the company would surely fold, or, more likely, would never have attracted the seed funding to start operating in the first place. Let’s do our profession a favor and turn the lights out on the library way of running libraries. Let’s run our library as if it were an Internet start-up.

___

* also: “… this purely theoretical construct wound up yielding incredibly practical results. Suddenly you could plot the area under curves and do rate-change calculations. Just about every material convenience we now enjoy is a consequence of this “as if.” But what if Leibniz and Newton had wanted to divide by zero only to show jaded audiences how cool and rebellious they were? It’d never have happened, because that kind of motivation doesn’t yield results. It’s hollow. Dividing-as-if-by-zero was titanic and ingenuous because it was in the service of something. The math world’s shock was a price they had to pay, not a payoff in itself.” – David Foster Wallace