Reflections on Code4Lib 2013

Disclaimer: I was on the planning committee for Code4Lib 2013, but this post is my own opinion and does not necessarily reflect the views of the other organizers of the conference.

We have mentioned Code4Lib before on this blog, but for those who are unfamiliar, it is a loose collective of programmers working in libraries, librarians, and others interested in code and libraries. (You can read more about it on the website.) The Code4Lib conference has emerged as a venue to share very new technology and to have discussions with a wide variety of people who might not attend conferences geared more toward librarians. Presentations are selected by the votes of anyone interested in shaping the program, and lightning talks and breakout sessions additionally allow wide participation and exposure to extremely new projects that have not yet made it into the literature or to conferences with a longer lead time. The Code4Lib 2013 conference ran February 11-14 at the University of Illinois at Chicago. You can see a list of all programs here, which includes links to the video archive of the conference.

While there were many types of projects presented, I want to focus on the talks that illustrated what I saw as a thread running through the conference: care and emotion. This is perhaps unexpected for a technical conference, yet those themes underlie a great deal of the work that takes place in academic library technology and the types of projects presented at Code4Lib. We tend to work in academic libraries because we care about the collections and the people using those collections. That intrinsic motivation focuses our work.

Caring about the best way to display collections is central to successful projects. Most (though not all) of the presenters and topics came out of academic libraries, and many of the presentations dealt with creating platforms for library and archival metadata and collections. To highlight a few: Penn State University has developed its own institutional repository application, ScholarSphere, which provides a better user experience for researchers and managers of the repository. The libraries and archives of the Rock and Roll Hall of Fame dealt with the increasingly common problem of wanting to present digital content alongside more traditional finding aids, and developed a system for doing so. Corey Harper from New York University presented an extremely interesting, still experimental project that uses linked data to enrich interfaces for interacting with library collections. Note that all these projects combined various pieces of open source software and library/web standards to solve a problem facing academic or research libraries in a particular setting. I think an important lesson for most academic librarians looking at descriptions of projects like these is that they take more than development staff. They take purpose, vision, and dedication to collecting and preserving content; in other words, emotion and care.

A great example of this was the presentation about DIYHistory from the University of Iowa. The project started as an extremely low-tech solution for crowdsourcing archival transcription, but became so popular that it required a more robust solution. They were able to adapt open source tools to meet their needs while keeping the project well within the means of most libraries (the code is here).

Another view of emotion and care came from Mark Matienzo, who gave a lightning talk (his blog post gives a longer version with more details). His talk discussed the difficulties of acknowledging and dealing with the emotional content of archives, even though emotion drives interactions with materials and collections. The records provided are emotionless and affectless, despite the fact that they represent important moments in history and in lives. The kind of sharing of what someone “likes” on Facebook does not satisfactorily answer the question of what they care about, or represent the emotion in their lives. Mark suggested that a tool like Twine, which allows writing interactive stories, could approach the difficult task of bringing together the real with the emotional narrative that makes up experience.

One of the ways we express care for our work and for our colleagues is by taking time to be organized and consistent in code. Naomi Dushay of Stanford University Library presented best practices for code handoffs, describing some excellent practices for documenting and clarifying code and processes. One of the major takeaways is that being clear, concise, and straightforward is always preferable, however much we want to give cute names to our servers and classes. To preserve a spirit of fun, you can use the cute name but attach a description of what the item actually does.

Originally Bess Sadler, also from Stanford, was going to present with Naomi, but she ended up giving a different talk, the last one of the conference, on Creating a Commons (the full text is available here). This was a very moving look at what motivates her to create open source software and at how to create better open source software projects. She used the framework of the Creative Commons licenses to discuss open source software: it needs to be “[m]achine readable, human readable, and lawyer readable.” Machine readable means that code needs to be properly structured and allow for contributions from multiple people without breaking; lawyer readable means that the project should have the correct structure and licensing to allow collaboration across institutions. Bess focused particularly on the “human readable” aspect: creating communities and understanding the “hacker epistemology,” as she so eloquently put it, “[t]he truth is what works.” Part of that understanding requires being willing to reshape default expectations; for instance, the Code4Lib community developed a Code of Conduct at Bess’s urging to underline the fact that the community aims at inclusion and creating a safe space. She encouraged everyone to keep working to do better and to “file bug reports” about open source communities.

This year’s Code4Lib conference was a reminder to me about why I do the work I do as an academic librarian working in a technical role. Even though I may spend a lot of time sitting in front of a computer looking at code, or workflows, or processes, I know it makes access to the collections and exploration of those collections better.


The Setup: What We Use at ACRL TechConnect

Inspired by “The Setup,” a few of us at TechConnect have decided to share some of our favorite tools and techniques with you. What software, equipment, or time/stress management tools do you love? Leave us a note in the comments.

Eric – Homebrew package manager for OS X

I love Macs. I love their hardware, their operating system, even some of their apps like GarageBand. But there are certain headaches that Mac OS X comes with. While OS X exposes its inner workings via a UNIX command line, it doesn’t provide a package manager, like apt on many Linux distros, for installing and updating software.
Enter Homebrew, a lifesaver that’s helped me up my game on the command line without much ancillary pain. Homebrew helps you find (“brew search php”), install (“brew install phantomjs”), and update (“brew upgrade git”) software from a central repository. I currently have 36 packages installed, among them utilities that Apple neglected to include, like wget; programming tools, like Node.js; and brilliant timesavers, like z, a bookmarking system for the command line. Installing tools like these can be tougher than using them, requiring permissions tweaks and enigmatic incantations. Homebrew makes installation easy, and checking thirty-six separate websites for available updates becomes unnecessary.
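
In day-to-day use, that boils down to a handful of commands. The first three below are the examples from above; brew update and brew outdated (both standard Homebrew commands) round out a typical update check:

brew search php        # find packages related to PHP
brew install phantomjs # install a package
brew update            # refresh the package listings from the central repository
brew outdated          # see which installed packages have newer versions
brew upgrade git       # upgrade a single package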
As a bonus, some Homebrew commands now produce unicode beer mugs.

Updated Homebrew from bad98b12 to 150b5f96.
==> Updated Formulae
autojump berkeley-db gtk+ imagemagick libxml2
==> Upgrading 1 outdated package, with result:
libxml2 2.9.0
==> Upgrading libxml2
==> Downloading ftp://xmlsoft.org/libxml2/libxml2-2.9.0.tar.gz
####################################### 100.0%
==> Patching
patching file threads.c
patching file xpath.c
==> ./configure --prefix=/usr/local/Cellar/libxml2/2.9.0 --without-python
==> make
==> make install
==> Caveats
This formula is keg-only, so it was not symlinked into /usr/local.
==> Summary
🍺  /usr/local/Cellar/libxml2/2.9.0: 273 files, 11M, built in 94 seconds

[Note: simulation, not verbatim output]

Magic! And a shameless plug: Homebrew currently has a Kickstarter running to help fund some automated tests, so if you use Homebrew, consider a donation.

Margaret – Pomodoro Technique/using time wisely

Everyone works differently and has more effective times of day for certain types of work. Some people have to start writing first thing in the morning; others can’t do much of anything that early. Personally, I find late afternoon the most effective time to work on code or technical work, but late afternoon is also a time when I am very prone to distraction. So many funny things have been posted on the internet, and my RSS reader is all full up again. The Pomodoro Technique (like other similar systems) is a promise to yourself: if you just work hard on something for a relatively short amount of time, you will finish it, and then you can have a guilt-free break.

Read the website for the full description of the technique and some tools, but here’s the basic idea. You list the tasks you need to do, then pick one to work on for 25 minutes. You set a timer and start work. After the timer goes off, you get a 5-minute break to do whatever you want, and after a few Pomodoros you take a longer break. Ideally the timer should have a visual component so that you know how much time you have left and remember to stay on task. My personal favorite is focus booster. This is what mine looks like right now:

Pomodoro status bar

Note that the line changes color as I get closer to the end. It will turn blue and count down my break when that starts. Another one I like a lot, especially when I am not at my own computer, is e.ggtimer.com. It’s a simple display, and you can bookmark http://e.ggtimer.com/pomodoro to get a Pomodoro started.

I can’t do Pomodoros all day. As a librarian, I need to be available to work with others at certain times; that’s not an interruption, that’s my job. Other times I really need to focus and can’t, and this technique is the best way to get started. Sometimes, once I am started, I get so focused on the project that I don’t even notice I am supposed to be on a break.
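
The mechanics are simple enough that you could even roll your own bare-bones timer. Here is a minimal sketch in Python; the work and break lengths are the usual defaults, and the console-only countdown is my own simplification, not how any of the tools above actually work:

import time

WORK_MINUTES = 25
BREAK_MINUTES = 5
POMODOROS_PER_SET = 4  # take a longer break after this many

def countdown(minutes, label):
    # Block for the whole interval; a real tool shows a visual countdown.
    print(f"{label}: {minutes} minutes, starting now.")
    time.sleep(minutes * 60)
    print(f"{label} is over.")

for n in range(1, POMODOROS_PER_SET + 1):
    countdown(WORK_MINUTES, f"Pomodoro {n}")
    countdown(BREAK_MINUTES, "Break")
print("Set complete. Take a longer break!")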

Jim – Tomcat Server with Jersey servlet: a customizable middleware/API system

The Tomcat/Jersey stack is the backbone of the library’s technology prototyping initiative. With this tool, our staff of research programmers and student programmers can take any webpage or database source and turn it into an API that can then feed a mobile app, a data feed on a website, or a widget in some other emerging technology. While using and leveraging the Tomcat/Jersey stack does require some Java background, it can be learned in a couple of weeks by anyone with some scripting and server experience. The hardest thing about this whole pipeline is finding enough time to keep cranking out the library APIs. One that I got running over the winter holiday is a feed of group rooms that are available to be checked out/scheduled within the next hour at the library.

The data feed sends back a JSON array of available rooms, like this (abbreviated):

[{"roomName":"Collaboration Room 02 - Undergraduate Library",
  "startTime":"10:00 AM",
  "endTime":"11:00 AM",
  "date":"1/27/2013"}, …]
Bohyun – Get into the mood for concentration and focus

I am one of those people who are easily excited by new happenings around me. I am also one of those people who would often do anything but the thing that I must be doing. That is, I am prone to distraction and procrastination. My work often requires focus and concentration, but I have an extremely hard time getting into the right mood.
“There are no limits to what you can accomplish when you are supposed to be doing something else.”
The two tools that I have found help me quite a bit are (a) Scribblet and (b) Rainy Mood. Scribblet (http://scribblet.org/) is a simple JavaScript bookmarklet that lets you literally scribble on your web browser. If you tend to read more efficiently while annotating, this simple tool will help you a great deal with online reading. Rainy Mood (http://www.rainymood.com/) is a website that recreates a rainy day in your browser, with even the sound of thunder sprinkled in. I tend to be much calmer on a rainy day, which can do wonders for my writing and other projects that require a calm and focused state of mind. This tool instantly gives me a rainy day regardless of the weather.
Rainy Mood website

Scribblet website

Meghan – Evernote

Evernote is not a terribly technical tool, but it is one I love and constantly use. It lets you take notes, clip items from the web, attach files to notes, organize notes into notebooks, share notebooks (or keep them private), and search existing notes. It is available as a desktop download, but I primarily use the web version, along with the web clipper and the Android app on my phone. Everything syncs together, so it is easy to locate notes from any location. Here are three examples of how it fits into my daily life:

An enormous pile of classified bookmarks: I am currently trying to get up to speed on Drupal development, looking at examples of online image collections, and brainstorming for my next TechConnect blog entry. The web clipper allows me to save things into specific piles using notebooks and then add tags for classification and easier searching. For example, I can file an issue description or resolution in my web development reference notebook, but tag it with the name of the site affected by the issue. This is especially useful when I know I have to change tasks and am likely to navigate away from my tabs in the browser. When I return to the task in a day or so, I can search for the helpful pages I saved. Classifying in notebooks is also good for building a list of sources that I consult every time I do a certain task, like building a server.

Evernote library

Course and conference notes: Using the web or phone version, I can type notes during a lecture or conference session. I can also attach a PDF of the slides from a presentation for later reference. Frequently, I create a notebook for a particular conference that I can opt to share with others.

Conference notes in Evernote

Personal uses: I am learning to cook, and this tool has been really useful. Say I find a great recipe that I want to (try and) make for dinner tonight. I clip the recipe using the web clipper, save it to my recipes notebook, and then pull it up on my phone while I’m cooking to follow along (which also explains all the flour on my phone). In a few months, if I want to use it again, I’ll probably have to search for it, because all I will remember is that it had chickpeas in it. But that’s all I have to remember.

recipe in Evernote
There are lots of other add-ins for this application, but it’s the base service that I love and use most often.


21st Century Education: A First-Hand Experience with the MOOC

MOOC. It’s such a small word – not even a word, just four letters. Yet this tiny word has the power to be a very large change agent in education. MOOC stands for Massive Open Online Course – an opening of education to anyone, anywhere in the world, as long as they have a computer and Internet access. The concept of the MOOC is not new – MIT has been making courseware open to the public for many years. Neither is the concept of the online course, though such courses had always sat behind the paywall of a university or corporation. In 2012, the California-based company Coursera revolutionized the basic concept of the MOOC. Through Coursera, users have access to over 200 courses from some of the world’s top universities – all for free. The courses are truly interdisciplinary, ranging from the humanities (Listening to World Music) to the social sciences (The Law of the European Union: An Introduction) to the natural and applied sciences (Introduction to Organic Chemistry, Fundamentals of Electrical Engineering).

My fascination with online courses comes from coming of age when the World Wide Web was still in its toddler stage. Webpages were basic, AOL was the top ISP, and connection speeds were laughably slow. By the time I went back to library school in 2007, the internet had grown up considerably, but my library school was one of the few (if not the only one) that did not offer online courses. This instructional philosophy, coupled with my part-time student status, left me thinking critically about the state of education. How could someone like me – a non-traditional student balancing my education and a full-time job – get the benefits when there are only so many hours in a day? How could students with families manage to get an additional degree without sacrificing too much of family life?

Coursera was not my first venture into online education – I took a free class on XML through O’Reilly Media in 2010. What made Coursera different from that course was the breadth and depth of subject matter offered, and the ability to receive a certificate of completion for the class.

The first class I took, this fall, was in the Python programming language, taught by instructors at the University of Toronto. The class structure was easy for me to follow. Each week, the instructors posted several videos on the week’s topics, along with a weekly quiz. It was easy enough for me to watch the videos and work on quiz questions during my lunch hour at work or in the evenings and on weekends. I spent no more than two hours each week on coursework. Larger programming assignments were due every other week, and those took up maybe an extra hour or so of my time. Students needing assistance with the course, or just a place to socialize, could find community on the course message boards or through a Meetup-sponsored group. I participated in a local (New York City) group, and the benefits extended beyond coursework assistance – I came out of it with some great friends.

From a pedagogical perspective, the instruction was designed very well. The combination of weekly quizzes and longer assignments helped me grasp concepts quickly. The instructors also took great care to dig deep into the theory behind Python, using a visualizer to step through code so you could see how the computer processed the language. They transformed it into a living, breathing thing – more than the symbols on the computer screen. They were also extremely quick to respond to student concerns, fixing errors in questions on the final exam and extending due dates after Hurricane Sandy hit the northeastern United States in the middle of the course.

My largest concern with Coursera was academic integrity. After a cheating scandal broke out in a few courses, the company added an honor code to enrollment. Some students interpret the honor code so strictly as to be impractical, and I experienced one such case, where an innocent comment resulted in a cheating accusation from a classmate, including threats to report me to the instructors and Coursera. Thankfully, several students came to my defense, and no further action resulted. It is difficult to monitor integrity in a course such as this, where answers to questions are more quantitative than qualitative. Does a simple honor system like the one Coursera has in place now work? Or are more security measures (two-step or IP authentication, for example) necessary?

What should librarians think about when it comes to the MOOC? Instructional design librarians and subject specialists should think about how MOOCs can affect the subject guides and instructional sessions they curate. Is there an obligation to offer support for courses not offered at their university, or for students not matriculated at the school? Librarians who manage collection development and electronic resources should ask similar questions: is there an obligation to spend collection development funds on resources that do not directly support your institution or users? Faculty and department heads should think about how to properly measure assessment for these courses, especially if they want to offer credit. School administrators should think carefully about the privacy of student data, especially if the institution is in a country that has stricter privacy laws than the United States (such as Canada).

What’s next for Coursera – both for me and for the company? I will be starting my next class, E-Learning and Digital Cultures, at the end of this month, with two more set to start later this year. If you’re attending THATCamp Libraries in Boston at the end of February, I am hoping to convene a discussion group on the MOOC. (Watch the THATCamp Libraries blog for further details.) Coursera, for its part, hopes to continue expanding its course offerings and is in talks to allow Coursera courses to count for college credit. (Some institutions do this already, but Coursera is looking to broaden that reach.) There are rumors that Coursera will start charging for classes or for the end-of-course certificate, but for now, they are just that – rumors.

We’re growing closer to true freedom of education – opening scholarly pursuits to all. And all it took was the concept behind that little word, MOOC.

About Our Guest Author: Kate Kosturski is the Institutional Participation Coordinator for the United Kingdom and Northern Europe for JSTOR.   You can find Kate exploring innovations in teaching, learning and training on the T is for Training podcast, where she is an occasional guest host and frequent panelist.  You can follow her on Twitter as @librarian_kate.



Batch Renaming the Easy Way

Everyone occasionally dives right into a problem without researching (gasp!) the best solution.  For me, this once meant manually renaming hundreds of files and moving them into individual folders in preparation for upload to a digital repository.  Then finally a colleague said to me, and rightly so, “Are you crazy?  There’s scripts to do that for you.”

In my last post, I discussed file naming conventions and the best methods to ensure future access and use for files. However, as librarians and archivists, we don’t always create the files we manage. Donors bring hard drives, students bring USB drives, files get migrated… etc., etc. Renaming existing files to bring them in line with our standards is often a daunting prospect, but there are lots of methods available to save time and sanity.

In this post, I’ll review a few easy methods for batch renaming files:

- Mac Automator
- the Column Editor in Notepad++ (Windows)
- a Windows batch file with a loop

The first two methods do not require any knowledge of coding; the last is slightly more advanced.  There are some caveats: if you are an experienced developer, it’s likely that you know a more efficient way.  I also tried to avoid any third-party tools specifically touted as renaming applications, as I have not used them and therefore cannot recommend which is best.  Lastly, while Photoshop and other photo editing software may help with this when working with image files, the options listed below should work with all file types.

In my example, I am using a set of 43 images waiting for upload to our digital library.  The files originated on a faculty member’s camera, so the names are in the following format:

DSCN2956.jpg
DSCN2957.jpg
DSCN2958.jpg
...

The images are of the Olympic Stadium in Beijing, China, and I would like the file names to reflect that, i.e. Beijing-OlympicStadium-01.jpg

Mac Automator

One of the features included in Mac OS X (10.4 and above) is Automator, the “personal automation assistant,” according to Apple Support. The tool lets you define a group of actions to take, automatically, on a given set of triggers. For example, after discovering this tool I created a script which, when prompted, quickly grabs a screenshot and saves it as a JPEG in a folder I specified.

For this post, let’s step through using the tool to batch-rename files. First, I found a tutorial online. These are everywhere, but specifically, I looked at “10 Awesome Uses for Automator Explained” by Matt Reich. Reich gives a good, succinct tutorial, placed in the context of personal photos. We’re going to make a few changes to his steps, place the process in the context of a digital collection, and walk through it a little more slowly. I’ll be using Mac OS 10.8 for the steps and screenshots.

1.  Go to Finder, open Applications, and double-click on Automator.

2.  We’re going to create an Application. Reich uses a Folder Action, which means that copying items into a particular folder triggers the rename. That approach makes sense when you move personal photos from a camera into the same Photos folder over and over again (in fact, I plan to use it myself). However, for existing digital files that we just want to rename, and which may live in many different folders, an Application is a more direct approach: it allows us to act on the files in place. So, click on the Application icon, and click Choose.

3.  Now we need to add some Actions.  In the Library along the far left-hand pane, select “Files & Folders”.  The middle pane will now show all of the options for acting on Files & Folders.

Automator User Interface

4.  Click on “Rename Finder Items” and drag it to the large empty pane on the right.

5.  The system will prompt you as to whether or not you want to “Copy the Finder items.”  For this example, I opted not to, but if you prefer to make a copy, click on Add.

Prompt to Add Copy Items

6.  The window you’ve dragged over will default to the settings for “Add Date or Time”. We want to do this eventually, but let’s start with changing the name and adding a sequence number. In the drop-down menu at the top of the window, change “Add Date or Time” to “Make Sequential”.

Select Make Sequential Option

7.  Select the radio button next to “new name”, but don’t enter a default item name.

Set naming parameters

8.  Set the rest of the parameters.  For my purposes, I placed the number after the name, used a dash to separate, and used a three digit number set.

9.  Click on “Options” at the bottom, and select “Show this action when the workflow runs.”  The application will then prompt you to fill in the item name at runtime.

A note about the date: in cases where you’d like to append a system date (e.g. Created, Modified, Last Opened or Current), you would use “Add Date or Time”. To match the file naming conventions we have already established, we’ll want to select non-space, non-special characters as our separators, use Year Month Day as the format, and check the box to “Use Leading Zeros”. I would use a dash to separate the name from the date and no separator between Year, Month and Day. Check the example provided at the bottom to make sure the date looks correct.

However, in my case, I’m working with a set of files where the system dates aren’t much use to me. I want the file name to carry the date of the photo; this would be especially true if I were working with scanned files from a historical period. So I’m going to use “Add Text” instead and append my own date.

10.  Repeat step 4: drag “Rename Finder Items” to the right pane.  This time, select “Add Text” from the dropdown.

11. Leave the “Add Text” field blank, click on “Options” and select “Show this action when the workflow runs.”  Then, when you run the application you’ll be prompted to add text and you can append 1950, for example, to the file name.

12.  Click on File > Save As, and save your Application in a location where it is easy to drag and drop files, like the Desktop.  For my example, I called the application BatchFileRename.

13.  Navigate to the folder containing the files you want to rename and select them all (you can use Cmd+A). Drag the whole selection onto the Automator file you just created, fill in the prompts, and click “Continue”.

Automator-12

Prompt 1 for Automator

Text Prompt for Automator

You now have a set of renamed files.  Note that the script did not modify the “Date Modified” value for the file.  The script is now set up for future projects as well; any time you want to rename files, just repeat step 13.

One thing you might notice is that the date is appended after the index number. If you want it before the index number, append it to the “item name” field in the Make Sequential box and skip the Add Text action altogether.

automatorresult

A note from a paranoid librarian:  I copied this set of files from its original location to do this example, so that if something went horribly wrong, I’d still have the originals.  Until you get comfortable with batch renaming you might consider doing the same.

There are lots of other uses for the Automator tool; check out “10 Awesome Uses for Automator Explained” by Matt Reich for more ideas, or do a search for Automator tutorials in your favorite search engine.

Windows – Notepad++ Column Editor

I started out hoping to accomplish this task the same way I did in Mac OS X – with no outside tools. However, the default renaming function in Windows lacks a few things for our purposes. If you select a group of files, right-click, and choose “Rename”, you can rename all of the files at once.

Windows default process

However, the resulting file names do not conform to our earlier standards.  They contain spaces and special characters and the index number is not a consistent length, which can cause sorting headaches.

After some searching, I came across this stackoverflow page, which contained a very useful command:

dir /b *.jpg >file.bat

This command allows me to dump a directory’s files into a text file which I can edit into a series of rename commands to be run as a batch file.  The editing of the text file is the most time-consuming part, but using the Column Editor in Notepad++ speeds up the process considerably.  (This is where we break the “no third-party tool” convention.  Notepad++ is a free text editor I use frequently for writing code and highly recommend, though this process may work with other text editors.)

1.  Open a command prompt.

2.  Navigate to the directory which contains the files that need to be renamed.

3.  The command we found above is composed of several parts.  “dir” lists the directory contents, “/b” indicates to only list the filenames, “*.jpg” means to grab only the jpg files, and “>file.bat” directs the output to a file called file.bat.  We are going to keep everything the same except change the name of our output file.

dir /b *.jpg >rename.bat

Command Line
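
The new rename.bat is nothing but the bare list of filenames, one per line:

DSCN2956.jpg
DSCN2957.jpg
DSCN2958.jpg
...

The next steps edit each of those lines into a full rename command.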

4.  In Windows Explorer, navigate to the directory and find the file you just created.  Right click on it and select Edit with Notepad++ (or Open With > Your Text Editor).

Windows Edit With Notepad++

5.  Put the cursor before the first letter in the first line, and open the Column Editor (Edit > Column Editor or Alt+C).

Open Column Editor

6.  This tool inserts the same text at the same column position on every line. We want to insert the beginning of the Windows rename command on each line. So, in the “text to insert” box, we type:

rename "

and click OK.

Column Editor

7.  Open the editor again to add the portion of the rename command which goes after the old filename.  Here is where we’ll designate our new name, again using the “text to insert” box.  I typed:

" "Beijing-OlympicStadium-

(Note, if you are using file names of varying length, move to the column after the longest file name, then use Find & Replace at the end of the process to remove the extra spaces.)

8.  Next, let’s append an index before the file extension. Open the Column Editor again and this time select the number option. Start at 1, increment by 1, and check the leading zeros box. Click OK.

Column Editor Insert Number

9.  Last, append the file extension and end the command for each line.  Using the Column Editor’s “text to insert” box one more time, add:

.JPG"
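
Each line of rename.bat should now read as a complete command, something like this (the exact index width will depend on your leading-zeros setting):

rename "DSCN2956.jpg" "Beijing-OlympicStadium-01.JPG"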

10. The Column Editor adds one extra line at the bottom.  Scroll down and delete it before saving the file.

Column Editor Extra Line

11. Save the file and go back to the command prompt. (If you closed it, re-open it and navigate back to the directory before proceeding.)

12.  Type in the full name of the batch file so it will execute, i.e.

rename.bat

You’ll see the rename commands go by, and the files will each have a new name.  Again, this doesn’t appear to affect the Date Modified on the file.

Windows – Batch File with Loop

It is possible to write your own batch file that will loop through the files in question and rename them. I had never written my own batch file, so in the interest of researching this post, I decided to give it a shot. There is lots of documentation available online to help in this effort. I consulted Microsoft’s documentation, DOS help documentation, and batch file examples (such as this stackoverflow post and a page on OhioLINK’s Digital Resource Management Committee wiki, which focuses on preparing files for DSpace batch upload).

A batch file just groups a number of Windows commands together in one file and executes them when the batch file is run, as we saw in the previous example. But instead of writing the specific rename commands one by one in a text editor, a batch file can also generate the commands on the fly. Save the following code to a file, place it in the same directory as the set of files, and double-click to run it. Caveat: test this with sample files before you use it! I have tested it on a few directories, but not extensively.

First, we use @echo off to stop the batch commands from printing to the command line window.

@echo off

Then, we set EnableDelayedExpansion so that our index counter will work (this has to do with the variable being evaluated at execution time rather than when the line is parsed). This is why, when you see i in the loops, it is written !i! instead of the %i% syntax used for other variables.

@setlocal enabledelayedexpansion

Next, I set three prompts to ask the user for some information about the renaming: What’s the root name we want to use? What’s the file extension? How many files are there? (Note: this will only work for under 1,000 files.) The /p flag assigns the user’s response to a variable (r, e, and n, respectively). When we reference these variables later, we’ll use the syntax %r%, %e%, and %n%.

set /p r=Enter name root: 
set /p e=Enter file extension (ie .jpg .tif): 
set /p n=More than one hundred files? (y/n):

Next, we set the index counter, which lets us add an incrementing index to our filenames.

set /a "i = 1"

If there are fewer than 100 files, we only need one leading zero in the index for the first ten files, and none for the rest. If there are more than 100, we’ll want a three-digit index. So the following if statement lets us fork to one of two loops: one for two-digit indexes and one for three-digit indexes.

if %n%==y (GOTO three) else GOTO two

Our first segment handles three-digit indexes for more than 100 files. %%v is the temporary variable that holds each filename as we iterate through the loop. *%e% represents a wildcard plus the extension given by the user; so if the user enters .jpg, we select *.jpg, or all files with a .jpg extension. Everything that follows “do” is a command.

:three
for %%v in (*%e%) do (

First, we want to see, based on the index counter i, how many leading zeros we need: if i is less than ten, we want two leading zeros; if it’s less than 100, we want one. This determines which renaming statement gets applied. Each rename statement renames the file currently in %%v to the root name (%r%), followed by a hyphen, the correct number of leading zeros, the index number (!i!), and the file extension (%e%).

if !i! lss 10 (
rename %%v %r%-00!i!%e%
) else (
if !i! lss 100 (
rename %%v %r%-0!i!%e%
) else (
rename %%v %r%-!i!%e%
)
)

Before we exit the loop, we want to increment the index to use with the next file. And, lastly, we need to add a “goto done” statement, so that we don’t execute the “two” segment.

set /a "i = i + 1"
)
goto done

The “two” section is basically the same, except that we only need two-digit indexes, since there are fewer than 100 files.

:two
for %%v in (*%e%) do (
if !i! lss 10 (
rename %%v %r%-0!i!%e%
) else (
rename %%v %r%-!i!%e%
)
set /a "i = i + 1"
)

We end with our “done” label, which marks the exit point.

:done

Here is the code as a whole:


@echo off
@setlocal enabledelayedexpansion
set /p r=Enter name root: 
set /p e=Enter file extension (ie .jpg .tif): 
set /p n=More than one hundred files? (y/n): 
set /a "i = 1"
if %n%==y (GOTO three) else GOTO two
:three
for %%v in (*%e%) do ( 
 if !i! lss 10 (
 rename %%v %r%-00!i!%e%
 ) else (
 if !i! lss 100 (
 rename %%v %r%-0!i!%e%
 ) else (
 rename %%v %r%-!i!%e%
 )
 )
 set /a "i = i + 1"
)
goto done
:two
for %%v in (*%e%) do (
 if !i! lss 10 (
 rename %%v %r%-0!i!%e%
 ) else (
 rename %%v %r%-!i!%e%
 )
 set /a "i = i + 1"
 )
:done

I saved the file as BatchRename.bat, and then copied it to my test directory. Double click on the .bat file to open it. Enter the prompts and the batch file takes care of the rest.

batch-1
batch-2

The files are renamed and again, the Date Modified field was not changed by this action.
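
As an aside: if you happen to have Python installed, the same loop logic fits in a few cross-platform lines. This is just a sketch of the idea under my own naming assumptions, not a fourth method I have tested at scale, so as above, try it on a sample set first:

import os

ROOT = "Beijing-OlympicStadium"  # name root
EXT = ".jpg"                     # file extension

# Sort so the index numbers follow the original filename order.
files = sorted(f for f in os.listdir(".") if f.lower().endswith(EXT))
width = max(2, len(str(len(files))))  # enough digits for leading zeros

for i, old in enumerate(files, start=1):
    os.rename(old, f"{ROOT}-{i:0{width}d}{EXT}")

Run it from inside the directory of files to be renamed.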


Conclusion

Of the three methods, I slightly prefer Automator because of its simplicity and reusability: once the application is created, it can be used over and over again with different sets of files. The Windows batch file is similarly reusable once created, but it does require some knowledge of coding concepts. The Notepad++ method is simple, but you’ll need to step through the file editing with each new set. I love the Column Editor, however; the Insert Number function is incredibly useful for adding indexes to file names without the pesky Windows parentheses.

All of the methods are quick and easy ways to rename a large set of files.  And from personal experience, I will attest that all are preferable to doing it manually.

I’m curious to hear our readers’ thoughts – feel free to leave questions and other recommendations in the Comments section below.



Aaron Swartz and Too-Comfortable Research Libraries

*** Update: Several references and a video added (thanks to Brett Bonfield) on Feb. 21, 2013. ***

Who was Aaron Swartz?

If you are a librarian and do not know who Aaron Swartz was, that should probably change now. He helped develop the RSS standard, was a co-founder of Reddit, worked on the Open Library project, downloaded and freed 20% (2.7 million documents) of the Public Access to Court Electronic Records (PACER) database, which charges access fees for United States federal court documents (about 1,600 of those documents turned out to have privacy issues), played a lead role in preventing the Stop Online Piracy Act (SOPA), and wrote the Guerrilla Open Access Manifesto.

Most famously, he was arrested in 2011 for the mass download of journal articles from JSTOR. He returned the documents to JSTOR and apologized. The Massachusetts state court dismissed the charges, and JSTOR decided not to pursue civil litigation. But MIT stayed silent, and federal prosecutors charged Swartz with wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and recklessly damaging a protected computer. If convicted on these charges, Swartz, then 26, faced up to 35 years in prison. After facing the charges for two years, he committed suicide on January 11, 2013.

Information wants to be free; Information wants to be expensive

Now, he was a controversial figure. In his manifesto, he advocated Open Access (OA) to the extent of encouraging scholars, librarians, and students who have access to copyrighted academic materials to trade passwords and circulate those materials freely, on the grounds that doing so is an act of civil disobedience against unjust copyright laws. He was an advocate of the open Internet, transparent government, and open access to scholarly output. But he also physically hacked into the MIT network wiring closet and attached his laptop to download over 4 million articles from JSTOR. Most people, including librarians, are not going to advocate trading their institutions’ subscription database passwords or breaking into a staff-only computer networking area. The actual method of OA that Swartz recommended was highly controversial even among the strongest OA advocates.

But in his Guerrilla OA manifesto, Swartz raised one very valid point about the nature of information in the era of the World Wide Web: information is power. As power, information can either (a) be spread to, and made useful to, as many of us as possible, or (b) be locked up, with access restricted to those who can pay for it or who have access privileges some other way. One thing is clear: those who do not have access to information will be at a significant disadvantage compared to those who do.

And I would like to ask what today’s academic and/or research libraries are doing to realize Scenario (a) rather than Scenario (b). Are academic/research libraries doing enough to make information available to as many as possible?

Too-comfortable Internet, Too-comfortable academic libraries

Among the many articles I read about Aaron Swartz’s sudden death, the one that made me think most was “Aaron Swartz’s suicide shows the risk of a too-comfortable Internet.” Its author worries that we may now have a too-comfortable Internet, one that is slowly turning into just another platform for those who can afford to purchase information. The Internet as a place where you could freely find, use, modify, create, and share information is disappearing; paywalls and closed doors are being established instead. Useful information on the Internet is being rapidly monetized, and access is no longer free and open. Even government documents are no longer freely accessible to the public once they are put up on the Internet (likely due to digitization and online storage costs), as the case of PACER and Aaron Swartz shows. We are increasingly getting used to giving up our privacy or to paying for information. This may be inevitable in a capitalist society, but should the same apply to libraries as well?

The thought about the too-comfortable Internet made me wonder whether academic research libraries are also becoming too comfortable with the status quo of licensing electronic journals and databases for patrons. When the library collection was physical, people who walked into the library were rarely turned away. Resources in the library are collected and preserved because we believe that people have the right to learn, to investigate, and to form their own opinions, and that the knowledge of the past should be made available for that purpose. Regardless of age, gender, or social and financial status, libraries have welcomed and encouraged people on a quest for knowledge and information. With the increasing number of electronic resources in the library, however, this has been changing.

Many academic libraries offer computers, which are necessary to access the library’s own electronic resources. But how many academic libraries keep those computers open for use without a log-in? Often the library computers are locked up and require a username and password that only those affiliated with the institution possess. The same often goes for electronic resources. How many academic libraries allow on-site access to electronic resources by walk-in users? How many academic libraries insist on walk-in users’ access to the resources they pay for when negotiating the license? Many academic libraries also participate in the Federal Depository Library Program, which requires them to provide the public free access to the government documents they receive. But how easy is it for the public to enter those libraries and access that free government information?

I asked on Twitter about guest access to computers and e-resources in academic libraries. Approximately 25 academic librarians generously answered my question. (Thank you!) According to the responses, almost all of the libraries mentioned offer guest access to computers and e-resources on-site; a few, however, offer guest access to neither. Some libraries limit guests’ computer use to between 30 minutes and 4 hours, thereby restricting access to the library’s electronic resources as well. Only a few libraries offer free wi-fi for guests, and at some libraries the guest wi-fi users are unable to access the library’s e-resources even on-site, because the IP range of the guest wi-fi is different from that of the campus wi-fi.

I am not sure how many academic libraries consciously negotiate walk-in users’ on-site access with e-resources vendors, or whether this happens semi-automatically because many libraries ask for the library building’s IP range to be registered with vendors so that authentication can be turned off inside the building. I surmise that publishers and database vendors will not automatically permit walk-in users’ on-site access in their licenses unless libraries ask for it. Some vendors also explicitly prohibit libraries from using their materials to fill interlibrary loan requests from other libraries. Vendors’ and publishers’ pricing for electronic resources has become more and more closely tied to the number of patrons who can access their products, and academic libraries have been dealing with the escalating costs by filtering out library patrons and limiting access to those in specific disciplines.

For example, academic medical and health sciences libraries often subscribe to databases and resources that have the most up-to-date information about biomedical research, diseases, medications, and treatments. These are almost always inaccessible to the general public, and often even to those affiliated with the institution: the use of these prohibitively expensive resources is limited to the very small portion of people affiliated with the institution in specific disciplines such as medicine and health sciences. Academic research libraries have been partially responsible for the proliferation of these access limitations by welcoming, and often preferring, them as a cost-saving measure. (By contrast, if those resources were in print, no librarian would think it OK to permanently limit their use to those in the medical or health science disciplines only.)

Too-comfortable libraries do not ask themselves whether they are serving the public good of providing access to information and knowledge for those who need it but cannot afford it. Too-comfortable libraries see their role as that of a mediator and broker in the transaction between the information seller and the information buyer. They may act as an efficient and successful mediator and broker, but I don’t believe that is why libraries exist. Ultimately, libraries exist to foster the sharing and dissemination of knowledge more than anything, not to efficiently mediate information leasing. And this is the dangerous idea: you cannot put a price tag on knowledge; it belongs to the human race. Libraries used to be the institution that validated and confirmed this idea. But will they continue to be so in the future? Will an academic library be able to remain a sanctuary for all ideas and a place for sharing knowledge for people’s intellectual pursuits, regardless of their institutional membership? Or will it be reduced to a branch of an institution that sells knowledge to its tuition-paying customers only? While public libraries are more strongly aligned than academic libraries with the mission of making information and knowledge freely and openly available to the public, they cannot be expected to cover patrons’ research needs as fully as academic libraries can.

I am not denying that libraries are also making efforts to continue preservation of, and access to, information and resources through initiatives such as HathiTrust and the DPLA (Digital Public Library of America). My concern is rather whether academic research libraries are becoming too well adapted to the times of the Internet and online resources, too comfortable serving the needs of only the most tangible patron base in the most cost-efficient way, assuming that the library’s mission of storing and disseminating knowledge can now be safely and neutrally relegated to the Internet and the market. But it is a fantasy to believe that the Internet will be a sanctuary for all ideas (the Internet is being censored, as the case of Tarek Mehanna shows), and the market will surely not uphold the ideal of free and open access to knowledge for the public.

If libraries do not fight for and advocate for those who need information and knowledge but cannot afford it, no other institution will. Of course, it costs money to create, format, review, and package content. Authors, as well as those who work in the business of formatting, reviewing, packaging, and producing content, should be compensated for their work. But not to the extent that the content becomes completely inaccessible to those who cannot afford to purchase it but nevertheless want access to it for learning, inquiry, and research. This is probably why we are all moved by Swartz’s Guerrilla Open Access Manifesto in spite of the illegal implications of the action it recommends.

Knowledge and information are not like other products for purchase. Sharing increases their value, thereby enabling innovation, further research, and new knowledge. Limiting knowledge and information to only those with access privileges and/or sufficient purchasing power creates a fundamental inequality. The mission of a research institution should never be limited to serving its own members only, in my opinion. And if the institution forgets this, it should be the library that first raises a red flag. The mission of an academic research institution is to promote the freedom of inquiry and research and to provide an environment that supports that mission inside and outside its walls; that is why the library is said to be the center of an academic research institution.

I don’t have any good answers to the inevitable question of “So what can an academic research library do?” Perhaps we can start by broadening guest access to library computers, wi-fi, and electronic resources on-site. Academic research libraries should also ask themselves: what will libraries have to offer those who seek knowledge for learning and inquiry but cannot afford it? If the answer is nothing, we will have lost libraries.

In his talk about the Internet Archive’s Open Library project at the Code4Lib Conference in 2008 (at 11:20), Swartz describes how librarians had argued about which subject headings to use for the books on the Open Library website. And he says, “We will use all of them. It’s online. We don’t have to have this kind of argument.” Online information and resources, once produced, incur no additional cost per use. Many resources, particularly scholarly research outputs, already have established buyers such as research libraries. Do we have to deny access to information and knowledge to those who cannot afford it but are seeking it, just so that we can have a market where information and knowledge resources are sold and bought and authors, along with those who work with the created content, are compensated? No, this is a false choice. We can have both. But libraries and librarians will have to make it so.

Videos to Watch

“Code4Lib 2008: Building the Open Library – YouTube.”


“Aaron Swartz on Picking Winners,” American Library Association Midwinter Meeting, January 12, 2008.

“Freedom to Connect: Aaron Swartz (1986-2013) on Victory to Save Open Internet, Fight Online Censors.”

REFERENCES

“Aaron Swartz.” 2013. Accessed February 10. http://www.aaronsw.com/.

“Aaron Swartz – Wikipedia, the Free Encyclopedia.” 2013. Accessed February 10. http://en.wikipedia.org/wiki/Aaron_Swartz#JSTOR.

“Aaron Swartz on Picking Winners – YouTube.” 2008. http://www.youtube.com/watch?feature=player_embedded&v=BvJqXaoO4FI.

“Aaron Swartz’s Suicide Shows the Risk of a Too-comfortable Internet – The Globe and Mail.” 2013. Accessed February 10. http://www.theglobeandmail.com/commentary/aaron-swartzs-suicide-shows-the-risk-of-a-too-comfortable-internet/article7509277/.

“Academics Remember Reddit Co-Founder With #PDFTribute.” 2013. Accessed February 10. http://www.slate.com/blogs/the_slatest/2013/01/14/aaron_swartz_death_pdftribute_hashtag_aggregates_copyrighted_articles_released.html.

“After Aaron, Reputation Metrics Startups Aim To Disrupt The Scientific Journal Industry | TechCrunch.” 2013. Accessed February 10. http://techcrunch.com/2013/02/03/the-future-of-the-scientific-journal-industry/.

American Library Association, “A Memorial Resolution Honoring Aaron Swartz.” 2013. http://connect.ala.org/files/memorial_5_aaron%20swartz.pdf.

“An Effort to Upgrade a Court Archive System to Free and Easy – NYTimes.com.” 2013. Accessed February 10. http://www.nytimes.com/2009/02/13/us/13records.html?_r=1&.

Bonfield, Brett. 2013. “Aaron Swartz.” In the Library with the Lead Pipe (February 20). http://www.inthelibrarywiththeleadpipe.org/2013/aaron-swartz/.

“Code4Lib 2008: Building the Open Library – YouTube.” 2013. Accessed February 10. http://www.youtube.com/watch?v=oV-P2uzzc4s&feature=youtu.be&t=2s.

“Daily Kos: What Aaron Swartz Did at MIT.” 2013. Accessed February 10. http://www.dailykos.com/story/2013/01/13/1178600/-What-Aaron-Swartz-did-at-MIT.

Dupuis, John. 2013a. “Around the Web: Aaron Swartz Chronological Link Roundup – Confessions of a Science Librarian.” Accessed February 10. http://scienceblogs.com/confessions/2013/01/20/around-the-web-aaron-swartz-chronological-link-roundup/.

———. 2013b. “Library Vendors, Politics, Aaron Swartz, #pdftribute – Confessions of a Science Librarian.” Accessed February 10. http://scienceblogs.com/confessions/2013/01/17/library-vendors-politics-aaron-swartz-pdftribute/.

“FDLP for PUBLIC.” 2013. Accessed February 10. http://www.gpo.gov/libraries/public/.

“Freedom to Connect: Aaron Swartz (1986-2013) on Victory to Save Open Internet, Fight Online Censors.” 2013. Accessed February 10. http://www.democracynow.org/2013/1/14/freedom_to_connect_aaron_swartz_1986.

“Full Text of ‘Guerilla Open Access Manifesto’.” 2013. Accessed February 10. http://archive.org/stream/GuerillaOpenAccessManifesto/Goamjuly2008_djvu.txt.

Groover, Myron. 2013. “British Columbia Library Association – News – The Last Days of Aaron Swartz.” Accessed February 21. http://www.bcla.bc.ca/page/news/ezlist_item_9abb44a1-4516-49f9-9e31-57685e9ca5cc.aspx#.USat2-i3pJP.

Hellman, Eric. 2013a. “Go To Hellman: Edward Tufte Was a Proto-Phreaker (#aaronswnyc Part 1).” Accessed February 21. http://go-to-hellman.blogspot.com/2013/01/edward-tufte-was-proto-phreaker.html.

———. 2013b. “Go To Hellman: The Four Crimes of Aaron Swartz (#aaronswnyc Part 2).” Accessed February 21. http://go-to-hellman.blogspot.com/2013/01/the-four-crimes-of-aaron-swartz.html.

“How M.I.T. Ensnared a Hacker, Bucking a Freewheeling Culture – NYTimes.com.” 2013. Accessed February 10. http://www.nytimes.com/2013/01/21/technology/how-mit-ensnared-a-hacker-bucking-a-freewheeling-culture.html?pagewanted=all.

March, Andrew. 2013. “A Dangerous Mind? – NYTimes.com.” Accessed February 10. http://www.nytimes.com/2012/04/22/opinion/sunday/a-dangerous-mind.html?pagewanted=all.

“MediaBerkman » Blog Archive » Aaron Swartz on The Open Library.” 2013. Accessed February 22. http://blogs.law.harvard.edu/mediaberkman/2007/10/25/aaron-swartz-on-the-open-library-2/.

Peters, Justin. 2013. “The Idealist.” Slate, February 7. http://www.slate.com/articles/technology/technology/2013/02/aaron_swartz_he_wanted_to_save_the_world_why_couldn_t_he_save_himself.html.

“Public Access to Court Electronic Records.” 2013. Accessed February 10. http://www.pacer.gov/.

“Publishers and Library Groups Spar in Appeal to Ruling on E-Reserves – Technology – The Chronicle of Higher Education.” 2013. Accessed February 10. http://chronicle.com/article/PublishersLibrary-Groups/136995/?cid=pm&utm_source=pm&utm_medium=en.

“Remember Aaron Swartz.” 2013. Celebrating Aaron Swartz. Accessed February 22. http://www.rememberaaronsw.com.

Rochkind, Jonathan. 2013. “Library Values and the Growing Scholarly Digital Divide: In Memoriam Aaron Swartz | Bibliographic Wilderness.” Accessed February 10. http://bibwild.wordpress.com/2013/01/13/library-values-and-digital-divide-in-memoriam-aaron-swartz/.

Sims, Nancy. 2013. “What Is the Government’s Interest in Copyright? Not That of the Public. – Copyright Librarian.” Accessed February 10. http://blog.lib.umn.edu/copyrightlibn/2013/02/what-is-the-governments-interest-in-copyright.html.

Stamos, Alex. 2013. “The Truth About Aaron Swartz’s ‘Crime’.” Unhandled Exception. Accessed February 22. http://unhandled.com/2013/01/12/the-truth-about-aaron-swartzs-crime/.

Summers, Ed. 2013. “Aaronsw | Inkdroid.” Accessed February 21. http://inkdroid.org/journal/2013/01/19/aaronsw/.

“The Inside Story of Aaron Swartz’s Campaign to Liberate Court Filings | Ars Technica.” 2013. Accessed February 10. http://arstechnica.com/tech-policy/2013/02/the-inside-story-of-aaron-swartzs-campaign-to-liberate-court-filings/.

“Welcome to Open Library (Open Library).” 2013. Accessed February 10. http://openlibrary.org/.

West, Jessamyn. 2013. “Librarian.net » Blog Archive » On Leadership and Remembering Aaron.” Accessed February 21. http://www.librarian.net/stax/3984/on-leadership-and-remembering-aaron/.