Workflow Automation in Technical Services: Part 2

Note: This is part two of a two-part series on workflow automation in Technical Services. Part one covered what workflow automation is and the process behind it, along with an example of an item-level workflow automation process. Part two discusses batch-level workflow automation and tools and resources for automating workflows.

Last time, we discussed the basics of workflow automation and some examples of item-level automation in cataloging and acquisitions workflows. Automating workflows on an item-by-item basis provides greater consistency and efficiency in staff members’ daily tasks, allowing them to spend more time on complex workflows and tasks that may not be so readily automated. Item-level workflow automation can be a low-barrier investment in creating a more efficient operation.

Then you have the electronic journals, ebooks, and databases. You have large record files that are tied to physical resources – for example, record downloads from WorldCat Cataloging Partners. And then there are all those records in the system – MARC, XML, whatnot – that have missing or incorrect information (the infamous “dirty data”). Why can’t we just stick with item-level processing for everything?

Item-level automation or batch automation?

For item-level automation, you have a very granular level of control over the process, dealing with items one at a time. If the items are very similar in nature or have only a couple of differences in how each will be processed, though, then going through each item individually probably doesn’t make a lot of sense. Batch processing, on the other hand, lets you go through many items at once, which makes adding or maintaining resources a quicker job than going item by item. You do give up a certain level of control over details with batch processing, however, which leaves you to decide where the “good enough” marker should go in terms of data quality.

Overall, you want to avoid sub-optimizing your workflow. Sub-optimization happens when a part of an organization focuses on the success of its own area instead of the success of the entire organization [1]. Going through each resource record individually might give you the greatest control over the record, but if you’re going through a file containing 10,000+ records individually, even with an item-level automated workflow, the turnaround time for creating access to all those resources will be much higher than if the file were processed at once. With the right tools, however, you can deal with record batches quickly and still retain a good level of control over the data.

MarcEdit is your friend

Many people have at least heard about MarcEdit, or have colleagues who have used it extensively. MarcEdit is a freely available program (for Windows) created by Terry Reese that works with MARC records in a variety of ways. You can add, delete, or modify fields in records, create MARC records from data in spreadsheets, crosswalk to and from the MARC format, split files, join files, generate call numbers, de-duplicate records – and that’s only part of what you can do with MarcEdit. Also, if you find yourself going through the same batch workflow for the same files on a regular basis, MarcEdit’s Script Wizard helps with automating routine batch processing workflows.

Example: Missing 041 1_ subfield h, or, this item is a translation, not in two languages!

Many of you may have moved your older library catalogs to a newer discovery layer; I’ve survived one move at my previous place of work and will probably have another under my belt soon. One consequence of moving to a new discovery layer is that data ignored by the old layer sticks out like a sore thumb in the new one. This example is one of those dirty data discoveries: a particular MARC variable field incorrectly indicated that an item is in two or more languages rather than a translation. Not only do you have unhappy library users who thought you had a copy of The Little Prince in both French and English, but the error exists in a few thousand records, leaving you with a potentially resource-intensive cleanup project.

If you can isolate and export those records in one file (or a couple of files) from your database, then you can use MarcEdit to clean up the field in a relatively short time. Open the file in MarcEdit’s MarcEditor and make your way to the “Edit Subfield” tool under the Tools menu. Let’s say that a lot of records have engfre in the 041 field and you want to change all of them at once. Replace the engfre field data with eng$hfre and you’ve taken care of all those records in one pass.

Since you probably have more than engfre in your file, you can use regular expressions in MarcEdit to change multiple fields at once regardless of language code. Using the Find/Replace tool, search for the 041 field subfield a, but this time add your regular expression and check the “Use regular expression” box. The expression assumes that the 041 field contains two language codes of three letters each, so you will have to do a little cleanup after running the replace command to catch records with three or more language codes, as well as two-letter codes. (h/t to zemkat for the regular expression!)
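To sketch the idea outside MarcEdit: the pattern below is a hypothetical reconstruction of the kind of expression involved, not the exact one from the post, and it makes the same assumption of exactly two three-letter codes run together.

```python
import re

# Hypothetical reconstruction: split a 041 $a holding two
# three-letter language codes run together (e.g. "engfre") into
# "eng$hfre", marking the record as a translation rather than
# a bilingual item.
TWO_CODES = re.compile(r"^([a-z]{3})([a-z]{3})$")

def fix_041a(value):
    """Rewrite 'engfre' -> 'eng$hfre'; leave anything else for manual review."""
    match = TWO_CODES.match(value)
    if match:
        return "{0}$h{1}".format(*match.groups())
    return value  # three+ codes or two-letter codes need separate cleanup

print(fix_041a("engfre"))     # eng$hfre
print(fix_041a("engfreger"))  # engfreger (left for manual cleanup)
```

Anything the pattern does not match is passed through untouched, which mirrors the cleanup pass the post describes.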

Libraries and modules and packages, oh my!

What if you’ve been learning some code, or are looking for an excuse to learn? You’re in luck! Some of the common programming languages have tools to deal with MARC data. Rolling your own batch automation scripts and applications allows you the most flexibility in working with other library data formats as well. However, if you haven’t programmed before, choose smaller projects to start. In addition, if the script or application doesn’t work, you’re your own tech support.

Example: Creating order records for patron driven acquisition (PDA) items triggered for purchase

Patron-driven acquisition usually involves ingesting several hundred to several thousand records into the local database for items that the library does not yet technically own. Depending on the PDA vendor, an item is triggered for purchase after it reaches a use threshold (for example, 10 page views). The library receives an invoice for these purchases, but staff still need to create order records in the system to show that the items have been bought. Considering that in a given week the number of purchases can range from single digits to high double digits, that’s a lot of order records to key in manually.

After dabbling with pymarc at code4lib 2010, I thought this would be a good project to learn more about pymarc and python overall. Here is an outline of the script actions:

  1. From the trigger report spreadsheet, extract the local control numbers for the items triggered for purchase.
  2. Execute a SQL query against the local database for our locally developed next-generation catalog, matching the local control numbers and extracting the MARC records from the database.
  3. In each MARC record:
  • add a 590 and a 790 field for donor/fund information
  • add a 949 field containing the bibliographic record overlay and order record creation information for the system, including the cost of the item extracted from the spreadsheet
  • change the 947 field data to indicate that the item has been purchased (for statistical reporting later on)
  4. Write the MARC records to a file for import into the ILS.
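A minimal sketch of step 3 might look like the following, with plain dicts standing in for pymarc Record objects; the field tags follow the outline above, but the subfield contents, control numbers, and costs are hypothetical, and the spreadsheet/SQL steps are stubbed out.

```python
# Plain dicts stand in for pymarc Record objects (the real script
# used pymarc); subfield contents here are hypothetical.

def add_order_fields(record, cost):
    """Add donor/fund, overlay/order, and purchase-status fields (step 3)."""
    record.setdefault("590", []).append("Purchased through the PDA program.")
    record.setdefault("790", []).append("PDA Fund.")
    # 949: overlay + order-creation instructions for the ILS, including
    # the cost pulled from the trigger report spreadsheet.
    record.setdefault("949", []).append("overlay bib; create order; cost=" + cost)
    # Flip the 947 so statistical reports can count the item as purchased.
    record["947"] = ["pda purchased"]
    return record

# Control numbers and costs would come from the trigger report (step 1),
# and the records themselves from a SQL query against the catalog (step 2).
costs = {"ocm00012345": "55.00"}
records = [{"001": "ocm00012345", "947": ["pda candidate"]}]

processed = [add_order_fields(r, costs[r["001"]]) for r in records]
# Step 4 would write `processed` out as MARC for import into the ILS.
print(processed[0]["947"])  # ['pda purchased']
```

With pymarc, each dict update would instead be a `record.add_field(...)` call on a Record object, and step 4 a write to a MARC output file.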

The output file is then uploaded into the ILS manually, which gives staff the chance to address any issues with the records that the system might have before import. Overall, the process from downloading the trigger report spreadsheet to uploading the record file into the ILS takes a few minutes, depending on the size of the file.

Which automation tools and resources to use?

There are a multitude of other automation tools and resources that cannot be fully covered in two blog posts. Your mileage may vary with these tools; you might find Macro Express to be a better fit for your organization than AutoIt, or you might find that working with ruby-marc is easier for you than MarcEdit (resource links are listed below). The best way to figure out what’s right for you is to play around with various tools and get a feel for them. More often than not, you’ll end up using multiple tools for different levels and types of workflow automation.

Don’t forget about the built-in tools in existing applications as well! Sometimes the best tools for the job are already there for you to take advantage of.

For your convenience, here are the tools mentioned in the two blog posts, including a few others:


[1] http://dictionary.cambridge.org/dictionary/business-english/sub-optimization


Disruptive Educational Models and Open Education

Eating Your Own Dog Food

One of the most memorable experiences I had as a library student was becoming a patron of my own library. As an online library school student,* I usually worked either in my office at pre-approved times or at home. However, depending on the assignment, sometimes I worked at the reference area public access computers. It nearly drove me mad, for a very simple reason: this was in the days before optical mice, and the trackballs on our mice were incredibly sticky and jerky, despite regular cleaning routines. It was so bad that I wondered how students could stand to work on our workstations, and how it made them feel about the library in general, since there is nothing like a solid hour or so of constantly repeated, albeit small, irritations to make a person develop indelible negative feelings toward a particular environment.

I’ve heard the same thing from colleagues that have started graduate programs here at my university; they are shocked at how hard it can be to be a student in the library, even with insider knowledge, and it can be demoralizing (and galvanizing) to watch classmates and even instructors dismiss library services and resources with “too confusing” or “learning curve too steep” as they ruthlessly practice least-effort satisficing for their information needs.

In information technology circles, the concept of having to use your own platforms/services is known as “eating your own dog food” or “dogfooding.” While there are pitfalls to relying too heavily on it as an assessment tool (we all have insider knowledge about libraries, software, and resources that can smooth the process for us), it is an eye-opening exercise, especially to listen to our users be brutally frank about what we offer — or don’t.

DIY Universities and Open Education

I am suggesting something related but complementary to dogfooding — sampling the models and platforms of a burgeoning movement that has the potential to be a disruptive force in higher education. DIY U and the coming transformation of education are all the rage (pun intended) these days, as prestigious universities and professors, Edupunks, loose collaboratives, and start-ups participate in collaborative free online offerings through various platforms and with different aims: Coursera, Khan Academy, P2PU, MIT OpenCourseWare, Udacity, NYU Open Education, and many more. This is a call to action for us as librarians. Instead of endlessly debating what this might mean, where it might be going, or this movement’s possible effect on academic libraries, I suggest actually signing up for a course and experiencing it first-hand.

For library technologists facing the brave new world of higher education in the 21st century, there are three major advantages to taking a class at one of the new experimental DIY universities. First, we get to experience new platforms, delivery mechanisms, and modes of teaching, some of which may be applicable to the work of the academic library. Second, many of the courses offered are technical courses directly applicable to our daily work. Third, it allows us as academic participants to personally assess the often intemperate and hyperbolic language on both sides of the debate: “can’t possibly be as good as institutional campus-based face-to-face EVER” versus “this changes everything, FOREVER.” How many faculty on your campuses do you think have actually taken an online class, especially in one of these open educational initiatives? This is an opportunity to become an informed voice in local campus debates and conversations. These conversations will involve our core services, whether faculty and administrators realize it or not.

It will also encourage some future-oriented thinking about where libraries could fit into this changing educational landscape. One of the more interesting possible effects of these collaborative, open-to-all ventures is the necessity of using free or open-access, high-quality resources. Where will that put the library? What does that mean for instructional resources hidden behind a particular institution’s authentication wall? Academic libraries and services have been tied to particular institutions — what happens when those affiliations blur and change rapidly? There are all sorts of implications for faculty, students, libraries, vendors, and open access/open educational resources platforms. As a thought exercise, take a look at these seven predictions for the future of technology-enabled universities from JISC’s Head of Innovation, Sarah Porter. Which ones DON’T involve libraries? As a profession, let’s get out on the bleeding edge and investigate the developing models.

I just signed up for “Model Thinking” through Coursera. Taught by Professor Scott E. Page of the Center for the Study of Complex Systems at the University of Michigan, the course covers using models to make sense of trends, social movements, and behaviors, because “evidence shows that people who think with models consistently outperform those who don’t. And, moreover people who think with lots of models outperform people who use only one.” That sounds applicable to making decisions about e-books, collection development, workflow redesign, changing models of higher education, and more.

Some Suggestions:

  • Coursera offers clusters of courses in Society, Networks, and Information (Model Thinking, Gamification, Social Network Analysis, among others) and Computer Science (Algorithms, Compilers, Game Theory, etc.). If you have a music library or handle streaming media in your library, what about Listening to World Music? If you are curious about humanities subjects that have depended on traditional library materials in the past, try A History of the World since 1300 or Greek and Roman Mythology.
  • Udacity offers Building a Search Engine, Design of Computer Programs, and Programming a Robotic Car (automate a bookmobile?).
  • Set up your own peer class with P2PU, or take Become a Citizen Scientist, Curating Content, or Programming with the Twitter API.
  • If you are in the New York City area and can attend an in-person workshop, General Assembly offers Storytelling Skills, Programming Fundamentals for Non-Programmers, and Dodging the Dangers of Copyright Law (taught by participants in Yale Law School’s Information Society Project) as part of a menu of tech and tech-business-related workshops. These have fees ranging from $15 to $30.
  • Before I take my Model Thinking class, I’m planning to brush up my algebra at Khan Academy.
  • Try the archived lectures from Harvard’s “Building Mobile Applications”, hosted in their institutional repository.
  • Health Sciences Librarian? What about Information Technology in the Health Care System of the Future from MIT OpenCourseWare?

 

* Full disclosure: I am a proud graduate of the University of Illinois’ LEEP (5.0) MSLIS program, I also have another master’s degree done the old-fashioned way, and I am an enthusiastic supporter of online education done correctly.


Career Impact and Library Technology Research

This blog post is not concerned with the specific application of a technology; rather, it advocates the somewhat post-modern idea of researching and writing in library technology for career impact. I take as my departure point the fact that not all research articles are useful contributions to the field. While intellectual rigor has its place in research, if scholarly research outputs do not address the connection to service improvements or broader big-picture questions, the profession as a whole will not advance.

In a sense, it is after tenure that academic librarians begin to think about what a career of impact means. We may ask ourselves what library needs or open problems were met by our work. We ask: did our research outputs matter? Did our research stand up over time? Has the field moved forward at all?

A major problem in library and information science literature, from an editorial perspective, is the local-ness of any given paper. To generalize, many papers now coming into journal submission portals report how a specific local problem was addressed. The paper does its intellectual work only as far as its local institution is concerned. Broadly, what is needed in library writing, which is primarily driven by tenure-line librarians, is consideration of the practice of librarianship beyond the boundaries of a discrete study.

This underscores another significant problem, one that could be addressed by the right kind of mentorship in library settings: the why of publishing. It would be a good corollary to the how, which veterans can teach; veteran tenured librarians can speak to the methods for getting into print, even into top-tier journals like the Journal of Academic Librarianship. What is missing, and what this post is fundamentally concerned with, is the why of publishing for tenure.

When I started writing, the impulse was to sound smart. This is something I regretted deeply when I watched new library school students take notes on that paper. Now I write to communicate, since a wise person once said: “the smartest people are those who can communicate with others.” What we are attempting to communicate when we publish are ways to improve practice – to move the field forward. That is why we publish. That is why we research. That is why we choose and stay on the tenure track: to have a career of impact in the field.

Can such a thing be taught? It’s like asking if morality can be taught, because it is a rather moral (and possibly post-modern, anti-ego) choice to think of your profession as advancing, and not yourself. While most tenure-track activities can have the effect of growing one’s ego, the path worth going down, the very interesting and profound path librarians must follow if they are to remain honest, is to empty the ego, to empty any concern for the individual career, and to think instead of the profession.

Our careers are not our own, any more than the libraries we worked and lived in were ours. The IT career of impact for librarians is the career made in service to the profession.

 


Personal Data Monitoring: Gamifying Yourself

The academic world has been talking about gamification of learning for some time now. The 2012 Horizon Report says gamification of learning will become mainstream in 2-3 years. Gamification taps into the innate human love of narrative and displaying accomplishments. Anyone working through Code Year is personally familiar with the lure of the green bar that tells you how far you are from your next badge. In this post I want to address a related but slightly different topic: personal data capture and analytics.

Where does the library fit into this? One of the roles of the academic library is to help educate and facilitate the work of researchers. Effective research requires collecting a wide variety of relevant sources, reading them, and saving the relevant information for the future. The 2010 book Too Much to Know by Ann Blair describes the note taking and indexing habits taught to scholars in early modern Europe. Keeping a list of topics and sources was a major focus of scholars, and the resulting notes and indexes were published in their own right. Nowadays maintaining a list of sources is easier than ever with the many tools to collect and store references–but challenges remain due to the abundance of sources and pressure to publish, among others.

New Approaches and Tools in Personal Data Monitoring

Tracking one’s daily habits, reading lists, and other personal information is a very old human practice. Understanding what you are currently doing is the first step in creating better habits, and technology makes it easier to collect this data. Stephen Wolfram has been using technology to collect data about himself for nearly 25 years, and he posted some visual examples a few weeks ago, including items such as how many emails he’s sent and received, keystrokes made, and file types created. The Feltron Report, produced by Nick Felton, is a gorgeously designed book of personal data about himself and his family. But you don’t have to be a data or design whiz to collect and display personal information. For instance, to display your data in a visually compelling way, you can use a service such as Daytum to create a personal data dashboard.

Hours of Activity recorded by Fitbit

In the realm of fitness and health, there are many products that will help capture, store, and analyze personal data. Devices like the Fitbit clip or strap to your body and count steps taken, floors climbed, and hours slept. Pedometers and GPS-enabled sport watches help those trying to get in shape, but the new fields of personal genetic monitoring and behavior analytics promise to make it possible to know very specific information about your health and to understand potential future choices. 23andMe will map your personal genome and provide a portal for analyzing and understanding your genetic profile, allowing an unprecedented ability to understand your health (though there is doubt about whether this can accurately predict disease). For the behavioral and lifestyle aspects of health, a new service called Ginger.io will help collect daily data for health professionals.

Number of readers recorded by Mendeley

Visual cues such as graphs of accomplishments and green progress bars can be as helpful in keeping up with research and monitoring one’s personal research habits as they are in learning to code or training for a marathon. One such feature is the personal reading challenge on Goodreads, which lets you set a goal for how many books to read in the year, tracks what you’ve read, and lets you know how far behind or ahead you are at your current reading pace. Each book in progress has a progress bar indicating how far along in the book you are. This is a simple but effective visual cue. Another popular tool, Mendeley, provides a convenient way to store PDFs and track references of all kinds. Built into this is a small green icon that indicates a reference is unread; you can sort references by read/unread, and marking a reference as “read” updates the article’s status in the Mendeley research database. Academia.edu provides another way for scholars to share research papers and see how many readers they have.

Libraries and Personal Data

How can libraries facilitate this type of personal data monitoring and make it easy for researchers to keep track of what they have done and help them set goals for the future? Last November the Academic Book Writing Month (#acbowrimo) Twitter hashtag community spun off of National Novel Writing Month and challenged participants to complete the first draft of an academic book or other lengthy work. Participants tracked daily word counts and research goals and encouraged each other to complete the work. Librarians could work with researchers at their institutions, both faculty and students, on this type of peer encouragement. We already do this type of activity, but tools like Twitter make it easier to share with a community who might not come to the library often.

The recent furor over the change in Google’s privacy settings prompted many people to delete their Google search histories. Considered another way, though, that history is a treasure trove of past interests: for a researcher trying to remember a book he or she was searching for some years ago, it may hold information that is not available anywhere else. Librarians have certain professional ethics that make collecting and analyzing that type of personal data extremely complex. While we collect all types of data and avidly analyze it, we are careful not to keep track of what individuals read, borrowed, or asked of a librarian. This keeps individual researchers’ privacy safe; the major disadvantage is that it puts the onus on individuals to collect their own data. For people who might read hundreds or thousands of books and articles, it can be a challenge to track all those individual items. Library catalogs are not great at facilitating this type of recordkeeping. Some next-generation catalogs provide better listing and sharing features, but the user has to know to add each item. Even if we can’t provide users a historical list of all items they’ve ever borrowed, we can educate them on how to create such lists. In fact, unless we help researchers create lists like this, we lose an important piece of the historical record, such as the library borrowing history in Dissenting Academies Online.

Conclusion

What are some types of data we can ethically and legally share to help our researchers track personal data? We could share statistics on the average numbers of books checked out by students and faculty, articles downloaded, articles ordered, and other numbers that will help people understand where they fall along a continuum of research. Of course all libraries already collect this information–it’s just a matter of sharing it in a way that makes it easy to use. People want to collect and analyze data about what they do to help them reach their goals. Now that this is so easy we must consider how we can help them.

 

Works Cited
Blair, Ann. Too Much to Know : Managing Scholarly Information Before the Modern Age. New Haven: Yale University Press, 2010.

What is a Graphic Design Development Process?

Previously, I wrote about the value of design in libraries, and others, including Steven Bell and Aaron Schmidt, have written and presented on the topic as well. Now I’d like to delve specifically into what a graphic design process may entail. For librarians who design regularly, I hope this helps articulate what you may be doing already, or perhaps adds a bit to your tools and tips. For those who don’t design, I hope this gives you insight into a process that is more complex than it may seem, and that you might give designing a try yourself. For ideas, try any of these great library design projects: signs, webpages, posters, flyers, bookmarks, banners, etc.

What Is It Like to Design?

People might wonder why design needs to be a process. The very basic process of design, like many processes, is to identify a problem and then create a solution. Jason Fried, founder of 37signals and co-author of Rework, tweeted recently, “Your first design may be the best, but you won’t know until you can’t find a better one.” He later added an image from The Intercom blog as an illustration of this important point. Striving for an elegant or best solution is something librarians and designers have in common. Librarians often share best practices, and examining this process may not only assist us with design; perhaps we can also apply these concepts to other areas of librarianship as we create programs, outreach, marketing, and more.

Design is a process.
Designers work hard to develop a successful design, and it doesn’t always come easily. Here are some of the basic steps designers take in the development phase of their work. Every designer is a bit different, and not all designers follow exactly the same process. However, this is a good foundation for beginning designers, and once you get good, you can incorporate or modify pieces of the process to make it work for you and the project at hand. Design is subjective and there are few hard-and-fast rules to follow. However, in future posts I’ll talk more about design elements and details to help you create stronger designs that will speak to your users.

Design has constraints.
Before you start laying things out and jumping into a design, you want to understand the “specs,” or specifications. These are the details of the final piece that you need up front, before you begin any design. For example, is the piece going to be printed, or is it an online piece? What’s the budget? Is it black and white or color, and how many colors? What size? If printed, what paper will it be printed on? Will color bleed to the edge, or is there a border? Is there folding or cutting involved?

All of these considerations are going to be the rules you must work under. But most designers like to think of them as challenges; many times, if the specs aren’t too restrictive, they can actually empower the designer to push harder and be more creative. You really don’t want to start designing before you get all this worked out, because missing a critical spec can mean starting over once you’ve jumped in. If you design for one set of specs and then try to modify the result to fit new specs later, the strength of the design is almost always compromised. Better to know those specs up front.

Design requires an open mind.
Sketch like crazy. You may think you have the best, most original idea ever once you get your assignment or have your specs, but please do yourself a huge favor and sketch some ideas out first. Do at least a page of sketches if not much more. Take notes, do some research on the topic, do word associations and mind maps and draw stick figures and doodle. Keep an open mind to new possibilities. Observe the world around you, daydream, and collect inspiration. You might still stick with that first idea but chances are you come up with something even better and usually more original if you push yourself to think in new ways and explore.

Design step by step.
Depending on the complexity of the piece, and whether it’s print or web, I might do more or less of each step below. If you’re designing or reworking a website, this is a good method for getting a powerful, thoughtful design. And of course, you can go back and iterate based on feedback or on changes that impact design elements. If the design structure is strong, such changes should be fairly small.

Basic Design Development Process:

1. research the topic, take notes, ask questions, doodle, jot down ideas, simmer 

2. series of thumbnail sketches
This is an extension of step 1. Do as many as you can muster…do it until you are sick of it. Here is a great presentation I recently found on sketching.

3. build wireframe
Stay abstract/block in composition. This is going to be larger than a thumbnail but try to keep it free from detail.

4. sketch comps
Take steps 2 and 3 and flesh out three comps. These should not be final, but they should follow the specs and be close to finished in terms of look and feel for the major design components. You may use lorem ipsum text if you wish; this technique helps keep people from giving feedback about the content instead of the design. Of course, there are times the content absolutely needs to be there, but use your own discretion and know that this option may help in moving forward.

5. finalize comps
Usually three choices are offered to a client, but if you are your own client, obviously just do your favorite.
All of this is separate from any CSS, HTML, JavaScript, etc. Mock it up using Photoshop and/or Illustrator (or a similar program of your choice). The point is to focus on the design apart from laying down code. “Form follows function” really rings true, and it isn’t an either/or statement: the product must work first and foremost, and the design will support, enhance, and make it work better. If the product doesn’t work, no amount of gorgeous design will fix something that is badly broken.

TaDa, right?
The design is done, let’s celebrate!

Well, not exactly. This process is merely one phase of a much larger process that includes steps such as initially meeting the client, negotiating a contract, presenting your designs, testing and usability work, iterative design adjustments, and possibly working with developers or print houses. Design is a process that requires study, skill, schooling, and knowledge, like many fields. I’ll be talking about more design topics in the future, so what is not covered here I’ll try to cover next time. Luckily, I gathered some great…

Design resources to get you started:

This is not a comprehensive list by any means but highlights of a few resources to get you thinking about design.

  • Non-Designer’s Design Book: One of the best beginner design books out there (overlook the cover- it really is a great book!).
  • Smashing Magazine: Really good stuff on this website- including freebies, like decent icons and vector artwork. Covers typography, color, graphic design, etc.
  • a list apart: another great site that delves into all kinds of topics but has great stuff on graphic design, UI design, typography, illustrations, etc.
  • Fast Company Design: relevant design articles and examples from industry.
  • IDEO: design thinking, great high level design examples- check out their portfolio in selected works.
  • Thinking With Type: title says it all- learn about the fine art and science of typefaces. You will never look at design and type the same way again.
  • Stop Stealing Sheep and Find Out How Type Works: another must on typography
  • Drawing on the Right Side of the Brain: seriously. even if you think you can’t draw. try it. anyone can draw, truly. Drawing helps you think in new and creative ways- it will help you be more creative and help in problem solving anything. Even those small doodles are valuable.

Pick. your. favorite. see above. do it.

Enjoy and thanks again!