Workflow Automation in Technical Services: Part 2

Note: This is part two of a two-part series on workflow automation in Technical Services. Part one covered what workflow automation is, how the process works, and an example of an item-level workflow automation process. Part two discusses batch-level workflow automation and resources and tools for workflow automation.

Last time, we discussed the basics of workflow automation and some examples of item-level automation in cataloging and acquisitions workflows. Automating workflows on an item-by-item basis brings greater consistency and efficiency to the daily tasks staff perform. This frees them to spend more time on complex workflows and tasks that are not so readily automated. Item-level workflow automation can be a low-barrier investment in creating a more efficient operation.

Then you have the electronic journals, ebooks, and databases. You have large record files that are tied to physical resources – for example, record downloads from WorldCat Cataloging Partners. And then there are all those records in the system – MARC, XML, whatnot – that have missing or incorrect information (the infamous “dirty data”). Why can’t we just stick with item-level processing for everything?

Item-level automation or batch automation?

With item-level automation, you have very granular control over the process, dealing with items one at a time. If the items are very similar in nature or differ in only a couple of ways in how they are processed, though, going through each item individually probably doesn’t make a lot of sense. Batch processing, on the other hand, lets you go through many items at once, which makes adding or maintaining resources quicker than working item by item. You do give up some control over details with batch processing, however. That leaves you to decide where the “good enough” marker should go in terms of data quality.

Overall, you want to avoid sub-optimizing your workflow. Sub-optimization happens when one part of an organization focuses on the success of its own area instead of the success of the entire organization [1]. Going through each resource record individually might give you the greatest control over the record. But if you’re working through a file of 10,000+ records one at a time, even with an automated item-level workflow, the turnaround time for providing access to all those resources will be much longer than if the file were processed as a batch. With the right tools, though, you can process record batches quickly while keeping a good level of control over the data.

MarcEdit is your friend

Many people have at least heard about MarcEdit, or have colleagues who have used it extensively. MarcEdit is a freely available program (for Windows) created by Terry Reese that works with MARC records in a variety of ways. You can add, delete, or modify fields in records, create MARC records from data in spreadsheets, crosswalk to and from the MARC format, split files, join files, generate call numbers, de-duplicate records – and that’s only part of what you can do with MarcEdit. Also, if you find yourself going through the same batch workflow for the same files on a regular basis, MarcEdit’s Script Wizard helps with automating routine batch processing workflows.
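MarcEdit takes care of splitting files for you. To show roughly what a split involves under the hood, here is a minimal Python sketch that uses only the standard library. It assumes binary MARC 21 (ISO 2709) input, in which every record ends with the record terminator byte 0x1D. The function names and the chunk size are hypothetical and are not MarcEdit's.

```python
from typing import Iterator, List

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC 21 end-of-record byte


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw MARC record, terminator included.

    Any trailing bytes after the last terminator (e.g. a newline) are ignored.
    """
    start = 0
    while True:
        end = data.find(RECORD_TERMINATOR, start)
        if end == -1:
            break
        yield data[start:end + 1]
        start = end + 1


def split_records(data: bytes, per_file: int) -> List[bytes]:
    """Group raw records into chunks of at most `per_file` records each."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    chunks: List[bytes] = []
    current: List[bytes] = []
    for record in iter_records(data):
        current.append(record)
        if len(current) == per_file:
            chunks.append(b"".join(current))
            current = []
    if current:  # leftover records form the last, smaller chunk
        chunks.append(b"".join(current))
    return chunks
```

For example, you might read a file with open("big_file.mrc", "rb") (a hypothetical file name), call split_records(data, 1000), and write each chunk to its own file. A production tool would also check the record length stored in leader positions 00-04 rather than trusting the terminator alone.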

Example: Missing 041 1_ subfield h, or, this item is a translation, not in two languages!

Many of you may have moved from an older library catalog to a newer discovery layer. I survived one move at my previous place of work and will probably have another under my belt soon. One consequence of moving to a new discovery layer is that data the old layer ignored sticks out like a sore thumb in the new one. This example is one of those dirty data discoveries: a particular MARC variable field incorrectly indicated that an item was in two or more languages instead of being a translation. Not only do you have unhappy library users who thought you had a copy of The Little Prince in both French and English. With this error in a few thousand records, you also face a potentially resource-intensive cleanup project.

If you can isolate and export those records from your database into one file (or a few), you can use MarcEdit to clean up the field in a relatively short time. Open the file in MarcEdit’s MarcEditor and go to “Edit Subfield” under the Tools menu. Let’s say that many records have engfre in subfield a of the 041 field, and you want to change all of them at once. Replace the engfre data with eng$hfre, and you’ve taken care of all those records in one pass.

Since your file probably contains more than engfre, you can use regular expressions in MarcEdit to change many fields at once, regardless of language code. Using the Find/Replace tool, search for subfield a of the 041 field, but this time enter your regular expression and check the “Use regular expression” box. The expression I used (h/t to zemkat!) assumes that the 041 field has two language codes of three letters each. You will therefore need to do a little cleanup after running the replace command, to catch fields with three or more language codes as well as two-letter language codes.
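The expression itself isn’t reproduced in this post. As a separate illustration of the same find/replace idea, here is a minimal Python sketch with an illustrative regular expression. That expression is an assumption, not zemkat’s original. The sketch also assumes MarcEdit’s mnemonic (.mrk) layout, where a line reads like =041  1\$aengfre and a backslash stands for a blank indicator. It only touches 041 fields whose first indicator is 1 and whose subfield a holds exactly two three-letter codes, and it leaves everything else for manual cleanup.

```python
import re

# Illustrative pattern (an assumption, not the original post's expression):
# an 041 line with first indicator 1 and exactly two concatenated
# three-letter codes in $a, e.g. "=041  1\$aengfre".
PATTERN = re.compile(r"^(=041  1.\$a)([a-z]{3})([a-z]{3})$", re.MULTILINE)


def split_translation_codes(mrk_text: str) -> str:
    """Move the second language code of each matching 041 into a new $h."""
    # \g<n> refers to a captured group; "$h" is literal text in a Python
    # replacement string, not a special token.
    return PATTERN.sub(r"\g<1>\g<2>$h\g<3>", mrk_text)
```

Reading the .mrk file in Python text mode normalizes Windows line endings, so the end-of-line anchor matches as expected. Lines with three or more codes, or with other indicators, pass through unchanged.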

Libraries and modules and packages, oh my!

What if you’ve been learning some code, or are looking for an excuse to learn? You’re in luck! Some of the common programming languages have tools to deal with MARC data. Rolling your own batch automation scripts and applications allows you the most flexibility in working with other library data formats as well. However, if you haven’t programmed before, choose smaller projects to start. In addition, if the script or application doesn’t work, you’re your own tech support.

Example: Creating order records for patron driven acquisition (PDA) items triggered for purchase

Patron-driven acquisition usually involves loading several hundred to several thousand records into the local database for items that the library does not technically own at that point. Depending on the PDA vendor, an item is triggered for purchase after it reaches a use threshold (for example, 10 page views). The library receives an invoice for these purchases, but we still need to create order records in the system to show that the items have been bought. In a given week, the number of purchases can range from single digits to the high double digits, and that’s a lot of order records to key in manually.

After dabbling with pymarc at code4lib 2010, I thought this would be a good project to learn more about pymarc and python overall. Here is an outline of the script actions:

  1. In the trigger report spreadsheet, extract the local control numbers for the items triggered for purchase.
  2. Execute a SQL query against the database for our locally developed next-generation catalog, matching on the local control number and extracting the MARC records from the database.
  3. In each MARC record:
  • add a 590 and 790 field for donor/fund information
  • add a 949 field containing bibliographic record overlay and the order record creation information for the system, including cost of the item extracted from the spreadsheet.
  • change the 947 field data to indicate that the item has been purchased (for statistical reporting later on)
  4. Write the MARC records to a file for import into the ILS.
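The author’s script used pymarc and a SQL query against the local catalog. The sketch below is not that script. It is a stdlib-only outline of steps 1, 3, and 4 under stated assumptions:

- The trigger report is a CSV file with hypothetical control_number and cost columns.
- Step 2’s database query is stood in for by a dictionary of records in MarcEdit mnemonic form.
- The 590/790/947/949 contents are placeholders, not any real ILS load syntax.

```python
import csv
import io
from typing import Dict, List


def read_trigger_report(csv_text: str) -> Dict[str, str]:
    """Step 1: map each local control number to its purchase cost.

    The column names "control_number" and "cost" are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["control_number"].strip(): row["cost"].strip() for row in reader}


def add_order_fields(mrk_lines: List[str], cost: str, fund: str) -> List[str]:
    """Step 3: flag the 947 as purchased and add 590/790/949 fields.

    Field contents are placeholders, not real ILS load-table syntax.
    """
    lines = [
        "=947  \\\\$aPURCHASED" if line.startswith("=947") else line
        for line in mrk_lines
    ]
    lines += [
        f"=590  \\\\$aPurchased with {fund} funds.",
        f"=790  \\\\$a{fund}",
        f"=949  \\\\$aORDER-LOAD-PLACEHOLDER$p{cost}",
    ]
    # Keep the leader first, then sort fields by tag (stable for repeated tags).
    return sorted(lines, key=lambda line: (line[1:4] != "LDR", line[1:4]))


def build_import_file(csv_text: str, records_by_id: Dict[str, List[str]],
                      fund: str) -> str:
    """Steps 2-4: look up each triggered record and emit mnemonic text.

    records_by_id stands in for the SQL query against the local catalog.
    """
    output = []
    for control_number, cost in read_trigger_report(csv_text).items():
        record = records_by_id.get(control_number)
        if record is None:
            continue  # not found locally; a real script should log this
        output.append("\n".join(add_order_fields(record, cost, fund)))
    # MarcEdit mnemonic files separate records with a blank line.
    return "\n\n".join(output) + "\n"
```

The result is mnemonic text, which MarcEdit can compile to binary MARC (.mrc) before loading into the ILS. A pymarc-based version, like the author’s, would write binary MARC directly.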

The output file is then uploaded into the ILS manually, which gives staff the chance to address any issues with the records that the system might have before import. Overall, the process from downloading the trigger report spreadsheet to uploading the record file into the ILS takes a few minutes, depending on the size of the file.

Which automation tools and resources to use?

There are many other automation tools and resources that cannot be fully covered in two blog posts. Your mileage may vary with these tools. You might find Macro Express a better fit for your organization than AutoIt, or you might find ruby-marc easier to work with than MarcEdit (resource links listed below). The best way to figure out what’s right for you is to play around with various tools and get a feel for them. More often than not, you’ll end up using multiple tools for different levels and types of workflow automation.

Don’t forget about the built-in tools in your existing applications, too! Sometimes the best tools for the job are already there for you to take advantage of.

For your convenience, here are the tools mentioned in the two blog posts, including a few others:


[1] http://dictionary.cambridge.org/dictionary/business-english/sub-optimization

Workflow Automation in Technical Services: Part 1

Note: This is part one of a two-part series on workflow automation in Technical Services. Part one will cover what workflow automation is, how the process works, and an example of an item-level workflow automation process. Part two will discuss batch-level workflow automation and resources and tools for workflow automation.

The mysterious door at the library

Door leading into Technical Services
Photo by author

Most of you have probably passed by this door many times in your library lives. Sometimes it isn’t even a door; maybe it’s a room divider, or an invisible line that runs across the room. In any case, you may have ventured into the space called “Technical Services” (or something similar), but do you know what goes on there? In most libraries, Technical Services staff acquire, create, and maintain access to library materials, ranging from books and a box of rocks to electronic databases and digitized local collections. Without them, it would be hard for a library to serve its users. There would be no physical items to borrow, no electronic journals to search for articles, and no metadata in the library discovery layer for users and staff to find those resources.

With a variety of items comes a variety of workflows to process them, and many of these workflows are repeated at regular intervals: some once a week, others multiple times a day. Every manual repetition uses staff time and resources. That leaves less for other projects, including new ones that would add value to existing collections or add new collections for library users. Technology offers a variety of strategies for workflow automation that reduce the time spent on repetitive workflows.

What is workflow automation?

The oversimplified answer to this question is that workflow automation is the process where you have the computer do the things that it can be programmed to do, thereby reducing repetitive manual actions by the staff member.

There are two types of automation to consider when you look at your workflows:

  1. Data Entry: This type of automation is fairly straightforward, and you’ve probably done it already without realizing it. For example, an automation script fills in a form with data that stays the same for every form, or types out standard text in an email to a vendor. It is useful for automating repetitive keystrokes, whether those are system codes, text, or even the creation of new documents in certain applications, such as an item record. The script is hard-coded, meaning that its output will be the same every time you run it.
  2. Decision Making: This type of automation makes all the decisions for you! Okay, it won’t make every decision for you, but several automation languages and programs can handle fairly complex decision-making flowcharts using standard conditionals. For example, if bibliographic record “A” has field “B”, then do action “C”; else do action “D”. As you probably already guessed, this type of automation resembles coding to a certain extent. A script designed to handle several possible outcomes is not hard-coded like the data entry script described above.
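To make the contrast between the two types concrete, here is a minimal Python sketch. The email text, the 050 tag, and the action names are hypothetical placeholders. The record is modeled as a plain dictionary of tags to field values, not a real ILS record.

```python
from typing import Dict, List

# Hypothetical record model: MARC tag -> list of field values.
Record = Dict[str, List[str]]


def fill_vendor_email() -> str:
    """Data entry: hard-coded text, so the output is identical on every run."""
    return "Dear vendor,\nPlease see the attached claim for the items below.\n"


def choose_action(record: Record, tag: str = "050") -> str:
    """Decision making: if record "A" has field "B", do "C"; else do "D".

    The default tag and both action strings are hypothetical placeholders.
    """
    if record.get(tag):  # field present and non-empty
        return "C: keep the existing call number"
    return "D: route to a cataloger for a call number"
```

The first function never looks at its input, while the second branches on what the record contains. That branching is the point at which an automation script starts to resemble code.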

What can be automated?

Most Technical Services departments acquire, create, and maintain access to a variety of different formats, from physical to electronic formats. Traditionally, workflows focus on the individual item going through the department and its various teams: acquisitions, cataloging, and processing, for example. With the changeover to electronic formats, workflows are going more towards a batch approach, processing and/or cataloging multiple items (for example, a collection of ebooks) at once.

In addition to adding materials to library collections, a library’s Technical Services staff do a fair amount of database maintenance for the library’s ILS (Integrated Library System). The term “dirty data” gets thrown around a lot in TS departments. It covers database projects that deal with misspellings, outdated codes, or incorrect codes, in short, anything that could inhibit a library user’s access to a resource.

Why should I automate my workflows?

  • Better quality control of workflow and data. Any time you let a human near a workflow, errors can creep in: incorrect codes, mistyped text, or mishandled items. An automated workflow cuts down on the workflow’s failure points and allows for better overall consistency and accuracy.
  • Save staff time.  You and your staff spend a good amount of time with repetitive keystrokes and decisions. Even small repetitive actions add up during the work day, resulting in hours of valuable staff time and resources. By automating the repetitive actions, you free up staff time to work on more complex workflows which are not as easily automated.

How do you decide what workflows to automate?

  • Flowchart your workflow.  A simple flowchart from the beginning of the workflow to the end might reveal several places where current manual decision making can be relegated to a script. If a person is currently looking for a code in the order record to figure out what location code they should enter in the item record, the script could be set to do the same.
  • What are the patterns? In each step, what data remains constant throughout all items? What codes, phrases, or fields do you insert every time you go through the workflow? Is there a pattern of going from one application to another at the same point in every workflow? One record to another?
  • How will the script access the data? Working with a file of MARC records is different from working with a bibliographic record that is open in your ILS. Having a file of data is easier, but if you’re automating an item-level workflow, you will usually have to work with application windows instead. Getting data from a window can be tricky. Sometimes you can access the data directly, and other times you will have to scrape the screen to get the data you want the script to work on.
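The location-code example in the first bullet amounts to a lookup table. Here is a minimal Python sketch, with made-up codes standing in for whatever your ILS actually uses.

```python
from typing import Dict, Optional

# Hypothetical order-record location codes mapped to item location codes;
# the real values come from your own ILS configuration.
ORDER_TO_ITEM_LOCATION: Dict[str, str] = {
    "mref": "mainr",  # main library reference
    "mstk": "mains",  # main library stacks
    "sci": "scist",   # science library stacks
}


def item_location(order_location: str) -> Optional[str]:
    """Return the item location for an order code, or None if unmapped.

    None signals that a person should decide, rather than the script guessing.
    """
    return ORDER_TO_ITEM_LOCATION.get(order_location.strip().lower())
```

Returning None for an unknown code keeps the exception in human hands. This mirrors how a flowchart usually has a “check with a person” branch.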

Example: Receipt Cataloging

At my former place of work, Technical Services had three levels of cataloging: receipt cataloging, copy cataloging, and original cataloging. All monographs went through the receipt cataloging process, and items were bumped up to the two higher levels of cataloging as needed. The majority of items that go through receipt cataloging meet a list of 40+ criteria and are fast-tracked to physical processing. This shortens the time between the item’s arrival at the library and its placement on the shelf, which is the overarching goal of receipt cataloging. The criteria range from determining whether the record is DLC (Library of Congress) to determining whether the 008, 050, and 260 ‡c dates in the bibliographic record match (if the item is not a conference publication).
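Two of these criteria can be sketched in Python to show the kind of check the macros performed. This is not the author’s AutoIt code. The record is a hypothetical dictionary of tag to field text, with $ as the subfield delimiter (the post writes ‡), and pulling years out with a simple digit pattern is a simplifying assumption. The sketch relies on standard MARC 21 positions: 008/07-10 holds Date 1, and in a books-format 008, position 29 is the conference publication code.

```python
import re
from typing import Dict, List, Optional

# Hypothetical record model: tag -> field text with "$" subfield delimiters
# (indicators omitted); "008" holds the 40-character fixed field.
Record = Dict[str, str]


def subfield(field: str, code: str) -> Optional[str]:
    """Return the first value of subfield `code` in a field, or None."""
    for part in field.split("$")[1:]:
        if part[:1] == code:
            return part[1:]
    return None


def years(text: Optional[str]) -> List[str]:
    """Return every four-digit run in the text (a simplifying assumption)."""
    return re.findall(r"\d{4}", text or "")


def is_dlc(record: Record) -> bool:
    """True if 040 $a (original cataloging agency) is DLC."""
    return subfield(record.get("040", ""), "a") == "DLC"


def dates_match(record: Record) -> bool:
    """Check that 008 Date 1, the 050 $b year, and the 260 $c year agree.

    Returns True (check not applicable) for conference publications,
    i.e. when 008/29 is "1" in a books-format 008.
    """
    fixed = record.get("008", "")
    if fixed[29:30] == "1":
        return True
    date1 = fixed[7:11]
    # The date usually comes last in 050 $b, so take the last four-digit run.
    call_years = years(subfield(record.get("050", ""), "b"))
    pub_years = years(subfield(record.get("260", ""), "c"))
    if not call_years or not pub_years:
        return False  # missing data: bump to a person
    return date1 == call_years[-1] == pub_years[0]
```

Newer RDA records often carry the publication date in 264 rather than 260, so a fuller check would look at both fields.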

Given that the criteria and the decision making flowchart are fairly standard and straightforward, this workflow was built with automation in mind. My predecessor used Macro Express (ME) for the first version of the receipt cataloging macros. When we got to the point where we were bumping up against ME’s limits, I migrated the macros to AutoIt, where I was able to include many more quality control checks on the bibliographic and item records.

Below is a screencast where I walk through the receipt cataloging process. Without my explanations, the whole process would have taken one minute and 10 seconds, plus a couple of seconds if the item was bumped to another team in the department. A manual check of every criterion took about five minutes per item. By comparison, the macros let the department process more items each day with better quality control.
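The numbers in that comparison lend themselves to a back-of-the-envelope calculation. This Python sketch uses the five-minute and one-minute-10-second figures from the post. The 100-items-per-day volume is a hypothetical input, not a figure from the department.

```python
MANUAL_SECONDS = 5 * 60  # manual check of every criterion (from the post)
MACRO_SECONDS = 70       # one minute and 10 seconds with the macros


def hours_saved(items: int) -> float:
    """Staff hours saved for a number of items (ignores bumped-item extras)."""
    return items * (MANUAL_SECONDS - MACRO_SECONDS) / 3600


def items_per_hour(seconds_per_item: int) -> float:
    """How many items fit in one hour at a given per-item time."""
    return 3600 / seconds_per_item


# Hypothetical daily volume, for illustration only.
print(round(hours_saved(100), 1))               # 6.4
print(items_per_hour(MANUAL_SECONDS))           # 12.0
print(round(items_per_hour(MACRO_SECONDS), 1))  # 51.4
```

Each item saves 230 seconds, which is why even modest daily volumes add up to hours of staff time.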

Bonus Example: Ordering from GOBI

Another workflow at my former place of work involved ordering monographs from GOBI. Unlike receipt cataloging, this workflow has a much more complex decision-making flowchart and more exceptions. While I could not automate it to the level of receipt cataloging, there were still patterns and routines that I could automate. Examples include searching the library catalog with information supplied by GOBI and determining which codes to enter in the 949 field in the OCLC record (for exporting into our database).
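One way to handle a workflow with many exceptions is to automate only the routine branch and hand everything else back to staff. This Python sketch is hypothetical throughout. The fund codes, locations, and 949 subfield letters are placeholders, not GOBI’s or any ILS’s actual codes.

```python
from typing import Dict, Optional

# Hypothetical fund-to-location rules; real fund codes, locations, and
# 949 subfield letters depend on the library's ILS load profile.
FUND_TO_LOCATION: Dict[str, str] = {"HIST": "main", "CHEM": "sci"}


def build_949(order: Dict[str, str]) -> Optional[str]:
    """Return a 949 line (MarcEdit mnemonic form) for a routine order.

    Returns None for anything outside the routine rules so staff can decide.
    """
    fund = order.get("fund", "")
    location = FUND_TO_LOCATION.get(fund)
    if location is None or order.get("format") != "print":
        return None  # exception path: not automated
    price = order.get("price", "")
    return f"=949  \\\\$a{location}$f{fund}$p{price}"
```

Keeping the exception path explicit (returning None) makes it easy to see which orders still need a person, without forcing every branch of the flowchart into the script.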

Below is a screencast that shows a part of the notification ordering automation script set.

Preview for Part 2

In this post, I covered some of the possibilities for item-level workflow automation. More and more Technical Services workflows, however, are shifting toward dealing with many items at once. In part 2, I will discuss some examples of batch process automation, along with several tools (including those mentioned in this post) that can help make life easier in Technical Services.