Note: This is part two of a two part series on workflow automation in Technical Services. Part one covered the what and process of workflow automation and an example of an item level workflow automation process. Part two will discuss batch level workflow automation and resources/tools for workflow automation.
Last time, we discussed the basics of workflow automation and some examples of item-level automation in cataloging and acquisitions workflows. Automating workflows on an item-to-item basis provides greater consistency and efficiency in daily tasks done by staff, allowing them to spend more time on more complex workflows and tasks that may not be so readily automated. Item level workflow automation can be a low barrier investment in creating a more efficient operation.
Then you have the electronic journals, ebooks, and databases. You have large record files that are tied to physical resources – for example, record downloads from WorldCat Cataloging Partners. And then there are all those records in the system – MARC, XML, whatnot – that have missing or incorrect information (the infamous “dirty data”). Why can’t we just stick with item-level processing for everything?
Item level automation or batch automation?
For item level automation, you have a very granular level of control over the process, dealing with items one at a time. If the items are very similar in nature or have only a couple differences in how each item will be processed, though, then going through each item individually probably doesn’t make a lot of sense. On the other hand, batch processing allows you to go through many items at once, which makes adding or maintaining resources a quicker job than going through item by item. You do give up a certain level of control over details with batch processing, however, which leaves you to decide where the “good enough” marker should go in terms of data quality.
Overall, you want to avoid sub-optimizing your workflow. Sub-optimization happens when a part of an organization focuses the success of its own area instead of the entire organization’s success . Going through each resource record individually might give you the greatest control over the record, but if you’re going through a file containing 10,000+ records individually, even with an item level automated workflow, the turnaround time for creating access for all those resources will be much higher than if the file was processed at once. However, with the right tools, you can deal with record batches with speed and a good level of control over the data.
MarcEdit is your friend
Many people have at least heard about MarcEdit, or have colleagues who have used it extensively. MarcEdit is a freely available program (for Windows) created by Terry Reese that works with MARC records in a variety of ways. You can add, delete, or modify fields in records, create MARC records from data in spreadsheets, crosswalk to and from the MARC format, split files, join files, generate call numbers, de-duplicate records – and that’s only part of what you can do with MarcEdit. Also, if you find yourself going through the same batch workflow for the same files on a regular basis, MarcEdit’s Script Wizard helps with automating routine batch processing workflows.
Example: Missing 041 1_ subfield h, or, this item is a translation, not in two languages!
Many of you may have moved your older library catalogs to a newer discovery layer; I’ve survived one move at my previous place of work and will probably have another move under my belt soon. One consequence of moving to a new discovery layer is that data previously ignored by the previous layer sticks out like a sore thumb in the new layer. This example is one of those dirty data discoveries: a particular MARC variable field incorrectly indicated that an item is in two or more languages instead of a translation. Not only you have unhappy library users who thought you had a copy of The Little Prince in both French and English, but this error exists in a few thousand records, finding yourself with a potentially resource intensive cleanup project.
If you can isolate and export those records in one (or a couple of) files from your database, then you can use MarcEdit to clean up the field in a relatively short time. Open the file in MarcEdit’s MarcEditor, and make your way to the “Edit Subfield” under the tools menu. Let’s say that there are a lot of records that have engfre in the 041 field and you want to change all the records with that entry at once. Replace the engfre field data with eng$hfre and you’ve taken care of all those records in one pass.
Since you probably have more than engfre in your file, you can use regular expressions in MarcEdit to change multiple fields at once regardless of language code. Using the Find/Replace tool, search for the 041 field subfield a, but this time add your regular expression and mark the “Use regular expression” box. The following expression is assuming that the 041 field has two language codes that are three letters in length, so you will have to do a little cleanup after running this replace command to catch the three or more language codes as well as two letter language codes. (h/t to zemkat for the regular expression!)
Libraries and modules and packages, oh my!
What if you’ve been learning some code, or are looking for an excuse to learn? You’re in luck! Some of the common programming languages have tools to deal with MARC data. Rolling your own batch automation scripts and applications allows you the most flexibility in working with other library data formats as well. However, if you haven’t programmed before, choose smaller projects to start. In addition, if the script or application doesn’t work, you’re your own tech support.
Example: Creating order records for patron driven acquisition (PDA) items triggered for purchase
Patron driven acquisition usually involves the ingestion of several hundred to thousands of records into the local database for items that are not technically owned by the library at that point in time. Depending on the PDA vendor one uses, the item is triggered for purchase after it reaches a use threshold (for example, 10 page views). The library will receive an invoice with these purchases, but we will still need to create order records in the system to show that these items have been bought. Considering that on a given week, the number of purchases can range from single digits to higher double digits, that’s a lot of order records to manually key in.
After dabbling with pymarc at code4lib 2010, I thought this would be a good project to learn more about pymarc and python overall. Here is an outline of the script actions:
- In the trigger report spreadsheet, extract the local control numbers for the items triggered for purchase.
- Execute a SQL query against the local database for our locally developed next generation catalog, matching the local control number and extracting the MARC records from database.
- In each MARC record:
- add a 590 and 790 field for donor/fund information
- add a 949 field containing bibliographic record overlay and the order record creation information for the system, including cost of the item extracted from the spreadsheet.
- change the 947 field data to indicate that the item has been purchased (for statistical reporting later on)
- Write the MARC records to a file for import into the ILS.
The output file is then uploaded into the ILS manually, which gives staff the chance to address any issues with the records that the system might have before import. Overall, the process from downloading the trigger report spreadsheet to uploading the record file into the ILS takes a few minutes, depending on the size of the file.
Which automation tools and resources to use?
There are a multitude of other automation tools and resources that cannot be fully covered in two blog posts. Your mileage may vary with these tools; you might find Macro Express to be a better fit for your organization than AutoIt, or you find that working with ruby-marc is easier for you than MarcEdit (resource links listed below). The best way to figure out what’s right for you is to play around with various tools and get a feel for them. More often than not, you’ll end up using multiple tools for different levels and types of workflow automation.
Don’t forget about the built-in tools in existing applications as well! Sometimes the best tools for the job are already there for you to take advantage of them.
For your convenience, here are the tools mentioned in the two blog posts, including a few others:
- AutoHotKey is a similar and popular automation application
- Macro Express
- Keyboard Express is a slimmed down version of ME by the same company
- Macro Scheduler
- The Working with MARC page on the Code4Lib wiki is a good place to start when looking for ways in dealing with MARC data in various languages: http://wiki.code4lib.org/index.php/Working_with_MaRC