This is How I Work (Bryan J. Brown)

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.

 

After being tagged by Eric Phetteplace, I was pleased to discover that I had been invited to take part in the “This is How I Work” series. I love seeing how other people view work and office life, so I’m happy to see this trend make it to the library world.

Name: Bryan J. Brown (@bryjbrown)

Location: Tallahassee, Florida, United States

Current Gig: Web Developer, Technology and Digital Scholarship, Florida State University Libraries

Current Mobile Device: Samsung Galaxy Note 3 w/ OtterBox Defender cover (just like Becky Yoose!). It’s too big to fit into my pants pocket comfortably, but I love it so much. I don’t really like tablets, so having a gigantic phone is a nice middle ground.

Current Computer: 15 inch MacBook Pro w/ 8GB of RAM. I’m a Linux person at heart, but when work offers you a free MBP you don’t turn it down. I also use a Thunderbolt monitor in my office for dual-screen action.

Current Tablet: 3rd gen. iPad, but I don’t use it much these days. I bought it for reading books, but I strongly prefer to read them on my phone or laptop instead. The iPad just feels huge and awkward to hold.

One word that best describes how you work: Structured. I do my best when I stay within the confines of a strict system and/or routine that I’ve created for myself; it helps me keep the chaos of the universe at bay.

What apps/software/tools can’t you live without?

Unixy stuff:

  • Bash: I’ve tried a few other shells (tcsh, zsh, fish), but none have inspired me to switch.
  • Vim: I use this for everything, even journal entries and grocery lists. I have *some* customizations, but it’s pretty much stock (except I love my snippets plugin).
  • tmux: Like GNU Screen, but better.
  • Vagrant: The idea of throwaway virtual machines has changed the way I approach development. I do all my work inside Vagrant machines now. When I eventually fudge things, I can just run ‘vagrant destroy’ and pretend it never happened!
  • Git: Another game changer. I shouldn’t have waited so long to learn about version control. Git has saved my bacon countless times.
  • Anaconda: I’m a Python fan, and I especially like Python 3 and the scientific packages. Most systems only have Python 2, and a lot of the scientific packages fail to build for obscure reasons. Anaconda takes care of all that nonsense and allows you to have the best, most current Python goodness on any platform. I find it very comforting to know that I can use my favorite language and packages everywhere no matter what.
  • Todo.txt-CLI: A command line interface to the Todo.txt system, which I am madly in love with. If you set it to save your list to Dropbox, you can manage it from other devices, too. My work life revolves around my to-do list which I mostly manage at my laptop with Todo.txt-CLI.

Other:

  • Dropbox: Keeping my stuff in order across machines is a godsend. All my most important files are kept in Dropbox so I can always get to them, and being able to put things in a public folder and share the URL is just awesome.
  • Google Drive: I prefer Dropbox for plain storage, but the ability to write documents/spreadsheets/drawings/surveys at will, store them in the cloud, share them with coworkers and have them write along with you is too cool. I can’t imagine working in a pre-Drive world.
  • Trello: I only recently discovered Trello, but now I use it for everything at work. It’s the best thing for keeping a group of people on track with a large project, and moving cards around is strangely satisfying. Also you can put rocket stickers on cards.
  • Quicksilver for Mac: I love keyboard shortcuts. A lot. Quicksilver is a Mac app for setting up keyboard shortcuts for everything. All my favorite apps have hotkeys now.
  • Todo.txt for Android: A nice mobile interface for the Todo.txt system. One of the few apps I’ve paid money for, but I don’t regret it.
  • Plain.txt for Android: This one is kind of hard to explain until you use it. It’s a mobile text editor for taking notes that get saved in Dropbox, which is useful in more ways than you can imagine. Plain.txt is my mobile interface to the treasure trove of notes I usually write in Vim on my laptop. I keep everything from meeting notes to recipes (as well as the previously mentioned grocery lists and journal entries) in it. Second only to Todo.txt in helping me stay sane.

What’s your workspace like?

My office is one of my favorite places. A door I can shut, a big whiteboard and lots of books and snacks. Who could ask for more? I’m trying out the whole “standing desk” thing, and slowly getting used to it (but it *does* take some getting used to). My desk is multi-level (it came from a media lab that no longer exists where it held all kinds of video editing equipment), so I have my laptop on a stand and my second monitor on the level above it so that I can comfortably look slightly down to see the laptop or slightly up to see the big display.


What’s your best time-saving trick?

Break big, scary, complicated tasks into smaller ones that are easy to do. It makes it easier to get started and stay on track, which almost always results in getting the big scary thing done way faster than you thought you would.

What’s your favorite to-do list manager?

I am religious about my use of Todo.txt, whether from the command line or with my phone. It’s my mental anchor, and I am obsessive about keeping it clean and not letting things linger for too long. I prioritize things as A (get done today), B (get done this week), C (get done soon), and D (no deadline).

I’m getting into Scrum lately, so my current workflow is to make a list of everything I want to finish this week (my sprint) and mark them as B priority (my sprint backlog, either moving C tasks to B or adding new ones in manually). Then, each morning I pick out the things from the B list that I want to get done today and I move them to A. If some of the A things are complicated I break them into smaller chunks. I then race myself to see if I can get them all done before the end of the day. It turns boring day-to-day stuff into a game, and if I win I let myself have a big bowl of ice cream.
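A note for readers who haven’t used Todo.txt: priorities live in the plain-text file itself, as an uppercase letter in parentheses at the start of a task line, for example "(B) Write blog post". The weekly and daily promotion described above can be sketched in Python roughly like this. It only illustrates the file format, not how Todo.txt-CLI works internally (the CLI has its own commands for reprioritizing), and the function names and sample tasks are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set

# In todo.txt, an incomplete task's priority is an uppercase letter in
# parentheses followed by a space at the very start of the line: "(B) Task".
PRIORITY = re.compile(r"^\(([A-Z])\) ")


def priority(line: str) -> Optional[str]:
    """Return the priority letter of a todo.txt line, or None if unprioritized."""
    match = PRIORITY.match(line)
    return match.group(1) if match else None


def promote(lines: Iterable[str], picks: Set[str], src: str, dst: str) -> List[str]:
    """Move the picked tasks currently at priority `src` to priority `dst`.

    `picks` holds task descriptions without the "(X) " prefix.
    Every other line is returned unchanged.
    """
    result = []
    for line in lines:
        body = PRIORITY.sub("", line, count=1)
        if priority(line) == src and body in picks:
            line = f"({dst}) {body}"
        result.append(line)
    return result


# Hypothetical example tasks.
todo = [
    "(B) Review metadata patch",
    "(B) Write blog post",
    "(C) Update Drupal module",
    "Call IT about the scanner",
]

# Start of the week (sprint): pull a C task into the sprint backlog (B).
week = promote(todo, {"Update Drupal module"}, "C", "B")
# Each morning: pick today's tasks from the B list and move them to A.
today = promote(week, {"Write blog post"}, "B", "A")
for task in today:
    print(task)
```

Because the priority is just text at the front of the line, the same file stays readable in any editor or phone app that syncs it.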

Besides your phone and computer, what gadget can’t you live without?

Probably a nice, comfy pair of over-the-ear headphones. I hate earbuds; they sound thin and let in all the noise around you. I need something that totally covers my ears to block the outside world and put me in a sonic vacuum.

What everyday thing are you better at than everyone else?

I guess I’m pretty good at the whole “Inbox Zero” thing. I check my email once in the morning and delete/reply/move everything accordingly until there’s nothing left, which usually takes around 15 minutes. Once you get into the habit it’s easy to stay on top.

What are you currently reading?

  • The Information by James Gleick. I’m reading it for Club Bibli/o, a library technology book club. We just started, so you can still join if you like!
  • Pro Drupal 7 Development by Todd Tomlinson and John K. VanDyk. FSU Libraries is a Drupal shop, so this is my bread and butter. Or at least it will be once I get over the insane learning curve.
  • Buddhism Plain and Simple by Steve Hagen. The name says it all: Steve Hagen is great at presenting the core parts of Buddhism that actually help you deal with things, without all the one-hand-clapping nonsense.

What do you listen to while you work?

Classic ambient artists like Brian Eno and Harold Budd are great when I’m in a peaceful, relaxed place, and I’ll listen to classical/jazz if I’m feeling creative. Most of the time though it’s metal, which is great for decimating to-do lists. If I really need to focus on something, any kind of music can be distracting so I just play static from simplynoise.com. This blocks all the sound outside my office and puts me in the zone.

Are you more of an introvert or an extrovert?

Introvert for sure. I can be sociable when I need to, but my office is my sanctuary. I really value having a place where I can shut the door and recharge my social batteries.

What’s your sleep routine like?

I’ve been an early bird by necessity since grad school; the morning is the best time to get things done. I usually wake up around 4:30am so I can hit the gym when it opens at 5am (I love having the whole place to myself). I start getting tired around 8pm, so I’m usually fast asleep by 10pm.

Fill in the blank: I’d love to see _________ answer these same questions.

Richard Stallman. I bet he’d have some fun answers.

What’s the best advice you’ve ever received?

Do your best. As simple as it sounds, it’s a surprisingly powerful statement. Obviously you can’t do *better* than your best, and if you try your best and fail then there’s nothing to regret. If you just do the best job you can at any given moment you’ll have the best life you can. There are lots of philosophical loopholes buried in that perspective, but it’s worked for me so far.


How I Work (Margaret Heller)

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.

 

Margaret Heller, @margaret_heller

Location: Chicago, IL

Current Gig: Digital Services Librarian, Loyola University Chicago

Current Mobile Device: iPhone 5s. It took me years and years of thinking to finally buy a smartphone, and I did it mainly because my iPod Touch and slightly smart phone were both dying, so it could replace both.

Current Computer:

Work: Standard issue Dell running Windows 7, with two monitors.

Home: Home built running Windows 7, in need of an upgrade that I will get around to someday.

Current Tablet: iPad 3, which I use constantly. One useful tip: I have the Adobe Connect, GoToMeeting, Google Hangouts, and Lync apps, which really help with participating in video calls and webinars from anywhere.

One word that best describes how you work: Tenaciously

What apps/software/tools can’t you live without?

Outlook and Lync are my main methods of communicating with other library staff. I love working at a place where IMing people is the norm. I use these both on desktop and on my phone and tablet. I love that a recent upgrade means that we can listen to voice mails in our email.

Firefox is my normal work web browser. I tend to use Chrome at home. The main reason for the difference is synced bookmarks. I have moved my bookmarks between browsers so many times that I have some of the original sites I bookmarked when I first used Netscape in the late 90s. Needless to say, very few of the sites still exist, but it reminds me of old hobbies and interests. I also don’t need the login to stream shows from my DVR in my bookmark toolbar at work.

Evernote I use for taking meeting notes, conference notes, recipes, etc. I usually have it open all day at work.

Notepad++ is where I do most of my code writing.

OpenRefine is my favored tool for bulk editing metadata, closely aligned with Excel since I need Excel to get data into our institutional repository.

Filezilla is my favored FTP client.

WriteMonkey is the distraction free writing environment I use on my desktop computer (and how I am writing this post). I use Editorial on my iPad.

Spotify and iTunes for music and podcasts.

RescueTime for staying on track with work–I get an email every Sunday night so I can swear off social media for the next week. (It lasts about a day).

FocusBooster makes a great Pomodoro timer.

Zotero is my constant lifesaver when I can’t remember how to cite something, and the only way I stay on track with writing posts for ACRL TechConnect.

Feedly is my RSS reader, and most of the time I stay on top of it.

Instapaper is key to actually reading rather than skimming articles, though of course I am always behind on it.

Box (and Box Sync) is our institutional cloud file storage service, and I use it extensively for all my collaborative projects.

Asana is how we keep track of ongoing projects in the department, and I use it for prioritizing personal projects as well.

What’s your workspace like?: A large room in the basement with two people full time, and assorted student workers working on the scanner. We have pieces of computers sitting around, though we moved out an old server rack that was taking up space. (Servers are no longer located in the library but in the campus data centers.) My favorite feature is the whiteboard wall behind my desk, which provides enough space to sketch out ideas in progress.

I have a few personal items in the office: a tea towel from the Bodleian Library in Oxford, a reproduction of an antique map of France, Belgium, & Holland, a photo of a fiddlehead fern opening, and small stone frogs to rearrange while I am talking on the phone. I also have a photo of my baby looking at a book, though he’s so much bigger now that I need to add more recent photos of him. My desk has an in tray, an out tray, and a book-cart-shaped business card holder I got at a long-ago ALA conference. I am a big proponent of a clean desk. The later in the semester it gets, the more likely I am to have extra papers lying around, but an empty desk is important to my focus.

There’s usually a lot going on in here and no natural light, so I go outside to work in the summer, or sometimes just to another floor in the building to enjoy the lake view and think through problems.

What’s your best time-saving trick?: Document and schedule routine tasks so I don’t forget steps or when to take care of them. I also have a lot of rules and shortcuts set up in my email so I can process email very quickly and not work out of my inbox. Learn the keyboard shortcuts! I can mainly get through Gmail without touching the mouse and it’s great.

What’s your favorite to-do list manager?: Remember the Milk is how I manage tasks. I’ve been using it for years for Getting Things Done. I pay for it, and so currently have access to the new version which is amazing, but I am sworn to secrecy about its appearance or features. I have a Google Doc presentation I use for Getting Things Done weekly reviews, but just started using an Asana project to track all my ongoing projects in one place without overwhelming Remember the Milk or the Google Doc. It tells me I currently have 74 projects. A few more have come in that I haven’t added yet either.

Besides your phone and computer, what gadget can’t you live without?: For a few more weeks, my breast pump, which I am not crazy about, but it makes the hard choices of parenting a little bit easier. I used to not be able to live without my Nook until I cut my commute from an hour on the train to a 20 minute walk, so now I need earbuds for the walk. I am partial to Pilot G2 pens, which I use all the time for writing ideas on scrap paper.

What everyday thing are you better at than everyone else?: Keeping my senses of humor and perspective available for problem solving.

What are you currently reading?: How to Be a Victorian by Ruth Goodman (among other things). So far I have learned how Victorians washed themselves, and it makes me grateful for central heating.

What do you listen to while you work?: Podcasts (Roderick on the Line is required listening), mainly when I am doing work that doesn’t require a lot of focus. I listen mostly to full albums on Spotify (I have a paid account), though occasionally will try a playlist if I can’t decide what to listen to. But I much prefer complete albums, and try to stay on top of new releases as well as old favorites.

Are you more of an introvert or an extrovert?: A shy extrovert, though I think I should be an introvert based on the popular perception. I do genuinely like seeing other people, and get restless if I am alone for too long.

What’s your sleep routine like?: I try hard to get in bed at 9:30, but by 10 at the latest. Or ok, maybe 10:15. Awake at 6 or whenever the baby wakes up. (He mostly sleeps through the night, but sometimes I am up with him at 4 until he falls asleep again). I do love sleeping though, so chances to sleep in are always welcome.

Fill in the blank: I’d love to see _________ answer these same questions. Occasional guest author Andromeda Yelton.

What’s the best advice you’ve ever received?: You are only asked to be yourself. Figure out how you can best help the world, and work towards that rather than comparing yourself to others. People can adjust to nearly any circumstance, so don’t be afraid to try new things.


This Is How I Work (Nadaleen Tempelman-Kluit)

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.

 

Nadaleen Tempelman-Kluit @nadaleen

Location: New York, NY

Current Gig: Head, User Experience (UX), New York University Libraries

Current Mobile Device: iPhone 6

Current Computer:

Work: 13-inch MacBook Pro and Apple 27-inch Thunderbolt Display

Old Dell PC that I use solely to print and to access our networked resources

Home:

I carry my laptop to and from work with me and have an old MacBook Pro at home.

Current Tablet: First generation iPad, supplied by work

One word that best describes how you work: has anyone said frenetic yet?

What apps/software/tools can’t you live without?

Communication / Workflow

Slack is the UX Dept. communication tool in which all our communication takes place, including instant messaging, etc. We create topic channels in which we add links and tools and thoughts, and get notified when people add items. We rarely use email for internal communication.

Boomerang for Gmail – I write a lot of emails early in the morning, so I can schedule them to be sent at different times of the day without forgetting.

Pivotal Tracker – a user-story-based project planning tool built around agile software development methods. We start with user flows, turn them into bite-size user stories in Pivotal, and then point them for development.

Google Drive

Gmail

Google Hangouts – We work closely with our Abu Dhabi and Shanghai campus libraries, so we do a lot of early morning and late night meetings using Google Hangouts (or GoToMeeting, below) to include everyone.

Wireframing, IA, Mockups

Sketch: A great lightweight design app

OmniGraffle: A more heavy-duty tool for wireframing, IA work, mockups, etc. Compatible with a ton of stencil libraries, including the great Konigi stencils and Google’s Material Design icons. Great for interactive interface demos, and for user flows and personas.

Adobe Creative Cloud

Post It notes, Graph paper, White Board, Dry-Erase markers, Sharpies, Flip boards

Tools for User Centered Testing / Methods 

GoToMeeting – to broadcast formal usability testing to observers in another room, so they can take notes, view the testing in real time, and send virtual follow-up questions for the facilitator to ask participants.

Crazy Egg – a heat-mapping, hot-spotting, and A/B testing tool that, when coupled with analytics, really helps us get a picture of where users are going on our site.

Silverback – a screen-capturing usability testing app.

PostitPlus – We do a lot of affinity grouping exercises and interface sketches using Post-it notes, so this app is super cool and handy.

OptimalSort – online card sorting software.

Personas – to think through our user flows for a process, service, or interface. We then use these personas to create more granular user stories in Pivotal Tracker (above).

What’s your workspace like?

I’m on the mezzanine of Bobst Library which is right across from Washington Square Park. I have a pretty big office with a window overlooking the walkway between Bobst and the Stern School of Business.

I have a huge old subway map on one wall with an original heavy wood frame, and everyone likes looking at old subway lines, etc. I also have a map sheet of the mountain I’m named after. Otherwise, it’s all white board and I’ve added our personas to the wall as well so I can think through user stories by quickly scanning and selecting a relevant persona.

I’m in an area where many of my colleagues’ mailboxes are, so people stop by a lot. I close my door when I need to concentrate, and on Fridays we try to work collaboratively in a basement conference room with a huge whiteboard.

I have a heavy wooden L shaped desk which I am trying to replace with a standing desk.

Every morning I go to Oren’s, a great coffee shop nearby, with the same colleague and friend, and we usually do “loops” around Washington Square Park to problem solve and give work advice. It’s a great way to start the day.

What’s your best time-saving trick?

Informal (but not happenstance) communication saves so much time in the long run and helps alleviate potential issues that can arise when people aren’t communicating. Though it takes a few minutes, I try to touch base with people regularly.

What’s your favorite to-do list manager?

My whiteboard, supplemented by Stickies (Mac) and my huge flip chart notepad with my wish list on it. Completed items get transferred to a “leaderboard.”

Besides your phone and computer, what gadget can’t you live without?

Headphones

What everyday thing are you better at than everyone else?

I don’t think I do things better than other people, but my everyday strengths include: encouraging and mentoring, thinking up ideas and potential solutions, getting excited about other people’s ideas, approaching issues creatively, and dusting myself off.

What are you currently reading?

I listen to audiobooks and podcasts on my bike commute. Among my favorites:

In print, I’m currently reading:

What do you listen to while at work?

Classical is the only type of music I can play while working and still be able to (mostly) concentrate. So I listen to the masters, like Bach, Mozart and Tchaikovsky.

When we work collaboratively on creative things that don’t require earnest concentration I defer to one of the team to pick the playlist. Otherwise, I’d always pick Josh Ritter.

Are you more of an introvert or an extrovert?

Mostly an introvert who fakes being an extrovert at work but as other authors have said (Eric, Nicholas) it’s very dependent on the situation and the company.

What’s your sleep routine like?

Early to bed, early to rise. I get up between 5 and 6 and go to bed around 10.

Fill in the blank: I’d love to see _________ answer these same questions.

@Morville (Peter Morville)

@leahbuley (Leah Buley)

What’s the best advice you’ve ever received?

Show up


This is How I Work (Lauren Magnuson)

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.

Lauren Magnuson, @lpmagnuson

Location: Los Angeles, CA

Current Gig:

Systems & Emerging Technologies Librarian, California State University Northridge (full-time)

Development Coordinator, Private Academic Library Network of Indiana (PALNI) Consortium (part-time, ~10 hrs/week)

Current Mobile Device: iPhone 4. I recently had a chance to upgrade from an old, slightly broken iPhone 4, so I got… another iPhone 4. I pretty much only use my phone for email and texting (and rarely, phone calls), so even an old iPhone is kind of overkill for me.

Current Computer:

  • Work:  work-supplied HP Z200 Desktop, Windows 7, dual monitors
  • Home (for my part-time gig): MacBook Air 11”

Current Tablet: iPad 2, work-issued, never used

One word that best describes how you work: relentlessly

What apps/software/tools can’t you live without?

  • Klok – This is time-tracking software that allows you to ‘clock-in’ when working on a project.  I use it primarily to track time spent working my part-time gig.  My part-time gig is hourly, so I need to track all the time I spend working that job.  Because I love the work I do for that job, I also need to make sure I work enough hours at my full-time job.  Klok allows me to track hours for both and generate Excel timesheets for billing.  I use the free version, but the pro version looks pretty cool as well.
  • Trello – I use this for the same reasons everyone else does: it’s wonderfully simple but does exactly what I need it to do. People often drop by my office to describe a problem to me, and unless I make a Trello card for it, the details of what needs to be done can get lost. I also publish my CSUN Trello board publicly and link it from my email signature.
  • Google Calendar - I stopped using Outlook for my primary job and throw everything into Google Calendar now.  I also dig Google Calendar’s new feature that integrates with Gmail so that hotel reservations and flights are automatically added to your Google Calendar.
  • MAMP/XAMPP – I used to only do development work on my Macbook Air with MAMP and Terminal, which meant I carted it around everywhere, resulting in a lot of wear and tear. I’ve stopped doing that and invested some time in setting up a development environment with XAMPP and code libraries on my Windows desktop. Obviously I then push everything to remote git repositories so that I can pull code from either machine to work on it whether I’m at home or at work.
  • Git (especially Git Shell, which comes with Git for Windows) – I was initially intimidated about learning git; it definitely takes some trial and error to get used to the commands and how fetching/pulling/forking/merging all work together. But I’m really glad I took the time to get comfortable with it. I use both GitHub (for code that actually works and is shared publicly) and BitBucket (for hacky stuff that doesn’t work yet and needs to be in a private repo).
  • Oxygen XML Editor – I don’t always work with XML/XSLT, but when I have to, Oxygen makes it (almost) enjoyable.
  • YouMail – This is a mobile app that, in the free version, sends you an email every time you have a voicemail or missed call on your phone. At work, my phone is usually buried in the nether regions of my bag, and I usually keep it on silent, so I probably won’t be answering my mobile at work. YouMail allows me to not worry where my phone is or if I’m missing any calls. (There is a Pro version that transcribes your voicemail that I do not pay for, but it seems like it might be cool if you need that kind of thing.)
  • Infinite Storm – It rarely rains in southern California.  Sometimes you just need some weather to get through the day.  This mobile app makes rain and thunder sounds.
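Klok’s own export format isn’t described here, so as a rough illustration only: the idea of clocking in and out per job and rolling those sessions up into a billable timesheet can be sketched in Python. The session data and the two-column CSV layout are made-up assumptions, not Klok’s actual output.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical clock-in/clock-out sessions: (job, start, end).
sessions = [
    ("PALNI", datetime(2014, 12, 1, 19, 0), datetime(2014, 12, 1, 21, 30)),
    ("CSUN", datetime(2014, 12, 2, 8, 0), datetime(2014, 12, 2, 12, 0)),
    ("PALNI", datetime(2014, 12, 3, 19, 0), datetime(2014, 12, 3, 20, 15)),
]


def hours_by_job(sessions):
    """Sum the elapsed time of each clocked session, in hours, per job."""
    totals = defaultdict(float)
    for job, start, end in sessions:
        totals[job] += (end - start).total_seconds() / 3600
    return dict(totals)


def timesheet_csv(totals):
    """Render per-job totals as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Job", "Hours"])
    for job in sorted(totals):
        writer.writerow([job, f"{totals[job]:.2f}"])
    return buffer.getvalue()


print(timesheet_csv(hours_by_job(sessions)))
```

Keeping every session tagged with its job is what makes it possible to check both sides at once: the hourly gig gets billed accurately, and the full-time job’s hours stay visible too.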

Physical:

  • Post It notes (though I’m trying to break this habit)
  • Basic Logitech headset for webinars / Google hangouts.  I definitely welcome suggestions for a headset that is more comfortable – the one I have weirdly crushes my ears.
  • A white board I use to track information literacy sessions that I teach

What’s your workspace like?

I’m on the fourth floor of the Oviatt Library at CSUN, which is a pretty awesome building. Fun fact: the library building was the shooting location for the Starfleet Academy scenes in J.J. Abrams’ 2009 Star Trek movie (but I guess it got destroyed by Romulans, because they have a different Academy in Into Darkness):

Oviatt Library as Starfleet Academy

My office has one of the very few windows available in the building, which I’m ambivalent about.  I truly prefer working in a cave-like environment with only the warm glow of my computer screen illuminating the space, but I also do enjoy the sunshine.

I have nothing on my walls and keep few personal effects in my office – I try to keep things as minimal as possible.  One thing I do have though is my TARDIS fridge, which I keep well-stocked with caffeinated beverages (yes, it does make the whoosh-whoosh sound, and I think it is actually bigger on the inside).


I am a fan of productivity desktop wallpapers – I’m using these right now, which help me see peripherally how much time has elapsed when I’m really in the zone.

When I work from home, I mostly work from my living room couch.

What’s your best time-saving trick? When I find something I don’t know how to do (like when I recently had to wrap my head around Fedora Commons content models, or learn Ruby on Rails for Hydra), I assign myself some ‘homework’ to read about it later rather than trying to learn the new thing during working hours. This helps me avoid getting lost in a black hole of Stack Overflow for several hours a day.

What’s your favorite to-do list manager? Trello

Besides your phone and computer, what gadget can’t you live without?

Mr. Coffee programmable coffee maker

What everyday thing are you better at than everyone else? Troubleshooting

What are you currently reading?  I listen to audiobooks I download from LAPL (Thanks, LAPL!), and I particularly like British mystery series.  To be honest, I kind of tune them out when I listen to them at work, but they keep the part of my brain that likes to be distracted occupied.

In print, I’m currently reading:

What do you listen to while at work?  Mostly EDM now, which is pretty motivating and helps me zone in on whatever I’m working on.  My favorite Spotify station is mostly Deadmau5.

Are you more of an introvert or an extrovert? Introvert

What’s your sleep routine like?  I love sleep.  It is my hobby.  Usually I sleep from around 11 PM to 7 AM; but my ideal would be sleeping between like 9 PM and 9 AM.  Obviously that would be impractical.

Fill in the blank: I’d love to see _________ answer these same questions.  David Walker @ the CSU Chancellor’s Office

What’s the best advice you’ve ever received? 

Do, or do not. There is no try.

Applies equally to using the Force and programming.


How I Work (Eric Phetteplace)

Editor’s Note: ACRL TechConnect blog will run a series of posts by our regular and guest authors about The Setup of our work. This is the second post of the series by one of our TechConnect authors Eric Phetteplace.

The whole Tech Connect crew is doing The Setup. Here’s mine.

Location

Oakland, California, United States

Current Gig

Systems Librarian; California College of the Arts

Current Mobile Device

I use an iPhone 5S, though mostly just for Instapaper, Tweetbot, and email (Mailbox). I’ve grown frustrated with iOS lately and I think my next phone will be either Android or, if I’m feeling experimental, Firefox OS.

Current Computer

Work:

  • 2011 13in Macbook Pro with 8GB RAM and a 2.7 GHz i7 running OS X 10.8 Mountain Lion. I know many people are forced to use Windows at work and I feel fortunate to be at a Mac school. The reduction in context shifts, even small ones like thinking about different keyboard shortcuts, is a serious productivity boon.

Home:

  • 2013 13in Macbook Air, no extra CPU or RAM, running OS X 10.10 Yosemite. I love Macbook Airs, though their price tag is significant. The solid state drive is fast even when running virtual machines, it’s light and I move around a lot, and OS X is a fine operating system with a nice UNIX core.
  • 2012 11.6in Asus X201E-DH01 Notebook running Ubuntu Server 14.04 Trusty Tahr. For side projects, practicing, storage space (320GB hard drive). It’s been a great little machine to me, surviving several different Linux distributions and serious buffoonery.

Current Tablet

While I technically have an iPad at work, I have yet to use it in any substantive manner. I’m a horrible tablet user. How do you open the terminal?

One word that best describes how you work

Frenetic

What apps/software/tools can’t you live without?

At any given moment, I always have three applications open: a web browser, a text editor, and a terminal emulator. Those are my bread and butter and I’m not even too picky about the particulars, but I far prefer applications which are powerful and highly customizable to ones which have smart defaults but little configurability. Thus my editor and browser are weighed down by dozens of add-ons, and my shell’s dotfiles are extensive.

Desktop Software:

  • Sublime Text 3 with a suite of plugins, the most essential of which are:
    • Emmet for handy CSS & HTML shorthand
    • Git so I can execute git commands from within the editor
    • GitGutter to show which lines in the current file have been added, changed, or deleted since the last commit
    • MarkdownEditing & Markdown Preview for better syntax highlighting & easy previewing of markdown files, which is what I use to write notes, blog posts, & documentation
    • SublimeLinter with linter plugins for the languages I regularly operate in (JavaScript, SASS, Python)
  • Atom may replace Sublime soon; I worry about the slowed pace of Sublime’s development as well as its cost
  • iTerm2 is my preferred terminal emulator
  • Alfred is an application launcher which I also use to store text snippets and do a few other things via plugins. I was a Quicksilver devotee for a long time but Alfred’s faster and simpler to set up. OS X Yosemite’s major Spotlight redesign makes that another choice in this arena.
  • 1Password saves my randomized passwords & makes it easy to log in securely to the hundreds of websites that require accounts
  • Chrome with another host of extensions, including:
    • AdBlock because the Internet is terrible without it
    • Context because I have too many extensions; it lets me group them into modes (web development, research, video, none) that I can switch between
    • Diigo Web Collector for saving web pages
    • Google Cast for Chromecasting to our TV
    • HTTPS Everywhere for security
    • JSONView to see pretty responses from JSON APIs
    • Stylish to customize the look of a couple sites
    • TweetDeck for a better Twitter experience
  • Other browsers: Chrome Canary, Firefox, Firefox Developer Edition, sometimes Safari. I like to try out new, experimental browsers too though I’m finding it hard to switch from Chrome to anything else.
  • Spotify plays music while I work

Command Line:

  • Fish is my default shell and I love it
  • BASH is everywhere, including our servers, so I use it, too
  • Homebrew manages software packages on my Macs
  • Git is good version control software
  • Ack searches through source code like no other
  • Z makes jumping around directories quick and easy
  • All the standard, unheralded UNIX tools, too numerous to name, are great and assist with text and file manipulation tasks

Web Services:

  • Trello is my preferred to-do app
  • GitHub is great for versioning documents, sharing code, & creating to-do lists in the context of particular projects
  • Last.fm records the music I listen to and recommends similar artists
  • Google Apps: Gmail, Drive, Calendar. They’re good applications and I use Takeout to assuage my fear of lock-in.

What’s your workspace like?

my work area

It’s important to have a standing desk. Mine is a VARIDESK PRO, though I’ve made do with stacks of reference books and cardboard boxes before. Sitting all day is awful, for both my health and energy level. While I will typically sit for a couple hours a day, I attempt to stand as much as possible.

Other than that, there’s not much to it. I don’t need a desk. I try not to collect papers. I like facing a window. Two monitors or two laptops help, since I’m often performing multiple tasks at once: reading documentation on one screen and coding/configuring on another, for instance.

I need coffee in my workspace. Or close by.

What’s your best time-saving trick?

Don’t be harried by emails or any notifications. The surest way to kill your time is to repeatedly switch contexts and spend time staring at settings, open tabs, code, etc. that you’ve forgotten the purpose of. Disable all but the essential notifications on your work computer and your phone.

Also, avoid meetings.

What’s your favorite to-do list manager?

I used Remember the Milk extensively at my last position, but few of its features were useful: I filled in all the tags, labels, notes, etc. out of devotion to metadata more than utility. Their only use came at the end of the year, when I would run some self-analysis on how I spent my time.

I like the flexibility of Trello. It’s both easy to quickly review items and to attach different types of information to them, from checklists to files. I have a few Trello boards for different areas of responsibility at work. For to-dos and bugs on code projects, I try to be good about documenting everything in GitHub, though I could improve. In my personal life, I have a few sparse sets of Reminders in Apple’s paired iOS and OS X apps. On top of all this, I find it useful to have a sticky note (either in OS X’s Dashboard or an honest-to-spaghetti dead-tree sticky note) of the day’s primary objectives.

It should be apparent that I have too many disparate to-do lists. I’ve actually migrated some to-do lists three times since starting my current position in June. I need to consolidate further, but there is value to putting personal and work to-do lists in separate places. My favorite to-do list software is whatever other people on my team are using. During my career I’ve worked very independently on very small teams so it has not been vital to share items, but I’ll take a clunky app that puts everyone on the same page any day.

Besides your phone and computer, what gadget can’t you live without?

Does coffee count? I don’t need much else.

What everyday thing are you better at than everyone else?

I initially wrote “nothing” as an answer. It’s strange how many of these questions elicited negative responses. But upon further reflection, I came up with a couple things I’m good at. I wouldn’t presume to say better than everyone else, however.

I let data or others inform my priorities. While I am not unopinionated (understatement), I prioritize according to my supervisor’s needs or what our data indicates is important. I’m quite willing to humble myself before analytics, user studies, or organizational goals.

Also, recognizing opportunities for abstraction or automation. I’m good at seeing the commonality amongst a set of tasks or items and creating an abstraction to simplify interactions.

What are you currently reading?

I read two books at once, one creative and one analytical. Currently it’s mostly analytical books, though.

  • Pataphysics: A Useless Guide
  • Ambient Findability
  • some JavaScript books for a book chapter I thought I was going to write: JavaScript: The Good Parts, Standard ECMA-262 Edition 5.1, JavaScript: The Definitive Guide

What do you listen to while you work?

  • IDM: Aphex Twin, Prefuse 73, Squarepusher, & similar
  • Dubstep: Burial, Clubroot, Distance, & similar
  • Black Metal: Krallice, Liturgy, Wolves in the Throne Room, & similar
  • whatever was released in the last couple of weeks; I listen to new music frequently just to see if there’s anything new I like

I like intense music without lyrics at work. It pumps me up without distracting from reading/writing tasks where I’m already absorbing language visually.

By request, here’s an example Spotify mix.

Are you more of an introvert or an extrovert?

Not to be tricky, but I dislike this dichotomy and every time I try to apply it to someone I end up misjudging their character, badly. Whether someone is outgoing is often contextual (see, for instance, Nicholas Schiller’s answer). While there are certainly shy people and social people, many oscillate in between. I like alone time. I can go without speaking to other people for days and be content. On the other hand, in a room of fun people I admire I want to talk endlessly.

What’s your sleep routine like?

My greatest weakness. I am not attuned to the regular 9-to-5 schedule. I like to stay up late and sleep in. In practice, this means I go to bed at midnight or later and wake up at 7 to get to work at 9. Sometimes post-work naps are required. I don’t get enough sleep. It wears on me.

Fill in the blank: I’d love to see _________ answer these same questions.

Besides my fellow Tech Connect authors, I’d be curious what Bryan J. Brown uses.

What’s the best advice you’ve ever received?

Tom Haverford and Donna Meagle once said “Treat. Yo. Self.” and they were right. Life is stressful. Sometimes a cupcake and a massage aren’t niceties, they’re necessary.


This Is How I (Attempt To) Work

Editor’s Note: ACRL TechConnect blog will run a series of posts by our regular and guest authors about The Setup of our work. The first post is by TechConnect alum Becky Yoose.

Ever wondered how several of your beloved TechConnect authors and alumni manage to Get Stuff Done? In conjunction with The Setup, this is the first post in a series of TechConnect authors, past and present, to show off what tools, tips, and tricks they use for work.

I have been tagged by @nnschiller in his “This is how I work” post. Normally, I just hide when these types of chain-letter events come along, but this time I’ll indulge everyone and dust off my blogging skills. I’m Becky Yoose, Discovery and Integrated Systems Librarian, and this is how I work.

Location: Grinnell, Iowa, United States

Current Gig: Assistant Professor, Discovery and Integrated Systems Librarian; Grinnell College

Current Mobile Device: Samsung Galaxy Note 3, outfitted with an OtterBox Defender cover. I still mourn the discontinuation of the Droid sliding keyboard models, but the oversized screen and stylus make up for the lack of tactile typing.

Current Computer:

Work: HP EliteBook 8460p (due to be replaced in 2015); boots Windows 7

Home: Betty, my first build; dual boots Windows 7 and Ubuntu 14.04 LTS

Eee PC 901, currently b0rked due to misjudgement on my part about appropriate Xubuntu distros.

Current Tablet: iPad 2, supplied by work.

One word that best describes how you work:

Panic!

Don’t panic. Nothing to see here. Move along.

What apps/software/tools can’t you live without?

Essential work computer software and tools, in no particular order:

  • Outlook – email and meetings make up the majority of my daily interactions with people at work and since campus is a Microsoft shop…
  • Notepad++ – my Swiss army knife for text-based duties: scripts, notes, and everything in between.
  • PuTTY - Great SSH/Telnet client for Windows.
  • MarcEdit – I work with library metadata, so MarcEdit is essential on any of my work machines.
  • MacroExpress and AutoIt – Two different Windows automation apps: MacroExpress handles simpler automation (opening programs, templating/constant data, simple workflows involving multiple programs) while AutoIt gives you more flexibility and control in the automation process, including programming local functions and more complex decision-making processes.
  • Rainmeter and Rainlander – These two provide customized desktop skins that give you direct or quicker access to specific system information, functions, or in Rainlander’s case, application data.
  • Pidgin – MPOW uses both LibraryH3lp and AIM for instant messaging services, and I use IRC to keep in touch with #libtechwomen and #code4lib channels. Being able to do all three in one app saves time and effort.
  • Jing – while the Snipping Tool in Windows 7 is great for taking screenshots for emails, Jing has proven to be useful for both basic screenshots and screencasts for troubleshooting systems issues with staff and library users. The ability to save screencasts on screencast.com is also valuable when working with vendors in troubleshooting problems.
  • CCleaner – Not only does it empty your recycling bin and temporary files/caches, the various features available in one spot (program lists, registry fixes, startup program lists, etc.) make CCleaner an efficient way to do housekeeping on my machines.
  • Janetter (modified code for custom display of Twitter lists) – Twitter is my main information source for the library and technology fields. One feature I use extensively is the List feature, and Janetter’s plugin-friendly setup allows me to highly customize not only the display but also what is displayed in the list feeds.
  • Firefox, including these plugins (not an exhaustive list):

For server apps, the main app (beyond PuTTY or vSphere) that I need is Nagios to monitor the library’s virtual Linux server farm. I also am partial to nano, vim, and apt.

As one of the very few tech people on staff, I need a reliable system to track and communicate technical issues with both library users and staff. Currently the Libraries is piggybacking on ITS’ ticketing system KBOX. Despite being fit into a somewhat inflexible existing structure, it has worked well for us, and since we don’t have to maintain the system, all the better!

Web services: The Old Reader, Gmail, Google Drive, Skype, Twitter. I still mourn the loss of Google Reader.

For physical items, my tea mug. And my hat.

What’s your workspace like?

Take a concrete box, place it in the dead center of the library, cut out a door in one side, place the door opening three feet from the elevator door, cool it to a consistent 63-65 degrees F., and you have my office. Spending 10+ hours a day during the week in this office means a bit of modding is in order:

  • Computer workstation set up: two HP LA2205wg 22 inch monitors (set to appropriate ergonomic distances on desk), laptop docking station, ergonomic keyboard/mouse stand, ergonomic chair. Key word is “ergonomic”. I can’t stress this enough with folks; I’ve seen friends develop RSIs on the job years ago and they still struggle with them today. Don’t go down that path if you can help it; it’s not pretty.
  • Light source: four lamps of varying size, all with GE Daylight 6500K 15 watt light bulbs. I can’t do the overhead lights due to headaches and migraines, so these lamps and bulbs help make an otherwise dark concrete box a little brighter.
  • Three cephalopods, a starfish, a duck, a moomin, and cats of various materials and sizes
  • Well stocked snack/emergency meal/tea corner to fuel said 10+ hour work days
  • Blankets, cardigans, shawls, and heating pads to deal with the cold

When I work at home during weekends, I end up in the kitchen with the laptop on the island, giving me the option to sit on the high chair or stand. Either way, I have a window to look at when I need a few seconds to think. (If my boss is reading this – I want my office window back.)

What’s your best time-saving trick?

Do it right the first time. If you can’t do it right the first time, then make the path to making it right as efficient and painless as you possibly can. Alternatively, build a time machine to prevent those disastrous metadata and systems decisions made in the past that you’re dealing with now.

What’s your favorite to-do list manager?

Post it notes on a wall

The Big Picture from 2012

I have tried online to-do list managers such as Trello; however, I have found that physical managers work best for me. In my office I have a to-do management system that comprises three types of lists:

  • The Big Picture List (2012 list pictured above): four big Post-it sheets on my wall, labeled by season, divided by months in each sheet. Smaller Post-it notes indicate which projects are going on in which months. This is a great way to get a quick visual of what needs to be completed, what can be delayed, etc.
  • The Medium Picture List – a mounted whiteboard on the wall in front of my desk. Here specific projects are listed with one to three action items that need to be completed within a certain time, usually within one to two months.
  • The Small Picture List – written on discarded Choice review cards, the perfect size to quickly jot down things that need to be done either today or in the next few days.

Besides your phone and computer, what gadget can’t you live without?

My wristwatch, set five minutes fast. I feel self-conscious if I go out of the house without it.

What everyday thing are you better at than everyone else?

I’d like to think that I’m pretty good with adhering to Inbox Zero.

What are you currently reading?

The Practice of System and Network Administration, 2nd edition. Part curiosity, part wanting to improve my sysadmin responsibilities, part wanting to be able to communicate better with my IT colleagues.

What do you listen to while you work?

It depends on what I am working on. I have various stations on Pandora One and a selection of iTunes playlists to choose from depending on the task at hand. The choices range from medieval chant (for long form writing) to thrash metal (XML troubleshooting).

Realistically, though, the sounds I hear most are email notifications, the operation of the elevator that is three feet from my door, and the occasional TMI conversation between students who think the hallway where my office and the elevator are located is deserted.

Are you more of an introvert or an extrovert?

An introvert blessed/cursed with her parents’ social skills.

What’s your sleep routine like?

I turn into a pumpkin at around 8:30 pm, sometimes earlier. I wake up around 4:30 am most days, though I do cheat and not get out of bed until around 5:15 am, checking email, news feeds, and looking at my calendar to prepare for the coming day.

Fill in the blank: I’d love to see _________ answer these same questions.

You. Also, my cats.

What’s the best advice you’ve ever received?

Not advice per se, but life experience. There are many things one learns when living on a farm, including responsibility, work ethic, and realistic optimism. You learn to integrate work and life since, on the farm, work is life. You work long hours, but you also have to rest whenever you can catch a moment. If nothing else, living on a farm teaches you that no matter how long you put off doing something, it has to be done. The earlier, the better, especially when it comes to shoveling manure.


Building a Data Dashboard

Dashboard web pages that display, at a glance, a wide range of information about a library’s operations are becoming common. These dashboards are made possible by the ubiquity of web-based information visualization tools combined with the ever increasing availability of data sources. Naturally, libraries want to take advantage of these tools to provide insight alongside some “wow” factor.

Ball State Libraries' dashboard

Ball State University Libraries’ dashboard shows many high-level figures about a variety of services. One can learn more about a particular figure by clicking the “+” or “i” icons beside it.

TADL dashboard

The Traverse Area District Library dashboard has a few sections with different types of charts. Shown here are four line graphs displaying changes in various usage statistics over time.

I’ve assembled a list of several data dashboards; if you know of one that isn’t on the list, let me know in the comments so I can add it.

It’s difficult to set about piecing together a display from the information immediately available. The design and data collection process largely determines the success of the dashboard. I don’t pretend to be an expert on these processes; rather, my own attempt at building a dashboard was a productive failure that helped to highlight where I went wrong.

Why do we want to build a dashboard?

As with any project, the starting point for building a dashboard is to identify what we’re trying to accomplish. Having a guiding idea will determine everything that follows, from the data we collect to how we choose to display them.

There are two primary goals for dashboards: marketing and success. One seeks to advertise the excellence of the library—perhaps to secure further funding, perhaps to raise its profile on campus—while the other aims at improved daily operations, however that may be defined. These are two terrifically broad categories but they create a useful distinction when building a dashboard.

Perhaps our goal is “to make a flashy dashboard, one that’ll make spectators rubberneck as they navigate the information superhighway.”

I’m not even being sarcastic. It’s easy to dismiss designs that are flashy to the detriment of their content. The web is rife with visualizations that leave their audiences with no improved understanding of the issue at hand. But there is also value to shiny sites that show the library can create an intriguing product. Furthermore, experimental visualizations that explore the possibilities of design can be a wonderful starting point. What appeared to be superficial glitz may unearth an important pattern. Several failed, bombastic charts may ultimately lead to a balanced display.

In general, flashy displays fall into the marketing category. They’re eye candy for the library, whether through the enormity of the figures themselves (“look how many books we circulated last year! articles downloaded from our databases! reference queries we answered!”) or loud design.

Increasing the success of the library is a very different goal. What defines “success” is more subjective than marketing and as such the types of data included will vary greatly. Assessing services calls for timelines, cross-sections of data, and charts that render divergent data representing different aspects of a library comparable. Unlike marketing figures, the charts need not be easily interpreted. Simplification may unintentionally distort what’s going on, obscuring important but subtle trends. To quote from an In the Library with the Lead Pipe post from 2009:

Libraries collect a lot of data that encompass complex networks about how users navigate through online resources, which subjects circulate the most or the least, which resources are requested via interlibrary loan, visitation patterns over periods of time, reference queries, and usage statistics of online journals and databases. Making sense of these complex networks of use and need isn’t easy. But the relationships between use and need patterns can help libraries make hard decisions (say, about which journals to cut) and creative decisions to improve user experiences, outreach, achieve efficiencies, and enhance alignment with organizational goals.
— Hilary Davis, “Not Just Another Pretty Picture” 1
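One common way to make divergent figures comparable (offered here as an illustration, not a method from the sources above) is to rescale each series to an index where its first period equals 100. The JavaScript sketch below shows that arithmetic; the function name and the monthly figures are hypothetical.

```javascript
// Rescale a series so its first value becomes 100. Series with very
// different magnitudes (gate counts vs. reference questions, say) can
// then share one axis. Assumes the first value is a non-zero number.
function indexToBaseline(series) {
  const base = series[0];
  if (typeof base !== "number" || base === 0) {
    throw new Error("baseline value must be a non-zero number");
  }
  // Multiplying before dividing keeps evenly divisible results exact.
  return series.map((value) => (value * 100) / base);
}

// Hypothetical monthly figures, for illustration only.
const gateIndex = indexToBaseline([12000, 13200, 10800]); // [100, 110, 90]
const referenceIndex = indexToBaseline([400, 360, 480]);   // [100, 90, 120]
console.log(gateIndex, referenceIndex);
```

Indexed this way, a 10% rise in gate counts and a 10% rise in reference questions plot at the same height even though the raw counts differ greatly. The trade-off is that absolute volume disappears, which is one reason such charts suit staff more than external audiences.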

If an institution or department has a more narrow goal in mind than marketing or assessment, chances are it still relates to one of the two. It’s also possible to build a dashboard that accomplishes both goals, providing insight for staff while thrilling external stakeholders. However, that’s no easy feat to accomplish. What external parties find surprising may be pedestrian to staff, and vice versa. Libraries provide simple, Google-esque search fields alongside triplicate advanced search forms for the same reason. Users shouldn’t need to know SQL to find a title; staff shouldn’t be limited to keyword searches for their administrative tasks. Appealing to such varied sensibilities at once is an onerous task best avoided if possible.

Choosing Our Data Sources

The ultimate goal of a data dashboard will greatly affect what data sources are sensible. Reviewing the dashboards of other libraries, a few data sources are repeated across them:

  • Checkouts and renewals
  • Print volumes and other material holdings
  • Inter-library loan
  • Gate counts
  • Computer use
  • Reference questions

These data are commonly available, but don’t capture the unique value that libraries provide. We can use them as a starting point but should keep in mind that our dashboard’s purpose determines what information is pertinent. Simply because a figure is readily available within our ILS isn’t a sufficient reason to include it. On the other hand, we may find our display would benefit greatly from a figure which isn’t available, thus influencing how we collect and analyze data going forward.

Marketing displays must be persuasive. The information in them should be positive, surprising, and understandable to a broad audience.

Total circulation, article/ebook downloads, and new items digitized all might be starting points. The breadth of a library’s offerings is often surprising to external parties, too. Show circulation across many subject areas and attempt to outline the full impact of the library’s many services. For instance, if our library has an active social media presence we could show various accounts and the measures of popularity for each (followers and favorites, translated across the schemas of the specific networks).
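Translating popularity measures across networks amounts to mapping each network’s fields onto one shared schema. A minimal JavaScript sketch follows; the field names (followers_count, page_likes, and so on) and the figures are placeholders, not the actual fields any network’s API returns.

```javascript
// Hypothetical raw statistics: each network reports popularity under its
// own field names (illustrative names, not real API fields).
const rawStats = [
  { network: "twitter", followers_count: 1500, favourites: 320 },
  { network: "facebook", page_likes: 900, post_likes: 410 },
];

// For each network, which of its fields map onto the shared schema.
const fieldMaps = {
  twitter: { audience: "followers_count", engagement: "favourites" },
  facebook: { audience: "page_likes", engagement: "post_likes" },
};

// Convert every entry to { network, audience, engagement }.
function normalizeStats(stats, maps) {
  return stats.map((entry) => {
    const map = maps[entry.network];
    if (!map) {
      throw new Error(`no field map for ${entry.network}`);
    }
    return {
      network: entry.network,
      audience: entry[map.audience],
      engagement: entry[map.engagement],
    };
  });
}

const normalized = normalizeStats(rawStats, fieldMaps);
console.log(normalized);
```

Keeping the mapping in data rather than in code means adding another network is one more entry in fieldMaps.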

Simply pilfering all the dashboard’s data from typical year-end reports submitted to organizations like the National Center for Education Statistics, ALA, or our accrediting bodies is a terrible idea for marketing displays. These numbers matter mostly for the long-term, cross-institutional trends they make visible. Our students, faculty, or funders probably do not care about the exact quantity of our microfilm holdings. Since our dashboard isn’t held to any reporting standards, this is our chance to show what’s new, what those statistical warehouse organizations do not yet collect.

Marketing displays should focus on aggregate numbers that look impressive in isolation and do not require much context to apprehend. Different categories need not be comparable or normalized; each figure can have its own type of chart and style as long as the design is visually appealing. Marketing isn’t meant to generate insight, so highlighting patterns across different data sources isn’t necessary. All of the screenshot examples at the beginning of this post represent marketing-oriented dashboards because of the impressive nature of the figures presented as well as their relative lack of informative depth.

I’ll be honest, presenting data for marketing purposes makes me feel conflicted. It’s easy to skirt the truth since one’s incentives lie in producing ever more stunning figures, not in representing reality. If there is serious dysfunction within a library, marketing displays won’t disclose it. Nonetheless, there’s value to these efforts. I wouldn’t be in this profession if I didn’t believe that libraries produce an astonishing amount of value for their host institutions, through a wide range of services. Demonstrating that value is vital.

A dashboard meant to improve library success should answer pressing questions about collections and services. A library might investigate a different area each year, so its data sources could change periodically: one year we aim at improving information literacy instruction, another year at increasing access to electronic resources. The dashboard can differ from year to year if it’s meant to aid a focused study. Trying to represent everything a library does at once leads to clutter, information overload, and time spent culling statistics no one cares about.

For a nice end-to-end discussion of building a data warehouse and hooking it up to dashboard charts, see “Trends at a Glance: A Management Dashboard of Library Statistics” (open access via ITAL) by Emily G. Morton-Owens and Karen L. Hanson. They cover in depth the creation of a dashboard meant to give an overview of library operations from a broad variety of data sources, including external ones like Google Analytics as well as a local inter-library loan database.

GVSU Libraries status

A dashboard shows that a few minor issues affect various systems at GVSU.

While it’s not displaying the sorts of data this post discusses, the Grand Valley State University Libraries’ Status page is an example of a good dashboard that aims at improving service. Issues affecting several areas are brought together on one page, making it possible to gain a quick overview of the library’s operations in seconds. Just because GVSU Libraries Status is not a marketing display doesn’t mean design doesn’t matter; bright, easily scannable colors make information more digestible. The familiar red-for-error, green-for-success color scheme is well applied.

There are some attributes that influence data choices regardless of the goal of our dashboard. Data sources that are easy to access and reuse will naturally be better selections than cumbersome ones. For instance, while COUNTER usage statistics for vendor databases can be both informative and persuasive, they’re often painful to collect. One must log into a number of separate administrative sites, download a CSV (possibly multiple CSVs if we need information from separate reports), trim useless rows, and combine into one master file. This process is difficult to automate and the SUSHI standard meant to address this labor has no viable software implementations. 2
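Even without SUSHI, the trimming and combining steps can be scripted once the CSVs are downloaded. The JavaScript sketch below rests on hypothetical assumptions: every export has some preamble rows before a header row whose first cell is a known label such as "Database", all exports share the same columns, summary rows start with "Total", and no field contains a quoted comma. Real reports vary, and a proper CSV parser would be safer.

```javascript
// Merge several downloaded CSV reports into one master table, adding a
// Vendor column. Assumes naive comma-separated fields (no quoted commas).
function mergeReports(exports, headerLabel) {
  let header = null;
  const rows = [];
  for (const { vendor, text } of exports) {
    const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
    // Skip preamble rows until the header row is found.
    const start = lines.findIndex(
      (line) => line.split(",")[0].trim() === headerLabel
    );
    if (start === -1) {
      throw new Error(`no header row found for ${vendor}`);
    }
    if (header === null) {
      header = ["Vendor", ...lines[start].split(",")];
    }
    for (const line of lines.slice(start + 1)) {
      const cells = line.split(",");
      // Drop summary rows; they can be recomputed from the merged data.
      if (cells[0].trim().startsWith("Total")) continue;
      rows.push([vendor, ...cells]);
    }
  }
  return [header, ...rows].map((cells) => cells.join(",")).join("\n");
}

const master = mergeReports(
  [
    { vendor: "VendorA", text: "Report Title,DB1\nPeriod,2014\nDatabase,Searches\nDB One,120\nTotal,120\n" },
    { vendor: "VendorB", text: "Database,Searches\nDB Two,75\n" },
  ],
  "Database"
);
console.log(master);
// Vendor,Database,Searches
// VendorA,DB One,120
// VendorB,DB Two,75
```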

On the other hand, many modern third-party services offer robust APIs that make retrieving data simple. For the dashboard I built, some of the easiest pieces came from LibGuides, YouTube, and WordPress because all three surfaced usage information. Since I wanted to show the breadth of web services we were using and their popularity, it was easy to set up some simple API calls that retrieved our most popular guides, videos, and blog posts. 3

Building the Dashboard Pt. I: Data Stores

Once we’ve chosen our data sources—ignoring, for the sake of brevity, the important part where we get staff buy-in and ensure our data quality is sufficient—we’ll need a central data store to power our dashboard. This can be simpler than it sounds, for instance a single relational database with tables for various data sources which are imported on a periodic basis. I’ve even used Google Spreadsheets as a data store and since the spreadsheet is already online, its data can be accessed through an API or even downloaded as a CSV.

Even if we rely on third-party APIs for much of our data, we may want to download the data into a data store. Having a full copy of our library’s information has many benefits: we don’t need to worry about service outages or sudden discontinuation, we can add summative values which may not be present in the original API, and we can reduce the total number of databases the dashboard relies upon by keeping everything centralized.
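A minimal sketch of that idea in JavaScript, using an in-memory class as a stand-in for a real database: each import stores a full copy of a source’s records with the date it was retrieved, plus a precomputed total the source API may not provide. The class name, method names, and sample records are hypothetical.

```javascript
// In-memory stand-in for a central data store; a real dashboard would
// persist these snapshots in a database or spreadsheet.
class DataStore {
  constructor() {
    this.snapshots = new Map(); // source name -> latest snapshot
  }

  // Copy the records, then add a summative value computed from valueField.
  importSnapshot(source, records, valueField, retrievedAt = new Date()) {
    const copy = records.map((record) => ({ ...record }));
    const total = copy.reduce((sum, record) => sum + record[valueField], 0);
    this.snapshots.set(source, { retrievedAt, records: copy, total });
  }

  // Served from our copy, so it still works if the source API is down.
  get(source) {
    return this.snapshots.get(source);
  }
}

const store = new DataStore();
store.importSnapshot(
  "youtube",
  [
    { title: "Finding articles", views: 120 },
    { title: "Citing sources", views: 80 },
  ],
  "views"
);
console.log(store.get("youtube").total); // 200
```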

Once we have a data store, we can use it to power an API and a site that consumes the API, known as a client. At a high level, our client-API interaction looks like:

  • the client passes parameters in the URL letting the API know what kind of data it wants
  • the API receives the request, parses the parameters (in PHP this is done with the superglobal $_GET associative array, for instance)
  • the API queries a database with those parameters
  • the API takes the query response and formats it as JSON, sending it along to our client (formatting as JSON is common enough that most languages have a built-in function for it) 4
  • the client consumes the JSON and outputs charts, figures, etc.

A more code-like example of this process:

a GET HTTP request for example.com/api.php?month=02&dataset=reference asks for reference data from the month of February

the API performs a query like SELECT * FROM reference WHERE month = 2 on the database

the API takes the query results and formats them into a JSON response like:

{
 "reference questions": 404,
 "directional questions": 808,
 "computer questions": 10101,
 "ready reference": 42,
 "research consultations": 42
}
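The request flow above can be sketched in JavaScript for Node.js rather than the PHP the example URL implies. This is a minimal, hypothetical handler, not a production API: the dataset whitelist, the runQuery(sql, params) database adapter, and the ? placeholder syntax are all assumptions. It validates the parameters before anything reaches the database, which matters because the values come straight from the URL.

```javascript
// Datasets the API will serve. Table names can't be bound as query
// parameters, so only whitelisted names ever reach the SQL string.
const DATASETS = new Set(["reference", "circulation", "ill"]);

// runQuery(sql, params) is a hypothetical database adapter returning an
// object of counts; swap in whatever driver the data store uses.
function handleRequest(requestUrl, runQuery) {
  const params = new URL(requestUrl, "http://localhost").searchParams;
  const dataset = params.get("dataset");
  const month = Number(params.get("month")); // "02" -> 2, missing -> 0

  if (!DATASETS.has(dataset)) {
    return { status: 400, body: JSON.stringify({ error: "unknown dataset" }) };
  }
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    return { status: 400, body: JSON.stringify({ error: "month must be 1-12" }) };
  }

  // The month is passed as a bound parameter, not spliced into the SQL.
  const result = runQuery(`SELECT * FROM ${dataset} WHERE month = ?`, [month]);
  return { status: 200, body: JSON.stringify(result) };
}

// Wiring it into Node's built-in server might look like:
//   http.createServer((req, res) => {
//     const { status, body } = handleRequest(req.url, runQuery);
//     res.writeHead(status, { "Content-Type": "application/json" });
//     res.end(body);
//   }).listen(8080);

// A fake adapter for trying the handler without a database.
const fakeQuery = () => ({ "reference questions": 404, "ready reference": 42 });
const response = handleRequest("/api.php?month=02&dataset=reference", fakeQuery);
console.log(response.status, response.body);
```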

This is a basic example of library-specific data that probably lives in a database we control. However, depending on how many third-party APIs we choose to rely upon, the necessary flexibility of the client varies. If we’re normalizing and storing everything in our data store, the front end can be simple since the data is in a pre-processed, predictable format. For instance, aggregation operations like sums, maximums, and averages can be baked into a well-designed, comprehensive dashboard API such that the client only needs to know what URL to request. On the other end, a sophisticated client might consume multiple sources of data, combine them into a similar profile, and then run summative calculations itself. Both approaches have their relative advantages depending upon the situation.
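As an illustration of baking aggregation into the API, here is a small JavaScript table of summary operations that a server could apply before responding, perhaps selected by a hypothetical agg parameter in the URL (for example agg=avg). The names and figures are assumptions for the sketch.

```javascript
// Summary operations a dashboard API might expose so the client only has
// to request a URL instead of computing figures itself.
const AGGREGATES = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  max: (values) => Math.max(...values),
  avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
};

function aggregate(values, op) {
  const fn = AGGREGATES[op];
  if (!fn) {
    throw new Error(`unsupported aggregate: ${op}`);
  }
  if (values.length === 0) {
    throw new Error("no values to aggregate");
  }
  return fn(values);
}

const monthlyConsultations = [120, 90, 150]; // hypothetical figures
console.log(aggregate(monthlyConsultations, "sum")); // 360
console.log(aggregate(monthlyConsultations, "max")); // 150
console.log(aggregate(monthlyConsultations, "avg")); // 120
```

Rejecting unknown operation names keeps the set of URLs the API answers small and predictable, which is what lets the client stay simple.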

Building the Dashboard Pt. II: Charting

There are numerous choices for building graphs and charts on the web. This post will not attempt to list them. Instead, I have two recommendations. First, our front end shouldn’t be Flash. Adobe Flash doesn’t work on the mobile web and, at this point, JavaScript has a tremendous number of data visualization libraries. Second, we should use D3 or something that builds on top of D3.

While there are a lot of approaches to data visualization in JavaScript, D3 has a large user community, tremendous flexibility (it can be used for visualizations as varied as timelines, bar charts, and word clouds), and a nice design based on chained function calls that will be familiar to jQuery users.

While D3 is a knockout choice, we can’t just plug data in, pick a chart type, and watch the output instantly appear as in Excel or Numbers. D3 is more of a framework for working with data on the web, and it takes a lot of learning and code to produce high-quality charts. For that reason, it’s best to use a library that provides an easy-to-use wrapper around D3. Here are several choices that abstract away common difficulties:

  • C3 makes creating D3 charts full of nice features and best practices as simple as passing data to c3.generate()
  • reD3 is a set of reusable D3 charts
  • The D3 Gallery shows what the framework can do with imitable example code

Raw is another interesting option. It’s browser-based charting software that works more like spreadsheet software: plug in our data, choose a chart type, and then take the output that’s produced (using D3). If our charts don’t change often, then we don’t need to build a sophisticated client-server setup; we can take our data and use Raw to produce a web-friendly chart. Repeat every year, or as often as the dashboard needs to be updated.

When I made a dashboard, I didn’t use a wrapper around D3 and instead ended up building my own, a creaky abstraction for making pie charts with a few built-in options. While it was fun, I spent time making something not on par with what was already available.

Conclusion

Figure out why you want a dashboard, choose data sources appropriate to your purposes, build a data store with an API, use D3 to display web-based charts. Easy, right?

That sounds like numerous increasingly sophisticated steps, but I think each one also has value in its own right. If you can’t define the goals of the data you’re collecting, why exactly are you gathering them? If you’re not aggregating your data in one place, aren’t they just a messy hodgepodge that everyone dreads negotiating come annual report time? Dashboards are shiny. Shiny things are nice. But the reflective processes behind building a data dashboard are perhaps more valuable than their final product.

Notes

  1. This post also recommends the book Information Dashboard Design by Stephen Few which is probably an excellent resource for those who want to learn more about the topic.
  2. I’d love to hear from anyone who has found a good open source SUSHI client. I’ve been looking for one for years now and found a number of broken options but nothing I was capable of setting up. There are a few commercial SUSHI products that ease this pain significantly for those libraries that can afford them.
  3. I haven’t seen many good explanations of what an API is; the acronym stands for “Application Programming Interface” which I find utterly nondescript. APIs expose information about web services such that they can be easily processed by programs. They fall somewhere in between having direct access to query a database and only being able to see an HTML page intended for human consumption.
  4. There are non-JSON APIs—XML is common—but JSON is becoming a de facto standard due to its light weight and the ease with which JavaScript and other languages can parse it.

Analyzing EZProxy Logs

Analyzing EZProxy logs may not be the most glamorous task in the world, but it can be illuminating. Depending on your EZProxy configuration, log analysis can allow you to see the top databases your users are visiting, the busiest days of the week, the number of connections to your resources occurring on or off-campus, what kinds of users (e.g., staff or faculty) are accessing proxied resources, and more.

What’s an EZProxy Log?

EZProxy logs are not significantly different from regular server logs.  Server logs are generally just plain text files that record activity that happens on the server.  Logs that are frequently analyzed to provide insight into how the server is doing include error logs (which can be used to help diagnose problems the server is having) and access logs (which can be used to identify usage activity).

EZProxy logs are a kind of modified access log, which records the activities (page loads, HTTP requests, etc.) that your users undertake while connected in an EZProxy session. This article will briefly outline five potential methods for analyzing EZProxy logs: AWStats, Piwik, EZPaarse, a custom Python script for parsing starting point URL (SPU) logs, and a paid option called Splunk.

The capabilities of any log analyzer will of course depend on how your EZProxy log directives are configured. You will need to know your LogFormat and/or LogSPU directives in order to configure most log file analysis solutions. In EZProxy, you can see how your logs are formatted by looking for the LogFormat directive in config.txt/ezproxy.cfg, 1 e.g.,

LogFormat %h %l %u %t "%r" %s %b "%{user-agent}i"

and / or, to log Starting Point URLs (SPUs):

LogSPU -strftime log/spu/spu%Y%m.log %h %l %u %t "%r" %s %b "%{ezproxy-groups}i"
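If you plan to write your own analysis later, it helps to see how a directive maps onto a log line. This Python 3 sketch turns the LogFormat above into a regular expression with one named group per directive. It assumes that fields are separated by single spaces, exactly as in the directive, and that the quoted fields contain no embedded double quotes. The sample line is made up. The last quoted field holds the user agent for LogFormat and the EZProxy groups for the LogSPU example, so the sketch gives it a generic name.

```python
import re

# One named group per directive: %h %l %u %t "%r" %s %b "<last quoted field>"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) "(?P<last_field>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # "%r" is the whole request line, e.g. "GET http://... HTTP/1.1";
    # the URL is its second space-separated part.
    parts = entry["request"].split()
    entry["url"] = parts[1] if len(parts) > 1 else ""
    return entry


sample = ('10.0.0.5 - jdoe [05/Feb/2014:14:03:22 -0500] '
          '"GET http://www.example.com:80/search HTTP/1.1" 200 5120 "Mozilla/5.0"')
print(parse_line(sample)["url"])
```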

Logging Starting Point URLs can be useful because SPUs tend to represent users clicking into either a database or the full text of an article; no activity after that initial contact is logged. This type of logging does not record extraneous resource loading, such as scripts and images, which often clutters up traditional LogFormat logs and leads to misleadingly high hit counts. LogSPU directives can be defined in addition to a traditional LogFormat to provide two different views of your users’ data. SPULogs can be easier to analyze and yield more interesting data, because they give a clearer picture of which links and databases are most popular among your EZProxy users. If you haven’t already set up SPU logging, it can be a very useful way to observe general usage trends by database.
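This Python 3 sketch roughly illustrates how SPULogs answer the "which databases are most popular" question by counting SPU log entries per target host. It assumes that the request is the first quoted field on each line, as in the LogSPU directive above. As a precaution, it also follows a `url=` parameter in case a logged URL turns out to be an EZProxy login link. The sample lines and hostnames are invented.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Captures the URL inside the quoted "%r" request field, e.g. "GET <url> HTTP/1.1".
REQUEST_URL = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def database_host(url):
    """Return the target host of a starting point URL.

    If the URL looks like an EZProxy login link (.../login?url=TARGET),
    the host of TARGET is used instead of the proxy's own host.
    """
    parts = urlsplit(url)
    target = parse_qs(parts.query).get("url")
    if target:
        parts = urlsplit(target[0])
    return parts.hostname


def top_databases(lines, n=10):
    """Count entries per target host and return the n most common."""
    counts = Counter()
    for line in lines:
        match = REQUEST_URL.search(line)
        if match:
            host = database_host(match.group(1))
            if host:
                counts[host] += 1
    return counts.most_common(n)


sample = [
    '10.0.0.5 - - [05/Feb/2014:14:03:22 -0500] "GET http://ezproxy.example.edu:2048/login?url=http://search.ebscohost.com/login.aspx?direct=true HTTP/1.1" 302 0 "Default"',
    '10.0.0.6 - - [05/Feb/2014:14:05:10 -0500] "GET http://www.jstor.org/stable/123 HTTP/1.1" 302 0 "Default"',
    '10.0.0.7 - - [05/Feb/2014:15:12:45 -0500] "GET http://search.ebscohost.com/login.aspx HTTP/1.1" 302 0 "Default"',
]
print(top_databases(sample))
```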

You can find some very brief anonymized EZProxy log sample files on Gist:

On a typical EZProxy installation, historical monthly logs can be found inside the ezproxy/log directory.  By default they will rotate out every 12 months, so you may only find the past year of data stored on your server.

AWStats

Get It:  http://www.awstats.org/#DOWNLOAD

Best Used With:  Full Logs or SPU Logs

Code / Framework:  Perl


An example AWStats monthly history report. Can you tell when our summer break begins?

AWStats Pros:

  • Easy installation, including on localhost
  • You can define your unique LogFormat easily in AWStats’ .conf file.
  • Friendly, if somewhat dated-looking, charts show overall usage trends.
  • Extensive (but sometimes tricky) customization options can be used to more accurately represent sometimes unusual EZProxy log data.

Hourly traffic distribution in AWStats. While our traffic peaks during normal working hours, we have steady usage going on until about midnight, after which point it crashes pretty hard. We could use this data to determine how much virtual reference staffing we should have available during these hours.

AWStats Cons:

  • If you make a change to .conf files after you’ve ingested logs, the changes do not take effect on already ingested data.  You’ll have to re-ingest your logs.
  • Charts and graphs are not particularly customizable (at least not easily), and are not very modern-looking.
  • Charts are static and not interactive; you cannot easily cross-section the data to make custom charts.

Piwik

Get It:  http://piwik.org/download/

Best Used With:  SPULogs, or as a web traffic analytics tool embedded on web pages

Code / Framework:  PHP (the log import script, import_logs.py, is Python)


The Piwik visitor dashboard showing visits over time. Each point on the graph is interactive. The report shown is actually displaying stats for only a single day. The graphs are friendly and modern-looking, but can be slow to load.

Piwik Pros:

  • The charts and graphs generated by Piwik are much more attractive and interactive than those produced by AWStats, with report customizations very similar to what’s available in Google Analytics.
  • If you are comfortable with Python, you can do additional customizations to get more details out of your logs.
Piwik file ingestion in PowerShell: ingesting a single monthly log took several hours. On the plus side, with this running on one of Lauren’s monitors, anytime someone walked into her office they thought she was doing something *really* technical.

Piwik Cons:

  • By default, parsing of large log files seems to be pretty slow, but performance may depend on your environment, the size of your log files and how often you rotate your logs.
  • In order to fully take advantage of the library-specific information your logs might contain and your LogFormat setup, you might have to do some pretty significant modification of Piwik’s import_logs.py script.

When looking at popular pages in Piwik, you’re somewhat at the mercy of whether the subdirectories of database URLs have meaningful labels; luckily EBSCO’s do, as shown here. We have a lot of users looking at EBSCO Ebooks, apparently.

EZPaarse

Get It:  http://analogist.couperin.org/ezpaarse/download

Best Used With:  Full Logs or SPULogs

Code / Framework:  Node.js


ezPaarse’s friendly drag and drop interface. You can also copy/paste lines from your logs to try out the functionality by creating an account at http://ezpaarse.couperin.org.

EZPaarse Pros:

  • Has a lot of potential to be used to analyze existing log data to better understand e-resource usage.
  • Drag-and-drop interface, as well as copy/paste log analysis
  • No command-line needed
  • Its goal is to be able to associate meaningful metadata (domains, ISSNs) to provide better electronic resource usage statistics.

ezPaarse Excel output generated from a sample log file, showing type of resource (article, book, etc.), ISSN, publisher, domain, file size, and more.

EZPaarse Cons:

  • In Lauren’s testing, we couldn’t get our logs to ingest correctly (perhaps due to a somewhat non-standard EZProxy LogFormat), but the sample files provided worked well. UPDATE 11/26: With some gracious assistance from EZPaarse’s developers, we got EZPaarse to work! It took about 10 minutes to process 2.5 million log lines, which is pretty awesome. Lesson learned: if you get stuck, reach out to ezpaarse [at] couperin.org or tweet @ezpaarse for help. Also be sure to try out some of the pre-defined parameters set up by other institutions under Parameters. Check out the comments below for some more detail from EZPaarse’s developers.
  • Output is in Excel Sheets rather than a dashboard-style format – but as pointed out in the comments below, you can optionally output the results in JSON.

Write Your Own with Python

Get Started With:  https://github.com/robincamille/ezproxy-analysis/blob/master/ezp-analysis.py

Best used with: SPU logs

Code / Framework:  Python


Screenshot of a Python script, available at Robin Davis’ GitHub

Custom Script Pros:

  • You will have total control over what data you care about. DIY analyzers are usually written because you’re looking to answer a specific question, such as “How many connections come from within the Library?”
  • You will become very familiar with the data! As librarians in an age of user tracking, we need to have a very good grasp of the kinds of data that our various services collect from our patrons, like IP addresses.
  • If your script is fairly simple, it should run quickly. Robin’s script took 5 minutes to analyze almost 6 years of SPU logs.
  • Your output will probably be a CSV, a flexible and useful data format, but could be any format your heart desires. You could even integrate Python libraries like Plotly to generate beautiful charts in addition to tabular data.
  • If you use Python for other things in your day-to-day, analyzing structured data is a fun challenge. And you can impress your colleagues with your Pythonic abilities!
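Here is a minimal Python 3 sketch of the kind of question-driven script described above. It is not Robin’s script, only an independent illustration. It sorts each connection by its client IP (%h, the first field) into a library, on-campus, or off-campus bucket and writes the totals as CSV. The network ranges are placeholders to replace with your institution’s own, and the sample lines use private and documentation-reserved addresses.

```python
import csv
import io
import ipaddress
from collections import Counter

# Placeholder networks: replace with your library's and campus's real ranges.
# The library range is checked first because it sits inside the campus range.
LIBRARY_NETS = [ipaddress.ip_network("10.1.0.0/16")]
CAMPUS_NETS = [ipaddress.ip_network("10.0.0.0/8")]


def classify(ip_string):
    """Label one client address as library, on campus, or off campus."""
    ip = ipaddress.ip_address(ip_string)  # raises ValueError for non-IP input
    if any(ip in net for net in LIBRARY_NETS):
        return "library"
    if any(ip in net for net in CAMPUS_NETS):
        return "on campus"
    return "off campus"


def count_locations(lines):
    """Tally connections per location from log lines whose first field is %h."""
    counts = Counter()
    for line in lines:
        host = line.split(" ", 1)[0]
        try:
            counts[classify(host)] += 1
        except ValueError:
            # The first field was not an IP address (e.g., a hostname).
            counts["unknown"] += 1
    return counts


def to_csv(counts):
    """Render the tallies as a small CSV table with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["location", "connections"])
    for location, total in sorted(counts.items()):
        writer.writerow([location, total])
    return out.getvalue()


sample = [
    '10.1.2.3 - - [05/Feb/2014:14:03:22 -0500] "GET http://a.example/ HTTP/1.1" 302 0 "Default"',
    '10.5.6.7 - - [05/Feb/2014:14:05:10 -0500] "GET http://b.example/ HTTP/1.1" 302 0 "Default"',
    '203.0.113.9 - - [05/Feb/2014:15:12:45 -0500] "GET http://a.example/ HTTP/1.1" 302 0 "Default"',
    '198.51.100.4 - - [05/Feb/2014:16:40:02 -0500] "GET http://c.example/ HTTP/1.1" 302 0 "Default"',
]
print(to_csv(count_locations(sample)))
```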

 

Action shot: running the script from the command line. (Source)

Custom Script Cons:

  • If you have not used Python to input/output files or analyze tables before, this could be challenging.
  • The script is easiest to run within an IDE or from the command line, so it will likely be used only by you.
  • You will need to spend time ascertaining what’s what in the logs.
  • If you choose to output data in a CSV file, you’ll need more elbow grease to turn the data into a beautiful collection of charts and graphs.

Output of the sample script is a labeled CSV that divides connections by locations and user type (student or faculty). (Source)

Splunk (Paid Option)

Best Used with:  Full Logs and SPU Logs

Get It (as a free trial):  http://www.splunk.com/download

Code / Framework:  Various, including Python


A Splunk distribution showing traffic by days of the week. You can choose to visualize this data in several formats, such as a bar chart or scatter plot. Notice that this chart was generated by a syntactical query in the upper left corner: host=lmagnuson| top limit=20 date_wday
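If you don’t have Splunk, you can roughly approximate `top date_wday` in a few lines of Python 3. This sketch reads the %t timestamp from each line and counts entries by weekday, or by hour of day if you pass "%H" as the format. It assumes the standard `[05/Feb/2014:14:03:22 -0500]` timestamp and an English (C) locale for month and day names. The sample lines are made up.

```python
import re
from collections import Counter
from datetime import datetime

# The %t field: everything between the first pair of square brackets.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def count_by(lines, strftime_format="%A"):
    """Count log lines by a strftime-formatted piece of their timestamp.

    "%A" groups by weekday name; "%H" groups by hour of day (00-23),
    using the clock time recorded in the log.
    """
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            counts[when.strftime(strftime_format)] += 1
    return counts.most_common()


sample = [
    '10.0.0.5 - - [05/Feb/2014:14:03:22 -0500] "GET http://a.example/ HTTP/1.1" 200 10 "-"',
    '10.0.0.6 - - [05/Feb/2014:16:20:00 -0500] "GET http://b.example/ HTTP/1.1" 200 10 "-"',
    '10.0.0.7 - - [06/Feb/2014:16:45:10 -0500] "GET http://c.example/ HTTP/1.1" 200 10 "-"',
]
print(count_by(sample))
print(count_by(sample, "%H"))
```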

Splunk Pros:  

  • Easy-to-use interface; no scripting or command line required (although a command-line interface (CLI) is available)
  • Incredibly fast processing. As soon as you import a file, Splunk begins ingesting the file and indexing it for searching
  • It’s really strong in interactive searching.  Rather than relying on canned reports, you can dynamically and quickly search by keywords or structured queries to generate data and visualizations on the fly.

Here’s a search for log entries containing a URL (digital.films.com), which Splunk uses to display a chart showing the hours of the day that this URL is being accessed. This particular database is most popular around 4 PM.

Splunk Cons:

    • It has a little bit of a learning curve, but it’s worth it for the kind of features and intelligence you can get from Splunk.
    • It’s the only paid option on this list. You can try it out for 60 days with up to 500MB of data a day, and certain non-profits can apply to continue using Splunk under the 500MB/day limit. Splunk can be used with any server access or error log, so a library might consider partnering with other departments on campus to purchase a license. 2

What should you choose?

It depends on your needs, but AWStats is a tried-and-true solution that is easy to install and maintain. If you have the knowledge, a custom Python script is definitely better, but obviously takes time to test and develop. If you have money and could partner with others on your campus (or just need a one-time report generated through a free trial), Splunk is very powerful, generates some slick-looking charts, and is definitely worth looking into. If there are other options not covered here, please let us know in the comments!

About our guest author: Robin Camille Davis is the Emerging Technologies & Distance Services Librarian at John Jay College of Criminal Justice (CUNY) in New York City. She received her MLIS from the University of Illinois Urbana-Champaign in 2012 with a focus in data curation. She is currently pursuing an MA in Computational Linguistics from the CUNY Graduate Center.

Notes
  1. Details about LogFormat and what each %/letter value means can be found at http://www.oclc.org/support/services/ezproxy/documentation/cfg/logformat.en.html; LogSPU details can be found at http://oclc.org/support/services/ezproxy/documentation/cfg/logspu.en.html
  2. Another paid option that offers a free trial, and comes with extensions made for parsing EZProxy logs, is Sawmill: https://www.sawmill.net/downloads.html

Making Open Access Everyone’s Business

Librarians should have a role in promoting open access content. The best methods, and whether they are successful, are a matter of heated debate. Take, for example, a recent post by Micah Vandergrift on the ACRL Scholarly Communications mailing list calling on librarians to stage a publishing walkout and only publish in open access library and information science journals. Many have already done so. Others, like myself, have published in traditional journals (only once in my case) but make a point of making their work available in institutional repositories. I personally would not publish in a journal that did not allow such use of my work, and I know many who feel the same way. 1 The point is, of course, to ensure that librarians are not hypocritical in their own publishing and their use of repositories to provide open access, a long-standing problem pointed out by Dorothea Salo (“Innkeeper at the Roach Motel,” December 11, 2007, http://digital.library.wisc.edu/1793/22088), among others. 2 We know that many of the reasons faculty may hesitate to participate in open access publishing relate to promotion and tenure requirements, which are generally more flexible for academic librarians (though not in all cases; see Abigail Goben’s open access tenure experiment). I suspect that librarians’ lack of participation in open access has partly to do with more mundane reasons, such as forgetting to do so or fearing that their work is not good enough to make public.

But it shouldn’t be only staunch advocates of open access, open peer review, or new digital models for work and publishing who are participating. We have to find ways to advocate and educate in a gentle but vigorous manner, and reach out to new faculty and graduate students who need to start participating now if the future is to be different. Enter Open Access Week, a now eight-year-old celebration of open access organized by SPARC. Just as Black Friday is the day that retailers hope to be in the black, Open Access Week has become an occasion to organize around and finally share our message with willing ears. Right?

It can be, but it requires a good deal of institutional dedication to make it happen. At my institution, Open Access Week is a big deal. I am co-chair of a new Scholarly Communications committee, which is responsible for planning the week (the committee used to plan only the week, but its scope has since been extended). The committee has representation from Systems, Reference, Access Services, and the Information Commons, so we are able to touch on all aspects of open access. Last year we had events five days out of five; this year we are having events four days out of five. Here are some of the approaches we are taking to create successful conversations around open access.

    • Focus on the successes and the impact of your faculty, whether or not they are publishing in open access journals.

The annual Celebration of Faculty Scholarship takes place during Open Access Week, and brings together physical material published by all faculty at a cocktail reception. We obtain copies of articles and purchase books written by faculty, and set up laptops to display digital projects. This is a great opportunity to find out exactly what our faculty are working on, and get a sense of them as researchers that we may normally lack. It’s also a great opportunity to introduce the concept of open access and recruit participants to the institutional repository.

    • Highlight the particular achievements of faculty who are participating in open access.

We place stickers on materials at the Celebration that are included in the repository or are published in open access journals. This year we held a panel with faculty and graduate students who participate in open access publishing to discuss their experiences, both positive and negative.

  • Demonstrate the value the library adds to open access initiatives.

Recently bepress (which creates the Digital Commons repositories on which ours runs) introduced a real-time map of repository downloads that was a huge hit this year. It was a compelling visual illustration of the global impact of work in the repository. Faculty were thrilled to see their work being read across the world, and it helped to solve the problem of invisible impact. We also highlighted our impact with a new handout that lists key metrics around our repository, including our hosting of a new open access journal.

  • Talk about the hard issues in open access and the controversies surrounding it, for instance, CC BY vs. CC BY-NC-ND licenses.

It’s important not to sugarcoat or spin challenging issues in open access, and to include multiple perspectives and invite difficult conversations. Show scholars the evidence and let them draw their own conclusions, though make sure to step in and correct misunderstandings.

  • Educate about copyright and fair use, over and over again.

These issues are complicated even for people who work on them every day, and they are constantly changing. Workshops, handouts, and consultations on copyright and fair use can help people feel more comfortable in the classroom and with participating in open access.

  • Make it easy.

Examine what you are asking people to do to participate in open access. Rearrange workflows, cut red tape, and improve interfaces. Open Access Week is a good time to introduce new ideas, but this should be happening all year long.

We can’t expect revolutions in policy and practice to happen overnight, or without some sacrifice. Whether you choose to take your stand by publishing only in open access journals or by following some other path, take your stand and help others who wish to do the same.

Notes
  1. Publishers have caught on to this tendency in librarians. For instance, Taylor and Francis has 12-18 month repository embargoes for all its journals except LIS journals. Whether this is because of the good work we have done in advocacy or a conciliatory gesture remains up for debate.
  2. Xia, Jingfeng, Sara Kay Wilhoite, and Rebekah Lynette Myers. “A ‘librarian-LIS Faculty’ Divide in Open Access Practice.” Journal of Documentation 67, no. 5 (September 6, 2011): 791–805. doi:10.1108/00220411111164673.

Hacking in Python with PyMARC


The pymarc Python library is an extremely handy library that can be used for writing scripts to read, write, edit, and parse MARC records. I was first introduced to pymarc through Heidi Frank’s excellent Code4Lib journal article on using pymarc in conjunction with MARCedit. 1 Here at ACRL TechConnect, Becky Yoose has also written about some cool applications of pymarc.

In this post, I’m going to walk through a couple of applications of pymarc for making delimited files: comma-separated (.csv) files for batch loading in DSpace, and files in the KBART format (e.g., for OCLC Knowledge Base purposes). I know what you’re thinking: why write a script when I can use MARCedit for that? Among MARCedit’s many indispensable features is its Export Tab-Delimited Data feature. 2 But there are a few use cases where scripts can come in handy:

  • Your MARC records lack data that you need in the tab-delimited or .csv output – for example, a field used for processing in whatever system you’re loading the data into that isn’t found in the MARC record source.
  • Your delimited file output requires things like empty fields or parsed, modified fields.  MARCedit is great for exporting whole MARC fields and subfields into delimited files, but you may need to pull apart a MARC element or parse a MARC element out with some additional data.  Pymarc is great for this.
  • You have to process a lot of records frequently, and just want something tailor-made for your use case. While you can certainly use MARCedit to save a frequently used export template, editing that saved template when you need to change the output parameters of your delimited file isn’t easy. With a Python script, you can easily edit your script if your delimited file requirements change.

Some General Notes about Python

Python is a useful scripting language to know for a lot of reasons, and for small scripting projects it can have a fairly shallow learning curve. In order to start using Python, you’ll need to set up your development environment. Macs already come with Python installed; to double-check which version you have, launch Terminal and type python -V (or python --version). In a Windows environment, you’ll need to do a few more steps, detailed here. Full Python documentation for 2.x can be found here (though personally, I find it a little dense at times!), and Codecademy features some pretty extensive tutorials to help you learn the syntax quickly. Google also has some pretty extensive tutorials on Python. Personally, I find it easiest to learn when I have a project that actually means something to me, so if you’re familiar with MARC records, just downloading the Python scripts mentioned in this post and learning how to run them can be a good start.

Spacing and indentation in Python are very important, so if you’re getting errors running scripts, make sure that your conditional statements are indented properly. 3 The version of Python on your machine also makes a really big difference, because it determines whether your code runs successfully. These examples have all been tested with Python 2.6 and 2.7, but not with Python 3 or above. I find that Python 2.x has more help resources out on the web, which is nice to be able to take advantage of when you’re first starting out.

Use Case 1:  MARC to KBART

The complete script described below, along with some sample MARC records, is on GitHub.

The KBART format is a NISO/United Kingdom Serials Group (UKSG) initiative designed to standardize information for use with Knowledge Base systems. The KBART format is a series of standardized fields that can be used to identify serial coverage (e.g., start date and end date), URLs, publisher information, and more. 4 Notably, it’s the required format for adding and modifying customized collections in OCLC’s Knowledge Base. 5

In this use case, I’m using OCLC’s modified KBART file: most of the fields are KBART standard, but a few fields are specific to OCLC’s Knowledge Base, e.g., oclc_collection_name. 6 Typically, these KBART files would be created either manually or by using some VLOOKUP Excel magic with an existing KBART file. 7 In my use case, I needed to batch migrate a bunch of data stored in MARC records to the OCLC Knowledge Base, and these particular records didn’t always correspond neatly to existing OCLC Knowledge Base Collections. For example, one collection we wanted to manage in the OCLC Knowledge Base was the library’s print holdings, so that these titles would be displayed in OCLC’s A-Z Journal lookup tool.

First, I’ll need a few helper libraries for my script:

import csv
from pymarc import MARCReader
from os import listdir
from re import search

Then, I’ll declare the source directory where my MARC records are (in this case, a folder called marc in the same directory as the script) and instruct the script to go find all the .mrc files in that directory. Note that this enables the script to process either a single file of multiple MARC records, or multiple distinct files, all at once.

# change the source directory to whatever directory your .mrc files are in
SRC_DIR = 'marc/'

The script identifies MARC records by looking for all files ending in .mrc using the Python filter function.  Within the filter function, lambda creates a one-off anonymous function to define the filter parameters: 8

# get a list of all .mrc files in source directory
# (the dot is escaped and $ anchors the match, so only names ending in .mrc match)
file_list = filter(lambda x: search(r'\.mrc$', x), listdir(SRC_DIR))

Now I’ll define the output parameters.  I need a tab-delimited file, not a comma-delimited file, but either way I’ll use the csv.writer function to create a .txt file and define the tab delimiter (\t).  I also don’t really want the fields quoted unless there’s a special character or something that might cause a problem reading the file, so I’ll set quoting to minimal:

#create tab delimited text file that quotes if special characters are present
csv_out = csv.writer(open('kbart.txt', 'w'), delimiter = '\t', 
quotechar = '"', quoting = csv.QUOTE_MINIMAL)


And I’ll also create the header row, which includes the KBART fields required by the OCLC Knowledge Base:

#create the header row
csv_out.writerow(['publication_title', 'print_identifier', 
'online_identifier', 'date_first_issue_online', 
'num_first_vol_online', 'num_first_issue_online', 
'date_last_issue_online', 'num_last_vol_online', 
'num_last_issue_online', 'title_url', 'first_author', 
'title_id', 'coverage_depth', 'coverage_notes', 
'publisher_name', 'location', 'title_notes', 'oclc_collection_name', 
'oclc_collection_id', 'oclc_entry_id', 'oclc_linkscheme', 
'oclc_number', 'action'])

Next, I’ll start a loop for writing a row into the tab-delimited text file for every MARC record found.

for item in file_list:
    # open each file in binary mode; SRC_DIR already ends with a slash
    fd = open(SRC_DIR + item, 'rb')
    reader = MARCReader(fd)

By default, we’ll need to set each field’s value to blank ('') so that errors are avoided if the value is not present in the MARC record. We can set all the values to blank to start with in one statement:

# (this loop runs inside the 'for item in file_list' loop above)
for record in reader:
    # chained assignment: every variable starts as an empty string;
    # the trailing backslashes continue the statement across lines
    publication_title = print_identifier = online_identifier = \
        date_first_issue_online = num_first_vol_online = \
        num_first_issue_online = date_last_issue_online = \
        num_last_vol_online = num_last_issue_online = \
        title_url = first_author = title_id = coverage_depth = \
        coverage_notes = publisher_name = location = title_notes = \
        oclc_collection_name = oclc_collection_id = oclc_entry_id = \
        oclc_linkscheme = oclc_number = action = ''
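If the long chained assignment feels fragile, here is one alternative. It is a Python 3 sketch, not the approach the script above uses. It keeps the column names in a single list, starts each record as a dictionary of empty strings built with `dict.fromkeys`, and writes rows with `csv.DictWriter`. The field list is abbreviated here and the values are hypothetical.

```python
import csv
import io

# Abbreviated column list for illustration; the real header row has all
# the KBART and OCLC fields listed above.
KBART_FIELDS = ["publication_title", "print_identifier", "title_url",
                "oclc_collection_name"]


def blank_row():
    """Start each record with every column set to an empty string."""
    return dict.fromkeys(KBART_FIELDS, "")


out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=KBART_FIELDS, delimiter="\t")
writer.writeheader()  # header row comes from the same list as the row keys

row = blank_row()
row["publication_title"] = "Journal of Examples"  # hypothetical values
row["title_url"] = "http://example.com/journal"
writer.writerow(row)
print(out.getvalue())
```

A side benefit of `DictWriter` is that a misspelled key raises a `ValueError` instead of silently producing a blank column.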

Now I can start pulling in data from MARC fields.  At its simplest, for each field defined in my CSV, I can pull data out of MARC fields using a construction like this:

    # title_url
    if record['856'] is not None:
        title_url = record['856']['u']

I can use generic Python string parsing tools to clean up fields. For example, if I need to strip the ending / out of a MARC 245$a field, I can use .rsplit to return everything before that final slash (/):

    # publication_title
    if record['245'] is not None:
        # keep everything before the final "/", then trim the leftover space
        publication_title = record['245']['a'].rsplit('/', 1)[0].strip()
        if record['245']['b'] is not None:
            publication_title = publication_title + " " + record['245']['b']

Also note that once you’ve declared a variable value (like publication_title) you can re-use that variable name to add more onto the string, as I’ve done with the 245$b subtitle above.

As is usually the case for MARC records, the trickiest business has to do with parsing out serial ranges.  The KBART format is really designed to capture, as accurately as possible, thresholds for beginning and ending dates.  MARC makes this…complicated.  Many libraries might use the 866 summary field to establish summary ranges, but there are varying local practices that determine what these might look like.  In my case, the minimal information I was looking for included beginning and ending years, and the records I was processing with the script stored summary information fairly cleanly in the 863 and 866 fields:

        # date_first_issue_online (guard on the 863, since that is the field read)
        if record['863'] is not None:
            date_first_issue_online = record['863']['a'].rsplit('-', 1)[0]

        # date_last_issue_online
        if record['866'] is not None:
            date_last_issue_online = record['866']['a'].rsplit('-', 1)[-1]
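The rsplit('-', 1) approach works when the summary is a bare range like "1995-2010". When the holdings string also carries enumeration (for example "v.1 (1995)-v.16 (2010)"), the pieces on either side of the hyphen still contain volume numbers. The sketch below swaps in a regular expression that pulls four-digit years from each side. It assumes that any four-digit run is a year and that the range uses a hyphen; coverage_years is a hypothetical helper, not from the original script.

```python
import re


def coverage_years(summary):
    """Return (first_year, last_year) from a holdings summary string.

    Assumes a hyphenated range such as '1995-2010' or
    'v.1 (1995)-v.16 (2010)'. An open range like '1995-' yields ''
    for the last year; a lone year is used for both ends.
    """
    # Split on the last hyphen, like rsplit('-', 1)
    start, sep, end = summary.rpartition('-')
    if not sep:  # no hyphen at all: a single year
        start = end = summary
    first = re.findall(r'\d{4}', start)
    last = re.findall(r'\d{4}', end)
    return (first[0] if first else '', last[-1] if last else '')


print(coverage_years('v.1 (1995)-v.16 (2010)'))
```

Leaving the last year blank for an open range like "1995-" fits KBART’s convention of an empty date_last_issue_online for coverage that is still ongoing.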

Where further adjustments were necessary, and I couldn’t easily account for the edge cases programmatically, I imported the KBART file into OpenRefine for further cleanup.  A sample KBART file created by this script is available here.

Use Case 2: MARC to DSpace Batch Upload

Again, the complete script for transforming MARC records for DSpace ingest is on GitHub.

This use case is very similar to the KBART scenario. We use DSpace as our institutional repository, and we had about 2,000 MARC records for historical theses and dissertations to transform and ingest into DSpace. DSpace supports batch-uploading metadata as CSV files following the RFC 4180 format (note 9); for our purposes, that meant enclosing every field in double quotes. While the fields being pulled from are different, the structure of the script is almost exactly the same.

When we first define the CSV output, we need to ensure that quoting is used for all elements – csv.QUOTE_ALL:

csv_out = csv.writer(open('output/theses.txt', 'w'), delimiter='\t',
                     quotechar='"', quoting=csv.QUOTE_ALL)
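To see what csv.QUOTE_ALL actually produces, the sketch below writes two rows with the same writer settings into an in-memory buffer (io.StringIO stands in for output/theses.txt). Every value is wrapped in double quotes, including the integer, and an embedded double quote is escaped by doubling it, which is the csv module’s default (doublequote=True). The dc.title and dc.date.issued column names and the sample values are illustrative only.

```python
import csv
import io

buf = io.StringIO()  # stand-in for the output file
writer = csv.writer(buf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(['dc.title', 'dc.date.issued'])
writer.writerow(['A "quoted" word in a title', 1962])
print(buf.getvalue())
```

When writing to a real file under Python 3, the csv module documentation recommends opening it with newline='' so the writer’s own line endings are not translated a second time.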

The other nice thing about using Python for this transformation is the ability to build in static text that is the same on every line of the CSV file. For example, I assembled a friendlier display of the department the thesis was submitted to like so:

        # dc.contributor.department
        # check the 690 exists before asking for its $x
        if record['690'] is not None and record['690']['x'] is not None:
            dccontributordepartment = ('UniversityName.  Department of ' +
                                       record['690']['x'].rsplit('.', 1)[0])
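One caveat with rsplit('.', 1)[0]: it drops everything after the last period, whether or not that period ends the string, so a 690$x like "Biochem. and Molecular Biology" (a hypothetical value) would be cut down to "Biochem". A sketch that removes only trailing periods uses rstrip('.') instead. department_label is a hypothetical helper, and "UniversityName" is kept as the same placeholder the snippet above uses.

```python
def department_label(dept, institution='UniversityName'):
    """Build a display string such as 'UniversityName. Department of Chemistry'.

    Only trailing periods are removed, so abbreviations inside the
    department name survive. Replace the 'UniversityName' placeholder
    with the real institution name.
    """
    return '{}. Department of {}'.format(institution, dept.strip().rstrip('.'))


print(department_label('Chemistry.'))
print(department_label('Biochem. and Molecular Biology'))
```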

You can view a sample output file created by this script here on GitHub.

Other PyMARC Projects

There are lots of other, potentially more interesting, things that can be done with pymarc, although one of its most common applications is certainly converting MARC to more flexible formats, such as MARCXML (note 10). If you’re interested in learning more, join the pymarc Google Group, where you can often get help from the original pymarc developers (Ed Summers, Mark Matienzo, Gabriel Farrell and Geoffrey Spear).

Notes
  1. Frank, Heidi. “Augmenting the Cataloger’s Bag of Tricks: Using MarcEdit, Python, and pymarc for Batch-Processing MARC Records Generated from the Archivists’ Toolkit.” Code4Lib Journal, 20 (2013): http://journal.code4lib.org/articles/8336
  2. https://www.youtube.com/watch?v=qkzJmNOvY00
  3. For the specifications on indentation, take a look at http://legacy.python.org/dev/peps/pep-0008/#id12
  4. http://www.uksg.org/kbart/s5/guidelines/data_fields
  5. http://oclc.org/content/dam/support/knowledge-base/kb_modify.pdf
  6. A sample blank KBART Excel sheet for OCLC’s Knowledge Base can be found here: http://www.oclc.org/support/documentation/collection-management/kb_kbart.xlsx. KBART files must be saved as tab-delimited .txt files prior to upload, but are obviously more easily edited manually in Excel.
  7. If you happen to be in need of this for OCLC’s KB, or are in need of comparing two Excel files, here’s a walkthrough of how to use VLOOKUP to select owned titles from a large OCLC KBART file: http://youtu.be/mUhkMzpPnBE
  8. A nice tutorial on using lambda and filter together can be found here: http://www.u.arizona.edu/~erdmann/mse350/topics/list_comprehensions.html#filter
  9. https://wiki.duraspace.org/display/DSDOC18/Batch+Metadata+Editing
  10. http://docs.evergreen-ils.org/2.3/_migrating_your_bibliographic_records.html