An Elevator Pitch for File Naming Conventions
Posted: January 14, 2013 | Author: Meghan Frazer | Filed under: data, digital libraries | Tags: digital file management, personal archives
As a curator and a coder, I know it is essential to use naming conventions. It is important to employ a consistent approach when naming digital files or software components such as modules or variables. However, when a student assistant asked me recently why it was important not to use spaces in our image file names, I struggled to come up with an answer. “Because I said so,” while tempting, is not really an acceptable response. Why, in fact, is this important? For this blog entry, I set out to answer this question and to see if, along the way, I could develop an “elevator pitch” – a short spiel on the reasoning behind file naming conventions.
As a habit, I implore my assistants and anyone I work with on digital collections to adhere to the following when naming files:
- Do not use spaces or special characters (other than “-” and “_”)
- Use descriptive file names. Descriptive file names include date information and keywords regarding the content of the file, within a reasonable length.
- Write date information in the following format: YYYY-MM-DD.
So, 2013-01-03-SmithSculptureOSU.jpg would be an appropriate file name, whereas Smith Jan 13.jpg would not. But, are these modern practices? Current versions of Windows, for example, will accept a wide variety of special characters and spaces in naming files, so why is it important to restrict the use of these characters in our work?
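The three rules above can be checked mechanically. The following is a minimal Python sketch, not an established tool: the pattern, the function name follows_convention, and the decision to allow an optional "-" or "_" after the date are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical pattern: a YYYY-MM-DD date, an optional "-" or "_" separator,
# a description made only of letters, digits, "-" and "_", then an extension.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})[-_]?(?P<desc>[A-Za-z0-9_-]+)\.(?P<ext>[A-Za-z0-9]+)$"
)

def follows_convention(file_name):
    """Return True if file_name matches the (assumed) naming pattern."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return False
    try:
        # Reject strings that look like dates but are not, e.g. 2013-13-45.
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

With this sketch, follows_convention("2013-01-03-SmithSculptureOSU.jpg") is True and follows_convention("Smith Jan 13.jpg") is False. It does not judge whether the keywords are actually descriptive or whether the length is reasonable; those still need a human.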
The Search Results
A quick Google search finds support for my assertions, though often for very specific cases of file management. For example, the University of Oregon publishes recommendations on file naming for managing research data. A similar guide is available from the University of Illinois library, but it takes a much more technical, detailed stance on the format of file names for the purposes of the library’s digital content.
The Bentley Historical Library at the University of Michigan, however, provides a general guide to digital file management very much in line with my practices: use descriptive directory and file names, and avoid special characters and spaces. In addition, this page discusses avoiding personal names in the directory structure and using consistent conventions to indicate the version of a file.
The Why – Dates
The Bentley page also provides links to a couple of sources which help answer the “why” question. First, there is the ISO date standard (or, officially, “ISO 8601:2004: Data elements and interchange formats — Information interchange — Representation of dates and times”). This standard dictates that dates be ordered from largest term to smallest term, so instead of the month-day-year we all wrote on our grade school papers, dates should take the form year-month-day. Further, since we have passed into a new millennium, a four-digit year is necessary. This provides a consistent format that eliminates confusion, and it also allows file systems to sort the files appropriately. For example, let’s look at the following three files:
If we expressed those dates in another format, say, month-day-year, they would not be listed in chronological order in a file system sorting alphabetically. Instead, we would see:
This may not be a problem if you are visually searching through three files, but what if there were 100? Now, if we only used a two digit year, we would see:
If we did not standardize the number of digits, we might see:
You can try this pretty easily on your own system. Create three text files with the names above, sort the files by name and check the order. Imagine the problems this might create for someone trying to quickly locate a file.
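The same experiment can be run without creating any files. This is a minimal Python sketch using three hypothetical dates and a made-up "-Photo.jpg" suffix (they are illustrative stand-ins, not the file names from the listings above). It sorts names by plain string comparison, which is an assumption about how the file system orders them; some file managers apply a "natural" numeric sort instead and will behave differently.

```python
from datetime import date

# Hypothetical dates, listed here in true chronological order.
DATES = [date(1960, 5, 12), date(2012, 2, 5), date(2012, 11, 3)]

def sorted_names(make_prefix):
    """Build one file name per date and sort the names as plain strings."""
    return sorted(make_prefix(d) + "-Photo.jpg" for d in DATES)

# Year-month-day with zero padding: string order matches date order.
iso = sorted_names(lambda d: d.strftime("%Y-%m-%d"))

# Month-day-year: files group by month, regardless of year.
month_first = sorted_names(lambda d: d.strftime("%m-%d-%Y"))

# Two-digit year: 1960 ("60") sorts after 2012 ("12").
two_digit_year = sorted_names(lambda d: d.strftime("%y-%m-%d"))

# No zero padding: month "11" sorts before month "2".
unpadded = sorted_names(lambda d: f"{d.year}-{d.month}-{d.day}")

for label, names in [("iso", iso), ("month_first", month_first),
                     ("two_digit_year", two_digit_year), ("unpadded", unpadded)]:
    print(label, names)
```

Only the first variant lists the 1960 file, then February 2012, then November 2012. Each of the other three puts at least one file out of chronological order.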
You might ask, why include the date at all, when dates are also maintained by the operating system? There are many situations where the operating system dates are unreliable. In cases where a file moves to a new drive or computer, for example, the Date Created may reflect the date the file moved to the new system, instead of the initial creation date. Or, consider the case where a user opens a file to view it and the application changes the Date Modified, even though the file content was not modified. Lastly, consider our earlier example of a photograph from 1960; the Date Created is likely to reflect the date of digitization. In each of these examples, it would be helpful to include an additional date in the file name.
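One of these situations is easy to reproduce. The sketch below, in Python, creates a throwaway file, gives it an arbitrary past modification time, and then copies it with shutil.copy, which copies content but not timestamps. The file names and the chosen timestamp are hypothetical, and other copy tools may preserve dates where this one does not.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "2001-01-01-MeetingNotes.txt")
    with open(original, "w") as handle:
        handle.write("placeholder content")

    # Pretend the file was last modified at 2001-01-01 00:00:00 UTC.
    old_time = 978307200  # seconds since the Unix epoch
    os.utime(original, (old_time, old_time))

    copy_path = os.path.join(folder, "copy-of-notes.txt")
    shutil.copy(original, copy_path)  # copies content and permissions only

    original_mtime = os.path.getmtime(original)
    copy_mtime = os.path.getmtime(copy_path)

# The copy carries the time of the copy operation, not the 2001 timestamp.
print(original_mtime, copy_mtime)
```

The date in the original's file name survives any copy, while the operating system's modification date on the copy reflects when the copy was made.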
The Why – Descriptive File Names
So far we have digressed down a date-specific path. What about our other conventions? Why are those important? Also linked from the Bentley Library and in the Google search results are YouTube videos created by the State Library of North Carolina which answer some of these questions. The Inform U series on file naming has four parts, and is intended to help users manage personal files. However, the rationale described in Part 1 for descriptive file names in personal file management also applies in our libraries.
First, we want to avoid the accidental overwriting of files. Image files can provide a good example here: many cameras use the file naming convention of IMG_1234.jpg. If this name is unique to the system, that works ok, but in a situation where multiple cameras or scanners are generating files for a digital collection, there is potential for problems. It is better to batch re-name image files with a more descriptive name. (Tutorials on this can be found all over the web, such as the first item in this list on using Mac’s Automator program to re-name a batch of photos).
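A batch re-name can also be scripted. The following Python sketch only plans the new names and touches no files; the function name plan_renames, the three-digit counter, and the example date and keywords are assumptions chosen for illustration, not a prescribed scheme.

```python
import os

def plan_renames(camera_names, date_prefix, keywords):
    """Map each camera-generated name to a descriptive, numbered new name."""
    plan = {}
    for index, old_name in enumerate(sorted(camera_names), start=1):
        # Keep the original extension, lower-cased for consistency.
        extension = os.path.splitext(old_name)[1].lower()
        plan[old_name] = f"{date_prefix}-{keywords}-{index:03d}{extension}"
    return plan

plan = plan_renames(["IMG_1235.JPG", "IMG_1234.jpg"],
                    "2013-01-03", "SmithSculptureOSU")
for old_name, new_name in plan.items():
    print(old_name, "->", new_name)
```

To apply a plan like this, one option is to call os.rename for each pair after first confirming that the target name does not already exist, so that the re-name itself cannot overwrite anything.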
Second, we want to avoid the loss of files due to non-descriptive names. While many operating systems will search the text content of files, naming files appropriately makes for more efficient access. For example, consider mynotes.docx and 2012-01-05WebMeetingNotes.docx – which file’s contents are easier to ascertain?
I should note, however, that there are cases where non-descriptive file names are appropriate. The use of a unique identifier as a filename is sometimes a necessary approach. In those cases, be sure that the file names are unique and that the files are stored in a descriptive directory structure. Overall, it is important that others in the organization have the same ability to find and use the files we currently manage, long after we have moved on to another institution.
The Why – Special Characters & Spaces
We have now covered descriptive names and reasons for including dates, which leaves us with spaces and special characters to address. Part 3 of the Inform U video series addresses this as well. Special characters can carry special meaning in programming languages and operating systems, and might be misinterpreted when included in file names. For instance, the $ character marks the beginning of a variable name in the PHP programming language, and the \ character separates the directories in a file path in the Windows operating system.
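Both kinds of misinterpretation can be imitated in a few lines of Python. This is a rough sketch with hypothetical file names: PureWindowsPath applies Windows path rules on any platform, and Python's string.Template stands in here for any system that, like PHP, treats "$" as the start of a variable name.

```python
from pathlib import PureWindowsPath
from string import Template

# Hypothetical name containing a backslash: Windows path rules read it as a
# folder named "Smith" holding a file named "Sculpture.jpg".
risky = PureWindowsPath("Smith\\Sculpture.jpg")
folder_and_file = risky.parts
apparent_name = risky.name

# Hypothetical name containing "$": the template engine replaces "$total"
# with a value instead of keeping it as literal text.
expanded = Template("Budget$total.xlsx").substitute(total="100")

print(folder_and_file, apparent_name, expanded)
```

In both cases the software does exactly what it was designed to do; the file name simply asked for something its author never intended.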
Spaces may make things easier for humans to read, but systems generally do better without them. While operating systems attempt to handle spaces gracefully and generally do so, browsers and software programs are not consistent in how they handle spaces. For example, consider a file stored in a digital repository system with a space in the file name. The user downloads the file and their browser truncates the file name after the first space. Any descriptive information after the first space is lost. The file extension is lost as well, which may make the file harder for less tech-savvy users to open.
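The browser behavior described above is not reproduced here, but two common mechanisms behind this kind of trouble can be shown with Python's standard library: programs that split text on whitespace, and URLs, which must encode spaces. The "open" command in this sketch is a hypothetical placeholder, not a specific tool.

```python
import shlex
from urllib.parse import quote

# A command line is split on whitespace, so an unquoted name with spaces
# arrives as three separate arguments instead of one file name.
spaced = shlex.split("open Smith Jan 13.jpg")
unspaced = shlex.split("open 2013-01-03-SmithSculptureOSU.jpg")

# In a URL, each space must be encoded as %20; a name without spaces or
# special characters passes through unchanged.
spaced_url = quote("Smith Jan 13.jpg")
unspaced_url = quote("2013-01-03-SmithSculptureOSU.jpg")

print(spaced, unspaced, spaced_url, unspaced_url)
```

Any program that takes only the first piece after such a split would be left with "Smith" alone, with no date, no keywords, and no extension.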
That example leads us to the heart of the issue: we never know where our files are going to end up, especially files disseminated to the public. Our content is useless if our users cannot open it due to a poorly formatted file name. And, in the case of non-public files or our personal archives, it is essential to facilitate the discovery of items in the piles and piles of digital material accumulated every day.
So, do I have my elevator pitch? I think so. When asked about file naming standards in the future, I think I can safely reply with the following: “It is impossible to accurately predict all of the situations in which a file might be used. Therefore, in the interest of preserving access to digital files, we choose file name components that are least likely to cause a problem in any environment. File names should provide context and be easily understood by humans and computers, now and in the future.”
And, like a good file name, that is much more effective and descriptive than, “Because I said so.”