Names are Hard

A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “My system will never have to deal with names from China/Japan/Korea” were apparent everywhere. I consider myself a fairly critical and studious person, I devote time to thinking about the consequences of design decisions and carefully attempt to avoid poor assumptions. But I’ve repeatedly run into trouble when handling personal names as data. There is a cognitive dissonance surrounding names; we treat them as rigid identifiers when they’re anything but. We acknowledge their importance but struggle to take them as seriously.

Names change. They change due to marriage, divorce, child custody, adoption, gender identity, religious devotion, performance art, witness protection, or none of these at all. Sometimes people just want a new name. And none of these reasons for change are more or less valid than others, though our legal system doesn’t always treat them equally. We have students who change their legal name, which is often something systems expect, but then they have the audacity to want to change their username, too! And that works less often because all sorts of system integrations expect usernames to be persistent.

Names do not have a universal structure. There is no set quantity of components in a name nor an established order to those components. At my college, we have students without surnames. In almost all our systems, surname is a required field, so we put a period “.” there to satisfy that requirement. Then, on displays in our digital repository where surnames are assumed, we end up with bolded section headers like “., Johnathan” which look awkward.

Many Western names might follow a [Given name] – [Middle name] – [Surname] structure and an unfortunate number of the systems I have to deal with assume all names share this structure. It’s easy to see how this yields problematic results. For instance, if you want to a see a sorted list of users, you probably want to sort by family name, but many systems sort by the name in the last position causing, for instance, Chinese names 1 to be handled differently from Western ones. 2 But it’s not only that someone might not have a middle name, or might have two middle names, or might have a family name in the first position—no, even that would be too simple! Some name components defy simple classifications. I once met a person named “Bus Stop”. “Stop” is clearly not a family affiliation, despite coming in the final position of the name. Sometimes the second component of a tripartite Western name isn’t a middle name at all, but a maiden name or the second word of a two-word first name (e.g. “Mary Anne” or “Lady Bird”)! One cannot even determine by looking at a familiar structure the roles of all of a name’s pieces!

Names are also contextual. One’s name with family, with legal institutions, and with classmates can all differ. Many of our international students have alternative Westernized first names. Their family may call them Qiáng but they introduce themselves as Brian in class. We ask for a “preferred name” in a lot of systems, which is a nice step forward, but don’t ask when it’s preferred. Names might be meant for different situations. We have no system remotely ready for this, despite the personalization that’s been seeping into web platforms for decades.

So if names are such a trouble, why not do our best and move on? Aren’t these fringe cases that don’t affect the vast majority of our users? These issues simply cannot be ignored because names are vital. What one is called, even if it’s not a stable identifier, has great effects on one’s life. It’s dispiriting to witness one’s name misspelled, mispronounced, treated as an inconvenience, botched at every turn. A system that won’t adapt to suit a name delegitimizes the name. It says, “oh that’s not your real name” as if names had differing degrees of reality. But a person may have multiple names—or many overlapping names over time—and while one may be more institutionally recognized at a given time, none are less real than the others. If even a single student a year is affected, it’s the absolute least amount of respect we can show to affirm their name(s).

So what do we to do? Endless enumerations of the difficulties of working with names does little but paralyze us. Honestly, when I consider about the best implementation of personal names, the MODS metadata schema comes to mind. Having a <name> element with any number of <namePart> children is the best model available. The <namePart>s can be ordered in particular ways, a “@type” attribute can define a part’s function 3, a record can include multiple names referencing the same person, multiple names with distinct parts can be linked to the same authority record, etc. MODS has a flexible and comprehensive treatment of name data. Unfortunately, returning to “Falsehoods Programmers Believe”, none of the library systems I administer do anywhere near as good a job as this metadata schema. Nor is it necessarily a problem with Western bias—even the Chinese government can’t develop computer systems to accurately represent the names of people in the country, or even agree on what the legal character set should be! 4 It seems that programmers start their apps by creating a “users” database table with columns for unique identifier, username, “firstname”/”lastname” [sic], and work from there. On the bright side, the name isn’t used as the identifier at least! We all learned that in databases class but we didn’t learn to make “names” a separate table linked to “users” in our relational databases.

In my day-to-day work, the best I’ve done is to be sensitive to the importance of names changes specifically and how our systems handle them. After a few meetings with a cross-departmental team, we developed a name change process at our college. System administrators from across the institution are on a shared listserv where name changes are announced. In the libraries, I spoke with our frontline service staff about assisting with name changes. Our people at the circulation desk know to notice name discrepancies—sometimes a name badge has been updated but not our catalog records, we can offer to make them match—but also to guide students who may need to contact the registrar or other departments on campus to initiate the top-down name change process. While most of our the library’s systems don’t easily accommodate username changes, I can write administrative scripts for our institutional repository that alter the ownership of a set of items from an old username to a new one. I think it’s important to remember that we’re inconveniencing the user with the work of implementing their name change and not the other way around. So taking whatever extra steps we can do on our own, without pushing labor onto our students and staff, is the best way we can mitigate how poorly our tools are able to support the protean nature of personal names.


  1. Chinese names typically have the surname first, followed by the given name.
  2. Another poor implementation can be seen in The Chicago Manual of Style‘s indexing instructions, which has an extensive list of exceptions to the Western norm and how to handle them. But CMoS provides no guidance on how one would go about identifying a name’s cultural background or, for instance, identifying a compound surname.
  3. Although the MODS user guidelines sadly limit the use of the type attribute to a fixed list of values which includes “family” and “given”, rendering it subject to most of the critiques in this post. Substantially expanding this list with “maiden”, “patronymic/matronymic” (names based on a parental given name, e.g. Mikhailovich), and more, as well as some sort of open-ended “other” option, would be a great improvement.

2 thoughts on “Names are Hard”

  1. Have you ever tried library cataloguing rules? They have 150 years+ experience in dealing with these issues.

    1. I have not because I’m not a cataloger, though I work with metadata (thus the MODS example). I took a peak just now at OCLC’s guidelines for personal names and saw that, for instance, the 1st indicator in MARC fields can only denote whether a name is a forename, surname, or family name which obviously omits all types of names I discussed in this post. I’m curious what RDA’s personal name instructions are but they’re not something I’ve come across in my work.

Leave a Reply

Your email address will not be published. Required fields are marked *