Name Variants - acl-org/acl-anthology GitHub Wiki

The Anthology needs to know how to map names of authors/editors to individuals. It does this mainly via the file data/yaml/name_variants.yaml. Each entry in this file describes a different person; an example entry would be:

- canonical: Aravind Joshi
  id: aravind-joshi
  comment: Penn
  similar: [aravind-q-joshi]
  variants:
  - Aravind K. Joshi
  - A. K. Joshi

Only the canonical field is required.

There are two ways to indicate a mapping from a name to a person.

Variant method. If a person goes by multiple names, like Aravind Joshi and Aravind K. Joshi, all of the names should be entered in that person's entry in the file: one as the canonical name and the rest under variants.

The canonical name is the one that the Anthology displays by default. Variant names must be globally unique: a given variant may appear in only one entry in the file.
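As a rough illustration of the variant method, the Python sketch below turns entries into a name lookup and rejects a variant claimed by two people. It assumes the YAML has already been loaded into plain dicts and that every entry has an id; the functions build_name_table and resolve are hypothetical, not part of the Anthology codebase. Because two people may share a canonical name, such a name maps to more than one ID and is left unresolved.

```python
def build_name_table(entries):
    """Map each name string to the set of person IDs that use it.

    entries: list of dicts shaped like the YAML entries above; this
    sketch assumes every entry has an "id" field.
    """
    owners = {}
    for entry in entries:
        for name in [entry["canonical"], *entry.get("variants", [])]:
            owners.setdefault(name, set()).add(entry["id"])
    # Variants must be globally unique: reject any variant that
    # also belongs to another person.
    for entry in entries:
        for name in entry.get("variants", []):
            if len(owners[name]) > 1:
                raise ValueError(f"variant {name!r} is shared by {sorted(owners[name])}")
    return owners


def resolve(owners, name):
    """Return the person ID for a plain (un-annotated) name, or None.

    None means the name is unknown or shared by several people, in which
    case the XML would need the ID method instead.
    """
    ids = owners.get(name, set())
    return next(iter(ids)) if len(ids) == 1 else None
```

With the Aravind Joshi entry loaded, resolve(owners, "A. K. Joshi") would return "aravind-joshi".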

ID method. Alternatively, the person a name refers to can be indicated in the paper's XML file itself, using the id attribute:

<author id="aravind-joshi"><first>A.</first> <last>Joshi</last></author>
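A resolver that honors both methods might check for an explicit id attribute first and fall back to looking up the written name. The sketch below uses Python's standard xml.etree.ElementTree; the helper author_key, the shape of name_table (a plain dict from full name to person ID), and the rule that a full name is "first last" are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def author_key(elem, name_table):
    """Resolve an <author> element to a person ID (hypothetical helper).

    name_table: dict mapping a full name string to a person ID,
    assumed to be built from name_variants.yaml.
    """
    explicit_id = elem.get("id")
    if explicit_id is not None:
        # ID method: the XML names the person directly.
        return explicit_id
    # Variant method: look up the name as written.
    first = elem.findtext("first", default="").strip()
    last = elem.findtext("last", default="").strip()
    full = f"{first} {last}".strip()
    return name_table.get(full)  # None if the name is not listed
```

Checking the attribute first means an annotated "A. Joshi" never has to appear in the variant table at all.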

When should each of the two methods be used?

  • The variant method is better for names that are likely to be unique (because variant names must be unique) and likely to be reused (because a variant name is resolved automatically wherever it appears, with no annotation needed in the XML).

  • The ID method is better for names that are either likely to be non-unique (for example, a name abbreviated to use just a first initial) or unlikely to be reused (for example, a misspelling).

The Anthology enforces the constraint that each name must always use the variant method or always use the ID method. This is to reduce the chance of a newly ingested paper having an author name that is not resolved correctly.
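One way to picture this consistency check is to collect every occurrence of a name across the XML and flag names that appear both with and without an id attribute. This is a hypothetical sketch with an assumed input format of (name, id-or-None) pairs; the Anthology's actual validation code is not reproduced here and may differ.

```python
def check_consistent_method(occurrences):
    """Return names that are annotated with an ID in some places but not others.

    occurrences: iterable of (name, id_or_None) pairs, one per
    <author> or <editor> element (a hypothetical input format).
    """
    with_id, without_id = set(), set()
    for name, person_id in occurrences:
        # Sort each name into "ID method" or "variant method" usage.
        (with_id if person_id is not None else without_id).add(name)
    # A name in both sets mixes the two methods and violates the rule.
    return sorted(with_id & without_id)
```

An empty result means every name sticks to one method.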

An ID can be any unique string that uses only characters allowed in URLs. It is usually based on the author's canonical name. When two authors share a name, the ID can add a middle initial to distinguish them; failing that, the current convention is to append the author's PhD institution (e.g., aravind-joshi-upenn).
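The ID conventions above can be sketched as a small slug builder. Lowercasing, stripping accents, and joining words with hyphens are assumptions modeled on the example IDs on this page (aravind-joshi, aravind-joshi-upenn), not a documented Anthology rule; make_person_id is a hypothetical name.

```python
import re
import unicodedata


def make_person_id(canonical, taken, qualifier=None):
    """Build a URL-safe ID from a canonical name (hypothetical helper).

    canonical: display name, e.g. "Aravind Joshi"
    taken: set of IDs already in use
    qualifier: optional disambiguator such as an institution ("upenn")
    """
    # Decompose accented letters and drop the combining marks.
    ascii_name = unicodedata.normalize("NFKD", canonical).encode("ascii", "ignore").decode()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    if qualifier:
        slug = f"{slug}-{qualifier}"
    if slug in taken:
        raise ValueError(f"ID {slug!r} is already taken; add a middle initial or qualifier")
    return slug
```

For example, make_person_id("Aravind Joshi", {"aravind-joshi"}, qualifier="upenn") yields "aravind-joshi-upenn", while omitting the qualifier raises an error because the plain ID is taken.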

There are two other fields used to help identify people:

  • The comment field is displayed under a person's name on their Anthology page. It can be any text. Usually it lists past and current affiliations.

  • If two people have the same canonical name, the Anthology automatically adds them to each other's "People with similar names" list. If there are other people who have almost the same name, you can add their IDs to the similar field.
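The two sources of "similar" links can be combined as in the sketch below. It assumes the similar field holds a list of IDs (as described above) and treats an explicit link as symmetric so that both people list each other; the symmetry and the function name similar_people are assumptions, not documented behavior.

```python
from collections import defaultdict


def similar_people(entries):
    """Map each person ID to a sorted list of IDs considered similar (sketch)."""
    by_name = defaultdict(set)
    for e in entries:
        by_name[e["canonical"]].add(e["id"])
    result = defaultdict(set)
    for e in entries:
        pid = e["id"]
        # Automatic: everyone else sharing this canonical name.
        result[pid] |= by_name[e["canonical"]] - {pid}
        # Manual: IDs listed in "similar", linked in both directions.
        for other in e.get("similar", []):
            result[pid].add(other)
            result[other].add(pid)
    return {pid: sorted(ids) for pid, ids in result.items()}
```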
