Venues, Volumes, and Events - acl-org/acl-anthology GitHub Wiki

Venues

Venues in the Anthology are conferences and journals where papers are published. Each venue has its own venue page. Examples include the following:

Venues have first-order representation in the Anthology in data/yaml/venues.yaml. Here is an example entry for ACL:

acl:
  acronym: ACL
  is_acl: true
  is_toplevel: true
  name: Annual Meeting of the Association for Computational Linguistics
  oldstyle_letter: P

The top-level key is the venue's slug. It is used to create the venue's name and, since 2020, also appears in the venue's Anthology IDs. Prior to 2020, some venues instead used a single letter in their Anthology IDs, e.g., P for ACL, as indicated here. (Historically, "P" stood for "Proceedings", back when ACL was the only venue to worry about.) The default letter was W ("workshop"), a catch-all that placed all venues without a letter of their own into a single undifferentiated category. Each oldstyle letter can be associated with only one venue; if it is assigned to two or more venues, only the last venue it was added to will use it.
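The "last one wins" rule for oldstyle letters can be pictured with a small sketch. It assumes venues.yaml has already been loaded into a Python dict whose order matches the file order; the helper `build_letter_map` and the second venue `example` are hypothetical and not part of the Anthology codebase.

```python
def build_letter_map(venues: dict[str, dict]) -> dict[str, str]:
    """Map each oldstyle letter to a venue slug (hypothetical helper).

    Python dicts preserve insertion order, so iteration follows file order,
    and a later venue that reuses a letter overwrites the earlier one.
    """
    letter_to_slug: dict[str, str] = {}
    for slug, entry in venues.items():
        letter = entry.get("oldstyle_letter")
        if letter is not None:
            letter_to_slug[letter] = slug
    return letter_to_slug


venues = {
    "acl": {"acronym": "ACL", "oldstyle_letter": "P"},
    # Hypothetical second venue that mistakenly reuses "P".
    "example": {"acronym": "EX", "oldstyle_letter": "P"},
}
print(build_letter_map(venues))  # {'P': 'example'}
```

Because the earlier assignment is silently lost, a real loader might prefer to raise an error on a duplicate letter instead.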

acronym is the human-written acronym, which may contain hyphens and mixed capitalization, and name is the full venue name. This should be the name of the venue itself, not of an event held by that venue.

There are a number of boolean fields, which default to false. is_acl denotes whether this is an ACL venue (conference, journal, or other publication). Unfortunately, this is not always clear-cut and can change from year to year, but generally a venue counts as an ACL venue if it is managed by ACL or one of its regional chapters, or if it is associated with one of the ACL Special Interest Groups.

is_toplevel determines whether the venue is displayed on the front page of the Anthology.
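One way to picture a venues.yaml entry, including the false defaults for the boolean fields, is as a small Python dataclass. This is a sketch only: the `Venue` class, the `venue_from_entry` helper, and the `example` venue are hypothetical, and the entry is assumed to have been parsed from YAML into a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Venue:
    """One venues.yaml entry (hypothetical model); flags default to False."""
    slug: str
    acronym: str
    name: str
    is_acl: bool = False
    is_toplevel: bool = False
    oldstyle_letter: Optional[str] = None


def venue_from_entry(slug: str, entry: dict) -> Venue:
    # Fields missing from the entry fall back to False (flags) or None (letter).
    return Venue(
        slug=slug,
        acronym=entry["acronym"],
        name=entry["name"],
        is_acl=entry.get("is_acl", False),
        is_toplevel=entry.get("is_toplevel", False),
        oldstyle_letter=entry.get("oldstyle_letter"),
    )


acl = venue_from_entry("acl", {
    "acronym": "ACL",
    "is_acl": True,
    "is_toplevel": True,
    "name": "Annual Meeting of the Association for Computational Linguistics",
    "oldstyle_letter": "P",
})
# Hypothetical venue that sets no flags at all.
example = venue_from_entry("example", {"acronym": "EX", "name": "Example Workshop"})

# Only venues with is_toplevel set would appear on the front page.
front_page = [v.acronym for v in (acl, example) if v.is_toplevel]
print(front_page)  # ['ACL']
```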

The raw data for each venue is stored in XML files, one per year. For example, the file for ACL 2020 is 2020.acl.xml; for 2019 and earlier, it was Pyy.xml (where yy is the two-digit year).
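The two naming schemes can be summarized in a short sketch. The helper `xml_filename` is hypothetical; it assumes the switch happened exactly in 2020 and that a venue without its own oldstyle letter fell back to the shared W file, as described above.

```python
from typing import Optional


def xml_filename(slug: str, year: int, oldstyle_letter: Optional[str] = None) -> str:
    """Hypothetical helper: XML file holding a venue's volumes for one year."""
    if year >= 2020:
        # New style: <year>.<venue slug>.xml
        return f"{year}.{slug}.xml"
    # Old style: <letter><two-digit year>.xml; venues without their own
    # letter shared the catch-all "W" (workshop) file.
    letter = oldstyle_letter or "W"
    return f"{letter}{year % 100:02d}.xml"


print(xml_filename("acl", 2020, "P"))  # 2020.acl.xml
print(xml_filename("acl", 2019, "P"))  # P19.xml
print(xml_filename("wmt", 2006))       # W06.xml
```

Note that an old-style W file such as W06.xml is shared by every workshop of that year, which is why per-venue information has to come from elsewhere.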

Volumes

Venues issue volumes. Prior to 2020, a volume ID consisted of the letter and two-digit year followed by a volume number: two digits for W and a single digit for all other letters. Volumes are listed under https://www.aclweb.org/anthology/volumes. Example volume IDs are P19-5 and W19-52.
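The old-style format can be checked with a regular expression plus the digit-count rule. This is a minimal sketch; `parse_old_volume_id` is a hypothetical helper, not an Anthology API.

```python
import re

# Old-style volume IDs: a letter, a two-digit year, a hyphen, and a volume
# number (two digits for "W", one digit for every other letter).
OLD_STYLE = re.compile(r"^([A-Z])(\d{2})-(\d+)$")


def parse_old_volume_id(volume_id: str) -> tuple[str, int, int]:
    """Return (letter, two-digit year, volume number) or raise ValueError."""
    match = OLD_STYLE.match(volume_id)
    if match is None:
        raise ValueError(f"not an old-style volume ID: {volume_id!r}")
    letter, year, number = match.groups()
    expected_digits = 2 if letter == "W" else 1
    if len(number) != expected_digits:
        raise ValueError(f"{letter} volumes use {expected_digits}-digit numbers: {volume_id!r}")
    return letter, int(year), int(number)


print(parse_old_volume_id("P19-5"))   # ('P', 19, 5)
print(parse_old_volume_id("W19-52"))  # ('W', 19, 52)
```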

Modern volume IDs are not restricted in this way: after the year and the venue slug, the volume name can be any string matching [a-z0-9]+. Examples include 2020.acl-main and 2020.autosimtrans-1.
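A new-style ID can be split into year, venue slug, and volume name. This sketch assumes that venue slugs themselves contain no hyphen, so the first hyphen after the year separates venue from volume; `parse_new_volume_id` is a hypothetical helper.

```python
import re

# New-style volume IDs: <year>.<venue slug>-<volume>, volume matching [a-z0-9]+.
# Assumption: the venue slug is also [a-z0-9]+ (no hyphen).
NEW_STYLE = re.compile(r"^(\d{4})\.([a-z0-9]+)-([a-z0-9]+)$")


def parse_new_volume_id(volume_id: str) -> tuple[int, str, str]:
    """Return (year, venue slug, volume name) or raise ValueError."""
    match = NEW_STYLE.match(volume_id)
    if match is None:
        raise ValueError(f"not a new-style volume ID: {volume_id!r}")
    year, venue, volume = match.groups()
    return int(year), venue, volume


print(parse_new_volume_id("2020.acl-main"))        # (2020, 'acl', 'main')
print(parse_new_volume_id("2020.autosimtrans-1"))  # (2020, 'autosimtrans', '1')
```

Under the no-hyphen assumption, an ID such as 2020.findings-emnlp parses as venue "findings" with volume "emnlp".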

Each volume is associated with exactly one venue through its Anthology ID, but for oldstyle "W" volumes this venue is always the generic "workshop" venue. To associate these volumes with more specific venues, we need the joint.yaml file described in the next section.
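The venue implied by an ID alone, before joint.yaml is consulted, can be sketched as follows. The helper `primary_venue` is hypothetical, and the slug "workshop" for the generic W venue is a placeholder assumption; the real slug may differ.

```python
import re

OLD_STYLE = re.compile(r"^([A-Z])\d{2}-\d+$")
NEW_STYLE = re.compile(r"^\d{4}\.([a-z0-9]+)-[a-z0-9]+$")


def primary_venue(volume_id: str, letter_to_slug: dict[str, str]) -> str:
    """Venue implied by a volume's Anthology ID alone (hypothetical helper)."""
    new = NEW_STYLE.match(volume_id)
    if new:
        # New style: the venue slug is embedded in the ID.
        return new.group(1)
    old = OLD_STYLE.match(volume_id)
    if old:
        # Old style: look up the letter; "W" maps to the generic workshop venue.
        return letter_to_slug[old.group(1)]
    raise ValueError(f"unrecognized volume ID: {volume_id!r}")


# Assumed excerpt of the letter mapping; "workshop" is a placeholder slug.
letters = {"P": "acl", "W": "workshop"}
print(primary_venue("2020.acl-main", letters))  # acl
print(primary_venue("P15-1", letters))          # acl
print(primary_venue("W19-52", letters))         # workshop
```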

Events

An event is a meeting of a venue in a particular year, e.g., EMNLP 2020. Events are not represented as first-order objects; they are inferred from the file representation of venues, so there is one event per venue per year.
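Since there is one event per venue per year, events can be recovered by collapsing volume IDs into (venue, year) pairs. This sketch handles only new-style IDs (old-style W IDs would additionally need joint.yaml); `infer_events` and the ID 2020.acl-demo are hypothetical.

```python
import re

NEW_STYLE = re.compile(r"^(\d{4})\.([a-z0-9]+)-[a-z0-9]+$")


def infer_events(volume_ids: list[str]) -> set[tuple[str, int]]:
    """Collapse new-style volume IDs into (venue, year) events."""
    events: set[tuple[str, int]] = set()
    for volume_id in volume_ids:
        match = NEW_STYLE.match(volume_id)
        if match:  # old-style IDs are skipped in this sketch
            year, venue = match.groups()
            events.add((venue, int(year)))
    return events


# The second ID is hypothetical; it belongs to the same event as the first.
volumes = ["2020.acl-main", "2020.acl-demo", "2020.autosimtrans-1"]
print(sorted(infer_events(volumes)))  # [('acl', 2020), ('autosimtrans', 2020)]
```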

Events can be joint events. This term unfortunately has two meanings:

  1. Two venues might issue joint proceedings; for example, in 2015, ACL and IJCNLP held a joint conference in Beijing.
  2. Two venues might be co-located; for example, when a workshop is attached to a larger conference.

The file joint.yaml comes into play in both situations. In the first situation, the proceedings have to be assigned an Anthology ID, so we must choose one of the venue slugs. For ACL-IJCNLP 2015, P was used, which automatically associates the volumes with ACL. However, we also want the volumes of this event to be associated with the IJCNLP venue. To do this, we create the following entry in data/yaml/joint.yaml:

ijcnlp:
  2015:
  - P15-1
  - P15-2
  - P15-3
  - P15-4
  - P15-5

This associates the five volumes of ACL 2015 with IJCNLP.
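To find every venue a volume belongs to, joint.yaml can be inverted into a volume-to-venues index and combined with the venue implied by the ID. This sketch assumes joint.yaml has been loaded into a dict with integer year keys (as unquoted YAML years load); `extra_venues` and `all_venues` are hypothetical helpers.

```python
from collections import defaultdict

# Excerpt of joint.yaml from the entry above, as a Python dict.
joint = {
    "ijcnlp": {2015: ["P15-1", "P15-2", "P15-3", "P15-4", "P15-5"]},
}


def extra_venues(joint: dict[str, dict[int, list[str]]]) -> dict[str, set[str]]:
    """Invert joint.yaml: volume ID -> venues it is additionally attached to."""
    result: dict[str, set[str]] = defaultdict(set)
    for venue, years in joint.items():
        for volume_ids in years.values():
            for volume_id in volume_ids:
                result[volume_id].add(venue)
    return dict(result)


def all_venues(volume_id: str, primary: str, index: dict[str, set[str]]) -> set[str]:
    # The primary venue comes from the Anthology ID; joint.yaml adds more.
    return {primary} | index.get(volume_id, set())


index = extra_venues(joint)
print(sorted(all_venues("P15-1", "acl", index)))  # ['acl', 'ijcnlp']
```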

Second, events can be co-located. This is handled the same way. For example, to make Findings of EMNLP and all EMNLP 2020 workshops appear on the EMNLP page, we use:

emnlp:
  2020: [2020.findings-emnlp, 2020.alw-1, 2020.blackboxnlp-1, 2020.clinicalnlp-1, 2020.cmcl-1, 2020.codi-1, 2020.deelio-1, 2020.eval4nlp-1, 2020.insights-1, 2020.intexsempar-1, 2020.louhi-1, 2020.nlpbt-1, 2020.nlpcovid19-2, 2020.nlpcss-1, 2020.nlposs-1, 2020.privatenlp-1, 2020.scai-1, 2020.sdp-1, 2020.sigtyp-1, 2020.splu-1, 2020.spnlp-1, 2020.sustainlp-1, 2020.wnut-1]

Note the alternative but equivalent single-line YAML formatting here, as well as the new-style volume IDs.
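A venue page for one year could then list the venue's own volumes followed by any volumes attached through joint.yaml. In this sketch, `venue_page_volumes` is hypothetical, the EMNLP main volume in `own` is an illustrative assumption, and `joint` holds only an excerpt of the entry above.

```python
def venue_page_volumes(
    venue: str,
    year: int,
    own: dict[tuple[str, int], list[str]],
    joint: dict[str, dict[int, list[str]]],
) -> list[str]:
    """Volumes shown for a venue and year: own volumes, then joint.yaml additions."""
    listed = list(own.get((venue, year), []))
    for volume_id in joint.get(venue, {}).get(year, []):
        if volume_id not in listed:  # avoid listing a volume twice
            listed.append(volume_id)
    return listed


own = {("emnlp", 2020): ["2020.emnlp-main"]}  # illustrative assumption
joint = {"emnlp": {2020: ["2020.findings-emnlp", "2020.alw-1", "2020.wnut-1"]}}  # excerpt
print(venue_page_volumes("emnlp", 2020, own, joint))
# ['2020.emnlp-main', '2020.findings-emnlp', '2020.alw-1', '2020.wnut-1']
```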

The joint.yaml file is also needed to associate oldstyle "W" volumes with their more specific venues. For example, the complete volume history of the Conference on Machine Translation (WMT) is established with this entry:

wmt:
  2006:
  - W06-31
  2007:
  - W07-07
  2008:
  - W08-03
  2009:
  - W09-04
  2010:
  - W10-17
  2011:
  - W11-21
  2012:
  - W12-31
  2013:
  - W13-22
  2014:
  - W14-33
  2015:
  - W15-30
  2016:
  - W16-22
  - W16-23
  2017:
  - W17-47
  2018:
  - W18-63
  - W18-64
  2019:
  - W19-52
  - W19-53
  - W19-54
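Because old-style IDs encode a two-digit year, an entry like this one can be sanity-checked against its year keys. This is a hypothetical consistency check, not an existing Anthology tool; it uses only an excerpt of the WMT entry, and the mismatching W11-21 under 2010 in the usage line is a deliberately invented typo.

```python
import re

OLD_STYLE = re.compile(r"^[A-Z](\d{2})-\d+$")

# Excerpt of the WMT entry above, loaded as a dict with integer year keys.
wmt = {
    2006: ["W06-31"],
    2007: ["W07-07"],
    2016: ["W16-22", "W16-23"],
    2019: ["W19-52", "W19-53", "W19-54"],
}


def check_years(history: dict[int, list[str]]) -> list[tuple[int, str]]:
    """Return (year, ID) pairs whose encoded two-digit year disagrees with the key."""
    mismatches = []
    for year, volume_ids in sorted(history.items()):
        for volume_id in volume_ids:
            match = OLD_STYLE.match(volume_id)
            if match and int(match.group(1)) != year % 100:
                mismatches.append((year, volume_id))
    return mismatches


print(check_years(wmt))                    # []
print(check_years({2010: ["W11-21"]}))     # [(2010, 'W11-21')]
print(sum(len(v) for v in wmt.values()))   # 7
```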