Venues, Volumes, and Events - acl-org/acl-anthology GitHub Wiki

Venues

Venues in the Anthology are the conferences, journals, and workshops in which papers are published. Each venue has its own venue page. Examples include ACL, EMNLP, and workshop venues such as MWE and AfricaNLP.

Venues are first-order objects in the Anthology, defined in data/yaml/venues.yaml. Here is an example entry for ACL:

acl:
  acronym: ACL
  is_acl: true
  is_toplevel: true
  name: Annual Meeting of the Association for Computational Linguistics
  oldstyle_letter: P

The top-level key is the venue's slug. It identifies the venue throughout the Anthology (for example, in its venue page URL) and, since 2020, also forms part of the Anthology ID. Prior to 2020, some venues were instead identified in Anthology IDs by a single letter, e.g., P for ACL, as indicated here. (Historically, "P" stood for "Proceedings", from the time when there was only ACL to worry about.) The default letter was W ("workshop"), a catch-all that placed all venues without a letter of their own into a single undifferentiated category. Each old-style letter should be associated with only one venue; if it is assigned to two or more venues, it ends up associated with whichever of them was added last.
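To make the "last one wins" rule concrete, here is a minimal Python sketch that builds a letter-to-venue index from entries shaped like parsed venues.yaml data. The function build_letter_index and the venue fakevenue are hypothetical, and the sketch assumes entries are processed in file order; it is not the Anthology's actual loading code.

```python
from typing import Dict, Optional

# Entries shaped like parsed data/yaml/venues.yaml (only relevant keys shown).
# "fakevenue" is hypothetical and deliberately reuses the letter "P".
VENUES: Dict[str, dict] = {
    "acl": {"acronym": "ACL", "oldstyle_letter": "P"},
    "emnlp": {"acronym": "EMNLP", "oldstyle_letter": "D"},
    "fakevenue": {"acronym": "FAKE", "oldstyle_letter": "P"},
}


def build_letter_index(venues: Dict[str, dict]) -> Dict[str, str]:
    """Map each old-style letter to a venue slug (hypothetical helper)."""
    index: Dict[str, str] = {}
    for slug, entry in venues.items():  # dicts keep insertion (file) order
        letter: Optional[str] = entry.get("oldstyle_letter")
        if letter is not None:
            index[letter] = slug  # a later venue overwrites an earlier one
    return index


print(build_letter_index(VENUES))  # {'P': 'fakevenue', 'D': 'emnlp'}
```

Because the index is a plain dictionary, a duplicate letter silently replaces the earlier mapping instead of raising an error, which is why each letter should be assigned to only one venue.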

acronym is the human-readable acronym, which may contain hyphens and mixed capitalization, and name is the venue's full name. The name should be that of the venue itself, not of a particular event held by that venue.

There are a number of boolean fields, all of which default to false. is_acl denotes whether this is an ACL venue (conference, journal, or publication). Unfortunately, this is not always clear-cut and can change across years, but generally a venue counts as an ACL venue if it is managed by ACL or one of its regional chapters, or if it is associated with one of the ACL Special Interest Groups (SIGs).

is_toplevel determines whether the venue is displayed on the front page of the Anthology.
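The defaulting behavior can be illustrated with a small Python sketch that turns entries (as a YAML loader would return them) into typed records and then selects the venues shown on the front page. The Venue class, load_venues, and the examplews entry are hypothetical; the field names simply mirror the YAML keys above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Venue:
    """One entry of venues.yaml; field names mirror the YAML keys."""
    slug: str
    acronym: str
    name: str
    is_acl: bool = False       # boolean flags default to False when omitted
    is_toplevel: bool = False
    oldstyle_letter: Optional[str] = None


def load_venues(entries: Dict[str, dict]) -> List[Venue]:
    # Unknown keys raise TypeError, which catches typos in the YAML keys.
    return [Venue(slug=slug, **entry) for slug, entry in entries.items()]


ENTRIES = {
    "acl": {
        "acronym": "ACL",
        "is_acl": True,
        "is_toplevel": True,
        "name": "Annual Meeting of the Association for Computational Linguistics",
        "oldstyle_letter": "P",
    },
    # Hypothetical workshop venue that sets no flags at all.
    "examplews": {"acronym": "ExampleWS", "name": "Hypothetical Example Workshop"},
}

venues = load_venues(ENTRIES)
front_page = [v.acronym for v in venues if v.is_toplevel]
print(front_page)  # ['ACL']
```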

The raw data for each venue is stored in XML files under data/xml/, one file per year. For example, ACL's 2020 data is in 2020.acl.xml; for 2019 and earlier, it is in Pyy.xml, where yy is the two-digit year (e.g., P19.xml).
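The naming rule can be summarized in a short Python sketch. The function collection_filename is hypothetical, and it assumes a clean cutover at 2020 and that every pre-2020 venue without its own letter lives in the catch-all W file; real data may contain exceptions.

```python
from typing import Optional


def collection_filename(venue_slug: str, year: int,
                        oldstyle_letter: Optional[str] = None) -> str:
    """Guess the XML file holding a venue's data for a year (hypothetical)."""
    if year >= 2020:
        # New style: <year>.<venue slug>.xml, e.g. 2020.acl.xml
        return f"{year}.{venue_slug}.xml"
    # Old style: <letter><two-digit year>.xml; venues without their own
    # letter fall back to the catch-all workshop letter W.
    letter = oldstyle_letter or "W"
    return f"{letter}{year % 100:02d}.xml"


print(collection_filename("acl", 2020, "P"))  # 2020.acl.xml
print(collection_filename("acl", 2019, "P"))  # P19.xml
```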

Volumes

Venues issue volumes. Prior to 2020, volumes were identified by a two-digit number for W and a single-digit number for all other letters; example volume IDs are P19-5 and W19-52. Volume pages are listed under https://www.aclweb.org/anthology/volumes.

Since 2020, volume names are not restricted in this way and can be arbitrary strings matching [a-z0-9]+. Examples include 2020.acl-main (volume "main" of the 2020.acl collection) and 2020.autosimtrans-1.
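A volume ID can therefore be split into its collection and volume name with two patterns. This Python sketch is hypothetical (parse_volume_id and VolumeId are not Anthology APIs) and only checks the shapes described above; the Anthology's own code may handle additional historical exceptions.

```python
import re
from typing import NamedTuple


class VolumeId(NamedTuple):
    collection: str  # e.g. "2020.acl" (new style) or "P19" (old style)
    volume: str      # e.g. "main", "1", "5", "52"


# New style: <year>.<venue>-<name>, with venue and name drawn from [a-z0-9]+.
NEW_STYLE = re.compile(r"^(\d{4}\.[a-z0-9]+)-([a-z0-9]+)$")
# Old style: <letter><yy>-<number>; by convention one digit, or two for W
# (the digit count is not enforced here).
OLD_STYLE = re.compile(r"^([A-Z]\d{2})-(\d{1,2})$")


def parse_volume_id(volume_id: str) -> VolumeId:
    match = NEW_STYLE.match(volume_id) or OLD_STYLE.match(volume_id)
    if match is None:
        raise ValueError(f"unrecognized volume ID: {volume_id!r}")
    return VolumeId(match.group(1), match.group(2))


for vid in ["2020.acl-main", "2020.autosimtrans-1", "P19-5", "W19-52"]:
    print(vid, "->", parse_volume_id(vid))
```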

Each volume is associated with exactly one venue through its Anthology ID, but for old-style "W" volumes this venue is always the generic "workshop" venue. To associate these volumes with more specific venues, we need the joint.yaml file described in the next section.
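The mapping from ID to venue can be sketched in Python as follows. venue_of_volume and the small letter table are hypothetical, and the slug ws for the generic workshop venue is an assumption; the point is that an old-style W ID carries no information about which workshop it belongs to.

```python
import re

# Hypothetical excerpt of an old-style letter table (see oldstyle_letter in
# venues.yaml); "ws" as the slug of the generic workshop venue is assumed.
OLDSTYLE_LETTERS = {"P": "acl", "D": "emnlp", "W": "ws"}

NEW_STYLE = re.compile(r"^\d{4}\.([a-z0-9]+)-[a-z0-9]+$")


def venue_of_volume(volume_id: str) -> str:
    """Return the single venue implied by a volume's Anthology ID."""
    match = NEW_STYLE.match(volume_id)
    if match:
        return match.group(1)  # the slug between the year and the hyphen
    # Old style: the leading letter decides; every W volume maps to "ws".
    return OLDSTYLE_LETTERS[volume_id[0]]


print(venue_of_volume("2020.findings-emnlp"))  # findings
print(venue_of_volume("W19-52"))               # ws
```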

Events

An event is a meeting of a venue in a particular year, e.g., EMNLP 2020. Events are not first-order objects; instead, they are inferred from the per-year XML files for each venue, so there is one event per venue per year. However, a collection file (e.g., 2025.acl.xml) can contain an <event> block that lists both metadata for the event (location, dates, website URL) and the set of volumes that were presented at it.
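Since events are implied by the files, listing them amounts to reading collection IDs off the file names. In this hypothetical Python sketch, infer_events handles only new-style file names and builds event IDs in the venue-year form used by the <event id="emnlp-2020"> example on this page; it does not read any <event> metadata.

```python
from pathlib import PurePath
from typing import Iterable, List


def infer_events(filenames: Iterable[str]) -> List[str]:
    """Derive one event ID per venue per year from new-style file names."""
    events = set()
    for filename in filenames:
        collection_id = PurePath(filename).stem  # "2020.emnlp.xml" -> "2020.emnlp"
        year, venue = collection_id.split(".", 1)
        events.add(f"{venue}-{year}")  # same shape as <event id="emnlp-2020">
    return sorted(events)


print(infer_events(["2020.acl.xml", "2020.emnlp.xml", "2025.acl.xml"]))
# ['acl-2020', 'acl-2025', 'emnlp-2020']
```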

Events can be joint events. This term unfortunately has two meanings:

  1. Two venues might issue a joint proceedings volume; for example, in 2015, ACL and IJCNLP held a joint conference in Beijing.
  2. Two venues might be co-located; for example, a workshop held alongside a larger conference.

Joint volumes

Volumes are associated with venues using <venue> tags in the volume's <meta> block. These tags determine (a) which venue pages and (b) which event pages a volume is displayed on. A joint volume is handled by adding one <venue> tag per venue, so that the volume appears under each of them. The Anthology ID, however, can use only one of the venues in its name. For example, in 2024, MWE and UDW held a joint workshop: the Anthology ID uses mwe, but the two <venue> tags cause the volume to be displayed under both venues.
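The effect of repeated <venue> tags can be sketched with Python's standard xml.etree.ElementTree module. The XML here is a trimmed, illustrative stand-in for a collection file (real files carry much more metadata), and venue_pages is a hypothetical helper, not part of the Anthology's codebase.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Illustrative collection file; the booktitle is a placeholder.
COLLECTION_XML = """
<collection id="2024.mwe">
  <volume id="1">
    <meta>
      <booktitle>Placeholder title for a joint MWE-UDW workshop</booktitle>
      <venue>mwe</venue>
      <venue>udw</venue>
    </meta>
  </volume>
</collection>
"""


def venue_pages(xml_text: str) -> Dict[str, List[str]]:
    """Map each venue slug to the full IDs of the volumes tagged with it."""
    root = ET.fromstring(xml_text.strip())
    collection_id = root.get("id")
    pages: Dict[str, List[str]] = defaultdict(list)
    for volume in root.findall("volume"):
        volume_id = f"{collection_id}-{volume.get('id')}"
        for venue in volume.findall("meta/venue"):
            pages[venue.text].append(volume_id)
    return dict(pages)


print(venue_pages(COLLECTION_XML))
# {'mwe': ['2024.mwe-1'], 'udw': ['2024.mwe-1']}
```

A single volume ID thus shows up under every venue it is tagged with, even though the ID itself names only mwe.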

A notable use of the <venue> tag is the ws venue. Every workshop volume is assigned this venue in addition to its own, so the volume appears under both. For example, the 2025 AfricaNLP volume is tagged with both <venue>africanlp</venue> and <venue>ws</venue>. The ws venue makes it easy to browse all workshop volumes published in a given year.
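Browsing workshops then reduces to a filter on the ws tag. In this hypothetical Python sketch, the volume IDs and their tags are illustrative, and the year test assumes new-style IDs (old-style IDs such as W19-52 encode the year differently).

```python
from typing import Dict, List

# Illustrative volume -> <venue> tags table; the IDs are examples only.
VOLUME_VENUES: Dict[str, List[str]] = {
    "2025.africanlp-1": ["africanlp", "ws"],
    "2025.acl-long": ["acl"],
    "2024.mwe-1": ["mwe", "udw", "ws"],
}


def workshops_in_year(volume_venues: Dict[str, List[str]], year: int) -> List[str]:
    """List volumes tagged with the ws venue whose new-style ID starts with year."""
    prefix = f"{year}."
    return sorted(
        volume_id
        for volume_id, venues in volume_venues.items()
        if "ws" in venues and volume_id.startswith(prefix)
    )


print(workshops_in_year(VOLUME_VENUES, 2025))  # ['2025.africanlp-1']
```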

Colocated volumes

Second, events can be colocated. This is recorded using a <colocated> block inside the <event> block of the parent venue's collection file, which lists all the volumes (such as workshops) presented alongside it. For EMNLP 2020, 2020.emnlp.xml contains a block like this:

<collection id="2020.emnlp">
  <!-- elided -->
  <event id="emnlp-2020">
    <colocated>
      <volume-id>2020.findings-emnlp</volume-id>
      <volume-id>2020.alw-1</volume-id>
      <volume-id>2020.blackboxnlp-1</volume-id>
      <volume-id>2020.clinicalnlp-1</volume-id>
      <volume-id>2020.cmcl-1</volume-id>
      <volume-id>2020.codi-1</volume-id>
      <volume-id>2020.deelio-1</volume-id>
      <volume-id>2020.eval4nlp-1</volume-id>
      <volume-id>2020.insights-1</volume-id>
      <volume-id>2020.intexsempar-1</volume-id>
      <volume-id>2020.louhi-1</volume-id>
      <volume-id>2020.nlpbt-1</volume-id>
      <volume-id>2020.nlpcovid19-2</volume-id>
      <volume-id>2020.nlpcss-1</volume-id>
      <volume-id>2020.nlposs-1</volume-id>
      <volume-id>2020.privatenlp-1</volume-id>
      <volume-id>2020.scai-1</volume-id>
      <volume-id>2020.sdp-1</volume-id>
      <volume-id>2020.sigtyp-1</volume-id>
      <volume-id>2020.splu-1</volume-id>
      <volume-id>2020.spnlp-1</volume-id>
      <volume-id>2020.sustainlp-1</volume-id>
      <volume-id>2020.wmt-1</volume-id>
      <volume-id>2020.wnut-1</volume-id>
    </colocated>
  </event>
</collection>

This causes all of these volumes to appear on the EMNLP 2020 event page, but not on the EMNLP venue page. The venue page lists only the volumes published under that venue, whereas the event page also lists volumes associated with it by virtue of being presented at the shared event.
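The difference between the two pages can be sketched in Python with xml.etree.ElementTree. The XML is a trimmed, illustrative version of the EMNLP 2020 file, and venue_and_event_pages is hypothetical; in particular, the assumption that the event page shows the venue's own volumes followed by the colocated ones is a simplification of how the site is actually built.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Trimmed, illustrative version of 2020.emnlp.xml (most content elided).
EMNLP_XML = """
<collection id="2020.emnlp">
  <volume id="main"><meta><venue>emnlp</venue></meta></volume>
  <event id="emnlp-2020">
    <colocated>
      <volume-id>2020.findings-emnlp</volume-id>
      <volume-id>2020.wmt-1</volume-id>
    </colocated>
  </event>
</collection>
"""


def venue_and_event_pages(xml_text: str, venue: str) -> Tuple[List[str], List[str]]:
    root = ET.fromstring(xml_text.strip())
    collection_id = root.get("id")
    # Venue page: only volumes of this collection tagged with the venue.
    own = [
        f"{collection_id}-{vol.get('id')}"
        for vol in root.findall("volume")
        if venue in [tag.text for tag in vol.findall("meta/venue")]
    ]
    # Event page: the venue's own volumes plus everything listed as colocated.
    colocated = [el.text for el in root.findall("event/colocated/volume-id")]
    return own, own + colocated


venue_page, event_page = venue_and_event_pages(EMNLP_XML, "emnlp")
print(venue_page)  # ['2020.emnlp-main']
print(event_page)  # ['2020.emnlp-main', '2020.findings-emnlp', '2020.wmt-1']
```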
