Adding DOIs - acl-org/acl-anthology GitHub Wiki

We add DOIs to papers from official ACL events (conference proceedings and workshops). These have the form

10.18653/v1/${ANTHOLOGY_ID}

e.g.,

10.18653/v1/W19-5203

and can be accessed online via the DOI page (http://doi.org/10.18653/v1/W19-5203), which then redirects to the Anthology page (https://www.aclweb.org/anthology/W19-5203).

(Note: Prior to 2016, ACL used the prefix 10.3115.)
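As a minimal sketch of the pattern above, the following Python builds a DOI and its resolver URL from an Anthology ID. The function names are hypothetical and are not part of the Anthology codebase; the resolver URL uses https://doi.org, which is an assumption on top of the http form shown above.

```python
# Current ACL DOI prefix, as given in the text above.
DOI_PREFIX = "10.18653/v1"


def anthology_doi(anthology_id: str) -> str:
    """Return the DOI for an Anthology ID such as W19-5203 (hypothetical helper)."""
    return f"{DOI_PREFIX}/{anthology_id}"


def doi_url(anthology_id: str) -> str:
    """Return the resolver URL for that DOI (hypothetical helper)."""
    return f"https://doi.org/{anthology_doi(anthology_id)}"


print(anthology_doi("W19-5203"))
print(doi_url("W19-5203"))
```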

Creating DOIs and adding them to our XML is part of the ingestion process. It comprises two steps:

  1. Creating the DOIs. This is accomplished by producing an XML file that is uploaded to CrossRef.
  2. Ingesting the DOIs. This task adds <doi> tags to our XML, so they can be displayed on web pages.

Creating the DOIs

We assign a digital object identifier (DOI) for each paper published by ACL and export an XML file for DOI registration on CrossRef.

To create these, use the script generate_crossref_doi_metadata.py, which takes a list of volume IDs and creates the XML file.

A full volume ID is the collection ID and the volume ID joined by a hyphen, e.g. 2021.naacl-main (collection 2021.naacl, volume main).
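To make the ID structure concrete, here is a small sketch that joins and splits full volume IDs. Both helpers are hypothetical, and splitting on the last hyphen assumes that the volume part itself never contains one.

```python
from typing import Tuple


def full_volume_id(collection_id: str, volume_id: str) -> str:
    """Join a collection ID and a volume ID with a hyphen (hypothetical helper)."""
    return f"{collection_id}-{volume_id}"


def split_volume_id(full_id: str) -> Tuple[str, str]:
    """Split on the LAST hyphen; assumes the volume part contains no hyphen."""
    collection_id, volume_id = full_id.rsplit("-", 1)
    return collection_id, volume_id


print(full_volume_id("2021.naacl", "main"))
print(split_volume_id("P19-1"))
```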

$ ./bin/generate_crossref_doi_metadata.py P19-1 P19-2 P19-3 P19-4 \
    $(for num in $(seq 32 54); do echo " W19-$num"; done) > acl2019_doi.xml
$ python3 bin/generate_crossref_doi_metadata.py 2021.naacl-main 2021.naacl-demos 2021.naacl-srw 2021.naacl-tutorials 2021.naacl-industry 2021.alvr-1 2021.americasnlp-1 2021.autosimtrans-1 2021.bionlp-1 2021.calcs-1 2021.clpsych-1 2021.cmcl-1 2021.dash-1 2021.deelio-1 2021.maiworkshop-1 2021.nlp4if-1 2021.nlpmc-1 2021.nuse-1 2021.privatenlp-1 2021.sdp-1 2021.sigtyp-1 2021.smm4h-1 2021.socialnlp-1 2021.teachingnlp-1 2021.textgraphs-1 2021.trustnlp-1 2021.vigil-1 > naacl2021_doi.xml
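If the shell loop in the first command is hard to read, the same argument list can be built in Python. This is only a sketch of how the volume IDs expand; it prints the command rather than running it, and the variable names are illustrative.

```python
import shlex

# Main conference volumes, as in the first command above.
main_volumes = ["P19-1", "P19-2", "P19-3", "P19-4"]

# `seq 32 54` is inclusive at both ends, so the range must stop at 55.
workshop_volumes = [f"W19-{num}" for num in range(32, 55)]

volume_ids = main_volumes + workshop_volumes

# Print the equivalent command line for inspection.
command = ["./bin/generate_crossref_doi_metadata.py", *volume_ids]
print(shlex.join(command), "> acl2019_doi.xml")
```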

You can validate your XML file here.

Then, navigate to https://doi.crossref.org/servlet/useragent and log in. Click on "Upload Submissions". Choose your XML file, and for type select "Metadata" (the default). Upload the file; it is then queued for ingestion.

To see the status, click on the "Administration" tab and hit "Search" (without filling in any boxes). Right after the upload, the "started" and "finished" columns will say "Never". This is normal; the timestamps are updated once processing is finished.

At the bottom of the message textarea field is a summary that looks like this:

   <batch_data>
      <record_count>1393</record_count>
      <success_count>1392</success_count>
      <warning_count>0</warning_count>
      <failure_count>1</failure_count>
   </batch_data>
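The summary above can also be checked programmatically. The sketch below assumes you have copied only the batch_data element out of the message field; the function name is hypothetical, and the real message may contain additional content around this element.

```python
import xml.etree.ElementTree as ET

# The summary element, copied from the message field (example values from above).
SUMMARY = """
<batch_data>
   <record_count>1393</record_count>
   <success_count>1392</success_count>
   <warning_count>0</warning_count>
   <failure_count>1</failure_count>
</batch_data>
"""


def read_counts(summary_xml: str) -> dict:
    """Map each child tag of <batch_data> to its integer value."""
    root = ET.fromstring(summary_xml.strip())
    return {child.tag: int(child.text) for child in root}


counts = read_counts(SUMMARY)
if counts["failure_count"] or counts["warning_count"]:
    print(
        f"{counts['failure_count']} failure(s) and "
        f"{counts['warning_count']} warning(s) "
        f"out of {counts['record_count']} records"
    )
```

A non-zero failure count suggests that some records in the batch need another look before the DOIs are ingested.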

DOI minting costs $1 each, for which ACL is billed once a year. You don't have to worry about this.

Ingesting the DOIs

The ingestion script is add_dois.py. This script also takes a list of volume IDs. It iterates through all papers in the volumes, checks whether each DOI URL resolves, and if so, injects the <doi> tag into the XML. You can then commit the changes, push to GitHub, and create a pull request.

$ python3 ./bin/add_dois.py P19-1 P19-2 P19-3 P19-4 \
    $(for num in $(seq 32 54); do echo " W19-$num"; done)
Attempting to add DOIs for P19-1
Identified as Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
-> [P19-1000] Skipping since DOI http://dx.doi.org/10.18653/v1/P19-1000 doesn't exist
Adding DOI 10.18653/v1/P19-1001
Adding DOI 10.18653/v1/P19-1002
Adding DOI 10.18653/v1/P19-1003
Adding DOI 10.18653/v1/P19-1004
...

It takes a while to run because it limits queries to one per second and also honors the Retry-After header when the web server responds with HTTP 429 (Too Many Requests).
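The throttling behavior described above can be sketched as follows. This is not the code from add_dois.py; it is an illustration under stated assumptions: the function names are hypothetical, the fetch callable stands in for a real HTTP request and returns a (status, headers) pair, the header key is spelled exactly "Retry-After", and only the numeric (seconds) form of the header is handled.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (HTTP status, response headers)


def seconds_to_wait(status: int, headers: Mapping[str, str], default: float = 1.0) -> float:
    """Wait at least `default` seconds; on 429, use a numeric Retry-After if longer."""
    if status == 429:
        value = headers.get("Retry-After", "")
        if value.isdigit():
            return max(float(value), default)
    return default


def doi_exists(
    url: str,
    fetch: Callable[[str], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_tries: int = 5,
) -> bool:
    """Return True if `url` resolves, retrying while the server answers 429."""
    for _ in range(max_tries):
        status, headers = fetch(url)
        sleep(seconds_to_wait(status, headers))  # throttle after every query
        if status != 429:
            return status < 400
    return False


# Demo with a fake fetch, so nothing here touches the network.
fake_responses = iter([(429, {"Retry-After": "3"}), (302, {})])
recorded_waits = []
found = doi_exists(
    "https://doi.org/10.18653/v1/P19-1001",
    fetch=lambda url: next(fake_responses),
    sleep=recorded_waits.append,
)
print(found, recorded_waits)
```

Passing the fetch and sleep functions in as arguments keeps the retry logic testable without real requests or real delays.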
