Anthology API - acl-org/acl-anthology GitHub Wiki

The Anthology's official data is hosted in the data/xml directory. You could parse it yourself, but we have built a Python library around it, which lives under bin/anthology/. Many scripts in the bin/ directory use the library's Anthology() class, such as create_hugo_yaml.py and add_revision.py. Those scripts serve as usage examples, for instance for how to iterate over the volumes and papers in the Anthology.

Here is a quick example:

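from anthology import Anthology

# XML_DIR must be set to the directory that contains the xml/ directory
anthology = Anthology(importdir=XML_DIR)

# Print the full ID and plain-text title of every paper
for id_, paper in anthology.papers.items():
    print(paper.full_id, paper.get_title('text'))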

Here, XML_DIR is the directory containing the xml/ directory (that is, data/ in the repository layout described above). This script prints the full ID and title of every paper in the Anthology.
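
If you would rather parse the XML yourself, as mentioned at the top of this page, the following is a minimal sketch using only the Python standard library. It is not how the Anthology library works internally. The inline sample, the element and attribute names (collection, volume, paper, title, id), the ID format built in the code, and the helper name iter_papers are all assumptions made for illustration; check them against the real files in data/xml before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for one file under data/xml/.
# The element and attribute names are assumptions, not a schema reference.
SAMPLE_XML = """\
<collection id="2020.example">
  <volume id="1">
    <paper id="1">
      <title>A First Example Paper</title>
    </paper>
    <paper id="2">
      <title>A Second <fixed-case>Example</fixed-case> Paper</title>
    </paper>
  </volume>
</collection>
"""


def iter_papers(xml_text):
    """Yield (full_id, title) pairs from one collection document.

    iter_papers is a hypothetical helper, not part of the Anthology library.
    """
    root = ET.fromstring(xml_text)
    collection_id = root.get("id")
    for volume in root.iter("volume"):
        for paper in volume.iter("paper"):
            title_el = paper.find("title")
            # itertext() flattens any markup nested inside the title
            title = "".join(title_el.itertext()) if title_el is not None else ""
            # Assumed ID format: <collection>-<volume>.<paper>
            full_id = f"{collection_id}-{volume.get('id')}.{paper.get('id')}"
            yield full_id, title


for full_id, title in iter_papers(SAMPLE_XML):
    print(full_id, title)
```

To try this on a real file, read its contents and pass the text to iter_papers in place of SAMPLE_XML. For anything beyond a quick look, the Anthology() class is probably the better choice, since the scripts in bin/ already rely on it.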