schema - OpenAPC/openapc-de GitHub Wiki

OpenAPC data schemas

The following schemas describe the data sets aggregated by the OpenAPC initiative. Each table row describes one column of the corresponding CSV file. At the moment, four data sets are maintained:

  1. APC data set (for APCs on a per-publication basis)
  2. BPC data set (for BPCs on a per-publication basis)
  3. Transformative Agreements (TA) data set (for journal articles published under transformative agreements or other payment models, like Springer Compact or the German DEAL agreements)
  4. Contracts data set (aggregation of cost components for transformative and other publishing agreements)

APC data set

This is the original data set OpenAPC started with. It collects cost data on Article Processing Charges (APCs) on a per-publication basis. An article is assigned to this collection only if it has been paid for independently and was not part of any contract or agreement. It consists of 18 metadata fields, with 5 of them being mandatory when contributing data.

Mandatory and backup columns

Only the first 5 columns are mandatory in all cases. The 4 columns marked as backup are required only if at least one of the articles in a contributed table does not have a DOI assigned. In that case, the DOI-less articles (and only those) have to provide these 4 data fields as additional information (Example).

To report additional costs, append any selection of data fields from the APC Additional Costs data set to your table.

column | description | source | required?
institution | Top-level organisation to which the reported costs are allocated, e.g. "Bielefeld University" | - | mandatory
period | Year of APC payment (YYYY) | - | mandatory
euro | The APC amount that was paid in EURO. Includes VAT and any discounts | - | mandatory
doi | Digital Object Identifier | - | mandatory
is_hybrid | Determines if the article has been published in a hybrid journal (TRUE) or in a fully/Gold OA journal (FALSE) | - | mandatory
publisher | Name of the publication house that has charged the fee | CrossRef | backup
journal_full_title | Full name of the periodical that contains the article | CrossRef | backup
issn | International Standard Serial Number | CrossRef | backup
issn_print | International Standard Serial Number - print version | CrossRef | no
issn_electronic | International Standard Serial Number - electronic version | CrossRef | no
issn_l | Linking International Standard Serial Number | ISSN International Centre | no
license_ref | License under which the article has been published | CrossRef | no
indexed_in_crossref | Indicates if the contribution is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no
pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no
pmcid | ID for articles available in the Europe PubMed Central full text collection | Europe PMC | no
ut | Web of Science unique item ID | Web of Science | no
url | URL to article if no DOI is available | - | backup
doaj | Indicates if the journal is indexed in the Directory of Open Access Journals (TRUE/FALSE) | DOAJ | no
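
The mandatory and backup rules above can be checked before contributing a table. The following is a minimal sketch, not an official OpenAPC tool. It assumes that a missing value is encoded as an empty string or NA, and the helper names (is_missing, check_apc_row) and the sample rows are made up for illustration.

```python
import csv
import io

MANDATORY = ("institution", "period", "euro", "doi", "is_hybrid")
BACKUP = ("publisher", "journal_full_title", "issn", "url")
MISSING = ("", "NA")  # assumption: how an empty value is encoded


def is_missing(row, col):
    """True if the column is absent or holds an empty/NA value."""
    return (row.get(col) or "").strip() in MISSING


def check_apc_row(row):
    """Return a list of problems for one contributed row (a dict)."""
    problems = []
    for col in MANDATORY:
        # A missing DOI is handled by the backup rule below.
        if col != "doi" and is_missing(row, col):
            problems.append(f"missing mandatory field: {col}")
    if is_missing(row, "doi"):
        for col in BACKUP:
            if is_missing(row, col):
                problems.append(f"no DOI, missing backup field: {col}")
    return problems


# Made-up sample data: the second article has no DOI and no journal title.
SAMPLE = """institution,period,euro,doi,is_hybrid,publisher,journal_full_title,issn,url
Bielefeld University,2020,1500,10.1234/abc,FALSE,,,,
Bielefeld University,2020,1200,NA,TRUE,Example Press,,1234-5678,https://example.org/a
"""

for number, row in enumerate(csv.DictReader(io.StringIO(SAMPLE)), start=1):
    print(number, check_apc_row(row))
```

Only rows without a DOI are checked for backup fields, which mirrors the rule that the DOI-less articles (and only those) have to provide them.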

APC Additional Costs data set

  • Publication type: Additional costs for journal articles
  • Cost data: linked to single publications in either the APC or TA data set, optional
  • CSV file

The Additional Costs collection is not intended to be a stand-alone data set, but is used to enrich both the APC and TA data set with additional costs.

column | description | source | required?
doi | Linked to an existing DOI in the APC/TA data set | - | mandatory
colour charge | Additional costs for publishing figures in colour, incl. VAT | - | no
cover charge | Additional costs for featuring an article on the journal cover / cover image, incl. VAT | - | no
page charge | Additional costs for overlength, incl. VAT | - | no
permission | Licence fee / charges for re-using e.g. an image from another previously published publication, including VAT. | - | no
reprint | Fee for reprinting publications, incl. VAT | - | no
submission fee | Fee for submitting an article, incl. VAT | - | no
payment fee | Additional costs for transactions (bank charges, extra charge for payments via credit card), incl. VAT | - | no
other | Other additional costs (e.g. translation charges, abstract charges, etc.), incl. VAT | - | no
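
Because additional costs are linked to publications through the doi column, they can be combined with the APC amount per article. The sketch below illustrates one way to do this and is not part of the OpenAPC tooling. The function name total_cost_per_doi and the sample values are hypothetical, and empty or NA cells are assumed to mean "no cost reported".

```python
def total_cost_per_doi(apc_rows, extra_rows):
    """Sum the APC and all reported additional costs for each DOI."""
    totals = {row["doi"]: float(row["euro"]) for row in apc_rows}
    for extra in extra_rows:
        doi = extra["doi"]
        if doi not in totals:
            # Additional costs must link to an existing DOI; unmatched
            # rows are skipped here, a real check might report them.
            continue
        for col, value in extra.items():
            if col != "doi" and value not in ("", "NA", None):
                totals[doi] += float(value)
    return totals


# Made-up sample rows, as they might come out of csv.DictReader.
apc = [{"doi": "10.1234/abc", "euro": "1500"}]
extra = [
    {"doi": "10.1234/abc", "colour charge": "200", "page charge": "50", "other": "NA"},
    {"doi": "10.1234/unknown", "colour charge": "75"},
]
print(total_cost_per_doi(apc, extra))
```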

BPC data set

This data set collects data on BPCs (Book Processing Charges). It consists of 13 fields, with 5 being mandatory.

Mandatory and backup columns

The first 5 columns are mandatory in all cases. The isbn column is marked as backup and is required if the book does not have a DOI assigned. Since the usage of DOIs is not as widespread with books as it is with journal articles, we make two additional recommendations when contributing data:

  • The book_title column is marked recommended. It is not strictly necessary, but if you happen to have access to that kind of information, it could be helpful to add it to the table.
  • Books can have a variety of ISBNs, depending on the publication form (hardcover, softcover, PDF, EPUB...). If your original data provides fields for more than one ISBN type, we recommend including them all. The additional columns do not need specific names; a generic scheme (isbn_1, isbn_2...) will do.
column | description | source | required?
institution | Top-level organisation to which the reported costs are allocated, e.g. "Bielefeld University" | - | mandatory
period | Year of BPC payment (YYYY) | - | mandatory
euro | The BPC amount that was paid in EURO. Includes VAT and any discounts | - | mandatory
doi | Digital Object Identifier | - | mandatory
backlist_oa | Was the book published OA in the first place (FALSE) or was it already part of a publisher's backlist and became OA retroactively (TRUE)? | - | mandatory
publisher | Name of the publication house that has charged the fee | CrossRef | no
book_title | Title of the monograph | CrossRef | recommended
isbn | International Standard Book Number | CrossRef | backup
isbn_print | International Standard Book Number - print version | CrossRef | no
isbn_electronic | International Standard Book Number - electronic version | CrossRef | no
license_ref | License under which the book has been published | CrossRef | no
indexed_in_crossref | indicates if the work is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no
doab | Indicates if the book is listed in the Directory of Open Access Books (TRUE/FALSE) | DOAB | no
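
The ISBN recommendation above means a contributed BPC table may carry several ISBN columns with generic names. As a rough sketch (the helper names collect_isbns and has_identifier are hypothetical, and empty or NA cells are assumed to mean "not available"), all such columns can be gathered by prefix, and a book can be checked for at least one usable identifier:

```python
def collect_isbns(row):
    """Gather non-empty values from every column whose name starts with 'isbn'."""
    return [
        value.strip()
        for col, value in row.items()
        if col.startswith("isbn") and value and value.strip() not in ("", "NA")
    ]


def has_identifier(row):
    """A book needs a DOI or, as a backup, at least one ISBN."""
    doi = (row.get("doi") or "").strip()
    return doi not in ("", "NA") or bool(collect_isbns(row))


# Made-up sample row with generic ISBN column names; the ISBN is a placeholder.
book = {"doi": "NA", "isbn": "", "isbn_1": "978-3-16-148410-0", "isbn_2": "NA"}
print(collect_isbns(book), has_identifier(book))
```

Matching on the isbn prefix also covers the isbn_print and isbn_electronic columns of the schema.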

Transformative Agreements data set (TA)

The Transformative Agreements (TA) data set contains metadata on journal articles published under transformative agreements and other publishing agreements. These types of agreements are concluded with publishers and usually involve larger institutions such as research organizations (e.g., Max Planck Society) or national consortia as contractual partners.

The cost and payment models may vary greatly. The TA dataset can contain both articles with individually billed costs (e.g., Gold OA articles under DEAL) and articles without specific cost information. Contract costs are not mapped at the article level, but are recorded centrally in the contracts data set. The relation between articles and contracts is established via the group_id, which links a set of articles to a set of contract entries. At the same time, it remains possible to report articles exclusively with bibliographic metadata, but without any cost information.

Mandatory and backup columns

Since DOI registration is an accepted standard for TA articles, the "backup" rule of the APC data set does not apply here. Consequently, all TA entries need a valid DOI.

column | description | source | required?
institution | Top-level organisation the article author is affiliated with | - | mandatory
period | Year of payment (YYYY) | - | mandatory
euro | Article cost, usually calculated retrospectively using an agreed formula | - | no
doi | Digital Object Identifier | - | mandatory
is_hybrid | Determines if the article has been published in a hybrid journal (TRUE) or in a fully/Gold OA journal (FALSE) | - | mandatory
opt_out | Determines if the article has been excluded from OA publishing, usually indicating that it remains closed access (TRUE/FALSE) | - | mandatory
publisher | Name of the publisher the TA was concluded with | CrossRef | no
journal_full_title | Full name of the periodical that contains the article | CrossRef | no
issn | International Standard Serial Number | CrossRef | no
issn_print | International Standard Serial Number - print version | CrossRef | no
issn_electronic | International Standard Serial Number - electronic version | CrossRef | no
issn_l | Linking International Standard Serial Number | ISSN International Centre | no
license_ref | License under which the article has been published | CrossRef | no
indexed_in_crossref | Indicates if the contribution is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no
pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no
pmcid | ID for articles available in the Europe PubMed Central full text collection | Europe PMC | no
ut | Web of Science unique item ID | Web of Science | no
url | URL to article if no DOI is available (not used) | none | no
doaj | Indicates if the journal is indexed in the Directory of Open Access Journals (TRUE/FALSE) | DOAJ | no
agreement | ESAC Identifier (preferred) or a meaningful agreement name | - | mandatory
group_id | Links the article to a set of records in contracts.csv | - (automatically generated by OpenAPC) | no

Contracts data set

  • Content type: Cost data on contracts/publishing agreements
  • CSV file

The contracts data set contains metadata and cost information on transformative and other publishing agreements. It supplements the TA data set by centrally recording contract costs, while the individual articles are linked to the corresponding contract entries via the group_id.

column | description | source | required?
institution | Top-level organisation to which the reported costs are allocated (e.g., a university or consortium) | - | yes
consortium | Name of the negotiating consortium, if applicable | ESAC Registry (optional) | no
contract_name | Name of the contract; this designation is used in OpenAPC reporting/treemaps | ESAC Registry (optional) | yes
identifier | Optional contract identifier (e.g., ESAC-ID) | - | no
period_from | Start of licensing period (year) | - | yes
period_to | End of licensing period (year) | - | yes
cost_type | Cost type according to the openCost vocabulary (Publish, Read, Publish and Read, Service Fee) | - | no
euro | Cost amount | - | no
group_id | Unique key linking this record to a set of articles in the TA data set | - (automatically generated by OpenAPC) | yes

Additional remarks:

  • If an invoice for a license year contains several cost components (e.g., publish and read fees), each component is recorded in a separate line.
  • The remaining fields (institution, consortium, contract_name, identifier, period_from, period_to, group_id) remain identical in this case.
  • Contracts without specific cost data (neither at article level nor at contract level) can also be recorded; cost_type and euro are then assigned the value NA.
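
The link between TA articles and contract entries via group_id, together with the NA convention above, can be illustrated as follows. This is a sketch under stated assumptions and not OpenAPC's own processing: the function names and the group_id values ("g1", "g2") are placeholders, and rows are assumed to be dicts as produced by csv.DictReader.

```python
from collections import defaultdict


def contract_costs_by_group(contract_rows):
    """Sum all cost components (one line each) per group_id, skipping NA."""
    totals = defaultdict(float)
    for row in contract_rows:
        if row["euro"] in ("NA", "", None):
            continue  # contract recorded without specific cost data
        totals[row["group_id"]] += float(row["euro"])
    return dict(totals)


def articles_per_group(ta_rows):
    """Count the TA articles linked to each group_id."""
    counts = defaultdict(int)
    for row in ta_rows:
        group_id = row.get("group_id")
        if group_id not in ("NA", "", None):
            counts[group_id] += 1
    return dict(counts)


# Made-up sample data: one invoice with two cost components, one contract without costs.
contracts = [
    {"group_id": "g1", "cost_type": "Publish", "euro": "1000"},
    {"group_id": "g1", "cost_type": "Read", "euro": "500"},
    {"group_id": "g2", "cost_type": "NA", "euro": "NA"},
]
articles = [
    {"doi": "10.1234/a", "group_id": "g1"},
    {"doi": "10.1234/b", "group_id": "g1"},
    {"doi": "10.1234/c", "group_id": "g2"},
]
print(contract_costs_by_group(contracts))
print(articles_per_group(articles))
```

Groups whose contract lines all carry NA do not appear in the cost totals, so articles without any cost information stay distinguishable from articles with zero cost.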

Related Work