schema - OpenAPC/openapc-de GitHub Wiki
OpenAPC data schemas
The following schemas describe the data sets aggregated by the OpenAPC initiative. Each row in the tables below corresponds to a column in the respective CSV file. At the moment, 4 data sets are maintained:
- APC data set (for APCs on a per-publication basis)
- BPC data set (for BPCs on a per-publication basis)
- Transformative Agreements (TA) data set (for journal articles published under transformative agreements or other payment models, like Springer Compact or the German DEAL agreements)
- Contracts data set (aggregation of cost components for transformative and other publishing agreements)
APC data set
- Publication type: Journal articles
- Cost data: per-publication, mandatory
- CSV file
- Treemap visualisation
This is the original data set OpenAPC started with. It collects cost data on Article Processing Charges (APCs) on a per-publication basis. An article is included in this collection only if its APC was paid individually, outside of any contract or agreement. The data set consists of 18 metadata fields, 5 of which are mandatory when contributing data.
Mandatory and backup columns
Only the first 5 columns are mandatory in all cases. The 4 columns marked as backup are required only if at least one of the articles in a contributed table does not have a DOI assigned. In that case, the DOI-less articles (and only those) have to provide these 4 data fields as additional information (Example).
To report additional costs, append the relevant columns from the APC Additional Costs data set to your table.
| column | description | source | required? |
|---|---|---|---|
| institution | Top-level organisation to which the reported costs are allocated, e.g. "Bielefeld University" | - | mandatory |
| period | Year of APC payment (YYYY) | - | mandatory |
| euro | The APC amount that was paid in EURO. Includes VAT and any discounts | - | mandatory |
| doi | Digital Object Identifier | - | mandatory |
| is_hybrid | Determines if the article has been published in a hybrid journal (TRUE) or in a fully Open Access (Gold OA) journal (FALSE) | - | mandatory |
| publisher | Name of the publication house that has charged the fee | CrossRef | backup |
| journal_full_title | Full name of periodical that contains the article | CrossRef | backup |
| issn | International Standard Serial Number | CrossRef | backup |
| issn_print | International Standard Serial Number - print version | CrossRef | no |
| issn_electronic | International Standard Serial Number - electronic version | CrossRef | no |
| issn_l | Linking International Standard Serial Number | ISSN International Centre | no |
| license_ref | License under which the article has been published | CrossRef | no |
| indexed_in_crossref | Indicates if the contribution is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no |
| pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no |
| pmcid | ID for articles available in the Europe PubMed Central full text collection | Europe PMC | no |
| ut | Web of Science unique item ID | Web of Science | no |
| url | URL to article if no DOI is available | - | backup |
| doaj | Indicates if the journal is indexed in the Directory of Open Access Journals (TRUE/FALSE) | DOAJ | no |
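The mandatory/backup rule above can be checked mechanically before submitting a table. The following is a minimal sketch and not an OpenAPC tool. It assumes that a missing value is written as an empty cell or as the literal `NA`, and that DOI-less articles keep the `doi` column but leave it as `NA`. The function name `validate_apc_table` and the sample rows (including the DOI `10.1234/example.1`) are hypothetical.

```python
import csv
import io

# Column groups as listed in the APC schema table.
MANDATORY = ["institution", "period", "euro", "doi", "is_hybrid"]
BACKUP = ["publisher", "journal_full_title", "issn", "url"]


def is_missing(value):
    # Assumption: empty cells and the literal "NA" both count as missing.
    return value is None or value.strip() in ("", "NA")


def validate_apc_table(csv_text):
    """Return (row_number, problem) tuples; an empty list means no problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    absent = [c for c in MANDATORY if c not in (reader.fieldnames or [])]
    if absent:
        # Row number 0 stands for the header line.
        return [(0, "missing mandatory column(s): " + ", ".join(absent))]
    problems = []
    for n, row in enumerate(reader, start=1):
        # Every mandatory value except the DOI must always be filled in.
        for col in MANDATORY:
            if col != "doi" and is_missing(row[col]):
                problems.append((n, f"empty mandatory field: {col}"))
        period = row["period"]
        if not is_missing(period) and not (len(period) == 4 and period.isdigit()):
            problems.append((n, "period is not a four-digit year (YYYY)"))
        euro = row["euro"]
        if not is_missing(euro):
            try:
                float(euro)
            except ValueError:
                problems.append((n, "euro is not a number"))
        hybrid = row["is_hybrid"]
        if not is_missing(hybrid) and hybrid not in ("TRUE", "FALSE"):
            problems.append((n, "is_hybrid must be TRUE or FALSE"))
        # Backup rule: DOI-less articles (and only those) need all 4 backup fields.
        if is_missing(row["doi"]):
            for col in BACKUP:
                if is_missing(row.get(col)):
                    problems.append((n, f"no DOI, so backup field required: {col}"))
    return problems


sample = """institution,period,euro,doi,is_hybrid,publisher,journal_full_title,issn,url
Bielefeld University,2023,1500.00,10.1234/example.1,FALSE,NA,NA,NA,NA
Bielefeld University,2023,980.50,NA,TRUE,Example Press,Journal of Examples,1234-5678,NA
"""
print(validate_apc_table(sample))  # [(2, 'no DOI, so backup field required: url')]
```

The first row has a DOI, so its backup columns may stay empty. The second row has no DOI, and its `url` is also missing, so the backup rule flags it.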
APC Additional Costs data set
- Publication type: Additional costs for journal articles
- Cost data: linked to single publications in either the APC or TA data set, optional
- CSV file
The Additional Costs collection is not intended to be a stand-alone data set. Instead, it enriches both the APC and the TA data sets with additional costs.
| column | description | source | required? |
|---|---|---|---|
| doi | Linked to an existing DOI in the APC/TA data set | - | mandatory |
| colour charge | Additional costs for publishing figures in colour, incl. VAT | - | no |
| cover charge | Additional costs for featuring an article on the journal cover / cover image, incl. VAT | - | no |
| page charge | Additional costs for overlength, incl. VAT | - | no |
| permission | Licence fee or charges for re-using material from a previously published work (e.g. an image), incl. VAT | - | no |
| reprint | Fee for reprinting publications, incl. VAT | - | no |
| submission fee | Fee for submitting an article, incl. VAT | - | no |
| payment fee | Additional costs for transactions (bank charges, extra charge for payments via credit card), incl. VAT | - | no |
| other | Other additional costs (e.g. translation charges, abstract charges, etc.), incl. VAT | - | no |
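Because additional cost records are linked to articles only through the `doi` column, combining them with APC or TA rows amounts to a join on the DOI. The following sketch is for illustration and does not reproduce OpenAPC's own processing. It assumes that a missing component is written as an empty cell or `NA` and counts such a component as zero (this also applies to a TA row whose `euro` is `NA`). DOIs are compared in lower case because DOI names are case-insensitive. The function names and sample values are hypothetical.

```python
import csv
import io

# Cost component columns as listed in the Additional Costs schema table.
COST_COLUMNS = ["colour charge", "cover charge", "page charge", "permission",
                "reprint", "submission fee", "payment fee", "other"]


def parse_euro(value):
    # Assumption: empty cells and "NA" mean no cost was reported for that component.
    if value is None or value.strip() in ("", "NA"):
        return 0.0
    return float(value)


def additional_costs_by_doi(csv_text):
    """Map each lower-cased DOI to the sum of its additional cost components."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        doi = row["doi"].strip().lower()  # DOI names are case-insensitive
        extra = sum(parse_euro(row.get(col)) for col in COST_COLUMNS)
        totals[doi] = totals.get(doi, 0.0) + extra
    return totals


def with_additional_costs(article_rows, extras):
    """Yield (doi, euro, additional, total) for each APC/TA row given as a dict."""
    for row in article_rows:
        doi = row["doi"].strip().lower()
        euro = parse_euro(row["euro"])
        additional = extras.get(doi, 0.0)
        yield doi, euro, additional, euro + additional


extras_csv = """doi,colour charge,cover charge,page charge,permission,reprint,submission fee,payment fee,other
10.1234/example.1,250.00,NA,100.00,NA,NA,NA,NA,NA
"""
articles = [{"doi": "10.1234/EXAMPLE.1", "euro": "1500.00"},
            {"doi": "10.1234/example.2", "euro": "900.00"}]
for result in with_additional_costs(articles, additional_costs_by_doi(extras_csv)):
    print(result)
# ('10.1234/example.1', 1500.0, 350.0, 1850.0)
# ('10.1234/example.2', 900.0, 0.0, 900.0)
```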
BPC data set
- Publication type: Books/Monographs (no single chapters)
- Cost data: per-publication, mandatory
- CSV file
- Treemap visualisation
This data set collects data on BPCs (Book Processing Charges). It consists of 13 fields, 5 of which are mandatory.
Mandatory and backup columns
The first 5 columns are mandatory in all cases. The isbn column is marked as backup and is required if the book does not have a DOI assigned. Since DOIs are less widespread for books than for journal articles, we make two additional recommendations when contributing data:
- The book_title column is marked recommended. It is not strictly necessary, but if you happen to have access to that kind of information, it could be helpful to add it to the table.
- Books can have a variety of ISBNs, depending on the publication form (hardcover, softcover, PDF, EPUB...). If your original data provides fields for more than one ISBN type, we recommend including them all. The additional columns do not need specific names; a generic scheme (isbn_1, isbn_2...) will do.
| column | description | source | required? |
|---|---|---|---|
| institution | Top-level organisation to which the reported costs are allocated, e.g. "Bielefeld University" | - | mandatory |
| period | Year of BPC payment (YYYY) | - | mandatory |
| euro | The BPC amount that was paid in EURO. Includes VAT and any discounts | - | mandatory |
| doi | Digital Object Identifier | - | mandatory |
| backlist_oa | Was the book published OA in the first place (FALSE) or was it already part of a publisher's backlist and became OA retroactively (TRUE)? | - | mandatory |
| publisher | Name of the publication house that has charged the fee | CrossRef | no |
| book_title | Title of the monograph | CrossRef | recommended |
| isbn | International Standard Book Number | CrossRef | backup |
| isbn_print | International Standard Book Number - print version | CrossRef | no |
| isbn_electronic | International Standard Book Number - electronic version | CrossRef | no |
| license_ref | License under which the book has been published | CrossRef | no |
| indexed_in_crossref | Indicates if the work is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no |
| doab | Indicates if the book is listed in the Directory of Open Access Books (TRUE/FALSE) | DOAB | no |
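The BPC rules differ from the APC rules in two ways. Only `isbn` serves as a backup for a missing DOI, and contributors may add any number of extra ISBN columns. The following sketch checks the first rule and gathers all ISBN values from the second. It is not an OpenAPC tool. It assumes that missing values are empty cells or `NA`, and that extra ISBN columns follow the `isbn_<n>` pattern suggested above. The function name and the sample rows are hypothetical, and the ISBNs are placeholders.

```python
import csv
import io
import re

# Mandatory columns as listed in the BPC schema table.
MANDATORY = ["institution", "period", "euro", "doi", "backlist_oa"]
# Matches isbn, isbn_print, isbn_electronic and generic extras such as isbn_1, isbn_2.
ISBN_COLUMN = re.compile(r"^isbn(_print|_electronic|_\d+)?$")


def is_missing(value):
    # Assumption: empty cells and the literal "NA" both count as missing.
    return value is None or value.strip() in ("", "NA")


def check_bpc_table(csv_text):
    """Return (problems, isbns); isbns maps each row number to its ISBN values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    absent = [c for c in MANDATORY if c not in fieldnames]
    if absent:
        return [(0, "missing mandatory column(s): " + ", ".join(absent))], {}
    isbn_columns = [c for c in fieldnames if ISBN_COLUMN.match(c)]
    problems, isbns = [], {}
    for n, row in enumerate(reader, start=1):
        for col in MANDATORY:
            if col != "doi" and is_missing(row[col]):
                problems.append((n, f"empty mandatory field: {col}"))
        flag = row["backlist_oa"]
        if not is_missing(flag) and flag not in ("TRUE", "FALSE"):
            problems.append((n, "backlist_oa must be TRUE or FALSE"))
        # Backup rule: a book without a DOI must provide the isbn column.
        if is_missing(row["doi"]) and is_missing(row.get("isbn")):
            problems.append((n, "no DOI, so isbn is required"))
        # Gather every ISBN variant the contributor supplied for this book.
        isbns[n] = [row[c].strip() for c in isbn_columns if not is_missing(row[c])]
    return problems, isbns


sample = """institution,period,euro,doi,backlist_oa,isbn,isbn_1,isbn_2
Bielefeld University,2022,8000.00,NA,FALSE,978-3-16-148410-0,978-3-16-148411-7,NA
Bielefeld University,2022,6500.00,NA,TRUE,NA,NA,NA
"""
problems, isbns = check_bpc_table(sample)
print(problems)  # [(2, 'no DOI, so isbn is required')]
print(isbns)     # {1: ['978-3-16-148410-0', '978-3-16-148411-7'], 2: []}
```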
Transformative Agreements data set (TA)
- Publication type: Journal articles
- Cost data: optional (per-publication costs in some cases, see below)
- CSV file
- Treemap visualisation (cost-based)
- Treemap visualisation (article count-based)
The Transformative Agreements (TA) data set contains metadata on journal articles published under transformative agreements and other publishing agreements. These agreements are concluded with publishers, usually by larger institutions such as research organisations (e.g., the Max Planck Society) or by national consortia.
Cost and payment models vary greatly between agreements. The TA data set can contain both articles with individually billed costs (e.g., Gold OA articles under DEAL) and articles without specific cost information. Contract costs are not mapped at the article level but are recorded centrally in the contracts data set. Articles and contracts are related via the group_id, which links a set of articles to a set of contract entries. It is also possible to report articles with bibliographic metadata only, without any cost information.
Mandatory and backup columns
Since DOI registration is an accepted standard for TA articles, the "backup" rule of the APC data set does not apply here. Consequently, all TA entries need a valid DOI.
| column | description | source | required? |
|---|---|---|---|
| institution | Top-level organisation the article author is affiliated with | - | mandatory |
| period | Year of payment (YYYY) | - | mandatory |
| euro | Article cost, usually calculated in hindsight on an agreed formula | - | no |
| doi | Digital Object Identifier | - | mandatory |
| is_hybrid | Determines if the article has been published in a hybrid journal (TRUE) or in a fully Open Access (Gold OA) journal (FALSE) | - | mandatory |
| opt_out | Determines if the article has been excluded from OA publishing, usually indicating that it remains closed access (TRUE/FALSE) | - | mandatory |
| publisher | Name of the publisher the TA was concluded with | CrossRef | no |
| journal_full_title | Full name of periodical that contains the article | CrossRef | no |
| issn | International Standard Serial Number | CrossRef | no |
| issn_print | International Standard Serial Number - print version | CrossRef | no |
| issn_electronic | International Standard Serial Number - electronic version | CrossRef | no |
| issn_l | Linking International Standard Serial Number | ISSN International Centre | no |
| license_ref | License under which the article has been published | CrossRef | no |
| indexed_in_crossref | Indicates if the contribution is registered with the DOI agency CrossRef (TRUE/FALSE) | CrossRef | no |
| pmid | ID for metadata records indexed in Europe PubMed Central (Europe PMC) | Europe PMC | no |
| pmcid | ID for articles available in the Europe PubMed Central full text collection | Europe PMC | no |
| ut | Web of Science unique item ID | Web of Science | no |
| url | URL to article if no DOI is available (not used) | - | no |
| doaj | Indicates if the journal is indexed in the Directory of Open Access Journals (TRUE/FALSE) | DOAJ | no |
| agreement | ESAC Identifier (preferred) or a meaningful agreement name | - | mandatory |
| group_id | Links the article to a set of records in contracts.csv | - (automatically generated by OpenAPC) | no |
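Because there is no backup rule for TA articles, a contribution check is simpler than for the APC data set. Every row needs a DOI, and `euro` may be empty. The following sketch is illustrative only. It assumes that missing values are empty cells or `NA`. Its DOI test is a rough plausibility check based on the fact that DOI names start with `10.`, not a full validation. The function name and sample rows, including the agreement name `Example Agreement`, are hypothetical.

```python
import csv
import io

# Mandatory columns as listed in the TA schema table; note that doi has no backup.
MANDATORY = ["institution", "period", "doi", "is_hybrid", "opt_out", "agreement"]
BOOLEAN = ["is_hybrid", "opt_out"]


def is_missing(value):
    # Assumption: empty cells and the literal "NA" both count as missing.
    return value is None or value.strip() in ("", "NA")


def validate_ta_table(csv_text):
    """Return (row_number, problem) tuples for a TA contribution."""
    problems = []
    for n, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for col in MANDATORY:
            if is_missing(row.get(col)):
                problems.append((n, f"empty mandatory field: {col}"))
        for col in BOOLEAN:
            value = row.get(col)
            if not is_missing(value) and value not in ("TRUE", "FALSE"):
                problems.append((n, f"{col} must be TRUE or FALSE"))
        # Rough plausibility check only: DOI names begin with the "10." prefix.
        doi = row.get("doi")
        if not is_missing(doi) and not (doi.startswith("10.") and "/" in doi):
            problems.append((n, f"doi does not look like a DOI: {doi}"))
        # euro is optional for TA articles, but must be numeric when present.
        euro = row.get("euro")
        if not is_missing(euro):
            try:
                float(euro)
            except ValueError:
                problems.append((n, "euro is not a number"))
    return problems


sample = """institution,period,euro,doi,is_hybrid,opt_out,agreement
Bielefeld University,2021,NA,10.1234/example.3,TRUE,FALSE,Example Agreement
Bielefeld University,2021,2750.00,NA,FALSE,FALSE,Example Agreement
"""
print(validate_ta_table(sample))  # [(2, 'empty mandatory field: doi')]
```

The first row is accepted even though it has no cost. The second row has a cost but no DOI, which the TA rules do not allow.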
Contracts data set
- Content type: Cost data on contracts/publishing agreements
- CSV file
The contracts data set contains metadata and cost information on transformative and other publishing agreements. It supplements the TA data set by centrally recording contract costs, while the individual articles are linked to the corresponding contract entries via the group_id.
| column | description | source | required? |
|---|---|---|---|
| institution | Top-level organisation to which the reported costs are allocated (e.g., a university or consortium) | - | mandatory |
| consortium | Name of the negotiating consortium, if applicable | ESAC Registry (optional) | no |
| contract_name | Name of the contract; this designation is used in OpenAPC reporting and treemaps | ESAC Registry (optional) | mandatory |
| identifier | Optional contract identifier (e.g., ESAC-ID) | - | no |
| period_from | Start of the licensing period (year) | - | mandatory |
| period_to | End of the licensing period (year) | - | mandatory |
| cost_type | Cost type according to the openCost vocabulary (Publish, Read, Publish and Read, Service Fee) | - | no |
| euro | Cost amount in euros | - | no |
| group_id | Unique key linking this record to a set of articles in the TA data set | - (automatically generated by OpenAPC) | mandatory |
Additional remarks:
- If an invoice for a license year contains several cost components (e.g., publish and read fees), each component is recorded in a separate line.
- In that case, all other fields (institution, consortium, contract_name, identifier, period_from, period_to, group_id) are identical across these lines.
- Contracts without specific cost data (at neither the article nor the contract level) can also be recorded; cost_type and euro are then set to NA.
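To see how these remarks combine with the group_id link to the TA data set, the following sketch reads both tables. For each group_id it reports the contract name, the summed cost per cost_type, and the number of linked TA articles. It is a hypothetical illustration, not OpenAPC's own aggregation. The group_id values `g1` and `g2` are placeholders for the keys OpenAPC generates, the TA sample is reduced to a few columns, and all names and amounts are made up.

```python
import csv
import io
from collections import Counter, defaultdict


def is_missing(value):
    # Assumption: empty cells and the literal "NA" both count as missing.
    return value is None or value.strip() in ("", "NA")


def summarise_by_group(contracts_csv, ta_csv):
    """For each group_id: contract name, cost per cost_type and linked article count."""
    names = {}
    costs = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(contracts_csv)):
        gid = row["group_id"]
        names[gid] = row["contract_name"]
        # Each cost component has its own line; contracts without cost data
        # carry NA in cost_type and euro, so they contribute no amount.
        if is_missing(row["euro"]):
            continue
        kind = row["cost_type"]
        costs[gid][kind] = costs[gid].get(kind, 0.0) + float(row["euro"])
    # Articles are linked to contract lines through the shared group_id.
    articles = Counter(row["group_id"] for row in csv.DictReader(io.StringIO(ta_csv)))
    return {
        gid: {"contract_name": name, "costs": costs.get(gid, {}), "articles": articles[gid]}
        for gid, name in names.items()
    }


contracts = """institution,consortium,contract_name,identifier,period_from,period_to,cost_type,euro,group_id
Example University,NA,Example Publisher Agreement,NA,2023,2023,Publish,120000.00,g1
Example University,NA,Example Publisher Agreement,NA,2023,2023,Read,30000.00,g1
Example University,NA,Example Society Agreement,NA,2023,2023,NA,NA,g2
"""
ta = """institution,period,doi,group_id
Example University,2023,10.1234/a,g1
Example University,2023,10.1234/b,g1
Example University,2023,10.1234/c,g2
"""
for gid, info in summarise_by_group(contracts, ta).items():
    print(gid, info)
```

The Publish and Read fees of the first contract stay on separate lines in the CSV but are grouped under one group_id. The second contract has no cost data, yet it is still listed together with its linked article.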