-
articles/
-- contains the full-text articles annotated in the CRAFT corpus, along with some metadata for each article.
ids/
craft-pmids-release
-- a file listing the PubMed IDs (PMIDs) contained in this distribution
craft-idmappings-release
-- a file mapping from PubMed ID to PubMed Central ID and original downloaded file name for all articles in this distribution
nxml/
-- contains the original XML for each article as downloaded from the PubMed Central Open Access collection
txt/
-- contains a plain text version of each article that was derived from the original XML files. Also included for each article are files containing the copyright information ([PMID].copyright) and the article's references ([PMID].references). NOTE: Annotation offsets included in this distribution are relative to these plain-text versions of the articles. The file name for any given article is its PubMed ID with a ".txt" extension. All CRAFT articles use UTF-8 encoding.
-
concept-annotation/
-- contains concept annotations mapped (“normalized”) to specific ontology classes for each article in this distribution.
- Annotations are provided in a single native format (Knowtator1 format). See the Creating annotation files in alternative formats wiki page for details on how to generate alternative annotation file formats. For each ontology, there are two subdirectories, one for annotations produced using only proper classes of the given ontology (e.g.,
concept-annotation/CHEBI/CHEBI/
), and the other for annotations produced using proper classes plus extension classes created by the CRAFT semantic annotators (e.g., concept-annotation/CHEBI/CHEBI+extensions/
). Each subdirectory contains the ontology file that should be used for the corresponding annotations (e.g., concept-annotation/CHEBI/CHEBI.obo
for the former, concept-annotation/CHEBI/CHEBI+extensions/CHEBI+extensions.obo
for the latter). Most of these subdirectories also contain one or more mapping text files that may be useful, particularly when comparing automatically generated concept annotations to the CRAFT concept annotations. For further details, please see Bada et al., 2012 and the concept-annotation/README.
-
coreference-annotation/
-- Contains annotations of coreferential nouns/noun phrases, provided in Knowtator1 format for each article in this distribution. Details are specified in Cohen et al., 2017.
-
schema/
-- Contains various schema files for annotation file formats associated with the CRAFT distribution. For details on available formats, see the Creating annotation files in alternative formats wiki page.
CCPUimaTypeSystem.xml
-- UIMA type system file. For details, please see the CCP UIMA Type System Manual.
knowtator.xsd
-- XML Schema Definition for the Knowtator1 file format
knowtator2.xsd
-- XML Schema Definition for the Knowtator2 file format
-
structural-annotation/
-- contains syntactic and other document-structure-related annotations for articles in the distribution
dependency/
-- contains dependency parse trees for each sentence in the CRAFT distribution, provided in CoNLL-U format. Dependency parse trees have been automatically derived from the manually annotated treebank files (see below). For details, see the Dependency parse derivation from treebank data wiki page.
sections-and-typography/
-- contains annotations for section boundaries and typography (e.g., italics, boldface, subscript, superscript) mined from the original XML for each article in this distribution. Annotations are provided in Knowtator1 format.
treebank/
-- contains the full syntactic parse trees in Penn Treebank style for each sentence of each article in this distribution. Details are specified in Verspoor et al., 2012.
-
.knowtator2/
-- this hidden directory contains files used to automatically generate Knowtator2 projects. Files in this directory can be ignored by most users. For details on creating a Knowtator2 annotation project, see the Knowtator2 annotation project creation wiki page.
-
CHANGES.md
-- enumerates changes for the different versions of the CRAFT distribution
-
LICENSE.txt
-- CRAFT annotations are distributed under the Creative Commons BY 3.0 license
-
README.md
-- contains general citation information for the CRAFT distribution as well as a link to the CRAFT Wiki.
-
build.boot
-- Clojure Boot script that facilitates annotation file format conversion and other tasks. For details, see the Creating annotation files in alternative formats wiki page.
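
The note under articles/txt/ says that annotation offsets are relative to the plain-text version of each article, that each file is named by its PubMed ID with a ".txt" extension, and that the files are UTF-8 encoded. A minimal Python sketch of how a span might be looked up under those conventions follows. The function names and the `craft_root` parameter are hypothetical, not part of the distribution, and the sketch assumes offsets are 0-based character offsets with an exclusive end; check this against the annotation format you are using.

```python
from pathlib import Path


def article_text_path(craft_root, pmid):
    """Return the path articles/txt/[PMID].txt under the distribution root."""
    return Path(craft_root) / "articles" / "txt" / f"{pmid}.txt"


def read_article_text(craft_root, pmid):
    """Read the plain-text version of an article as UTF-8.

    newline="" disables newline translation so that character positions
    in the returned string match the file contents exactly.
    """
    path = article_text_path(craft_root, pmid)
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()


def covered_text(text, start, end):
    """Return the text covered by a span.

    Assumption: start and end are 0-based character offsets, end exclusive.
    """
    return text[start:end]
```

For example, `covered_text(read_article_text(root, pmid), start, end)` would return the string an annotation refers to, where `root` is wherever the distribution was unpacked and `pmid` is one of the IDs listed in the ids/craft-pmids-release file.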