Batch load and edit formats - NCIEVS/nci-protege5 GitHub Wiki
The format of the files:
Tab-delimited (TSV), with strings not quoted. Any quote characters are kept as part of the value. You probably don’t want that unless the quotes belong inside the strings themselves.
Files are UTF-8 encoded (a plain ASCII text file also qualifies as UTF-8). Any line containing non-UTF-8 characters is discarded.
Byte-order marks (BOMs) in Windows Unicode/UTF-8 files are not handled and will break the batch jobs. On Windows, use an editor such as Notepad++ to save files as UTF-8 without a BOM.
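The following Python sketch shows one way to produce and sanity-check a batch file under these rules. The helper names (`encode_tsv`, `write_tsv`, `check_tsv_bytes`) are hypothetical and not part of nci-protege5. Rejecting tabs and line breaks inside values is also an assumption: because the format has no quoting, such values would shift or split columns.

```python
from typing import Iterable, List, Sequence

UTF8_BOM = b"\xef\xbb\xbf"

def encode_tsv(rows: Iterable[Sequence[str]]) -> bytes:
    """Encode rows as unquoted, tab-delimited UTF-8 with no BOM."""
    lines = []
    for row in rows:
        for cell in row:
            # Hypothetical safeguard: without quoting, a tab or line break
            # inside a value cannot be represented.
            if any(ch in cell for ch in "\t\r\n"):
                raise ValueError(f"value contains a tab or line break: {cell!r}")
        # Values are written verbatim, so any quote characters are kept.
        lines.append("\t".join(row) + "\n")
    # The plain "utf-8" codec never emits a BOM (unlike "utf-8-sig").
    return "".join(lines).encode("utf-8")

def write_tsv(path: str, rows: Iterable[Sequence[str]]) -> None:
    """Write the encoded rows in binary mode so no newline translation occurs."""
    with open(path, "wb") as f:
        f.write(encode_tsv(rows))

def check_tsv_bytes(data: bytes) -> List[str]:
    """List problems the batch job would trip over in raw file bytes."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte-order mark")
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"line {lineno} is not valid UTF-8 and would be discarded")
    return problems
```

Running `check_tsv_bytes` on a file's bytes before submitting it catches both the BOM case and lines that would otherwise be dropped silently.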
The “id” entries in the descriptions below are the fragment identifiers of the IRIs, i.e. whatever follows the last / or # in the IRI.
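A minimal Python sketch of that rule. The function name `fragment_id` and the example IRIs are hypothetical.

```python
def fragment_id(iri: str) -> str:
    """Return whatever follows the last '/' or '#' in an IRI."""
    # rfind returns -1 when the character is absent, so an IRI with neither
    # separator is returned unchanged.
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[cut + 1:]
```

For example, `fragment_id("http://example.org/onto.owl#C1234")` gives `"C1234"` (a “by code” id), and `fragment_id("http://example.org/onto/Gene")` gives `"Gene"` (a “by name” id).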
Class load
| term | parent-id |
|---|---|
| Lab Result | C1234 |
| Pizza Order | C456 |
| Holidays | C4321 |
| Lápiz óptico | C654 |
The term cannot contain embedded control characters or the ‘?’ or ‘!’ characters (see Special Characters).
The term will be used to create the rdfs:label, the preferred_name, and the fully qualified synonym (with NCI source and PT term type) of the newly created class.
The new class will be assigned a new code/id property, and this code/id will be used for the fragment identifier of the IRI.
The parent-id identifies the superclass under which the new class will be placed in the tree. This superclass must already exist: a class created in one row cannot be used as a superclass in a later row, because its identifier must be known before the run.
We don’t support deriving the fragment from the term (e.g. “hello there” -> “hello_there”), which would make the fragment predictable and usable as a parent-id elsewhere in the same input file.
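A Python sketch that checks class-load rows before the run, under the constraints above. `existing_ids` is assumed to be a set of fragment identifiers exported from the ontology beforehand, and the function names are hypothetical. Treating Unicode category `Cc` as “control characters” is also an assumption about what the loader rejects.

```python
import unicodedata
from typing import Iterable, List, Set, Tuple

def term_problems(term: str) -> List[str]:
    """Return the reasons a term would be rejected (empty list if none)."""
    problems = []
    # Assumption: "control characters" means Unicode category Cc (tab, CR, LF, ...).
    if any(unicodedata.category(ch) == "Cc" for ch in term):
        problems.append("contains a control character")
    for ch in ("?", "!"):
        if ch in term:
            problems.append(f"contains {ch!r}")
    return problems

def class_load_rows(entries: Iterable[Tuple[str, str]],
                    existing_ids: Set[str]) -> List[List[str]]:
    """Validate (term, parent-id) pairs and return them as term/parent-id rows."""
    rows = []
    for term, parent_id in entries:
        problems = term_problems(term)
        # The parent must exist before the run; ids of classes created earlier
        # in the same file are not known yet, so they are not in existing_ids.
        if parent_id not in existing_ids:
            problems.append(f"parent-id {parent_id} is not an existing class")
        if problems:
            raise ValueError(f"{term!r}: " + "; ".join(problems))
        rows.append([term, parent_id])
    return rows
```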
Simple Annotation Property
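Format

| entity-id | operation | prop-id | value | value |
|---|---|---|---|---|

For modify, the first value is the current one and the second is its replacement; new and delete take a single value.

Example “by code” (i.e. meaningless fragment identifiers in IRIs)

| entity-id | operation | prop-id | value | value |
|---|---|---|---|---|
| C123 | new | P34 | Now is the time |
| C456 | modify | P34 | Now is the time | For all good women |
| C567 | delete | P52 | foo |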
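A Python sketch of building simple annotation rows. As the examples suggest, it assumes that only modify carries a second value (the replacement) and that new and delete rows end after their single value. `simple_row` is a hypothetical helper.

```python
from typing import List, Optional

OPERATIONS = {"new", "modify", "delete"}

def simple_row(entity_id: str, operation: str, prop_id: str, value: str,
               new_value: Optional[str] = None) -> List[str]:
    """Return the cells of one simple-annotation row (hypothetical helper)."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if operation == "modify":
        # modify identifies the existing value, then gives its replacement.
        if new_value is None:
            raise ValueError("modify needs both the current and the new value")
        return [entity_id, operation, prop_id, value, new_value]
    # new and delete carry a single value.
    if new_value is not None:
        raise ValueError(f"{operation} takes a single value")
    return [entity_id, operation, prop_id, value]
```

Joining the result with `"\t"` gives one line of the batch file, e.g. `Gene`, `delete`, `Editor_Note`, `Now is the time` separated by tabs.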
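Example “by name” (i.e. meaningful fragment identifiers in IRIs)

| entity-id | operation | prop-id | value | value |
|---|---|---|---|---|
| Gene | new | Editor_Note | Now is the time |
| Gene | modify | Editor_Note | Now is the time | For all good women |
| Gene | delete | Editor_Note | Now is the time |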
Complex Annotation Property
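| entity-id | operation | prop-id | value | ann-id | value | ann-id | value | ann-id | value | prop-id | value | ann-id | value | ann-id | value | ann-id | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C123 | new | P56 | foo bar | P101 | NCI | P102 | PT | P103 | FDA |
| C456 | modify | P56 | foo bar | P101 | NCI | P102 | PT | P103 | FDA | P56 | foo baz | P101 | NCI | P102 | PT | P103 | FDA |
| C789 | delete | P56 | foo baz | P101 | NCI | P102 | PT | P103 | FDA |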
Note that for the modify operation, you can tell where the old value ends and the new one begins because the property id occurs again (e.g. P56 above). The “qualifier” annotations are included to disambiguate properties that differ only in their qualifiers, i.e. whose property values are the same.
We call properties “complex” when their values are annotated with other properties.
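A Python sketch that builds complex-annotation rows and splits a modify row back into its old and new groups at the repeated property id, as described in the note above. The tuple shape `(prop-id, value, [(ann-id, value), ...])` and the function names are hypothetical. The sketch also assumes no qualifier ann-id equals the property's own prop-id.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory shape: (prop-id, value, [(ann-id, value), ...]).
Annotated = Tuple[str, str, List[Tuple[str, str]]]

OPERATIONS = {"new", "modify", "delete"}

def flatten(prop: Annotated) -> List[str]:
    """Lay out one annotated property as prop-id, value, ann-id, value, ..."""
    prop_id, value, qualifiers = prop
    cells = [prop_id, value]
    for ann_id, ann_value in qualifiers:
        cells += [ann_id, ann_value]
    return cells

def complex_row(entity_id: str, operation: str, prop: Annotated,
                replacement: Optional[Annotated] = None) -> List[str]:
    """Build one row; for modify, prop is the current value and replacement the new one."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    cells = [entity_id, operation] + flatten(prop)
    if operation == "modify":
        # The repeated prop-id is what marks where the new value begins.
        if replacement is None or replacement[0] != prop[0]:
            raise ValueError("modify needs a replacement with the same prop-id")
        cells += flatten(replacement)
    elif replacement is not None:
        raise ValueError(f"{operation} takes a single annotated value")
    return cells

def split_modify(cells: List[str]) -> Tuple[List[str], List[str]]:
    """Split a modify row into its old and new groups at the repeated prop-id."""
    prop_id = cells[2]
    # From index 2 on, even indices hold ids (prop-id/ann-id) and odd indices
    # hold values, so a value that happens to read "P56" is never matched.
    for i in range(4, len(cells), 2):
        if cells[i] == prop_id:
            return cells[2:i], cells[i:]
    raise ValueError("the prop-id does not occur a second time")
```

Checking only id positions is the design choice that keeps the split unambiguous even when a property or qualifier value looks like an id.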