Data Processing Steps - VertNet/toolkit GitHub Wiki
Create Occurrences file from source
The result is a UTF-8-encoded TSV file with all non-printing characters removed, an internally unique processingID, and a header of clean field names (no whitespace) in sorted order.
- Encode to UTF-8
- Get header and dialect
- Try to read all rows
- See if field count is wrong for any row
- Add field "dummytest" as first field with values "dummytest"
- Remove non-printing characters (see PurgeNonprintingCharacters.sh and PurgeNuls.sh)
- Format as TSV, add processingID field, remove dummy field, merge header to get clean, sorted header
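The steps above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual code: the function names `clean_field_name` and `to_clean_tsv` are hypothetical, the input is assumed to be already decoded to text, "non-printing" is assumed to mean ASCII control characters other than tab and line breaks, processingID is assumed to be a simple row counter placed in the first column, and the dummy-field step is skipped.

```python
import csv
import io
import re

# Assumption: "non-printing" means ASCII control characters except tab, LF, and CR.
NONPRINTING = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_field_name(name):
    """Remove all whitespace from a field name."""
    return re.sub(r"\s+", "", name)


def to_clean_tsv(text, delimiter=","):
    """Return TSV text with a cleaned, sorted header and a processingID column."""
    text = NONPRINTING.sub("", text)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = [clean_field_name(h) for h in rows[0]]

    # Check that every data row has the same number of fields as the header.
    for n, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            raise ValueError(f"row {n} has {len(row)} fields, expected {len(header)}")

    # Column positions in sorted order of the cleaned field names.
    order = sorted(range(len(header)), key=lambda i: header[i])

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Assumption: processingID is a 1-based row counter placed first.
    writer.writerow(["processingID"] + [header[i] for i in order])
    for n, row in enumerate(rows[1:], start=1):
        writer.writerow([n] + [row[i] for i in order])
    return out.getvalue()
```

For example, `to_clean_tsv("loan ID,cat Num\nA1,12\nB2,7\n")` yields a header of `processingID`, `catNum`, `loanID`, with the data columns reordered to match.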
Create Darwin Cloud Occurrence file from Occurrences file
The result is a version of the Occurrences file with superfluous fields removed. Remaining fields are renamed to matching Darwin Cloud terms wherever possible, and any fields that do not match a Darwin Cloud term are processed into dwc:dynamicProperties.
Map the fields to Darwin Cloud terms
Create a mapping file consisting of one key:value pair per line, where each key is an Occurrences file field name and its value is either "omit" or a Darwin Cloud term name. Darwin Cloud terms are commonly used field names that can be unambiguously processed into Darwin Core terms.
loanID:omit
catNum:catalogNumber
datum:geodeticDatum
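A mapping file in this form could be read into a dictionary along the following lines. This is a sketch only; `parse_mapping` is a hypothetical helper, and it assumes that field names contain no colon and that blank lines can be ignored.

```python
def parse_mapping(lines):
    """Parse 'key:value' lines into a dict of field name -> "omit" or term name."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first colon only.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed mapping line: {line!r}")
        mapping[key] = value
    return mapping
```

Applied to the three example lines above, this gives `{"loanID": "omit", "catNum": "catalogNumber", "datum": "geodeticDatum"}`.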
Process the mapping
Use the mapping file to create the Darwin Cloud Occurrence file by doing the following for every record:
- omit all fields mapped to "omit"
- for each explicitly mapped field, set the value of the mapped-to field to the value of the mapped-from field, unless the mapped-to field already exists for the record, in which case append the value. (Note: this does not account for situations such as preparations, where the fields are named for parts (Skin, Skull) and the values are True or False.)
- for a key in the Occurrences file that is not in the mapping file:
  - if the key is a Darwin Cloud term, set the value of that Darwin Cloud term
  - if the key is not a Darwin Cloud term, append a dynamicProperty whose key is the given key
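The per-record rules above might look like this in code. This is a hedged sketch rather than the toolkit's implementation: `apply_mapping` is a hypothetical name, `DARWIN_CLOUD_TERMS` is a tiny illustrative subset rather than the real term list, the " | " separator used when appending is an assumption, and encoding dynamicProperties as JSON is likewise an assumption. A dynamicProperties value already present in the input record is not merged here.

```python
import json

# Hypothetical subset of Darwin Cloud terms, for illustration only.
DARWIN_CLOUD_TERMS = {"catalogNumber", "geodeticDatum", "preparations"}


def apply_mapping(record, mapping, terms=DARWIN_CLOUD_TERMS, separator=" | "):
    """Return a new record keyed by Darwin Cloud terms plus dynamicProperties."""
    out = {}
    dynamic = {}

    def set_or_append(term, value):
        # Set the term, or append to it if it already holds a value.
        if out.get(term):
            if value:
                out[term] += separator + value
        else:
            out[term] = value

    for key, value in record.items():
        target = mapping.get(key)
        if target == "omit":
            continue  # field explicitly dropped
        if target is not None:
            set_or_append(target, value)  # explicitly mapped field
        elif key in terms:
            set_or_append(key, value)  # unmapped, but already a Darwin Cloud term
        else:
            dynamic[key] = value  # unmapped, non-Darwin Cloud field

    if dynamic:
        # Assumption: dynamicProperties is stored as a JSON object.
        out["dynamicProperties"] = json.dumps(dynamic, sort_keys=True)
    return out
```

With the example mapping, a record containing both `catNum` and `catalogNumber` ends up with a single `catalogNumber` value holding both inputs joined by the separator, while an unmapped field such as `weight` lands in `dynamicProperties`.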
Darwin Cloud terms
- Occurrence
- Event
- Identification
- Taxon
- Location
- Depth
- Elevation
- Georeference