FcigDM_SAM - AtlasOfLivingAustralia/ala-datamob GitHub Wiki

South Australian Museum Australia: primary collection management system darwincore export

Introduction

This page documents an implementation of a Darwin Core export for the South Australian Museum, one of the FCIG-OZCAM participants.

Artefacts and synopsis

| Item | Short URL | Details (or long URL) |
|---|---|---|
| This wiki page | http://goo.gl/paUZ9 | FcigDM_SAM |
| Source data system | | |
| Collection management software | | KE EMu on Linux |
| Database backend | | texpress |
| Exporter's execution environment | | bash (linux command shell) |
| Adhoc query | | texql |
| Bulk export method | | texkfdump, texexport |
| Schema reporting | | texdescribe |
| DwC mapping | | awk script |
| Compression, transmission | | gzip, sftp to upload.ala.org.au |
| Output data | | Darwincore csv (simple-dwc) format with non-standard FCIG extensions; institutionCode "SAMA"; dcterms:type "PhysicalObject"; basisOfRecord "PreservedSpecimen" |
| Data availability: | | |
| SAM data before export | | some metadata at http://www.samuseum.sa.gov.au/science, but most complete record via biocache |
| SAM data at export | | not generally available - contact data manager |
| SAM data after atlas (biocache) ingest | http://goo.gl/7tVs6 | |
| Completeness model | http://goo.gl/Qm8lT | Google docs -> Data management -> CompletenessDwC -> sam.dwccm-26 |
| Source code | http://goo.gl/uOhTK | https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam |
| Usage doco | http://goo.gl/9Ugfh | https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/sam%20cms%20doco.20130103.pdf |
| Final status report | | Google docs ➢ Communications ➢ Data management ➢ Mobilisation reports ➢ finalreport.sam.odt; finalreport.sam.pdf (under the same directory) |

Behavioural diagrams

From usage documentation https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/sam%20cms%20doco.20130103.pdf...

There are five parts to the exporter:

dwc_spc.sh

The first export component is a bash shell script, dwc_spc.sh, which is the entry point for running an export. It prepares the export directory, reads disciplines-list, calls the sub-script dwc_spsub.sh for each non-comment line (no leading #), bundles the export on completion and sends it to the specified servers using sftp.

The second export component is the text file disciplines-list, which controls the behaviour of the main script dwc_spc.sh. Enter the disciplines to export here, one per line; each entry must match a value of the CatCollectionName field. Comment lines (beginning with #) and blank lines are ignored, so commenting out disciplines gives a partial export. If you delete or rename this file, dwc_spc.sh rebuilds it from the database. Note: rebuilding is a costly operation (roughly 2 hours), and no export is run after a rebuild, so that any unwanted disciplines can first be excluded by deleting or commenting out their lines.
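The handling of disciplines-list described above can be sketched as follows. This is a minimal illustration, not the code in dwc_spc.sh: the function names read_disciplines and export_one and the discipline names in the demo list are hypothetical, and export_one only stands in for the call to the per-discipline sub-script.

```shell
#!/usr/bin/env bash
# Sketch only: read_disciplines and export_one are hypothetical names,
# not functions taken from the real dwc_spc.sh.

# Print each active discipline from a list file, skipping blank lines
# and comment lines (those whose first character is #).
read_disciplines() {
    local list_file="$1"
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$list_file"
}

# Stand-in for the per-discipline sub-script call.
export_one() {
    printf 'exporting: %s\n' "$1"
}

# Demo with a throwaway list file; the discipline names are made up.
demo_list=$(mktemp)
printf '%s\n' '# full export list' 'Mammals' '' '#Birds' 'Marine Invertebrates' > "$demo_list"

read_disciplines "$demo_list" | while IFS= read -r discipline; do
    export_one "$discipline"
done

rm -f "$demo_list"
```

In this demo only Mammals and Marine Invertebrates reach export_one, because the Birds line is commented out. Lines containing only whitespace are not treated as blank in this sketch.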


Activity diagram for https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/dwc_spc.sh

dwc_spsub.sh

The third export component is a bash shell script, dwc_spsub.sh, which dwc_spc.sh calls for each line in disciplines-list. It exports the full list of current IDs, as well as the record data as either a partial or a full export. It calls the scripts ozdc_full.awk and ozdc_id.awk, which handle the mapping between an EMu export and a Darwin Core csv. The output file DISCIPLINE-dwcid.csv is converted by ozdc_id.awk, while DISCIPLINE-dwcdata.csv is converted by ozdc_full.awk.


Activity diagram for https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/dwc_spsub.sh

ozdc_full.awk, ozdc_id.awk

The fourth export component consists of the awk scripts ozdc_full.awk and ozdc_id.awk. These scripts handle the mapping between an EMu export and a Darwin Core csv. dwc_spsub.sh calls them to convert data inline before the output csv files are written by dwc_spc.sh.
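As a rough illustration of this kind of inline mapping, the sketch below pipes a record through awk and emits a quoted csv row. It is not the logic of ozdc_full.awk or ozdc_id.awk. The input layout (tab-separated record id, registration number, scientific name), the function name map_to_dwc, the choice of output columns and the sample record are all assumptions made for the example. Only the constant values "SAMA" and "PreservedSpecimen" come from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: the tab-separated input layout and the column choice are
# assumptions, not the real EMu export format or the real awk mapping.

map_to_dwc() {
    awk -F '\t' '
        # Quote a value for csv, doubling any embedded double quotes.
        function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
        BEGIN {
            OFS = ","
            # Header row using Darwin Core term names.
            print "occurrenceID", "institutionCode", "basisOfRecord", "catalogNumber", "scientificName"
        }
        {
            # Constant terms are filled in; the rest come from the input fields.
            print csvq($1), csvq("SAMA"), csvq("PreservedSpecimen"), csvq($2), csvq($3)
        }
    '
}

# Made-up sample record, fed through the mapper on standard input.
printf '1001\tM123\tMacropus rufus\n' | map_to_dwc
```

Because the mapper reads standard input and writes standard output, a calling script can place it in a pipeline between the export command and the csv file it writes.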


Activity diagram for https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/ozdc_full.awk and https://github.com/AtlasOfLivingAustralia/ala-datamob/tree/master/biodomains/fcig-ozcam/sam/ozdc_id.awk