AtlasSubsysIngest - AtlasOfLivingAustralia/ala-datamob GitHub Wiki
This is not an attempt to document the ingest sub-system; it is merely a pointer to the code base, along with high-level descriptions (based on the understanding of someone who has not developed or used these tools, and has merely relied on their outcomes).
This wiki page is concerned only with the *bold* steps in the process applied to occurrence records; preliminary steps are included for context:
- data provider and the atlas establish new system account, creating a new data resource for each discrete source-system, eg: a collection's specimen-record management system, or species profile database ... http://collections.ala.org.au/datasets
- the atlas generates a SFTP upload account on the upload server
- *data provider generates an export in simple-dwc csv format*
- *data provider uploads the compressed export*
- *ingest subsystem periodically checks the sftp server for new files*
- *a new file is found, downloaded, unpacked and a record loading process triggered* (note: planned behaviour only at feb 2013)
- *a log of the overall ingest process is left on the sftp server*
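
As a rough illustration of the upload and pick-up steps above, here is a minimal Python sketch. It is not the atlas's code (the real ingest tooling is written in Scala), and several things are assumptions made purely for illustration: a local directory stands in for the sftp server, the function names `export_and_upload` and `check_for_new_files` are hypothetical, as are the sample records, the archive name and the log format. The column headers are Darwin Core terms.

```python
import csv
import tempfile
import zipfile
from pathlib import Path

# Hypothetical sample records; the column names are Darwin Core terms.
FIELDS = ["occurrenceID", "scientificName", "decimalLatitude",
          "decimalLongitude", "eventDate"]
RECORDS = [
    {"occurrenceID": "urn:example:1", "scientificName": "Macropus giganteus",
     "decimalLatitude": "-35.28", "decimalLongitude": "149.13",
     "eventDate": "2013-02-01"},
    {"occurrenceID": "urn:example:2", "scientificName": "Dacelo novaeguineae",
     "decimalLatitude": "-33.87", "decimalLongitude": "151.21",
     "eventDate": "2013-02-03"},
]


def export_and_upload(upload_dir: Path, name: str) -> Path:
    """Provider side: write a csv export and place it, zipped, in upload_dir."""
    archive = upload_dir / f"{name}.zip"
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / f"{name}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(RECORDS)
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(csv_path, arcname=csv_path.name)
    return archive


def check_for_new_files(upload_dir: Path, work_dir: Path, seen: set,
                        load_records) -> list:
    """Ingest side: one polling pass over the (stand-in) upload directory."""
    processed = []
    for archive in sorted(upload_dir.glob("*.zip")):
        if archive.name in seen:
            continue  # already handled in an earlier pass
        target = work_dir / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # unpack the compressed export
        count = 0
        for csv_path in sorted(target.glob("*.csv")):
            with csv_path.open(newline="", encoding="utf-8") as handle:
                # Trigger the record loader; it returns how many rows it took.
                count += load_records(list(csv.DictReader(handle)))
        # Leave a log of the pass next to the upload (format is made up).
        log_path = upload_dir / f"{archive.stem}.log"
        log_path.write_text(f"{archive.name}: loaded {count} records\n",
                            encoding="utf-8")
        seen.add(archive.name)
        processed.append(archive.name)
    return processed
```

Calling `check_for_new_files` repeatedly with the same `seen` set mimics the periodic check: an archive is unpacked and loaded once, and later passes skip it.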
This document lives at: http://goo.gl/qzioQ or ‘Automated ingest: file naming conventions’ under Google docs ➢ Communications ➢ Data management ➢ Mobilisation - public
- a search for an existing record,
- an attempt to match a taxon,
- a series of quality-tests,
- a test for taxonomic sensitivity against state and commonwealth legislation, and
- a search for duplicate or associate records.
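
To make the five checks listed above a little more concrete, here is a minimal Python sketch that applies them, in the same order, to a single record. Everything in it is an assumption for illustration only: the lookup tables, the `Outcome` class, the `process` and `duplicate_key` functions, the single coordinate-range quality test and the made-up sensitive name are all hypothetical, and none of it reflects how the atlas's Scala code actually performs these checks.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical reference data standing in for the atlas's real stores.
EXISTING_IDS = {"urn:example:1"}
KNOWN_TAXA = {
    "macropus giganteus": "Macropus giganteus",
    "exemplum sensitivum": "Exemplum sensitivum",  # made-up name
}
SENSITIVE_TAXA = {"Exemplum sensitivum"}  # placeholder, not a real listing


@dataclass
class Outcome:
    occurrence_id: str
    is_update: bool = False
    matched_taxon: Optional[str] = None
    failed_tests: list = field(default_factory=list)
    is_sensitive: bool = False
    duplicate_of: list = field(default_factory=list)


def duplicate_key(record: dict) -> tuple:
    """Fields assumed (for this sketch) to identify the same observation."""
    return (record.get("scientificName"), record.get("decimalLatitude"),
            record.get("decimalLongitude"), record.get("eventDate"))


def process(record: dict, batch: list) -> Outcome:
    out = Outcome(record["occurrenceID"])
    # 1. search for an existing record
    out.is_update = record["occurrenceID"] in EXISTING_IDS
    # 2. attempt to match a taxon (case-insensitive exact match only)
    name = record.get("scientificName", "").strip().lower()
    out.matched_taxon = KNOWN_TAXA.get(name)
    # 3. quality tests (a single coordinate range check in this sketch)
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            out.failed_tests.append("coordinates_out_of_range")
    except (KeyError, ValueError):
        out.failed_tests.append("coordinates_missing_or_invalid")
    # 4. test for taxonomic sensitivity
    out.is_sensitive = out.matched_taxon in SENSITIVE_TAXA
    # 5. search for duplicate or associated records within the batch
    key = duplicate_key(record)
    out.duplicate_of = [r["occurrenceID"] for r in batch
                        if r is not record and duplicate_key(r) == key]
    return out
```

A record is never rejected outright here; each check only annotates the `Outcome`, which seemed the simplest way to show the steps side by side.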
- google code project ala-portal, rooted at the ingest (importer) source-code
- the main entrypoint for loading new records for a data resource, dataimport.scala
- ... put more stuff here about ingest