Mobilization Steps

Index to Documentation

  • VertNet Portal
  • Migrating
  • Publishing
  • Harvesting
  • Post-Harvest Processing
  • Indexing

Workflow Steps

The data mobilization workflow for VertNet involves the following steps:

  • Update the resource table in the VertNet CartoDB account from resource_staging by first clearing it and then repopulating it with the following SQL (see the CartoDB SQL API sketch after this list):

    DELETE from resource;

    INSERT INTO resource
      (cartodb_id, title, url, created_at, updated_at, the_geom, eml, dwca, pubdate, orgname, description, emlrights, contact, email, icode, ipt, count, citation, networks, collectioncount, orgcountry, orgstateprovince, orgcity, source_url, migrator, license, lastindexed, gbifdatasetid, gbifpublisherid, doi)
    SELECT
      cartodb_id, title, url, created_at, updated_at, the_geom, eml, dwca, pubdate, orgname, description, emlrights, contact, email, icode, ipt, count::integer, citation, networks, collectioncount, orgcountry, orgstateprovince, orgcity, source_url, migrator, license, lastindexed, gbifdatasetid, gbifpublisherid, doi
    FROM resource_staging
    WHERE ipt=True AND networks LIKE '%Vert%';
  • Harvest to Google Cloud Storage with gulo: https://github.com/VertNet/gulo/wiki/Harvest-Workflow
  • Update the harvestfolder field in the VertNet CartoDB table 'resource_staging'
  • Export resource_staging.csv from CartoDB (the CartoDB SQL API sketch after this list shows one way to do this)
  • Run the post-harvest processor check_harvest_folders.py for the data sets in resource_staging.csv (see the harvest-folder check sketch after this list)
  • Run post-harvest processor harvest_resource_processor.py to process data sets in resource_staging.csv
  • Check the Google Cloud Storage directory tree vertnet-harvesting/processed for duplicates and counts (see the folder-count sketch after this list)
  • Remove data sets from the index that have had changes to the identifier scheme using the dwc-indexer
  • Index any datasets that need to be updated: https://github.com/VertNet/dwc-indexer/wiki/Index-Workflow
  • Load the files from the processed folders for the data sets of interest, specified by GCS directory, into BigQuery using the post-harvest processor bigquery-loader.py (see the BigQuery load sketch after this list)
  • Create Taxon Subset snapshots (see the snapshot sketch after this list): https://github.com/VertNet/post-harvest-processor/wiki/Making-Snapshots
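
CartoDB SQL API sketch. The resource table refresh and the resource_staging.csv export can both be run against the CartoDB SQL API. This is a minimal Python sketch, not the documented procedure: the account name vertnet, the CARTODB_API_KEY environment variable, and the output file name are assumptions to replace with real values.

    import os
    import requests

    # Assumptions: the CartoDB account is named "vertnet" and its API key is
    # exported as CARTODB_API_KEY; substitute the real account and key.
    CARTODB_ACCOUNT = "vertnet"
    CARTODB_API_KEY = os.environ["CARTODB_API_KEY"]
    SQL_API = "https://{0}.cartodb.com/api/v2/sql".format(CARTODB_ACCOUNT)

    # Same column lists as in the INSERT ... SELECT step above.
    REFRESH_SQL = """
    INSERT INTO resource
      (cartodb_id, title, url, created_at, updated_at, the_geom, eml, dwca, pubdate,
       orgname, description, emlrights, contact, email, icode, ipt, count, citation,
       networks, collectioncount, orgcountry, orgstateprovince, orgcity, source_url,
       migrator, license, lastindexed, gbifdatasetid, gbifpublisherid, doi)
    SELECT
       cartodb_id, title, url, created_at, updated_at, the_geom, eml, dwca, pubdate,
       orgname, description, emlrights, contact, email, icode, ipt, count::integer, citation,
       networks, collectioncount, orgcountry, orgstateprovince, orgcity, source_url,
       migrator, license, lastindexed, gbifdatasetid, gbifpublisherid, doi
    FROM resource_staging
    WHERE ipt=True AND networks LIKE '%Vert%'
    """

    def run_sql(query, response_format=None):
        """POST a query to the CartoDB SQL API and return the HTTP response."""
        payload = {"q": query, "api_key": CARTODB_API_KEY}
        if response_format:
            payload["format"] = response_format  # e.g. "csv" to get results as CSV
        response = requests.post(SQL_API, data=payload)
        response.raise_for_status()
        return response

    # Clear the resource table, then repopulate it from resource_staging.
    run_sql("DELETE from resource")
    run_sql(REFRESH_SQL)

    # Export resource_staging.csv for the post-harvest processors.
    with open("resource_staging.csv", "wb") as outfile:
        outfile.write(run_sql("SELECT * FROM resource_staging", response_format="csv").content)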
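
Harvest-folder check sketch. check_harvest_folders.py in the post-harvest-processor repository is the authoritative check; the sketch below only illustrates the idea, assuming resource_staging.csv has a harvestfolder column with values of the form <bucket>/<path> and that Google Cloud credentials are available to the client.

    import csv

    from google.cloud import storage

    # Assumed: resource_staging.csv sits in the working directory and its
    # harvestfolder column holds "<bucket>/<path>" values.
    client = storage.Client()

    with open("resource_staging.csv", newline="") as infile:
        for row in csv.DictReader(infile):
            folder = (row.get("harvestfolder") or "").strip()
            if not folder:
                continue
            bucket_name, _, prefix = folder.partition("/")
            # A harvest folder is considered present if at least one object
            # exists under its prefix.
            blobs = list(client.list_blobs(bucket_name, prefix=prefix, max_results=1))
            print("OK     " if blobs else "MISSING", folder)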
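
Folder-count sketch. A quick way to check the processed tree for duplicates and counts is to count objects per data set folder. The bucket name and the processed/ prefix come from the step above; the assumption that each data set occupies a processed/<institution>/<resource>/ folder is only illustrative.

    from collections import Counter

    from google.cloud import storage

    BUCKET = "vertnet-harvesting"
    PREFIX = "processed/"

    client = storage.Client()
    counts = Counter()
    for blob in client.list_blobs(BUCKET, prefix=PREFIX):
        # Group objects by the assumed processed/<institution>/<resource> folder.
        folder = "/".join(blob.name.split("/")[:3])
        counts[folder] += 1

    # Folders with unexpectedly high counts may contain shards from more than
    # one harvest run (i.e. duplicates) and deserve a closer look.
    for folder, n in sorted(counts.items()):
        print(n, folder)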
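
BigQuery load sketch. bigquery-loader.py in the post-harvest-processor repository is the script the step above refers to; the sketch below only shows the underlying idea with the google-cloud-bigquery client. The GCS URI, project, dataset and table names, the tab delimiter, and the append disposition are assumptions for illustration.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical source and destination; substitute the real processed
    # folder and target table.
    gcs_uri = "gs://vertnet-harvesting/processed/INSTITUTION/RESOURCE/*"
    table_id = "my-project.vertnet.dumps"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        field_delimiter="\t",   # processed harvest files are assumed tab-delimited
        skip_leading_rows=1,    # assumed header row
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # block until the load finishes
    print("Loaded into {}: {} rows".format(table_id, client.get_table(table_id).num_rows))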
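
Snapshot sketch. The authoritative procedure for Taxon Subset snapshots is the Making-Snapshots wiki page linked above; the sketch below only shows one plausible shape of it: query the full-dump table for a single class, write the result to its own table, and export that table to GCS. Every project, table, bucket, and field name here is an assumption.

    from google.cloud import bigquery

    client = bigquery.Client()

    source_table = "my-project.vertnet.dumps"            # hypothetical full-dump table
    snapshot_table = "my-project.vertnet.aves_snapshot"  # hypothetical subset table

    # Materialize the taxon subset into its own table.
    query_config = bigquery.QueryJobConfig(
        destination=snapshot_table,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    query = "SELECT * FROM `{0}` WHERE LOWER(class) = 'aves'".format(source_table)
    client.query(query, job_config=query_config).result()

    # Export the snapshot table to GCS as gzipped CSV shards.
    extract_config = bigquery.ExtractJobConfig(compression="GZIP")
    destination_uri = "gs://vertnet-downloads/aves_snapshot_*.csv.gz"  # hypothetical bucket
    client.extract_table(snapshot_table, destination_uri, job_config=extract_config).result()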