
Workflow for product creation

A note before beginning: the best approach to standing up a new product is to use the DAG and pipeline specifications for a similar source type and product as templates. Many products closely follow the Model DAG. Be sure to adjust the inputs/parameters for the modules according to the specifics of your DAG and the details in the ATBD. See the Wiki article on Pipeline naming and terms for details on pipeline naming and the file schemas needed in many modules. The pages in Section 2 of this Wiki give specifics on some of the more complex modules. Finally, refer to the code itself (found in the pack/, flow/, and modules/ directories of this git repo) for documentation on module inputs and outputs.
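As a rough illustration of the "use an existing spec as a template" approach, the sketch below loads a pipeline spec for a similar source type, adjusts the pieces that are source-type-specific, and writes out a new spec. The file names, repo names, and spec fields shown are placeholders (it assumes a simple pfs input, whereas real specs in this repo often use cross/join/group inputs); consult the actual specs in the repo for the authoritative structure.

```python
# Hypothetical sketch only: adapt an existing pipeline spec for a new source type.
# All names below are placeholders, not actual specs from this repo.
import yaml  # PyYAML

# Load a spec for a similar source type to use as a template
with open("prt_calibration_group_and_convert.yaml") as f:
    spec = yaml.safe_load(f)

# Adjust the parts that are specific to the new source type
new_source_type = "exo2"
spec["pipeline"]["name"] = f"{new_source_type}_calibration_group_and_convert"
spec["input"]["pfs"]["repo"] = f"{new_source_type}_data_source"  # assumes a simple pfs input

# Write out the new spec, then create the pipeline, e.g.:
#   pachctl create pipeline -f exo2_calibration_group_and_convert.yaml
with open(f"{new_source_type}_calibration_group_and_convert.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)
```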

  1. Review the ATBD for the product and draft the DAG (processing workflow).
  2. Generate L0 test data in Pachyderm (see the loading sketch after this list).
  3. Determine and create location properties (groups, data rate, active periods, context).
  4. Create the pipeline specifications for the source type and stand them up through the regularization step (often combined with the date gap filler).
  5. Create Groups that bring together the sensors needed for each instance of the data product.
  6. Populate thresholds for your product.
  7. Stand up the data-product-specific portion of your pipeline, all the way through to L1 data. Again, reference the ATBD and your DAG for the modules to deploy as well as the inputs/parameters to specify in the associated pipeline specs (documentation for these is found in the code). Use the pipeline specs of another data product as a template and adjust inputs and processing parameters as appropriate.
  8. Compare the L1 output from Pachyderm to the Airflow transitions (a comparison sketch follows this list).
  9. Follow the Release and deployment process to publish to the NEON Data Portal.
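
For step 2, one way to get a small window of L0 test data into Pachyderm is to put files into the input repo for your source type with pachctl. The repo name, branch, and path layout below are placeholders, and the actual L0 loading mechanism for this repo may differ (e.g., a dedicated data source pipeline); treat this only as a sketch of the general idea.

```python
# Hypothetical sketch: load one L0 test file into a Pachyderm input repo.
import subprocess

repo = "prt_data_source"                      # placeholder input repo for the source type
branch = "master"
local_file = "prt_12345_2023-01-01.parquet"   # placeholder L0 test file
dest_path = "/prt/2023/01/01/12345/data/" + local_file

# Equivalent to: pachctl put file <repo>@<branch>:<path> -f <local file>
subprocess.run(
    ["pachctl", "put", "file", f"{repo}@{branch}:{dest_path}", "-f", local_file],
    check=True,
)
```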
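For step 8, the comparison can be as simple as aligning the two L1 outputs on their timestamps and checking that the statistics agree within a small tolerance. The file paths, column names, and tolerance below are placeholders; adjust them to the actual L1 files and terms for your product.

```python
# Hypothetical sketch: compare L1 statistics from Pachyderm with the transition output.
import pandas as pd

pach = pd.read_parquet("pachyderm_L1/prt_CFGLOC12345_2023-01-01_temp_030.parquet")
tran = pd.read_csv("transition_L1/temp_030_2023-01-01.csv", parse_dates=["startDateTime"])

# Align on time and compare a statistic of interest within a small tolerance
merged = pach.merge(tran, on="startDateTime", suffixes=("_pach", "_tran"))
diff = (merged["mean_pach"] - merged["mean_tran"]).abs()
print(f"max |difference| in mean: {diff.max()}")
print(f"rows exceeding 1e-6: {(diff > 1e-6).sum()} of {len(merged)}")
```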

Useful Links (some are internal-only)

Holds Avro schemas for L0 data

Holds Avro schemas for L0' and higher data files, as well as empty file templates for use in the pipelines

If the link does not deliver you to the Data Product Manager, you'll need access. Put in a ServiceNow ticket requesting access.