Run the pipeline - NCAR/kcor-pipeline GitHub Wiki

Configuration file

The new pipeline uses a configuration file to specify how it should be run: paths, which actions to perform, and so on. A configuration file is always required. It should be named kcor.FLAGS.cfg, where FLAGS is a label that identifies the run (for example, kcor.reprocess-2018.cfg). See config/kcor.spec.cfg for the definition of each possible option. At a minimum, you must change the paths in this file before you can run the pipeline.

Database credentials

If you want to use the database, you must store your database login credentials in a file. I recommend making this file readable only by you. The format of the file should be:

[pipeline]
host     : {database URL}
user     : {actual username here}
password : {actual password here}
port     : 3306
database : MLSO

The file can contain multiple logins, each in its own section. The configuration file specifies the location of the credentials file and which section of it to use.
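To illustrate the format, here is a minimal Python sketch that writes a credentials file with owner-only permissions and reads one section back using the standard-library configparser module, which accepts "key : value" lines. This is not the pipeline's own code: the function names read_credentials and write_private, and the placeholder host, user, and password values, are assumptions made for the example.

```python
import configparser
import os
import stat

# Placeholder values only; substitute your real database login.
SAMPLE = """\
[pipeline]
host     : db.example.org
user     : someuser
password : not-a-real-password
port     : 3306
database : MLSO
"""


def write_private(path, text):
    """Write text to path so that only the owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    # Tighten permissions explicitly in case the file already existed.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def read_credentials(path, section="pipeline"):
    """Return the settings in one section of a credentials file as a dict."""
    parser = configparser.ConfigParser()
    with open(path) as f:
        parser.read_file(f)
    return dict(parser[section])
```

Calling write_private("creds.cfg", SAMPLE) followed by read_credentials("creds.cfg") returns a dictionary of strings keyed by host, user, password, port, and database. Note that configparser returns every value as a string, including the port.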

Running the pipeline

Running the pipeline requires a Python 3 installation with the psutil package installed (can be installed via pip).
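A quick way to confirm the requirement is met is to ask Python whether psutil can be imported. The sketch below is only a convenience check, not part of the pipeline; the function name missing_packages is an assumption. Run it with the same Python 3 interpreter you intend to use for the pipeline.

```python
import importlib.util


def missing_packages(names=("psutil",)):
    """Return the subset of package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```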

Use the kcor script in the bin directory of a kcor-pipeline installation to run the pipeline. As stated above, you will need to have a valid and appropriately named configuration file.

Running the realtime and end-of-day pipeline

To run the pipeline for a day, or range of days, do something like the following:

$ kcor process -f reprocess-2018 20180702

where reprocess-2018 is the "FLAGS" portion of the configuration filename to be used for the run. To run the pipeline for multiple days, use "-" for a range (start date inclusive, end date exclusive) and "," to combine dates, like:

$ kcor process -f reprocess-2018 20180601-20180701,20180704

which runs the pipeline for all of June 2018 plus 20180704 (but not 20180701).
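The range semantics can be made concrete with a short Python sketch that expands a date expression into the individual days it covers. This is an illustration of the rules described above, not the pipeline's actual parser; the function names parse_day and expand_dates are assumptions.

```python
from datetime import datetime, timedelta


def parse_day(text):
    """Parse a YYYYMMDD string into a date."""
    return datetime.strptime(text, "%Y%m%d").date()


def expand_dates(expression):
    """Expand an expression such as '20180601-20180701,20180704'.

    '-' gives a range with the start date included and the end date
    excluded; ',' combines single dates and ranges.
    """
    days = []
    for part in expression.split(","):
        if "-" in part:
            start_text, end_text = part.split("-")
            day, end = parse_day(start_text), parse_day(end_text)
            while day < end:  # the end date itself is excluded
                days.append(day.strftime("%Y%m%d"))
                day += timedelta(days=1)
        else:
            days.append(parse_day(part).strftime("%Y%m%d"))
    return days
```

For the expression above, expand_dates("20180601-20180701,20180704") yields 31 entries: the 30 days from 20180601 through 20180630, followed by 20180704.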

Just producing a calibration

To run the calibration for a day, or range of days, do something like the following:

$ kcor cal -f reprocess-2018 20180702

where reprocess-2018 is the "FLAGS" portion of a configuration filename to be used for the run. You can also do:

$ kcor cal --list files.txt -f reprocess-2018 20180702

where files.txt is a file containing a list of files to use for a calibration.