New validations workflow - reichlab/covid19-forecast-hub GitHub Wiki

What is the new workflow?

The new workflow (hereafter called "validations v2") removes the bottleneck of cloning the entire repository on GitHub Actions (which, as of writing, takes ~15 minutes) just to validate a forecast added in a PR. The new workflow takes ~1 minute to run completely on a PR with a single forecast file change.

Workflow

The following steps explain the new workflow.

  1. Clone only the validations code present in the validations repository.
  2. Run the script main.py, which does the following:
    1. Download all the changed files in the PR into a temporary forecasts directory.
    2. Segregate all files changed into forecast files, metadata files, and other files.
    3. Add appropriate labels to the PR based on the files changed. Details are available in the labels section.
    4. Run forecast validations on forecast files. Details on the data format are available here, and details on the validations here.
    5. Run metadata validations on metadata files.
    6. If any errors occur during forecast/metadata validations, fail the PR and add a comment specifying that there were validation errors.
    7. If there are no errors, do not add a comment; the tick mark on the PR indicates that everything looks good.
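Step 2.2 above (segregating changed files) can be sketched as follows. This is a minimal illustration, not the actual code in main.py: the function name `segregate_files` and both path patterns are assumptions, chosen on the premise that forecasts are dated CSV files inside a model folder under data-processed and that metadata files are named metadata-*.txt in the same folder.

```python
import re

# Assumed path conventions (hypothetical, not taken from main.py):
#   forecast: data-processed/<model>/<YYYY-MM-DD>-<model>.csv
#   metadata: data-processed/<model>/metadata-<model>.txt
FORECAST_RE = re.compile(r"^data-processed/[^/]+/\d{4}-\d{2}-\d{2}-[^/]+\.csv$")
METADATA_RE = re.compile(r"^data-processed/[^/]+/metadata-[^/]+\.txt$")


def segregate_files(changed_paths):
    """Split the paths changed in a PR into forecast, metadata, and other files."""
    groups = {"forecast": [], "metadata": [], "other": []}
    for path in changed_paths:
        if FORECAST_RE.match(path):
            groups["forecast"].append(path)
        elif METADATA_RE.match(path):
            groups["metadata"].append(path)
        else:
            # Anything that is neither a forecast nor a metadata file.
            groups["other"].append(path)
    return groups
```

Keeping the three groups separate lets the later steps run forecast validations, metadata validations, and labelling independently of each other.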

Labels

The validation run adds the following labels according to the logic explained for each one. Note that more than one label may apply to a single PR.

  • data-submission: This label is added when files in the data-processed folder are added, deleted, or updated. Note: It is added only when all of the PR's changes are in the data-processed folder.
  • forecast-updated: This label is added when an existing forecast in the data-processed folder is updated/deleted.
  • metadata-change: This label is added to a PR when an existing metadata file is updated or a new metadata file is added to the repository.
  • other-files-updated: This label is added to PRs that add or update valid forecasts and also include changes in other directories. For example, a PR adds a new forecast to its model folder and also adds another file, test.txt, to the root of the repository. Note: This label is not added if the PR contains no valid forecast. Its intent is to flag PRs with valid forecasts that accidentally add files elsewhere.
  • code: This label is added when there are any changes in the code folder of the repository.
  • viz: This label is added when there are any changes in the visualization folder of the repository.
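The label rules above can be sketched as a single function. This is a hedged illustration under several assumptions, not the actual implementation: `labels_for_pr`, the (path, status) input shape, and the status values "added", "modified", and "removed" are hypothetical; forecast and metadata files are recognised by simple name checks; and "valid forecast" is approximated by "a forecast file was added or modified", since the real flow would know the validation result.

```python
def _is_forecast(path):
    # Assumption: forecasts are CSV files under data-processed/.
    return path.startswith("data-processed/") and path.endswith(".csv")


def _is_metadata(path):
    # Assumption: metadata files are named metadata-* under data-processed/.
    name = path.rsplit("/", 1)[-1]
    return path.startswith("data-processed/") and name.startswith("metadata-")


def labels_for_pr(changes):
    """changes: list of (path, status); status is 'added', 'modified', or 'removed'."""
    labels = set()
    if not changes:
        return labels

    # data-submission: only when every change is inside data-processed.
    if all(path.startswith("data-processed/") for path, _ in changes):
        labels.add("data-submission")

    # forecast-updated: an existing forecast was updated or deleted.
    if any(_is_forecast(p) and s in ("modified", "removed") for p, s in changes):
        labels.add("forecast-updated")

    # metadata-change: a metadata file was added or updated.
    if any(_is_metadata(p) and s in ("added", "modified") for p, s in changes):
        labels.add("metadata-change")

    # other-files-updated: forecasts were added/updated AND files changed elsewhere.
    has_forecast = any(_is_forecast(p) and s in ("added", "modified") for p, s in changes)
    has_outside = any(not p.startswith("data-processed/") for p, _ in changes)
    if has_forecast and has_outside:
        labels.add("other-files-updated")

    # code / viz: any change in the corresponding top-level folder.
    if any(p.startswith("code/") for p, _ in changes):
        labels.add("code")
    if any(p.startswith("visualization/") for p, _ in changes):
        labels.add("viz")
    return labels
```

Returning a set reflects the note above that several labels may apply to the same PR; for instance, a PR that only modifies one existing forecast would receive both data-submission and forecast-updated under these assumptions.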

Rollout plan

The rollout plan for this new flow is to run both validations in parallel: the old validation (named Node.js CI on the Checks tab) and the validations v2 flow. During the rollout, the validations v2 flow will not contribute to the success or failure of the PR build, but it will add the appropriate comments to the PR.

PR review steps

During this transition phase, we will use only the old PR validation flow (the Node.js CI flow) to decide whether a PR is merged to master.

The tick or cross mark on the PR is normally governed by the success or failure of the old validations, although this may not always hold (if there is a programming error in the run of the v2 flow). Hence, please make sure that there is a tick mark on the Node.js CI build line on the PR page.

For example, a cross mark next to the Node.js CI line, as indicated in the image below, means that validations failed, even if the validations v2 build shows a tick mark.