Issue Log - accordproject/concerto GitHub Wiki

2020-03-09:

This morning, a small “development incident” occurred after a push to a branch in the main repo for Concerto.

Summary

This morning, @jeromesimeon pushed a fix to the js-remove-system-models branch in the Concerto repository. Those changes were automatically pushed to master by the Continuous Integration system (Travis), which also attempted to publish the package on npm (that publish failed). The changes were corrected in the master branch (requiring a force push to master).

Timeline

  • Changes were pushed to the js-remove-system-models branch around 10:45 AM EST, triggering the Travis build, which automatically pushed to master a few minutes later.
  • The issue was detected around 12:10 PM EST.
  • A force push to master to revert the changes was done around 12:30 PM EST.

What Happened?

Context

Most of the Accord Project repositories use deploy scripts that are triggered on every push to a branch in the main repository.

The behavior is as follows: when commits are pushed to master or a branch, if the tests succeed, the script publishes a new timestamped "unstable" version @accordproject/[email protected] to npm. When someone creates a GitHub release, the release creates a tag, and the same script instead publishes a "stable" official release to npm based on that GitHub release.

The scripts use Travis variables to decide whether a build is an official release or an ‘unstable’ development version, notably TRAVIS_TAG, which indicates whether the commit has been tagged on GitHub and therefore marks an official release.
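
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual deploy script: the function name release_kind is hypothetical, and the real scripts do more (publishing, pushing) than print a label.

```shell
# Hypothetical sketch; release_kind is an illustrative name, not a function
# from the actual deploy scripts.
# Prints "unstable" when no tag is set, "stable" otherwise.
release_kind() {
    if [ -z "$1" ]; then
        echo "unstable"
    else
        echo "stable"
    fi
}

# Travis exposes the tag (if any) through the TRAVIS_TAG environment variable.
release_kind "${TRAVIS_TAG:-}"
```

Note that this kind of check trusts any non-empty value of TRAVIS_TAG to be a real tag.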

The Incident

  1. It seems that, for that build of a regular commit, Travis set TRAVIS_TAG to the literal string '' (two quote characters) rather than leaving the variable empty. Compare an earlier Travis log:
echo "--I-- ${TRAVIS_TAG} ${TRAVIS_BRANCH}"
--I--  release-1.0

But in that build:

echo "--I-- ${TRAVIS_TAG} ${TRAVIS_BRANCH}"
--I-- '' js-remove-system-models
  2. The script is a shell script that checks whether there is a tag using if [ -z "${TRAVIS_TAG}" ], which is true only when the variable is unset or its value has zero length. The literal value '' is two characters long, so the check treated it as a tag being present even though it carried no tag name.

  3. As a result, the script took the stable release path, which triggers both publication to npm and a push to the master branch. That push included all changes from the js-remove-system-models development branch.
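
The difference between a truly empty value and the literal '' seen in the log can be reproduced locally. The snippet below is only a demonstration of how [ -z ] behaves; describe_tag is a hypothetical helper name and is not part of the deploy scripts.

```shell
# Hypothetical helper: report the length of a value and whether [ -z ]
# considers it empty.
describe_tag() {
    if [ -z "$1" ]; then
        echo "length=${#1} empty=yes"
    else
        echo "length=${#1} empty=no"
    fi
}

describe_tag ""      # prints: length=0 empty=yes
describe_tag "''"    # prints: length=2 empty=no
```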

Fix

Short term

  1. Make the scripts more robust to handle the new content of the TRAVIS_TAG variable
  2. Double check whether this is a recurring behavior or a temporary issue. As far as I can tell there is no way to ask Travis about things like this or report problems (I don’t even see support listed for the enterprise version -- maybe I missed it). The Travis changelog does not mention that change, and its last update is two weeks old (https://changelog.travis-ci.com)
  3. Make sure the problem is addressed and apply the script fixes across all the repos
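
One possible way to harden the check from item 1 is to accept a value as a tag only if it looks like a version, instead of accepting anything non-empty. The sketch below is an assumption-laden illustration: is_release_tag is a hypothetical name, and the accepted shape (an optional "v" followed by dot-separated numbers) is a guess, not the project's documented tag scheme.

```shell
# Hypothetical helper: succeeds only when the value looks like a release tag.
# The accepted shape is an assumption, not the project's documented scheme.
is_release_tag() {
    case "$1" in
        # Reject empty values and anything containing unexpected characters,
        # such as the quote characters seen in the incident.
        ''|*[!0-9A-Za-z.+-]*) return 1 ;;
        # Accept values shaped like 1.2.3 or v1.2.3 (optionally with a suffix).
        [0-9]*.[0-9]*.[0-9]*|v[0-9]*.[0-9]*.[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_release_tag "${TRAVIS_TAG:-}"; then
    echo "stable release: ${TRAVIS_TAG}"
else
    echo "unstable build"
fi
```

With an allow-list like this, an unexpected value such as '' falls into the unstable path, which is the safer default because it does not push to master.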

Longer term

Should we rely so heavily on Travis deploy? How do we make those scripts more robust to avoid such issues?