HPX Procedures - STEllAR-GROUP/hpx GitHub Wiki

Purpose

This document serves as a reference manual outlining the policies used to guide HPX's development. To create a stable, quality codebase with a thriving community of developers and users, we feel that everyone must be able to access the hows and whys we do things the way we do them. Furthermore, as the HPX community continues to grow, we will use this platform to document the structural changes needed to accommodate a larger community.

Contents

Committing to Master

During code development, we often make small mistakes that get into the codebase. For master to be reactive to small errors that we discover, such as typos and missing includes, we propose a policy for committing directly to master.

Rationale

Keep broken code out of master

Code in the master branch is for users to experiment with and for developers to maintain. The goal is to keep master in a working state, not waste users' and developers' time.

Users will try code from master, and will inevitably hit bugs. Inexperienced users will likely make an extra effort to make things work. Eventually, they will come asking why their code does not work (if they do not give up), and the worst possible answer we can give is "Yeah, that's a known issue."

Developers should be less likely to fall into those traps, but it has been known to happen when communication on either or both sides is poor. Additionally, developers need to fix the broken code (someone has to), and unlike in branches, broken code in master is everyone's responsibility.

There is no question that broken code ought to be fixed, but whether to take the change off master, where it is supposed to be less likely to be problematic to others. Recently committed changes are trivial to revert, taking newly introduced problems out of the way before they get a chance to do any harm. Git makes the process straightforward, and it's likely to be non-disruptive. The time window is too narrow for it to be used elsewhere, and it would be unwise to build on top of untested code.

It follows that:

  • Reverts should be non-disruptive.
  • When fixing a broken change is as trivial as reverting it, fixing it would be preferred.
  • Revert or fix, it should be applied as soon as possible, preferably by the person responsible for the broken changes.

Keep buildbot useful

Detecting broken changes

Once a test has failed, it is not useful to know it keeps failing when no related changes were done. Actually, it is harmful as it becomes noise, and it's often not easy to predict how a change will ripple throughout the codebase. The goal is to remove or reduce noise from buildbot so that new breakages can be easily spotted.

Communicating broken code

Once a test has failed, everyone (users and developers) should know that it has failed. Once the test has been fixed, everyone (users and developers) should know about the fix. Users can then avoid working with broken features or wait for a fix. Developers can then avoid diagnosing already diagnosed issues, as well as avoid working on top of a broken feature that constrains how the issue can be addressed. The goal is to make everyone aware of the current codebase state (there is overlap with an issue tracking system).

The above two might seem contradictory, but it's possible to reconcile most of them. This is why some contiguous integration systems have the notion of expected failures and "still" failures. These are two separate features, an expected failure is a manual mark on a test known to be exercising broken code, a "still" failure compares consecutive failed tests to see if the output is still the same.

It follows that:

  • Known issues should be communicated
  • Reverting a broken change reduces buildbot noise (should still be communicated)
  • Unit tests should be unitary. Both unit and regression tests should be as contained as reasonably possible, to avoid unnecessary breakages due to reasons unrelated to the test's purpose

Moving forward

Changes should be tested as a last step in detecting issues. Tests that constantly (or spuriously) fail don't give meaningful results. To properly tests new changes, they should be introduced at a point where the codebase is stable. The goal is to be able to run the entire test suite against new changes and derive useful information from it.

When changes create new failures, they keep other unrelated changes from making it into master, as they would not be reasonably tested. As there is concern over PRs piling up, leaving broken changes in master that could have been taken out of the way puts pressure onto others to either fix it themselves, or to assess the risk of merging new unrelated changes against its perceived benefits, and later having to manually check every single failing unit test to determine whether a failure is known or new/different. Failure to do so or to determine the outcome correctly means that when the original failure is finally fixed, it will introduce new problems (or reveal those introduced while the build was failing).

Continuously operating in this mode leads to ignored tests failures, which further leads to ignoring broken code.

It follows that:

  • Reverting changes is a valuable tool to contain problems instead of hiding them. It allows the development process to move forward without risking codebase instability.

Related:

  • Communication of known issues and limitations
  • Expected usage and purpose of buildbot

Procedure

When a developer, who is a member of the STE||AR group, makes a small fix or improvement which in his/her considered opinion

  • Has a very low probability of causing build or test failures or warnings on any of the supported platforms and compilers
  • Does not significantly affect topics/branches or outstanding issues that others might be working on
  • Is unlikely to be considered controversial by other members of the the development team and will not, therefore, require a review process
  • Does not break any public API or cause other backward incompatibilities
  • Can be easily reverted in future should the need arise, as it is atomic in nature and touches a very small number of files.
  • Where the git commit log is sufficiently detailed to document the fix so that it can be easily found at a later date.

Then the fix may be applied directly to the master branch. If the fix does not comply with all of the above, or there is any doubt in the developer's mind, then it must be applied to a branch with an accompanying Pull Request and is subject to a review process.

Generally Speaking:

  1. Directly committing to master should be allowed only

  2. For trivial patches to documentation and comments

  3. For cosmetic (formatting) changes that don't touch on the actual code semantics

  4. For fixes to obvious bugs caused, for instance, by differences in compilers/system libraries, etc. (like missing/superfluous typename keywords, obvious copy&paste errors, etc.)

  5. Any other changes, including non-trivial changes to documentation (especially those which go along with tests or which touch several files) should go through a pull request. The purpose of a pull request is to propose a change; thus, explanations of what the proposed change is for and why it is needed should be part of the ticket and allow other developers in the group to review the proposed changes and comment on those. All pull requests require at least one review from another person. Pull requests will be merged only after all raised issues have been resolved or explicitly turned into separate tickets.

  6. Any direct commit to master will be unconditionally reverted if they cause any problems or if they turn out to be controversial. Any such reverted change will be considered non-trivial from this point on.