Flaky Tests - department-of-veterans-affairs/caseflow GitHub Wiki

What is a flaky test?

  • A flaky test is one that fails intermittently in CI test runs (GitHub Actions), or that fails in CI but passes locally

How do I check to see if a test is flaky?

  • Check if the test passes locally. This may require running the test multiple times to see if it always fails or only sometimes fails.
    • Using only the "re-run failed jobs" button in GitHub Actions is not always enough. Some tests fail only in the CI environment, so running them locally is a must.
  • If the test passes locally, check the Jira flaky test epic to see if the test or test file is documented as flaky
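
To tell whether a spec fails every time or only sometimes, it can help to run it repeatedly and count the failures. The following is a minimal sketch in plain Ruby, not an existing Caseflow tool: the `count_failures` helper and the spec path in the usage comment are hypothetical, and it assumes specs are run with `bundle exec rspec`.

```ruby
# Run a command repeatedly and return how many of the runs failed.
# `count_failures` is a hypothetical helper, not part of Caseflow.
def count_failures(command, runs: 10)
  failures = 0
  runs.times do
    # Kernel#system returns true on exit status 0, false on a non-zero
    # exit status, and nil if the command could not be executed at all.
    passed = system(*command, out: File::NULL, err: File::NULL)
    failures += 1 unless passed
  end
  failures
end

# Example usage (hypothetical spec path; replace with the test under investigation):
#   failed = count_failures(%w[bundle exec rspec spec/feature/example_spec.rb], runs: 10)
#   puts "#{failed} of 10 runs failed"
```

A result between zero and the number of runs suggests the test is flaky locally as well. Zero local failures alongside CI failures points toward something specific to the CI environment.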

Should I be trying to fix flaky tests outside of my normal area of expertise?

  • In short, yes. We should all be making an attempt to identify what the root cause of the flaky test is. However, if you spend some time looking at the test and are unable to identify why it may be flaky, follow the steps in the sections below to skip the test in the suite and inform a TL or dev who has expertise in that area.

How to fix flaky tests

  • See the best practices section in the Backend Pattern Test wiki entry for recommendations and advice. In general:
    • Minimize writing to the database:
      • Use let instead of let! wherever possible. let! evaluates its block before every example in its scope, whether or not the value is used, while let is lazy and evaluates its block only when the value is first referenced in an example
    • For feature specs, consolidate checks into a single test and use :aggregate_failures to reduce the number of DB writes and HTTP requests that need to be made
    • For feature specs, use Capybara matchers instead of RSpec matchers wherever possible. Capybara matchers wait, up to a timeout, for elements to appear on the page; plain RSpec matchers check once and do not wait
    • Do not rely on data being created in or fetched from the database in the order that the test creates it. If you are checking one object in a set of three that are created, don't use {model}.first unless you have explicitly ordered the results by some value and know that they will always come back from the database in that order.
    • If you have a failing test that visits the Case Details page, check to see if it is failing to load the case (the "Unable to load this case" page appearing). If it is, the following line can be added before the visit call: page.find("a", text: "refresh the page").click if page.has_text?(COPY::CASE_DETAILS_LOADING_FAILURE_TITLE). This will click the "refresh the page" link on the error page, and is usually enough to properly load a Case Details page.
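
The ordering advice above can be illustrated without a database. In this sketch, a shuffled array stands in for a query with no explicit ordering, whose row order is not guaranteed. `Task`, its fields, and `unordered_query` are hypothetical names used only for illustration. In a real spec, the equivalent would be to look the record up by a known attribute (for example with `find_by`) or to apply an explicit `order` before calling `first`.

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
Task = Struct.new(:id, :name)

TASKS = [
  Task.new(1, "intake"),
  Task.new(2, "review"),
  Task.new(3, "dispatch")
].freeze

# Simulates a query without explicit ordering: rows may come back in any order.
def unordered_query
  TASKS.shuffle
end

# Fragile: depends on whichever row happens to come back first,
# so an assertion on this value passes only some of the time.
fragile = unordered_query.first

# Robust: select the record by an attribute the test controls.
review = unordered_query.find { |task| task.name == "review" }

# Robust: impose an explicit order before relying on position.
lowest_id = unordered_query.min_by(&:id)

puts fragile.name   # varies from run to run
puts review.name    # always "review"
puts lowest_id.name # always "intake"
```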

What if I can't fix a flaky test?

  • If there is no open Jira story or task for the test, create a Jira story under the flaky test epic detailing the file, the test scenario, and the reason (if known) that the test is flaky
  • Use comments in the test to document what you have tried in order to fix it and any information you have found
  • Skip the test in the file using skip: "some description here"