Flakey Test Remedies

Remedies for common pitfalls that can cause flakey tests. Also check out Testing Best Practices.

When testing pages with async data, write an assertion for a page element before interacting with the page

The test spec/feature/hearings/schedule_veteran/build_hearsched_spec.rb:739 was failing in CI builds but not locally at the click_dropdown step:

    visit "hearings/schedule/assign"

    click_dropdown(text: "Denver")

Within our click_dropdown method in the spec/support/feature_helper.rb file, we do page.find(selector, **args) to get the desired dropdown, then .click it and check for the menu options in a sibling container. The failure was at the line dropdown.sibling(".cf-select__menu")&.text&.split("\n") || [], because it couldn't find a sibling with text to split. To fix this, we added a line:

    visit "hearings/schedule/assign"
+   expect(page).to have_content("Regional Office")

    click_dropdown(text: "Denver")

Capybara executes each line synchronously unless told otherwise. Since the dropdown element exists on the page whether or not it has any options yet, Capybara immediately finds and clicks it, and the test fails. Adding the assertion above makes Capybara retry the matcher for up to its default maximum wait time (Capybara.default_max_wait_time) before failing because an element cannot be located. This particular page does not render the text "Regional Office" until the asynchronous calls to the backend have returned and been processed by the Redux store, so the assertion delays execution of click_dropdown until the data it needs is present.
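For context, here is a rough sketch of the kind of helper described above (the real implementation lives in spec/support/feature_helper.rb; the default selector and error handling here are assumptions):

    # Rough sketch of a click_dropdown-style helper, NOT the exact Caseflow
    # implementation; selectors are assumptions.
    def click_dropdown(text:, selector: ".cf-select", **args)
      dropdown = page.find(selector, **args)
      dropdown.click

      # If the async data has not arrived yet, the sibling menu has no text,
      # this evaluates to [], and the requested option is never found.
      options = dropdown.sibling(".cf-select__menu")&.text&.split("\n") || []
      raise Capybara::ElementNotFound, "no option #{text.inspect}" unless options.include?(text)

      dropdown.sibling(".cf-select__menu").find("div", text: text, match: :first).click
    end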

When visiting a Case Details page, use the feature_helper method reload_case_detail_page

TLDR: Use the reload_case_detail_page method in spec/support/feature_helper.rb instead of simply calling visit "/queue/appeals/#{uuid}" the first time a test visits the Case Details page, if the test goes directly to that page (without manually navigating through queue).

Occasionally, tests will fail with the "Unable to load this case" error screen when going directly to a Case Details page in a test. This happens because the data the page requires has not yet been cached or created. One path that causes the error: simultaneous calls are made to different endpoints, all of which trigger batch_find_by_css_id_or_create_with_default_station_id in the User model. That method creates new users from VACOLS records when those users do not already exist (which is often the case with our default factory setup), and since each call starts from the same set of records, each call tries to create all of them. One of the calls then violates the unique css_id constraint and renders the error screen.

The workaround is that reload_case_detail_page visits the Case Details page and, if the error occurs, clicks the "reload this page" button that the page renders. By then, user creation has usually finished, so the next request does not error. If the error does not occur, the method simply visits the page and returns to the test.
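A rough sketch of what the helper does (see spec/support/feature_helper.rb for the real implementation; the argument name below is an assumption):

    # Rough sketch of the reload_case_detail_page behavior described above.
    def reload_case_detail_page(appeal_uuid)
      visit "/queue/appeals/#{appeal_uuid}"

      # If the user-creation race produced the error screen, click the reload
      # button; by the time the page reloads, the users usually exist.
      click_on "reload this page" if page.has_content?("Unable to load this case")
    end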

find vs. find_all

In spec/feature/queue/colocated_task_queue_spec.rb:

-      page.find_all("table#case-timeline-table button", text: view_text).each(&:click)
+      page.find("table#case-timeline-table button", text: view_text).click
       expect(page).to have_content("INSTRUCTIONS FROM VLJ", wait: 15)

find_all does not fail when it finds no matching elements; it simply returns an empty collection, so there is nothing to click and the task instructions are never opened. If the page has not finished loading by the time that line runs, the spec silently skips the click, which is why we were seeing a flake. Changing it to find makes Capybara wait until the "View instructions" element is loaded (and fail loudly if it never appears), then click it.
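To make the difference concrete (a sketch reusing the selector above, with comments on the waiting behavior):

    # find_all never raises when nothing matches: it yields an empty
    # collection, so each(&:click) silently does nothing.
    page.find_all("table#case-timeline-table button", text: view_text).each(&:click)

    # find retries until the element appears, or raises ElementNotFound once
    # the wait time elapses, so the click only happens when the button exists.
    page.find("table#case-timeline-table button", text: view_text).click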

let vs. before

Prefer setting up test data with a let (or let!) fixture rather than in a before block, since a before block always runs for every example in scope, even those that do not need the data.

before do
  organization.add_user(organization_user)
  User.authenticate!(user: organization_user)
-   create_list(:privacy_act_task, unassigned_count, :in_progress, assigned_to: organization)
-   create_list(:privacy_act_task, assigned_count, :on_hold, assigned_to: organization)
end

+ let!(:unassigned_tasks) do
+   privacy_tasks = create_list(
+     :privacy_act_task, unassigned_count / 2,
+     :in_progress,
+     assigned_to: organization
+   )
+   translation_tasks = create_list(
+     :translation_task,
+     translation_task_count,
+     :in_progress,
+     assigned_to: organization
+   )
+   privacy_tasks + translation_tasks
+ end

More details are described in this StackOverflow answer and this article.
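As a quick illustration of the difference (a hedged sketch reusing names from the example above):

    # let is lazy and memoized: the block runs only when an example first
    # references the name, so examples that never use it skip the setup.
    let(:appeal) { create(:appeal) }

    # let! is eager (it registers an implicit before hook), but the created
    # records have a name, so examples can reference them directly instead
    # of re-querying the database.
    let!(:unassigned_tasks) do
      create_list(:privacy_act_task, unassigned_count, :in_progress, assigned_to: organization)
    end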

Use before(:each) instead of before(:all)

An important detail of before :all is that it is not DB transactional, i.e., anything created in a before :all persists in the DB and must be manually torn down in an after :all hook.

The implication is that once an example group completes, its changes are not rolled back for later tests, which can cause complicated bugs and cross-contamination of data. Worse, if an exception is thrown, the after :all callback is not called.

However, before :each is DB transactional.
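For example (a hedged sketch with hypothetical factories, showing the cleanup burden):

    describe "organization queue" do
      # Created outside the per-example transaction: persists after the group
      # finishes and requires manual teardown.
      before(:all) { @org = create(:organization) }
      after(:all)  { @org.destroy }

      # Runs inside the per-example transaction and is rolled back
      # automatically after each example.
      before(:each) { @user = create(:user) }
    end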

More here

Use skip_callback and set_callback in a before and after block

skip_callback persists into other specs if it is called inside a single spec, as seen below.

      scenario "BGS stuff is happening" do
        BgsPowerOfAttorney.skip_callback(:foo, :before, :bar!)
      end

To ensure that skip_callback doesn't have undesired effects, pair it with set_callback in before and after blocks. The example below shows this setup.

     before do
        BgsPowerOfAttorney.skip_callback(:foo, :before, :bar!)
     end
     after do
        BgsPowerOfAttorney.set_callback(:foo, :before, :bar!)
     end

Using let! outside context

We should be wary of let!(:foo) outside of any context scoping, particularly at the start of a file: it means the factory will always be instantiated, for every example in the file.

In the example below, the bug was the order in which let! and before were declared at the top of the file: the let! hook was registered first, so it executed before the before block. https://github.com/department-of-veterans-affairs/caseflow/pull/13085/files#diff-7de594cd05a73f5334aad3c44048c1f0R3-R14

The fix is to define the before block before anything else, so that time is frozen before any factories run.

describe Appeal, :all_dbs do
  include IntakeHelpers

-  let!(:appeal) { create(:appeal) }

  before do
    Timecop.freeze(Time.utc(2019, 1, 1, 12, 0, 0))
  end

+  let!(:appeal) { create(:appeal) } # must be *after* Timecop.freeze

  context "includes PrintsTaskTree concern" do
    context "#structure" do
      let!(:root_task) { create(:root_task, appeal: appeal) }

Debugging flakey tests

Check Knapsack Pro for Test Order

You likely don't need the 'binary search' section below, since we recently added Knapsack Pro.

If you go into CircleCI and find the failing test, you should see output like the following in the terminal. This info includes:

  • Directions on how to run the same subset of tests in the same order in dev.
  • A list of the files that are tested.
D, [2021-03-02T16:42:42.288202 #840] DEBUG -- : [knapsack_pro] POST https://api.knapsackpro.com/v1/queues/queue
D, [2021-03-02T16:42:42.288288 #840] DEBUG -- : [knapsack_pro] API request UUID: 6f6b1959-bb02-4156-a30a-8f3411e6fed2
D, [2021-03-02T16:42:42.288319 #840] DEBUG -- : [knapsack_pro] API response:
D, [2021-03-02T16:42:42.288371 #840] DEBUG -- : [knapsack_pro] {"queue_name"=>"1783:edd88486c51be049a6c2b31c7b627c02", "build_subset_id"=>nil, "test_files"=>[{"path"=>"spec/feature/hearings/change_hearing_disposition_spec.rb", "time_execution"=>169.1206240653992}, {"path"=>"spec/models/distribution_spec.rb", "time_execution"=>161.0852148532867}]}
I, [2021-03-02T16:42:42.288472 #840]  INFO -- : [knapsack_pro] To retry in development the subset of tests fetched from API queue please run below command on your machine. If you use --order random then remember to add proper --seed 123 that you will find at the end of rspec command.
I, [2021-03-02T16:42:42.288514 #840]  INFO -- : [knapsack_pro] bundle exec rspec --format documentation --no-color --format RspecJunitFormatter --out tmp/rspec.xml  --default-path spec "spec/feature/hearings/change_hearing_disposition_spec.rb" "spec/models/distribution_spec.rb"
D, [2021-03-02T16:42:43.709751 #840] DEBUG -- : [knapsack_pro] Test suite time execution queue recording enabled.
Run options: include {:focus=>true}

Binary search your way to victory

You can try the rspec --bisect feature, which is ideal when you are running tests in random order.

Otherwise, if you are running tests in a predictable order and can reproduce the failure, apply a binary search pattern manually (which is essentially what rspec --bisect automates).
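For example, you can feed rspec --bisect the same ordered file list reported by Knapsack Pro (the seed and files below are illustrative):

    bundle exec rspec --seed 123 --bisect "spec/feature/hearings/change_hearing_disposition_spec.rb" "spec/models/distribution_spec.rb"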

Identify the first failing spec file

Access the XML output, rspec.xml, from the CI node that failed. Download it and parse it with Nokogiri to extract an ordered list of the spec files that ran. Below is an example.

require "nokogiri"

# Extract the ordered, de-duplicated list of spec files from the JUnit XML.
doc = Nokogiri::XML(File.read("rspec.xml"))
testcases = doc.xpath("//testcase")
result = testcases.collect { |c| c["file"] }
deduped = result.uniq
puts deduped.join(" ")

Comment out half of the spec filenames that ran prior to the failing file, then re-run the tests using the reduced file list.

Each time you get a failure, repeat the process

The binary search pattern means you keep commenting out half of the remaining files until the tests all pass. When they finally pass, whatever change you made last to the file list is the one that matters: adding those files back reproduces the failure, so you have narrowed it down to the combination of tests, in the correct order, that triggers the flake.

When you've identified the culprit, debug and repeat

Now you know the file(s) that cause the flake. Inspect each one to see what might trigger the bug; often it's the database not being cleaned up, or global state being set (things like FeatureToggle or RequestStore are frequent culprits).
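A hedged example of the kind of cleanup that prevents this sort of leakage (method names are assumptions to verify against the codebase):

    after do
      FeatureToggle.disable!(:example_feature) # if the spec enabled it earlier
      RequestStore.clear!                      # reset per-request global storage
    end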

Use ensure_stable to re-run a test

When trying to replicate a flake, or to verify that a flake fix works, you may want to wrap the test with ensure_stable so it runs many times.
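As a rough idea of what such a wrapper can do (a hypothetical sketch, not the actual ensure_stable helper; check spec/support for its real signature):

    # Hypothetical repeat-runner, NOT the real ensure_stable helper: run the
    # block many times and fail loudly on the first flake.
    def repeat_until_flake(times: 50)
      times.times do |i|
        begin
          yield
        rescue RSpec::Expectations::ExpectationNotMetError => e
          raise "Flaked on iteration #{i + 1}: #{e.message}"
        end
      end
    end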

OS-dependencies