Flakey Test Remedies - TISTATechnologies/caseflow GitHub Wiki

Remedies for common pitfalls that can cause flakey tests. Also check out Testing Best Practices.

find vs. find_all

In spec/feature/queue/colocated_task_queue_spec.rb:

-      page.find_all("table#case-timeline-table button", text: view_text).each(&:click)
+      page.find("table#case-timeline-table button", text: view_text).click
       expect(page).to have_content("INSTRUCTIONS FROM VLJ", wait: 15)

find_all will not fail (or wait) if it finds no matching elements — it simply returns an empty result, so .each(&:click) has nothing to click. If the page does not load right away, we never actually open the task instructions, which is the reason we were seeing a flake. By changing this to find, Capybara waits until the "View instructions" element is loaded, then clicks it.
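The silent no-op is plain Ruby behavior: iterating an empty collection does nothing and raises nothing, whereas a find-style call fails loudly. A minimal stand-in (not Capybara itself):

```ruby
clicks = 0
not_yet_rendered = []                   # what find_all returns before the page loads
not_yet_rendered.each { clicks += 1 }   # silently does nothing -- no error, no click
puts clicks                             # 0

# A find-like call raises instead of no-op'ing, so the spec fails at the right line.
begin
  not_yet_rendered.fetch(0)
rescue IndexError
  puts "failed loudly"
end
```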

let vs. before

Prefer setting up test data with let fixtures rather than in before blocks, since a before block always runs for every example in scope, even ones that never use the data.

before do
  organization.add_user(organization_user)
  User.authenticate!(user: organization_user)
-   create_list(:privacy_act_task, unassigned_count, :in_progress, assigned_to: organization)
-   create_list(:privacy_act_task, assigned_count, :on_hold, assigned_to: organization)
end

+ let!(:unassigned_tasks) do
+   privacy_tasks = create_list(
+     :privacy_act_task, unassigned_count / 2,
+     :in_progress,
+     assigned_to: organization
+   )
+   translation_tasks = create_list(
+     :translation_task,
+     translation_task_count,
+     :in_progress,
+     assigned_to: organization
+   )
+   privacy_tasks + translation_tasks
+ end

More details are described in this StackOverflow answer and this article.
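The lazy-versus-eager distinction can be sketched in plain Ruby (illustrative names, not RSpec's real implementation): before-style setup always runs, while a let-style fixture is only built on first reference and memoized afterward.

```ruby
setup_runs = 0

# `before`-style: eager -- the cost is paid whether or not the example uses the data.
eager_setup = -> { setup_runs += 1 }
eager_setup.call  # example 1 never touches the data, but still pays for setup
eager_setup.call  # example 2, same cost again

# `let`-style: lazy and memoized -- built on first reference, reused afterward.
memoized = nil
lazy_fixture = -> { memoized ||= (setup_runs += 1) }
lazy_fixture.call  # first reference: builds once
lazy_fixture.call  # second reference: memoized, no extra setup
puts setup_runs    # 3
```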

Use before(:each) instead of before(:all)

An important detail of before(:all) is that it is not DB-transactional: anything created within before(:all) persists to the DB, and you must manually tear it down in a matching after(:all) block.

This means that after a test suite completes, its changes are not reverted for later tests, which can result in complicated bugs and cross-contamination of data. Note also that if an exception is thrown, the after(:all) callback is not called.

before(:each), however, is DB-transactional: setup is rolled back after each example.

More here
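A plain-Ruby sketch of the difference (illustrative, not RSpec/Rails internals): per-example setup happens inside a transaction that is rolled back, while suite-wide setup is written outside any transaction and persists.

```ruby
db = []

# before(:each)-style: the example runs inside a "transaction" that rolls back.
run_in_transaction = lambda do |&example|
  snapshot = db.dup
  example.call
  db.replace(snapshot)  # rollback: the example's records disappear
end

run_in_transaction.call { db << :per_example_record }
puts db.size  # 0 -- rolled back

# before(:all)-style: written outside any transaction; persists until an
# after(:all) block (which you must write yourself) cleans it up.
db << :suite_wide_record
puts db.size  # 1 -- still there for every later test
```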

Use skip_callback and set_callback in a before and after block

The effect of skip_callback will persist into other specs if it is called within a spec, as seen below.

      scenario "BGS stuff is happening" do
        BgsPowerOfAttorney.skip_callback(:foo, :before, :bar!)
      end

To ensure that skip_callback doesn't have undesired effects, try using a before and after block for skip_callback and set_callback. Below includes an example of this set-up.

     before do
        BgsPowerOfAttorney.skip_callback(:foo, :before, :bar!)
     end
     after do
        BgsPowerOfAttorney.set_callback(:foo, :before, :bar!)
     end

Using let! outside context

We should be wary of a let!(:foo) defined outside of any context scoping, particularly at the start of a file, because the factory will then be instantiated for every example in the file.

In this case, we need to define the before block before anything else, so that the time is frozen before any factories are called.

The bug was the order in which we declared let! vs before at the top of a file. The before was executing after the let!. https://github.com/department-of-veterans-affairs/caseflow/pull/13085/files#diff-7de594cd05a73f5334aad3c44048c1f0R3-R14

describe Appeal, :all_dbs do
  include IntakeHelpers

-  let!(:appeal) { create(:appeal) }

  before do
    Timecop.freeze(Time.utc(2019, 1, 1, 12, 0, 0))
  end

+  let!(:appeal) { create(:appeal) } # must be *after* Timecop.freeze

  context "includes PrintsTaskTree concern" do
    context "#structure" do
      let!(:root_task) { create(:root_task, appeal: appeal) }

Debugging flakey tests

Check Knapsack Pro for Test Order

I don't believe you need the "binary search" section below, since we recently added Knapsack Pro.

If you go into CircleCI and find the failing test, you should see something like the following information in the terminal output. This info includes:

  • Directions on how to run the same subset of tests in the same order in dev.
  • A list of the files that are tested.
D, [2021-03-02T16:42:42.288202 #840] DEBUG -- : [knapsack_pro] POST https://api.knapsackpro.com/v1/queues/queue
D, [2021-03-02T16:42:42.288288 #840] DEBUG -- : [knapsack_pro] API request UUID: 6f6b1959-bb02-4156-a30a-8f3411e6fed2
D, [2021-03-02T16:42:42.288319 #840] DEBUG -- : [knapsack_pro] API response:
D, [2021-03-02T16:42:42.288371 #840] DEBUG -- : [knapsack_pro] {"queue_name"=>"1783:edd88486c51be049a6c2b31c7b627c02", "build_subset_id"=>nil, "test_files"=>[{"path"=>"spec/feature/hearings/change_hearing_disposition_spec.rb", "time_execution"=>169.1206240653992}, {"path"=>"spec/models/distribution_spec.rb", "time_execution"=>161.0852148532867}]}
I, [2021-03-02T16:42:42.288472 #840]  INFO -- : [knapsack_pro] To retry in development the subset of tests fetched from API queue please run below command on your machine. If you use --order random then remember to add proper --seed 123 that you will find at the end of rspec command.
I, [2021-03-02T16:42:42.288514 #840]  INFO -- : [knapsack_pro] bundle exec rspec --format documentation --no-color --format RspecJunitFormatter --out tmp/rspec.xml  --default-path spec "spec/feature/hearings/change_hearing_disposition_spec.rb" "spec/models/distribution_spec.rb"
D, [2021-03-02T16:42:43.709751 #840] DEBUG -- : [knapsack_pro] Test suite time execution queue recording enabled.
Run options: include {:focus=>true}

Binary search your way to victory

You can try the rspec --bisect feature, which is ideal when tests run in random order.

Otherwise, if you are running a predictable order of tests, assuming you can duplicate the errors, you want to apply a binary search pattern (which is essentially what rspec --bisect does for you).

Identify the first failing spec file

Access the XML output, rspec.xml, from the node that failed. Download the XML and parse it with Nokogiri to extract an ordered, de-duplicated list of the test files. Below is an example.

require "nokogiri"

doc = Nokogiri::XML(File.read("rspec.xml"))
# each <testcase> element records the spec file it ran from
testcases = doc.xpath("//testcase")
result = testcases.collect { |c| c["file"] }
deduped = result.uniq
deduped.join(" ")

Comment out half of the spec filenames that run prior to the failing file. Then, re-run the tests using the new file list.

Each time you get a failure, repeat the process

The binary search pattern means you keep commenting out half of the remaining active lines in the file list until the tests all pass. When they pass, the half you just commented out is the one that matters: re-enable it, comment out half of it, and repeat. Eventually you narrow it down to the minimal combination of tests, in the correct order, that triggers the failure.
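The loop above can be sketched in Ruby. Here `fails` stands in for "run rspec against this ordered file list and report whether the flake reproduces"; it is faked so the example is self-contained.

```ruby
def bisect(files, fails)
  candidates = files
  while candidates.size > 1
    half = candidates.size / 2
    first = candidates[0...half]
    second = candidates[half..]
    if fails.call(first)
      candidates = first
    elsif fails.call(second)
      candidates = second
    else
      # Neither half fails on its own: the flake needs files from both halves
      # (order-dependent pollution), so stop and inspect this combination.
      return candidates
    end
  end
  candidates
end

# Fake failure condition: the flake reproduces whenever the polluting spec runs.
files = %w[a_spec.rb b_spec.rb c_spec.rb d_spec.rb]
fails = ->(subset) { subset.include?("c_spec.rb") }
puts bisect(files, fails).inspect  # ["c_spec.rb"]
```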

When you've identified the culprit, debug and repeat

Now you know the file(s) that caused the flake. You'll have to inspect each one to try and see what might trigger the bug. Often it's databases not being cleaned up, or global variables being set (things like FeatureToggle or RequestStore are often culprits).

Use ensure_stable to re-run a test

When trying to replicate a flake, or to verify that a fix is working, wrap the test with ensure_stable to run it many times.
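ensure_stable lives in Caseflow's spec support; the sketch below only illustrates the general idea (run_many_times is a made-up name, not Caseflow's helper): run the block repeatedly and fail on the first flaky run.

```ruby
# Run the given block `times` times, failing fast on the first flake.
def run_many_times(times: 10)
  times.times do |i|
    yield
  rescue StandardError => e
    raise "flaked on run #{i + 1}: #{e.message}"
  end
end

attempts = 0
run_many_times(times: 5) { attempts += 1 }
puts attempts  # 5
```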

OS-dependencies