Application Backend Overview - department-of-veterans-affairs/caseflow GitHub Wiki

This page was initially assembled for an overview session with new engineers coming on board. It almost certainly touches on material covered elsewhere. Please feel free to edit, add links, etc. and make this a living document.

Slack thread surveying participants' prior knowledge: https://dsva.slack.com/archives/C02NHLSJM3M/p1638904699102600

The presentation asked us to focus on:

Explore how the application handles requests, patterns in the Rails app, gems used, and so on.

Overview

"Caseflow" refers generally to our contract/project, and also specifically to the Caseflow application (in whose wiki this document is located). Caseflow, which formerly went by the name "Certification", is a Ruby on Rails application. We are currently running Rails 5 (with aspirations to upgrade to Rails 6) on Ruby 2.7.

The application runs on AWS GovCloud. The private appeals-deployment repository has more information on the web stack, but fundamentally, we use puma as a multi-threaded application server, running in an ASG behind a load balancer. We also use shoryuken for processing background jobs asynchronously.

We maintain our own database in Postgres (on Aurora), and also make use of Redis for storing/caching certain information.

Core to Caseflow's function is interaction with other systems. Some of the more significant ones include:

  • VACOLS, which is the source of truth for legacy appeals. We have read-write access to its Oracle database. N.B. that the database is on the other side of the country, so every query pays a cross-country round trip and n+1 queries are extremely detrimental.
  • BGS, the VA's Benefits Gateway Services. It presents a SOAP interface. We have developed the ruby-bgs gem for interacting with it.
  • VBMS, the Veteran Benefits Management System. It also has a SOAP API, and we developed the connect_vbms gem for interacting with it.
  • eFolder Express, which we also maintain.
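
To make the n+1 concern for VACOLS concrete, here is a minimal sketch in plain Ruby. FakeRemoteDb and its methods are hypothetical stand-ins (not Caseflow or VACOLS code); each method call counts as one network round trip. In a real Rails app the usual remedy is eager loading with ActiveRecord's includes, but the counting logic is the same idea.

```ruby
# Hypothetical stand-in for a far-away database.
# Every public method call counts as one network round trip.
class FakeRemoteDb
  attr_reader :round_trips

  def initialize(case_count)
    @folders = (1..case_count).to_h { |id| [id, "folder-for-case-#{id}"] }
    @round_trips = 0
  end

  # One round trip: list every case id.
  def case_ids
    @round_trips += 1
    @folders.keys
  end

  # One round trip: fetch the folders for all of the given case ids.
  def folders_for(ids)
    @round_trips += 1
    @folders.values_at(*ids)
  end
end

# n+1 style: one query for the list, then one more query per case.
n_plus_one_db = FakeRemoteDb.new(100)
slow = n_plus_one_db.case_ids.map { |id| n_plus_one_db.folders_for([id]).first }

# Batched style: one query for the list, one query for all folders.
batched_db = FakeRemoteDb.new(100)
fast = batched_db.folders_for(batched_db.case_ids)

puts "n+1 round trips:     #{n_plus_one_db.round_trips}"
puts "batched round trips: #{batched_db.round_trips}"
puts "same results?        #{slow == fast}"
```

Both approaches return the same data; only the number of round trips differs, and that is the part that hurts when each trip crosses the country.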

eFolder is maintained and deployed like Caseflow, but sees little new development. Its primary purpose for end users is to let them download a .zip file of all of a Veteran's documents; formerly, users had to download potentially hundreds of documents individually. Two notes:

  • Caseflow makes a large number of API requests to eFolder for operations around Veteran documents
  • The login page for Caseflow is actually on eFolder.

Caseflow and eFolder are separate applications, but are somewhat interconnected and are both Rails apps we maintain.

"gems used, and so on"

RubyGems is analogous to Perl's CPAN or Node's npm: it's the central source for Ruby plugins/libraries.

In Ruby, Gemfile is a list of gems required by the application. Environment-specific groups can be assembled, e.g., stuff to only run in development.

We use Bundler to manage local, per-project sets of gems rather than a system-wide installation. (Think virtualenv for Python, kinda.) N.B. that (1) Bundler is, itself, a gem, and (2) the gem is called bundler, but the command-line tool is bundle, as in bundle install.

Running bundle install or bundle update locally writes Gemfile.lock, which records the exact version of each gem and the dependencies between them.

Rails-isms

Rails is an (IMHO) fairly easy-to-use framework built on top of Ruby.

The Rails Doctrine consists of nine items (there will not be a quiz), but if I had to sum up Rails in a nutshell, I'd say:

  • Rails is big on convention over configuration. Your production config goes in config/environments/production.rb; it just does.
  • Code can be beautiful. It should at least be readable. (Rails says "Optimize for programmer happiness")
  • Lots of magic via metaprogramming. 2.weeks.ago is a real thing. method_missing is powerful magic.
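
For a feel of how that kind of magic can work, here is a rough sketch in plain Ruby, without Rails loaded. ToyDuration and ToyRecord are made-up names for illustration; the real ActiveSupport implementation behind 2.weeks.ago is considerably more involved.

```ruby
# Hypothetical, simplified imitation of an ActiveSupport-style duration.
class ToyDuration
  attr_reader :seconds

  def initialize(seconds)
    @seconds = seconds
  end

  # The time this many seconds before `now`.
  def ago(now = Time.now)
    now - seconds
  end
end

# Ruby lets us reopen core classes, which is how 2.weeks becomes possible.
class Integer
  def weeks
    ToyDuration.new(self * 7 * 24 * 60 * 60)
  end
end

# method_missing lets an object answer messages it never explicitly defined.
class ToyRecord
  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    return @attributes[name] if @attributes.key?(name)

    super # fall back to the normal NoMethodError
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

reference = Time.utc(2021, 12, 21)
two_weeks_before = 2.weeks.ago(reference)
user = ToyRecord.new(full_name: "Mavis Kunde", station_id: "101")

puts two_weeks_before
puts user.full_name
```

Passing an explicit reference time keeps the example deterministic; calling 2.weeks.ago with no argument would use the current time.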

ActiveRecord is the default (and ubiquitous) ORM for Rails. It allows us to write expressive code without building out lots of boilerplate in our models.

[1] pry(main)> User.last
[2021-12-07 12:28:01 -0500]   User Load (8.5ms)  SELECT  "users".* FROM "users" ORDER BY "users"."id" DESC LIMIT $1  [["LIMIT", 1]]
=> #<User:0x00007f8bd2d52108
 id: 2000016654,
 created_at: Mon, 27 Jul 2020 18:53:56 UTC +00:00,
 css_id: "LEONAJVACO",
 efolder_documents_fetched_at: Mon, 31 May 2021 18:00:37 UTC +00:00,
 email: "[email protected]",
 full_name: "Mavis Kunde",
 last_login_at: Thu, 03 Jun 2021 20:31:45 UTC +00:00,
 roles: ["Reader", "Mail Intake"],
 selected_regional_office: nil,
 station_id: "101",
 status: "active",
 status_updated_at: nil,
 updated_at: Thu, 03 Jun 2021 20:31:45 UTC +00:00>

[3] pry(main)> User.where(full_name: "Lauren Roth").count
[2021-12-07 12:28:49 -0500]    (8.5ms)  SELECT COUNT(*) FROM "users" WHERE "users"."full_name" = $1  [["full_name", "Lauren Roth"]]
=> 4105

(This is not PII. We use faker a lot.)

It also shields us from DB-specific behavior:

[7] pry(main)> VACOLS::Case.last
[2021-12-07 12:37:24 -0500]   VACOLS::Case Load (32.0ms)  SELECT  "BRIEFF".* FROM "BRIEFF" ORDER BY "BRIEFF"."BFKEY" DESC FETCH FIRST :a1 ROWS ONLY  [["LIMIT", 1]]
[2021-12-07 12:37:24 -0500]   Column definitions (147.9ms)  SELECT cols.column_name AS name, cols.data_type AS sql_type, cols.data_default, cols.nullable, cols.virtual_column, cols.hidden_column, cols.data_type_owner AS sql_type_owner, DECODE(cols.data_type, 'NUMBER', data_precision, 'FLOAT', data_precision, 'VARCHAR2', DECODE(char_used, 'C', char_length, data_length), 'RAW', DECODE(char_used, 'C', char_length, data_length), 'CHAR', DECODE(char_used, 'C', char_length, data_length), NULL) AS limit, DECODE(data_type, 'NUMBER', data_scale, NULL) AS scale, comments.comments as column_comment FROM all_tab_cols cols, all_col_comments comments WHERE cols.owner = 'VACOLS_DEV' AND cols.table_name = 'BRIEFF' AND cols.hidden_column = 'NO' AND cols.owner = comments.owner AND cols.table_name = comments.table_name AND cols.column_name = comments.column_name ORDER BY cols.column_id
=> nil

Test suite

  • We aspire to have our tests produce readable output
  • We aspire to have our tests serve as secondary documentation of how our code is meant to work
  • We enforce 90% code coverage requirements
  • Some documentation on our tests
  • rspec is the test suite we use

readable output and secondary documentation

    distribute_appeals for CAVC
      when the cavc remand is within affinity (< 21 days)
        is distributed only to the issuing judge
      when the cavc remand is outside of affinity (>= 21 days)
        when genpop: not_genpop is set
          is not distributed because it is now genpop
        when genpop is not 'not_genpop' (i.e., is genpop)
          is distributed to the first available judge

Feature tests

Feature tests (in spec/feature/) run in Chrome. Beyond being more of an end-to-end test, this can help you find obscure parts of the app, or set up contrived test data. Also note that:

  • binding.pry in a test will pause execution and drop you into a REPL. This is useful in any test, but in feature tests specifically it lets you pause the run and interact with the UI manually. (Type continue when you're done.)
  • Prefixing the feature test command with CI=true runs Chrome in headless mode, so it doesn't keep stealing the focus.

Flaky tests

While we do much better than some other projects, there exist tests with intermittent failures. We refer to these as flaky tests (or just "flakes"). (Note that there is a :flakey-boy: emoji you are strongly encouraged to use in discussions because the emoji is grrrrreat.)

Candidly, we are probably too casual about putting up with flaky tests. If a test fails once, it's easy to just re-run it and move on. We should probably be more aggressive. That said, some of the current failures are fairly tricky to debug.

Some helpful resources:

  • Flakey Test Remedies
  • Tests that regularly fail after 7pm Eastern are probably timezone-related date bugs (Time.now vs. Time.now.utc)
  • Circle CI Insights are somewhat helpful in identifying the worst offenders.
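
The 7pm Eastern pattern comes from the UTC date rolling over while the local date has not. A small sketch of the mismatch, using a fixed timestamp as a stand-in for Time.now (the specific date and the -05:00 offset are just assumptions for illustration):

```ruby
# 7:30 pm Eastern Standard Time (UTC-5) on 7 December 2021,
# standing in for a Time.now captured during an evening test run.
evening_eastern = Time.new(2021, 12, 7, 19, 30, 0, "-05:00")

# getutc returns a new Time in UTC and leaves the receiver unchanged
# (plain Time#utc would convert the receiver in place).
same_instant_utc = evening_eastern.getutc

local_date = evening_eastern.strftime("%Y-%m-%d")
utc_date = same_instant_utc.strftime("%Y-%m-%d")

puts "local date: #{local_date}"
puts "UTC date:   #{utc_date}"
puts "same instant? #{evening_eastern == same_instant_utc}"
```

The two values are the same instant, but a test that builds its fixture date from one and its expectation from the other will disagree by a day for the last few hours of every Eastern evening.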

Miscellany

  • There's a wiki page for the Backend Working Group
  • We have a small, informal Ruby Style Guide
  • I have some strong opinions about wikis that I want to impart to you:
    • WP:BOLD -- The Caseflow wiki (not to mention Google Docs used in meetings) is meant to be a living document, not authoritative documentation. It's not just OK to edit it; it's welcomed!
    • The wiki shouldn't just cover deep technical workings; pages should start with a basic overview. Before explaining how we distribute CAVC Remands, we try to answer "What is a CAVC remand?" Often, wiki page authors are the developers who want to jot down technical notes for others and the more basic overview gets missed. Y'all are likely in a great place to help fix this.
    • Wiki pages should link to other wiki pages when possible.

Customs I plan to take with me after Caseflow

  • We make extensive use of Slack, linking to threads from other threads, and generally documenting decisions and weird errors. We can link to Slack threads from documentation and postmortems. There's a lot of knowledge there if you search for it.
  • We put a lot of work into our PRs having a good test plan, background, and, when code is pushed up, comments on why we approached things the way we did. Reviewers will leave commentary, including sometimes just praise / words of encouragement. Running git blame on a confusing section of code and following it back to the PR will often turn up excellent background on why changes were made.