Data: where and why - TISTATechnologies/caseflow GitHub Wiki

How Caseflow decides what to store and why

(Peter) I've had similar conversations recently about what data Caseflow stores, where, and why, so I thought it worth trying to articulate the motivating philosophy and patterns I've observed for Caseflow. Note that these are my observations only; I haven't talked with any DS folks or anyone else who might have been present Way Back In The Day when these patterns first started to emerge. But hopefully writing this down can help crystallize my thinking and help others. Feel free to thread/clarify/question these.

  1. Prioritize data integrity and performance, in that order.
  2. Store the least amount of data possible in the Caseflow database.
  3. If the data originates in an upstream service (BGS, VBMS, VACOLS), prefer storing only a pointer to that data in the form of a reference_id of some kind. In other words, think of the entire VA data structure as a normalized database.
  4. The only reasons to store upstream data in Caseflow are to improve performance or overall data integrity. Note that duplicating data and data integrity are naturally opposing motivations, so tread carefully.
  5. If you must store the upstream data in Caseflow, prefer caching over long-term storage in the Caseflow database.
  6. If you must cache, prefer per-object caching (memoization) over Redis. This is where data integrity and performance should be weighed against one another. Be clear with yourself about what problem it is you are trying to solve.
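As a rough illustration of points 3, 5 and 6 together, here is a minimal sketch, written in Python for brevity and not taken from the Caseflow codebase. The names `FakeUpstreamService`, `Claimant`, `fetch_person` and `participant_id` are all hypothetical stand-ins. The object keeps only a pointer to the upstream record, fetches the rest on demand, and memoizes the result for its own lifetime.

```python
class FakeUpstreamService:
    """Hypothetical stand-in for an upstream service such as BGS."""

    def __init__(self, records):
        self._records = records
        self.calls = 0  # counts round trips so the effect of memoization is visible

    def fetch_person(self, participant_id):
        self.calls += 1
        return self._records[participant_id]


class Claimant:
    """Holds only a pointer (participant_id) to the upstream record."""

    def __init__(self, participant_id, service):
        self.participant_id = participant_id  # the only value we would persist
        self._service = service
        self._person = None  # per-object cache; it goes away with the object

    def _upstream_person(self):
        # Memoization: hit the upstream service at most once per object.
        if self._person is None:
            self._person = self._service.fetch_person(self.participant_id)
        return self._person

    @property
    def name(self):
        return self._upstream_person()["name"]

    @property
    def date_of_birth(self):
        return self._upstream_person()["date_of_birth"]


bgs = FakeUpstreamService({"123": {"name": "Jane Doe", "date_of_birth": "1970-01-01"}})

claimant = Claimant("123", bgs)
claimant.name           # first access triggers one upstream call
claimant.date_of_birth  # served from the per-object cache

fresh = Claimant("123", bgs)
fresh.name              # a new object fetches again, so it sees current upstream data
```

The trade-off is visible in the last two lines: the cache never outlives the object, so stale data cannot linger the way it can in a shared cache, at the cost of one upstream call per object.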

See background reference discussion: https://github.com/department-of-veterans-affairs/caseflow/issues/5484

What parts of Caseflow interact with BGS

The BGS external API service is the most specific reference. In summary:

  • All Veteran profile records (name, DOB, file number, SSN, contact/address)
  • POA records (name, address)
  • Veteran record permissions (sensitivity)
  • People records (Veterans, claimants, participants and others)
  • Claims (End Products) are read from BGS
  • Ratings, Rating Profiles, and Rating Issues

What parts of Caseflow interact with VBMS

The VBMS external API service is the most specific reference. In summary:

  • Efolder (document artifacts for Veteran cases)
  • Establishing claims (End Products, Contentions, etc., for Intake)