VA.gov Data Visibility Initiative - department-of-veterans-affairs/abd-vro GitHub Wiki

Problem Overview

Executive Summary

  • VA.gov activity data, including disability benefits claim submission data, is functionally inaccessible to Benefits Portfolio product teams, with the exception of a handful of engineers with command line access to query the production database in vets-api.
  • OCTO wants to develop a safer, more accessible, and more user-friendly way for teams to access this data.
  • The VRO team is responsible for coordinating this effort via collaboration across the Benefits Portfolio, in particular with the Disability Benefits Experience team(s) who are familiar with the va.gov Postgres database and the needs of engineers working on va.gov benefits products.

Problem

VA.gov activity data is currently trapped in a Postgres database in vets-api

Teams working to improve the end-to-end experience of digitally submitted disability benefit claims need access to va.gov activity data (including claim submission data) in order to learn about problems, validate ideas, troubleshoot issues, measure experiments, and iterate on solutions. However, teams can't easily access this data from the database where it's currently stored.

How is va.gov claim submission data trapped in this Postgres DB?

  • VA.gov stores all activity in a Postgres database that can only be queried via the prod Rails console, so the data can't be accessed by BI tools or other analysis tooling. Because some of the information is PII, it often can't be logged.
  • Only a handful of Benefits portfolio team members have access to query the production database via the prod Rails console.

Why not just get the data from the VBA side?

  • While va.gov claim submissions end up in the Enterprise Data Warehouse (EDW, which sits one layer above the source-of-truth Corporate Data Warehouse, CDW) after claim establishment, claims in EDW that come from va.gov are slightly mislabeled with respect to identifying va.gov as their source. This mislabeling is being investigated in hopes of resolving it going forward (outside the scope of this effort), but the problem will still apply to historical claims data in EDW (dating back to Nov 2021).
  • Even disregarding the mislabeling, no one in OCTO has access to EDW; queries must go through VBA's PA&I team.

What about getting the data via Kafka streams?

The current thinking is that in the long term, the ideal would be for the VES Event Bus Kafka service to act as the source for this data; however, OCTO would like to implement an intermediary solution without waiting for that option to solidify.

Consequences

VA.gov activity data in its current state is functionally inaccessible for the majority of Benefits Portfolio teams' needs.

What's wrong with getting this data from the Postgres DB?

  • Engineers querying data via the prod Rails console risk impacting production and end users, or altering real production data
  • Large queries risk overloading the production system, which makes it hard to examine large amounts of this data
  • Only a small number of Benefits Portfolio engineers can access this data (Yang, Luke, Steve, and Kyle Soskin are the ones we're aware of), and increasing that number would violate the principle of least privilege
  • The folks who have access to query the database are not in support roles tasked with fielding requests for data, so there's no official way to ask for a data pull

What's wrong with requesting this data from PA&I?

  • Given that PA&I fields requests from all across VBA and OCTO, it can take weeks or months for data requests to be fulfilled.
  • Furthermore, because all of VBA's reporting slightly mislabels va.gov as a claim source, it's not possible for PA&I to report on claims from va.gov with full confidence in their accuracy.

Opportunity

OCTO needs a way for Benefits Portfolio teams to retrieve va.gov activity data in a safer, more accessible, and more user-friendly way:

  • without using a prod command line to access the va.gov Postgres database,
  • without lengthy turnaround times on the scale of weeks or months,
  • and with confidence that the data we're seeing covers all va.gov 526 claims.

Ideally, non-engineers who need visibility into the data will be able to retrieve it for themselves without having to go through an engineer. Building upon that ideal scenario, we can imagine enabling the configuration of data dashboards to meet teams' specific and ever-present needs for data analysis and insights. And in a perfect world, this va.gov activity data would be matched up to "downstream" claim lifecycle data from EDW (available via Kafka event topics) so that teams could follow claims from submission on va.gov through to claim completion in VBMS.

Note that the VA has a larger effort underway related to reducing/eliminating va.gov engineers' dependencies on interacting with the prod Rails console (as shared by Bill Chapman in the July Benefits Portfolio engineering all-hands) – our focus on data visibility represents just one aspect of this overall effort.

Goal

By December 31, 2023, Benefits Portfolio teams will have visibility to all disability benefit form data submitted on VA.gov.

More context:

  • The whole benefits portfolio should be part of discovery of needs even if we choose to implement early solutions that focus on a particular team or crew
  • At minimum, "visibility" = Benefits portfolio engineers can pull data via a secure solution (e.g. read-only credentials, scoped only to claim data) without needing prod console access to the va.gov Postgres DB.
  • Currently, "all disability benefit form data" is a hypothesis about what will be valuable to the portfolio teams. For now, we can assume that we're talking about 526EZ form submission data (including historical submission data), but we will refine expectations of what data to include based on further discovery that defines and prioritizes data visibility needs across Benefits Portfolio teams. Other types of data that may be prioritized include va.gov activity and error data.
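To make the minimum bar above concrete, here is a sketch of one safeguard such a secure solution could enforce: a guard that only lets single read-only SELECT statements through, scoped to an allowlist of claim tables. This is illustrative only; the class and table names are hypothetical, not taken from vets-api.

```ruby
# Illustrative sketch -- class and table names are hypothetical.
# A minimal guard a read-only data-access service could apply before
# executing a query on behalf of a portfolio team member.
class ScopedQueryGuard
  # Hypothetical allowlist of claim-related tables.
  ALLOWED_TABLES = %w[form526_submissions claim_submissions].freeze

  # Returns true only for a single SELECT statement that references an
  # allowed table and contains no write/DDL keywords.
  def self.allowed?(sql)
    normalized = sql.strip.downcase.chomp(';')
    return false unless normalized.start_with?('select')
    return false if normalized =~ /\b(insert|update|delete|drop|alter|truncate)\b/
    return false if normalized.include?(';') # no statement chaining
    ALLOWED_TABLES.any? { |t| normalized.include?(t) }
  end
end
```

In practice this would sit alongside database-level controls (a read-only role on a replica, not the primary), but even a simple application-level allowlist like this captures the "read-only, scoped to claim data" intent.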

VRO's Role

Given VRO's mission to make it easy to build software to improve the VA's internal claims process, with particular emphasis on our vision of allowing teams to quickly build and validate product ideas, the VRO team is well positioned to lead the effort of identifying a pathway to deliver value in this problem space.

Our VA partners are asking the VRO team to:

  1. Be responsible for coordinating this work. If VRO's research or roadmap requires work or input from other teams (such as DBEx), that's totally fine.

    • Expectations:
      • VRO should make it as easy as possible for other teams to stay informed and complete relevant tasks.
      • This work should (as always!) follow OCTO's principle of working in the open. All chatter about this project should be in open channels for folks across the portfolio (i.e. #benefits-vro-support, #benefits-portfolio, #benefits-cft, or in a place that's new / opt-in / not 1:1 DMs).
  2. Collaborate with teams across the Benefits Portfolio to define and prioritize the needs related to visibility of va.gov data. There are a variety of needs related to claim data across the portfolio, both at submission and beyond. Some are related to monitoring in the moment, and some are more driven by product/design research into historical data. The Disability Benefits Experience (DBEx) teams, given their knowledge of the va.gov database, should be primary collaborators on this effort.

    • Required output:
      • Recommend a prioritized set of needs as initial and subsequent areas of focus for the teams' efforts between now and the end of the year
    • Expectations:
      • Set up a touchpoint/meeting between VRO and DBEx team (or teams, but likely starting with DBEx Team 2) by first week of August
      • Build out a comprehensive list of needs (and their relative priority/frequency) through the end of August and use that list to define our roadmap.
  3. Collaborate with DBEx on shaping and scoping solutions to the portfolio's prioritized needs.

    • Required output:
      • Recommend a roadmap to deliver on the priority needs, including determining which team will implement which portions of which solutions (assuming the solutions include elements that span va.gov and VRO).
    • Expectations:
      • VRO should aim to implement as much of the solution as possible, reducing dependencies on DBEx or other teams.
  4. Implement against the agreed upon roadmap.

    • Expectations:
      • An MVP solution should be started by ~ end of August

A note about the long-term

Our product owners recognize that a "productized" version of available data via dashboards and other tooling is a product in itself, somewhat separate from VRO as a platform. We are empowered to recommend the best structure for long-term maintenance and expansion of this work stream; however, the expectation is that VRO will lead the initial shaping and roadmapping of solutions to this set of problems, identify a path to quick value, and deliver on it.


Enabling Team Q&A

  • Q: Who are the ultimate decision-makers about what can/can't be done with the va.gov data and what can/can't be built on the va.gov side? (We know it's all OCTO; is there anyone we don't know yet who's a key decision-maker?)

    • A: Need to loop in/keep aligned with the ATO team for Lighthouse and va.gov (Jesse House has been the person Steve has talked to; #platform-security-review is the team's slack channel). Do the same with VRO's cATO contacts.
  • Q: Is it accurate to think of this effort as addressing one slice of the overall "Steve and Bill idea" (ie. Steve Albers & Bill Chapman's exploration of internal APIs and/or other solutions to reduce/eliminate dependencies on accessing the va.gov database via prod console)?

    • A: Short answer, no. They're related problems but we don't want to create dependencies between these efforts.
  • Q: Technical question: is there an existing backup of the va.gov database?

    • A: No, and creating one is hard. The system is written in an almost-NoSQL fashion, so extracting data from the DB isn't trivial; fields have to be decrypted. There could also be a performance impact, because a backup process would need to run in the same space as production.
  • Q: For Steve & Cory: How baked are your ideas on where to start (e.g. replicate data to an S3 bucket)? We don't want to go back to square one if you're already feeling confident that there's an obvious first step we should take.

    • A: Kyle Soskin is writing up a best-practices guide on using Sidekiq for backend queries, and we should wait for that before making decisions. There's a backlog ticket for the CE teams (DBEx 2) to do this monthly data extraction -- but maybe this is a short-term thing; we might only want it once! It makes sense for them to own it, but in terms of capacity it might make sense for VRO to build it if we can get to it sooner.
  • Q: When we talk about access as a problem, is [technical] skill part of the problem? For example, could we assume that everyone who needs access to this data can use SQL?

    • A: Maybe for an MVP, but since part of the problem is making the data accessible beyond engineers, we probably can't assume that, e.g., all PMs are fluent in SQL.
  • Q: What would overthinking this look like?

    • A: Don't over-engineer in the beginning. For example, if we felt like a JSON file or something would be insufficient and assumed we need complex data viz to solve the need. We CAN start small!
      • We don't need all data in real-time. Figuring out which data is part of the question for us! Picturing a list that lays out, "We need this, this often, and this is the person who needs it" -- then prioritize this list.
      • There's also underthinking it! Doing a data pull, dropping it on Sharepoint and calling it done is not taking holistic enough view of the problem. There are many needs!
      • A good deliverable would be laying out what we can/should do now with Sidekiq and which things should wait until data is available from the Event Bus.
      • We're excited about the potential linkage between claim submission data and downstream claim lifecycle data (but don't start there!)
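The monthly data extraction discussed in the Q&A could take a shape like the sketch below. In vets-api this class would presumably include Sidekiq::Job and have its perform method query the database and upload the result to S3; here we show only the PII-redaction and serialization step in plain Ruby, since that's the part with the clearest safety requirement. All field and class names are hypothetical.

```ruby
require 'json'

# Hypothetical sketch of a monthly claim-data extraction job.
# In vets-api this would be a Sidekiq job (include Sidekiq::Job) whose
# perform method queries submissions and uploads the output to S3.
class MonthlyClaimExtract
  # Fields assumed safe to share with portfolio teams; anything not on
  # this allowlist (e.g. PII) is dropped before serialization.
  SAFE_FIELDS = %w[id form_type created_at status].freeze

  # Takes an array of submission hashes and returns newline-delimited
  # JSON containing only the allowlisted fields.
  def self.serialize(rows)
    rows.map { |row| JSON.generate(row.slice(*SAFE_FIELDS)) }.join("\n")
  end
end
```

An allowlist (rather than a blocklist of known-sensitive fields) is the safer default here: newly added columns stay out of the extract until someone deliberately approves them.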