
VRO Software Conventions

VRO is opinionated about the technical architecture (i.e., event-driven and Apache Camel-based).

This page describes software conventions that should guide new development in VRO to help maintain consistency, expedite time-to-deployment, and foster a great developer experience.

Background

VRO has developed an MVP (in Dec 2022) and iMVP (in Feb 2023). The keyword is "MVP". VRO needs some TLC:

  • The code works but should not necessarily be used as the basis for new code.
  • New teams/engineers will inevitably copy-and-paste to develop new functionality.
  • VRO code should follow best practices: follow conventions and be consistent (which improves maintainability); be modular with minimal coupling (to help ensure components don't inadvertently affect each other as new features are added); and be scalable to handle high volumes.

So I propose:

  • (Mostly done) We document instructions, set coding conventions, and/or incorporate working example code into the VRO codebase.
  • (In progress) We refactor code for reuse, maintenance, and modularity while keeping existing VRO capabilities functional. We can leverage Camel routes to fork a secondary path that runs refactored code in parallel with existing code, and then compare the results. Once results are the same for many claims, then we can be confident that our refactored code can replace the existing code.

Other context:

  • VRO is one of the first to use the LHDI platform, LH SecRel pipeline, and LH cATO process, which are gradually being developed, expanded, and matured.
  • Contributions to and management of the VRO codebase will change hands over time.
  • Consistent code makes debugging and maintenance easier across domains. Coding conventions help to ensure consistent code.

Terminology

  • "domain" = area of functionality to be added to VRO, such as va.gov integration (VRO v1), MAS integration (VRO v2), or Contention Classification (CC)
  • "MQ" = a message queue (such as RabbitMQ) that provides a message bus, enabling communication across all containers
  • "container" = a Docker container is used to encapsulate functionality into modular, scalable, stateless components
  • VRO's container categories:
    • App container: there's a default App container that exposes an API using Java Spring Web and combines Java-based domain-specific API endpoints into a single API
      • A domain can expose its own API but some manual updates will be needed to expose the API via the API Gateway
    • Workflows container: defines domain-specific workflows using Camel Routes (in Java) or Prefect in Python; a domain should have its own Workflows container; typically a single Workflows container is needed per domain regardless of the number of workflows.
      • The workflow library should be able to send requests to Service containers using the message queue.
    • Service container: holds one or more microservices that implement (in any language) step(s) in a workflow, providing some basic, stateless functionality. A multi-step workflow is expected to use multiple service containers. These services are typically domain-specific but they can be generalized to be domain-independent and reusable by multiple domains.
    • Platform container: offers a domain-independent resource or service, such as containers that run the RabbitMQ, Postgres DB, and Redis Cache services.
  • A container may be composed of several VRO Gradle modules (or subprojects).
    • A Gradle module is a folder with a build.gradle file. A module is used as a library or for a container (which may import libraries). A container module uses a container.*-conventions Gradle plugin (search for id .*container.*-conventions in build.gradle files to identify all the container modules).

Figure: Example of VRO Container Categories

Folder Structure

Top-level folders are for domain-independent code, except for domain-... folders where domain-specific code resides. VRO software resides in these folders:

  • app: VRO entrypoint; pulls in domain-specific api-controller modules
  • console: VRO shell console for diagnostics (inspired by Rails Console)
  • db-init: initializes and updates the database schema
  • postgres: defines a customized database container
  • shared: utility and library code shared across domains (Shared libraries)
  • svc-*: domain-independent microservices, typically for integrations (Shared microservices)
  • mocks: mock services for development and testing, typically mocking external APIs
  • domain-ABC: domain-specific code with subfolders:
    • ABC-api-controller: module that defines domain API endpoints and controller
    • ABC-workflows: defines domain workflows (i.e., Camel Routes)
    • svc-*: domain microservices supporting the workflows

Other top-level folders contain configurations for building and deploying VRO.

Shared libraries

VRO offers utilities and DB classes as shared libraries for use by domain-specific classes.

  • shared/api folder: (Java) general API models
  • shared/controller folder: (Java) Controller utilities, including InputSanitizerAdvice
  • shared/lib-camel-connector folder: (Java) Camel utilities, including a Spring Configuration for Camel and CamelEntry
  • shared/persistence-model folder: DB utilities, including Java Entities
  • Python MQ utilities to facilitate interaction with the MQ

These libraries should have minimal dependencies to help avoid dependency conflicts (aka "dependency hell").

Be cautious when updating these files since they can inadvertently affect multiple domains that use the libraries. (Versioning the libraries can help avoid this scenario but adds significant complexity and maintenance overhead.)

Shared microservices

VRO offers integrations with external services (such as Lighthouse API and BIP API) via shared microservices, which are deployed in separate containers (for independent updating, down-time handling, and scalability). These domain-independent shared microservices are in the top-level svc-... folders. Other (non-integration) cross-domain microservices can be added as long as they are domain-independent.

Note that external services can go down, so the domain's workflow should incorporate error-handling and retry mechanisms for workflow robustness. Do not rely on RabbitMQ-specific retry mechanisms (RabbitMQ Microservice reliability or any MQ-specific features) in case VRO moves to using some other MQ; instead, use Camel EIPs to do retries as part of the workflow.
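
For illustration, retries can be expressed with Camel's redelivery EIP at the workflow level. The following is a minimal sketch, not VRO's actual error handling; the route names, queue names, and retry policy values are illustrative and domain-specific.

import org.apache.camel.builder.RouteBuilder;

// A minimal sketch of workflow-level retries using Camel's redelivery EIP.
// All names and policy values here are illustrative.
public class ResilientRoutes extends RouteBuilder {
  @Override
  public void configure() {
    errorHandler(defaultErrorHandler()
        .maximumRedeliveries(3)
        .redeliveryDelay(1000)      // first retry after 1s...
        .useExponentialBackOff()
        .backOffMultiplier(2));     // ...then 2s, then 4s

    from("rabbitmq:xample-call-lighthouse")  // hypothetical stage queue
        .to("rabbitmq:svc-lighthouse-api");  // failed exchanges are redelivered per the policy
  }
}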

Adding a New Domain

To enable capabilities for a new domain in VRO, the partner team will typically implement a Workflows container and several Service containers, and add API endpoints by creating an API-Controller module.

  • Code for one domain should not reference code in another domain. Keep domains decoupled, including keeping containers for different domains independent of each other. This allows containers for one domain to be restarted without affecting unrelated containers.
  • The API-Controller module should rarely need to be updated once created, whereas the Workflows and Service containers are restarted more frequently.
  • Add a domain-... folder at the top level of VRO's codebase. All code for the domain should reside in subfolders under this domain folder. For an example, refer to the domain-xample folder.
  • Domains should include automated end-to-end tests that validate the workflow. These tests can be run manually in the local developer environment and automatically during CI by GitHub Actions.
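
As a rough illustration, an end-to-end test can simply POST to the API and assert on the response. This sketch uses JUnit 5 and Java's built-in HttpClient, mirroring the xample walkthrough later on this page; it is not an actual VRO test class.

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class XampleEndToEndTest {
  @Test
  void postXampleResource() throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8110/v3/xample-resource"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(
            "{\"resourceId\":\"1234\",\"diagnosticCode\":\"J\"}"))
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    assertEquals(201, response.statusCode());  // the workflow accepted the payload
  }
}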

API Endpoints and Controller library module

Under the domain folder, add a ...-api-controller subfolder and populate it with Java code. Because the folder name is used as the jar artifact file name, the folder name should be unique, regardless of where it is in the codebase.

  • The app container module (in the top-level app folder) pulls in each domain's api-controller module to present a single API spec.
  • Note that API endpoints should rarely be deprecated. Instead, use version numbers in the URL. https://www.mnot.net/blog/2012/12/04/api-evolution
  • Controllers should contain minimal logic: check the request payload and immediately inject it into a workflow (a thin controller is sketched below). This enables error recovery and testing, since a payload can be injected directly into the workflow, bypassing the need for an API and Controller.
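
A minimal sketch of such a thin controller (class, request type, and method names are illustrative, not VRO's actual code):

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class XampleResourceController {
  private final CamelEntrance camelEntrance;  // shared Camel-entry utility (see "All together now")

  XampleResourceController(CamelEntrance camelEntrance) {
    this.camelEntrance = camelEntrance;
  }

  @PostMapping("/v3/xample-resource")
  ResponseEntity<String> postResource(@RequestBody SomeRequest request) {
    // No business logic here: hand the payload straight to the workflow
    String result = camelEntrance.postXampleResource(request);  // hypothetical method name
    return ResponseEntity.status(HttpStatus.CREATED).body(result);
  }
}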

APIs implemented in non-Java languages are served by VRO without having to implement a ...-api-controller subfolder -- see API Gateway.

Shared library module

Domain-specific shared libraries can be implemented in a single subfolder or multiple subfolders. You can use any folder name, such as constants, util, or dto. These subfolders define library modules that can be imported by other modules in the domain.

  • In the domain-xample example, the shared module illustrates defining shared domain-specific constants, DTO (data transfer object) classes, and utility classes.
  • Note that DAO (data access objects) belong in the top-level persistence folder because all domains share a DB – see the Add DB Entities section.
  • Once the code is stable, domain-independent classes can be moved to one of the shared libraries for use by other domains. Ensure these classes are very well tested with unit tests and clearly documented.

Workflows container module

Workflows are implemented within a single ...-workflows subfolder. Because the folder name is used as the Docker image name, the folder name for the container should be unique, regardless of where it is in the codebase.

  • For Java, workflows are defined using Camel Routes and Camel Components, which are building blocks for implementing EIPs.
  • The Workflows container should implement only Camel Routes and basic processing (workflow logic or data object conversions), not the microservices themselves.
    • However for rapid prototyping, early implementations can include microservices in the single container, with the understanding that the microservices will later be extracted into their own containers (as described in the next section).
    • A workflow step can call a method on a Java Bean, a custom Camel Processor, etc. to perform basic logic that determines the next processing step. Nothing (except maybe a rejection during manual code review) prevents that component from growing big and complex. To avoid this situation, split the workflow into simple steps, which will facilitate extracting a step into an appropriate Service container later. In the long term, this makes the workflow easier to understand (workflow logic and transmitted data are encoded at the workflow level), more comprehensive (hidden substeps become explicit steps), and easier to modify (by modifying routes instead of service internals).
  • Workflows should split the processing steps into logical stages, where each stage is one or more Camel Routes. The benefit of segmenting the workflow is to facilitate testing and error recovery, where for example a payload (a claim) can be injected into any stage of the workflow.
  • Prefer to design workflows for asynchronous (event-based) processing, leveraging MQ queues to trigger the next step in the workflow. In addition to decoupling workflow steps, this facilitates injecting a payload (manually or as part of another workflow) at strategic intermediate steps within the workflow (see the route sketch after this list).
  • For Python, the Prefect library can be used instead of Camel. To send requests to Service containers, Prefect tasks should send the requests over the MQ.
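
Here is a minimal sketch of stage-based routes (route IDs, queue names, and the PayloadValidator bean are all illustrative, not VRO's actual workflow):

import org.apache.camel.builder.RouteBuilder;

public class XampleRoutes extends RouteBuilder {
  @Override
  public void configure() {
    from("rabbitmq:xample-postResource")  // stage 1: triggered by the controller's MQ message
        .routeId("xample-validate")
        .bean(PayloadValidator.class)     // basic logic only; no microservice work here
        .to("rabbitmq:xample-assess");    // asynchronously trigger the next stage

    from("rabbitmq:xample-assess")        // stage 2: a payload can also be injected here directly
        .routeId("xample-assess")
        .to("rabbitmq:svc-xample-j");     // delegate the step to a Service container via the MQ
  }
}

Because each stage begins at a queue, a payload can be dropped onto xample-assess (manually or by another workflow) without going through stage 1.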

Service (microservice) container module

A microservice is implemented within a svc-... subfolder. There should be a separate folder for each microservice. Because the folder name is used as the Docker image name, the folder name for the service should be unique, regardless of where it is in the codebase.

  • For scalability, a microservice is encapsulated in a container so it can be replicated as needed.
    • Microservices can be easily scaled by having replicated microservices listen to the same MQ queue; more effort is required to scale using a REST API
    • queue-based asynchronous model vs REST API
  • A microservice should be stateless, idempotent, and basic (e.g., implementing one step in a workflow).
  • A microservice listens for JSON requests on the MQ and always sends a JSON response back to the client via the MQ. Benefits:
    • No REST API to set up and manage; less DevOps maintenance (e.g., exposing ports for REST API). Fewer libraries to include implies fewer libraries to maintain for SecRel.
    • Makes services easier to test since the input and output are always JSON strings.
    • JSON strings are easily parsed in any programming language. There is no JSON data-structure enforcement unless a JSON Schema is specified and applied.
    • A microservice response is always expected by the client to ensure the request was received and processed (or errored). The client can handle the response synchronously (blocks the workflow while waiting for a response) or asynchronously (workflow continues and reacts whenever a response is received).
    • https://developer.ibm.com/articles/how-messaging-simplifies-strengthens-microservice-applications/: decoupled communication, pull instead of push workload management, simplified error handling, security is configured in central MQ instead of potentially inconsistently in each microservice
  • The expected JSON structure for a microservice response is very flexible. The only requirement is a header field:
{
  "someRequestCorrelationId": 987654,
  "anyField": { "someSubField": "anyValue" },
  "anotherField": 1234.5,
  "header": {
    "statusCode": 200, // use HTTP status codes
    "statusMessage": "error msg or optional msg useful for debugging/observability",
    "jsonSchemaName": "optional domain-specific JSON Schema name for msg body"
  }
}
  • To convey an error or exception to the microservice client, the response JSON String should contain the fields statusCode and statusMessage.
    • The statusCode integer value should correspond to an HTTP status code. If this key is not present, then the client can assume status code = 200.
    • The statusMessage String value should convey some diagnostics and preferably actionable information. This should be present for a non-success statusCode to provide details for diagnostics.
  • Implement business logic in the Workflow container.
    • Do not rely on RabbitMQ-specific retry mechanisms (or any MQ-specific features) in case VRO moves to using some other MQ. Handle retries as part of the Workflow, especially since the retry strategy (e.g., constant-time retry 3 times, exponential backoff, send Slack notification for each retry) will likely depend on the domain. Check the shared libraries or domain-specific code for implemented retry strategies that can be reused.
    • A microservice should not call another microservice – implement that as steps in the workflow when possible.
    • Except for communication via the MQ container, a microservice should avoid interacting directly with Platform containers. If DB values are needed, have the Workflow load them from the DB and feed them as input to the microservice. If the service needs to write to the DB, have the service output the data back to the Workflows container, which writes to the DB. This facilitates unit testing (no mock DB is needed) and flexibility (e.g., output data can be sent to logs in addition to the DB).
  • Integrations with external APIs (VA services) should be implemented as microservices in order to better isolate and handle external-API problems. Additionally, implement each integration in a general manner so that it can be promoted to a shared microservice.
    • The microservice should indefinitely retry connecting to external services rather than exiting and causing the container to fail. Otherwise, when the microservice is deployed to LHDI (where mock services don't exist), the microservice will fail and Kubernetes will keep restarting the container. Retrying indefinitely prevents Kubernetes from unnecessarily restarting the container, since the problem is the connection, not the microservice container itself.
    • If there is a temporary network or external service issue, the microservice container should not be restarted and interrupt/cancel other activities/processing occurring in the microservice.
    • Another reason to keep the container running is to enable diagnostics within the running container in case there is some credential or login issue when deployed to LHDI. If the container keeps restarting, it's challenging to log into the container to diagnose the problem.
    • If using Java Spring to connect to the external service, you get this for free because Spring will automatically retry connecting.
  • Ideally there is one microservice per container, but if several microservices are very basic, they can share a single container to conserve resources.

When implementing a microservice in Java, Groovy, or another JVM-based language:

  • To listen to the queue(s), use a RabbitMQ client with the help of Spring – see module svc-xample-j.
  • Or use a Camel Route to listen to the queue(s), like the Workflow container module. Use classes in the :shared:lib-camel-connector module to facilitate this.
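
A minimal Spring AMQP sketch of such a listener follows (the queue name and processing are illustrative; see svc-xample-j for the real implementation). The returned JSON string is sent back to the client via the message's reply-to address, carrying the header field described above.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class XampleListener {
  private final ObjectMapper mapper = new ObjectMapper();

  @RabbitListener(queues = "svc-xample-j")  // hypothetical queue name
  public String receive(String requestJson) {
    ObjectNode response = mapper.createObjectNode();
    try {
      JsonNode request = mapper.readTree(requestJson);
      // ... perform the single, stateless step here ...
      response.put("someRequestCorrelationId", request.path("someRequestCorrelationId").asLong());
      response.putObject("header").put("statusCode", 200);
    } catch (Exception e) {
      response.putObject("header")
          .put("statusCode", 500)                 // HTTP-style status code (see above)
          .put("statusMessage", e.getMessage());  // actionable diagnostics for the client
    }
    return response.toString();
  }
}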

When implementing a microservice in other languages, such as Python or Ruby:

  • Use a RabbitMQ client to listen to the queue(s). There's likely common code to use the client, so set up some mechanism (like a common folder) to share that code across services for easier maintenance and updating. (TODO: create a common library for Python)

All together now

Once the above pieces are implemented, a call to the endpoint flows as follows:

  1. A VRO client sends a request to an API endpoint.
  2. The controller parses the request and sends it to the MQ via the CamelEntrance utility class, using a queue name derived from the endpoint URL. (This convention avoids having to share queue-name constants across containers; a sketch of such a utility appears after this list.)
  3. The controller can choose to send a quick response to the client or wait for some result from the triggered workflow, depending on the semantics of the endpoint.
  4. The Camel Route (that is listening on the queue) initiates the associated workflow given the request. At the termination of the workflow, the last response message is sent back to the controller.
  5. One step at a time, the workflow sends JSON-string messages to specific queues (or topics). (Again, use consistent queue naming conventions to avoid synchronizing these constants across containers.)
  6. A service (that is listening on the specific queue or topic) performs its function and returns an output, which is typically used as input to the next step in the workflow.
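
For illustration, a CamelEntrance-style utility can be a thin wrapper around Camel's ProducerTemplate (the real class lives in shared/lib-camel-connector; the method and queue naming below are assumptions):

import org.apache.camel.ProducerTemplate;
import org.springframework.stereotype.Component;

@Component
public class CamelEntrance {
  private final ProducerTemplate producerTemplate;

  public CamelEntrance(ProducerTemplate producerTemplate) {
    this.producerTemplate = producerTemplate;
  }

  public String postXampleResource(Object request) {
    // Queue name derived from the endpoint URL "/v3/xample-resource"
    return producerTemplate.requestBody("rabbitmq:xample-resource", request, String.class);
  }
}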

Figure: Interfacing with MQ from various components

Running the xample domain microservices

To run the xample microservices (required for the xample-resource POST endpoint to succeed), add xample to the COMPOSE_PROFILES environment variable:

export COMPOSE_PROFILES="$COMPOSE_PROFILES,xample"

Add DB Entities

Since the DB is shared across domains, the associated DB code resides at the top-level persistence folder, rather than the domain-specific folder.

  • Updates to the DB schema require adding a Flyway migration (in the db-init folder)
  • DB models should be consistent across domains and clearly documented with column comments. Great care should be taken to prevent the logic in one domain from incorrectly modifying data values in another domain.
  • DB ORM models should reside in the appropriate subfolder of the persistence folder. The per-language classes do not have to be kept in lockstep, but they should be consistent with each other across languages. Note that these classes can be used by any domain.

Note: A microservice should avoid interacting with the DB directly – a workflow should act as the intermediary (see the DB-related bullet in the Service (microservice) container section). With that said, if direct service-to-DB interaction is desired, use the following guidance.

For interacting with the database via Java:

  • Add associated Java Entity classes in the model subfolder for use by workflows or some Repository class
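
A hedged sketch of such an Entity class (table and column names are illustrative; the jakarta imports may be javax on older Spring versions):

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import java.util.UUID;

@Entity
@Table(name = "example_claim")
public class ExampleClaimEntity {
  @Id
  @GeneratedValue
  private UUID id;

  // Columns should also be documented with comments in the Flyway migration (see above)
  @Column(name = "claim_submission_id", nullable = false)
  private String claimSubmissionId;

  // getters and setters omitted for brevity
}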

For interacting with the database via Python:

  • Add associated SQLAlchemy ORM classes in the sqlalchemy subfolder

For interacting with the database via Ruby on Rails:

  • Add associated Rails ActiveRecord classes in the rails subfolder

Other Details

Message Queue (RabbitMQ)

  • The payload or message body is a JSON string.
  • Requests/calls/messages can be synchronous or asynchronous.
  • MQ queue names are constants that must be shared or synchronized across containers. Using naming conventions reduces the number of constants to be shared (via a library or environment variables).
  • In order for Camel to automatically marshal/unmarshal DTO classes for transmission over the MQ, either define the class in the gov.va.vro.model package (or a nested package) or add it to the vro.camel.dto-classes list (in conf-camel.yml), as illustrated below.
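
For example (the class and its fields are hypothetical; the package placement is what matters):

// Defined under gov.va.vro.model so Camel marshals/unmarshals it automatically
package gov.va.vro.model.xample;

public class SomeRequest {
  private String resourceId;
  private String diagnosticCode;

  public String getResourceId() { return resourceId; }
  public void setResourceId(String resourceId) { this.resourceId = resourceId; }
  public String getDiagnosticCode() { return diagnosticCode; }
  public void setDiagnosticCode(String diagnosticCode) { this.diagnosticCode = diagnosticCode; }
}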

Configuration Settings and Environment Variables

  • Use configuration settings for software settings within a container
  • Use scripts/setenv.sh for environment variables
    • Prefer to add them to application*.yml (for Java) or settings*.py (for Python). Those files allow different setting values per deployment env. Adding environment variables incurs the cost of keeping docker-compose.yml (for local development) and helmcharts (for each LHDI deployment environment) updated.
    • Valid reasons why a setting should be in the setenv.sh file:
      1. A setting to support local development (and automated end2end testing)
      2. Secret credentials like username, password, private token. Only fake secret values belong in this file. Sensitive non-production values are in the abd-vro-dev-secrets repo. Critical production values are set within the Kubernetes deployment environment -- see Secrets Vault.
      3. A setting that must be the same or shared across containers
  • Feature flags: TBD

SpringBoot Profiles

Merging Workflows Containers

To conserve infrastructure resources, lightweight workflows from one domain can be combined with workflows from other domains into a single container. A lightweight workflow is one that uses very little memory and CPU; its code should be stable, with no changes expected between major VRO deployments (e.g., when the VRO API is updated). Combining workflows across domains is easily done without any code change by treating the workflow module as a library (instead of a container) and importing it into a multi-domain workflows container, which is deployed instead of the domain-specific containers – similar to how the app container imports api-controller modules from several domains.

Versioning

Semantic versioning is used for VRO releases.

  • All code (all Gradle modules) has the same version when a code release is created.
  • A deployed container uses a Docker image tagged with a specific release version. Since containers may be independently updated, deployed containers may refer to different versions.

Port numbers

VRO platform services use the typical port numbers for the particular service (usually less than port 10,000):

  • 5432: Postgres DB service
  • 5672: RabbitMQ service
    • 15672: RabbitMQ Management UI
  • 6379: Redis cache service

The API Gateway uses the following:

  • 8060: API Gateway
    • 8061: health check port for the API Gateway

Ports for APIs within VRO

VRO domains offering APIs use this port numbering convention:

  • 81Nx = VRO domain API ports, where N is an index

The VRO (Java) App uses the following:

  • 8110: VRO App API
    • 8110: health check port for the VRO App (same port as the API)

The (Python) Contention Classification uses the following:

  • 8120: Contention Classification API
    • 8121: health check port for Contention Classification

The Employee Experience Team uses the following:

  • 8130: Max CFI API (python) (same port is used for health check due to uvicorn limitations)
  • 8140: EP Merge App (python) (same port is used for health check due to uvicorn limitations)

Ports for VRO microservices

VRO microservices use this port numbering convention:

  • 10NNx = VRO microservice ports, where NN is an index

    • example: 1010x = ports used for svc-lighthouse-api microservice
    • example: 1020x = ports used for svc-bgs-api microservice
  • 10NN1 = health check port for microservice NN

    • example: 10101 = health check port for svc-lighthouse-api microservice
    • example: 10201 = health check port for svc-bgs-api microservice
    • example: 10301 = health check port used for svc-bie-kafka microservice

VRO microservices only need to connect to RabbitMQ and do not typically need to expose any service ports, except for health checks.

Mock services use this port numbering convention:

  • 20NNx = mock service ports, where NN is an index
  • 20NN1 = health check port for mock service NN

So the following mock services would use these ports:

  • 20100: mock Slack
    • 20101: for health check
  • 20200: mock Lighthouse API
    • 20201: for health check
  • 20300: mock BIP Claims
    • 20301: for health check
  • 20310: mock BIP Claim Evidence
    • 20311: for health check
  • 20500: mock BGS API
    • 20501: for health check

(Note that the two BIP mocks use 2030x and 2031x to reflect how closely related they are.)

To see if a port is already being used, search the codebase for usages of that port.

Running Xample Domain containers

A xample-integration-test GH Action workflow demonstrates an end-to-end test, from VRO API request to microservice.

To manually test the xample workflow and svc-xample-j locally, use Docker Compose:

source scripts/setenv.sh
# Build all containers
./gradlew docker

# Start the relevant containers
./gradlew :dockerComposeUp
./gradlew :domain-xample:dockerComposeUp
COMPOSE_PROFILES="" ./gradlew :app:dockerComposeUp

Open a browser to http://localhost:8110/swagger, go to the POST /v3/xample-resource section, and open it. Click "Try it out". In the Request body, replace null with:

{
  "resourceId":"1234",
  "diagnosticCode":"J"
}

Note: diagnosticCode must be J in order for the request to be routed to the svc-xample-j microservice, as coded in Xample's Camel route.

Click Execute. The response code should be 201 with response body:

{
  "resourceId": "1234",
  "diagnosticCode": "J",
  "status": "PROCESSING",
  "statusCode": 0
}

The above can also be done with curl:

curl -X POST "http://localhost:8110/v3/xample-resource" -H "accept: application/json" -H "Content-Type: application/json" \
  -d '{"resourceId":"1234","diagnosticCode":"J"}'

# The API response:
{"resourceId":"1234","diagnosticCode":"J","status":"PROCESSING","statusCode":0}

Check the logs:

docker logs vro-xample-workflows-1
docker logs vro-svc-xample-j-1

Xample code highlights

Run microservice outside of Docker

To run any VRO component outside of Docker, you'll need to configure your environment so that the component can communicate with containers inside of Docker -- some steps for running vro-app in IntelliJ.

For the svc-xample-j microservice, no additional setup is necessary since the defaults to connect to the MQ should work.

source scripts/setenv.sh
./gradlew :dockerComposeUp
./gradlew :domain-xample:dockerComposeUp
COMPOSE_PROFILES="" ./gradlew :app:dockerComposeUp

# Stop the Docker instance of svc-xample-j so you can run svc-xample-j's bootRun outside of Docker
docker stop vro-svc-xample-j-1

./gradlew :domain-xample:svc-xample-j:bootRun

To test, run in a new console:

curl -X POST "http://localhost:8110/v3/xample-resource" -H "accept: application/json" -H "Content-Type: application/json" \
  -d '{"resourceId":"1234","diagnosticCode":"J"}'

To build and restart specific containers

  • To build the svc-xample-j Docker image: ./gradlew :domain-xample:svc-xample-j:docker
    • To build all images under the domain-xample folder: ./gradlew -p domain-xample build
  • To run the updated image: ./gradlew :domain-xample:dockerComposeUp

For more see Docker Compose.