ReportStream Integration - CDCgov/prime-simplereport GitHub Wiki

What is ReportStream?

ReportStream is our sister project on PRIME. They are responsible for taking test data from senders, aggregating and cleaning it, and passing the data along to public health departments. You can read more about the project on their GitHub repo.

SimpleReport is a sender to ReportStream. We pass them testing data recorded in our application and can rely on ReportStream to report it to the appropriate public health departments.

How does our integration work?

At a high level, when test results are recorded in SimpleReport we send them to an Azure processing queue. We then have a separate Azure Function App that reads from this queue, batches test results, and sends them to the ReportStream API endpoint. At the time of writing, we send results to ReportStream every two minutes and have a maximum send size of 500 messages (tests).

There's also the newer CSV uploader option for reporting results. It operates differently from the queue-based flow: when a file is uploaded, we hit the ReportStream API endpoint directly, bypassing the Azure Function and storage queue entirely.
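
As a rough illustration of that direct path, the sketch below posts a CSV payload straight to ReportStream. The URL, the client header value, and the environment variable name are assumptions for illustration, not copied from our code; consult the ReportStream API documentation for the real contract.

```typescript
// Hedged sketch of the direct CSV-upload path (not the actual backend code).
// Assumes Node 18+ for the global fetch API.
export async function postToReportStream(csv: string): Promise<Response> {
  return fetch("https://prime.cdc.gov/api/reports?processing=async", {
    method: "POST",
    headers: {
      "content-type": "text/csv",
      client: "simple_report", // sender identifier (assumption)
      "x-functions-key": process.env.REPORT_STREAM_KEY ?? "", // hypothetical env var
    },
    body: csv,
  });
}
```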

Test Event Creation

Test events are created using the addTestResultMultiplex GraphQL mutation. This mutation (sketched after the list below):

  • Updates the TestOrder to include the final result and saves the TestOrder
  • Creates a new TestEvent entity and saves it in our database
  • Sends the patient notification, if email or SMS results are enabled
  • Uses the TestEventReportingService to export the new TestEvent to the storage queue as JSON
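
For orientation, here's a hypothetical sketch of invoking that mutation from TypeScript. Only the mutation name comes from this page; the endpoint URL, argument names, input types, and selection set are illustrative and won't match the real SimpleReport schema exactly.

```typescript
// Hypothetical GraphQL document; field names are assumptions for illustration.
const ADD_TEST_RESULT_MULTIPLEX = /* GraphQL */ `
  mutation AddTestResultMultiplex($patientId: ID!, $deviceId: ID!, $results: [MultiplexResultInput!]!) {
    addTestResultMultiplex(patientId: $patientId, deviceId: $deviceId, results: $results) {
      testResult {
        internalId
      }
      deliverySuccess
    }
  }
`;

// A plain POST to a hypothetical GraphQL endpoint; any GraphQL client works too.
export async function addTestResult(variables: Record<string, unknown>) {
  const res = await fetch("https://simplereport.example/graphql", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: ADD_TEST_RESULT_MULTIPLEX, variables }),
  });
  return res.json();
}
```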

In environments with a ReportStream integration enabled (production and dev/test, for now), the TestEventReportingService is implemented by AzureStorageQueueTestEventReportingService, which adds each TestEvent to the queue using Azure's QueueAsyncClient.
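
The real service lives in our Java backend (hence QueueAsyncClient, from the Azure SDK for Java); for consistency with the function-app examples later on this page, here is a minimal sketch of the same enqueue operation using the JavaScript SDK's QueueClient. The connection string variable and queue name are assumptions.

```typescript
import { QueueClient } from "@azure/storage-queue";

// Hypothetical queue name; the connection string comes from app configuration.
const queueClient = new QueueClient(
  process.env.AZURE_STORAGE_CONNECTION_STRING ?? "",
  "test-event-publishing"
);

// Adds one serialized TestEvent JSON to the queue as a single message.
export async function enqueueTestEvent(serializedEvent: string): Promise<void> {
  await queueClient.sendMessage(serializedEvent);
}
```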

In non-ReportStream enabled environments, we use a NoOpReportingService that serializes a TestEvent to JSON but doesn't do any further processing.

TestEventExport

To serialize the TestEvent as JSON, we use TestEventExport. This takes all aspects of a TestEvent (including related data, like a facility's ordering provider information) and serializes it. There's also some special logic here that converts data from our database representation to something ReportStream can ingest, like transforming the patient's ethnicity information from "Hispanic/Non-Hispanic" to "H/N".
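
As a toy illustration of that kind of conversion (the real logic lives in the Java TestEventExport class; the field names and value codes below are assumptions):

```typescript
// Illustrative only: flattening a TestEvent into a ReportStream-friendly record.
interface TestEvent {
  patient: { firstName: string; lastName: string; ethnicity: string };
  result: string;
}

function toReportStreamRecord(event: TestEvent): Record<string, string> {
  return {
    patientFirstName: event.patient.firstName,
    patientLastName: event.patient.lastName,
    // Convert our stored representation to the coded value ReportStream expects.
    patientEthnicity: event.patient.ethnicity === "hispanic" ? "H" : "N",
    testResult: event.result,
  };
}
```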

When we send our messages to ReportStream through the function app (more on that in a minute), we do so as a CSV of these serialized JSONs. ReportStream ingests our tests and processes them as HL7 messages, so our serialized JSON needs to map onto that format; this is why the data needs some manipulation and reformatting before it can be sent. (See more examples of HL7 messages.)
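
The JSON-to-CSV conversion itself is simple in shape. Here's a minimal, generic sketch; the real lib.ts defines the exact ReportStream column set, while this version just derives columns from the first record:

```typescript
// Generic JSON-records-to-CSV conversion; column handling is illustrative.
export function toCsv(rows: Record<string, string>[]): string {
  const columns = Object.keys(rows[0] ?? {});
  const header = columns.join(",");
  const lines = rows.map((row) =>
    columns
      .map((col) => `"${(row[col] ?? "").replace(/"/g, '""')}"`) // escape quotes
      .join(",")
  );
  return [header, ...lines].join("\n");
}
```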

Reading out of the storage queue

This is where things get interesting!

The Azure Function Application we use to process queued messages and send them to ReportStream is the one major SimpleReport component that is NOT part of our SpringBoot backend. Instead, it's a separate function, hosted on a separate server in a separate resource, that doesn't interact directly with our application at all. This is because in the early days of SimpleReport, any time the app was down we had to manually collect all the messages that hadn't been sent and re-submit them for processing. This was time-consuming and error-prone, and made cleanup from outages much more difficult. By having the ReportStream submissions go through a separate Function Application, we don't need to worry about tests not being sent for processing if the main app goes down.

As such, you won't find our Function App code in the usual /backend folder - instead, it's hiding in the /ops folder underneath QueueBatchedReportStreamUploader. (We technically have two function apps, but this is the one that actually handles processing - the other is for parsing the error messages that ReportStream sends us, and it's disabled at the moment.)

This folder has three main pieces:

  • function.json, the binding configuration whose timer trigger fires the script on a cron schedule every two minutes
  • index.ts, the script that the trigger runs
  • lib.ts, a helper file that holds some of the logic for dealing with the storage queue

There are also some associated tests for these files.
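
For reference, a timer-triggered function.json generally looks like the sketch below; the binding name and script path are representative rather than copied from our repo. The six-field NCRONTAB expression "0 */2 * * * *" means "at second 0 of every second minute".

```json
{
  "bindings": [
    {
      "name": "reportStreamTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */2 * * * *"
    }
  ],
  "scriptFile": "../dist/QueueBatchedReportStreamUploader/index.js"
}
```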

The function (sketched below):

  • Reads a set number of messages from the Azure storage queue (this is a peek-style read, not a pop, so the events are still on the storage queue; they've just been read for processing)
  • Converts the messages to CSV
  • Hits an HTTPS endpoint to initiate the ReportStream upload
  • Waits for a response from the endpoint
  • After a successful response, dequeues the original messages from the queue so we don't continue to process them (an actual queue pop)

If we get a failure response from ReportStream, we don't take the messages off the queue, and they stick around for further processing. (This can be dangerous, as it was when we received false 504 responses from the Akamai layer in front of ReportStream in January 2021, but it's better than the alternative of users entering tests into SimpleReport and the data never making it to public health departments!)
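
Putting those steps together, here is a hedged TypeScript sketch of the loop, using the real @azure/storage-queue API. One nuance: the description above calls the initial read a peek, but the JS SDK's peekMessages doesn't return the pop receipts needed for the later delete, so this sketch uses receiveMessages, which likewise leaves messages on the queue (hidden for a visibility timeout) while providing those receipts. The toCsv and postToReportStream helpers are the sketches from earlier sections; everything else here is illustrative, not the actual index.ts.

```typescript
import { QueueClient } from "@azure/storage-queue";

const DEQUEUE_BATCH_SIZE = 32; // also Azure's per-request retrieval maximum
const REPORT_STREAM_BATCH_MAXIMUM = 500;

// Sketches from earlier sections (assumed available here).
declare function toCsv(rows: Record<string, string>[]): string;
declare function postToReportStream(csv: string): Promise<Response>;

export async function uploadBatchedMessages(queueClient: QueueClient) {
  // Gather up to 500 messages, 32 at a time. The messages stay on the queue,
  // hidden for the visibility timeout, until we explicitly delete them.
  const messages: Array<{
    messageId: string;
    popReceipt: string;
    messageText: string;
  }> = [];
  while (messages.length < REPORT_STREAM_BATCH_MAXIMUM) {
    const batch = await queueClient.receiveMessages({
      numberOfMessages: DEQUEUE_BATCH_SIZE,
      visibilityTimeout: 600, // seconds to keep messages hidden while we upload
    });
    if (batch.receivedMessageItems.length === 0) break; // queue drained
    messages.push(...batch.receivedMessageItems);
  }
  if (messages.length === 0) return;

  // Convert the serialized TestEvent JSONs into one CSV payload and upload it.
  const csv = toCsv(messages.map((m) => JSON.parse(m.messageText)));
  const response = await postToReportStream(csv);

  if (!response.ok) {
    // Leave the messages alone: they become visible again after the
    // visibility timeout and will be retried on a later run.
    return;
  }

  // Success: actually pop each message so it isn't processed again.
  for (const m of messages) {
    await queueClient.deleteMessage(m.messageId, m.popReceipt);
  }
}
```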

Additional information

There are two "batch sizes" used in the function app. One is the DEQUEUE_BATCH_SIZE found in lib.ts, which is the number of messages we attempt to read off the queue and convert to CSV in one call (currently set to 32, which is also Azure's per-request maximum for reading from a storage queue). The second is the REPORT_STREAM_BATCH_MAXIMUM, which is the number of messages we attempt to send to ReportStream at once (currently set to 500).

We currently send to the ReportStream asynchronous processing endpoint, meaning we get a response quickly, but that response isn't a full picture of what happened to the data: it tells us whether ReportStream successfully received our messages, not whether any individual message later failed processing.

Debugging

For debugging information and Azure links, please see the wiki page Debugging the ReportStream Uploader.

CI/CD

The function app is updated as part of our normal CI/CD pipeline. The folder is compressed into a zip file and uploaded to Azure in the rs-batched-publisher-function-releases container, inside the simplereportprodapp storage account.

This does mean that uploads are occasionally interrupted by a new release. Usually, when this happens, the in-flight upload just stops and the messages are re-processed on the next run.