Try out the FHIR Pipelines Controller

The FHIR Pipelines Controller makes it easy to schedule and manage the transformation of data from a HAPI FHIR server to a collection of Apache Parquet files. It uses the FHIR Data Pipes JDBC pipeline to run either full or incremental transformations to a Parquet data warehouse.

The FHIR Pipelines Controller only works with HAPI FHIR servers using Postgres. You can see an example of configuring a HAPI FHIR server to use Postgres here.

This guide shows you how to set up the FHIR Pipelines Controller with a test HAPI FHIR server. It assumes you are using Linux, but it should work in other environments with minor adjustments.

Clone the fhir-data-pipes repository

Clone the fhir-data-pipes GitHub repository using your preferred method. Once it is cloned, open a terminal window and cd to the directory where you cloned it. Later terminal commands assume your working directory is the repository root.
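
For example, to clone over HTTPS and move into the repository root:

git clone https://github.com/google/fhir-data-pipes.git
cd fhir-data-pipes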

Set up the test server

The repository includes a Docker Compose configuration to bring up a HAPI FHIR server configured to use Postgres.

To set up the test server, follow these instructions. At step two, use the option for "HAPI source server with Postgres".
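
As a sketch, bringing up the test server usually comes down to a single Docker Compose command run from the repository root. The compose file path below is an assumption; check the docker/ directory of your checkout for the exact name:

# Assumed compose file path; confirm it in your checkout.
docker compose -f ./docker/hapi-compose.yml up --force-recreate -d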

Configure the FHIR Pipelines Controller

First, open pipelines/controller/config/application.yml in a text editor.

Change fhirServerUrl to be:

fhirServerUrl: "http://localhost:8091/fhir"

Read through the rest of the file to see the other settings; they can be left at their default values. Note the value of dwhRootPrefix, which determines where the Parquet files are written; you can adjust it if desired. Save and close the file.
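
For reference, a relative prefix like the following (the default value in your checkout may differ) writes each data warehouse snapshot under a dwh/ directory relative to where the controller runs:

dwhRootPrefix: "dwh/controller_DWH"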

Next, open pipelines/controller/config/hapi-postgres-config.json in a text editor.

Change databaseHostName to be:

"databaseHostName" : "localhost"

Save and close the file.

Run the FHIR Pipelines Controller

From the terminal, run:

cd pipelines/controller/
mvn spring-boot:run

Open a web browser and visit http://localhost:8080. You should see the FHIR Pipelines Control Panel.

Before automatic incremental runs can occur, you must manually trigger a full run. Under the Run Full Pipeline section, click on Run Full. Wait for the run to complete.

Explore the configuration settings

The Control Panel shows the options being used by the FHIR Pipelines Controller.

Main configuration parameters

This section corresponds to the settings in the application.yml file.

Batch pipeline non-default configurations

This section calls out the FHIR Data Pipes batch pipeline settings that differ from their default values. These are also mostly derived from application.yml. Use these settings if you want to run the batch pipeline manually, as in the sketch below.
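
As a rough sketch, a manual run invokes the batch pipeline jar with the parameters listed in this section. The jar name, main class, and flag values below are assumptions; copy the exact values shown in the Control Panel:

# Illustrative only; substitute the parameters from the Control Panel.
java -cp pipelines/batch/target/batch-bundled-*.jar \
  com.google.fhir.analytics.FhirEtl \
  --fhirServerUrl=http://localhost:8091/fhir \
  --outputParquetPath=/tmp/fhir-dwh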

Query the DWH

On your machine, look for the Parquet files created in the directory specified by dwhRootPrefix in the application.yml file. FHIR Data Pipes includes query libraries to help explore the data.
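
Each full run writes a new snapshot directory based on that prefix, typically with one subdirectory per resource type. For example, to confirm that Patient files were written (the paths below assume a prefix under a local dwh/ directory; adjust them to your dwhRootPrefix setting):

# Assumes dwhRootPrefix points under a local dwh/ directory; adjust as needed.
ls dwh/
ls dwh/*/Patient/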