Home - google/fhir-data-pipes GitHub Wiki

FHIR Data Pipes provides a series of pipelines to transform data from a FHIR server to either Apache Parquet files for analysis or another FHIR store for data integration. It also provides some minimal support to integrate other tools for querying Parquet files.

Good places to start are:

Read the quick start guide below.
Learn about the pipelines.
Try the pipelines using local test servers.
Use the FHIR Pipelines Controller to manage and schedule data exports from a HAPI FHIR server to Parquet files. The controller also uses the incremental-update pipelines to update the data warehouse periodically.

See the right navigation bar for all the documentation.

Quick start guide

FHIR Analytics is made up of two core components:

(i) Data Pipelines (fhir-data-pipes) and
(ii) Query Libraries for generating views (FHIR Views)

FHIR Data Pipes: ETL pipelines that transform data from a FHIR source (either a FHIR store or a FHIR transformer/facade) into the SQL-on-FHIR schema, which can then be loaded into a SQL data warehouse (DWH). The current implementation uses Parquet for distributed storage.

To evaluate the pipelines quickly, use the single machine config. It is the easiest setup to try and can be used for deployments too.
To see how you can add a dashboard using Apache Superset, check out this tutorial.

It should take about 45-60 minutes to get this set up and running with the provided sample code. Please let us know if you run into any issues.

We also have query libraries, i.e., FHIR Views, that make it easier to write the otherwise complex SQL-on-FHIR queries using Python and FHIRPath expressions. They are used to create views that further simplify the SQL you need to write to query the DWH.

You can learn more on the FHIR Views page on the OHS DevSite, which includes links to example Jupyter notebooks for using FHIR Views.
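To make the idea of a view concrete, here is a minimal, standard-library-only sketch of flattening nested FHIR resources into flat rows. It does not use the FHIR Views library or its API: the sample patients, the `first_at` and `flatten` helpers, and the `PATIENT_VIEW` mapping are all hypothetical names invented for illustration, and the dotted paths only loosely imitate FHIRPath.

```python
# Hand-written Patient resources standing in for records stored in a
# nested, FHIR-shaped schema. This is invented sample data.
PATIENTS = [
    {
        "resourceType": "Patient",
        "id": "p1",
        "birthDate": "1980-04-12",
        "name": [{"family": "Smith", "given": ["Ann", "B."]}],
        "address": [{"city": "Nairobi"}],
    },
    {
        "resourceType": "Patient",
        "id": "p2",
        "birthDate": "2015-09-30",
        "name": [{"family": "Okoro", "given": ["Chidi"]}],
    },
]


def first_at(resource, path):
    """Return the first value found along a dotted path, or None.

    Lists are handled by taking their first element, which loosely mimics
    a FHIRPath expression such as `name.family.first()`. This is a toy
    helper (an assumption for illustration), not a FHIRPath implementation.
    """
    current = resource
    for key in path.split("."):
        if isinstance(current, list):
            current = current[0] if current else None
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    if isinstance(current, list):
        current = current[0] if current else None
    return current


def flatten(resources, columns):
    """Build one flat row per resource from a {column_name: path} mapping."""
    return [
        {column: first_at(resource, path) for column, path in columns.items()}
        for resource in resources
    ]


# Hypothetical view definition: output column name -> path into the resource.
PATIENT_VIEW = {
    "id": "id",
    "birth_date": "birthDate",
    "family_name": "name.family",
    "first_given_name": "name.given",
    "city": "address.city",
}

rows = flatten(PATIENTS, PATIENT_VIEW)
for row in rows:
    print(row)
```

Each resulting row has only scalar columns, so downstream queries can use plain filters and joins instead of unnesting repeated fields. Missing elements, such as the second patient's address, come out as `None`.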

Why anchor on a common schema, i.e., SQL-on-FHIR:

By converging on a common schema (i.e., SQL-on-FHIR), we can unlock capabilities including:

FHIR Views, both on-premises via Spark and in the cloud via BigQuery
FHIR-dbt-analytics to define and run metrics

The FHIR-dbt-analytics project, by our sister team at Google, contains a suite of dbt macros for working with FHIR data in the SQL-on-FHIR schema, as well as a sample set of data quality metrics that can be visualized through a dashboard that uses materialized views. The current demo is only for BigQuery, and we have a prototype available for Spark and Superset.

This approach could be used for common, shareable program indicators, and we believe there is a community opportunity here.

To learn more or to provide any feedback, please get in touch via hello-ohs[AT]google.com