BFD RIF Export - synthetichealth/synthea Wiki

The BFD RIF exporter produces files that conform to the following specifications:

Configuration

The exporter is configured via a set of properties as shown below with their default values:

  • exporter.bfd.bene_id_start = -1000000 defines the start value of BENE_ID. The first exported patient gets the specified value; subsequent ids decrease monotonically from it.
  • exporter.bfd.clm_id_start = -100000000 defines the start value of CLM_ID. The first exported claim gets the specified value; subsequent ids decrease monotonically from it.
  • exporter.bfd.clm_grp_id_start = -100000000 defines the start value of CLM_GRP_ID. The first exported group gets the specified value; subsequent ids decrease monotonically from it.
  • exporter.bfd.pde_id_start = -100000000 defines the start value of PDE_ID. The first exported PDE claim gets the specified value; subsequent ids decrease monotonically from it.
  • exporter.bfd.mbi_start = 1S00-E00-AA00 defines the start value of MBI_NUM. The first exported patient uses that value; subsequent ids increase monotonically from it.
  • exporter.bfd.hicn_start = T01000000A defines the start value of BENE_CRNT_HIC_NUM. The first exported record uses that value; subsequent ids increase monotonically from it.
  • exporter.bfd.partc_contract_start = Y0001 defines the start value of the Part C contract IDs used in PTC_CNTRCT_JAN_ID to PTC_CNTRCT_DEC_ID. The first contract uses that id; subsequent ids increase monotonically from it.
  • exporter.bfd.partc_contract_count = 10 defines the number of Part C contracts that Synthea will use in exports. Each year, each patient is randomly assigned to one of the contracts (or no contract).
  • exporter.bfd.partd_contract_start = Z0001 defines the start value of the Part D contract IDs used in PLAN_CNTRCT_REC_ID. The first contract uses that id; subsequent ids increase monotonically from it.
  • exporter.bfd.partd_contract_count = 10 defines the number of Part D contracts that Synthea will use in exports. Each year, each patient is randomly assigned to one of the contracts (or no contract).
  • exporter.bfd.plan_benefit_package_start = 800 defines the starting value of plan benefit package identifiers.
  • exporter.bfd.plan_benefit_package_count = 5 defines the number of plan benefit package identifiers. Each Part C and Part D plan shares the same set of plan benefit package identifiers.
  • exporter.bfd.clia_labs_start = 00A0000000 defines the start value of the CLIA lab numbers used to populate CARR_LINE_CLIA_LAB_NUM.
  • exporter.bfd.clia_labs_count = 10 defines the number of CLIA lab numbers that will be used.
  • exporter.bfd.cutoff_date = 20140529 defines the earliest date for any exported claims.
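
To make the counter semantics concrete, here is a minimal sketch of how a decrementing identifier sequence unfolds from its start value. It assumes a step of 1 between consecutive ids, which this page does not state explicitly, and id_sequence is a hypothetical helper written for illustration, not part of Synthea.

```shell
#!/bin/bash
# Hypothetical helper (not part of Synthea): print the first N ids of a
# decrementing sequence. Assumes a step of 1 between consecutive ids.
id_sequence() {
  local start=$1 n=$2 k
  for (( k = 0; k < n; k++ )); do
    echo $(( start - k ))
  done
}

# First three BENE_ID values for the default exporter.bfd.bene_id_start
id_sequence -1000000 3
```

Under that assumption, the first value printed is the configured start value and each later value is one lower than the previous one.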

At the end of a Synthea run, the exporter will create an end_state.properties file that captures the final value of any of the above configuration options that require a monotonically increasing or decreasing value per beneficiary or claim. The values in this file can be used (via the -c command line switch) to override the configured values to permit subsequent runs of Synthea to start where the prior run ended. An example file is shown below.

exporter.bfd.hicn_start=T01000020A
exporter.bfd.mbi_start=1S00E00AA20
exporter.bfd.clm_grp_id_start=-100003266
exporter.bfd.pde_id_start=-100000996
exporter.bfd.fi_doc_cntl_num_start=-100000575
exporter.bfd.bene_id_start=-1000020
exporter.bfd.carr_clm_cntl_num_start=-100001695
exporter.bfd.clm_id_start=-100002270
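
As an illustration of how this file could be inspected before a follow-up run, the sketch below reads one property from a key=value file and derives a rough count of beneficiaries exported so far. get_prop is a hypothetical helper, the sample file is written to a temporary location purely for the demonstration, and the calculation rests on three assumptions: the earlier run started from the default bene_id_start, ids decrease by 1 per beneficiary, and the stored value is the next unused id.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from a key=value properties file.
get_prop() {
  local key=$1 file=$2
  awk -F= -v k="$key" '$1 == k { print $2 }' "$file"
}

# Demo input: a temporary copy of two lines from the example file above.
props_file="$(mktemp)"
cat > "$props_file" <<'EOF'
exporter.bfd.bene_id_start=-1000020
exporter.bfd.clm_id_start=-100002270
EOF

default_start=-1000000   # default exporter.bfd.bene_id_start
next_start="$(get_prop exporter.bfd.bene_id_start "$props_file")"

# Assumes the run began at the default and ids decrease by 1 per beneficiary.
echo "Beneficiaries exported so far: $(( default_start - next_start ))"
rm -f "$props_file"
```

The exact-match comparison in awk avoids treating the dots in the property name as pattern wildcards.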

Random and Fixed Values

Synthea does not model values for all the RIF file fields. In these cases, each field is assigned a fixed value, or a value randomly taken from a set of allowed values. These values are configured using the bfd_field_values.tsv tab-separated file. Each cell within this file specifies the allowed values for a particular field (row) for a particular file (column): where a value can be one from a set of allowed values, this is shown as a comma-separated list; where the field is always empty, this is shown as [Blank].
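
The cell semantics described above can be sketched as follows. This is only an illustration of the three cases (empty, fixed, one of a set), not Synthea's own implementation; pick_value is a hypothetical helper, the sample cell contents are placeholders rather than values taken from bfd_field_values.tsv, and the sketch assumes list items are separated by a bare comma with no surrounding spaces.

```shell
#!/bin/bash
# Hypothetical helper: choose a value for a field from one cell's contents.
#   "[Blank]"  -> empty string (field is always empty)
#   "A,B,C"    -> one of A, B or C, chosen at random
#   "X"        -> X (fixed value)
pick_value() {
  local cell=$1
  if [[ "$cell" == "[Blank]" ]]; then
    echo ""
    return
  fi
  local -a options
  IFS=',' read -r -a options <<< "$cell"
  echo "${options[RANDOM % ${#options[@]}]}"
}

# Placeholder cells, for demonstration only
pick_value "[Blank]"
pick_value "X"
pick_value "1,2,3"
```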

Generating a National Set of Records

The following shell script generates records for a set of beneficiaries across all 50 states and Washington, DC. The desired total population size is supplied as a command line argument. The number of beneficiaries generated for each location is proportional to that location's share of the population (based on census data).

#!/bin/bash

if [[ $# -eq 0 ]]; then
  echo "Usage: $0 size"
  echo "where 'size' is an integer specifying the target population size"
  exit 1
fi

# Weights are based on 2019 census data:
#
# https://data.census.gov/cedsci/table?q=Total%20Population&g=0400000US01,02,04,05,06,08,09,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,44,45,46,47,48,49,50,51,53,54,55,56&tid=ACSDP1Y2019.DP05&hidePreview=true&moe=false
#
# Each value represents the number of state residents aged 62 or more divided by the
# total number of USA state residents aged 62 or more expressed as a percentage.
#
states=( ); weights=( )
states+=( "Alabama" ); weights+=( "1.578" )
states+=( "Alaska" ); weights+=( "0.178" )
states+=( "Arizona" ); weights+=( "2.357" )
states+=( "Arkansas" ); weights+=( "0.958" )
# states+=( "California" ); weights+=( "10.801" ) # California is handled separately at the end and is used to absorb any rounding errors
states+=( "Colorado" ); weights+=( "1.586" )
states+=( "Connecticut" ); weights+=( "1.170" )
states+=( "Delaware" ); weights+=( "0.351" )
states+=( "District of Columbia" ); weights+=( "0.161" )
states+=( "Florida" ); weights+=( "8.044" )
states+=( "Georgia" ); weights+=( "2.836" )
states+=( "Hawaii" ); weights+=( "0.492" )
states+=( "Idaho" ); weights+=( "0.536" )
states+=( "Illinois" ); weights+=( "3.796" )
states+=( "Indiana" ); weights+=( "2.016" )
states+=( "Iowa" ); weights+=( "1.016" )
states+=( "Kansas" ); weights+=( "0.891" )
states+=( "Kentucky" ); weights+=( "1.401" )
states+=( "Louisiana" ); weights+=( "1.399" )
states+=( "Maine" ); weights+=( "0.530" )
states+=( "Maryland" ); weights+=( "1.801" )
states+=( "Massachusetts" ); weights+=( "2.179" )
states+=( "Michigan" ); weights+=( "3.288" )
states+=( "Minnesota" ); weights+=( "1.712" )
states+=( "Mississippi" ); weights+=( "0.905" )
states+=( "Missouri" ); weights+=( "1.963" )
states+=( "Montana" ); weights+=( "0.382" )
states+=( "Nebraska" ); weights+=( "0.580" )
states+=( "Nevada" ); weights+=( "0.916" )
states+=( "New Hampshire" ); weights+=( "0.472" )
states+=( "New Jersey" ); weights+=( "2.753" )
states+=( "New Mexico" ); weights+=( "0.698" )
states+=( "New York" ); weights+=( "6.092" )
states+=( "North Carolina" ); weights+=( "3.210" )
states+=( "North Dakota" ); weights+=( "0.220" )
states+=( "Ohio" ); weights+=( "3.804" )
states+=( "Oklahoma" ); weights+=( "1.175" )
states+=( "Oregon" ); weights+=( "1.406" )
states+=( "Pennsylvania" ); weights+=( "4.413" )
states+=( "Rhode Island" ); weights+=( "0.351" )
states+=( "South Carolina" ); weights+=( "1.713" )
states+=( "South Dakota" ); weights+=( "0.285" )
states+=( "Tennessee" ); weights+=( "2.098" )
states+=( "Texas" ); weights+=( "7.031" )
states+=( "Utah" ); weights+=( "0.686" )
states+=( "Vermont" ); weights+=( "0.234" )
states+=( "Virginia" ); weights+=( "2.523" )
states+=( "Washington" ); weights+=( "2.247" )
states+=( "West Virginia" ); weights+=( "0.679" )
states+=( "Wisconsin" ); weights+=( "1.903" )
states+=( "Wyoming" ); weights+=( "0.185" )

END_STATE_PROPS_FILE="./output/bfd/end_state.properties"

total_generated=0
for i in "${!states[@]}"
do 
  state=${states[$i]}
  weight=${weights[$i]}
  count=`echo "${1}*${weight}/100" | bc`
  total_generated=`echo "${total_generated}+${count}" | bc`
  
  if [[ $count -eq "0" ]]
  then
    echo "Skipping generating ${state}, requested patients is ${count} "
    continue
  fi

  if [[ -f "${END_STATE_PROPS_FILE}" ]]
  then
    load_props="-c ${END_STATE_PROPS_FILE}"
  else
    load_props=
  fi

  echo "Generating ${count} patients for ${state}"
  ./run_synthea -s ${i} -cs ${i} -r 20211020 ${load_props} -p ${count} --exporter.fhir.export=false --exporter.fhir.transaction_bundle=false --exporter.hospital.fhir.export=false --exporter.practitioner.fhir.export=false --exporter.bfd.export=true --exporter.years_of_history=10 --generate.only_alive_patients=true -a 70-80 "${state}"
done

# Generate remaining requested population for California to handle any rounding errors
if [[ -f "${END_STATE_PROPS_FILE}" ]]
then
  load_props="-c ${END_STATE_PROPS_FILE}"
else
  load_props=
fi

remaining=`echo "${1}-${total_generated}" | bc`
echo "Generating ${remaining} patients for California"
total_generated=`echo "${total_generated}+${remaining}" | bc`
./run_synthea -s 51 -cs 51 -r 20211020 ${load_props} -p ${remaining} --exporter.fhir.export=false --exporter.fhir.transaction_bundle=false --exporter.hospital.fhir.export=false --exporter.practitioner.fhir.export=false --exporter.bfd.export=true --exporter.years_of_history=10 --generate.only_alive_patients=true -a 70-80 California
echo "Finished generating ${total_generated} of ${1} requested patients"
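
The per-state allocation in the script above can be checked in isolation. The sketch below reproduces the truncating calculation size * weight / 100 for three of the listed weights, using bash integer arithmetic in place of bc so that it runs without extra tools. state_count is a hypothetical helper, and it assumes every weight is written with exactly three decimal places, as in the list above.

```shell
#!/bin/bash
# Hypothetical helper: truncated share of SIZE for a weight given as a
# percentage with exactly three decimal places (e.g. "8.044").
# Uses integer arithmetic in place of bc; 10# forces base 10 so that
# values such as "0.178" (-> 0178) are not read as octal.
state_count() {
  local size=$1 weight=$2
  echo $(( size * 10#${weight/./} / 100000 ))
}

size=10000
allocated=0
for weight in 8.044 7.031 0.178; do   # Florida, Texas, Alaska
  count=$(state_count "$size" "$weight")
  allocated=$(( allocated + count ))
  echo "weight ${weight}% of ${size} -> ${count}"
done
echo "allocated so far: ${allocated}"
```

Because each count is truncated, the per-state counts add up to slightly less than the requested size, which is why the script assigns the remainder to California.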