Patient - OHDSI/ETL--PulmonaryHypertensionRegistries GitHub Wiki

The CDM is a patient-centric model, so the person table is essential. Here are some aspects worth paying attention to.

In a harmonization project, the person source value must be unique across all aggregated datasets. If the source datasets are in SDTM, use the unique subject identifier (USUBJID) rather than the subject identifier (SUBJID), because SUBJID is only guaranteed to be unique within a single study. If the source data are in a different format, the idea behind USUBJID is easy to adapt: use a combination of a study identifier and the patient identifier that is unique within that study as the person source value.
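As an illustration only, the combination of study identifier and within-study patient identifier could be built as in the sketch below. The function name `make_person_source_value`, the `-` separator, and the sample study and subject identifiers are assumptions made for this example; they are not part of the registry ETL itself.

```python
def make_person_source_value(study_id: str, subject_id: str, sep: str = "-") -> str:
    """Build a USUBJID-like key from a study id and a within-study subject id.

    Hypothetical helper: the separator and the trimming rules are assumptions.
    """
    study = study_id.strip()
    subject = subject_id.strip()
    if not study or not subject:
        raise ValueError("both study_id and subject_id must be non-empty")
    return f"{study}{sep}{subject}"


# Two studies that reuse the same subject id still yield distinct keys.
sample_records = [("STUDY01", "001"), ("STUDY02", "001")]  # made-up identifiers
person_source_values = [make_person_source_value(s, p) for s, p in sample_records]

# Sanity check that the keys are unique across the aggregated datasets.
assert len(set(person_source_values)) == len(person_source_values)
```

Running a uniqueness check like the last line over the full aggregated dataset, before loading the person table, is a cheap way to catch identifier collisions early.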

Among mandatory person attributes, the year of birth is a top priority. However, it is not always available in the source data. See the Getting Full Dates chapter to find out how to calculate the year of birth.

For race and ethnicity, we recommend replacing original values that cannot be mapped with generic ones. Because the concept ID for such values is 0, this lets you run analyses on the source values instead. For example, one dataset may record race as 'Mixed' or 'Mixed race', while another uses '2 or more races'. This race category is non-standard, so it cannot be populated in race_concept_id. In this case, it makes sense to replace all of the original variants with a single value, say 'MIXED', and put it into race_source_value. You can then select all mixed-race patients from all studies without writing complex regular expressions. 'OTHER' and 'UNSPECIFIED' categories for race and ethnicity can be created in the same way if they add value for analysis.
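A minimal sketch of this substitution is shown below, assuming a simple lookup table. The three source variants and the 'MIXED' target come from the example above; the names `RACE_SOURCE_SUBSTITUTIONS` and `harmonize_race_source_value`, and the choice to match case-insensitively, are assumptions for illustration.

```python
from typing import Optional

# Hypothetical lookup: lower-cased source variants -> generic race_source_value.
RACE_SOURCE_SUBSTITUTIONS = {
    "mixed": "MIXED",
    "mixed race": "MIXED",
    "2 or more races": "MIXED",
}


def harmonize_race_source_value(raw: Optional[str]) -> Optional[str]:
    """Return the generic value for known un-mappable variants.

    Values that are not in the lookup are returned unchanged, so this step
    does not alter values that are handled by the regular concept mapping.
    """
    if raw is None:
        return None
    # Collapse repeated whitespace and ignore case before the lookup.
    key = " ".join(raw.split()).lower()
    return RACE_SOURCE_SUBSTITUTIONS.get(key, raw)


# The three variants collapse to a single value that is easy to filter on.
variants = ["Mixed", "Mixed race", "2 or more races"]
harmonized = [harmonize_race_source_value(v) for v in variants]
```

Entries for 'OTHER' and 'UNSPECIFIED' could be added to the same lookup if those categories are needed.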