Getting Full Dates - OHDSI/ETL--PulmonaryHypertensionRegistries GitHub Wiki

The CDM requires full dates for events. Although this is not stated explicitly, the requirement follows from the date fields being mandatory and of date type. This differs from other data models, such as SDTM, where incomplete and partial dates are allowed. For the conversion process, the full date requirement means that incomplete and partial dates must be handled in the ETL: records with such dates can be either dropped or imputed.

Needless to say, this step can strongly affect scientific results. Dropping records is tempting because it is easy, but it can lead to substantial data loss, which is not acceptable for the small datasets common among PH registries. Imputation is less straightforward. It may also mislead researchers by giving them a false sense of data completeness, since the teams that perform the ETL often do not analyze the data for scientific purposes. Nevertheless, we believe date imputation is worth the effort: it preserves valuable information that would otherwise be lost, and if it is well documented, it will not mislead researchers.

General approach for date imputation

As a general principle, all original dates should be retained in the CDM data, regardless of their level of completeness. This makes it possible to trace back to the source dates when resolving data issues, and it helps when a researcher has a different view on how dates should be imputed.

Secondly, follow the date imputation instructions for the particular PH registry whenever they are available. They can be found in documents such as statistical analysis plans (SAPs) and study protocols.

Thirdly, be consistent. If you aggregate data from multiple sources, apply the same method and principles when imputing the same types of dates. For example, if you impute partial dates where only the year is available with June 30, apply this approach across the entire database (e.g., don't jump between June 30, January 1, December 31, etc.). Exceptions might be made only when important reference dates are available. Any imputation method should be well grounded. When working with multiple sources, also keep in mind that they often cannot all be trusted equally, and take this into account when imputing dates.
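A minimal Python sketch of a single, consistently applied rule is shown below. It assumes partial dates arrive as ISO 8601 strings (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`), which may not match your source. The June 30 fill for year-only dates follows the example above; the 15th-of-month fill, the `ImputedDate` type, and the function name are hypothetical choices made for illustration only. The original string is kept next to the imputed value so that it can be traced back.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed rules, defined once and applied to every source:
YEAR_ONLY_FILL = (6, 30)   # year only -> June 30 (as in the example above)
MONTH_ONLY_FILL_DAY = 15   # year and month -> 15th (hypothetical choice)

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD.
_PARTIAL_ISO = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


@dataclass(frozen=True)
class ImputedDate:
    source_value: str      # original string, retained for traceability
    value: Optional[date]  # full date, or None if no rule applies
    imputed: bool          # True if any component was filled in


def impute_partial_date(raw: str) -> ImputedDate:
    """Turn a partial ISO 8601 date string into a full date using fixed rules."""
    match = _PARTIAL_ISO.fullmatch(raw.strip())
    if match is None:
        return ImputedDate(raw, None, False)
    year, month, day = match.groups()
    try:
        if month is None:
            full = date(int(year), *YEAR_ONLY_FILL)
        elif day is None:
            full = date(int(year), int(month), MONTH_ONLY_FILL_DAY)
        else:
            full = date(int(year), int(month), int(day))
    except ValueError:
        # Components parsed but do not form a valid calendar date.
        return ImputedDate(raw, None, False)
    return ImputedDate(raw, full, month is None or day is None)
```

Keeping the fill values in one place makes it harder for different sources to drift toward different rules, and the `imputed` flag lets researchers identify and, if needed, re-impute affected records.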

Fourthly, there might be dependencies between dates that should be considered, so the order of imputation matters. For example, if a procedure date is imputed from a visit date, the visit date should be imputed first.
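The ordering issue can be illustrated with a small Python sketch. The record layout, the `scheduled_date` fallback for visits, and the function names are all hypothetical; the only point is that the visit step runs before the procedure step that depends on it.

```python
from datetime import date
from typing import Dict, List


def impute_visit_dates(visits: Dict[str, dict]) -> None:
    """Step 1: fill missing visit dates (here from an assumed scheduled date)."""
    for visit in visits.values():
        if visit["visit_date"] is None:
            visit["visit_date"] = visit["scheduled_date"]


def impute_procedure_dates(procedures: List[dict], visits: Dict[str, dict]) -> None:
    """Step 2: fill missing procedure dates from the date of their visit."""
    for proc in procedures:
        if proc["procedure_date"] is None:
            proc["procedure_date"] = visits[proc["visit_id"]]["visit_date"]


# Hypothetical records: None marks a date missing in the source.
visits = {"V1": {"visit_date": None, "scheduled_date": date(2016, 3, 1)}}
procedures = [{"visit_id": "V1", "procedure_date": None}]

impute_visit_dates(visits)                  # must run first
impute_procedure_dates(procedures, visits)  # depends on completed visit dates
```

If the two calls were swapped, the procedure would inherit the still-missing visit date and remain without a date.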

Fifthly, the KISS principle applies well to imputation rules. Keeping them reasonably simple avoids unfounded assumptions and thus prevents misinterpretation of the data.

Lastly, all source documentation and tables should be carefully inspected to determine when any given event happened. For example, a lot of patient information is captured at the baseline visit. There might also be relative time points, e.g., something that happened since the last visit. Moreover, each patient has several reference dates, such as the date of birth, the date when informed consent was signed, the baseline visit, the date of first PH diagnosis, the date when the first dose of a study drug was administered, the study termination date, and the date of death. These dates should also serve as a basis for date imputation.

Year of birth

Among the mandatory person attributes, the year of birth is a top priority. However, the source data sometimes does not give the patient's year of birth explicitly and provides age instead. In such cases, the year of birth should be calculated from the age and a reference date. This reference date can be the date when informed consent was signed, when the first dose of a study drug was administered, when the baseline visit occurred, etc. If you are dealing with SDTM, you can use the explicit subject reference start date (RFSTDTC) if it is present in the Demographics domain.
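As a rough Python sketch of this calculation, assuming age is recorded in completed years at the reference date (the function name is hypothetical):

```python
from datetime import date


def year_of_birth_from_age(age_years: int, reference_date: date) -> int:
    """Estimate the year of birth from age in completed years at a reference date.

    This is an approximation: if the birthday had not yet occurred in the
    reference year, the true year of birth is one year earlier.
    """
    return reference_date.year - age_years


# Example with a reference date taken from an ISO 8601 string such as RFSTDTC;
# only the date part (first 10 characters) is used.
rfstdtc = "2015-03-12T10:30"
reference = date.fromisoformat(rfstdtc[:10])
estimated_year = year_of_birth_from_age(54, reference)
```

Because the result can be off by one year, it is worth documenting which reference date was used and applying the same one for every patient in the registry.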

Examples

Several examples of the imputation rules we have used in practice can be found in the main repository.