The IDSR and vaccine effectiveness research - jaygee-on-github/The-APHRC-LSHTM-MUBAS-IDSR-Project GitHub Wiki

Introduction and statement of the problem

In vaccine effectiveness research, we might compare the incidence of a disease across three groups: the fully vaccinated, the partially vaccinated and the unvaccinated.

However, the IDSR Immediate Case-based Reporting Form used in the African Region does not capture the type of vaccine a person received.

This matters for COVID-19 because, in principle, classifying a vaccinated COVID-19 IDSR case as fully or partially vaccinated requires knowing the vaccine type as well as the number of doses received and the date of the last vaccination. For example, one dose of a single-dose vaccine completes the primary series, whereas one dose of a two-dose vaccine does not.

Limitations

Given this limitation, we considered two strategies.

In the first strategy, we simply assume that a person is fully vaccinated if they have received two or more doses, even though some vaccines, such as the Janssen COVID-19 Vaccine (Johnson & Johnson), are "one-and-done".
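To make the difference between the two rules concrete, here is a minimal Python sketch that classifies a person under the first strategy and under a type-aware rule. The vaccine names and the doses-per-primary-series table are illustrative assumptions, and the sketch ignores the date of the last dose, which a real rule would also check.

```python
# Hypothetical number of doses needed to complete a primary series, by vaccine type.
DOSES_FOR_PRIMARY_SERIES = {
    "Janssen": 1,            # single dose ("one-and-done")
    "AstraZeneca": 2,
    "Pfizer-BioNTech": 2,
    "Moderna": 2,
}

def status_strategy_one(doses: int) -> str:
    """First strategy: ignore vaccine type; two or more doses means fully vaccinated."""
    if doses == 0:
        return "unvaccinated"
    return "fully vaccinated" if doses >= 2 else "partially vaccinated"

def status_with_type(doses: int, vaccine_type: str) -> str:
    """Type-aware rule: compare the dose count with the doses the vaccine type requires."""
    if doses == 0:
        return "unvaccinated"
    required = DOSES_FOR_PRIMARY_SERIES[vaccine_type]
    return "fully vaccinated" if doses >= required else "partially vaccinated"

# One Janssen dose: the first strategy misclassifies this person.
print(status_strategy_one(1))          # partially vaccinated
print(status_with_type(1, "Janssen"))  # fully vaccinated
```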

In the second strategy, we augment the IDSR with country-specific information provided by COVAX.

From the WHO Africa COVID-19 dashboard

Recall that COVAX helped procure and distribute COVID-19 vaccine doses across the African Region. The dashboard covers both COVAX and non-COVAX procurements.

In a proof of concept, INSPIRE adopted the second strategy: we probabilistically assigned a vaccine type to each PERSON who received one or more vaccine doses. Note that in some countries, such as Malawi, the WHO CRF was amended to include the type of vaccine. This led us to expect that some countries would need the COVAX data during analysis and others would not.
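A minimal sketch of the probabilistic assignment, assuming each country's dose shares by vaccine type have already been extracted from the dashboard. The country name `CountryA`, the shares, and the functions `assign_vaccine_type` and `assign_all` are hypothetical. A vaccine type recorded on the CRF (as on Malawi's amended form) is kept as-is, and only missing types are drawn.

```python
import random

# Hypothetical country-level shares of doses by vaccine type, standing in for
# figures read off the WHO Africa COVID-19 dashboard. Illustrative numbers only.
DOSE_SHARES = {
    "CountryA": {"AstraZeneca": 0.5, "Janssen": 0.3, "Pfizer-BioNTech": 0.2},
}

def assign_vaccine_type(country: str, rng: random.Random) -> str:
    """Draw one vaccine type, weighted by the country's dose shares."""
    shares = DOSE_SHARES[country]
    types = list(shares)
    return rng.choices(types, weights=[shares[v] for v in types], k=1)[0]

def assign_all(recorded_types: dict, country: str, seed: int = 42) -> dict:
    """Assign one vaccine type per PERSON.

    recorded_types maps person_id -> vaccine type recorded on the CRF, or None when
    the CRF does not capture it. Recorded types are kept; missing ones are drawn.
    """
    rng = random.Random(seed)  # seeded so the synthetic assignment is reproducible
    return {
        pid: vtype if vtype is not None else assign_vaccine_type(country, rng)
        for pid, vtype in recorded_types.items()
    }

assignments = assign_all({pid: None for pid in range(10_000)}, "CountryA")
janssen_share = sum(v == "Janssen" for v in assignments.values()) / len(assignments)
print(round(janssen_share, 2))  # should be close to 0.3
```

Drawing once per person, rather than once per dose, keeps all of a person's doses consistent with a single vaccine type.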

Methodology

Introduction

INSPIRE is conducting a network study on COVID-19 vaccine effectiveness across several countries using OHDSI's OMOP CDM and ATLAS data analysis workbench. An OMOP ETL wrangles source data into a standard format, and the ATLAS data analysis workbench runs on top of the resulting OMOP CDM. See this Wiki for details on how certain IDSR CRFs were wrangled into instances of the OMOP CDM.

The PROVE analysis plan

The PROVE statistical analysis plan covers three objectives, pursued in African Union member states: assessing COVID-19 vaccine effectiveness, identifying the determinants of vaccine uptake, and assessing the impact of the pandemic on health programs in Africa. This statistical analysis plan is written in support of, and is entirely consistent with, the full trial protocol.

The study design in ATLAS

To conduct a vaccine effectiveness study using local IDSR datasets wrangled into several OMOP CDMs, INSPIRE uses a specific study design from the ATLAS data analysis workbench: the cohort method design, one of the population-level effect estimation designs that OHDSI and ATLAS provide. It is also referred to as an “emulated clinical trial”.

Cohorts

In the cohort method design there is a target cohort, one or more comparator cohorts and one or more outcome cohorts.

Here the target cohort is fully vaccinated persons, and there are two “comparator cohorts”: partially vaccinated persons and unvaccinated persons.

Apart from vaccination status, the target and comparator cohorts share one defining characteristic: people entered the study and were "visited" not because of any observed signs or symptoms, but because they were at risk, for example because they lived or worked in institutions such as jails, schools and hospitals. Indeed, during the pandemic the IDSR was used with two populations: people suspected of having COVID-19 because of signs and symptoms, and people who were not suspected but were at risk because of their situation. In this study, we look at the outcome(s) of at-risk people who were fully vaccinated, partially vaccinated or unvaccinated.
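In ATLAS these cohorts are built with the cohort definition editor, not in code. Purely to illustrate the entry logic described above, here is a Python sketch; the `IdsrCase` record and its `reason_for_visit` field are hypothetical stand-ins for whatever the CRF actually records.

```python
from dataclasses import dataclass

@dataclass
class IdsrCase:
    person_id: int
    reason_for_visit: str     # hypothetical field: "suspected" or "at_risk"
    vaccination_status: str   # "fully vaccinated", "partially vaccinated" or "unvaccinated"

# Which cohort each vaccination status feeds.
COHORT_FOR_STATUS = {
    "fully vaccinated": "target",
    "partially vaccinated": "comparator_partial",
    "unvaccinated": "comparator_unvaccinated",
}

def build_cohorts(cases: list) -> dict:
    """Keep only people visited because they were at risk, then split them by vaccination status."""
    cohorts = {name: [] for name in COHORT_FOR_STATUS.values()}
    for case in cases:
        if case.reason_for_visit != "at_risk":
            continue  # suspected (symptomatic) cases do not enter these cohorts
        cohorts[COHORT_FOR_STATUS[case.vaccination_status]].append(case.person_id)
    return cohorts

cases = [
    IdsrCase(1, "at_risk", "fully vaccinated"),
    IdsrCase(2, "at_risk", "unvaccinated"),
    IdsrCase(3, "suspected", "fully vaccinated"),
    IdsrCase(4, "at_risk", "partially vaccinated"),
]
print(build_cohorts(cases))
```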

The outcome cohort consists of people who test positive for COVID-19 in either saliva or blood after the “treatment”. Alternatively, an outcome cohort might consist of persons who were found dead.
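The two alternative outcome definitions can be sketched the same way. This assumes each person has an index ("treatment") date and that only events strictly after it count; the function names and the strict inequality are our assumptions, not ATLAS settings.

```python
from datetime import date
from typing import Iterable, Optional

def has_positive_test_outcome(index_date: date, positive_test_dates: Iterable[date]) -> bool:
    """Outcome 1: a positive COVID-19 test (saliva or blood) strictly after the index date."""
    return any(d > index_date for d in positive_test_dates)

def has_death_outcome(index_date: date, death_date: Optional[date]) -> bool:
    """Outcome 2 (alternative): found dead strictly after the index date."""
    return death_date is not None and death_date > index_date

index = date(2021, 6, 1)  # hypothetical index ("treatment") date
print(has_positive_test_outcome(index, [date(2021, 5, 20)]))  # False: positive before index
print(has_positive_test_outcome(index, [date(2021, 7, 3)]))   # True
print(has_death_outcome(index, None))                          # False
```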

The model

The choice of model specifies, among other things, the covariate settings, the propensity score adjustment and the outcome model (logistic regression, Cox regression, etc.). For example, we could use a logistic regression, which models whether or not the outcome occurred and produces an odds ratio. See the cohort method design.
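As a worked example of the odds ratio: with a single binary treatment indicator and no covariates, the exponentiated coefficient of a logistic regression equals the cross-product odds ratio of the 2x2 table. The sketch below computes that ratio with a Wald (Woolf) 95% confidence interval. The counts are hypothetical, and a real cohort method analysis would also adjust for the propensity score.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald (Woolf) 95% CI from a 2x2 table. Assumes all cells > 0.

    a: treated with outcome,    b: treated without outcome
    c: comparator with outcome, d: comparator without outcome
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts: 20 of 1,000 fully vaccinated vs 50 of 1,000 unvaccinated test positive.
print(odds_ratio(20, 980, 50, 950))
```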

Propensity scoring

Propensity scoring controls for measured confounders. Using a model, we estimate the probability that a person will receive a certain treatment (vaccination status) based on a set of covariates. These covariates are potential confounders: because treatment was not randomly assigned, they may influence both which treatment a person receives and the outcome. See the cohort method design.
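A toy Python sketch of a propensity model: unregularized logistic regression fitted by batch gradient descent on a single hypothetical covariate. OHDSI's CohortMethod instead fits a large-scale regularized regression over many covariates, so treat this only as an illustration of what a propensity score is.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(w: list, x: list) -> float:
    """Intercept w[0] plus the dot product of the remaining weights with covariates x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_propensity_model(X: list, t: list, lr: float = 0.1, epochs: int = 2000) -> list:
    """Estimate P(treatment = 1 | covariates) by logistic regression, fitted with
    batch gradient descent on the mean negative log-likelihood.

    X: list of covariate vectors; t: list of 0/1 treatment indicators
    (here 1 = fully vaccinated, 0 = comparator). Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, ti in zip(X, t):
            err = sigmoid(linear_predictor(w, x)) - ti
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X: list, w: list) -> list:
    return [sigmoid(linear_predictor(w, x)) for x in X]

# Hypothetical covariate: 1 if the person works in a hospital, else 0.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
t = [0, 0, 0, 1, 0, 1, 1, 1]
w = fit_propensity_model(X, t)
print([round(p, 2) for p in propensity_scores(X, w)])  # approximately 0.25 for x=0, 0.75 for x=1
```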

To adjust for differences in propensity between the treatment groups, several strategies can be used: stratification, matching or weighting by the propensity score, or adding baseline characteristics such as demographics to the outcome model. See the cohort method design.
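Given propensity scores, two of the adjustment strategies named above can be sketched in a few lines: inverse probability of treatment weighting (in its average-treatment-effect form) and stratification into propensity score quintiles. The function names and scores are hypothetical; matching is omitted for brevity.

```python
def iptw_weights(ps: list, treated: list) -> list:
    """Inverse probability of treatment weights for the average treatment effect:
    1/ps for treated persons, 1/(1 - ps) for comparators. Assumes 0 < ps < 1."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for p, t in zip(ps, treated)]

def ps_strata(ps: list, n_strata: int = 5) -> list:
    """Assign each person to one of n_strata equal-sized strata by propensity score rank."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(ps)
    return strata

ps = [0.25, 0.75, 0.5, 0.1, 0.9, 0.3, 0.6, 0.2, 0.8, 0.4]
treated = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(iptw_weights(ps[:2], treated[:2]))  # [4.0, 4.0]
print(ps_strata(ps))
```

The outcome model would then be fitted with these weights, or within each stratum, rather than on the raw cohorts.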

Method validation (New)

We note that the PROVE analysis plan does not include a section on method validation. A method validation plan tests whether an analysis plan, such as the one PROVE uses to mitigate confounding, is working. A best practice, and the one ATLAS has adopted, is to specify falsification endpoints. A falsification endpoint is an implausible outcome for a given intervention, that is, an outcome the intervention is not believed to cause. Suppose we take the same approach to mitigating confounding in a new analysis that uses the falsification endpoint as the outcome. If the intervention we are studying, for example vaccination, still appears to have an effect on the falsification endpoint, we can conclude that there are causes at play alongside the intervention that our design has not accounted for. A best practice is to run our strategy for mitigating confounding against many falsification endpoints. A data analysis workbench like ATLAS provides decision support for choosing these so-called "negative controls" in bulk.
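One simple diagnostic is sketched below. It assumes an effect estimate (log relative risk and standard error) is already available for each negative control, and counts how often the 95% confidence interval excludes the null. If the design removed all systematic error, roughly 5% of intervals would do so by chance, so a much higher fraction signals residual bias. The function names and numbers are illustrative. OHDSI goes further with empirical calibration, which this sketch does not attempt.

```python
import math

def ci_excludes_null(log_rr: float, se: float, z: float = 1.96) -> bool:
    """True if the 95% CI for the relative risk excludes 1 (0 on the log scale)."""
    return log_rr - z * se > 0 or log_rr + z * se < 0

def fraction_excluding_null(estimates: list) -> float:
    """estimates: (log_rr, se) pairs, one per negative control outcome.
    Without systematic error, about 5% of 95% CIs should exclude the null."""
    return sum(ci_excludes_null(b, se) for b, se in estimates) / len(estimates)

# Hypothetical estimates for four negative controls.
estimates = [(math.log(1.05), 0.10), (math.log(0.97), 0.12),
             (math.log(1.60), 0.10), (math.log(1.02), 0.08)]
print(fraction_excluding_null(estimates))  # 0.25: one of four CIs excludes the null
```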

Positive controls

Positive controls are outcomes that an exposure is believed to cause, that is, outcomes for which we believe the null hypothesis to be false. For method validation, we re-run our analysis plan, replacing an outcome such as COVID-19 with a positive control. If the treatment (for example, exposure to a vaccine) produces the expected result with positive controls, this supports the validity of our analysis plan, including our approach to mitigating confounding.

The difficulty with positive controls is that they may be largely unknown or only partially understood in a given research context. For this reason, OHDSI and the ATLAS data analysis workbench have taken to manufacturing positive controls. OHDSI tooling can, on request, produce one or more synthetic positive controls by modifying a negative control: it injects additional, simulated occurrences of the outcome during the time at risk of the exposure, as explained here.
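The core idea of outcome injection can be sketched as follows, assuming the negative control's true relative risk is 1: add simulated events to exposed persons at a rate chosen so that the expected event count is multiplied by the target effect size. This is a simplified stand-in. OHDSI's own procedure is more elaborate (as we understand it, it models the outcome on baseline covariates so that injected events preserve realistic confounding). All names and numbers are hypothetical.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def inject_outcomes(persons: list, target_rr: float, seed: int = 0) -> list:
    """Turn a negative control (true RR assumed to be 1) into a synthetic positive control.

    persons: list of (observed_events, days_at_risk) for exposed persons, counting the
    negative control outcome during the exposure's time at risk.
    Each person receives Poisson((target_rr - 1) * baseline_rate * days_at_risk) extra
    events, so the expected exposed event count is multiplied by target_rr.
    """
    rng = random.Random(seed)
    total_events = sum(e for e, _ in persons)
    total_days = sum(d for _, d in persons)
    baseline_rate = total_events / total_days  # events per person-day among the exposed
    return [e + poisson((target_rr - 1) * baseline_rate * d, rng) for e, d in persons]

# Hypothetical exposed population: 5,000 persons, 100 days at risk each, 1,000 events.
persons = [(1 if i % 5 == 0 else 0, 100) for i in range(5000)]
new_counts = inject_outcomes(persons, target_rr=2.0)
ratio = sum(new_counts) / sum(e for e, _ in persons)
print(ratio)  # should be close to 2.0
```

Re-running the analysis plan on the injected outcome should then recover an estimate near the target effect size if the design is unbiased.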

INSPIRE best practices

INSPIRE approaches study conduct by first specifying and testing a study design using synthetic data. Next, working with our partners in the network study, we adjust the initial design so that it will work with real data.

In this vaccine effectiveness research, we first generate a synthetic source dataset based on the IDSR Immediate Case-based Reporting Form AND a COVAX Facility Interim Distribution Forecast. Next, we ETL this synthetic dataset into an OMOP CDM instance. Then we conduct an OHDSI population-level effect estimation study using the cohort method design, that is, an emulated clinical trial.
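The first step can be illustrated with a small generator of synthetic dose rows. Everything in it is an assumption for illustration: the vaccine mix (standing in for a COVAX distribution forecast), the 40% unvaccinated share, the 56-day interval between doses and the column names.

```python
import random
from datetime import date, timedelta

# All values below are hypothetical, for illustration only.
VACCINE_TYPES = ["AstraZeneca", "Janssen", "Pfizer-BioNTech"]
TYPE_WEIGHTS = [0.5, 0.3, 0.2]           # stand-in for a COVAX distribution forecast
DOSES_REQUIRED = {"AstraZeneca": 2, "Janssen": 1, "Pfizer-BioNTech": 2}
UNVACCINATED_SHARE = 0.4
DOSE_INTERVAL_DAYS = 56

def synthetic_dose_rows(n_persons: int, seed: int = 1) -> list:
    """Return one row per vaccine dose; unvaccinated persons contribute no rows."""
    rng = random.Random(seed)
    rows = []
    for person_id in range(1, n_persons + 1):
        if rng.random() < UNVACCINATED_SHARE:
            continue
        vtype = rng.choices(VACCINE_TYPES, weights=TYPE_WEIGHTS, k=1)[0]
        # Some persons stop before completing the series (partially vaccinated).
        n_doses = rng.randint(1, DOSES_REQUIRED[vtype])
        first_dose = date(2021, 3, 1) + timedelta(days=rng.randint(0, 180))
        for dose_number in range(1, n_doses + 1):
            rows.append({
                "person_id": person_id,
                "vaccine_type": vtype,
                "dose_number": dose_number,
                "dose_date": first_dose + timedelta(days=DOSE_INTERVAL_DAYS * (dose_number - 1)),
            })
    return rows

rows = synthetic_dose_rows(1000)
print(len(rows), rows[0])
```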

Mapping the source data into the OMOP CDM

The CDM is composed of "clinical domains". Each domain hosts a type of occurrence, e.g. CONDITION_OCCURRENCE, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, MEASUREMENT, OBSERVATION and so forth. The CDM also includes VISIT_OCCURRENCE, which serves as an umbrella for all the other occurrences in the course of a single visit. In OMOP, a person may have a succession of visits, each associated with its own set of conditions, procedures, drugs, tests and so forth. In the context of the IDSR, we might open several successive cases for the same person, each of which might be a separate episode of COVID-19. Each of those cases is represented as a VISIT_OCCURRENCE.

In the synthetic OMOP CDM instance, INSPIRE only created one case (VISIT_OCCURRENCE) per person.
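A minimal sketch of the umbrella relationship, using a subset of real CDM column names (`visit_occurrence_id`, `person_id`, `visit_start_date`, `condition_start_date`, ...) in Python dataclasses. The helper functions are hypothetical, and the example includes a person with two cases to show successive visits, even though the synthetic instance has only one case per person.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class VisitOccurrence:
    visit_occurrence_id: int
    person_id: int
    visit_start_date: date
    visit_end_date: date

@dataclass
class ConditionOccurrence:
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int   # placeholder value in the example below
    condition_start_date: date
    visit_occurrence_id: int    # links the condition to the visit (IDSR case) it belongs to

def cases_to_visits(cases: list) -> list:
    """Create one VISIT_OCCURRENCE per IDSR case.

    cases: list of (person_id, case_date). Successive cases for the same person
    become successive visits.
    """
    return [
        VisitOccurrence(visit_id, person_id, case_date, case_date)
        for visit_id, (person_id, case_date) in enumerate(sorted(cases), start=1)
    ]

def group_by_visit(occurrences: list) -> dict:
    """Collect the occurrences that sit under each visit's umbrella."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ.visit_occurrence_id].append(occ)
    return dict(grouped)

visits = cases_to_visits([(1, date(2021, 3, 1)), (2, date(2021, 4, 1)), (1, date(2021, 9, 1))])
conditions = [ConditionOccurrence(10, 1, -1, date(2021, 9, 1), visits[1].visit_occurrence_id)]
print(visits)
print(group_by_visit(conditions))
```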

Using the vaccine doses reported through the IDSR, together with the COVAX data, INSPIRE created a synthetic source table of vaccine doses.

During the ETL, each row in the source data was used to create a CDM DRUG_EXPOSURE record. As with the other tables in the OMOP CDM, a [domain]_concept_id needs to be provided; for a DRUG_EXPOSURE, this is the drug_concept_id. It comes from the standard vocabularies hosted and managed by ATHENA, and here it identifies the specific COVID-19 vaccine a person received. ATHENA hosts the OMOP Standardized Vocabularies, a collection of individual vocabularies that includes SNOMED-CT, LOINC, ICD-10, RxNorm and others. One of the "others" is CVX, which hosts codes for vaccines approved for use in the United States as well as vaccines used internationally.

Note also that a DRUG_EXPOSURE has a drug_type_concept_id, which records the provenance of the information; here, the provenance is the Case Report Form.
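A minimal sketch of this ETL step, turning dose rows into DRUG_EXPOSURE records that carry both concept IDs. The column names follow the CDM, but the concept IDs are deliberately fake negative placeholders: the real drug_concept_id values (for example, CVX concepts) and the Case Report Form type concept must be looked up in ATHENA. The input row keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# FAKE placeholder concept IDs (negative so they cannot be mistaken for real ones).
# Replace with the standard concepts looked up in ATHENA.
VACCINE_CONCEPT_ID = {"AstraZeneca": -1, "Janssen": -2, "Pfizer-BioNTech": -3}
CRF_TYPE_CONCEPT_ID = -100  # placeholder for the "Case Report Form" type concept

@dataclass
class DrugExposure:
    """A subset of the CDM DRUG_EXPOSURE columns."""
    drug_exposure_id: int
    person_id: int
    drug_concept_id: int
    drug_exposure_start_date: date
    drug_exposure_end_date: date
    drug_type_concept_id: int
    drug_source_value: str

def dose_rows_to_drug_exposures(rows: list) -> list:
    """Map each source dose row to one DRUG_EXPOSURE record."""
    return [
        DrugExposure(
            drug_exposure_id=i,
            person_id=row["person_id"],
            drug_concept_id=VACCINE_CONCEPT_ID[row["vaccine_type"]],
            drug_exposure_start_date=row["dose_date"],
            drug_exposure_end_date=row["dose_date"],  # a dose is treated as a one-day exposure
            drug_type_concept_id=CRF_TYPE_CONCEPT_ID,  # provenance: Case Report Form
            drug_source_value=row["vaccine_type"],     # keep the source text for traceability
        )
        for i, row in enumerate(rows, start=1)
    ]

example = [{"person_id": 7, "vaccine_type": "Janssen", "dose_date": date(2021, 5, 1)}]
print(dose_rows_to_drug_exposures(example)[0])
```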

DRUG_ERA

In our vaccine effectiveness study, we don't construct the fully vaccinated, partially vaccinated and unvaccinated cohorts directly from the DRUG_EXPOSURE table. Instead, we use one of the derived tables present in each OMOP CDM: DRUG_ERA. It summarizes each person's DRUG_EXPOSURE records for a given drug into eras of continuous exposure.

From OMOP CDM 5.4 Standardized Elements

We mainly use this table, and in particular its drug_exposure_count column, to construct the fully vaccinated, partially vaccinated and unvaccinated cohorts that serve as the "treatments" in our emulated clinical trial.

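Note that, as a rule, a DRUG_ERA ends when the gap between successive DRUG_EXPOSUREs exceeds 30 days. The use case motivating this rule is a prescription, for one or more conditions, that a person takes many times in the course of a month. With vaccines, however, the doses in a two-dose series may be given more than 30 days apart, and the protection from a single dose may last many months; under the default rule, each such dose would typically start a new era with a drug_exposure_count of 1. We are therefore adjusting the DRUG_ERA SQL that is routinely run against DRUG_EXPOSUREs to account for this population health use case.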
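A simplified sketch of era construction with a configurable persistence window, showing why the default matters for vaccines. It assumes each dose is a single-day exposure and is not the OHDSI reference DRUG_ERA SQL, which differs in details; the 365-day window in the example is a hypothetical value, not INSPIRE's chosen setting.

```python
from datetime import date

def build_eras(exposures: list, persistence_days: int = 30) -> list:
    """Collapse one person's exposures to one drug into eras.

    exposures: list of (start_date, end_date). A new era starts when the gap between
    the end of the current era and the next exposure's start exceeds persistence_days.
    Returns a list of (era_start, era_end, drug_exposure_count).
    """
    eras = []
    for start, end in sorted(exposures):
        if eras and (start - eras[-1][1]).days <= persistence_days:
            era_start, era_end, count = eras[-1]
            eras[-1] = (era_start, max(era_end, end), count + 1)  # extend the current era
        else:
            eras.append((start, end, 1))  # gap too long: open a new era
    return eras

# Two single-day doses 56 days apart.
doses = [(date(2021, 4, 1), date(2021, 4, 1)), (date(2021, 5, 27), date(2021, 5, 27))]
print(build_eras(doses))                        # two eras, each with drug_exposure_count 1
print(build_eras(doses, persistence_days=365))  # one era with drug_exposure_count 2
```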

The analysis

The analysis executes the design as specified in OHDSI's ATLAS data analysis workbench for emulated clinical trials. The analysis may be revised many times through a succession of small changes to either the cohort definitions or the model that makes up the design (covariate settings, propensity score adjustment, and the outcome model, such as logistic or Cox regression). This process lets us observe and compare the sensitivity of several designs, first in our synthetic data and then across CDM instances in a network study.
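The space of small design changes can be enumerated explicitly. The option lists below are hypothetical examples of the settings mentioned above; in practice each variant would be configured and run in ATLAS rather than in Python.

```python
from itertools import product

# Hypothetical design options, for illustration only.
COMPARATORS = ["partially vaccinated", "unvaccinated"]
PS_ADJUSTMENTS = ["stratification (5 strata)", "1:1 matching", "IPTW"]
OUTCOME_MODELS = ["logistic regression", "Cox regression"]

def design_variants():
    """Yield every combination of comparator, propensity score adjustment and outcome model."""
    for comparator, ps_adjustment, outcome_model in product(COMPARATORS, PS_ADJUSTMENTS, OUTCOME_MODELS):
        yield {
            "target": "fully vaccinated",
            "comparator": comparator,
            "ps_adjustment": ps_adjustment,
            "outcome_model": outcome_model,
        }

variants = list(design_variants())
print(len(variants))  # 12
```

Running every variant first on synthetic data makes it easy to see which choices the effect estimate is sensitive to before moving to real CDM instances.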