02.Causation07.Causal machine learning

1. Use cases: in which situations should I use this method?

  • Causal models are used when you need to make a personalized prediction for a given patient AND need to avoid biases inherent to the data used to train that model. A typical example is the personalized prediction of response to therapy, since the observational data used to train the model often carry confounding by indication. Specific examples include:

    1. Reproduction of trial results using observational, real-world data [1].
    2. Machine learning models serving as the basis for decision support systems focused on the response to therapy. The main argument is that traditional, non-causal models would lead to confounding by indication.

2. Input: what kind of data does the method require?

  1. Prerequisites for causal analysis

3. Algorithm: how does the method work?

Model mechanics

Steps involved in causal modeling are:

  1. Scoping review of the literature looking for sources of confounding
  2. Focus groups with clinicians for an in-depth evaluation of biases that might be inherent to the local culture. This step is particularly important in cross-cultural, multinational studies
  3. Use of causal statistical models to evaluate the differences between estimates from causal and unadjusted association models. The biases evaluated often include confounding by indication [2]; overadjustment and mediation (if you adjust for an intermediate variable); collider and collider-stratification bias; selection bias, missing data, and imputation strategies (usually not missing at random); residual confounding and high-dimensional covariate balance (a typical example is the lack of social determinants of health data in electronic health records); reverse causality (an outcome influencing the presence of a risk factor); and measurement bias. A minimal sketch comparing unadjusted and adjusted estimates follows this list.
  4. The causal machine learning model per se (see below).
  5. A formal comparison between causal and traditional machine learning models.
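
As a minimal sketch of the comparison in step 3, the simulation below generates data with confounding by indication and contrasts the unadjusted treatment estimate with a confounder-adjusted one. All variable names and effect sizes are illustrative, not taken from any specific study or library:

```python
# Simulated confounding by indication: sicker patients are more likely
# to be treated, and severity independently worsens the outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000
severity = rng.normal(size=n)                               # confounder
treated = rng.binomial(1, 1 / (1 + np.exp(-2 * severity)))  # sicker -> treated
outcome = -1.0 * treated + 2.0 * severity + rng.normal(size=n)  # true effect -1

# Unadjusted association model vs a model adjusted for the confounder.
unadjusted = sm.OLS(outcome, sm.add_constant(treated.astype(float))).fit()
adjusted = sm.OLS(outcome, sm.add_constant(
    np.column_stack([treated, severity]))).fit()

print("Unadjusted estimate:", round(unadjusted.params[1], 2))  # biased upward
print("Adjusted estimate:  ", round(adjusted.params[1], 2))    # close to -1
```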

There are a number of methods used across different libraries. Causallib (from IBM) uses a two-step process: it first uses machine learning to estimate IPW (inverse probability weighting) and then uses a second round of machine learning methods to predict the outcome, incorporating the IPW estimates [26], [27]. The end product is a prediction that is unbiased with respect to the measured confounders.
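
A minimal sketch of this weighting workflow with causallib, using its bundled NHEFS smoking-cessation example data (the choice of learner here is illustrative):

```python
# causallib's weighting workflow: a logistic model estimates inverse
# probability weights, and weighted outcome estimates follow.
from sklearn.linear_model import LogisticRegression
from causallib.datasets import load_nhefs
from causallib.estimation import IPW

data = load_nhefs()                        # bundled example data (X, a, y)
ipw = IPW(learner=LogisticRegression(max_iter=1000))
ipw.fit(data.X, data.a)                    # step 1: ML model for treatment
pop = ipw.estimate_population_outcome(data.X, data.a, data.y)
print(ipw.estimate_effect(pop[1], pop[0]))  # weighted (adjusted) effect
```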

DoWhy (from Microsoft) generates data-driven graphs and can use a number of underlying algorithms to estimate causal relationships, including those that model treatment assignment (propensity-based stratification, propensity score matching, and inverse propensity weighting), those that model the response surface (regression), and those based on instrumental variables (binary instrument/Wald estimator and regression discontinuity).
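
A minimal DoWhy sketch of this model-identify-estimate flow on simulated data (the column names and effect sizes are illustrative):

```python
# DoWhy's flow: model the problem, identify the estimand, then estimate it.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 2000
severity = rng.normal(size=n)
treatment = rng.binomial(1, 1 / (1 + np.exp(-severity))).astype(bool)
outcome = 2 * treatment + severity + rng.normal(size=n)
df = pd.DataFrame({"treatment": treatment, "outcome": outcome,
                   "severity": severity})

model = CausalModel(data=df, treatment="treatment", outcome="outcome",
                    common_causes=["severity"])
estimand = model.identify_effect()
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_weighting")
print(estimate.value)  # should be close to the true effect of 2
```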

When there is residual confounding, the first recommendation is to use randomization. If the study is already randomized, other options for dealing with this issue include:

  • Restricting on the confounding variable.
  • Matching on variables, a common approach in case-control studies, although it can introduce errors and requires careful assessment.
  • Stratifying the sample into subgroups.

While the methods above are the traditional approach, over the past decade or so the standard practice has become a propensity score analysis inside the trial, providing the benefits of those methods without their potential problems. For example, matching will almost always exclude certain patients, and stratifying has issues related to pooling the results from each stratum into a final result, among other things. Doubly robust estimates (a type of propensity score analysis) borrow features from the methods above but probably do a better job implementing them; a minimal sketch follows.
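
A minimal sketch of a doubly robust estimate via augmented inverse probability weighting (AIPW), assuming X (covariates), a (binary treatment), and y (outcome) are NumPy arrays; the function name and model choices are illustrative:

```python
# AIPW doubly robust estimate of the average treatment effect: combines a
# propensity model and an outcome model, and stays consistent if either
# one of the two models is correctly specified.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, a, y):
    # Propensity model: probability of treatment given covariates.
    ps = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # avoid extreme weights
    # Outcome models, fit separately in treated and control groups.
    mu1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)
    mu0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X)
    # Augmentation terms correct the outcome models using the weights.
    return np.mean(mu1 - mu0
                   + a * (y - mu1) / ps
                   - (1 - a) * (y - mu0) / (1 - ps))
```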

Of importance, when connecting causal machine learning models to image recognition, one approach is to feed a combination of handcrafted and deep features into the causal models, similar to what was done by De Fauw et al [3].

For example, if the training data present a bias against minorities [4] [5] [6], that bias will be carried over to the machine learning model underlying a decision support system. Causal machine learning avoids such biased decision support systems by addressing the underlying confounding.

Reporting guidelines

Guidance relevant to reporting these analyses includes the guidelines for machine learning predictive models in biomedical research [7], RiGoR [8], and the TRIPOD statement [9].

Data science packages

The following libraries are available:

  • causallib - a Python library that uses a two-step process: first, a machine learning model is used to calculate IPW (inverse probability weighting) scores, and then a second set of machine learning models is used to predict personalized, bias-adjusted outcomes. For R users, causallib can be brought into your code through reticulate.

  • DoWhy from Microsoft.

  • Although RCTs represent the gold standard in clinical research, most clinical questions cannot be answered with this technique because of ethical considerations, time, and cost. The goal of observational research in clinical medicine is to gain insight into the relationship between a clinical exposure and a patient outcome in the absence of evidence from RCTs. Observational research also offers an additional benefit compared with data from RCTs: the conclusions are often more generalizable to a heterogeneous population, which may be of greater value to everyday clinical practice [18], [19].

  • Most books have an initial chapter on DAGs and then dive into other things [20]. In other words, DAGs are a tool that forces you to think about causal pathways. For example: if a patient loses a lot of blood after a penetrating abdominal injury, you transfuse a lot of blood, and the patient then presents renal failure and infection postoperatively, is the transfusion causing the complications? There are literally dozens if not hundreds of papers saying that transfusion causes a host of complications, but a simple DAG will tell you that the way these models are put together actually mixes things up. So transfusions might indeed be to blame, but most papers' models are way off in terms of providing the correct estimates, and nobody is raising a red flag. This, of course, has been translated into guidelines and practice. The point is that DAGs matter (see the DAG sketch after this list).

  • To run actual graph models, rather than just performing a qualitative evaluation, we use machine learning [21] [22].
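
To make the transfusion example above concrete, here is a sketch of encoding that DAG explicitly in DoWhy as a GML graph. The variable names, simulated data, and effect sizes are all illustrative:

```python
# Encoding the transfusion DAG: injury severity causes both transfusion
# and complications, so it must be adjusted for.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(1)
n = 1000
injury_severity = rng.normal(size=n)
transfusion = (injury_severity + rng.normal(size=n) > 0)
complications = (2 * injury_severity + 0.2 * transfusion
                 + rng.normal(size=n) > 1)
df = pd.DataFrame({"injury_severity": injury_severity,
                   "transfusion": transfusion,
                   "complications": complications})

gml = """graph [directed 1
  node [id "injury_severity" label "injury_severity"]
  node [id "transfusion" label "transfusion"]
  node [id "complications" label "complications"]
  edge [source "injury_severity" target "transfusion"]
  edge [source "injury_severity" target "complications"]
  edge [source "transfusion" target "complications"]]"""

model = CausalModel(data=df, treatment="transfusion",
                    outcome="complications", graph=gml)
# The identified estimand makes the required adjustment for injury_severity
# explicit; a naive regression of complications on transfusion would not.
print(model.identify_effect())
```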

Last, a simple way to expand your toolkit is to draw on the ideas of Efron and Hastie, which have been the basis for some of the most important methods we use today [23].

Suggested companion methods

Possible companion methods include:

Learning materials

  1. Books

    • Causal Inference in Statistics: A Primer, by Judea Pearl [11].
    • Causality [12].
    • Judea Pearl's Web site
    • Tutorial on Causal Inference and Counterfactual Reasoning [13].
    • Causal Inference [20].
    • Bayesian Networks: With Examples in R (Chapman & Hall/CRC Texts in Statistical Science Book 109) [21].
    • Probabilistic Graphical Models [22].
    • Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (Institute of Mathematical Statistics Monographs Book 5) [23].
    • Causal Inference: What If (1st edition, 2020) [24].
    • Statistical Matching or Data Fusion (StatMatch) [25].
  2. Review articles

    • Framework for Identifying Drug Repurposing Candidates from Observational Healthcare Data [14].
    • An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference [15].
    • Nice review on the mathematical properties of IPW, along with a method to address robustness issues [16].
    • Causal inference in perioperative medicine observational research: part 1, a graphical introduction [18].
    • Causal inference in perioperative medicine observational research: part 2, advanced methods [19].
    • Common references for causation
    • Implementing Causal Impact on Top of TensorFlow Probability
    • A review of generalizability and transportability [26].
    • Generalizing evidence from randomized trials using inverse probability of sampling weights [27].
  3. Videos

4. Output: how do I interpret this method's results?

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

  • Bias-free, personalized outcome predictions included [EXAMPLE PREDICTIONS]
  • Most important predictor variables, on average, included [MOST IMPORTANT PREDICTORS, ON AVERAGE]
  • Individual factors (variables) involved in the explanation of individual patient outcomes included [PERSONALIZED PREDICTORS]
  • When attempting to reproduce a randomized controlled trial based on an [OBSERVATIONAL DATASET], the main differences included [MAIN DIFFERENCES].

Tables, plots, and their interpretation

  • Scatterplots with flexible splines (often with a correlation test) comparing trial results vs causal machine learning predictions - see Figures 3 and 4 from Framework for Identifying Drug Repurposing Candidates from Observational Healthcare Data [17]. The spline is like a regression line that flexes and adapts to the relationship between the two variables in a more dynamic manner. However, depending on its flexibility and the influence of outliers, it may bend too much toward this particular sample, failing to generalize to other samples or represent the population more broadly. A minimal plotting sketch follows this list.
  • Plots and tables demonstrating balance after IPW [26], [27]
  • Calculator screenshots demonstrating decision support systems
  • Plots demonstrating area under the curve (AUC) or precision and recall curves (PRC)
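
A minimal sketch of the scatterplot described in the first bullet, using a LOWESS smoother as the flexible fitted line together with a Pearson correlation test; the effect arrays here are simulated stand-ins for per-trial (or per-subgroup) estimates:

```python
# Trial-vs-prediction scatterplot with a flexible smoother and a
# correlation test; trial_effects and predicted_effects are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)
trial_effects = rng.normal(size=40)
predicted_effects = trial_effects + rng.normal(scale=0.3, size=40)

r, p = pearsonr(trial_effects, predicted_effects)
smooth = lowess(predicted_effects, trial_effects)  # flexible fitted line

plt.scatter(trial_effects, predicted_effects, alpha=0.6)
plt.plot(smooth[:, 0], smooth[:, 1], color="red")
plt.xlabel("Trial effect estimate")
plt.ylabel("Causal ML predicted effect")
plt.title(f"Pearson r = {r:.2f} (p = {p:.3f})")
plt.show()
```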

5. SporeData-specific

Templates

References

[1] Ozery-Flato M, Goldschmidt Y, Shaham O, Ravid S, Yanover C. Framework for Identifying Drug Repurposing Candidates from Observational Healthcare Data. medRxiv. 2020 Jan 1.

[2] Kyriacou DN, Lewis RJ. Confounding by indication in clinical research. Jama. 2016 Nov 1;316(17):1818-9.

[3] De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, Askham H, Glorot X, O’Donoghue B, Visentin D, van den Driessche G. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature medicine. 2018 Sep;24(9):1342-50.

[4] Penner LA, Dovidio JF, Gonzalez R, Albrecht TL, Chapman R, Foster T, Harper FW, Hagiwara N, Hamel LM, Shields AF, Gadgeel S. The effects of oncologist implicit racial bias in racially discordant oncology interactions. Journal of clinical oncology. 2016 Aug 20;34(24):2874.

[5] Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447-53.

[6] Mackenbach JP, Valverde JR, Artnik B, Bopp M, Brønnum-Hansen H, Deboosere P, Kalediene R, Kovács K, Leinsalu M, Martikainen P, Menvielle G. Trends in health inequalities in 27 European countries. Proceedings of the National Academy of Sciences. 2018 Jun 19;115(25):6440-5.

[7] Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, Shilton A, Yearwood J, Dimitrova N, Ho TB, Venkatesh S. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. Journal of medical Internet research. 2016;18(12):e323.

[8] Kerr KF, Meisner A, Thiessen-Philbrook H, Coca SG, Parikh CR. RiGoR: reporting guidelines to address common sources of bias in risk model development. Biomarker research. 2015 Dec 1;3(1):2.

[9] Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) The TRIPOD Statement. Circulation. 2015 Jan 13;131(2):211-9.

[10] Razavi M, Glasziou P, Klocksieben FA, Ioannidis JP, Chalmers I, Djulbegovic B. US Food and Drug Administration Approvals of drugs and devices based on nonrandomized clinical trials: a systematic review and meta-analysis. JAMA network open. 2019 Sep 4;2(9):e1911111-.

[11] Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. John Wiley & Sons; 2016 Mar 7.

[12] Pearl J. Causality. Cambridge University Press; 2009 Sep 14.

[13] Sharma A. Tutorial on Causal Inference and Counterfactual Reasoning.

[14] Ozery-Flato M, Goldschmidt Y, Shaham O, Ravid S, Yanover C. Framework for Identifying Drug Repurposing Candidates from Observational Healthcare Data. medRxiv. 2020 Jan 1.

[15] Shimoni Y, Karavani E, Ravid S, Bak P, Ng TH, Alford SH, Meade D, Goldschmidt Y. An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference. arXiv preprint arXiv:1906.00442. 2019 Jun 2.

[16] Ma X, Wang J. Robust inference using inverse probability weighting. Journal of the American Statistical Association. 2020;115(532):1851-60.

[17] Ozery-Flato M, Goldschmidt Y, Shaham O, Ravid S, Yanover C. Framework for Identifying Drug Repurposing Candidates from Observational Healthcare Data. medRxiv. 2020 Jan 1.

[18] Krishnamoorthy V, Wong DJ, Wilson M, Raghunathan K, Ohnuma T, McLean D, Moonesinghe SR, Harris SK. Causal inference in perioperative medicine observational research: part 1, a graphical introduction. British Journal of Anaesthesia. 2020;125(3):393-7.

[19] Krishnamoorthy V, McLean D, Ohnuma T, Harris SK, Wong DJ, Wilson M, Moonesinghe R, Raghunathan K. Causal inference in perioperative medicine observational research: part 2, advanced methods. British Journal of Anaesthesia. 2020 Sep 1;125(3):398-405.

[20] Causal Inference

[21] Scutari M, Denis JB. Bayesian Networks: With Examples in R. Chapman & Hall/CRC; 2014.

[22] Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. MIT Press; 2009.

[23] Efron B, Hastie T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press; 2016.

[24] Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020.

[25] D'Orazio M. Statistical Matching or Data Fusion (StatMatch).

[26] Degtiar I, Rose S. A review of generalizability and transportability.

[27] Buchanan AL, Hudgens MG, Cole SR, Mollan KR, Sax PE, Daar ES, Adimora AA, Eron JJ, Mugavero MJ. Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2018 Oct;181(4):1193-209.
