09.Machine learning01.Classification - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

  1. Longitudinal: predicting future events when past variables predict prognosis or response to therapy. See, for example, Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis.

  2. Cross-sectional (diagnostic): when an existing set of variables can be used to predict a complex score, for example, when administrative variables can be used to predict a physical function score.

2. Input: what kind of data does the method require?

  • The dataset should have a categorical outcome and multiple predictors.
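
As a minimal illustration (the column names below are hypothetical), the input is simply a data frame with a factor outcome and a mix of numeric and categorical predictors:

```r
# Hypothetical example of the expected input format: a factor outcome
# plus several numeric/categorical predictors.
dat <- data.frame(
  outcome   = factor(c("responder", "non-responder", "responder", "non-responder")),
  age       = c(64, 71, 58, 80),
  sex       = factor(c("F", "M", "F", "M")),
  biomarker = c(1.8, 0.9, 2.4, 1.1)
)
str(dat)  # the outcome is a factor; predictors may be numeric or categorical
```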

3. Algorithm: how does the method work?

Model mechanics

The predictive performance of machine learning models is affected by missing data, low sample size, misclassification bias, and measurement error [8]. The area under the receiver operating characteristic curve (AUC) and the precision-recall curve (PRC) are frequently used to assess predictive performance [6]. The C statistic is simply the AUC applied to binary outcomes, and the PRC is more suitable for imbalanced data [7].
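
As a minimal sketch of how these metrics behave on imbalanced data, the example below simulates a binary outcome with a low event rate, fits a logistic regression, and computes the AUC and PR-AUC with mlr3measures (see the package list below). The simulated data and variable names are illustrative only.

```r
# Minimal sketch: AUC vs. PR-AUC on a simulated, imbalanced binary outcome
# (illustrative data; not a recommended modeling workflow).
library(mlr3measures)

set.seed(123)
n <- 1000
x <- rnorm(n)
p <- plogis(-3.5 + 1.2 * x)                      # ~5% event rate
y <- factor(rbinom(n, 1, p), levels = c(0, 1), labels = c("no", "yes"))

fit  <- glm(y ~ x, family = binomial)
prob <- predict(fit, type = "response")          # predicted probability of "yes"

auc(truth = y, prob = prob, positive = "yes")    # ROC AUC (the C statistic)
prauc(truth = y, prob = prob, positive = "yes")  # area under the precision-recall curve
```

Because the event is rare, the PR-AUC is typically much lower than the ROC AUC, which is why the PRC is the more informative summary for imbalanced outcomes.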

Reporting guidelines

Data science packages

  • To assess the AUC, use the performance function from the mlr package [4].

  • To assess the PRC, use the prauc function from the mlr3measures package [3] or the plotROCCurves function [5].

  • Sometimes we categorize variables that are originally numeric to develop classification models. The cutpointr package provides tools to determine optimal cutpoints for categorizing these variables (a brief sketch follows this list).
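
A minimal sketch of the cutpointr workflow, assuming a simulated biomarker and outcome; the data, column names, and chosen metric are illustrative:

```r
# Minimal sketch: find an "optimal" cutpoint for a numeric predictor
# with cutpointr (simulated data; column names are illustrative).
library(cutpointr)

set.seed(42)
dat <- data.frame(
  biomarker = c(rnorm(180, mean = 1), rnorm(20, mean = 2.5)),
  outcome   = factor(rep(c("no", "yes"), times = c(180, 20)))
)

# Maximize the Youden index (sensitivity + specificity - 1) over candidate cutpoints
cp <- cutpointr(dat, biomarker, outcome,
                pos_class = "yes", direction = ">=",
                method = maximize_metric, metric = youden)

summary(cp)  # optimal cutpoint plus sensitivity/specificity at that cutpoint
plot(cp)     # diagnostic plots for the estimated cutpoint
```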

Suggested companion methods

Learning materials

  1. Books
  2. Articles
    • Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data [1].
    • Using free-response receiver operating characteristic curves to assess the accuracy of machine diagnosis of cancer [2].
    • Common references for machine learning

4. Output: how do I interpret this method's results?

Interpretability

Machine learning models usually perform well for prediction but are difficult to interpret. Interpretability is key from a variety of perspectives, and it can be approached in four main ways (a code sketch using the iml package follows the list below):

  1. Feature importance for the model as a whole, which is the traditional approach.
  2. Feature effects: how a feature influences the prediction. This covers tools such as accumulated local effects, partial dependence plots, and individual conditional expectation curves.
  3. Surrogate trees: approximating the underlying model with a short decision tree, which is easier to interpret.
  4. Explanations for personalized predictions, i.e., individual patients: in other words, how did a given feature (predictor) value for a specific patient affect that patient's prediction? This covers tools such as LIME plots and Shapley values, which are critical for decision-making since they focus on individual patients rather than the model as a whole, as shown in https://cran.r-project.org/web/packages/iml/vignettes/intro.html.
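
A minimal sketch of these four approaches using the iml package from the vignette linked above. The random forest and the iris data are placeholders for a real clinical model and dataset, and the chosen features are illustrative:

```r
# Minimal sketch: the four interpretability approaches with the iml package.
library(iml)
library(randomForest)

rf <- randomForest(Species ~ ., data = iris, ntree = 100)
predictor <- Predictor$new(rf, data = iris[, -5], y = iris$Species, type = "prob")

# 1. Feature importance for the model as a whole (permutation importance)
imp <- FeatureImp$new(predictor, loss = "ce")
plot(imp)

# 2. Feature effects: accumulated local effects for one predictor
eff <- FeatureEffect$new(predictor, feature = "Petal.Length", method = "ale")
plot(eff)

# 3. Surrogate tree: a short decision tree approximating the random forest
tree <- TreeSurrogate$new(predictor, maxdepth = 2)
plot(tree)

# 4. Explanation for one individual: Shapley values for a single observation
#    (a LIME-style local model is available via LocalModel$new())
shap <- Shapley$new(predictor, x.interest = iris[1, -5])
plot(shap)
```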

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

  • The plot below represents a confusion matrix. Source. A minimal code sketch for building such a matrix follows the figure.

Imgur
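
A self-contained sketch of how such a matrix can be tabulated with mlr3measures::confusion_matrix(); the observed and predicted labels below are toy values:

```r
# Minimal sketch: confusion matrix for a binary classifier (toy labels).
library(mlr3measures)

truth    <- factor(c("yes", "no", "no", "yes", "no", "no", "yes", "no"),
                   levels = c("no", "yes"))
response <- factor(c("yes", "no", "yes", "no", "no", "no", "yes", "no"),
                   levels = c("no", "yes"))

# Prints the 2x2 table of predicted vs. observed classes together with
# derived measures (e.g., sensitivity/tpr, specificity/tnr, ppv, npv).
confusion_matrix(truth = truth, response = response, positive = "yes")
```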

5. SporeData-specific

Templates

Data science functions

References

[1] Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA internal medicine. 2018 Nov 1;178(11):1544-7.

[2] Moskowitz CS. Using free-response receiver operating characteristic curves to assess the accuracy of machine diagnosis of cancer. Jama. 2017 Dec 12;318(22):2250-1.

[3] Team SD. mlr3measures

[4] Team SD. mlr3measures

[5] Team SD. mlr3measures

[6] Team SD. https://stackoverflow.com/questions/18265941/two-horizontal-bar-charts-with-shared-axis-in-ggplot2-similar-to-population-pyr

[7] Team SD. Precision-recall curves: what are they and how are they used? https://acutecaretesting.org/en/articles/precision-recall-curves-what-are-they-and-how-are-they-used

[8] de Hond AA, Leeuwenberg AM, Hooft L, Kant IM, Nijman SW, van Os HJ, Aardoom JJ, Debray TP, Schuit E, van Smeden M, Reitsma JB. [Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review](https://www.nature.com/articles/s41746-021-00549-7). NPJ digital medicine. 2022 Jan 10;5(1):2.
