01.Association02.Inferential tests - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

Inferential tests compare two variables (or one variable against a population value), either as an initial exploratory analysis or to explore unadjusted relations. Unadjusted relations are useful because they show "the world as it is" rather than exploring causes.

2. Input: what kind of data does the method require?

  1. Cross-sectional or longitudinal data
  2. Outcome and predictor variables

For each of the tests below, there is a frequentist and a Bayesian version of the test.

  • T-tests - compare a continuous variable across two groups. A t-test requires a continuous outcome variable and a dichotomous (yes/no) predictor.
  • Chi-square tests - compare two categorical variables. A chi-square test requires a categorical outcome variable and a categorical predictor.
  • Correlation tests - compare two continuous variables.
  • Standardized Mean Difference (SMD) - compares a continuous variable across two groups, i.e. a continuous and a dichotomous variable. We use the following guidelines when interpreting SMD magnitude: SMD = 0.2 corresponds to a small effect, SMD = 0.5 to a medium effect, and SMD = 0.8 to a large effect [4].
  • Population Attributable Fraction (PAF) - estimates the contribution of individual risk factors to the burden of disease [5] [7].
  • Frequentist tests are chosen by algorithm. Common ones include

    • Two-sample t-test - outcome close to a normal distribution, risk factor dichotomous, patients coming from two distinct samples
    • Paired t-test
    • One-sample t-test
    • Chi-square test
    • Correlation test
  • Mock dataset

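    library(fabricatr)

    # Simulate 1000 patients for the examples on this page
    patients <- fabricate(
      N = 1000,
      gender = draw_binary(0.5, N = N),
      qol = round(runif(N, 45, 90)),
      age = round(runif(N, 18, 85)),
      # qol is drawn from 45 to 90, so the qol < 40 branch never applies
      # and prob is 0.7 for every patient
      prepost = draw_binary(prob = ifelse(qol < 40, 0.4, 0.7), N = N),
      # rural residence is more likely when qol is 55 or higher
      rural = draw_binary(prob = ifelse(qol < 55, 0.3, 0.9), N = N)
    )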
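As a rough illustration of the tests listed above, the sketch below runs them in base R on a simulated data frame. The data frame is a base-R stand-in for the mock dataset (built with `rbinom` and `runif` rather than fabricatr, an assumption made only to keep the example free of dependencies), and the SMD is computed as Cohen's d with a pooled standard deviation, which is one common definition and not necessarily the one used elsewhere in this wiki.

```r
# Base-R stand-in for the mock dataset (simulated values, not study data)
set.seed(42)
n <- 1000
patients <- data.frame(
  gender = rbinom(n, 1, 0.5),
  qol    = round(runif(n, 45, 90)),
  age    = round(runif(n, 18, 85))
)
# rural depends on qol, mirroring the mock dataset definition
patients$rural <- rbinom(n, 1, ifelse(patients$qol < 55, 0.3, 0.9))

# Two-sample (Welch) t-test: continuous outcome by dichotomous predictor
tt <- t.test(qol ~ gender, data = patients)

# Chi-square test: two categorical variables
chi <- chisq.test(table(patients$gender, patients$rural))

# Pearson correlation test: two continuous variables
ct <- cor.test(patients$age, patients$qol, method = "pearson")

# SMD (Cohen's d with pooled SD) for qol, rural vs. non-rural
g1 <- patients$qol[patients$rural == 1]
g0 <- patients$qol[patients$rural == 0]
pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g0) - 1) * var(g0)) /
                    (length(g1) + length(g0) - 2))
smd <- (mean(g1) - mean(g0)) / pooled_sd

print(tt)
print(chi)
print(ct)
print(smd)
```

The printed SMD can then be read against the 0.2 / 0.5 / 0.8 guidelines above.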

3. Algorithm: how does the method work?

Model mechanics

  • Bayesian inferential tests start from a prior belief (in clinical research, often a mildly informative one), update it with the observed data, and produce a posterior belief.
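The prior-to-posterior update can be sketched with the simplest conjugate case, a Beta prior on a proportion updated by binomial data. This is only a minimal illustration under assumed numbers (the Beta(2, 2) prior and the 30-out-of-100 data are hypothetical); it is not how rstan or brms fit models, since those use sampling rather than a closed-form update.

```r
# Prior belief about a proportion: Beta(2, 2), mildly informative, centred on 0.5
prior_a <- 2
prior_b <- 2

# Hypothetical data: 30 events among 100 patients
events <- 30
n <- 100

# Conjugate update: posterior is Beta(a + events, b + non-events)
post_a <- prior_a + events
post_b <- prior_b + (n - events)

# Posterior mean and 95% credible interval
post_mean <- post_a / (post_a + post_b)
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)

print(post_mean)
print(cred_int)
```

With more data the posterior is dominated by the observed proportion; with little data it stays closer to the prior.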

Describing with code

  • Explanatory analysis

# Outcome, predictors, and (here empty) confounders for the unadjusted analysis
outcomes <- c("qol")
predictors <- c("gender", "age")
confounders <- c()
# `patients` is the mock dataset created above
expanalysis <- sdatools::ExplanatoryAnalysis(
  patients, predictors, confounders, outcomes,
  split_predictors = TRUE,
  preprocess_missing = FALSE,
  preprocess_linear_combos = FALSE,
  preprocess_nzv = FALSE,
  preprocess_high_correlation = FALSE,
  labels = NULL
)
# Table of predicted means, transposed for display
knitr::kable(t(sdatools::predictedMeans(expanalysis)))

Data science packages

  • rstan [1] and brms [2] for Bayesian methods.
  • Frequentist t-tests, chi-square tests, and correlation tests are available in base R, as well as in dozens of other packages.
  • epiR for Population Attributable Fraction [3].
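For intuition about what a PAF measures, the sketch below uses Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure in the population and RR is the relative risk. This is a hand-rolled base-R illustration, not the epiR interface; the function name `paf_levin` and the input values are hypothetical.

```r
# Levin's formula for the population attributable fraction.
# p_exposed: prevalence of the risk factor in the population (0 to 1)
# rr: relative risk of disease in exposed vs. unexposed
paf_levin <- function(p_exposed, rr) {
  stopifnot(p_exposed >= 0, p_exposed <= 1, rr > 0)
  excess <- p_exposed * (rr - 1)
  excess / (1 + excess)
}

# Hypothetical inputs: 30% of the population exposed, relative risk of 2
paf <- paf_levin(p_exposed = 0.3, rr = 2)
print(paf)
```

Under these assumed inputs, the result is the share of cases that would not occur if the exposure were removed, provided the relative risk reflects a causal effect.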

Suggested companion methods

Learning materials

  1. Books

4. Output: how do I interpret this method's results?

Typical tables and plots and corresponding text description

  • Table one

vars <- c("age", "gender", "qol", "Diabetes")
strata <- c("Cancer")
sdatools::tableOne(Data, vars, strata)

  • Table description: r table_nums("tableOne", "Sample description.", "cite") displays a description of the study sample. We present a comparison between patients with and without a cancer diagnosis. Our total sample included 1000 patients, 673 with a cancer diagnosis and 327 without. The sample's mean age was 50.3 (± 19), and most patients were male (60.4%). Compared to patients with a cancer diagnosis, those without a cancer diagnosis presented a higher frequency of diabetes (76.7% with cancer vs. 80% without, p = 0.01).

  • Table description template: r table_nums("tableOne", "Sample description.", "cite") displays a description of the study sample. We also present a comparison between patients in ( {{group| Insert the projectarms example cancer diagnosis and no cancer diagnosis}}). Our total sample included ( {{Samplenumber| Insert the sample number example 1000}}), ( {{Samplenumber| Insert the first group example 302}}) who underwent ( {{First group| Insert the first group}}) and ( {{Second group sample number| Insert the sample number example 698}}) who underwent ( {{Secondintervention| Insert the Secondgroup}}). The sample's mean age was ( {{Mean age| Insert the mean age}}).

  • Table description: A multiple regression analysis was carried out between sociodemographic variables and qol. The qol was significantly associated with age (p = 0.036) and diabetes diagnosis (p < 0.001).

  • Plots

    • Box plot

sdatools::boxPlot(patients,"age", strata)

    • Scatter plot

sdatools::scatterPlot(patients,"age", "qol")

    • Bar plot

sdatools::barPlot(patients,"Cancer", "qol")

    • Stacked bar plot

sdatools::stackedBarPlot(patients,"Cancer", "qol")

    • Pirate plot

sdatools::piratePlot(patients,"Cancer", "qol")

  • t-test: [group 1] presented a significantly smaller [outcome] than [group 2] ([mean 1 vs. mean 2, p value]).

  • Chi-square tests: a higher frequency of [var 1] was significantly associated with a higher frequency of [var 2] (p value).

  • Pearson correlation test: an increase/decrease in [var 1] was significantly correlated with an increase/decrease in [var 2] (p value).

    a. Variable order: Always follow this order when presenting variables (whether describing them in the methods or in tables in the results):
    1. Sociodemographic variables (age, education, gender, etc.)
    2. Social determinants of health
    3. Comorbidities
    4. Clinical variables (diagnosis, etc.)
    5. Outcomes

    b. Univariate and bivariate analyses should be presented prior to modeling. For example, Kaplan-Meier plots should go before results from Cox proportional hazards models.

Metaphors

Inferential tests help suggest explanations for situations or phenomena observed in the clinic. They also allow us to draw conclusions about a population from data collected in a sample, such as a survey or a clinical trial.

Reporting guidelines

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

  • Frequentist (traditional or non-Bayesian) tests will often provide a p value along with 95% confidence intervals (CIs). P values are easy to misread: a p value is not the probability that the null hypothesis is true, nor the probability that our actual hypothesis is correct. A confidence level describes the proportion of intervals, over repeated sampling, that would contain the true value of whatever you are trying to estimate, for example a mean difference between two samples. For an odds ratio (OR), the 95% CI gives an expected range for the population OR. It also reflects precision: a wide CI indicates low precision of the OR, whereas a narrow CI indicates higher precision. The CI also serves as an indicator of statistical significance for the OR when it does not include the null value (OR = 1). Of importance, negative lower bounds for a proportion are an artifact of the normal approximation to the binomial distribution when the estimate is close to zero [6]. We can keep them as is or replace them with zero.

The p value is the probability of observing an effect at least as extreme as the one observed in the sample data, assuming the null hypothesis is true. A p value below 0.05 means that such an extreme result would occur less than 5% of the time under the null hypothesis (here, OR = 1), which is conventionally taken as sufficient evidence to reject it.

  • Bayesian tests make use of credible intervals. A 95% credible interval contains the true value with 95% probability, given the data, the model, and the prior. This interpretation tends to be more intuitive and straightforward.
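The frequentist quantities described above can be sketched in base R for an odds ratio from a 2x2 table. The counts are hypothetical, and the interval uses the Wald method on the log-odds scale, which is one common choice among several. The second part shows how a normal-approximation interval for a small proportion can dip below zero, while an exact (Clopper-Pearson) interval from `binom.test` cannot.

```r
# Hypothetical 2x2 table: rows = exposure, columns = outcome
tab <- matrix(c(30, 70,
                15, 85),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("yes", "no")))

# Odds ratio, Wald 95% CI on the log scale, and two-sided p value
or        <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
se_log_or <- sqrt(sum(1 / tab))
ci_or     <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)
p_or      <- 2 * pnorm(-abs(log(or) / se_log_or))

print(or)
print(ci_or)  # significant at the 5% level if the interval excludes 1
print(p_or)

# Normal-approximation (Wald) CI for a small proportion: 1 event in 50
p_hat <- 1 / 50
wald  <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / 50)

# Exact (Clopper-Pearson) CI for the same data stays within [0, 1]
exact <- binom.test(1, 50)$conf.int

print(wald)
print(exact)
```

Here the Wald lower bound is negative purely because of the approximation, which is why it can be reported as is or truncated at zero.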

5. SporeData-specific

Data science functions

  • sdatools::tableOne
  • sdatools::boxPlot
  • sdatools::scatterPlot
  • sdatools::barPlot
  • sdatools::stackedBarPlot
  • sdatools::piratePlot
  • sdatools::ExplanatoryAnalysis(data, predictors, confounders, outcomes, split_predictors = TRUE, preprocess_missing = FALSE, preprocess_linear_combos = FALSE, preprocess_nzv = FALSE, preprocess_high_correlation = FALSE, labels = NULL)

References

[1] Stan Development Team. RStan: the R interface to Stan. R package version. 2016;2(1).
[2] Bürkner PC. Advanced Bayesian multilevel modeling with the R package brms. arXiv preprint arXiv:1705.11123. 2017 May 31.
[3] Stevenson M, Nunes T, Heuer C, Marshall J, Sanchez J, Thornton R, Reiczigel J, Robison-Cox J, Sebastiani P, Solymos P, Yoshida K. epiR: Tools for the analysis of epidemiological data. R package version 0.9-62.
[4] Faraone SV. Interpreting estimates of treatment effects: implications for managed care. P & T: A Peer-Reviewed Journal for Formulary Management. 2008;33(12):700-711.
[5] World Health Organization. Metrics: population attributable fraction (PAF).
[6] Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Statistical Science. 2001;16(2):101-133.
[7] Fallahzadeh H, Ostovarfar M, Lotfi MH. Population attributable risk of risk factors for type 2 diabetes; Bayesian methods. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2019 Mar 1;13(2):1365-8.
