01.Association - 02.Inferential tests · sporedata/researchdesigneR Wiki
1. Use cases: in which situations should I use this method?
Inferential tests are used to compare two variables (or a variable against a population value) as an initial exploratory analysis or to explore unadjusted relations. Unadjusted relations are useful because they describe "the world as it is" rather than exploring causes.
2. Input: what kind of data does the method require?
 Cross-sectional or longitudinal data
 Outcome and predictor variables

The appropriate frequentist test is chosen by a decision algorithm based on the variable types and the study design. Common ones include:
 Two-sample t-test: outcome close to a normal distribution, risk factor dichotomous, patients coming from two distinct samples
 Paired t-test
 One-sample t-test
 Chi-square test
 Correlation test
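
As a minimal sketch of how the tests listed above can be run in base R, the block below simulates a small data frame and applies each test once. The variable names, sample size, and effect sizes are illustrative assumptions and are not taken from the wiki's mock dataset.

```r
# Simulated data; every name and value here is an illustrative assumption.
set.seed(42)
n <- 200
dat <- data.frame(
  rural  = rbinom(n, 1, 0.5),        # dichotomous risk factor
  gender = rbinom(n, 1, 0.5),        # second dichotomous variable
  age    = round(runif(n, 18, 85))   # continuous predictor
)
dat$qol_pre  <- rnorm(n, mean = 65 - 5 * dat$rural, sd = 10)  # continuous outcome
dat$qol_post <- dat$qol_pre + rnorm(n, mean = 3, sd = 5)      # repeated measure

# Two-sample t-test (Welch by default): continuous outcome by dichotomous factor
two_sample <- t.test(qol_pre ~ rural, data = dat)

# Paired t-test: the same patients measured twice
paired <- t.test(dat$qol_post, dat$qol_pre, paired = TRUE)

# One-sample t-test: sample mean against an assumed population value of 60
one_sample <- t.test(dat$qol_pre, mu = 60)

# Chi-square test: association between two categorical variables
chi_sq <- chisq.test(table(dat$rural, dat$gender))

# Pearson correlation test: association between two continuous variables
corr <- cor.test(dat$age, dat$qol_pre, method = "pearson")

# Each result is an "htest" object; p values and intervals can be extracted
two_sample$p.value
two_sample$conf.int
```

Each call returns an object of class `htest`, so the estimate, confidence interval, and p value can be pulled out in the same way for all five tests.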

Mock dataset
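library(fabricatr)

# Simulate 1,000 mock patients
patients <- fabricate(
  N = 1000,
  gender = draw_binary(0.5, N = N),
  qol = round(runif(N, 45, 90)),   # quality of life score between 45 and 90
  age = round(runif(N, 18, 85)),
  # qol never falls below 45, so the qol < 40 branch (prob = 0.4) is never taken
  prepost = draw_binary(prob = ifelse(qol < 40, 0.4, 0.7), N = N),
  rural = draw_binary(prob = ifelse(qol < 55, 0.3, 0.9), N = N)
)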
3. Algorithm: how does the method work?
Model mechanics
 Bayesian inferential tests start from a prior belief (in clinical research, often mildly informative), update it with the observed data, and produce a posterior belief.
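
The prior-to-posterior update can be illustrated with a conjugate Beta-Binomial model, which needs only base R. This is a sketch under stated assumptions: the Beta(2, 2) prior and the counts (14 responders among 20 patients) are hypothetical, and real analyses with rstan or brms would fit a full model instead.

```r
# Mildly informative prior on a response proportion: Beta(2, 2), centred on 0.5
prior_a <- 2
prior_b <- 2

# Hypothetical data: 14 responders among 20 patients
successes <- 14
failures  <- 6

# Conjugate update: the posterior is Beta(prior_a + successes, prior_b + failures)
post_a <- prior_a + successes
post_b <- prior_b + failures

# Posterior summaries
posterior_mean <- post_a / (post_a + post_b)
credible_95    <- qbeta(c(0.025, 0.975), post_a, post_b)  # equal-tailed interval

posterior_mean
credible_95
```

The posterior mean sits between the prior mean (0.5) and the observed proportion (0.7), which is the sense in which the data update the prior.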
Data science packages
 rstan [1] and brms [2] for Bayesian methods.
 Frequentist t-tests, chi-square tests, and correlation tests are available in base R, as well as in dozens of other packages.
 epiR for Population Attributable Fraction [3].
Suggested companion methods
 Causal modeling if the goal is to investigate causes.
 Machine learning if the goal is to conduct personalized predictions.
Learning materials
 Books
 Articles
4. Output: how do I interpret this method's results?
For each of the tests below, there is a frequentist as well as a Bayesian version of the test.
 t-tests: compare a continuous variable across two groups. A t-test requires a continuous outcome variable and a dichotomous (yes/no) predictor.
 Chi-square tests: compare two categorical variables. A chi-square test requires a categorical outcome variable and a categorical predictor variable.
 Correlation tests: compare two continuous variables.
 Standardized Mean Difference (SMD): compares a continuous variable across two groups, i.e. a continuous and a dichotomous variable. We use the following guidelines when interpreting SMD magnitude: SMD = 0.2 corresponds to a small effect, SMD = 0.5 to a medium effect, and SMD = 0.8 to a large effect [4].
 Population Attributable Fraction (PAF): calculates the contribution of individual risk factors to the burden of disease [5].
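
The last two measures can be computed by hand. The sketch below assumes the SMD is Cohen's d with a pooled standard deviation and that the PAF follows Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the exposure prevalence and RR the relative risk. The function names `smd` and `paf_levin` are hypothetical helpers written for this example; they are not part of epiR or sdatools.

```r
# Standardized mean difference (Cohen's d with pooled SD).
# Groups are ordered by sorted label; the result is first group minus second.
smd <- function(x, group) {
  g <- split(x, group)
  stopifnot(length(g) == 2)
  n1 <- length(g[[1]])
  n2 <- length(g[[2]])
  pooled_sd <- sqrt(
    ((n1 - 1) * var(g[[1]]) + (n2 - 1) * var(g[[2]])) / (n1 + n2 - 2)
  )
  (mean(g[[1]]) - mean(g[[2]])) / pooled_sd
}

# Population attributable fraction via Levin's formula
paf_levin <- function(p_exposed, rr) {
  p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))
}

# Toy data: group means 12 and 18, pooled SD 2
x     <- c(10, 12, 14, 16, 18, 20)
group <- c("a", "a", "a", "b", "b", "b")
smd(x, group)

# Hypothetical exposure prevalence of 30% and relative risk of 2
paf_levin(p_exposed = 0.3, rr = 2)
```

With the guideline thresholds above, the absolute value of the SMD is what gets classified as small, medium, or large; the sign only reflects which group is listed first.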
Typical tables and plots and corresponding text description
 Table one
vars <- c("age", "gender", "qol", "Diabetes")
strata <- c("Cancer")
sdatools::tableOne(Data, vars, strata)

Table description:
r table_nums("tableOne", "Sample description.", "cite")
displays a description of the study sample. We present a comparison between patients with and without a cancer diagnosis. Our total sample included 1000 patients, 673 with a cancer diagnosis and 327 with no cancer diagnosis. The sample's mean age was 50.3 (± 19) years and most patients were male (60.4%). Compared to patients with a cancer diagnosis, those with no cancer diagnosis presented a higher incidence of diabetes (76.7 vs. 80, p = 0.01).
Table description template:
r table_nums("tableOne", "Sample description.", "cite")
displays a description of the study sample. We also present a comparison between patients in ({{group Insert the projectarms example cancer diagnosis and no cancer diagnosis}}). Our total sample included ({{Samplenumber Insert the sample number example 1000}}), ({{Samplenumber Insert the first group example 302}}) who underwent ({{First group Insert the first group}}) and ({{Second group sample number Insert the sample number example 698}}) who underwent ({{Secondintervention Insert the Secondgroup}}). The sample's mean age was ({{Mean age Insert the mean age}}).
 Explanatory analysis
outcomes <- c("qol")
predictors <- c("gender", "age")
confounders <- c()
expanalysis <- sdatools::ExplanatoryAnalysis(
  data, predictors, confounders, outcomes,
  split_predictors = TRUE,
  preprocess_missing = FALSE,
  preprocess_linear_combos = FALSE,
  preprocess_nzv = FALSE,
  preprocess_high_correlation = FALSE,
  labels = NULL
)
# Transposed table of predicted means
knitr::kable(t(sdatools::predictedMeans(expanalysis)))

Table description: A multiple regression analysis was carried out between sociodemographic variables and qol. The qol was significantly associated with age (p = 0.036) and diabetes diagnosis (p < 0.001).
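
The model fitted inside `sdatools::ExplanatoryAnalysis` is not documented on this page. As a rough base R analogue, assuming an ordinary least squares regression of qol on age and gender, the sketch below fits the model on simulated data and derives predicted means by gender. The data, coefficients, and object names are illustrative assumptions.

```r
# Simulated data; names and effect sizes are assumptions for illustration only.
set.seed(1)
n <- 500
sim <- data.frame(
  age    = round(runif(n, 18, 85)),
  gender = rbinom(n, 1, 0.5)
)
sim$qol <- 80 - 0.2 * sim$age + 2 * sim$gender + rnorm(n, sd = 8)

# Multiple linear regression of the outcome on both predictors
fit <- lm(qol ~ age + gender, data = sim)

# Coefficient table: estimates, standard errors, t values, p values
coef_table <- summary(fit)$coefficients

# Predicted mean qol for each gender, holding age at the sample mean
new_data <- data.frame(age = mean(sim$age), gender = c(0, 1))
predicted_means <- predict(fit, newdata = new_data, interval = "confidence")

coef_table
predicted_means
```

The p values reported in a table description like the one above would come from the `Pr(>|t|)` column of `coef_table`.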

Plots
 Box plot

sdatools::boxPlot(patients,"age", strata)
 Scatter plot
sdatools::scatterPlot(patients,"age", "qol")
 Bar plot
sdatools::barPlot(patients,"Cancer", "qol")
 Stacked bar plot
sdatools::stackedBarPlot(patients,"Cancer", "qol")
 Pirate plot
sdatools::piratePlot(patients,"Cancer", "qol")

t-test: [group 1] presented a significantly smaller [outcome] than [group 2], [mean 1 vs mean 2, p value].

Chi-square test: a higher frequency of [var 1] was significantly associated with a higher frequency of [var 2] (p value).

Pearson correlation test: an increase/decrease in [var 1] was significantly correlated with an increase/decrease in [var 2] (p value).
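
A sentence template like the t-test one can be filled in programmatically from the test output. The helper below is a hypothetical sketch: `describe_t_test`, `format_p`, the 0.05 threshold, and the wording used for non-significant results are assumptions made for this example, not SporeData conventions.

```r
# Format a p value for reporting
format_p <- function(p) {
  if (p < 0.001) "p < 0.001" else sprintf("p = %.3f", p)
}

# Fill the t-test sentence template from a Welch two-sample t-test.
# All argument names are hypothetical and chosen for this sketch.
describe_t_test <- function(x, y, outcome, label_x, label_y, alpha = 0.05) {
  res <- t.test(x, y)
  m <- unname(res$estimate)  # means of x and y
  if (res$p.value >= alpha) {
    # Assumed wording for a non-significant result
    return(sprintf(
      "There was no significant difference in %s between %s and %s (%.1f vs. %.1f, %s).",
      outcome, label_x, label_y, m[1], m[2], format_p(res$p.value)
    ))
  }
  direction <- if (m[1] < m[2]) "smaller" else "larger"
  sprintf(
    "%s presented a significantly %s %s than %s (%.1f vs. %.1f, %s).",
    label_x, direction, outcome, label_y, m[1], m[2], format_p(res$p.value)
  )
}

describe_t_test(
  x = c(1, 2, 3, 4, 5), y = c(11, 12, 13, 14, 15),
  outcome = "quality of life", label_x = "Group 1", label_y = "group 2"
)
```

Generating the sentence from the `htest` object keeps the reported means and p value in sync with the analysis when the data change.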
a. Variable order: Always follow this order when presenting variables (describing in methods or in tables from results):
1. Sociodemographic variables (age, education, gender, etc).
2. Social determinants of health
3. Comorbidities
4. Clinical variables (diagnosis, etc)
5. Outcomes
b. Univariate and bivariate analyses should be presented prior to modeling. For example, Kaplan-Meier plots should go before results from Cox proportional hazards models.
Associated concepts
Inferential tests help suggest explanations for situations or phenomena observed in the clinic. They also make it possible to draw conclusions and make inferences from data collected in surveys or observed in clinical trials.
Reporting guidelines
Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.
 Frequentist (traditional or non-Bayesian) tests will often provide a p value along with 95% confidence intervals (CIs). The interpretation of p values is complex: a p value is the probability of observing data at least as extreme as those obtained if the null hypothesis were true, not the probability that our actual hypothesis is correct. A given confidence level represents the proportion of possible confidence intervals, across repeated samples, that contain the true value of whatever you might be trying to estimate, for example a mean difference between two samples.
 Bayesian tests make use of credible intervals. Given the model and the prior, a 95% credible interval contains the true value with 95% probability. This interpretation tends to be more intuitive and straightforward.
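
The repeated-sampling reading of a confidence interval can be checked by simulation. The sketch below assumes a normal population with a known true mean of 50, draws many samples, and counts how often the 95% CI from a one-sample t-test covers that mean. The population parameters, sample size, and number of simulations are arbitrary choices for illustration.

```r
set.seed(2024)
true_mean <- 50    # known only because the data are simulated
n_sims    <- 2000  # number of repeated samples

# For each simulated sample, record whether the 95% CI contains the true mean
covered <- replicate(n_sims, {
  x  <- rnorm(30, mean = true_mean, sd = 10)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contain the true mean; expected to be near 0.95
coverage <- mean(covered)
coverage
```

Any single interval either contains the true mean or does not; the 95% refers to the long-run proportion across repeated samples, which is what `coverage` estimates.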
5. SporeData-specific
Templates
Data science functions
 sdatools::tableOne
 sdatools::boxPlot
 sdatools::scatterPlot
 sdatools::barPlot
 sdatools::stackedBarPlot
 sdatools::piratePlot
 sdatools::ExplanatoryAnalysis(data, predictors, confounders, outcomes, split_predictors = TRUE, preprocess_missing = FALSE, preprocess_linear_combos = FALSE, preprocess_nzv = FALSE, preprocess_high_correlation = FALSE, labels = NULL)
General description
Clinical areas of interest
Variable categories
Linkage to other datasets
Limitations
Related publications
SporeData data dictionaries
References
[1] Stan Development Team. RStan: the R interface to Stan. R package version. 2016;2(1).
[2] Bürkner PC. Advanced Bayesian multilevel modeling with the R package brms. arXiv preprint arXiv:1705.11123. 2017 May 31.
[3] Stevenson M, Nunes T, Heuer C, Marshall J, Sanchez J, Thornton R, Reiczigel J, Robison-Cox J, Sebastiani P, Solymos P, Yoshida K. epiR: Tools for the analysis of epidemiological data. R package version 0.9-62.
[4] Faraone SV. Interpreting estimates of treatment effects: implications for managed care. P & T: A Peer-Reviewed Journal for Formulary Management. 2008;33(12):700–711.
[5] World Health Organization. Metrics: population attributable fraction (PAF).