Statistical Recipes in Python - mauriceling/mauriceling.github.io GitHub Wiki

Descriptive Statistics

Methods to summarize a given set of data.

  1. Arithmetic mean
  2. Geometric mean
  3. Harmonic mean
  4. Maximum
  5. Minimum
  6. Moment
  7. Kurtosis
  8. Sample standard deviation
  9. Sample standard error
  10. Sample variance
  11. Skew
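This is a minimal sketch of these summaries using only Python's standard library. It illustrates the definitions and is not the code behind the recipes; the helper name `describe` is hypothetical. Skewness and kurtosis use the biased (population) moment definitions, and kurtosis is reported as excess kurtosis, so a normal distribution scores 0. These conventions should match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`.

```python
import math
import statistics


# Illustrative sketch; `describe` is a hypothetical helper, not from the recipes.
def describe(data):
    """Summarize a sample with the descriptive statistics listed above."""
    n = len(data)
    mean = statistics.fmean(data)

    def central_moment(k):
        # k-th central moment, population form (divides by n)
        return sum((x - mean) ** k for x in data) / n

    m2 = central_moment(2)
    m3 = central_moment(3)
    m4 = central_moment(4)
    sd = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    return {
        "mean": mean,
        "geometric_mean": statistics.geometric_mean(data),  # needs positive values
        "harmonic_mean": statistics.harmonic_mean(data),
        "max": max(data),
        "min": min(data),
        "moment_3": m3,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis: 0 for a normal distribution
        "stdev": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
        "variance": statistics.variance(data),  # sample variance, n - 1 denominator
        "skew": m3 / m2 ** 1.5,  # biased (population) skewness
    }


summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```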

Normality Tests

Methods to examine whether a set of values is normally distributed.

  1. Kurtosis test: test whether the kurtosis of the data differs from that of a normal distribution
  2. Jarque-Bera test: normality test for large sample sizes (n > 2000)
  3. Shapiro-Wilk test for normality: normality test for small sample sizes
  4. Skew test: test whether the skewness of the data differs from that of a normal distribution
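The Jarque-Bera test combines the skewness and kurtosis checks into one statistic, JB = n/6 * (S^2 + K^2/4). Here S is the sample skewness and K the excess kurtosis. The sketch below uses only the standard library, and `jarque_bera` is a hypothetical helper, not the recipe's function. SciPy provides the tests in this section as `scipy.stats.kurtosistest`, `scipy.stats.jarque_bera`, `scipy.stats.shapiro` and `scipy.stats.skewtest`. The p-value relies on a chi-square approximation, so it is only trustworthy for large samples, which is why the example draws 5000 values.

```python
import math
import random
import statistics


# Illustrative sketch; `jarque_bera` is a hypothetical helper, not from the recipes.
def jarque_bera(data):
    """Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurtosis ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function has the closed form exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


random.seed(0)
normal_sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]
print(jarque_bera(normal_sample))
print(jarque_bera(skewed_sample))  # strongly non-normal: p-value is essentially 0
```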

Equal Variance Tests

Methods to examine whether two or more sets of values have the same variance (or standard deviation).

  1. Bartlett's test
  2. Fligner-Killeen test
  3. Levene's test
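Levene's test is a one-way ANOVA on each observation's absolute deviation from its group centre. The standard-library sketch below computes only the W statistic, and the function name `levene_statistic` is hypothetical. The p-value comes from an F distribution with k - 1 and N - k degrees of freedom. That step is left to a library such as `scipy.stats.levene`, whose default `center='median'` gives the Brown-Forsythe variant.

```python
import statistics


# Illustrative sketch; `levene_statistic` is a hypothetical helper, not from the recipes.
def levene_statistic(*groups, center=statistics.median):
    """Levene's W statistic; center=median gives the Brown-Forsythe variant."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group's center
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(x - c) for x in g])
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    # Compare W against an F distribution with (k - 1, n_total - k) degrees of freedom
    return (n_total - k) / (k - 1) * between / within


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]  # same shape as a, but twice the spread
print(levene_statistic(a, b))                           # median-centred
print(levene_statistic(a, b, center=statistics.fmean))  # classic mean-centred Levene
```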

Parametric and Nonparametric Correlations

Methods to measure the strength and direction of association between 2 sets of values.

  1. Kendall's tau: correlation measure for ordinal data
  2. Pearson's correlation coefficient: correlation measure for normally distributed data
  3. Point biserial correlation coefficient: correlation measure between a binary variable and a continuous variable
  4. Somers' D: asymmetric measure of ordinal association
  5. Spearman's rank correlation coefficient: correlation measure for ranked data
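The sketch below shows how the correlation measures differ. It covers Pearson's r, Spearman's rho (Pearson's r computed on average ranks) and Kendall's tau-b, using only the standard library. All function names are hypothetical. SciPy provides `scipy.stats.pearsonr`, `spearmanr`, `kendalltau`, `pointbiserialr` and `somersd` for the full set, including p-values.

```python
import math
import statistics


# Illustrative sketch; all function names are hypothetical, not from the recipes.
def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = untied_x = untied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += int(dx != 0)
            untied_y += int(dy != 0)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau_b(x, y))
```

On this monotonic but non-linear example, both rank-based measures reach 1 while Pearson's r stays below 1. The point biserial coefficient is Pearson's r with one variable coded 0/1, so `pearson_r` covers that case too.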

Parametric Tests

Statistical tests assuming normality in data set(s).

  1. Alexander-Govern test: test for equal means in 2 or more samples without assuming equal variances
  2. ANOVA - One-way: test for equal means in 2 or more samples assuming equal variances
  3. t-test - 2-samples (independent samples) assuming equal variance
  4. t-test - 2-samples (independent samples) assuming unequal variance
  5. t-test - paired (dependent samples)
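The statistics behind the parametric tests are short formulas. The sketch below uses only the standard library, and all names are hypothetical. It computes:

- the pooled-variance t
- Welch's t, with the Welch-Satterthwaite degrees of freedom
- the paired t
- the one-way ANOVA F

It stops at the statistic, because turning it into a p-value needs the t or F distribution. For example, a two-sided t-test p-value is `2 * scipy.stats.t.sf(abs(t), df)`. The ready-made SciPy versions are `scipy.stats.ttest_ind` (with `equal_var`), `ttest_rel`, `f_oneway` and `alexandergovern`. The Alexander-Govern test is not sketched here.

```python
import math
import statistics


# Illustrative sketch; all function names are hypothetical, not from the recipes.
def student_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def paired_t(a, b):
    """Paired t statistic: a one-sample t test on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    k, n = len(groups), sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


a, b = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
print(student_t(a, b), welch_t(a, b), one_way_anova_f(a, b))
print(paired_t([10, 12, 9, 11], [12, 13, 11, 14]))
```

For two groups, the one-way ANOVA F equals the square of the pooled t statistic, which makes a handy sanity check.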

Nonparametric Tests

Statistical tests without assuming normality in data set(s), also known as distribution-free tests.

  1. Brunner-Munzel test: non-parametric version of 2-samples (independent samples) t-test without assuming equal variances
  2. Chi-Square test: test whether observed frequencies follow an expected distribution (goodness of fit)
  3. Cramér-von Mises test: test whether 2 distributions are equal
  4. Cressie-Read power divergence test: test whether observed frequencies follow an expected distribution (goodness of fit)
  5. Epps-Singleton test: test whether 2 distributions are equal
  6. Freeman-Tukey test: test whether observed frequencies follow an expected distribution (goodness of fit)
  7. G-test: test whether observed frequencies follow an expected distribution (goodness of fit)
  8. Kolmogorov-Smirnov test: test whether 2 distributions are equal
  9. Kruskal-Wallis H-test: non-parametric version of ANOVA - One-way
  10. Mann-Whitney U test: non-parametric version of 2-samples (independent samples) t-test assuming equal variances
  11. Neyman's test of goodness of fit: test whether observed frequencies follow an expected distribution
  12. Page's L test: test for a monotonic trend in observations across treatments
  13. Wilcoxon rank-sum test: non-parametric version of 2-samples (independent samples) t-test assuming equal variances
  14. Wilcoxon signed-rank test: non-parametric version of t-test - paired (dependent samples)
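Five of the goodness-of-fit tests above belong to one family, the Cressie-Read power divergence statistic: chi-square, Cressie-Read, Freeman-Tukey, G-test and Neyman's. They differ only in the exponent lambda. The sketch below computes that statistic with the standard library, plus the Mann-Whitney U and two-sample Kolmogorov-Smirnov D statistics. The function names are hypothetical and p-values are omitted. The corresponding SciPy calls are `scipy.stats.power_divergence` (its `lambda_` argument selects the family member), `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp`.

```python
import bisect
import math


# Illustrative sketch; all function names are hypothetical, not from the recipes.
def power_divergence(observed, expected, lambda_):
    """Cressie-Read power divergence statistic for goodness of fit.

    lambda_ = 1 gives Pearson's chi-square, 0 the G-test (log-likelihood),
    -1/2 Freeman-Tukey, -2 Neyman's statistic and 2/3 the Cressie-Read choice.
    Assumes all observed counts are positive when lambda_ < 0.
    """
    pairs = list(zip(observed, expected))
    if lambda_ == 0:  # limit as lambda_ -> 0
        return 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    if lambda_ == -1:  # limit as lambda_ -> -1
        return 2 * sum(e * math.log(e / o) for o, e in pairs)
    scale = 2 / (lambda_ * (lambda_ + 1))
    return scale * sum(o * ((o / e) ** lambda_ - 1) for o, e in pairs)


def mann_whitney_u(a, b):
    """U statistic for the first sample: pairs with a > b, ties counted as 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def ks_2samp_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gaps = (
        abs(bisect.bisect_right(a_sorted, t) / len(a_sorted)
            - bisect.bisect_right(b_sorted, t) / len(b_sorted))
        for t in a_sorted + b_sorted
    )
    return max(gaps)


observed, expected = [10, 20, 30], [20, 20, 20]
for lam in (1, 0, -0.5, -2, 2 / 3):
    print(lam, power_divergence(observed, expected, lam))
print(mann_whitney_u([1, 2, 3], [2, 4, 5]))
print(ks_2samp_statistic([1, 2, 3], [4, 5, 6]))
```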

Contingency Tables

Methods to analyze 2x2 and MxN contingency tables.

  1. Barnard's exact test: test whether the column variable is independent of the row variable in a 2x2 contingency table
  2. Boschloo’s exact test: test whether the column variable is independent of the row variable in a 2x2 contingency table
  3. Chi-square test of independence: test whether the column variable is independent of the row variable in an MxN contingency table
  4. Cramér’s V: measure the degree of association between two nominal variables in an MxN contingency table
  5. Fisher's exact test: test whether the column variable is independent of the row variable in a 2x2 contingency table
  6. G-test of independence: test whether the column variable is independent of the row variable in an MxN contingency table
  7. Pearson’s contingency coefficient: measure the degree of association between two nominal variables in an MxN contingency table
  8. Relative risk: measure how much exposure changes the risk of an outcome (ratio of the risks in the exposed and unexposed groups) using a 2x2 contingency table
  9. Odds ratio: measure how much exposure changes the odds of an outcome (ratio of the odds in the exposed and unexposed groups) using a 2x2 contingency table
  10. Tschuprow’s T: measure the degree of association between two nominal variables in an MxN contingency table
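Most of the contingency-table measures above are built on the chi-square statistic of independence. Fisher's exact test is a direct hypergeometric calculation. The sketch below shows both using the standard library. Function names are hypothetical, and the 2x2 helpers assume exposure in rows and outcome (yes, no) in columns. Two caveats apply:

- `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from this uncorrected one.
- The odds ratio here is the simple cross-product (a*d)/(b*c). `scipy.stats.contingency.odds_ratio` defaults to a conditional maximum likelihood estimate.

```python
import math


# Illustrative sketch; all function names are hypothetical, not from the recipes.
def chi2_independence(table):
    """Pearson chi-square statistic of independence (no continuity correction)."""
    n = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return sum(
        (table[i][j] - row_sums[i] * col_sums[j] / n) ** 2 / (row_sums[i] * col_sums[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )


def association_measures(table):
    """Cramér's V, Tschuprow's T and Pearson's contingency coefficient."""
    chi2 = chi2_independence(table)
    n = sum(map(sum, table))
    r, c = len(table), len(table[0])
    return {
        "cramers_v": math.sqrt(chi2 / (n * min(r - 1, c - 1))),
        "tschuprows_t": math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1)))),
        "pearsons_c": math.sqrt(chi2 / (chi2 + n)),
    }


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]] via the hypergeometric law."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(a + b + c + d, col1)

    def prob(x):  # probability that the top-left cell equals x, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    probs = [prob(x) for x in support]
    p_obs = prob(a)
    # Add up every table no more probable than the observed one
    # (the small relative tolerance absorbs floating-point rounding)
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


def relative_risk_and_odds_ratio(table):
    """Rows: exposed / unexposed; columns: outcome yes / no."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d)), (a * d) / (b * c)


table = [[3, 1], [1, 3]]
print(chi2_independence(table), association_measures(table))
print(fisher_exact_two_sided(table), relative_risk_and_odds_ratio(table))
```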

Combining Multiple Tests

Methods to combine p-values from independent tests on the same hypothesis.

  1. Fisher's combined probability test
  2. Mudholkar-George method
  3. Pearson's method
  4. Stouffer's Z-score method
  5. Tippett’s method
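Three of the combining methods have closed forms that fit in a few lines. The sketch below uses only the standard library, its function names are hypothetical, and it assumes independent, one-sided p-values:

- Fisher's method refers X = -2 * sum(ln p_i) to a chi-square distribution with 2k degrees of freedom. That distribution's survival function has a closed form for even degrees of freedom.
- Stouffer's method sums the z-scores and divides by sqrt(k).
- Tippett's method uses only the smallest p-value.

Pearson's and the Mudholkar-George methods are not sketched. `scipy.stats.combine_pvalues` offers them through its `method` argument.

```python
import math
from statistics import NormalDist


# Illustrative sketch; all function names are hypothetical, not from the recipes.
def fisher_method(pvalues):
    """Fisher: X = -2 * sum(ln p) is chi-square with 2k df under the joint null."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even df 2k, in closed form:
    # P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


def stouffer_method(pvalues):
    """Stouffer: sum of z-scores divided by sqrt(k), referred to the standard normal."""
    std_normal = NormalDist()
    z = sum(std_normal.inv_cdf(1 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1 - std_normal.cdf(z)


def tippett_method(pvalues):
    """Tippett: probability that the smallest of k uniform p-values is this small."""
    return 1 - (1 - min(pvalues)) ** len(pvalues)


pvalues = [0.01, 0.01, 0.01]
print(fisher_method(pvalues), stouffer_method(pvalues), tippett_method(pvalues))
```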