Dealing with Non Normal Distribution - SoojungHong/StatisticalMind GitHub Wiki

Reasons for Non-normality

  • Outliers can cause your data to become skewed. The mean is especially sensitive to outliers. Try removing any extreme high or low values and testing your data again.
  • Multiple distributions may be combined in your data, giving the appearance of a bimodal or multimodal distribution. For example, pooling two sets of normally distributed test results with different means can give the appearance of bimodal data.
  • Insufficient data can cause a normal distribution to look completely scattered. For example, classroom test results are usually normally distributed. An extreme example: if you choose three random students and plot the results on a graph, you won't get a normal distribution. You might get something that looks uniform (e.g. 62 62 63) or something that looks skewed (e.g. 80 92 99). If you are in doubt about whether you have a sufficient sample size, collect more data.
  • Data may be inappropriately graphed. For example, if you were to graph people’s weights on a scale of 0 to 1000 lbs, you would have a skewed cluster to the left of the graph. Make sure you’re graphing your data on appropriately labeled axes.
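The first point above, that the mean is far more sensitive to outliers than the median, can be checked with a few lines of Python. This is only a minimal sketch: the scores are made-up illustration data, and `sample_skewness` is a hypothetical helper (not from the original page) that computes the simple moment-based skewness.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores.

    Values near 0 suggest a symmetric sample; large positive values
    indicate a long right tail.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Made-up test scores, roughly symmetric around 72.
scores = [62, 65, 68, 70, 71, 73, 75, 78, 80, 83]
# The same scores plus one extreme (probably mistyped) value.
with_outlier = scores + [250]

for label, data in (("clean", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", round(statistics.fmean(data), 2),
        "median =", statistics.median(data),
        "skewness =", round(sample_skewness(data), 2),
    )
```

In this sketch the single extreme value pulls the mean up by more than 16 points while the median moves by one, and the skewness goes from near zero to strongly positive.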

Dealing with Non-normal distribution

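  1. Several tests, including the one-sample z-test, the t-test and ANOVA, assume normality. You may still be able to run these tests if your sample size is large enough (usually over 20 items, though a common rule of thumb is 30 or more), because the sampling distribution of the mean approaches normality as the sample grows.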

  2. You can also choose to transform the data with a function, forcing it to fit a normal model. A common choice is the Box-Cox transformation, which requires strictly positive values.

  3. If you have a sample that is skewed, or one that naturally fits another distribution type, you may want to run a nonparametric test. A nonparametric test is one that doesn't assume the data fits a specific distribution type. Nonparametric tests include the Wilcoxon signed-rank test, the Mann-Whitney U test and the Kruskal-Wallis test.
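Options 2 and 3 can be sketched in plain Python. The code below is an illustration under stated assumptions, not a full implementation: `box_cox`, `skewness` and `mann_whitney_u` are hypothetical helper names, the data are made up, and the Box-Cox parameter is fixed by hand rather than estimated. In practice a statistics library (for example SciPy's `scipy.stats.boxcox` and `scipy.stats.mannwhitneyu`) would also estimate the parameter and report a p-value.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda.

    y = (x**lam - 1) / lam for lam != 0, and y = ln(x) for lam == 0.
    Only defined for strictly positive data.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def skewness(values):
    """Moment-based skewness (mean of cubed z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def mann_whitney_u(xs, ys):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (x, y) with x > y, with ties counted as 0.5.
    U_x + U_y always equals len(xs) * len(ys).
    """
    u_x = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )
    return u_x, len(xs) * len(ys) - u_x


# Option 2: a made-up right-skewed sample, transformed with lambda = 0 (log).
skewed = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]
transformed = box_cox(skewed, 0)
print("skewness before:", round(skewness(skewed), 2))
print("skewness after: ", round(skewness(transformed), 2))

# Option 3: compare two made-up groups without assuming normality.
group_a = [12, 15, 18, 21, 30]
group_b = [22, 25, 31, 40, 55]
u_a, u_b = mann_whitney_u(group_a, group_b)
print("U statistic:", min(u_a, u_b))
```

A small U relative to `len(group_a) * len(group_b)` means one group's values tend to sit below the other's. Turning U into a p-value needs a critical-value table or a library routine, which this sketch deliberately leaves out.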