Dealing with Non Normal Distribution - SoojungHong/StatisticalMind GitHub Wiki
Reasons for Non-normality
- Outliers can cause your data to become skewed. The mean is especially sensitive to outliers. Try removing any extreme high or low values and testing your data again.
- Multiple distributions may be combined in your data, giving the appearance of a bimodal or multimodal distribution. For example, combining two sets of normally distributed test results with different means can give the appearance of bimodal data.
- Insufficient data can make a normal distribution look completely scattered. For example, classroom test results are usually normally distributed, but if you choose three random students and plot their results on a graph, you won’t get a normal distribution. You might get something that looks uniform (e.g. 62, 62, 63) or something that looks skewed (e.g. 80, 92, 99). If you are in doubt about whether your sample size is sufficient, collect more data.
- Data may be inappropriately graphed. For example, if you were to graph people’s weights on a scale of 0 to 1000 lbs, you would have a skewed cluster to the left of the graph. Make sure you’re graphing your data on appropriately labeled axes.
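
The first two causes above can be illustrated with a small sketch. It uses only the Python standard library; the scores, the outlier value of 300, and the group means of 60 and 90 are made-up numbers chosen for illustration, not data from this page.

```python
import random
import statistics

# Hypothetical test scores; 300 is a made-up extreme value.
scores = [70, 72, 74, 75, 76, 78, 80]
with_outlier = scores + [300]

# Adding one outlier moves the mean from 75 to 103.125,
# while the median only moves from 75 to 75.5.
print(statistics.fmean(scores), statistics.median(scores))
print(statistics.fmean(with_outlier), statistics.median(with_outlier))

# Two normal groups with different (assumed) means, pooled into one sample.
rng = random.Random(0)
pooled = ([rng.gauss(60, 5) for _ in range(1000)]
          + [rng.gauss(90, 5) for _ in range(1000)])

def count_between(data, low, high):
    """Count the values that fall in the half-open interval [low, high)."""
    return sum(low <= x < high for x in data)

# Each peak holds far more values than the valley between them,
# which is what makes the pooled data look bimodal.
print(count_between(pooled, 55, 65),
      count_between(pooled, 70, 80),
      count_between(pooled, 85, 95))
```

If a histogram shows two peaks like this, it may be worth checking whether the data can be split by group before testing each group separately.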
Dealing with Non-normal Distribution
- Several tests, including the one-sample Z test, the t test, and ANOVA, assume normality. You may still be able to run these tests if your sample size is large enough (usually over 20 items).
- You can also choose to transform the data with a function so that it fits a normal model more closely, for example with a Box-Cox transformation.
- If you have a sample that is skewed or one that naturally fits another distribution type, you may want to run a non-parametric test. A non-parametric test is one that doesn’t assume the data fits a specific distribution type. Non-parametric tests include the Wilcoxon signed rank test, the Mann-Whitney U test and the Kruskal-Wallis test.
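
The large-sample option in the first item above rests on the central limit theorem: means of larger samples tend to be closer to normally distributed even when the raw data are skewed. The simulation below is a minimal sketch of that idea. The exponential population, the sample size of 30, and the `skewness` helper are assumptions made for this example.

```python
import random
import statistics

def skewness(data):
    """Skewness as the average cubed z-score (0 for a symmetric sample)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)

rng = random.Random(42)

# Raw observations from a strongly right-skewed (exponential) population.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 5000 samples of 30 observations each from the same population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(30)])
    for _ in range(5000)
]

# The sample means are much less skewed than the raw observations.
print(f"skewness of raw data:     {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

How large a sample is large enough depends on how skewed the underlying data are, so a fixed threshold is best read as a rule of thumb.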
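
For the transformation option in the second item above, the Box-Cox transform maps a positive value x to (x^λ - 1) / λ, or to log(x) when λ = 0. The sketch below writes it by hand with the standard library and fixes λ = 0. That value is an assumption chosen because the made-up sample is log-normal; in practice λ is usually estimated from the data, which is what a library routine such as `scipy.stats.boxcox` does.

```python
import math
import random
import statistics

def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

def skewness(data):
    """Skewness as the average cubed z-score (0 for a symmetric sample)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)

rng = random.Random(0)

# Made-up right-skewed sample drawn from a log-normal distribution.
skewed = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]

# lam = 0 is assumed here; it is the log transform.
transformed = [box_cox(x, 0) for x in skewed]

print(f"skewness before: {skewness(skewed):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

A transform only helps when some λ brings the data close to normal, so it is worth re-checking the transformed values before running a test on them.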
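
To show why the non-parametric tests in the third item above need no distributional assumption, here is a minimal sketch of the statistic behind the Mann-Whitney U test. It only compares values pairwise, so it depends on their ordering and not on the shape of the distribution. The two groups are made-up numbers, and the function computes only the U statistic, not a p-value; a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice for a full test.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: the number of pairs (x, y) with x from a
    and y from b where x > y, counting each tie as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical measurements for two independent groups.
group_a = [1.1, 2.3, 2.9, 3.5, 4.0]
group_b = [3.8, 4.4, 5.2, 6.1, 7.5]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two statistics always add up to len(group_a) * len(group_b).
print(u_a, u_b)  # 1.0 24.0
```

A U value near 0 or near the product of the group sizes means one group ranks almost entirely above the other; a value near half of that product means the groups overlap heavily.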