Kruskal Wallis Test (H Test) - Statistics-and-Machine-Learning-with-R/Statistical-Methods-and-Machine-Learning-in-R GitHub Wiki
Kruskal-Wallis Test (H Test)
The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is considered the nonparametric alternative to the one-way ANOVA, and an extension of the Mann-Whitney U test to allow the comparison of more than two independent groups.
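The "one-way ANOVA on ranks" description can be checked numerically. The sketch below uses plain Python with the standard library only, and its function names are hypothetical, not taken from this page's R script. It replaces every observation by its rank in the pooled sample, with tied values sharing their average rank, and computes H = (N - 1) × SS_between / SS_total on those ranks. Without ties this equals the usual rank-sum formula for H; with ties it gives the tie-corrected H. The data are the salary figures (in K) used in the worked steps further down this page.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) hold ranks i+1..j+1; give each their average.
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def h_as_anova_on_ranks(groups):
    """Compute H as (N - 1) * SS_between / SS_total on the pooled ranks."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    grand_mean = mean(ranks)
    ss_total = sum((r - grand_mean) ** 2 for r in ranks)
    ss_between = 0.0
    start = 0
    for g in groups:
        # Groups are laid out consecutively in `pooled`, so slice out this group's ranks.
        group_ranks = ranks[start:start + len(g)]
        ss_between += len(g) * (mean(group_ranks) - grand_mean) ** 2
        start += len(g)
    return (len(pooled) - 1) * ss_between / ss_total


# Salaries in K from the worked example on this page.
women = [23, 41, 54, 66, 90]
men = [45, 55, 60, 70, 72]
minorities = [20, 30, 34, 40, 44]
print(round(h_as_anova_on_ranks([women, men, minorities]), 2))  # 6.72
```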
For example, you could use a Kruskal-Wallis H test to understand whether exam performance, measured on a continuous scale from 0-100, differed based on test anxiety levels (i.e., your dependent variable would be "exam performance" and your independent variable would be "test anxiety level", which has three independent groups: students with "low", "medium" and "high" test anxiety levels).
It is important to realize that the Kruskal-Wallis H test is an omnibus test statistic and cannot tell you which specific groups of your independent variable are statistically significantly different from each other; it only tells you that at least two groups were different. Since you may have three, four, five or more groups in your study design, determining which of these groups differ from each other is important. This is usually done with a post hoc procedure such as Dunn's test, which compares the mean ranks of each pair of groups with a correction for multiple comparisons.
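Dunn's test is one common post hoc choice after a significant H test: it compares the mean ranks of each pair of groups, using the ranks from the pooled sample and a z statistic with the usual tie adjustment to the variance. The sketch below is an illustrative, standard-library-only version, not taken from this page's R script; the function names and the Bonferroni adjustment are assumptions chosen for simplicity (other corrections, such as Holm, are also used in practice).

```python
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on mean ranks, with Bonferroni-adjusted two-sided p-values."""
    pooled = [x for g in groups.values() for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    # Tie adjustment: sum of (t^3 - t) over each set of t tied values, divided by 12(N - 1).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    # Mean rank of each group (groups are laid out consecutively in `pooled`).
    mean_rank, start = {}, 0
    for name, g in groups.items():
        mean_rank[name] = sum(ranks[start:start + len(g)]) / len(g)
        start += len(g)
    pairs = list(combinations(groups, 2))
    results = {}
    for a, b in pairs:
        variance = (n * (n + 1) / 12 - tie_term) * (1 / len(groups[a]) + 1 / len(groups[b]))
        z = (mean_rank[a] - mean_rank[b]) / variance ** 0.5
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        # Bonferroni: multiply by the number of comparisons, capped at 1.
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results


# Salaries in K from the worked example on this page.
groups = {
    "Women": [23, 41, 54, 66, 90],
    "Men": [45, 55, 60, 70, 72],
    "Minorities": [20, 30, 34, 40, 44],
}
for (a, b), (z, p_adj) in dunn_bonferroni(groups).items():
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {p_adj:.3f}")
```

With the salary data, only the Men vs Minorities comparison falls below .05 after the Bonferroni adjustment.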
Assumptions
- Your dependent variable should be measured at the ordinal or continuous level (i.e., interval or ratio). Examples of ordinal variables include Likert scales (e.g., a 7-point scale from "strongly agree" through to "strongly disagree").
- Your independent variable should consist of two or more categorical, independent groups. Typically, a Kruskal-Wallis H test is used when you have three or more categorical, independent groups, but it can also be used for just two groups (although a Mann-Whitney U test is more commonly used in that case).
- You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, there must be different participants in each group, with no participant being in more than one group.
- In order to know how to interpret the results from a Kruskal-Wallis H test, you have to determine whether the distributions in each group (i.e., the distribution of scores for each group of the independent variable) have the same shape (which also means the same variability). If they do, the test can be interpreted as a comparison of medians; if they do not, it can only be interpreted as a comparison of mean ranks. To understand what this means, take a look at the diagram below:
  In the diagram on the left above, the distributions of scores for the "Caucasian", "African American" and "Hispanic" groups have the same shape. In the diagram on the right above, by contrast, the distributions of scores for the groups are not identical (i.e., they have different shapes and variabilities).
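Shape is usually judged visually, for example with side-by-side boxplots or histograms. As a rough numeric companion only, the sketch below (standard library only, hypothetical function name) centers each group on its median and reports its interquartile range, so that spread and asymmetry can be compared side by side. It uses the salary data from the worked example on this page; note that `statistics.quantiles` defaults to the "exclusive" quartile method, so other software may report slightly different quartiles, and with only five values per group such summaries are very noisy.

```python
from statistics import median, quantiles


def centered_summary(groups):
    """For each group: median, interquartile range, and median-centered sorted values."""
    summary = {}
    for name, values in groups.items():
        med = median(values)
        q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
        summary[name] = {
            "median": med,
            "iqr": q3 - q1,
            "centered": sorted(v - med for v in values),
        }
    return summary


# Salaries in K from the worked example on this page.
salaries = {
    "Women": [23, 41, 54, 66, 90],
    "Men": [45, 55, 60, 70, 72],
    "Minorities": [20, 30, 34, 40, 44],
}
for name, s in centered_summary(salaries).items():
    print(f"{name}: median = {s['median']}K, IQR = {s['iqr']}K, centered = {s['centered']}")
```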
Steps for calculation
Sample question: A shoe company wants to know whether three groups of workers have different salaries:
Women: 23K, 41K, 54K, 66K, 90K.
Men: 45K, 55K, 60K, 70K, 72K.
Minorities: 20K, 30K, 34K, 40K, 44K.
1. Sort the data for all groups/samples into ascending order in one combined set. 20K 23K 30K 34K 40K 41K 44K 45K 54K 55K 60K 66K 70K 72K 90K
2. Assign ranks to the sorted data points. Give tied values the average rank. (20K 1) (23K 2) (30K 3) (34K 4) (40K 5) (41K 6) (44K 7) (45K 8) (54K 9) (55K 10) (60K 11) (66K 12) (70K 13) (72K 14) (90K 15)
3. Add up the different ranks for each group/sample. Women: 23K, 41K, 54K, 66K, 90K = 2 + 6 + 9 + 12 + 15 = 44. Men: 45K, 55K, 60K, 70K, 72K = 8 + 10 + 11 + 13 + 14 = 56. Minorities: 20K, 30K, 34K, 40K, 44K = 1 + 3 + 4 + 5 + 7 = 20.
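4. Calculate the H statistic:

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} - 3(n+1)$$

where $n$ = sum of sample sizes for all samples, $c$ = number of samples, $T_j$ = sum of ranks in the $j$th sample, and $n_j$ = size of the $j$th sample. For the salary data, $n = 15$, $c = 3$ and every $n_j = 5$, so

$$H = \frac{12}{15 \times 16} \cdot \frac{44^2 + 56^2 + 20^2}{5} - 3 \times 16 = 0.05 \times 1094.4 - 48 = 6.72$$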
5. Find the critical chi-square value with c - 1 degrees of freedom. With c = 3 groups there are 3 - 1 = 2 degrees of freedom, and at an alpha level of .05 the critical chi-square value is 5.9915.
6. Compare the H value from Step 4 to the critical chi-square value from Step 5. If the H statistic is greater than the critical chi-square value, reject the null hypothesis that the medians are equal; otherwise, there is not enough evidence to suggest that the medians are unequal. In this case, 6.72 is greater than 5.9915, so you can reject the null hypothesis.
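Steps 1 to 6 can be reproduced in a few lines. This is a plain-Python sketch (standard library only, hypothetical function names) that follows the H formula of Step 4 exactly, so it applies no tie correction. For Step 5 it relies on the fact that a chi-square distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), which gives the critical value -2 ln(alpha) in closed form; this shortcut only holds for c = 3 groups, and other degrees of freedom need a chi-square quantile function such as R's `qchisq(0.95, df = 2)`. In R itself, `kruskal.test()` from the base `stats` package runs the whole test and reports a tie-corrected statistic with its p-value.

```python
import math


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Steps 1-4: pool and rank all values, sum ranks per group, then apply the H formula."""
    pooled = [x for g in groups.values() for x in g]  # Step 1: one combined set
    ranks = average_ranks(pooled)                      # Steps 1-2: sort and rank, ties averaged
    n = len(pooled)
    rank_sums, start = {}, 0
    for name, g in groups.items():                     # Step 3: rank sum T_j per group
        rank_sums[name] = sum(ranks[start:start + len(g)])
        start += len(g)
    # Step 4: H = 12 / (n(n+1)) * sum(T_j^2 / n_j) - 3(n+1), with no tie correction.
    h = 12 / (n * (n + 1)) * sum(t ** 2 / len(groups[k]) for k, t in rank_sums.items()) - 3 * (n + 1)
    return h, rank_sums


def chi2_critical_df2(alpha):
    """Step 5 for 2 degrees of freedom only: P(X > x) = exp(-x/2), so x = -2 ln(alpha)."""
    return -2 * math.log(alpha)


salaries = {
    "Women": [23, 41, 54, 66, 90],
    "Men": [45, 55, 60, 70, 72],
    "Minorities": [20, 30, 34, 40, 44],
}
h, rank_sums = kruskal_wallis_h(salaries)
critical = chi2_critical_df2(0.05)
p_value = math.exp(-h / 2)  # upper-tail chi-square (2 df) probability of the observed H
print(rank_sums)                        # {'Women': 44.0, 'Men': 56.0, 'Minorities': 20.0}
print(round(h, 2), round(critical, 4))  # 6.72 5.9915
print("reject H0" if h > critical else "fail to reject H0")  # Step 6: reject H0
```

The resulting p-value of about 0.035 is below .05, which agrees with the decision in Step 6.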
Points to consider
Since it is a nonparametric method, the Kruskal-Wallis test does not assume that the residuals are normally distributed, unlike the analogous one-way analysis of variance. If the researcher can assume that all groups have identically shaped and scaled distributions, differing only in their medians, then the null hypothesis is that the medians of all groups are equal, and the alternative hypothesis is that at least one population median of one group differs from the population median of at least one other group.
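Because only ranks enter the calculation, H is unchanged by any strictly increasing transformation of the data, and an outlier counts only through its rank. The sketch below (plain Python, hypothetical names) illustrates this with the salary data from the worked example: taking logarithms, or inflating the top salary from 90K to a made-up 900K, leaves the ranks, and therefore H, exactly the same.

```python
import math


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H from the rank-sum formula (no tie correction); groups is a list of lists."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        t = sum(ranks[start:start + len(g)])  # rank sum T_j of this group
        total += t ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


salaries = [[23, 41, 54, 66, 90], [45, 55, 60, 70, 72], [20, 30, 34, 40, 44]]
logged = [[math.log(x) for x in g] for g in salaries]             # strictly increasing transform
outlier = [[900 if x == 90 else x for x in g] for g in salaries]  # hypothetical extreme value
h_raw = kruskal_wallis_h(salaries)
h_log = kruskal_wallis_h(logged)
h_outlier = kruskal_wallis_h(outlier)
print(h_raw == h_log == h_outlier)  # True
```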