U Test - Statistics-and-Machine-Learning-with-R/Statistical-Methods-and-Machine-Learning-in-R GitHub Wiki
The two-sample t-test is one of the most widely used statistical procedures. Its purpose is to test the hypothesis that the means of two groups are the same. The test assumes that the variable in question is normally distributed in both groups. When this assumption is in doubt, the non-parametric Wilcoxon-Mann-Whitney (or rank-sum) test, which does not rely on the normality assumption, is sometimes suggested as an alternative to the t-test (e.g. on the Wikipedia page on the t-test).
If the U test is to be used as an alternative to the two-sample t-test (for example because the normality assumption made by the latter is in doubt), it seems a reasonable requirement that it should test the same 'thing'. What null and alternative hypotheses does the U test actually test? Although papers or books may present a single set of hypotheses, the Wilcoxon-Mann-Whitney (WMW) test, as the U test is also known, is valid under a range of different null and alternative hypotheses. The most commonly stated pair is that the distributions in the two groups are the same (null) versus that the probability that a random observation from group 1 exceeds a random observation from group 2 differs from 0.5 (alternative; under the null this probability is 0.5).
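In symbols, the commonly stated hypotheses can be written as below. The notation is an assumption introduced here for illustration: X and Y stand for a random observation from group 1 and group 2, and F_1 and F_2 for their distribution functions.

$$
H_0:\; F_1 = F_2
\qquad \text{vs.} \qquad
H_1:\; P(X > Y) \neq \tfrac{1}{2}
$$

If the two distributions are identical and continuous (so that ties have probability zero), then X is as likely to exceed Y as the reverse, which gives P(X > Y) = 1/2 under the null.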
- It is the non-parametric equivalent of the independent-samples t-test.
- Unlike the t-test, this test compares the independent groups/samples using ranks rather than the raw values.
- Let's take sample data for two groups, 'Sales' persons and 'Manager' persons (Figure-1), and assign ranks based on their salary {note that the two groups do not need to contain the same number of data points}.
- Then order them with respect to their ranks (Figure-2).
'Figure-1' 'Figure-2'
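The ranking step can be sketched in a few lines of Python (used here only for illustration). The salary values are hypothetical, since the figures' data are not reproduced in the text; they are chosen only to give 4 Sales and 3 Manager observations. The sketch also assumes that rank 1 goes to the lowest salary and that tied values share the average of their ranks.

```python
# Hypothetical salaries (in thousands); NOT the values from Figure-1.
sales = [30, 35, 40, 50]     # 4 data points
managers = [45, 60, 70]      # 3 data points


def rank_pooled(group_a, group_b):
    """Rank both groups together (1 = smallest); ties get the average rank."""
    pooled = sorted(group_a + group_b)
    rank_of = {}
    for value in set(pooled):
        # 1-based positions of this value in the pooled, sorted data
        positions = [i + 1 for i, v in enumerate(pooled) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    return [rank_of[v] for v in group_a], [rank_of[v] for v in group_b]


sales_ranks, manager_ranks = rank_pooled(sales, managers)
print("Sales ranks:  ", sales_ranks, "sum =", sum(sales_ranks))
print("Manager ranks:", manager_ranks, "sum =", sum(manager_ranks))
```

With these made-up values the Sales group receives ranks 1, 2, 3 and 5, and the Manager group receives ranks 4, 6 and 7.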
- The U statistic is then computed from this data to compare the two groups' salaries in terms of rank.
- The U statistic measures the degree of overlap in ranks between the two groups.
- A bigger U value means a bigger overlap between the groups, and a smaller U value means less overlap (U = 0 means the ranks of the two groups do not overlap at all).
- We calculate the 'sum of ranks' and the 'mean rank' of each group according to the rank position of each person (Figure-4).
- Then mark the group with the smaller 'sum of ranks'; in this case it is 'Sales'.
- Now, for each data point in that group, count how many data points in the other group are smaller in rank, and add up these counts. The total is U.
- Three figures illustrate the last point: the first uses the data from this example, and the other two are for general understanding.
- Now compare the U value with the corresponding critical value {from the U distribution table}.
- The table is indexed by n1 and n2, the number of data points in group 1 and group 2.
- So n1 = 4 and n2 = 3 for the Sales and Manager groups respectively.
- We take alpha = .05 as the predetermined significance level.
- For n1 = 4 and n2 = 3, the corresponding critical value is 0 (zero).
- Since U = 1 is larger than this critical value, at alpha = .05 we find that p > .05; n.s. (there is no significant difference between the two groups).
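The counting procedure and the final decision can be checked with a short Python sketch. It is an illustration under stated assumptions, not the wiki's own code: the salaries are hypothetical values chosen to match n1 = 4, n2 = 3 and U = 1, rank 1 is taken to be the lowest salary, and instead of a printed table the exact two-sided p-value is obtained by enumerating every way of splitting the 7 observations into groups of 4 and 3.

```python
from itertools import combinations

# Hypothetical salaries (in thousands); NOT the values from the figures.
sales = [30, 35, 40, 50]     # n1 = 4
managers = [45, 60, 70]      # n2 = 3


def u_by_counting(group, other):
    """For each value in `group`, count the values in `other` that are
    smaller; a tie counts as half. The total is the U value for `group`."""
    return sum(
        1.0 if o < g else 0.5 if o == g else 0.0
        for g in group
        for o in other
    )


def smaller_u(group, other):
    """U as used in the text: the smaller of the two groups' U values."""
    return min(u_by_counting(group, other), u_by_counting(other, group))


def exact_two_sided_p(group, other):
    """Share of all possible group assignments whose U is at most the
    observed U (exact permutation distribution under the null)."""
    pooled = group + other
    observed = smaller_u(group, other)
    as_extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if smaller_u(a, b) <= observed:
            as_extreme += 1
    return as_extreme / total


u_sales = u_by_counting(sales, managers)
u_managers = u_by_counting(managers, sales)
u = min(u_sales, u_managers)
p_value = exact_two_sided_p(sales, managers)

print("U (Sales) =", u_sales, " U (Manager) =", u_managers, " U =", u)
print("exact two-sided p =", round(p_value, 4))
print("significant at alpha = .05:", p_value < 0.05)
```

For this made-up data the sketch gives U = 1 and an exact two-sided p-value of 4/35 (about 0.114), which agrees with the "not significant" conclusion above. The two U values always add up to n1 × n2 (here 1 + 11 = 12), which is a quick sanity check on the counting.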