T Test - Statistics-and-Machine-Learning-with-R/Statistical-Methods-and-Machine-Learning-in-R GitHub Wiki
A t-test is a statistical hypothesis test that checks whether the means (averages) of two groups are reliably different from each other.
- Looking at the means alone may show a difference, but we cannot be sure whether that difference is reliable.
- For example, if two people (you and I) each flip a coin 100 times and you get more heads, does that mean you will get more heads again in the future? No, because the difference is just due to chance.
- This is where the difference between descriptive statistics and inferential statistics comes in.
Descriptive Statistics | Inferential Statistics |
---|---|
A statistic, such as the mean, that describes the data but does not go beyond it. | A method, such as the t-test, that allows us to make inferences beyond our data. |
- How a t-test works is easiest to understand with an example.
- Suppose we test a cholesterol-lowering pill on two groups: half of the participants get the drug and the other half get an inactive pill (a placebo).
- The mean cholesterol of the two groups turns out to be different. But is the difference reliable? Is the drug working or not? This is what a t-test checks.
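- The t-value is the ratio of the difference between the groups to the variability within the groups (more precisely, the standard error of that difference). For example, if the difference between the group means is 2 and the variability within the groups is 6, then t = 2 / 6 ≈ 0.33, which means the difference is small compared with the noise.
- To check whether the result is reliable, the p-value is used. The p-value is the probability of seeing a difference at least this large if there were no real difference between the groups, so it indicates whether the difference is likely real or just a fluke.
- Usually, if the p-value is 0.05 (5%) or less, the difference is considered statistically significant (real); otherwise it is not.
- But do not forget that the p-value depends on sample size: the bigger the sample, the smaller the p-value for the same difference and spread, and the more reliably a real effect can be detected.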
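The cholesterol example above can be sketched in code. In R this would typically be a single call such as `t.test(x, y, var.equal = TRUE)`; the sketch below instead spells the arithmetic out in plain Python (standard library only) so each step is visible. It assumes the classic Student's independent-samples test with a pooled variance, and it gets the two-sided p-value by numerically integrating the t density with Simpson's rule, which is an illustrative shortcut rather than how statistical libraries compute it. The function names and cholesterol readings are hypothetical.

```python
import math
from statistics import mean, variance  # variance() uses the sample (n - 1) formula


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """Two-sided p-value P(|T| >= |t|), integrating the density on [0, |t|]
    with Simpson's rule (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)


def independent_t_test(x, y):
    """Hypothetical helper: Student's independent-samples t-test
    (assumes both groups have equal variances)."""
    nx, ny = len(x), len(y)
    diff = mean(x) - mean(y)  # difference between the groups
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))  # variability within the groups
    t = diff / se
    df = nx + ny - 2
    return t, df, two_sided_p(t, df)


# Hypothetical cholesterol readings (mg/dL), made up for illustration
drug = [180, 175, 190, 170, 185, 178]
placebo = [200, 195, 210, 188, 205, 198]
t, df, p = independent_t_test(drug, placebo)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

Here `se` plays the role of the "variability within groups" (the 6 in the toy example): larger or less noisy groups shrink it, so the same difference in means gives a larger t and a smaller p-value.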
There are three main types of t-test:
- Independent samples t-test.
- Paired samples t-test (dependent samples t-test).
- One-sample t-test.
Independent samples t-test | Paired samples t-test | One-sample t-test |
---|---|---|
Compares the means of two different groups | Compares the means of one group measured twice | Compares the mean of one group against a set value |
e.g. Testing the average quality of two different batches of beer | e.g. Testing the balance of people before and after drinking alcohol | e.g. Testing the IQ of a group of people against the standard value of 100 |
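The paired and one-sample tests in the table are closely related: a paired test is simply a one-sample test on the per-person differences, tested against a set mean of 0. This minimal Python sketch (standard library only) computes just the t statistic and degrees of freedom; the p-value would then come from the t distribution, as R does in `t.test(x, mu = 100)` and `t.test(x, y, paired = TRUE)`. The function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev  # stdev() uses the sample (n - 1) formula


def one_sample_t(values, mu):
    """Hypothetical helper: how far the sample mean is from a set mean mu,
    measured in standard errors. Degrees of freedom are n - 1."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean(values) - mu) / se, n - 1


def paired_t(before, after):
    """Hypothetical helper: a paired t-test is a one-sample test on the
    per-person differences (after - before) against a set mean of 0."""
    if len(before) != len(after):
        raise ValueError("paired data needs exactly one 'after' value per 'before' value")
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical IQ scores tested against the standard value of 100
t, df = one_sample_t([98, 102, 104, 96, 110], mu=100)
print(f"one-sample: t = {t:.3f}, df = {df}")

# Hypothetical balance scores for the same people before and after drinking alcohol
t, df = paired_t([10, 12, 9, 11], [8, 11, 9, 8])
print(f"paired: t = {t:.3f}, df = {df}")
```

Working on the differences is what makes the paired test "dependent": each person acts as their own control, so person-to-person variation drops out of the comparison.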
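The t-test relies on a few assumptions:
- Results can only be applied to a population that resembles the sample. For example, a cholesterol drug test conducted on adults may not hold true for children.
- The sample and population should be roughly normally distributed.
- The groups do not strictly need the same number of data points, but the classic (Student's) independent t-test assumes equal variances. When group sizes and variances both differ, its results can be inaccurate, and Welch's t-test is the safer choice.
- All data points should be independent of each other (in a paired test, each pair should be independent of the other pairs).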
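Welch's t-test is the usual way to handle groups of different sizes or spreads: instead of pooling the variances, it uses each group's own standard error and adjusts the degrees of freedom with the Welch-Satterthwaite approximation. This is a minimal Python sketch of the statistic only, assuming those standard formulas; in R, `t.test(x, y)` runs Welch's version by default. The function name is hypothetical.

```python
import math
from statistics import mean, variance  # variance() uses the sample (n - 1) formula


def welch_t(x, y):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite
    degrees of freedom. Variances are not pooled, so the test stays
    reliable when group sizes and variances differ."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical groups of unequal size and spread
t, df = welch_t([1, 2, 3, 4, 5], [10, 20, 30])
print(f"t = {t:.3f}, df = {df:.2f}")
```

When both groups have the same size and the same variance, this reduces to Student's statistic with n1 + n2 - 2 degrees of freedom; otherwise the degrees of freedom fall between the smaller group size minus 1 and n1 + n2 - 2.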
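- When the t-test's assumptions (especially normality) are not met, a non-parametric test such as the Mann-Whitney U test can be used instead.
- It does a similar job to the independent samples t-test, but it does not require a normal distribution and also works with ordinal (ranked) data.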
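The idea behind the Mann-Whitney U test can be sketched by counting pairs: U for group x counts how many of all (x, y) pairs have the larger value in x, with ties counting one half. Because only the order of the values matters, no normal distribution is needed and ranked data work fine. This minimal Python sketch computes only the statistic; the p-value comes from the distribution of U (or a normal approximation), which in R is handled by `wilcox.test(x, y)`. The function name and scores are hypothetical.

```python
def mann_whitney_u(x, y):
    """Hypothetical helper: Mann-Whitney U statistic for group x.
    Counts, over all (xi, yj) pairs, how often xi is larger than yj;
    ties count 0.5. Only the ordering of values is used."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical ordinal ratings (1 = poor ... 5 = excellent) for two batches
batch_a = [3, 4, 4, 5, 2]
batch_b = [2, 3, 1, 2]
print(mann_whitney_u(batch_a, batch_b), mann_whitney_u(batch_b, batch_a))
```

Swapping the groups gives the complementary count, since U for x plus U for y always equals len(x) * len(y).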