Statistics Done Wrong - KeynesYouDigIt/Knowledge GitHub Wiki

Statistical Significance

A statistically significant difference is one that is larger than could easily be produced by luck.

p Values

The p value is the probability, assuming there is no true effect, of collecting data that shows a difference equal to or greater than what you actually observed. p values force you to reason about results that never actually occurred. They were originally intended to be combined with your domain knowledge, as one tool for interpreting data.

p < 0.05 is commonly regarded as statistically significant.

  • Not a measure of how right you are
  • It's a measure of "surprise" - A smaller value means you should be more surprised
  • "My data would be surprising if there were no real effect"
  • Doesn't tell you whether the mean, median, or mode is more appropriate
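As a rough illustration of the definition above, the sketch below computes an exact one-sided p value for a coin-flip experiment. The function name `binomial_p_value` and the 60-heads-in-100-flips scenario are made up for this example; the null hypothesis is assumed to be a fair coin.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Probability of seeing `heads` or more heads in `flips` tosses,
    assuming the null hypothesis (heads probability = p_null) is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: 60 heads out of 100 flips of a supposedly fair coin.
p = binomial_p_value(heads=60, flips=100)
print(f"p = {p:.4f}")
```

Here p lands a little under 0.03, so the result would count as significant at the usual 0.05 threshold. It says nothing about how biased the coin is, only that 60 or more heads would be surprising for a fair one.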

You can get statistically significant results by:

* Measuring a huge effect, even with relatively little data
* Collecting a ton of data and measuring extremely tiny (but unimportant) differences
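Both routes to significance can be seen in a small sketch. It assumes a two-sample z-test with a known, shared standard deviation, which is a simplification chosen for brevity; the function name and all the numbers are illustrative, not taken from the book.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p value for an observed difference in means between two
    equal-sized groups, assuming normal data with a known common sd."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative numbers only.
big_effect_small_n = two_sample_z_p_value(diff=1.5, sd=1.0, n_per_group=10)
tiny_effect_huge_n = two_sample_z_p_value(diff=0.01, sd=1.0, n_per_group=1_000_000)
tiny_effect_small_n = two_sample_z_p_value(diff=0.01, sd=1.0, n_per_group=10)

print(big_effect_small_n < 0.05)   # large effect: significant with few samples
print(tiny_effect_huge_n < 0.05)   # tiny effect: significant only with lots of data
print(tiny_effect_small_n < 0.05)  # tiny effect, few samples: not significant
```

The second case is the one to watch for: a difference of 0.01 standard deviations is significant here, yet may be far too small to matter in practice.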

Null-Hypothesis

Null-hypothesis testing is a procedure designed to ensure that false positives occur at a predefined rate, called α.

You make a null hypothesis (that there is no effect), as well as an alternative hypothesis (that there is some effect). You perform a test, and reject the null hypothesis whenever p < α.
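The claim that the procedure holds false positives to a rate of α can be checked with a simulation. The sketch below assumes a z-test with a known standard deviation of 1 and draws both groups from the same distribution, so the null hypothesis is true in every simulated experiment; the function name, sample sizes, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha=0.05, experiments=5000, n=30, seed=0):
    """Fraction of experiments that reject the null even though both
    groups come from the same normal distribution (no real effect)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(experiments):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        # Known sd = 1 in both groups, so the standard error is sqrt(2 / n).
        z = (fmean(group_a) - fmean(group_b)) / sqrt(2 / n)
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / experiments


rate = false_positive_rate()
print(f"false-positive rate: {rate:.3f}")
```

The observed rate should sit close to α, which is a property of repeating the procedure many times rather than of any single experiment.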

Confidence Intervals

Combine a point estimate with a confidence in that estimate.

  • "I'm 95% confident that the result will be between these two boundaries"
  • A wide range means more uncertainty, which lowers the value of the estimate

Confidence intervals answer the same question as p values, but provide more information and are more straightforward to interpret. They also require much less context. That makes them preferable to p values.
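A minimal sketch of a confidence interval for a mean follows. It uses the normal approximation, which is an assumption made here to stay within the standard library; for samples this small a t-based interval would be somewhat wider. The function name and the measurements are made up.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Approximate confidence interval for the mean of `samples`,
    using the normal approximation to the sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = fmean(samples)
    margin = z * stdev(samples) / sqrt(len(samples))
    return center - margin, center + margin


# Made-up measurements, for illustration only.
measurements = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
low, high = mean_confidence_interval(measurements)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Unlike a bare p value, the interval shows both the point estimate (its center) and how uncertain that estimate is (its width).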

Errors

  • False positives - Thinking there's an effect when there isn't
  • False negatives - Failing to notice a real effect

The result of a single experiment doesn't have an error rate; the procedure used to test it does.

Statistical Power

The probability that a study will detect an effect of a given size when that effect really exists. Working out the power you need tells you how much data to collect. Power is affected by:

  • Size of the bias (effect) you're looking for (big is easier to detect than small)
  • Sample size (more data helps find smaller biases)
  • Measurement error (how easy it is to miscapture data)

80% statistical power is generally accepted as adequate.
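To show how these factors trade off, here is a sketch of a standard sample-size approximation for comparing two group means: n per group = 2 * ((z_alpha + z_power) * sd / effect)^2. It assumes a two-sided z-test, normal data, and equal group sizes; the function name and example values are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate samples needed per group to detect a difference in
    means of size `effect`, given noise `sd`, at the stated alpha and power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


print(sample_size_per_group(effect=0.5, sd=1.0))  # moderate effect
print(sample_size_per_group(effect=0.1, sd=1.0))  # smaller effect needs far more data
print(sample_size_per_group(effect=0.5, sd=2.0))  # noisier measurements need more data
```

Under this approximation, an effect of half a standard deviation needs about 63 samples per group for 80% power. The required sample grows with the square of sd / effect, so halving the effect size or doubling the noise roughly quadruples it.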

Measured by a curve:

Statistical Power vs. Probability