Statistics - kaushikdas/TechnicalWritings GitHub Wiki

1. Understanding Statistical Inference

[Dr. Nicks Maths and Stats]

Statistical Analysis has two main focuses:

1. Descriptive statistics

  • Summarises data using graphs and summary values
    • Mean
    • Median
    • IQR (interquartile range)

Example: Summary of shoe count by sex

Sex     Min.  1st Qu.  Median  Mean    3rd Qu.  Max.  Std. Dev.  Sample Size
Female  4     10       12      15.75   20       58    10.6       63
Male    2     4        5       6.429   7        40    5.22       98

In this sample, the median number of shoes owned by the women is 12, whereas the median for the men is 5.

  • Descriptive statistics do not support any conclusion beyond the data we have. On their own, they tell us nothing about the population from which the data came.
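
The summary values listed above (mean, median, IQR) can be computed with a few lines of Java. This is a minimal sketch: the class name `DescriptiveSummary` and the data are made up for illustration, and the quartiles use linear interpolation between order statistics, which is only one of several conventions in use, so other tools may report slightly different quartiles.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original notes.
class DescriptiveSummary {

    // Arithmetic mean of the values.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Quantile for a probability p in [0, 1], using linear interpolation
    // between the two closest order statistics (an assumed convention).
    static double quantile(double[] xs, double p) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = (int) Math.ceil(h);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    static double median(double[] xs) {
        return quantile(xs, 0.5);
    }

    // Interquartile range: third quartile minus first quartile.
    static double iqr(double[] xs) {
        return quantile(xs, 0.75) - quantile(xs, 0.25);
    }

    public static void main(String[] args) {
        double[] shoes = {2, 4, 4, 5, 7, 9, 40}; // made-up counts, not the survey data
        System.out.println("mean   = " + mean(shoes));
        System.out.println("median = " + median(shoes));
        System.out.println("IQR    = " + iqr(shoes));
    }
}
```

For these made-up counts the single large value 40 pulls the mean up to about 10.14, while the median stays at 5 and the IQR at 4. The same pattern (mean above median) appears in both rows of the shoe table.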

1.1 Code

Mean and standard deviation computation (less vulnerable to round-off error)

package kaushikd.intelligentprograms;

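// Running mean and variance via a one-pass (Welford-style) update.
public class Accumulator {
    private double s; // running sum of squared deviations from the current mean
    private double m; // running mean
    private int N;    // number of values seen so far

    // Folds one value into the running statistics.
    public void addNumber(double x) {
        N++;
        // Update s first: it needs the mean of the previous N - 1 values.
        s += 1.0 * (N - 1) / N * (x - m) * (x - m);
        m += (x - m) / N;
    }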
    public double mean() {
        return m;
    }
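    // Sample variance (divides by N - 1); undefined for fewer than two values.
    public double var() {
        if (N < 2) return Double.NaN;
        return s / (N - 1);
    }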
    public double stddev() {
        return Math.sqrt(var());
    }

    @Override
    public String toString() {
        return "Accumulator (" + N + " numbers): "
                + "mu = " + String.format("%7.5f", mean())
                + ", sigma = " + String.format("%7.5f", stddev());
    }

    public static void main(String[] args) {
        int T = Integer.parseInt(args[0]);

        Accumulator ac = new Accumulator();

        for (int i = 0; i < T; i++) {
            ac.addNumber(Math.random());
        }
        System.out.println(ac);
    }
}

/*
%java Accumulator 100
Accumulator (100 numbers): mu = 0.50470, sigma = 0.28573

%java Accumulator 200
Accumulator (200 numbers): mu = 0.46953, sigma = 0.29920

%java Accumulator 400
Accumulator (400 numbers): mu = 0.50910, sigma = 0.28568
 */
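
The update inside `addNumber` can be read as a pair of recurrences. The notation below is introduced here for illustration and does not appear in the original notes: $x_N$ is the $N$-th value added, $m_N$ is the mean of the first $N$ values, and $s_N = \sum_{i=1}^{N} (x_i - m_N)^2$, with $m_0 = s_0 = 0$.

```latex
\begin{align*}
s_N &= s_{N-1} + \frac{N-1}{N}\,(x_N - m_{N-1})^2, \\
m_N &= m_{N-1} + \frac{x_N - m_{N-1}}{N}, \\
\mathrm{var} &= \frac{s_N}{N-1}, \qquad \mathrm{stddev} = \sqrt{\mathrm{var}}.
\end{align*}
```

As a quick check, adding the values 1, 2, 3 in turn gives $(m, s) = (1, 0)$, then $(1.5, 0.5)$, then $(2, 2)$, so the variance is $2 / (3 - 1) = 1$, which matches the direct calculation. Because each step works with the deviation $x_N - m_{N-1}$ rather than with raw sums of squares, the method avoids subtracting two large, nearly equal numbers. That is the likely reason it is described as less vulnerable to round-off error.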

2. Inferential statistics

  • Allows us to draw conclusions beyond the data, that is, about the population from which the data were drawn
    • Inference (definition): the process of drawing conclusions about population parameters based on a sample taken from the population
      • 3 key ideas

        1. A sample is likely to be a good representation of the population.
        2. There is an element of uncertainty as to how well the sample represents the population.
        3. The way the sample is taken matters.
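
The three ideas above can be illustrated with a small simulation. This is a sketch under stated assumptions: the class `SamplingSketch`, the artificial population (the values 0 to 99, each repeated equally often and stored in sorted order), the sample size of 100, and the seed are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; the population and sample sizes are made up.
class SamplingSketch {

    // Artificial population: the values 0..99, each repeated equally often, in sorted order.
    static double[] sortedPopulation(int size) {
        double[] pop = new double[size];
        for (int i = 0; i < size; i++) {
            pop[i] = (i * 100) / size; // integer division on purpose
        }
        return pop;
    }

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Simple random sample (with replacement): every unit is equally likely to be picked.
    static double[] randomSample(double[] pop, int n, Random rng) {
        double[] sample = new double[n];
        for (int i = 0; i < n; i++) {
            sample[i] = pop[rng.nextInt(pop.length)];
        }
        return sample;
    }

    // Convenience sample: take the first n units, which here are the smallest values.
    static double[] firstN(double[] pop, int n) {
        return Arrays.copyOf(pop, n);
    }

    public static void main(String[] args) {
        double[] pop = sortedPopulation(10_000);
        Random rng = new Random(42); // fixed seed so runs are repeatable
        System.out.println("population mean:       " + mean(pop));
        for (int k = 0; k < 3; k++) {
            System.out.println("random sample mean:    " + mean(randomSample(pop, 100, rng)));
        }
        System.out.println("first-100 sample mean: " + mean(firstN(pop, 100)));
    }
}
```

The random-sample means should land near the population mean of 49.5 (idea 1) but differ from one sample to the next (idea 2). The sample made of the first 100 units has a mean of 0, because of how it was taken (idea 3).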