Kolmogorov Smirnov Test (KS Test) - estebanz01/ruby-statistics GitHub Wiki

Kolmogorov Smirnov Test

This is an implementation of the Kolmogorov-Smirnov goodness-of-fit test (KS test) for two samples.

This test checks whether two samples follow the same distribution by measuring the distance between their empirical distribution functions. It does not tell which particular distribution fits.

The test is implemented in the KolmogorovSmirnovTest class, which is also available through the alias KSTest.

Class methods

Two samples

The two_samples method expects three keyword arguments: group_one:, group_two: and alpha:, where alpha: defaults to 0.05. It returns a hash with the following keys:

d_max: The maximum absolute difference between the empirical cumulative distribution functions of the two samples.
d_critical: The critical value of D, calculated as specified here: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test.
total_samples: The total number of observations across both groups.
alpha: The specified alpha value.
null: Either true or false. If true, the null hypothesis (both samples follow the same distribution) should not be rejected.
alternative: Either true or false. If true, the null hypothesis can be rejected.
confidence_level: Defined as 1 - alpha.
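
To make the keys above concrete, here is a minimal pure-Ruby sketch of how such a hash could be assembled. It is not the gem's code: the helper names `ecdf` and `ks_two_samples_sketch` are hypothetical, the critical-value formula is an assumption chosen because it reproduces the d_critical values printed in the examples on this page, and treating a tie (d_max equal to d_critical) as "do not reject" is also an assumption.

```ruby
# Empirical CDF: fraction of values in `sample` that are <= x.
def ecdf(sample, x)
  sample.count { |value| value <= x } / sample.size.to_f
end

# Hypothetical helper, not part of ruby-statistics: a sketch of a two-sample KS check.
def ks_two_samples_sketch(group_one:, group_two:, alpha: 0.05)
  n = group_one.size
  m = group_two.size

  # Largest gap between the two ECDFs, evaluated at every observed value.
  d_max = (group_one + group_two).map do |x|
    (ecdf(group_one, x) - ecdf(group_two, x)).abs
  end.max

  # Assumed critical value: sqrt(-0.5 * ln(alpha)) * sqrt((n + m) / (n * m)).
  d_critical = Math.sqrt(-0.5 * Math.log(alpha)) * Math.sqrt((n + m) / (n * m).to_f)

  {
    d_max: d_max,
    d_critical: d_critical,
    total_samples: n + m,
    alpha: alpha,
    null: d_max <= d_critical,       # assumption: a tie does not reject
    alternative: d_max > d_critical,
    confidence_level: 1.0 - alpha
  }
end

# Two groups with no overlap: the ECDFs are as far apart as they can be.
result = ks_two_samples_sketch(group_one: [1, 2, 3, 4, 5], group_two: [6, 7, 8, 9, 10])
puts result
```

With these toy groups the ECDF of the first group reaches 1 before the second group has any observations, so d_max is 1.0 and the sketch reports alternative: true.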

An example with two samples coming from the same distribution:

[6] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
 -0.2520921296030769,
 1.5824811170615634,
 -0.5916366579302884,
 -0.32986995777935924,
 -0.2048765105875873,
 0.6034716026094954,
 -0.7001790376899514,
 1.8573310072313118,
 0.6448475108927784]
[7] pry(main)> group_two = Distribution::StandardNormal.new.random(elements: 20, seed: 4)
=> [0.499951333237829,
 0.6935985082913116,
 -1.5845772351121241,
 0.5985751739673772,
 -1.1474766329454797,
 -0.08798692834027545,
 0.33225314537233536,
 0.3509971530825316,
 1.5469793290157272,
 0.046135567230164806,
 0.054432738865157676,
 -1.2089481591289206,
 0.39429521471439166,
 -1.1128121538537732,
 -1.3609655918364936,
 0.5424513084033299,
 -2.358073633011557,
 0.8378363539201328,
 0.9148409578006828,
 0.7965118987249309]
[8] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.3, :d_critical=>0.4740041355479394, :total_samples=>30, :alpha=>0.05, :null=>true, :alternative=>false, :confidence_level=>0.95}
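
As a cross-check, the d_max of 0.3 reported above can be recomputed by hand from the printed samples. The sketch below uses only plain Ruby; `ecdf` and `ecdf_gaps` are hypothetical helper names, and the approach (comparing the two empirical CDFs at every observed value) is an assumption about how the statistic is obtained, not a copy of the gem's code.

```ruby
# Sample values copied from the pry session above.
GROUP_ONE = [
  -0.33087015189408764, -0.2520921296030769, 1.5824811170615634,
  -0.5916366579302884, -0.32986995777935924, -0.2048765105875873,
  0.6034716026094954, -0.7001790376899514, 1.8573310072313118,
  0.6448475108927784
].freeze

GROUP_TWO = [
  0.499951333237829, 0.6935985082913116, -1.5845772351121241,
  0.5985751739673772, -1.1474766329454797, -0.08798692834027545,
  0.33225314537233536, 0.3509971530825316, 1.5469793290157272,
  0.046135567230164806, 0.054432738865157676, -1.2089481591289206,
  0.39429521471439166, -1.1128121538537732, -1.3609655918364936,
  0.5424513084033299, -2.358073633011557, 0.8378363539201328,
  0.9148409578006828, 0.7965118987249309
].freeze

# Empirical CDF: fraction of values in `sample` that are <= x.
def ecdf(sample, x)
  sample.count { |value| value <= x } / sample.size.to_f
end

# [x, gap] pairs in ascending x, where gap is the absolute ECDF difference at x.
def ecdf_gaps(group_one, group_two)
  (group_one + group_two).sort.map do |x|
    [x, (ecdf(group_one, x) - ecdf(group_two, x)).abs]
  end
end

gaps = ecdf_gaps(GROUP_ONE, GROUP_TWO)
d_max = gaps.map(&:last).max
# Every x at which the largest gap is reached (tolerance guards against float noise).
peaks = gaps.select { |_, gap| (gap - d_max).abs < 1e-12 }

puts "d_max = #{d_max.round(4)}"
peaks.each { |x, _| puts "reached at x = #{x}" }
```

For these samples the largest gap is reached twice: once where six of the twenty group_two values lie below every group_one value, and once where six of the ten group_one values have been passed but still only six group_two values have.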

An example with two samples drawn from different distributions:

[10] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
 -0.2520921296030769,
 1.5824811170615634,
 -0.5916366579302884,
 -0.32986995777935924,
 -0.2048765105875873,
 0.6034716026094954,
 -0.7001790376899514,
 1.8573310072313118,
 0.6448475108927784]
[11] pry(main)> group_two = Distribution::Weibull.new(3, 4).random(elements: 10, seed: 5)
=> [0.39738923502790524,
 0.7997225442592585,
 0.3868528026254838,
 0.8559574581334326,
 0.5513010739589683,
 0.6184303884163658,
 0.7133558756258811,
 0.5673993393806528,
 0.4448443026776265,
 0.37319828077807826]
[12] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.6, :d_critical=>0.5473328305111973, :total_samples=>20, :alpha=>0.05, :null=>false, :alternative=>true, :confidence_level=>0.95}
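
The two d_critical values printed on this page depend only on the group sizes and alpha, so they can be checked with a few lines of arithmetic. The formula in `critical_d` (a hypothetical helper name) is an assumption inferred from those printed values rather than taken from the gem's source.

```ruby
# Assumed critical value: sqrt(-0.5 * ln(alpha)) * sqrt((n + m) / (n * m)),
# where n and m are the two group sizes.
def critical_d(n, m, alpha: 0.05)
  Math.sqrt(-0.5 * Math.log(alpha)) * Math.sqrt((n + m) / (n * m).to_f)
end

first_critical  = critical_d(10, 20) # group sizes from the same-distribution example
second_critical = critical_d(10, 10) # group sizes from the different-distributions example

puts format("10 vs 20 samples: d_critical = %.4f", first_critical)
puts format("10 vs 10 samples: d_critical = %.4f", second_critical)

# Decision rule: reject the null hypothesis when d_max exceeds d_critical.
puts "first example rejects null?  #{0.3 > first_critical}"
puts "second example rejects null? #{0.6 > second_critical}"
```

Under this assumption the critical values come out at roughly 0.4740 and 0.5473, matching the outputs above, so 0.3 stays below the threshold while 0.6 exceeds it. The linked Wikipedia section writes the coefficient with alpha / 2 inside the logarithm, which would give larger critical values than the ones shown here.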