Kolmogorov Smirnov Test (KS Test) - estebanz01/ruby-statistics GitHub Wiki
Kolmogorov Smirnov Test
This is an implementation of the two-sample Kolmogorov-Smirnov goodness-of-fit test (KS test).
This test checks whether two samples are likely to follow the same distribution by measuring the distance between their empirical distribution functions. It does not tell which particular distribution fits.
Although this test is implemented as KolmogorovSmirnovTest.new, it can also be used through the alias KSTest.new.
Class methods
Two samples
This method expects three keywords: group_one:, group_two: and alpha:, where alpha: has a default value of 0.05. It returns a hash with the following keys:
d_max: The maximum absolute difference between the empirical cumulative distribution functions of the two samples.
d_critical: The critical value of D, calculated as specified here: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test
total_samples: The total number of samples evaluated (the sizes of both groups added together).
alpha: The specified alpha value.
null: Either true or false. If true, the null hypothesis should not be rejected.
alternative: Either true or false. If true, the null hypothesis can be rejected.
confidence_level: Defined as 1 - alpha.
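To make the returned keys more concrete, here is a minimal pure-Ruby sketch of how such a result could be computed. It is not the library's implementation: ks_two_samples_sketch is a hypothetical helper, the coefficient sqrt(-ln(alpha) / 2) is an assumption inferred from the d_critical values shown in the examples on this page (some references use alpha / 2 inside the logarithm), and rejecting only when d_max is strictly greater than d_critical is also an assumption.

```ruby
# Hypothetical helper for illustration only; it is not part of ruby-statistics.
def ks_two_samples_sketch(group_one:, group_two:, alpha: 0.05)
  n = group_one.size
  m = group_two.size

  # Empirical CDF: the fraction of a sample that is <= x.
  ecdf = ->(sample, x) { sample.count { |value| value <= x }.to_f / sample.size }

  # Both ECDFs are step functions, so the largest gap occurs at an observed value.
  d_max = (group_one + group_two).uniq.map do |x|
    (ecdf.call(group_one, x) - ecdf.call(group_two, x)).abs
  end.max

  # Assumption: c(alpha) = sqrt(-ln(alpha) / 2), inferred from the example outputs.
  d_critical = Math.sqrt(-0.5 * Math.log(alpha)) * Math.sqrt((n + m).to_f / (n * m))

  # Assumption: the null hypothesis is rejected only when d_max exceeds d_critical.
  reject = d_max > d_critical
  {
    d_max: d_max, d_critical: d_critical, total_samples: n + m, alpha: alpha,
    null: !reject, alternative: reject, confidence_level: 1.0 - alpha
  }
end

result = ks_two_samples_sketch(group_one: [1, 2, 3, 4, 5], group_two: [6, 7, 8, 9, 10])
puts result[:d_max]        # 1.0, because the two samples do not overlap
puts result[:alternative]  # true
```

The sketch evaluates each empirical CDF by counting, which is quadratic in the sample size. That keeps the code short, but sorting both samples and walking them once would be the better choice for large inputs.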
An example with two samples coming from the same distribution:
[6] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
-0.2520921296030769,
1.5824811170615634,
-0.5916366579302884,
-0.32986995777935924,
-0.2048765105875873,
0.6034716026094954,
-0.7001790376899514,
1.8573310072313118,
0.6448475108927784]
[7] pry(main)> group_two = Distribution::StandardNormal.new.random(elements: 20, seed: 4)
=> [0.499951333237829,
0.6935985082913116,
-1.5845772351121241,
0.5985751739673772,
-1.1474766329454797,
-0.08798692834027545,
0.33225314537233536,
0.3509971530825316,
1.5469793290157272,
0.046135567230164806,
0.054432738865157676,
-1.2089481591289206,
0.39429521471439166,
-1.1128121538537732,
-1.3609655918364936,
0.5424513084033299,
-2.358073633011557,
0.8378363539201328,
0.9148409578006828,
0.7965118987249309]
[8] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.3, :d_critical=>0.4740041355479394, :total_samples=>30, :alpha=>0.05, :null=>true, :alternative=>false, :confidence_level=>0.95}
An example with two samples drawn from different distributions:
[10] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
-0.2520921296030769,
1.5824811170615634,
-0.5916366579302884,
-0.32986995777935924,
-0.2048765105875873,
0.6034716026094954,
-0.7001790376899514,
1.8573310072313118,
0.6448475108927784]
[11] pry(main)> group_two = Distribution::Weibull.new(3, 4).random(elements: 10, seed: 5)
=> [0.39738923502790524,
0.7997225442592585,
0.3868528026254838,
0.8559574581334326,
0.5513010739589683,
0.6184303884163658,
0.7133558756258811,
0.5673993393806528,
0.4448443026776265,
0.37319828077807826]
[12] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.6, :d_critical=>0.5473328305111973, :total_samples=>20, :alpha=>0.05, :null=>false, :alternative=>true, :confidence_level=>0.95}
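The returned hash can be turned into a readable verdict with a few lines of Ruby. The snippet below is only a sketch: describe_ks_result is a hypothetical helper, not something the library provides, and the hash literal is copied from the output of the second example.

```ruby
# Result hash copied from the output of the second example.
result = {
  d_max: 0.6, d_critical: 0.5473328305111973, total_samples: 20,
  alpha: 0.05, null: false, alternative: true, confidence_level: 0.95
}

# Hypothetical helper for illustration only; it is not part of ruby-statistics.
def describe_ks_result(result)
  # :alternative is true when the null hypothesis can be rejected.
  verdict = result[:alternative] ? "reject" : "do not reject"
  format("D = %.3f, critical D = %.3f at alpha = %.2f: %s the null hypothesis",
         result[:d_max], result[:d_critical], result[:alpha], verdict)
end

puts describe_ks_result(result)
# D = 0.600, critical D = 0.547 at alpha = 0.05: reject the null hypothesis
```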