How to use chi squared test with Hash Function? - minseok127/HashSimulator GitHub Wiki

Hypothesis

Chi squared test๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ์— ์•ž์„œ Hash Table์— ๋Œ€ํ•œ ์ „์ œ ์กฐ๊ฑด์„ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

  1. Hash Table์— ์กด์žฌํ•˜๋Š” bin์˜ ๊ฐœ์ˆ˜๋Š” m
  2. Hash Table์— ๋„ฃ์„ Key์˜ ๊ฐœ์ˆ˜๋Š” k

์ด๋ฅผ ์ „์ œ๋กœ ํ•˜๋‚˜์˜ bin์— ์ด์ƒ์ ์œผ๋กœ ๋“ค์–ด๊ฐˆ key์˜ ๊ฐœ์ˆ˜๋Š” k / m ๊ฐœ์ž…๋‹ˆ๋‹ค(Uniform Distribution).

๋˜ํ•œ k๊ฐ€ ์นด์ด์ œ๊ณฑ๋ถ„ํฌ์˜ ์ž์œ ๋„๋กœ ์„ค์ •๋ฉ๋‹ˆ๋‹ค.

Chi squared Value

bin์˜ Index๋ฅผ i๋ผ๊ณ  ํ‘œํ˜„ํ•  ๋•Œ, i๋Š” 0์—์„œ m-1๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

bi๋ฅผ i๋ฒˆ์งธ bin์— ๋“ค์–ด์žˆ๋Š” key์˜ ๊ฐœ์ˆ˜๋ผ๊ณ  ๊ฐ€์ •ํ•œ๋‹ค๋ฉด ์นด์ด์ œ๊ณฑ๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

ฯ‡2 = ฮฃ (bi - k / m)^2 / (k / m)

์ฃผ์–ด์ง„ ๋ชจ๋“  k๊ฐœ์˜ key๋“ค์„ Hash Function์œผ๋กœ ๊ณ„์‚ฐํ•˜๊ณ  ์ด๋ฅผ ํ† ๋Œ€๋กœ bi๊ฐ’์„ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค.

Get the p-value

์œ„์™€ ๊ฐ™์ด ์นด์ด์ œ๊ณฑ๊ฐ’์„ ๊ตฌํ•œ ๊ฒƒ์œผ๋กœ ํ† ๋Œ€๋กœ, ์ž์œ ๋„๊ฐ€ m์ธ ์นด์ด์ œ๊ณฑ๋ถ„ํฌ ์ƒ์—์„œ p-value๋ฅผ ๊ตฌํ•ฉ๋‹ˆ๋‹ค.

์ด p-value๊ฐ’์ด ๋„ˆ๋ฌด ์ž‘๊ฑฐ๋‚˜, ๋„ˆ๋ฌด ํฐ ๊ฒฝ์šฐ Uniform Distribution์„ ๋‚˜ํƒ€๋‚ธ๋‹ค๋Š” ๊ฐ€์ •์„ ๊ธฐ๊ฐํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.