smhasher's FillFactor - minseok127/HashSimulator GitHub Wiki

Introduce

smhasher์˜ Distribution wiki ๋ฐ FillFactor wiki์— ๋‚˜์˜จ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ํŽ˜์ด์ง€์ž…๋‹ˆ๋‹ค.

Contents

Relative Work

Key๋“ค์ด Hash Table์˜ bins์— ํšจ์œจ์ ์œผ๋กœ ๋ถ„ํฌํ–ˆ๋Š” ์ง€๋ฅผ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด์„œ Test Distribution๊ณผ Ideal Distribution์„ ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค.

ํŠน์ • ๋ถ„ํฌ์—์„œ ๊ธฐ๋Œ€๋˜๋Š” work์˜ ํฌ๊ธฐ๋ฅผ Expected Work(EW)๋ผ๊ณ  ํ‘œํ˜„ํ•  ๋•Œ

Test Distribution์˜ EW / Ideal Distribution์˜ EW = Relative Work(RW)๋ฅผ ํ†ตํ•ด์„œ ์ด์ƒ์ ์ธ ๋ถ„ํฌ์™€ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Flaw

ํ•˜์ง€๋งŒ ์—ฌ๊ธฐ์—๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š”๋ฐ

์ฒซ ๋ฒˆ์งธ๋กœ RW๊ฐ€ Hash Table์˜ ํฌ๊ธฐ์™€ key์˜ ๊ฐœ์ˆ˜์— ๋„ˆ๋ฌด ์˜ํ–ฅ์„ ๋งŽ์ด ๋ฐ›๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค๋ฅธ table์—์„œ ๊ตฌํ•ด์ง„ RW๋ผ๋ฆฌ ๋น„๊ตํ•˜๋Š” ๊ฒƒ์ด ์–ด๋ ต๋‹ค๋Š” ๊ฒƒ์ด๊ณ ,

๋‘ ๋ฒˆ์งธ๋กœ RW๋Š” lower bound๋Š” ์žˆ์ง€๋งŒ upper bound๋Š” table์˜ ํฌ๊ธฐ์™€ key์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ ๋งค์šฐ ์ปค์งˆ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

์„ธ ๋ฒˆ์งธ๋กœ RW๋Š” ํ†ต๊ณ„์ ์œผ๋กœ ๊ฒฐํ•จ์ด ์žˆ๋Š” ๋ถ„ํฌ์—์„œ๋„ ๊ดœ์ฐฎ์€ ๊ฐ’์ด ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๊ทธ๋ฆฌํ•˜์—ฌ RW์˜ ๋‹จ์ ์„ ๊ฐ€์ง€์ง€ ์•Š๋Š” ์ƒˆ๋กœ์šด ์ง€ํ‘œ์ธ FillFactor๋ฅผ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค.

์ด ์ง€ํ‘œ๋Š” 0์—์„œ 1์‚ฌ์ด์˜ ๊ฐ’์„ ๊ฐ€์ง€๊ธฐ ๋•Œ๋ฌธ์— ์ค‘๊ฐ„๊ฐ’์— ๋Œ€ํ•œ ์˜๋ฏธ์žˆ๋Š” ํ•ด์„์„ ํ•  ์ˆ˜ ์žˆ๊ณ , ๋‹ค๋ฅธ table์˜ ๊ฐ’๊ณผ๋„ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Assume

Hash Table์— N๊ฐœ์˜ bins, K๊ฐœ์˜ key๊ฐ€ ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.

๊ฐ bin๋“ค์— ๋“ค์–ด๊ฐ„ key์˜ ๊ฐœ์ˆ˜๋ฅผ Bi๋ผ๊ณ  ๋ถ€๋ฅผ ๋•Œ Bi์˜ ๊ธฐ๋Œ€๊ฐ’์€ K / N์ž…๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๊ณ  ๊ฐ Bi๋“ค์— ๋Œ€ํ•ด์„œ root-square-mean์„ ๊ณ„์‚ฐํ•˜๊ณ  ์ด๋ฅผ R๋กœ ๋ถ€๋ฅด๊ฒ ์Šต๋‹ˆ๋‹ค.

R = { (B1^2 + B2^2 + ... + BN^2) / N }^(1/2)

Ideal Distribution์—์„œ ์‚ฌ์šฉํ•˜๋Š” bins์˜ ๊ฐœ์ˆ˜๋ฅผ F๋ผ๊ณ  ์ •์˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ฆ‰ Ideal Distribution์ด 0๋ฒˆ๋ถ€ํ„ฐ F๋ฒˆ๊นŒ์ง€๋งŒ ์‚ฌ์šฉํ•˜๊ณ  F + 1๋ฒˆ๋ถ€ํ„ฐ N - 1๋ฒˆ๊นŒ์ง€๋Š” ๋น„์–ด์žˆ๋Š” bins๋กœ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

์ด ๊ฒฝ์šฐ F๊ฐ€ 1์ด๋ผ๋ฉด ๊ฐ€์žฅ ํšจ์œจ์ด ์•ˆ์ข‹์€ ๊ฒƒ์ด๊ณ , F๊ฐ€ N์ด๋ผ๋ฉด ๊ฐ€์žฅ ํšจ์œจ์ด ์ข‹์€ ์ƒํ™ฉ์ž…๋‹ˆ๋‹ค.

FillFactor

root-square-mean์„ ๊ตฌํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ MR์ด๋ผ๊ณ  ํ•˜๋ฉด Ideal Distribution์˜ root-square-mean์€ MR(F,N,K)๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ๊ฐ’์„ Test Distribution์˜ root-square-mean์ธ R๊ฐ’๊ณผ ๋น„๊ตํ•˜์—ฌ F๋ฅผ ์—ญ์œผ๋กœ ๊ตฌํ•ฉ๋‹ˆ๋‹ค.

์ฆ‰ R = MR(F,N,K)๋ผ๋Š” ์‹์„ ์„ธ์šฐ๊ณ  ์ด๋ฅผ ๋ณ€ํ˜•ํ•˜์—ฌ F = ~ ์™€ ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค.

FillFactor wiki์— ๋”ฐ๋ฅด๋ฉด F๋ผ๋Š” ๊ฐ’์€

double f = (k*k - 1) / (n*r*r - k); // k : total key count, n : total bin count, r : root-square-mean of test

์ด์™€ ๊ฐ™์ด ํ‘œํ˜„ํ•  ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋ฆฌ๊ณ  ์ด F๊ฐ’์„ ํ†ตํ•ด์„œ FillFactor๋ฅผ ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค.

return 1 - (f / n);

๊ฒฐ๊ตญ FillFactor๊ฐ€ ์˜๋ฏธํ•˜๋Š” ๊ฒƒ์€ ๋ชจ๋“  key๋“ค์ด F๊ฐœ์˜ bins์— ์ด์ƒ์ ์œผ๋กœ ๋ถ„ํฌํ•˜์˜€์„ ๋•Œ, ์‚ฌ์šฉ๋˜์ง€ ์•Š๋Š” bins์˜ ๋น„์œจ์ž…๋‹ˆ๋‹ค.

์ฆ‰ Test Distribution์ด FillFactor๋งŒํผ์„ wasteํ•˜๋Š” Ideal Distribution๊ณผ ๊ฐ™์€ ํšจ์œจ์„ ๋ณด์ธ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.