Naive Bayes - BD-SEARCH/MLtutorial GitHub Wiki

Bayes Theorem

  • A method for computing the posterior probability when the prior probability and the likelihood are known.

  • A way to compute P(B|A), the reverse conditional, when P(A|B) is known

    • It lets you infer the cause after observing the result.
  • Diachronic interpretation of Bayes' theorem

    • It updates the probability of hypothesis H in light of data D. Each time new data arrives, the probability of the hypothesis changes. Below, A plays the role of the hypothesis and B the role of the data.
    • P(A|B): posterior probability. The probability of the hypothesis after seeing the data
    • P(A): prior probability. The probability of the hypothesis before seeing the data
    • P(B|A): likelihood. The probability of the data under the hypothesis
    • P(B): normalizing constant. The probability of the data under any hypothesis.
  • Example: the probability of having disease X given a positive test result

    • A = the event of having the disease
    • B = the event of testing positive
    • Assumptions
      • P(A): 1%. 1% of the whole population has disease X.
      • P(B): the probability of testing positive. P(A) × P(B|A) + P(~A) × P(B|~A) --> the probability of having disease X and testing positive + the probability of not having disease X and testing positive.
      • P(B|A): 99%. The test is 99% reliable, which is also taken to mean P(B|~A) = 1%.
    • P(A|B)
      = P(B|A)P(A)/P(B)
      = 0.99 × 0.01/(0.99 × 0.01 + 0.01 × 0.99)
      = 0.5
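
The calculation above can be checked with a minimal Python sketch. The function name `posterior` and its parameter names are illustrative, and the 1% false positive rate is an assumption read off the "99% reliable" statement.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(A|B) from P(A), P(B|A), and P(B|~A) via Bayes' theorem."""
    # Law of total probability: P(B) = P(A)P(B|A) + P(~A)P(B|~A)
    evidence = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / evidence


# P(A) = 1%, P(B|A) = 99%, P(B|~A) = 1% (assumed from the 99% reliability)
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(round(p, 3))
```

Even with a 99% reliable test, a positive result gives only about a 50% chance of having the disease, because the disease is rare and false positives from the healthy 99% are as numerous as true positives.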


Naive Bayes Classifier

  • Even when the events involved are not actually independent, they are assumed to be independent and classification is performed with Bayes' theorem. In practice this means the features are treated as conditionally independent given the class.
    • It is often used for simple classification tasks such as spam filtering.
  • Naive Bayes can be viewed as a probabilistic model that computes the probability of event C given n independent features: P(C | x_1, ..., x_n)
  • Using Bayes' theorem, the expression above can be rewritten as follows.

P(C | x_1, ..., x_n) = P(C) P(x_1, ..., x_n | C) / P(x_1, ..., x_n)

posterior = prior × likelihood / evidence

  • posterior: the probability that a particular instance x belongs to a particular class c, P(c|x)
  • prior: the probability that class c occurs (class prior probability), P(c)
  • likelihood: the conditional probability of observing instance x given class c, P(x|c). (For discrete events, the probability distribution function and the likelihood coincide.)
  • evidence: the probability that instance x occurs (predictor prior probability), P(x). Because it is the same for every class, it is sometimes ignored during classification.
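
As a sketch of how these four terms fit together, the derivation below writes the instance x as features x_1, ..., x_n (a notational assumption), applies the independence assumption to the likelihood, and drops the evidence because it does not depend on c. The symbol \hat{c} for the predicted class is introduced here for illustration.

```latex
\begin{align*}
P(c \mid x_1, \dots, x_n)
  &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \\
  &\propto P(c) \prod_{i=1}^{n} P(x_i \mid c), \\
\hat{c}
  &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c).
\end{align*}
```

The proportionality step is where both simplifications happen: the joint likelihood factorizes into per-feature likelihoods, and the evidence is omitted since it is identical for every class.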


  • Limitations of the Naive Bayes classifier
    • The computed probability comes out as 0 too often. Because the class likelihoods of all attributes are multiplied together, the resulting probability is always 0 if even one attribute has a class likelihood of 0.
      • To keep a class likelihood from becoming 0, add prior probability × bias to the numerator and bias to the denominator of the likelihood estimate. The larger the bias, the more weight the prior probability receives.
    • When the features are highly dependent on one another, there is no guarantee that the result is reliable, because Naive Bayes assumes there are no dependencies among them.
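
The bias correction for zero likelihoods can be sketched as below. This assumes the likelihood is estimated from counts as count / total; the function name `smoothed_likelihood` and the counts are hypothetical and chosen only for illustration.

```python
def smoothed_likelihood(count, total, prior, bias):
    """Estimate P(attribute | class) as (count + prior * bias) / (total + bias)."""
    return (count + prior * bias) / (total + bias)


# Hypothetical counts: the attribute never occurs in 100 training examples of the class.
raw = 0 / 100  # unsmoothed estimate, zeroes out the whole product
weak = smoothed_likelihood(count=0, total=100, prior=0.5, bias=2)
strong = smoothed_likelihood(count=0, total=100, prior=0.5, bias=100)
print(raw, weak, strong)
```

With a bias of 2 the estimate is 1/102 (about 0.0098), small but nonzero. With a bias of 100 it rises to 0.25, halfway to the prior of 0.5, which illustrates how a larger bias pulls the estimate toward the prior.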

Example: Spam filter

  • Let's apply the Naive Bayes classifier to spam filtering.
  • Example sentence: "quick-cash loan consultation"
    • P(quick-cash|spam) = 0.1, P(loan|spam) = 0.2, P(consultation|spam) = 0.1
    • P(quick-cash|ham) = 0.01, P(loan|ham) = 0.05, P(consultation|ham) = 0.2
  • P(spam|"quick-cash loan consultation") ∝ P(spam)P(quick-cash|spam)P(loan|spam)P(consultation|spam) = P(spam) * 0.1 * 0.2 * 0.1 = 0.002 P(spam)
  • P(ham|"quick-cash loan consultation") ∝ P(ham) * 0.01 * 0.05 * 0.2 = 0.0001 P(ham)
  • Assuming P(ham) = P(spam), "quick-cash loan consultation" is more likely to be spam.
    • P("quick-cash loan consultation") is the same for both spam and ham, so it is omitted.
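
The spam example can be reproduced with a short Python sketch. It uses the per-word likelihoods listed for spam and ham, with the three words written in English, and assumes equal priors of 0.5. The function name `class_scores` and the dictionary layout are illustrative choices, not part of the original page.

```python
import math

# Per-word likelihoods P(word | class) as listed in the example.
likelihoods = {
    "spam": {"quick-cash": 0.1, "loan": 0.2, "consultation": 0.1},
    "ham": {"quick-cash": 0.01, "loan": 0.05, "consultation": 0.2},
}
# Assumption: P(spam) = P(ham) = 0.5
priors = {"spam": 0.5, "ham": 0.5}


def class_scores(words, priors, likelihoods):
    """Unnormalized posterior P(c) * prod_i P(w_i | c) for each class c."""
    return {
        c: priors[c] * math.prod(likelihoods[c][w] for w in words)
        for c in priors
    }


words = ["quick-cash", "loan", "consultation"]
scores = class_scores(words, priors, likelihoods)
prediction = max(scores, key=scores.get)

# Dividing by the sum restores the omitted evidence term.
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
print(prediction, posteriors)
```

The prediction only needs the unnormalized scores, since the evidence is shared by both classes. Normalizing is optional and is shown only to recover actual posterior probabilities. For long messages, summing log probabilities instead of multiplying is a common way to avoid numerical underflow.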
