Interpretable Machine Learning - newlife-js/Wiki GitHub Wiki

Interpretable Machine Learning

by Christoph Molnar (Korean translation: TooTouch)

Why Interpretable Machine Learning Matters

  • One reason machine learning models are hard to trust: a single evaluation metric such as accuracy is an incomplete description of most real-world tasks.
  • Interpretability matters most for problems where the prediction alone does not solve the problem.

Taxonomy

Intrinsic: interpretable by virtue of a simple structure, such as a shallow decision tree or a sparse linear model.
Post hoc: an interpretation method is applied after the model has been trained (e.g., permutation feature importance).

Scope of Interpretability

Global, holistic interpretability: understanding the trained model as a whole, e.g., via its learned hyperplane (beyond three dimensions this exceeds human imagination).
Global interpretability on a modular level: interpreting a single part of the model, such as one weight of a linear model.
Local interpretability for a single prediction: explaining the prediction for one instance x.

Evaluation of Interpretability

์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ์ˆ˜์ค€: ์ „๋ฌธ์ ์ธ ์‹ค์ œ ์‚ฌ์šฉ์ž์— ์˜ํ•ด ํ‰๊ฐ€๋ฐ›๋Š” ๊ฒƒ
์ธ๊ฐ„ ์ˆ˜์ค€: ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ์ˆ˜์ค€์„ ๋‹จ์ˆœํ™”ํ•˜์—ฌ ์ „๋ฌธ๊ฐ€๊ฐ€ ์•„๋‹Œ ์‚ฌ์šฉ์ž๊ฐ€ ํ‰๊ฐ€ํ•˜๋Š” ๊ฒƒ ๊ธฐ๋Šฅ ์ˆ˜์ค€: ๋ชจ๋ธ์˜ ํด๋ž˜์Šค๊ฐ€ ์ด๋ฏธ ์ธ๊ฐ„ ์ˆ˜์ค€์˜ ํ‰๊ฐ€์—์„œ ํ‰๊ฐ€๋œ ๊ฒฝ์šฐ(์งง์€ ํŠธ๋ฆฌ์ผ์ˆ˜๋ก ์„ค๋ช…๋ ฅ์ด ๋†’๋‹ค ๋“ฑ)


Interpretable Models


1. Linear Regression

  • Explains each feature's contribution to the prediction as the product of its weight and its feature value.
  • Not appropriate when there is strong nonlinearity or many interactions.
  • Linearity makes the explanations more general and simpler.

■ With too many features, a linear model can no longer be learned effectively → enforce sparsity.

1) Lasso: a regularization method that penalizes large absolute weights, driving the weights of many features to exactly zero.
2) Preprocessing
  • Select features manually using domain knowledge.
  • Select the features whose correlation with the target exceeds a threshold (assumes the features are independent of one another).
3) Step-wise selection
  • Forward selection: start from a single feature and add, one at a time, the feature that produces the best model.
  • Backward selection: start from a model with all features and remove, one at a time, the feature whose removal produces the best model.
Advantages: used in many fields, so a high level of experience and expertise exists, and the optimal weights can be found with certainty.
Disadvantages: can only express linear relationships (no nonlinearity or interactions).

2. Logistic Regression

  • feature๋“ค์˜ ์„ ํ˜•๊ฒฐํ•ฉ์— ๋กœ์ง€์Šคํ‹ฑํ•จ์ˆ˜๋ฅผ ์ ์šฉ
  • ๊ฐ€์ค‘์น˜ ๋Œ€์‹  odds ratio๋กœ ํ•ด์„ํ•จ
์žฅ์ : ์„ ํ˜• ํšŒ๊ท€์˜ ์žฅ์ ๊ณผ ๊ฐ™์Œ + ํ™•๋ฅ ๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ์Œ
๋‹จ์ : ๊ตํ˜ธ์ž‘์šฉ x, ํ•˜๋‚˜์˜ feature๊ฐ€ ๋‘ ํด๋ž˜์Šค๋กœ ์™„์ „ํžˆ ๋ถ„๋ฆฌํ•œ๋‹ค๋ฉด ํ•™์Šต๋˜์ง€ ์•Š์Œ

3. GLM(Generalized Linear Model), GAM(Generalized Additive Model)

GLM: a linear model extended to handle all kinds of outcomes (non-Gaussian distributions, non-negative outcomes, and so on).
The linear predictor is connected to the expected value through a nonlinear link function.

GAM: models the outcome not as a weighted sum but as a sum of arbitrary functions applied to each feature (to handle nonlinearity).

Advantages: widely used; allows a smooth transition to a more flexible model while keeping some of the interpretability.
Disadvantages: modifying too many parts of the linear model makes the model uninterpretable.

4. Decision Tree

  • Builds a tree by splitting on features in the direction that most reduces impurity.
  • The impurity decrease achieved by each feature's splits is used as the feature importance.
Advantages: well suited to capturing interactions between features; simple to interpret and visualize.
Disadvantages: cannot handle linear relationships well; trees are unstable (they change with the dataset and with the first split feature); the deeper the tree, the harder it is to interpret.

5. Decision Rules

  • Predicts using If-Then rules.
■ Support: the percentage of instances to which the rule's condition applies (the range it covers).
■ Accuracy: the proportion of those instances for which the rule predicts the correct class.

1) OneR: learns rules from a single feature

  • Discretizes continuous features by choosing appropriate intervals.
  • Builds a cross table between each feature and the outcome, and selects the feature with the fewest errors.
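The cross-table step can be sketched in a few lines: for each feature, predict the majority class per feature value, count the errors, and keep the feature with the fewest (the toy weather rows below are invented).

```python
from collections import Counter, defaultdict

# Toy rows: (outlook, windy, play)
data = [("sunny", "no", "yes"), ("sunny", "yes", "no"), ("rainy", "no", "yes"),
        ("rainy", "yes", "no"), ("overcast", "no", "yes"), ("overcast", "yes", "yes")]

def one_r(rows, n_features):
    best = None
    for f in range(n_features):
        # Cross table: for each value of feature f, predict the majority class.
        table = defaultdict(Counter)
        for row in rows:
            table[row[f]][row[-1]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        errors = sum(1 for row in rows if rule[row[f]] != row[-1])
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    return best

feature, rule, errors = one_r(data, n_features=2)
```

On this toy data "windy" wins: windy=no always plays, windy=yes mostly does not, for a single error.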

2) Sequential covering: repeatedly learns single rules in order to build a decision list whose rules cover the entire dataset.

  • Learn rule 1, remove the data points covered by rule 1, then learn rule 2 on the remaining data, and so on.

3) Bayesian Rule Lists

  • ๋ฐ์ดํ„ฐ์—์„œ ๋นˆ๋ฒˆํ•œ ํŒจํ„ฑ์„ ๋ฏธ๋ฆฌ ํŒŒ์•…
  • ๋ฏธ๋ฆฌ ํ™•์ธ๋œ ๊ทœ์น™์˜ ์„ ํƒ ํ•ญ๋ชฉ์—์„œ ์˜์‚ฌ๊ฒฐ์ • ๋ชฉ๋ก์„ ํ•™์Šต
์žฅ์ : If-Then ๊ทœ์น™์€ ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์›€, ์˜ˆ์ธก ์†๋„๋„ ๋น ๋ฆ„.
๋‹จ์ : ํšŒ๊ท€๋ฅผ ์™„์ „ํžˆ ๋ฌด์‹œ.feature์„ ๋ฒ”์ฃผํ™”ํ•ด์•ผ ํ•จ. feature๊ณผ ์ถœ๋ ฅ ๊ฐ„์˜ ์„ ํ˜•๊ด€๊ณ„ ์„ค๋ช…ํ•˜๊ธฐ ์–ด๋ ค์›€.

6. RuleFit

  • Learns a sparse linear model that includes automatically detected interaction effects in the form of decision rules.
  • New features that capture interactions are generated automatically from decision trees.
  • An ensemble is used to generate as many rules as possible (the predictions in the nodes are discarded; only the split conditions are kept).
  • A sparse linear model is then fit on the generated rules together with the original features to obtain the weight estimates.
  • Feature importance is obtained by multiplying each Lasso weight by the standard deviation of its linear term.

Advantages: adds feature interactions automatically (models nonlinear relationships); covers both classification and regression; improves local interpretability (only a few rules apply to any individual instance).
Disadvantages: creates many rules, which can degrade interpretability.

Model-Agnostic Methods

  • ๋ชจ๋ธ์˜ ํ•™์Šต๊ณผ ์„ค๋ช…์„ ๋ถ„๋ฆฌ์‹œ์ผœ, ํ•™์Šต์˜ ์ข…๋ฅ˜์— ์ œํ•œ๋˜์ง€ ์•Š์€ ์„ค๋ช…์„ ์ œ๊ณตํ•˜๋Š” ๋ฐฉ๋ฒ•
  • ํ•ด์„ ๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ๋งŒ์„ ์‚ฌ์šฉํ•˜๊ธฐ์—๋Š” ์„ฑ๋Šฅ์ด ๋–จ์–ด์ ธ์„œ...
  • Model flexibility(์–ด๋А ๋ชจ๋ธ์ด๋“  ์ ์šฉ ๊ฐ€๋Šฅ), Explanation flexibility(ํŠน์ • form์˜ ์„ค๋ช…์— ๊ตญํ•œ๋˜์ง€ ์•Š์Œ), Representation flexibility(์„ค๋ช…ํ•˜๋Š” ๋ชจ๋ธ ๋ณ„๋กœ ๋‹ค๋ฅธ feature representation์„ ์‚ฌ์šฉ)

โ–  Example-based Explanation: ๋ชจ๋ธ์„ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•ด ํŠน์ • dataset์„ ์„ ํƒ(model-agnostic์—์„œ๋Š” feature์˜ summary๋ฅผ create)


Global Model-Agnostic Methods

Methods that describe the average behavior of a machine learning model.

1) Partial Dependence Plot(PDP)

Shows the marginal effect that one or two features have on the predicted outcome.
S is the set of features of interest (one or two); C contains the other features used by the model (assumed to be uncorrelated with the features in S).
The PDP plots, for each value of the features in S, the average of the model's predictions over the training set.

Feature importance

The larger the deviation of the PDP from a flat average curve, the more important the feature.

Advantages: intuitive; the interpretation is clear; easy to implement.
Disadvantages: at most two features in practice (nothing beyond 3D can be visualized); requires the independence assumption; averaging can hide heterogeneous effects.

2) Accumulated Local Effects(ALE) Plot

feature๋“ค์ด ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ ์˜ˆ์ธก์— ํ‰๊ท ์ ์œผ๋กœ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋ƒ„.
PDP๋ณด๋‹ค ๋น ๋ฅด๊ณ  unbiased(๋ณ€์ˆ˜ ๊ฐ„ ์ƒ๊ด€๊ด€๊ณ„ ๊ณ ๋ ค)๋˜์–ด ์žˆ๋‹ค.

  • Grid๋ฅผ Window๋กœ ๋‚˜๋ˆ„์–ด์„œ, Window ๋‚ด์˜ ์˜ˆ์ธก๊ฐ’์˜ ์ฐจ์ด๋ฅผ ํ‰๊ท ๋‚ด์–ด์„œ grid์— ๋”ฐ๋ผ accumulate ํ•œ๋‹ค.
์žฅ์ : Unbiased, Faster, ํ•ด์„์ด ๋ช…ํ™•(zero-centered)
๋‹จ์ : ๊ฐ„๊ฒฉ ์„ค์ • ์–ด๋ ค์›€(๊ฐ„๊ฒฉ ๋งŽ์œผ๋ฉด ๋ถˆ์•ˆ์ •, ์ ์œผ๋ฉด ๋ถ€์ •ํ™•)

3) Feature Interaction

Measures the strength of interaction between features.
■ H-statistic: uses partial dependence to measure the interaction between two features, or between one feature and all remaining features.

Advantages: based on partial dependence, so it has a meaningful interpretation; dimensionless, so it is comparable across features and across models; detects all forms of interaction.
Disadvantages: computationally expensive; the result varies with the sample used.

4) Functional Decomposition

Expresses a high-dimensional prediction function as a sum of individual feature effects and interaction effects.

■ (Generalized) functional ANOVA
■ ALE
■ Statistical regression models


5) Permutation Feature Importance

Feature์˜ ๊ฐ’์„ permuteํ•จ์— ๋”ฐ๋ผ ๋ณ€ํ•˜๋Š” prediction error์˜ ์ฆ๊ฐ€๋ฅผ ์‚ฌ์šฉ

  • Training data๋กœ ๋ชจ๋ธ์„ ํ•™์Šตํ•œ ๋’ค, test data์— permutation ์ ์šฉํ•˜์—ฌ feature importance ๋„์ถœ ์˜ˆ) image
์žฅ์ : ํ•ด์„๋ ฅ ์ข‹์Œ, error ratio๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋–„๋ฌธ์— ๋‹ค๋ฅธ ๋ฌธ์ œ๋“ค๋ผ๋ฆฌ ๋น„๊ต ๊ฐ€๋Šฅ, feature๊ฐ„ interaction ๊ณ ๋ ค
๋‹จ์ : unlabeled data์—๋Š” ์ ์šฉ ๋ถˆ๊ฐ€, correlated๋œ feature์ด ์žˆ์œผ๋ฉด unrealistic data instance์— ์˜ํ•ด biased๋  ์ˆ˜ ์žˆ์Œ, correlated feature์„ ์ถ”๊ฐ€ํ•˜๋ฉด ๊ด€๋ จ๋œ feature์˜ importance๊ฐ€ ์ค„์–ด๋“ค ์ˆ˜ ์ž‡์Œ

6) Global Surrogate

black box model์˜ ์˜ˆ์ธก์— ๊ทผ์‚ฌํ•˜๋Š” ์˜ˆ์ธก๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ

  • black box model์— ์‚ฌ์šฉํ•œ dataset์„ X๋กœ, black box model์˜ ์˜ˆ์ธก์„ y๋กœ ํ•ด์„œ linear model์ด๋‚˜ decision tree ๊ฐ™์€ ํ•ด์„๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ์„ ํ•™์Šต
  • r^2๋กœ black box model๊ณผ surrogate model์˜ ์˜ˆ์ธก์˜ ์œ ์‚ฌ์„ฑ์„ ์ธก์ • ์˜ˆ) image
์žฅ์ : flexible(์–ด๋–ค ํ•ด์„๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ์ด๋“  ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ), ์ง๊ด€์ , r^2๋กœ surrogate ๋ชจ๋ธ์ด ์–ผ๋งˆ๋‚˜ ์ž˜ ๊ทผ์‚ฌํ•˜๋Š”์ง€๋ฅผ ์ธก์ •ํ•  ์ˆ˜ ์žˆ์Œ
๋‹จ์ : ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋‹Œ model์— ๋Œ€ํ•œ ๊ฒฐ๋ก ๋งŒ ๋‚ด๋ฆด ์ˆ˜ ์ž‡์Œ, ์–ด๋А ์ •๋„์˜ r^2๊ฐ€ ์ข‹์€ ๊ฑด์ง€๊ฐ€ ๋ถˆ๋ช…ํ™•

7) Prototype and Criticism

Prototype: a data instance that represents the data well.
Criticism: a data instance that is not well represented by the prototypes.
They help in understanding the data distribution, in building interpretable models, and in interpreting black box models.

■ MMD-critic: compares the distribution of the prototypes with the distribution of the actual data and selects the prototypes that minimize the discrepancy.

  • Choose the number of prototypes and criticisms.
  • Find the prototypes by greedy search.
  • Find the criticisms by greedy search.
  • Uses a witness function, built from a kernel density estimate, to measure where the two distributions differ most.

Local Model-Agnostic Methods

Methods that explain individual predictions.

1) Individual Conditional Expectation(ICE)

ํ•œ feature์˜ ๋ณ€ํ™”์— ๋”ฐ๋ผ ๊ฐ๊ฐ์˜ instance์˜ ์˜ˆ์ธก์ด ์–ด๋–ป๊ฒŒ ๋ณ€ํ™”ํ•˜๋Š”์ง€๋ฅผ line์œผ๋กœ ๋‚˜ํƒ€๋ƒ„
PDP๋Š” ICE๋“ค์˜ ํ‰๊ท ์ด๋ผ๊ณ  ๋ณด๋ฉด ๋จ.

  • centered ICE Plot: ๊ฐ๊ฐ์˜ prediction์˜ ์ฐจ์ด๊ฐ€ ์‹œ์ž‘์ ์ด ๋‹ค๋ฅธ ๊ฒƒ ๋•Œ๋ฌธ์ผ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, ์‹œ์ž‘์ ์„ ์ผ์น˜์‹œํ‚ด. image

  • Derivative ICE Plot: ๋ณ€ํ™”์˜ ๋ฐฉํ–ฅ๊ณผ feature์˜ range ํŒŒ์•…์ด ์‰ฌ์›€. image

์žฅ์ : ์ดํ•ด๊ฐ€ ์‰ฝ๊ณ  ์ง๊ด€์ , heterogeneous relationship ์ฐพ์„ ์ˆ˜ ์žˆ์Œ.
๋‹จ์ : ํ•˜๋‚˜์˜ feature๋งŒ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Œ, correlated๋œ feature์ผ ๊ฒฝ์šฐ ์–ด๋ ค์›€, ๋„ˆ๋ฌด ๋งŽ์€ line์ด ์žˆ์„ ์ˆ˜ ์žˆ์Œ, ํ‰๊ท ์„ ๋ณด๊ธฐ ์–ด๋ ค์›€.

2) Local Interpretable Model-agnostic Explanations(LIME)

Trains a local surrogate model to explain an individual prediction.

  • Create a new dataset (perturbed samples around the instance of interest) and train the surrogate to reproduce the black box model's predictions on it.
  • Give larger weights to the new samples that are closer to the instance of interest.
  • Focus only on the local approximation; the surrogate need not approximate the model globally.
  • Minimize the loss (the difference from the black box model's predictions) while constraining the surrogate's complexity.
  • The kernel width, which determines the size of the neighborhood, must be tuned carefully.

Questions: does this assume that the original model also predicts the perturbed samples well?
How is the instance of interest chosen?

Advantages: applicable regardless of the type of black box model; gives concise, intuitive explanations; works for tabular, text, and image data; the fidelity measure shows how reliable the explanation is; well-maintained packages exist.
Disadvantages: choosing the kernel width that defines the neighborhood is difficult; sampling from a Gaussian distribution has limits (e.g., it ignores feature correlation); the model complexity must be fixed in advance; the explanations can be unstable.

3) Counterfactual Explanation

Describes the smallest change to the feature values that switches the prediction to a predefined output (e.g., N → Y, or a score of 90 → 100).

  • The magnitude of the feature value changes should be small, and so should the number of features changed.
  • Generating multiple counterfactual instances is sometimes desirable.
  • The generated counterfactual instances should be plausible (realistic).

■ Generating Counterfactual Explanations
(1) Minimizing a two-part loss (Wachter et al.):

  • the difference between the desired outcome and the counterfactual's prediction
  • the distance between the original point and the counterfactual point
  • Disadvantages: handles only a small number of features; categorical features are hard to deal with

(2) Minimizing four losses (Dandl et al.):

  • the difference between the desired outcome and the counterfactual's prediction
  • the distance between the original point and the counterfactual point (Gower distance)
  • the number of changed features
  • how plausible the counterfactual's feature values and combinations are
Advantages: the interpretation is clear; can either generate new counterfactuals or select instances from the existing dataset whose changed features flip the outcome; requires access only to the prediction function, not to the data or the model internals; easy to implement.
Disadvantages: many different counterfactual explanations are likely to exist.

4) Scoped Rules(Anchor)

Finds decision rules over particular features that "anchor" an individual prediction: as long as the rule holds, changes to the other features cannot change the prediction.

  • Searches for conditions that reach a given precision threshold over at least a given coverage of the input space.

■ Finding Anchors

  • Candidate generation
  • Best candidate identification
  • Candidate precision validation
  • Modified beam search
Advantages: easy to interpret; subsettable; works for nonlinear models.
Disadvantages: highly configurable; requires discretization; needs many calls to the ML model.

5) Shapley Values

Borrowed from game theory: for a single instance (the game), it measures how much each feature value (a player) contributes to the gain (the single prediction minus the average prediction).
Shapley value: the average marginal contribution of a feature value over all possible coalitions.

Advantages: allows contrastive explanations (against the whole dataset, a subset, or even a single data point); rests on a solid theory.
Disadvantages: computationally expensive; easy to misinterpret (it is the contribution of a feature value to the difference between the prediction and the average prediction, not the change in prediction when the feature is removed); it is not a prediction model; with correlated features it can include unrealistic data instances.

6) SHAP(SHapley Additive exPlanations)

A kernel-based estimation approach for Shapley values.

โ–  KernelSHAP
Estimates, for an instance x, the contribution of each feature value to the prediction.

โ–  TreeSHAP
SHAP for tree-based models.

  • Uses the conditional expectation instead of the marginal expectation.
  • Lower computational complexity than KernelSHAP: O(TLD²) versus O(TL2^M), where T is the number of trees, L the maximum number of leaves, D the maximum depth, and M the number of features.

Example: the cervical cancer dataset.

See also: how to read SHAP plots.

Advantages: solid theoretical foundation; contrastive explanations; connects LIME and Shapley values; fast implementation for tree-based models (which also makes the computations needed for global model interpretations feasible).
Disadvantages: KernelSHAP is slow; KernelSHAP ignores feature dependence; TreeSHAP can produce unintuitive feature attributions; results can be misinterpreted.

Neural Network Interpretation

์œ„์˜ interpretation method๋“ค๊ณผ์˜ ์ฐจ๋ณ„์ 

  • hidden layers์˜ feature๋“ค์„ uncoverํ•  ์ˆ˜ ์žˆ๋„๋ก
  • gradient๋ฅผ interpretation์— ์ด์šฉํ•  ์ˆ˜ ์žˆ์Œ

1) Learned Features

■ Feature Visualization
Finding the input that maximizes the activation of a unit (an individual neuron, a channel, or an entire layer).

  • Can be framed as an optimization problem: find the image that maximizes the unit's activation.
  • The image can be searched for among existing data or generated from scratch.
  • For tabular data, the task becomes finding the combination of feature values that maximizes the unit's activation.

■ Network Dissection
Reference: Network Dissection, a method for quantifying the interpretability of CNN units.
Assumption: units of a neural network (such as convolutional channels) learn disentangled concepts.

  • Requires the Broden dataset (broadly and densely labeled data), i.e., concepts labeled at the pixel level.
  • Find the top activated areas in each image and build an activation mask.
  • Find the concept whose labeled region best matches that activation mask.
Advantages: gives unique insights; links units to concepts automatically; can be communicated in a non-technical way; detects concepts beyond class labels.
Disadvantages: feature visualization images are often uninterpretable; there are far too many units to inspect; requires pixel-level labeled data.

2) Pixel Attribution (images only)

Methods that highlight the pixels relevant to a classification (also called sensitivity maps, saliency maps, or pixel attribution maps).

  • Vanilla Gradient
  • DeconvNet
  • Grad-CAM (Gradient-weighted Class Activation Map)
  • Guided Grad-CAM
  • SmoothGrad
Advantages: the explanations are visual; faster to compute than model-agnostic methods; many methods to choose from.
Disadvantages: hard to know whether an explanation is correct; fragile and unreliable.

3) Detecting Concepts

Detects concepts embedded in the latent space learned by a neural network.

■ TCAV (Testing with Concept Activation Vectors)
Describes the relationship between a concept and a class (e.g., how "striped" influences the zebra class). A CAV is a numerical representation that generalizes a concept in the activation space of a neural network layer.

Advantages: no ML expertise needed to use it; customizable (any concept can be tested); gives global explanations.
Disadvantages: performs poorly on shallow networks; requires concept labeling; hard to apply to text or tabular data.

■ Other methods: ACE, CBM, CW


4) Adversarial Examples

Samples crafted by adding small perturbations in order to deceive the model
(mainly used to probe a model's vulnerabilities).

  • Fast gradient sign method
  • 1-pixel attack
  • Adversarial patch
  • Black box attack
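For a logistic model the input gradient of the loss has a closed form, so the fast gradient sign method can be sketched without an autodiff framework. The sketch below uses the model's own predicted label in place of the true label; all data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(12)
X = rng.normal(size=(400, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

x = X[0]
w, b = model.coef_[0], model.intercept_[0]
label = int(model.predict(x.reshape(1, -1))[0])

# For logistic loss the gradient w.r.t. the input is simply (p - y) * w.
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad = (p - label) * w

eps = 0.5
x_adv = x + eps * np.sign(grad)   # one fast-gradient-sign step up the loss

p_adv = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
conf_before = p if label == 1 else 1.0 - p
conf_after = p_adv if label == 1 else 1.0 - p_adv
```

A single sign step with a small ε is enough to cut the model's confidence in its own prediction, which is exactly the fragility adversarial examples exploit.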

5) Influential Instances

model์˜ parameter๋‚˜ prediction์„ ๋ณ€ํ™”์‹œํ‚ค๋Š” instance๋ฅผ ์ฐพ๋Š” ๊ฒƒ
(model์„ debugํ•˜๊ฑฐ๋‚˜ ์„ค๋ช…ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋จ, problematic instance๊ฐ€ ์žˆ๊ฑฐ๋‚˜, measurement error๊ฐ€ ์žˆ๊ฑฐ๋‚˜ ๋“ฑ)

  • ํ•ด๋‹น instance๋ฅผ ์ œ๊ฑฐ(deletion Diagnostics)ํ•˜๊ฑฐ๋‚˜ loss์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์กฐ๊ธˆ ํฌ๊ฒŒ(influence functions) ํ–ˆ์„ ๋•Œ ์–ผ๋งˆ๋‚˜ ๋ณ€ํ•˜๋Š”์ง€
  • outlier์™€๋Š” ๋‹ค๋ฆ„(dataset๊ณผ์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ๋จผ ๊ฒƒ), ํ•˜์ง€๋งŒ outlier๊ฐ€ influential instance๊ฐ€ ๋  ์ˆ˜๋Š” ์žˆ์Œ image

■ Deletion diagnostics: delete an instance and observe the change in the parameters or predictions.

  • DFBETA: quantifies the change in the parameters.
  • Cook's distance: quantifies the change in the predictions.
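A DFBETA-style deletion diagnostic can be sketched by refitting a linear model with each instance left out and recording the parameter shift (toy data with one deliberately planted outlier; everything below is invented for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(13)
X = rng.normal(size=(50, 2))
y = X[:, 0] + rng.normal(scale=0.2, size=50)
X[0], y[0] = [5.0, 5.0], -5.0    # plant one highly influential outlier

full = LinearRegression().fit(X, y)

dfbeta = np.empty((len(X), 2))
for i in range(len(X)):
    mask = np.ones(len(X), dtype=bool)
    mask[i] = False                         # delete instance i
    reduced = LinearRegression().fit(X[mask], y[mask])
    dfbeta[i] = full.coef_ - reduced.coef_  # parameter shift from the deletion

most_influential = int(np.abs(dfbeta).sum(axis=1).argmax())
```

The n refits are exactly the cost that influence functions avoid by approximating this change without retraining.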

■ Influence functions: observe the change when the instance's loss weight (ε) is increased slightly; the change is approximated without retraining.

Advantages: useful for model debugging and for comparing models.
Disadvantages: computationally expensive; influence functions require a twice-differentiable loss function and are only approximations; the threshold for calling an instance "influential" is unclear.