About LGBM - SoMinHyung/2020MiraeAsset GitHub Wiki

1. LightGBM


(0) Installation

!pip install lightgbm==2.2.3

Newer versions reportedly fail to recognize feature names written in Korean.
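
To confirm that the pinned version is the one actually installed, a small check like the following can be run after the install. This is only a sketch: the helper name `installed_version` is made up for illustration, and it relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata
from typing import Optional

PINNED = "2.2.3"  # the version pinned by the install command above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("lightgbm")
if found is None:
    print("lightgbm is not installed")
elif found != PINNED:
    print(f"lightgbm {found} is installed, expected {PINNED}")
else:
    print(f"lightgbm {PINNED} is installed")
```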


(1) Overview

LightGBM is a gradient boosting framework that uses tree-based learning algorithms.

It differs from most other tree algorithms in that LightGBM grows trees vertically, whereas the others grow them horizontally.

In other words, LightGBM grows trees leaf-wise, while other algorithms grow them level-wise.

To grow the tree, it picks the leaf with the max delta loss, i.e. the leaf whose split reduces the loss the most.

For the same number of leaves, a leaf-wise algorithm can reduce more loss than a level-wise algorithm.
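
The difference between the two growth strategies can be sketched with a toy example. This is not LightGBM's implementation: the `Node` class, the gain values, and both functions are hypothetical and only illustrate the idea that, with the same leaf budget, always splitting the leaf with the largest gain removes at least as much loss as splitting level by level.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    gain: float                      # loss reduction obtained by splitting this leaf
    left: Optional["Node"] = None    # candidate splits that appear after splitting
    right: Optional["Node"] = None


def leaf_wise(root: Node, num_leaves: int) -> float:
    """Always split the leaf with the largest gain (max delta loss)."""
    total, leaves, counter = 0.0, 1, 1
    heap = [(-root.gain, 0, root)]   # max-heap via negated gain; counter breaks ties
    while heap and leaves < num_leaves:
        neg_gain, _, node = heapq.heappop(heap)
        total += -neg_gain
        leaves += 1                  # one leaf becomes two
        for child in (node.left, node.right):
            if child is not None:
                heapq.heappush(heap, (-child.gain, counter, child))
                counter += 1
    return total


def level_wise(root: Node, num_leaves: int) -> float:
    """Split leaves in breadth-first order, one level after another."""
    total, leaves = 0.0, 1
    queue = deque([root])
    while queue and leaves < num_leaves:
        node = queue.popleft()
        total += node.gain
        leaves += 1
        queue.extend(c for c in (node.left, node.right) if c is not None)
    return total


# Made-up gains: the left branch is much more informative than the right one.
tree = Node(10.0,
            Node(8.0, Node(6.0), Node(0.5)),
            Node(1.0, Node(0.2), Node(0.1)))

print("leaf-wise :", leaf_wise(tree, num_leaves=4))
print("level-wise:", level_wise(tree, num_leaves=4))
```

With a budget of 4 leaves, the leaf-wise strategy keeps following the informative left branch, while the level-wise strategy spends one of its splits on the weak right branch.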


Pros: LightGBM can handle large datasets and uses little memory while running. It also focuses its training on the accuracy of the results.

Cons: There are more than 100 parameters, which can make it complicated to use.


(2) Parameters

  1. max_depth : The maximum depth of a tree. This parameter is used to handle overfitting. If you feel your model is overfitting, my advice is to lower max_depth.

  2. min_data_in_leaf : The minimum number of records a leaf may contain. The default is 20, which is a sensible starting point, although the best value depends on the dataset. Raising it helps deal with overfitting.

  3. feature_fraction : The fraction of features randomly selected for each tree. It is often discussed alongside random forest boosting (covered later), but it applies to the other boosting types as well. A feature_fraction of 0.8 means LightGBM randomly selects 80% of the features in each iteration when building a tree.

  4. bagging_fraction : The fraction of the data used in each iteration. It is mainly used to speed up training and to prevent overfitting. Bagging only takes effect when bagging_freq is also set to a non-zero value.

  5. early_stopping_round : This parameter can help speed up the analysis. Training stops if a metric on a validation set has not improved in the last early_stopping_round rounds, which cuts down on unnecessary iterations.

  6. lambda : Controls regularization. Typical values range from 0 to 1. In LightGBM the actual parameter names are lambda_l1 and lambda_l2.

  7. min_gain_to_split : The minimum gain required to make a split. It can be used to control the number of useful splits in a tree.

  8. max_cat_group : When the number of categories is large, finding a split point on them overfits easily. LightGBM therefore merges the categories into max_cat_group groups and looks for split points on the group boundaries. The default is 64.
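
As a rough illustration, the parameters above could be collected into a dictionary like the one below. All values are placeholders chosen for this sketch, not recommendations. `lambda` is written out as `lambda_l1` and `lambda_l2`, `bagging_freq` is added because bagging needs it, and `max_cat_group` is left out because it may not be available in every LightGBM version.

```python
# Illustrative values only; tune them for your own data.
tuning_params = {
    "max_depth": 6,              # lower this first if the model overfits
    "min_data_in_leaf": 20,      # LightGBM's default
    "feature_fraction": 0.8,     # use 80% of the features for each tree
    "bagging_fraction": 0.8,     # use 80% of the rows ...
    "bagging_freq": 1,           # ... and resample them every iteration
    "early_stopping_round": 50,  # stop after 50 rounds without improvement
    "lambda_l1": 0.0,            # L1 regularization
    "lambda_l2": 0.1,            # L2 regularization
    "min_gain_to_split": 0.0,    # minimum gain required to split
}


def check_fractions(params: dict) -> None:
    """Raise ValueError if a *_fraction value is outside (0, 1]."""
    for name in ("feature_fraction", "bagging_fraction"):
        value = params[name]
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")


check_fractions(tuning_params)
print(sorted(tuning_params))
```

A dictionary like this is what would typically be passed as the first argument to `lightgbm.train`, together with a training dataset and at least one validation set so that early stopping has something to monitor.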

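The stopping rule behind early_stopping_round can be sketched in a few lines of plain Python. This is a simplified stand-in, not LightGBM's code: the function name `best_iteration` is made up, and it assumes a single validation metric where lower is better.

```python
from typing import Sequence


def best_iteration(scores: Sequence[float], early_stopping_round: int) -> int:
    """Return the 1-based iteration with the best validation score.

    Scanning stops as soon as the score has not improved for
    `early_stopping_round` consecutive rounds (lower is better).
    """
    best_score = float("inf")
    best_iter = 0
    for i, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= early_stopping_round:
            break  # no improvement for long enough: stop training
    return best_iter


# Made-up validation losses, one per boosting round.
val_loss = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]

print(best_iteration(val_loss, early_stopping_round=3))
print(best_iteration(val_loss, early_stopping_round=5))
```

With a patience of 3 the loop stops before it ever sees the last, better score, whereas a patience of 5 lets it continue. That is the trade-off between saving iterations and stopping too early.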

(3) Core parameters

  1. task : Specifies the task to perform on the data. It can be either train or predict.

  2. application : The most important parameter (an alias of objective). It specifies the application of the model, i.e. whether this is a regression problem or a classification problem. By default LightGBM treats the model as a regression model.

  • regression: regression
  • binary: binary classification
  • multiclass: multiclass classification

  3. boosting : Defines the type of algorithm to run. The default is gbdt.

  • gbdt : Traditional Gradient Boosting Decision Tree
  • rf : Random Forest
  • dart : Dropouts meet Multiple Additive Regression Trees
  • goss : Gradient-based One-Side Sampling

  4. num_boost_round : The number of boosting iterations, typically 100 or more.

  5. learning_rate : Determines the impact of each tree on the final outcome. GBM starts from an initial estimate and updates it using the output of each tree; the learning rate controls the magnitude of those updates. Typical values are 0.1, 0.001, 0.003 and so on.

  6. num_leaves : The maximum number of leaves in a single tree. The default is 31.

  7. device : The default is cpu; it can also be changed to gpu.
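
Put together, the core parameters might look like the dictionary below. The values are illustrative, and the helper `leaves_fit_depth` is a hypothetical name. It encodes one fact that is easy to overlook when combining num_leaves with max_depth: a binary tree of depth d cannot have more than 2^d leaves.

```python
# Illustrative core settings for a binary classification task.
core_params = {
    "task": "train",
    "application": "binary",   # regression / binary / multiclass
    "boosting": "gbdt",        # gbdt / rf / dart / goss
    "learning_rate": 0.1,
    "num_leaves": 31,          # LightGBM's default
    "max_depth": 5,            # a value <= 0 means "no limit" in LightGBM
    "device": "cpu",
}
num_boost_round = 100          # number of boosting iterations


def leaves_fit_depth(num_leaves: int, max_depth: int) -> bool:
    """Check that num_leaves is reachable under the given max_depth."""
    return max_depth <= 0 or num_leaves <= 2 ** max_depth


print(leaves_fit_depth(core_params["num_leaves"], core_params["max_depth"]))
```

If the check fails, the tree can never reach num_leaves leaves, so max_depth is the limit that actually applies.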

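The role of learning_rate can be illustrated with a deliberately simplified boosting loop. This is a toy under stated assumptions, not how LightGBM fits trees: each "tree" is replaced by a constant equal to the mean residual, the initial estimate is 0, and the function name `boost_constant` is made up.

```python
from statistics import mean
from typing import List, Sequence


def boost_constant(y: Sequence[float], learning_rate: float,
                   num_boost_round: int) -> List[float]:
    """Toy boosting loop: every 'tree' just predicts the mean residual."""
    estimate = 0.0                 # initial estimate (an assumption of this sketch)
    history = [estimate]
    for _ in range(num_boost_round):
        residuals = [target - estimate for target in y]
        tree_output = mean(residuals)              # stand-in for fitting a tree
        estimate += learning_rate * tree_output    # shrink each tree's contribution
        history.append(estimate)
    return history


targets = [2.0, 4.0, 6.0]          # the best constant prediction is the mean, 4.0

print(boost_constant(targets, learning_rate=0.1, num_boost_round=3))
print(boost_constant(targets, learning_rate=1.0, num_boost_round=3))
```

A small learning rate moves the estimate toward the target in small steps and therefore needs more boosting rounds, which is why learning_rate and num_boost_round are usually tuned together.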