Hyper Parameter Tuning with Cross Validation - jaeaehkim/trading_system_beta GitHub Wiki

Motivation

  • Hyper Parameter Tuning is an essential step once Feature Selection on the train data is complete.
  • After the features are fixed, the model's HPs are finalized through CV as the parameters that maximize the Metric (score function). Much thought has already gone into not overfitting the model, so the HP Tuning stage can simply select the model with the best performance.
  • We cover search methodologies here; once they are extended via Meta-Label to multiple ML models or rule-based models, they can be applied to each part in turn.

Grid Search Cross Validation

  • Grid Search searches every combination the hyperparameters can take. Under CV, the scoring function is computed for each combination, so the candidates can be ranked and a selection made.
  • When the underlying structure of the data is unknown, this can also serve as a way to gain insight into it.
  • Reflecting the knowledge gained from Ensemble-Methods and Cross-Validation-in-Model, this is expressed in the following code:
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import BaggingClassifier
from cv import PurgedKFold

class TheNewPipe(Pipeline):
    # sklearn's Pipeline.fit does not accept a bare sample_weight argument,
    # so route it to the final step as '<step_name>__sample_weight'
    def fit(self, X, y, sample_weight=None, **fit_params):
        if sample_weight is not None:
            fit_params[self.steps[-1][0] + '__sample_weight'] = sample_weight
        return super().fit(X, y, **fit_params)

def clfHyperFit(feat, lbl, t1, pipe_clf, param_grid, cv=3, bagging=[0, None, 1.0],
                rndSearchIter=0, n_jobs=-1, pctEmbargo=0, **fit_params):
    if set(lbl.values) == {0, 1}:
        scoring = 'f1'  # f1 for meta-labeling
    else:
        scoring = 'neg_log_loss'  # symmetric towards all classes

    # 1) hyperparameter search on the train data, with purging and embargo
    inner_cv = PurgedKFold(n_splits=cv, t1=t1, pctEmbargo=pctEmbargo)
    if rndSearchIter == 0:
        gs = GridSearchCV(estimator=pipe_clf, param_grid=param_grid, scoring=scoring,
                          cv=inner_cv, n_jobs=n_jobs)
    else:
        gs = RandomizedSearchCV(estimator=pipe_clf, param_distributions=param_grid,
                                scoring=scoring, cv=inner_cv, n_jobs=n_jobs,
                                n_iter=rndSearchIter)
    gs = gs.fit(feat, lbl, **fit_params).best_estimator_  # pipeline
    # 2) fit validated model on the entirety of the data
    if bagging[1] is not None and float(bagging[1]) > 0:  # default None disables bagging
        gs = BaggingClassifier(base_estimator=TheNewPipe(gs.steps),
                               n_estimators=int(bagging[0]), max_samples=float(bagging[1]),
                               max_features=float(bagging[2]), n_jobs=n_jobs)
        gs = gs.fit(feat, lbl, sample_weight=fit_params[gs.base_estimator.steps[-1][0] + '__sample_weight'])
        gs = Pipeline([('bag', gs)])
    return gs
  • Code analysis
    • TheNewPipe overrides sklearn's Pipeline so that Sample-Weights can be passed through fit (see the class definition above).
      • Recent versions of sklearn have since addressed this.
    • Scoring: Meta-Labeling trains the model with a finer distinction (primary model & sub model), so samples labeled with one particular class often occur in bulk; the code therefore uses the 'F1 Score' in that case. When training on the entire train data we are equally interested in predictions for every case, so 'accuracy' or 'neg log loss' is used instead. We recommend 'neg log loss'; the reason is explained below.
    • The search space is defined through param_grid; after choosing between GridSearchCV and RandomizedSearchCV, the best_estimator_ is found with feat=X, lbl=y.
    • The model is fit with the sample weights applied; a usage sketch follows below.
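
A minimal usage sketch (the names feat, lbl, t1, and w are hypothetical: a feature DataFrame, a Series of {0,1} meta-labels, the Series of label end-times that PurgedKFold expects, and an array of sample weights):

from sklearn.ensemble import RandomForestClassifier

pipe_clf = Pipeline([('rf', RandomForestClassifier())])
param_grid = {'rf__n_estimators': [100, 500], 'rf__max_depth': [3, 5, None]}
# the sample weights are routed to the 'rf' step through **fit_params
best = clfHyperFit(feat, lbl, t1, pipe_clf, param_grid, cv=3, rf__sample_weight=w)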

Randomized Search Cross Validation

  • ์œ„์˜ ์ฝ”๋“œ์—์„  ์ด๋ฏธ Randomized Search CV๊ฐ€ ๋ฐ˜์˜๋˜์–ด ์žˆ๋‹ค. ์ด๊ฒƒ์„ ์“ฐ๋Š” ์ด์œ ๋Š” Grid Search CV๋Š” ๋ณต์žกํ•ด์งˆ ์ˆ˜๋ก ๊ฐ๋‹นํ•  ์ˆ˜ ์—†๋Š” ์—ฐ์‚ฐ๋Ÿ‰.
  • ์—ฐ์‚ฐ๋Ÿ‰์„ ์ค„์ด๋ฉด์„œ ์ข‹์€ ํ†ต๊ณ„์  ์„ฑ์งˆ์„ ๊ฐ–๋Š” ๋Œ€์•ˆ์€ ๊ฐ Hyper Parameter๋ฅผ uniform distribution์„ ์ด์šฉํ•ด์„œ ๋ฝ‘์•„๋‚ด๋Š” ๋ฐฉ๋ฒ•์ด๊ณ  ์ด๋Š” ์กฐํ•ฉ์˜ ๊ฐœ์ˆ˜๋ฅผ ์‰ฝ๊ฒŒ ํ†ต์ œํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ธก๋ฉด์—์„œ ์ž์ฃผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. (Begstra, 2011)
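
A sketch of this budget control, reusing the hypothetical pipe_clf, feat, lbl, and t1 objects from the usage example above: with rndSearchIter > 0, clfHyperFit switches to RandomizedSearchCV, and scipy distributions replace the explicit grid:

from scipy.stats import randint, uniform

# a comparably fine grid over three parameters would cost e.g. 10 x 10 x 10 = 1,000
# fits per fold; random search caps the budget via n_iter, drawing each parameter independently
param_distributions = {
    'rf__n_estimators': randint(50, 1000),  # uniform over the integers [50, 1000)
    'rf__max_depth': randint(2, 20),
    'rf__max_features': uniform(0.1, 0.9),  # uniform on [0.1, 1.0)
}
best = clfHyperFit(feat, lbl, t1, pipe_clf, param_distributions,
                   cv=3, rndSearchIter=50)  # 50 sampled combinations instead of a grid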

Log Uniform Distribution

  • ๊ฐ Hyper Parameter์— ๋Œ€ํ•œ ๋ฆฌ์„œ์น˜๋ฅผ ํ†ตํ•ด์„œ knowledge๊ฐ€ ์ƒ๊ธด๋‹ค๋ฉด Log Uniform Distribution์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
    • ์˜ˆ๋ฅผ ๋“ค๋ฉด, SVM ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ Hyper Parameter C์˜ ๊ฒฝ์šฐ 0.01 ~ 1์˜ ์ฆ๊ฐ€์™€ 1 ~ 100์˜ ์ฆ๊ฐ€๊ฐ€ ๋น„์Šทํ•˜๊ฒŒ ๋‚˜ํƒ€๋‚˜๋ฏ€๋กœ(๋กœ๊ทธ์ ) ๋‹จ์œ„ ๋ณ„๋กœ ๊ท ๋“ฑํ•˜๊ฒŒ ๋ฝ‘๋Š” ๊ฒƒ์€ ๋น„ํšจ์œจ์ ์ด์–ด์„œ ์ด๋Ÿฐ ๋ถ€๋ถ„์„ log ํ•จ์ˆ˜๋ฅผ ํ™œ์šฉํ•ด์„œ Hyper Parameter๋ฅผ tuning์„ ๋” ๋น ๋ฅธ ์†๋„๋กœ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.
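
A minimal sketch of such a sampler built on scipy's rv_continuous (the class and function names are illustrative), together with a Kolmogorov-Smirnov check that the draws are uniform in log-space:

import numpy as np
from scipy.stats import rv_continuous, kstest

class logUniform_gen(rv_continuous):
    # random numbers log-uniformly distributed between a and b
    def _cdf(self, x):
        return np.log(x / self.a) / np.log(self.b / self.a)

def logUniform(a=1.0, b=np.exp(1.0)):
    return logUniform_gen(a=a, b=b, name='logUniform')

a, b, size = 1e-3, 1e3, 10000
vals = logUniform(a=a, b=b).rvs(size=size)
# log(vals) should be uniform on [log(a), log(b)]
print(kstest(np.log(vals), 'uniform', args=(np.log(a), np.log(b) - np.log(a))))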

Scoring and Hyper Parameter Tuning

  • Log loss: L = -(1/N) · Σ_(n=0..N-1) Σ_(k=0..K-1) y_(n,k) · log(p_(n,k))
    • accuracy treats "a wrong buy prediction made with high probability" and "a wrong buy prediction made with low probability" identically.
    • For an investment strategy to ultimately make money, predictions made with high confidence must be weighted more heavily during training.
      • In neg log loss, p_(n,k) is the predicted probability of the n-th prediction for label k, and it enters the formula directly. The magnitude of the probability is tied to 'position size'.
      • y_(n,k), seen through the labels 1 and -1, carries the meaning of 'direction'; and since sample weights are already supplied through the cv score function in Cross-Validation-in-Model, the 'magnitude of returns' also becomes part of the learning objective.
    • Through the preprocessing applied to the two elements y_(n,k) and p_(n,k), information on 'direction', 'position size', and 'return magnitude' is all encoded; so from the standpoint of building an 'investment strategy model', using neg log loss rather than plain accuracy is the rational choice, as the toy comparison below illustrates.
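
A toy illustration of the argument above (the numbers are made up): two classifiers with identical accuracy, one of which makes its single mistake with high confidence and is punished far more severely by neg log loss:

import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([1, 1, 0, 0])
p_confident = np.array([0.9, 0.9, 0.1, 0.95])  # the one wrong call is made with high confidence
p_hesitant = np.array([0.9, 0.9, 0.1, 0.55])   # the one wrong call is made with low confidence

for p in (p_confident, p_hesitant):
    acc = accuracy_score(y_true, (p > 0.5).astype(int))
    nll = -log_loss(y_true, p)  # neg log loss; a sample_weight= argument can also be passed
    print(f'accuracy={acc:.2f}  neg log loss={nll:.3f}')
# both classifiers reach accuracy 0.75, but the confident mistake drives
# neg log loss from about -0.28 down to about -0.83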

Application to Quant System

  • Hyper parameter tuning can be seen as dividing broadly into three parts:
    1. Hyperparameters of the process that builds the Train Data
      • Bar, Event, Feature, and Labeling each require many Hyper Parameters.
    2. Hyperparameters intrinsic to the Model
      • SVM -> C, gamma
      • RF -> n_estimators, max_depth, min_samples_leaf, min_weight_fraction_leaf, max_features, max_samples, max_leaf_nodes, min_impurity_decrease, class_weight, ccp_alpha
    3. Backtest Hyperparameters
      • bet type, bet sizing amplification, transaction fee, turnover rate, Cross Validation (k), Walk Forward update length
  • The Log Uniform Distribution can be extended in a Bayesian direction by sampling from distributions that reflect subjective priors tailored to each Hyper parameter, as sketched below.
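
A sketch of that extension (the step name 'svc' and the prior ranges are assumptions for illustration), reusing clfHyperFit's randomized path with per-parameter priors from scipy.stats (loguniform requires scipy >= 1.4):

from scipy.stats import loguniform, uniform
from sklearn.svm import SVC

pipe_clf = Pipeline([('svc', SVC(probability=True))])  # probability=True for neg_log_loss scoring
param_distributions = {
    'svc__C': loguniform(1e-2, 1e2),      # scale-free parameter: log-uniform prior
    'svc__gamma': loguniform(1e-4, 1e0),  # spans several orders of magnitude: log-uniform prior
    'svc__shrinking': [True, False],      # discrete prior: uniform over a list
}
best = clfHyperFit(feat, lbl, t1, pipe_clf, param_distributions, cv=3, rndSearchIter=25)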