Bet Sizing - jaeaehkim/trading_system_beta GitHub Wiki

Motivation

  • Even when an ML algorithm delivers high accuracy, results can differ completely depending on bet sizing.
  • As explained in Hyper Parameter Tuning with Cross Validation, where neg log loss was preferred over accuracy, it matters that the model predicts direction well when it reports a high probability. Given that this holds to a reasonable degree, it is even more important to size bets larger when the probability is high. If sizing goes the other way, a strategy that could have been profitable can end up losing money.

Strategy-Independent Bet Sizing Approaches

  • This section introduces bet sizing techniques that are independent of the strategy itself (the direction model). They resemble rule-based bet sizing, but differ fundamentally in how they relate to Triple Barrier Labeling. In particular, the bet size is determined continuously rather than discretely.
    • Understanding this approach requires two concepts: Labeling and Sample Weight.
  • Why is it independent of the strategy itself?
    • It is not perfectly independent, but it can be regarded as relatively independent (vs Bet Sizing From Predicted Probabilities).
    • This technique is built on observations sampled by Event Bars (see: Data-Structures), and it computes the bet size from the number of event bars rather than from the model's inference values, so it is comparatively independent.

Method 1

Calculate Trading Signal Series

  • The basic idea is to count the triple barriers that are open (concurrent) at time t, counting long and short triple barriers separately. Whether a barrier is long or short comes from the model's inference. The resulting counts are integers, and a Gaussian distribution is used to map them to a weight.
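The counting step can be sketched as follows. This is a minimal illustration, not the wiki author's implementation: the `events` frame, its `t1` and `side` columns, and the function name `concurrent_bet_counts` are assumptions made for the example.

```python
import pandas as pd


def concurrent_bet_counts(events: pd.DataFrame, timestamps) -> pd.DataFrame:
    """Count open long and short bets at each timestamp.

    `events` is a hypothetical frame indexed by bet start time, with columns
    't1' (bet end time, NaT while still open) and 'side' (+1 long, -1 short).
    """
    rows = []
    for t in timestamps:
        # A bet is open at t if it started at or before t and has not ended yet.
        active = (events.index <= t) & ((events["t1"] > t) | events["t1"].isna())
        sides = events.loc[active, "side"]
        c_long = int((sides > 0).sum())
        c_short = int((sides < 0).sum())
        rows.append({"c_long": c_long, "c_short": c_short, "c": c_long - c_short})
    return pd.DataFrame(rows, index=timestamps)


events = pd.DataFrame(
    {"t1": pd.to_datetime(["2024-01-03", "2024-01-04", None]), "side": [1, -1, 1]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
counts = concurrent_bet_counts(events, pd.to_datetime(["2024-01-01", "2024-01-02"]))
print(counts)
```

The net count `c = c_long - c_short` is the integer trading signal that the mapping step turns into a bet size.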

Mapping : Trading Signal -> Bet Size

  • Why use a Gaussian distribution?
    • The computed trading signal can be mapped to a bet size continuously, without if/else branching.
  • Why use a mixture of Gaussians?
    • If the goal is simply to bet strongly on strong signals, mapping the signal through the CDF (cumulative distribution function) of a single Gaussian is enough.
    • However, mimicking human behavior, a buy signal that is already very strong can be read as unlikely to get stronger, while a buy signal that has become very weak can be read as likely to strengthen. A mixture of Gaussians can serve as the mapping model that reflects this.
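A sketch of the mixture mapping, assuming the form commonly given for this approach (for example in López de Prado's Advances in Financial Machine Learning): with F the CDF of the fitted mixture, m = (F[c] - F[0]) / (1 - F[0]) when c >= 0 and m = (F[c] - F[0]) / F[0] when c < 0. The formula image is not available in this text, so the exact form is an assumption, and the mixture parameters below are made up for illustration rather than fitted to data.

```python
import numpy as np
from scipy.stats import norm


def mixture_cdf(x, weights, means, stds):
    """CDF of a Gaussian mixture evaluated at x (scalar or array)."""
    x = np.asarray(x, dtype=float)[..., None]
    comps = norm.cdf(x, loc=np.asarray(means), scale=np.asarray(stds))
    return np.sum(np.asarray(weights) * comps, axis=-1)


def bet_size_from_count(c, weights, means, stds):
    """Map the net concurrent-bet count c to a bet size in [-1, 1]."""
    f_c = mixture_cdf(c, weights, means, stds)
    f_0 = mixture_cdf(0.0, weights, means, stds)
    c = np.asarray(c, dtype=float)
    # Rescale separately on each side of zero so that m(0) = 0.
    return np.where(c >= 0, (f_c - f_0) / (1.0 - f_0), (f_c - f_0) / f_0)


# Hypothetical, hand-picked mixture parameters (not fitted to any data).
weights, means, stds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
sizes = bet_size_from_count([-3, 0, 3], weights, means, stds)
print(sizes)
```

In practice the mixture would be fitted to the historical series of counts; the mapping itself stays the same.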

Method 2

  • A simpler alternative borrows an idea commonly used in money management:
    • The bet size is computed as a ratio against the maximum number of concurrent long/short triple barrier bets.
    • In a backtest that maximum cannot be known in advance, so its value must be chosen with look-ahead bias in mind. As with concurrent labels, a vector is filled with 1 over the holding period until a barrier is touched. Triple barriers always have overlapping periods, and the bet size is determined from the difference between the number of long events and the number of short events during those periods.
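One way to write this is m_t = c_long,t / max(c_long) - c_short,t / max(c_short). The sketch below is an illustration under assumptions: the function name `budget_bet_size` is hypothetical, and using an expanding (running) maximum as the default is just one way to avoid look-ahead bias, not necessarily what the wiki author had in mind.

```python
import pandas as pd


def budget_bet_size(c_long: pd.Series, c_short: pd.Series,
                    max_long=None, max_short=None) -> pd.Series:
    """Bet size as the long ratio minus the short ratio.

    If no maximum is supplied, the running maximum up to each time is used,
    so the value at t depends only on information available at t.
    """
    if max_long is None:
        max_long = c_long.expanding().max()
    if max_short is None:
        max_short = c_short.expanding().max()
    # 0/0 (no bet seen so far on that side) is treated as a zero contribution.
    long_part = (c_long / max_long).fillna(0.0)
    short_part = (c_short / max_short).fillna(0.0)
    return long_part - short_part


c_long = pd.Series([0, 1, 2, 1])
c_short = pd.Series([0, 0, 1, 2])
m = budget_bet_size(c_long, c_short)
print(m.tolist())
```

A fixed maximum chosen from a separate calibration period can be passed through `max_long` and `max_short` instead of the running maximum.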

Bet Sizing From Predicted Probabilities

  • This approach makes active use of the probability produced by a model trained on the features engineered by the quant researcher, and maps that probability to a bet size. Because the model was evaluated with neg log loss (see: Scoring and Hyper Parameter Tuning), mapping the model probability to a bet size is reasonable.

Method 3

  • When the labeling outcome x can only be 1 or -1, the test statistic is defined as $z = \frac{p - 1/2}{\sqrt{p(1 - p)}}$, where p is the predicted probability and Z is the standard normal distribution (its CDF); the bet size is $m = 2Z[z] - 1$.
  • To make the bet size also reflect the predicted side x, the expression becomes $m = x\,(2Z[z] - 1)$.
  • With this mapping, a marginal edge such as p=0.501 shrinks the bet size 'a lot', while a case such as p=0.7, with an edge of about 0.2, is very rare, so the bet size is raised 'a lot'.
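import pandas as pd
from scipy.stats import norm


def getSignal(events, stepSize, prob, pred, numClasses, numThreads, **kwargs):
    """
    Map predicted probabilities to signed bet sizes.

    :param events: DataFrame of events (unused in this simplified version)
    :param stepSize: discretization step (unused here, see Size Discretization)
    :param prob: pd.Series with index=events.index, probability of the predicted label
    :param pred: pd.Series with index=events.index, predicted side
    :param numClasses: number of classes of the fitted classifier
    :param numThreads: number of threads (unused in this simplified version)
    :param kwargs:
    :return: pd.Series of signed bet sizes in [-1, 1]
    """
    if not prob.shape[0]:
        return pd.Series(dtype=float)
    # Test statistic of the probability against the 1/numClasses baseline
    signal0 = (prob - 1. / numClasses) / (prob * (1. - prob)) ** 0.5
    # Bet size from the standard normal CDF, signed by the predicted side
    signal0 = pred * (2 * norm.cdf(signal0) - 1)
    return signal0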
  • Code analysis
    • The first signal0 assignment computes the test statistic.
    • The second signal0 assignment plugs the statistic into the standard normal CDF to obtain the bet ratio, and multiplies by pred so that the side is encoded in the same value.
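To get a feel for the mapping, the sketch below evaluates the same formula for a few probabilities. The helper name `bet_size_from_prob` is hypothetical and the two-class default is an assumption for the example.

```python
import numpy as np
from scipy.stats import norm


def bet_size_from_prob(p, side=1, num_classes=2):
    """Signed bet size m = side * (2 * Phi(z) - 1) for predicted probability p."""
    p = np.asarray(p, dtype=float)
    # z measures how far p sits above the 1/num_classes baseline.
    z = (p - 1.0 / num_classes) / np.sqrt(p * (1.0 - p))
    return side * (2.0 * norm.cdf(z) - 1.0)


for p in (0.501, 0.6, 0.7, 0.9):
    print(f"p={p:.3f} -> m={float(bet_size_from_prob(p)):.4f}")
```

Under this formula p=0.501 gives a bet size of roughly 0.002 and p=0.7 roughly 0.34, so a tiny edge is sized almost to zero while a clear edge gets a substantial allocation.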

Method 4 : Averaging Active Bets

  • Because the Triple Barrier method forms a box, concurrent bets always arise. When a bet is already open, there are broadly two ways to handle a new signal.
    • Scenario: a bet of p=0.6 -> m=0.3 is open when a new signal of p=0.7 -> m=0.7 arrives (illustrative m values)
      • First: overwriting the old bet causes excessive turnover, but updates quickly to the latest information. That is, 0.7 - 0.3 = 0.4 is added to the bet.
      • Second: averaging updates somewhat more slowly, but it responds to the current situation to a degree without inflating turnover. In this case (0.3+0.7)/2 = 0.5, so only 0.2 is added to the current 0.3.
def mpAvgActiveSignals(signals, molecule):
    '''
    At time loc, average signal among those still active.
    Signal is active if:
      a) issued before or at loc AND
      b) loc before signal's endtime, or endtime is still unknown (NaT).
    '''
    out = pd.Series(dtype=float)  # explicit dtype for the empty result
    for loc in molecule:
        df0 = (signals.index.values <= loc) & ((loc < signals['t1']) | pd.isnull(signals['t1']))
        act = signals[df0].index
        if len(act) > 0:
            out[loc] = signals.loc[act, 'signal'].mean()
        else:
            out[loc] = 0
    return out
  • Code analysis
    • for loc in molecule: iterates over each time t and finds the index of every signal that is active at that time.
    • out[loc] = signals.loc[act, 'signal'].mean() averages the active signals when at least one exists; otherwise the output is 0.
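The scenario above (m=0.3 open, then m=0.7 arrives) can be reproduced with a compact, self-contained variant of the same averaging logic. The function name `avg_active_signals` and the dates are assumptions made for this example.

```python
import pandas as pd


def avg_active_signals(signals: pd.DataFrame, times) -> pd.Series:
    """Average the signals that are still active at each time in `times`."""
    out = {}
    for t in times:
        # Active: issued at or before t, and not yet ended (or end unknown).
        active = (signals.index <= t) & ((signals["t1"] > t) | signals["t1"].isna())
        out[t] = signals.loc[active, "signal"].mean() if active.any() else 0.0
    return pd.Series(out, dtype=float)


signals = pd.DataFrame(
    {"signal": [0.3, 0.7], "t1": pd.to_datetime(["2024-01-05", "2024-01-08"])},
    index=pd.to_datetime(["2024-01-01", "2024-01-03"]),
)
times = pd.to_datetime(["2024-01-02", "2024-01-04", "2024-01-06", "2024-01-09"])
avg = avg_active_signals(signals, times)
print(avg)
```

While both bets overlap the average is 0.5; once the first bet ends the position moves to 0.7, and it returns to 0 when nothing is active.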

Size Discretization

def discreteSignal(signal0, stepSize):
    signal1 = (signal0 / stepSize).round()*stepSize
    signal1[signal1 > 1] = 1
    signal1[signal1 < -1] = -1
    return signal1
  • Supplementary method
    • Rounding to stepSize makes it possible to ignore signals that change the position only marginally, which reduces unnecessary turnover.
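A short usage sketch of the rounding idea, written as a self-contained variant that uses `clip` for the [-1, 1] cap. The name `discretize` and the sample values are assumptions for illustration.

```python
import pandas as pd


def discretize(signal: pd.Series, step_size: float) -> pd.Series:
    """Round signals to multiples of step_size and cap them to [-1, 1]."""
    return ((signal / step_size).round() * step_size).clip(-1.0, 1.0)


raw = pd.Series([0.04, 0.12, 0.31, 0.34, -0.58, 0.97])
sized = discretize(raw, step_size=0.1)
print(sized.round(2).tolist())
```

The neighbouring raw signals 0.31 and 0.34 land on the same step, so moving from one to the other would not trigger a trade.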

Dynamic Bet Sizes and Limit Prices

  • In the basic setup, the probability is predicted only once when a Triple Barrier is formed, and that prediction is held until the barrier ends (apart from changes caused by averaging when other signals arrive).
  • Dynamic betting instead updates the forecast price f tick by tick as the market price p moves inside the Triple Barrier window, and recomputes and updates the bet size based on f.
  • The method is very reasonable in theory, but bringing it to production level is expected to require a great deal of research. It is left as future work.
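As a rough sketch of what such an update could look like, the code below uses a sigmoid sizing function m = x / sqrt(w + x^2) of the divergence x = f - p. This is the form used in López de Prado's Advances in Financial Machine Learning. The wiki text does not specify a sizing function, so this choice, the function names, and all numbers are assumptions.

```python
import math


def bet_size(w: float, x: float) -> float:
    """Sigmoid bet size in (-1, 1) for divergence x = forecast - market price."""
    return x / math.sqrt(w + x * x)


def calibrate_w(x: float, m: float) -> float:
    """Choose w so that a divergence of x produces a bet size of m."""
    return x * x * (m ** -2 - 1.0)


def target_position(w: float, forecast: float, market_price: float, max_pos: int) -> int:
    """Target position (in units) given the current forecast and market price."""
    return int(bet_size(w, forecast - market_price) * max_pos)


# Hypothetical calibration: a divergence of 10 should map to a 0.95 bet size.
w = calibrate_w(x=10.0, m=0.95)
for price in (92.0, 97.0, 100.0, 103.0):
    print(price, target_position(w, forecast=100.0, market_price=price, max_pos=100))
```

As the market price converges to the forecast the target position shrinks toward zero, and it flips sign once the price overshoots the forecast.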