Labeling - jaeaehkim/trading_system_beta GitHub Wiki

Motivation

  • The Data Structures part covered what kinds and forms of financial raw data exist, how to turn them into structured bars that are closer to IID, and how to apply event-based sampling to those bars.
  • With the data in place, the remaining question is how to label it before handing it to an ML model. Because this is supervised learning, the input has to be built as (question X, answer y) pairs.
  • Most trading and finance papers start from time bars and label the return over a fixed time interval: 1 if it is greater than 0 and -1 otherwise, or against a nonzero threshold, or with 3-4 classes. This chapter applies the insight that an ML model learns a specialized version of how a person thinks, and introduces a path-dependent labeling technique.

The Fixed-Time Horizon Method

  1. y_i = -1 if r_(t~t+h) < -tau; 0 if |r_(t~t+h)| <= tau; 1 if r_(t~t+h) > tau
  2. r_(t~t+h) = p_(t+h) / p_t - 1
  • Most papers attempt labeling with the "fixed-time horizon" method. It is intuitive but simplistic. Expressed generally as a formula, it looks like the above.
    • X can be read as some feature vector (for example, simply the close price). y is the label assigned by comparing the return over the time interval, r_(t~t+h), against a threshold tau.
  • Problems
    • First, the fixed-time method requires the underlying bars to be time bars. As seen in Data Structures, time bars lack good statistical properties (IID), which is a problem from the outset.
    • In addition, building bars on fixed time wrongly assumes that every time interval carries the same amount of information. As a result, in the many low-volatility intervals the return never crosses the threshold (most labels are 0), and the ML model is likely to learn only to raise accuracy. In other words, its practical usefulness drops.
  • Basic remedies
    1. Do not fix the threshold tau as a constant. Use a time-varying threshold sigma_t derived from the rolling standard deviation of returns, so that labels crossing the threshold (label=1) are generated appropriately in each period.
    2. Use volume/dollar bars from the start, with a constant threshold tau. The bars themselves are already information-weighted in proportion to traded volume or dollar value, so even a constant tau can give somewhat better labels.
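The fixed-time horizon rule and the first remedy can be sketched as follows. This is a minimal illustration rather than code from the repository: `fixed_horizon_labels`, the toy prices, and the rolling window of 3 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def fixed_horizon_labels(close: pd.Series, h: int, tau) -> pd.Series:
    """Label each bar by its h-bar forward return relative to a threshold.

    tau may be a scalar (constant threshold) or a Series aligned with close
    (for example a rolling standard deviation of returns).
    """
    fwd_ret = close.shift(-h) / close - 1.0  # r_(t~t+h)
    if np.isscalar(tau):
        tau = pd.Series(tau, index=close.index)
    else:
        tau = tau.reindex(close.index)
    labels = pd.Series(0.0, index=close.index)
    labels[fwd_ret > tau] = 1.0
    labels[fwd_ret < -tau] = -1.0
    # Bars without a forward return or without a threshold cannot be labeled.
    labels[fwd_ret.isna() | tau.isna()] = np.nan
    return labels

close = pd.Series([100.0, 100.1, 100.0, 103.0, 99.0, 99.2, 99.1])

# Constant threshold tau = 1%.
const_labels = fixed_horizon_labels(close, h=1, tau=0.01)

# Time-varying threshold sigma_t: rolling std of 1-bar returns (window is arbitrary).
sigma_t = close.pct_change().rolling(3).std()
dyn_labels = fixed_horizon_labels(close, h=1, tau=sigma_t)
```

With the constant threshold, most of the quiet bars receive label 0, which is the class-imbalance problem described above.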

The Triple-Barrier Method

  • This is the labeling technique introduced by Professor Marcos López de Prado in AFML (Advances in Financial Machine Learning). We will call it the Triple-Barrier Method.
  • The method defines two horizontal barriers and one vertical barrier.
  • At investment time t we can draw a rectangular barrier box, and the observation is labeled with the value at the moment the price path touches one of the barriers.
  • In other words, by labeling the general practice of dynamically setting time-cut, profit-taking, and stop-loss lines, we let the ML model produce inferences that already account for them.
    • Starting from the first bar of the event, the path is cut at whichever of the two horizontal barriers or the vertical barrier is met first, and the label is taken there.
  • The figures below illustrate this.
  • [Figures 3.4_1 and 3.4_2: triple-barrier illustrations]
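The first-touch rule can be sketched for a single event as follows. The function `first_touch_label` and its arguments are hypothetical names used only for this illustration. Labeling a vertical-barrier exit as 0 is one possible convention; the getBins code on this page uses the sign of the realized return instead.

```python
import pandas as pd

def first_touch_label(path: pd.Series, pt: float, sl: float) -> tuple:
    """Return (label, touch_index) for one event.

    path : prices from the event's start bar up to and including the vertical barrier
    pt   : profit-taking barrier as a positive return, e.g. 0.02
    sl   : stop-loss barrier as a positive return, e.g. 0.02
    """
    ret = path / path.iloc[0] - 1.0
    for t, r in ret.items():
        if r > pt:
            return 1, t    # upper horizontal barrier touched first
        if r < -sl:
            return -1, t   # lower horizontal barrier touched first
    # Neither horizontal barrier was touched: the vertical barrier ends the event.
    return 0, path.index[-1]

# Toy paths: the first hits profit taking, the second hits the stop loss,
# and the third runs until the vertical barrier.
up = first_touch_label(pd.Series([100.0, 101.0, 103.0, 98.0]), pt=0.02, sl=0.02)
down = first_touch_label(pd.Series([100.0, 99.5, 97.0, 104.0]), pt=0.02, sl=0.02)
flat = first_touch_label(pd.Series([100.0, 100.5, 99.8, 100.2]), pt=0.02, sl=0.02)
```

The second path shows the path dependence: it ends well above the entry price, yet the label is -1 because the stop loss was hit first.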

Method Flow

getTEvents

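def getTEvents(gRaw, upper, lower=None):
    # Symmetric CUSUM filter: sample an event whenever the cumulated move exceeds a threshold.
    if lower is None:
        lower = -upper
    assert upper >= 0 and lower <= 0
    tEvents, sPos, sNeg = [], 0, 0
    diff = gRaw.diff()
    # start=1 keeps i aligned with positions in gRaw, because diff.values[0] (NaN) is skipped
    for i, change in enumerate(diff.values[1:], start=1):
        sPos, sNeg = max(0, sPos + change), min(0, sNeg + change)

        if sNeg < lower:
            sNeg = 0
            tEvents.append(i)
        elif sPos > upper:
            sPos = 0
            tEvents.append(i)

    return gRaw.index[tEvents]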

[Figure: tEvents output]

  • Build bars in the form of VB/DB, TIB/VIB/DIB, or TRB/VRB/DRB as described in Data Structures, then use the CUSUM filter to extract the T index of the event times.
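To make the sampling step concrete, here is a small worked example of a symmetric CUSUM filter on toy prices. `cusum_events` is a compact restatement written for this illustration (it iterates over index labels directly), and the prices and the threshold h=1.0 are made up.

```python
import pandas as pd

def cusum_events(series: pd.Series, h: float) -> pd.Index:
    """Symmetric CUSUM filter: emit an index label whenever cumulated moves exceed h."""
    events, s_pos, s_neg = [], 0.0, 0.0
    diff = series.diff().dropna()
    for t, change in diff.items():
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)
        if s_neg < -h:
            s_neg = 0.0   # reset after a downward event
            events.append(t)
        elif s_pos > h:
            s_pos = 0.0   # reset after an upward event
            events.append(t)
    return pd.Index(events)

prices = pd.Series(
    [100.0, 100.5, 101.2, 101.0, 99.0, 98.5, 100.9],
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)
events = cusum_events(prices, h=1.0)
```

On this toy series the filter samples 2024-01-03 (cumulated rise of 1.2), 2024-01-05 (cumulated fall of 2.2), and 2024-01-07 (rise of 2.4), and skips the bars in between.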

getBins : where Triple Barrier labeling starts

import numpy as np
import pandas as pd

def getBins(events, price):
    """
    refer to the explanation before snippet 3.5
    :param events: returned value of getEvents, columns=['trgt','tl']
    :param price: volume_df.price
    :return: columns=['ret', 'bin']
    """
    events_ = events.dropna(subset=['tl'])
    px = events_.index.union(events_['tl'].values).drop_duplicates()
    px = price.reindex(px, method='bfill')
    out = pd.DataFrame(index=events_.index)
    out['ret'] = px.loc[events_['tl'].values].values / px.loc[events_.index] - 1
    if 'side' in events_:
        out['ret'] *= events_['side']
    out['bin'] = np.sign(out['ret'])
    if 'side' in events_:
        out.loc[out['ret']<=0, 'bin'] = 0
    return out
  • [Figure: getBins output (out)]
  • getBins delivers the final output of the Triple Barrier Method. Along the way, add Vertical Barrier supplies the vertical barrier information and add Horizontal Barrier supplies the horizontal barrier information; getEvents processes the data that reflects both and passes it to getBins.
  • Additional features are built on top of the getBins output, and the ML model is trained on them.

getEvents

def getEvents(price, tEvents, ptSl, trgt, minRet, numThreads, tl=False, side=None):
    """
    refer to the explanation before and after snippet 3.3, with enhancements explained right before snippet 3.6
    :param price:
    :param tEvents:
    :param ptSl:
    :param trgt: dailyVol(horizontal bar unit width), index = price.index which contains tEvents
    :param minRet: at least dailyVol must be larger than this scalar
    :param numThreads:
    :param tl: index = tEvents
    :param side: side
    :return: index=subset of trgt.index(tEvents), columns=['tl','trgt', 'side'(if side is not None)] where 'tl' is not vertical bar timestamp but first-time for barrier touch
    """

    trgt = trgt.loc[trgt.index.isin(tEvents)]
    trgt = trgt[trgt > minRet]

    if tl is False:
        tl = pd.Series(pd.NaT, index=tEvents)

    if side is None:
        side_, ptSl_ = pd.Series(1., index=trgt.index), [ptSl[0],ptSl[0]]
    else:
        side_, ptSl_ = side.loc[trgt.index], ptSl[:]

    events = pd.concat({'tl': tl, 'trgt': trgt, 'side': side_}, axis=1).dropna(subset=['trgt'])
    df0 = mpPandasObj(func=applyPtSlonTl, pdObj=('molecule', events.index), numThreads=numThreads, price=price, events=events, ptSl=ptSl_)
    events['tl'] = df0.dropna(how='all').min(axis=1)

    if side is None:
        events = events.drop('side', axis=1)

    return events
  • [Figure: getEvents output]
  • The only difference from the final getBins output is the thresholded labeling in the bin column. getEvents produces the final information that reflects both the vertical and horizontal barriers.
add Vertical Barrier
def addVerticalBarrier(tEvents, price, numDays=None):
    """
    refer to snippet 3.4, in book, it resorts to pd.Timedelta(days=Numdays), but here we assume general-bars, which are not necessarily time-bars, so I set h=10
    :param tEvents: getTEvents(returns(volume_df.price), threshold) = pd.DateTimeIndex(sampled with cumsum)
    :param price: whole data before sampling, volume_df.price with index=np.datetime64
    :param numDays: vertical barrier size in calendar days; if None, the barrier is placed 10 bars after each event
    :return: Series with values of tl timestamp and index of tEvents timestamp
    """
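    if numDays is None:
        # general bars: place the barrier 10 bars after each event
        tl = price.index.searchsorted(tEvents) + 10
    else:
        # time bars: place the barrier numDays calendar days after each event
        tl = price.index.searchsorted(tEvents + pd.Timedelta(days=numDays))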
    tl = tl[tl < len(price)]
    return pd.Series(price.index[tl], index=tEvents[:len(tl)])
  • [Figure: addVerticalBarrier output]
  • Starting from the bare T index returned by getTEvents, this step enforces a maximum time cut even when neither profit taking nor stop loss (ptsl) is triggered, so the output has both a start point (index) and an end point (vertical barrier).
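The bar-count variant of the vertical barrier can be sketched on a toy index as follows. `vertical_barrier_by_bars` is a hypothetical helper for this illustration, with the horizon h passed explicitly instead of being fixed at 10.

```python
import pandas as pd

def vertical_barrier_by_bars(t_events: pd.DatetimeIndex,
                             price_index: pd.DatetimeIndex,
                             h: int) -> pd.Series:
    """For each event time, return the timestamp of the bar h positions later."""
    pos = price_index.searchsorted(t_events) + h
    keep = pos < len(price_index)  # drop events whose barrier falls past the data
    return pd.Series(price_index[pos[keep]], index=t_events[keep])

price_index = pd.date_range("2024-01-01", periods=6, freq="D")
t_events = pd.DatetimeIndex(["2024-01-02", "2024-01-05"])
tl = vertical_barrier_by_bars(t_events, price_index, h=2)
```

The event on 2024-01-02 gets its vertical barrier two bars later, on 2024-01-04. The event on 2024-01-05 is dropped because its barrier would fall beyond the last bar.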
add Horizontal Barrier (add ptsl line)
def applyPtSlonTl(price, events, ptSl, molecule):
    """
    refer to the explanation before snippet 3.2
    :param price: volume_df.price
    :param events: index=tEvents, columns=['tl','trgt'(dailyVol), 'side']
    :param ptSl: horizontal bar width ratio
    :param molecule: a list with the subset of event indices that will be processed by a single thread
    :return: index=tEvents, columns=['pt', 'sl'], first time touching each barrier
    """
    events_ = events.loc[molecule]
    out = events_[['tl']].copy(deep=True)
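    if ptSl[0] > 0:
        pt = ptSl[0]*events_['trgt']
    else:
        pt = pd.Series(index=events.index, dtype=float) # NaNs: no profit-taking barrier
    if ptSl[1] > 0:
        sl = -ptSl[1]*events_['trgt']
    else:
        sl = pd.Series(index=events.index, dtype=float) # NaNs: no stop-loss barrier

    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for loc, tl in events_['tl'].fillna(price.index[-1]).items():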
        df0 = price[loc:tl]
        df0 = (df0 / price[loc] - 1) * events_.at[loc,'side']
        out.loc[loc, 'sl'] = df0[df0 < sl[loc]].index.min()
        out.loc[loc, 'pt'] = df0[df0 > pt[loc]].index.min()

    return out
  • [Figure: applyPtSlonTl output]
  • Leaving the vertical barrier aside for the moment, this step sets the profit-taking (pt) and stop-loss (sl) lines dynamically from recent volatility and records the first time each one is reached.
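One way to obtain the "recent volatility" that scales the horizontal barriers is an exponentially weighted standard deviation of returns. The sketch below is an assumption about how trgt could be computed, not the repository's code: `ewm_volatility`, the span of 3, and the asymmetric multipliers are all illustrative.

```python
import pandas as pd

def ewm_volatility(close: pd.Series, span: int = 20) -> pd.Series:
    """Exponentially weighted std of bar-to-bar returns, used as the barrier unit width."""
    returns = close.pct_change()
    return returns.ewm(span=span).std()

close = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0])
trgt = ewm_volatility(close, span=3)

pt_sl = [1.0, 2.0]        # asymmetric multipliers, chosen for illustration only
upper = pt_sl[0] * trgt   # profit-taking barrier, expressed as a return
lower = -pt_sl[1] * trgt  # stop-loss barrier, expressed as a return
```

Because the barriers are multiples of trgt, they widen in volatile periods and narrow in quiet ones, which is what makes the pt/sl lines dynamic.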

Meta-Labeling

  • When labels come from the Triple-Barrier Method and are fed to an ML model, the model's inference can output both side (long/short) and size (probability), and these can be used as they are. The purpose of Meta-Labeling is to decompose a process that a single model could handle, and to design the pieces so that the model's F1-score is maximized. The book discusses decomposition inside the model that decides side and size, but I think every part of the investment process can be meta-labeled, and the decomposed parts do not all need to be ML models. (Quantamental Way)
  • Meta-Labeling : Detailed decomposition of model
    1. First, use the min return value to assign labels of 1 or 0, which decides whether to bet.
    2. pt, sl values
  • Question
    1. Is it always better to run a decomposed model?
    • Model management costs are high.
      • For every model you have to decide how to label, how to manage features, etc.
    2. Meta-labeling only splits what single-labeling already contains into finer pieces, so my personal view is that it is better to train the main model with the triple-barrier method and handle the rest with rules.
    • Find a variety of 1-config = 1-strategy setups and ensemble them.
      • Include the hyperparameters that the rules need in the config, so that model search can explore as wide a region as possible.
        • asymmetric pt, sl
        • betting model : single prob betting, average prob betting etc
      • The ensemble should be designed to optimize transaction costs and turnover.
        • Single-strategy simulator / meta-strategy simulator -> point-in-time approach
    3. I expect the meta-labeling technique to seep in to some degree through separating the model's metrics from the backtest's metrics. Meta-labeling is a trade-off between cost and benefit, so the decision has to be made at a reasonable point.
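Step 1 of the decomposition (deciding whether to bet, given a side) can be sketched as follows. The helper `meta_labels` and the toy numbers are assumptions for illustration; with min_ret = 0 the rule matches what getBins does when a side column is present.

```python
import pandas as pd

def meta_labels(ret: pd.Series, side: pd.Series, min_ret: float = 0.0) -> pd.Series:
    """1 if acting on the primary model's side earned more than min_ret, else 0."""
    side_ret = ret * side  # return seen from the primary model's position
    return (side_ret > min_ret).astype(int)

ret = pd.Series([0.03, -0.02, 0.01, -0.04])  # realized returns at the first barrier touch
side = pd.Series([1, 1, -1, -1])             # primary model: 1 = long, -1 = short
labels = meta_labels(ret, side)
```

A secondary model trained on these 0/1 labels only has to learn when the primary signal is worth acting on, and its predicted probability can then be used for sizing.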

Quantamental Way

  • This is an extension of Meta-Labeling. Where Meta-Labeling is about splitting what is already quantified into finer pieces, the Quantamental Way is about quantifying every experience that has not yet been quantified. Quantification can be applied to every decision that arises in the way a hedge fund works, and a time will come when how much of a firm's internal process has been quantified matters a great deal.

Application to Quant System

  • Does labeling have to be based only on the first barrier that is touched?
    • Varying how barrier touches are handled would likely produce a variety of models.