Statistical Nodes API - Point72/csp GitHub Wiki

This page contains the documentation for the csp.stats library. The stats library contains functions to calculate statistics on time series data over rolling windows.

Table of Contents

Base Statistics:

  • count: counts the number of data ticks within a given interval
  • unique: counts the number of unique values within a given interval
  • sum: rolling sum of values within a given interval
  • prod: rolling product of values within a given interval
  • first: the earliest value still within the interval
  • last: the last value of the interval
  • mean: the mean of values within the interval
  • gmean: the geometric mean of values within the interval

Order Statistics:

  • max: the maximum value within the interval
  • min: the minimum value within the interval
  • median: the median value within the interval
  • quantile: the quantile value within the interval
  • argmin: the time at which the minimum interval value ticked
  • argmax: the time at which the maximum interval value ticked
  • rank: the time series rank of the most recent tick in the interval

Moment-Based Statistics:

  • var: variance of the time series within the interval
  • stddev: standard deviation within the interval
  • sem: standard error within the interval
  • cov: covariance between two in-sequence time series within the interval
  • corr: correlation between two in-sequence time series within the interval
  • skew: skewness of the time series within the interval
  • kurt: kurtosis (or excess kurtosis) of the time series within the interval

Exponential Moving Statistics:

  • ema: exponential moving average, with numerous different variations available
  • ema_var: exponential moving variance
  • ema_std: exponential moving standard deviation
  • ema_cov: exponential moving covariance between two in-sequence time series

NumPy Specific Statistics:

  • cov_matrix: covariance matrix between N time-series (in a NumPy array) over a rolling time interval
  • corr_matrix: normalized correlation matrix between N time-series (in a NumPy array) over a rolling time interval
  • list_to_numpy: converts a listbasket of time-series into a NumPy array
  • numpy_to_list: converts a NumPy array time-series into a listbasket

Cross-Sectional Statistics:

  • cross_sectional: receive all data within the current window for a cross-sectional calculation

Base Statistics

Count

count(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data.
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, NaN values in the window are ignored (not counted). If False, any NaN value in the window makes the count NaN.
    • By default, ignore_na is True
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
    • By default, there is no reset series.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • A time-series of the number of data points currently in the interval. If the interval is a tick count, the result is necessarily less than or equal to the interval.

Examples: count

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
count(x, interval=3)
# NaN is not counted
{'2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 2}

2. Including NaN

count(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}
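The two tick-count examples above can be reproduced with a small pure-Python sketch. This is only an illustration of the documented semantics, not the csp implementation; the helper name rolling_count is hypothetical and it models only the interval, min_window and ignore_na arguments.

```python
import math
from collections import deque


def rolling_count(values, interval, min_window=None, ignore_na=True):
    """Tick-based rolling count, one output per tick once min_window ticks are held."""
    if min_window is None:
        min_window = interval  # documented default: min_window equals the interval
    window = deque(maxlen=interval)  # the oldest tick drops out automatically
    out = []
    for v in values:
        window.append(v)
        if len(window) < min_window:
            continue  # not enough ticks yet, so no output
        n_nan = sum(1 for w in window if math.isnan(w))
        if ignore_na:
            out.append(len(window) - n_nan)  # NaN ticks are simply not counted
        else:
            out.append(math.nan if n_nan else len(window))  # any NaN poisons the count
    return out


x = [1.0, 2.0, 3.0, math.nan, 5.0]
print(rolling_count(x, interval=3))                   # [3, 2, 2]
print(rolling_count(x, interval=3, ignore_na=False))  # [3, nan, nan]
```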

3. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-05': 2}
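As a rough sketch of how a timedelta interval and an external trigger interact, the snippet below recomputes the triggering example by brute force. It assumes that min_window is measured from the start time and that the window is the half-open range (t - interval, t]; count_on_trigger and day are hypothetical helpers, not part of csp.

```python
import math
from datetime import datetime, timedelta


def day(d):
    """Hypothetical helper: midnight on the given day of January 2020."""
    return datetime(2020, 1, d)


def count_on_trigger(x, trigger_times, interval, min_window, start):
    """Count non-NaN ticks of x inside (t - interval, t] at each trigger time t."""
    out = {}
    for t in sorted(trigger_times):
        if t - start < min_window:
            continue  # assumption: no output until min_window has elapsed since start
        out[t] = sum(
            1
            for ts, v in x.items()
            if t - interval < ts <= t and not math.isnan(v)  # left endpoint excluded
        )
    return out


x = {day(1): 1.0, day(2): 2.0, day(3): 3.0, day(4): math.nan, day(5): 5.0}
result = count_on_trigger(
    x, [day(3), day(5)], timedelta(days=3), timedelta(days=2), start=day(1)
)
print({t.date().isoformat(): n for t, n in result.items()})
# {'2020-01-03': 3, '2020-01-05': 2}
```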

4. Sampling

sampler = {'2020-01-01': True, '2020-01-02': True, '2020-01-03': True, '2020-01-05': True, '2020-01-06': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), sampler=sampler)
{'2020-01-03': 3, '2020-01-05': 2}

Note: the x value at 2020-01-04 is ignored completely since sampler does not tick, while the value at 2020-01-06 is treated as NaN.
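The three sampler rules can be summarized as a transformation of the input series before any statistic is computed. The following is a minimal sketch of that reading, with a hypothetical helper named apply_sampler; it is not csp code.

```python
import math


def apply_sampler(x, sampler_times):
    """Return the effective input series after applying the three sampler rules."""
    effective = {}
    for t in sorted(sampler_times):
        # Sampler ticked: keep x's value if x ticked too, otherwise record NaN.
        effective[t] = x.get(t, math.nan)
    # x ticks at times when the sampler did not tick are never copied, so they are dropped.
    return effective


x = {"2020-01-01": 1.0, "2020-01-02": 2.0, "2020-01-03": 3.0,
     "2020-01-04": math.nan, "2020-01-05": 5.0}
sampler = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-05", "2020-01-06"]
effective = apply_sampler(x, sampler)
print(list(effective))  # 2020-01-04 is absent; 2020-01-06 is present with a NaN value
```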

5. Reset

reset = {'2020-01-04': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), reset=reset)
{'2020-01-03': 3, '2020-01-04': 0, '2020-01-05': 1}

Note: the window data is reset at 2020-01-04, and its value is NaN, so the count is 0
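The reset example can be traced with the sketch below. It assumes that a reset arriving at the same time as an x tick clears the window before that tick is added, which is consistent with the output shown above; count_with_reset and day are hypothetical names used only for illustration.

```python
import math
from datetime import datetime, timedelta


def day(d):
    """Hypothetical helper: midnight on the given day of January 2020."""
    return datetime(2020, 1, d)


def count_with_reset(x, reset_times, interval, min_window, start):
    """Time-based rolling count whose window is cleared whenever reset ticks."""
    resets = set(reset_times)
    window = []  # (time, value) pairs currently held
    out = {}
    for t in sorted(x):
        if t in resets:
            window.clear()  # assumption: the reset is applied before the new tick is stored
        window.append((t, x[t]))
        window = [(ts, v) for ts, v in window if ts > t - interval]  # expire old ticks
        if t - start >= min_window:
            out[t] = sum(1 for _, v in window if not math.isnan(v))
    return out


x = {day(1): 1.0, day(2): 2.0, day(3): 3.0, day(4): math.nan, day(5): 5.0}
result = count_with_reset(x, [day(4)], timedelta(days=3), timedelta(days=2), start=day(1))
print({t.date().isoformat(): n for t, n in result.items()})
# {'2020-01-03': 3, '2020-01-04': 0, '2020-01-05': 1}
```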

6. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
count(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,2]} # count is per element

Unique

unique(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    precision: int = 10
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
    • By default, there is no reset series.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
  • precision: the decimal place precision at which two floats are considered non-unique. For example, if precision=2, then 2.001 and 2.002 would be considered non-unique.
    • By default, precision is set to 10 decimal places.

Returns:

  • a time-series of how many unique (excluding nan) values are currently in the interval

Examples: unique

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 2, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 3}
unique(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 2, '2020-01-04': 2, '2020-01-05': 1}

2. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
unique(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 2, '2020-01-05': 1}

3. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
unique(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,1]} # unique is per element
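One plausible reading of the precision argument is that values are rounded to that many decimal places before being compared. The sketch below uses that assumption (the actual comparison inside csp may differ) and reproduces the default-behavior example; rolling_unique is a hypothetical helper.

```python
import math
from collections import deque


def rolling_unique(values, interval, min_window=None, precision=10):
    """Tick-based rolling count of distinct non-NaN values."""
    if min_window is None:
        min_window = interval
    window = deque(maxlen=interval)
    out = []
    for v in values:
        window.append(v)
        if len(window) >= min_window:
            # Assumption: two floats are "the same" if they agree after rounding.
            distinct = {round(w, precision) for w in window if not math.isnan(w)}
            out.append(len(distinct))
    return out


x = [2.0, 2.0, 3.0, math.nan, 3.0]
print(rolling_unique(x, interval=3, min_window=2))              # [1, 2, 2, 1]
print(rolling_unique([2.001, 2.002], interval=2, precision=2))  # [1]
```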

Sum

sum(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    precise: bool = False,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • precise: if True we use a more numerically stable implementation (Kahan) which is less efficient
  • ignore_na: if True, does not include any nan values in the window. If false, nan values are included and will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted sum (optional).
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sums over the interval
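The precise flag refers to Kahan (compensated) summation. As a generic illustration of that technique, and not of how csp implements it, the sketch below compares a plain accumulation loop with a Kahan loop on an input where small terms would otherwise be lost.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation          # re-inject the error from the previous step
        t = total + y                 # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


values = [1e16, 1.0, 1.0]
print(naive_sum(values))  # both 1.0 terms are rounded away
print(kahan_sum(values))  # both 1.0 terms survive
```

The extra subtraction and addition per tick are the reason the precise mode is described as less efficient.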

Examples: sum

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
sum(x, interval=3)
{'2020-01-03': 6, '2020-01-04': 5, '2020-01-05': 8}

2. Including NaNs

sum(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 3, '2020-01-03': 6, '2020-01-04': nan, '2020-01-05': nan}

3. Weighted single input

weights = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-04': 3}
sum(x, interval=3, weights=weights)
{'2020-01-03': 11, '2020-01-04': 10, '2020-01-05': 21} # 21 = 5x3 + 3x2
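The weighted result above is consistent with each x tick being paired with the most recent weight that has ticked. The sketch below makes that assumption explicit and recomputes the three outputs from the scalar series; rolling_weighted_sum is a hypothetical helper, not a csp function.

```python
import math
from collections import deque


def rolling_weighted_sum(x, weights, interval):
    """x: list of (time, value) pairs; weights: dict of time -> weight."""
    window = deque(maxlen=interval)
    current_w = math.nan
    out = {}
    for t, v in x:
        # Assumption: the latest weight carries forward until a new one ticks.
        current_w = weights.get(t, current_w)
        window.append((v, current_w))
        if len(window) == interval:
            # NaN observations are skipped, matching ignore_na=True.
            out[t] = sum(val * w for val, w in window if not math.isnan(val))
    return out


x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0),
     ("2020-01-04", math.nan), ("2020-01-05", 5.0)]
weights = {"2020-01-01": 1.0, "2020-01-02": 2.0, "2020-01-04": 3.0}
print(rolling_weighted_sum(x, weights, interval=3))
# {'2020-01-03': 11.0, '2020-01-04': 10.0, '2020-01-05': 21.0}
```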

4. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
sum(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [3,1], '2020-01-03': [6,2]}

5. NumPy weighted sum

np_weights = {'2020-01-01': [1,2], '2020-01-02': [2,1]}
sum(x_np, interval=3, min_window=1, weights=np_weights)
{'2020-01-01': [1,2], '2020-01-02': [5,2], '2020-01-03': [11,3]} # weights applied elementwise

Product

prod(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If false, nan values are included and will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling products over the interval. The computation is unstable for large products and windows.
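To illustrate why a rolling product can become unstable and why a clean recalculation helps, the sketch below contrasts an incrementally updated product (multiply in the new tick, divide out the expired one) with a product recomputed from the window. The incremental scheme is an assumption made for illustration; it is not taken from the csp source, and both function names are hypothetical.

```python
import math
from collections import deque


def rolling_prod_incremental(values, interval):
    """Update a running product in place; errors and overflow persist across ticks."""
    window = deque()
    running = 1.0
    out = []
    for v in values:
        window.append(v)
        running *= v
        if len(window) > interval:
            running /= window.popleft()  # divide out the tick that just expired
        out.append(running)
    return out


def rolling_prod_recalc(values, interval):
    """Recompute the product from the values currently in the window."""
    return [
        math.prod(values[max(0, i - interval + 1): i + 1])
        for i in range(len(values))
    ]


values = [1e200, 1e200, 1e-200]
print(rolling_prod_incremental(values, interval=2))  # overflow at tick 2 never clears
print(rolling_prod_recalc(values, interval=2))       # last window recovers a finite value
```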

Examples: prod

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
prod(x, interval=3, min_window=2, ignore_na=True)
{'2020-01-02': 2, '2020-01-03': 6, '2020-01-04': 6, '2020-01-05': 15}

2. NumPy

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
prod(x_np, 3, 2)
{'2020-01-02': [3,8], '2020-01-03': [15,48]}

First

first(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    ignore_na: bool = True
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
  • ignore_na: if True, will return the first non-nan value in the window. If False, will return the first value in the window

Returns:

  • a time-series of the earliest (non-nan) value still within the given interval

Examples: first

See last

Last

last(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, will return the last non-nan value in the window. If False, will return the last value in the window
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of the most recent value within the given interval

Examples: first and last

Starttime: 2020-01-01 00:00:00

1. Default - first

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
first(x, interval=3)
{'2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN - last

last(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}

3. Triggering - last

trigger = {'2020-01-03': True, '2020-01-04': True}
last(x, interval=timedelta(days=3), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-04': 3}

4. NumPy - first

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
first(x_np, interval=2)
# first non-nan value
{'2020-01-02': [1,1], '2020-01-03': [2,3]}
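For scalar input, the first and last statistics over a tick window can be sketched as follows. The helper rolling_first_last is hypothetical and models only the interval and ignore_na arguments as they are described above.

```python
import math
from collections import deque


def rolling_first_last(values, interval, ignore_na=True):
    """Return (first, last) pairs for each full tick window."""
    window = deque(maxlen=interval)
    out = []
    for v in values:
        window.append(v)
        if len(window) < interval:
            continue  # default min_window equals the interval
        if ignore_na:
            candidates = [w for w in window if not math.isnan(w)]
        else:
            candidates = list(window)
        if candidates:
            out.append((candidates[0], candidates[-1]))
        else:
            out.append((math.nan, math.nan))  # window holds only NaN values
    return out


x = [1.0, 2.0, 3.0, math.nan, 5.0]
pairs = rolling_first_last(x, interval=3)
print([first for first, _ in pairs])  # [1.0, 2.0, 3.0]
print([last for _, last in pairs])    # [3.0, 3.0, 5.0]
```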

Mean

mean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted mean (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling means over the interval. Computation uses smart updating so overflow is not an issue, since no sums are kept
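One common way to maintain a mean without storing a running sum is to adjust the mean directly as values enter and leave the window. The sketch below shows that idea under the assumption that "smart updating" means something along these lines; the class and function names are hypothetical and this is not the csp implementation.

```python
import math
from collections import deque


class RunningMean:
    """Mean that is updated in place as values are added and removed."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def add(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def remove(self, x):
        self.n -= 1
        if self.n == 0:
            self.mean = 0.0
        else:
            self.mean -= (x - self.mean) / self.n


def rolling_mean(values, interval, min_window):
    window = deque()
    acc = RunningMean()
    out = []
    for v in values:
        window.append(v)
        if not math.isnan(v):
            acc.add(v)  # NaN ticks occupy a slot but do not affect the mean
        if len(window) > interval:
            old = window.popleft()
            if not math.isnan(old):
                acc.remove(old)
        if len(window) >= min_window:
            out.append(acc.mean if acc.n else math.nan)
    return out


x = [1.0, 2.0, 3.0, math.nan, 5.0]
print([round(m, 3) for m in rolling_mean(x, interval=3, min_window=2)])
# [1.5, 2.0, 2.5, 4.0]
```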

Examples: mean

See gmean

Geometric Mean

gmean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling geometric means over the interval. Requires a strictly positive-valued input.

Examples: mean and gmean

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
mean(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 2.5, '2020-01-05': 4.0}

2. Including NaN

mean(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan}

3. Geometric mean

trigger = {'2020-01-03': True, '2020-01-05': True}
gmean(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 1.817, '2020-01-05': 3.873}
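The two geometric-mean values above can be checked by hand with the standard log-domain formula, exp(mean(log(x))), which also shows why strictly positive input is required. Whether csp computes it this way internally is an assumption; window_gmean is a hypothetical helper.

```python
import math


def window_gmean(window):
    """Geometric mean of the non-NaN values in one window, computed via logarithms."""
    vals = [v for v in window if not math.isnan(v)]
    # math.log raises ValueError for zero or negative input.
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


print(round(window_gmean([1.0, 2.0, 3.0]), 3))       # 1.817 (window ending 2020-01-03)
print(round(window_gmean([3.0, math.nan, 5.0]), 3))  # 3.873 (window ending 2020-01-05)
```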

4. Weighted mean

weights = {'2020-01-01': 1, '2020-01-03': 2}
mean(x, interval=3, min_window=2, ignore_na=True, weights=weights)
{'2020-01-02': 1.5, '2020-01-03': 2.25, '2020-01-04': 2.667, '2020-01-05': 4.0}

Note: the first two observations get relative weight of 1, then the last three get relative weight of 2
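The weighted means above follow from dividing the weighted sum by the sum of the weights of the non-NaN observations, which is also why the weights need not be normalized. A minimal check, using a hypothetical helper weighted_mean:

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of one window, skipping NaN observations and their weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Window ending 2020-01-03: observations 1, 2, 3 with weights 1, 1, 2.
print(weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))               # 2.25
# Window ending 2020-01-04: the NaN observation and its weight are dropped.
print(round(weighted_mean([2.0, 3.0, math.nan], [1.0, 2.0, 2.0]), 3))  # 2.667
# Scaling every weight by the same factor leaves the result unchanged.
print(weighted_mean([1.0, 2.0, 3.0], [10.0, 10.0, 20.0]))            # 2.25
```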

5. NumPy weighted mean

x_np = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 2., 2.], '2020-01-03': [3., 3., 3.]}
np_weights = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 1., 2.], '2020-01-03': [3., 1., 3.]}
mean(x_np, 3, 2, weights=np_weights)
{'2020-01-02': [1.667, 1.5, 1.667], '2020-01-03': [2.333, 2.0, 2.333]}

Order Statistics

Maximum

max(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling maximums over the interval.

Examples: max

See min

Minimum

min(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling minimums over the interval.
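
The three sampler cases listed in the arguments can be illustrated with a small plain-Python sketch. It does not use csp; the helper name apply_sampler and the dictionary representation of a time series are assumptions made only for illustration.

```python
import math

def apply_sampler(x_ticks, sampler_times):
    """Return the series a statistic would see once a sampler is applied to x.

    x_ticks: dict mapping a timestamp to the value x ticked with at that time.
    sampler_times: the timestamps at which the sampler ticks.
    """
    seen = {}
    for t in sorted(sampler_times):
        # sampler ticks and x ticks: the value is valid and used as-is
        # sampler ticks but x does not: the slot is filled with NaN
        seen[t] = x_ticks.get(t, math.nan)
    # x ticks whose timestamp is not among the sampler ticks never enter `seen`
    return seen

x = {"2020-01-01": 1.0, "2020-01-02": 2.0, "2020-01-04": 4.0}
sampler = ["2020-01-02", "2020-01-03", "2020-01-04"]
print(apply_sampler(x, sampler))
```

Here the tick at 2020-01-01 is dropped because the sampler did not tick, and 2020-01-03 becomes a NaN that is then handled according to ignore_na.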

Examples: max and min

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
min(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN

max(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. NumPy example

x_np = {'2020-01-01': [2,3], '2020-01-02': [6,1], '2020-01-03': [1,9]}
min(x_np, interval=timedelta(days=3), min_window=timedelta(days=1))
{'2020-01-02': [2,1], '2020-01-03': [1,1]}
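
The tick-count window logic behind examples 1 and 2 can be reproduced with a plain-Python sketch. This is an illustration of the documented semantics, not csp's implementation; the function rolling_extreme and its pick argument are hypothetical names.

```python
import math

def rolling_extreme(values, interval, min_window, pick=min, ignore_na=True):
    """Tick-count rolling min/max over a list of floats (NaN marks a missing value)."""
    out = []
    for i in range(len(values)):
        if i + 1 < min_window:
            continue  # not enough ticks yet, so nothing is emitted
        window = values[max(0, i + 1 - interval): i + 1]
        valid = [v for v in window if not math.isnan(v)]
        if (not ignore_na and len(valid) < len(window)) or not valid:
            out.append(math.nan)  # a NaN invalidates the window, or the window is empty
        else:
            out.append(pick(valid))
    return out

nan = math.nan
x = [1.0, 2.0, 3.0, nan, 5.0]
print(rolling_extreme(x, interval=3, min_window=2, pick=min))
print(rolling_extreme(x, interval=3, min_window=2, pick=max, ignore_na=False))
```

The first call yields the values of example 1 and the second those of example 2, one entry per output date starting at 2020-01-02.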

Median

median(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling medians over the interval. Uses midpoint interpolation if there is an even number of samples.

Examples: median

See quantile

Quantile

quantile(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    quant: Union[float, List[float]] = None,
    min_window: Union[timedelta, int] = None,
    interpolate: str = "linear",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → Union[ts[Union[float, np.ndarray]], List[ts[Union[float, np.ndarray]]]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • quant: the quantile to calculate, which must be between 0 and 1
    • If provided a list, then all quantiles will be calculated for the list.
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
  • interpolate: the interpolation method to use when the quantile does not correspond to an individual value. Must be one of the following options:
    • "linear": interpolates linearly between the two closest values. For example, the 0.333 quantile of (1,2) with linear interpolation is 1.333.
    • "lower": returns the lower of the two closest values.
    • "higher": returns the higher of the two closest values.
    • "midpoint": returns the midpoint between the two closest values. For example, the 0.333 quantile of (1,2) with midpoint interpolation is 1.5.
    • "nearest": returns the value at the nearest position.  For example, the 0.333 quantile of (1,2) with nearest interpolation is 1. In cases of ties, the higher value is returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series or list-basket of time-series of rolling quantiles over the interval.
    • If the quant parameter is a list then a list-basket will be returned.
    • If it is a float then a time-series will be returned.
    • The order of quantiles in the list-basket is equal to the order of the input.
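
How a timedelta interval (non-inclusive at the left endpoint) combines with an external trigger can be sketched in plain Python. The helper window_at and the list-of-pairs representation are assumptions for illustration and are not part of csp.

```python
from datetime import datetime, timedelta

def window_at(ticks, now, interval):
    """Values whose timestamp t satisfies now - interval < t <= now."""
    return [v for t, v in ticks if now - interval < t <= now]

# one tick per day from 2020-01-01 to 2020-01-05, with the day number as the value
ticks = [(datetime(2020, 1, d), float(d)) for d in range(1, 6)]
trigger_times = [datetime(2020, 1, 3), datetime(2020, 1, 5)]

for now in trigger_times:
    # the statistic is only evaluated when the trigger ticks
    print(now.date(), window_at(ticks, now, timedelta(days=3)))
```

With a three-day interval, the window at 2020-01-05 holds the ticks from 2020-01-03 onward; the tick at 2020-01-02 sits exactly on the left endpoint and is excluded.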

Examples: median and quantile

Starttime: 2020-01-01 00:00:00

1. Median

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
median(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2, '2020-01-04': 2.5, '2020-01-05': 4}

2. Quantile with multiple values

quantile(x, interval=3, quant=[0.25, 0.5, 0.75], min_window=2, ignore_na=False)
[
    {'2020-01-02': 1.25, '2020-01-03': 1.5, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.75, '2020-01-03': 2.5, '2020-01-04': nan, '2020-01-05': nan}
]

3. Quantile with trigger

trigger = {'2020-01-03': True, '2020-01-05': True}
quantile(x, interval=timedelta(days=3), quant=0.333, min_window=timedelta(days=2), interpolate="midpoint", ignore_na=True, trigger=trigger)
{'2020-01-03': 1.5, '2020-01-05': 4}

4. NumPy array with multiple quantiles

x_np = {'2020-01-01': [1,2,3], '2020-01-02': [2,3,4], '2020-01-03': [3,4,5]}
quantile(x_np, interval=3, quant=[0.25,0.5,0.75], min_window=1)
# this is a list-basket of NumPy array time series
[
    {'2020-01-01': [1,2,3], '2020-01-02': [1.25, 2.25, 3.25], '2020-01-03': [1.5, 2.5, 3.5]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.5, 2.5, 3.5], '2020-01-03': [2., 3., 4.]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.75, 2.75, 3.75], '2020-01-03': [2.5, 3.5, 4.5]}
]
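
The five interpolate options can be compared on a single window with a plain-Python sketch. This is written from the descriptions in the argument list and is not csp's implementation; the function below is a hypothetical stand-alone helper that expects an ascending, NaN-free list.

```python
import math

def quantile(sorted_vals, q, interpolate="linear"):
    """Quantile of an ascending, NaN-free list for one of the five documented methods."""
    pos = q * (len(sorted_vals) - 1)          # fractional index of the quantile
    lo, hi = math.floor(pos), math.ceil(pos)  # the two closest positions
    low, high = sorted_vals[lo], sorted_vals[hi]
    frac = pos - lo
    if interpolate == "linear":
        return low + frac * (high - low)
    if interpolate == "lower":
        return low
    if interpolate == "higher":
        return high
    if interpolate == "midpoint":
        return (low + high) / 2
    if interpolate == "nearest":
        return high if frac >= 0.5 else low   # ties go to the higher value
    raise ValueError(f"unknown interpolation method: {interpolate}")

for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(method, quantile([1.0, 2.0], 0.333, method))
```

For the window (1, 2) and quant=0.333 this gives roughly 1.333 for "linear", 1.5 for "midpoint" and 1 for "nearest", in line with the values quoted in the argument descriptions.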

Argmin

argmin(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid.

Returns:

  • a time-series of rolling argmin values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: argmin

See argmax

Argmax

argmax(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling argmax values over the interval, returned as a datetime or NumPy array of np.datetime64 objects.  If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: argmax and argmin

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 4}
argmax(x, 3)
{'2020-01-03': '2020-01-02', '2020-01-04': '2020-01-02', '2020-01-05': '2020-01-05'}
argmin(x, 3)
{'2020-01-03': '2020-01-03', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [2,1], '2020-01-03': [3,0]}
argmax(x_np, 3, 2)
{'2020-01-02': ['2020-01-02', '2020-01-01'], '2020-01-03': ['2020-01-03', '2020-01-01']}
argmin(x_np, 3, 2)
{'2020-01-02': ['2020-01-01', '2020-01-02'], '2020-01-03': ['2020-01-01', '2020-01-03']}

3. return_most_recent=False

argmin(x, 3, return_most_recent=False)
{'2020-01-03': '2020-01-01', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'} # Note how the first element is '2020-01-01', not '2020-01-03'
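
The tie-breaking controlled by return_most_recent can be traced with a plain-Python sketch. It is an illustration only, not csp code: rolling_argmin is a hypothetical helper, min_window is taken to equal the interval, and an all-NaN window is simply skipped instead of producing the default time.

```python
import math

def rolling_argmin(ticks, interval, return_most_recent=True):
    """ticks: list of (timestamp, value) pairs in time order; NaN values are ignored."""
    out = {}
    for i in range(interval - 1, len(ticks)):
        window = [(t, v) for t, v in ticks[i + 1 - interval: i + 1] if not math.isnan(v)]
        if not window:
            continue  # simplification: no entry is produced for an all-NaN window
        lowest = min(v for _, v in window)
        times = [t for t, v in window if v == lowest]  # every time the minimum ticked
        out[ticks[i][0]] = times[-1] if return_most_recent else times[0]
    return out

x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 1.0),
     ("2020-01-04", math.nan), ("2020-01-05", 4.0)]
print(rolling_argmin(x, 3))
print(rolling_argmin(x, 3, return_most_recent=False))
```

Only the entry for 2020-01-03 differs between the two calls, because that is the only window in which the minimum value 1 appears twice.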

Rank

rank(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    method: str = "min",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    na_option: str = "keep"
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • method:  the method to use to rank groups of records that have the same value
    • "min": the lowest rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1
    • "max": the highest rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=2
    • "avg": the average rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1.5
    • By default, the "min" method is used.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
  • na_option: how to rank a nan value when it is the last value to be ranked
    • "keep": return a nan rank for a nan value
    • "last": rank the last non-nan value present in the interval
    • By default, the "keep" option is used.

Returns:

  • a time-series of rolling ranks over the interval, where a rank of 0 means that the current (last) ticked value is the smallest in the given interval.

Examples: rank

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': 5, '2020-01-05': 4}
rank(x, 5, min_window=3)
{'2020-01-03': 1, '2020-01-04': 3, '2020-01-05': 3}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,2], '2020-01-03': [2,1]}
rank(x_np, 3, 2)
# Note how the second element at '2020-01-02' is 0, not 1, as by default the "min" method is used
{'2020-01-02': [1, 0], '2020-01-03': [1, 0]}

3. "keep" vs "last" NaN option

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': nan, '2020-01-05': 4}
rank(x, 5, min_window=3, na_option="keep")
{'2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 3}
rank(x, 5, min_window=3, na_option="last")
# the last valid value, 2, is ranked at '2020-01-04', giving rank 1
{'2020-01-03': 1, '2020-01-04': 1, '2020-01-05': 3}
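
The zero-based ranking and the two na_option settings can be sketched in plain Python. This is an illustration of the documented behavior, not csp's implementation; rank_of_last is a hypothetical helper that ranks the most recent tick of a single window.

```python
import math

def rank_of_last(window, method="min", na_option="keep"):
    """0-based rank of the latest tick within the window (NaNs never count as data)."""
    valid = [v for v in window if not math.isnan(v)]
    last = window[-1]
    if math.isnan(last):
        if na_option == "keep" or not valid:
            return math.nan
        last = valid[-1]  # "last": rank the most recent non-NaN value instead
    below = sum(v < last for v in valid)  # values strictly smaller than `last`
    ties = sum(v == last for v in valid)  # values equal to `last`, itself included
    if method == "min":
        return below
    if method == "max":
        return below + ties - 1
    if method == "avg":
        return below + (ties - 1) / 2
    raise ValueError(f"unknown method: {method}")

x = [1.0, 3.0, 2.0, math.nan, 4.0]
for na_option in ("keep", "last"):
    # expanding prefixes stand in for rank(x, 5, min_window=3): the window never exceeds 5 ticks
    print(na_option, [rank_of_last(x[: i + 1], na_option=na_option) for i in range(2, len(x))])
```

The two printed lists correspond to the "keep" and "last" outputs of example 3, one entry per date from 2020-01-03 to 2020-01-05.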

Moment-Based Statistics

Variance

var(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom. Example: if ddof=1, then normalization term is 1/(N-1). If ddof=0, then 1/N.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted variance (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling variance over the interval. If there are insufficient samples for the given ddof, no output is generated. Since the smart mean is being used, overflow is not a problem.

Examples: var

See Standard Error.

Standard Deviation

stddev(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted standard deviation (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling standard deviations over the interval. If there are insufficient samples for the given ddof, no output is generated.

Examples: stddev

See Standard Error.

Standard Error

sem(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted standard error (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling standard errors

Examples: Variance, Standard Deviation, Standard Error

Starttime: 2020-01-01 00:00:00

1. Variance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
var(x, interval=3, min_window=2)
{'2020-01-02': 0.5, '2020-01-03': 1.0, '2020-01-04': 0.5, '2020-01-05': 2.0}

2. Biased variance

var(x, interval=3, min_window=2, ddof=0, ignore_na=True) # biased
{'2020-01-02': 0.25, '2020-01-03': 0.666, '2020-01-04': 0.25, '2020-01-05': 1.0}

3. Standard deviation including NaNs

stddev(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.0, '2020-01-04': nan, '2020-01-05': nan}

4. Standard error with triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
sem(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 0.577, '2020-01-05': 1.0}
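
The effect of ddof and ignore_na on examples 1 to 3 can be checked by hand with a plain-Python sketch. It uses the textbook two-pass formula, not csp's rolling update; window_var is a hypothetical helper, and returning NaN when there are too few samples is a stand-in for producing no output.

```python
import math

def window_var(window, ddof=1, ignore_na=True):
    """Variance of one window, normalized by 1/(N - ddof)."""
    valid = [v for v in window if not math.isnan(v)]
    if not ignore_na and len(valid) < len(window):
        return math.nan  # a NaN in the window invalidates the whole window
    n = len(valid)
    if n - ddof <= 0:
        return math.nan  # stand-in: too few samples for the given ddof
    mean = sum(valid) / n
    return sum((v - mean) ** 2 for v in valid) / (n - ddof)

x = [1.0, 2.0, 3.0, math.nan, 5.0]
# windows seen with interval=3, min_window=2 (first output at the second tick)
windows = [x[max(0, i - 2): i + 1] for i in range(1, len(x))]

print([window_var(w) for w in windows])                              # unbiased, ddof=1
print([window_var(w, ddof=0) for w in windows])                      # biased, ddof=0
print([math.sqrt(window_var(w, ignore_na=False)) for w in windows])  # stddev, NaNs kept
```

The three printed lists line up with the outputs of examples 1, 2 and 3 respectively, up to the rounding used there.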

Covariance

cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, then the covariance calculation is performed element-wise with the corresponding values in y.
  • y: time-series data that ticks in sequence with x
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted covariance (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling covariances between x and y

Examples: cov

See Correlation.

Correlation

corr(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, then the correlation calculation is performed element-wise with the corresponding values in y.
  • y: time-series data that ticks in sequence with x
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, nan values are excluded from the window. If False, any nan value in the window makes the statistic for that window nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted correlation (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling Pearson correlation coefficients between x and y

Examples: Covariance and Correlation

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
y = {'2020-01-01': 5, '2020-01-02': 4, '2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 1}
cov(x, y, interval=3, min_window=2)
{'2020-01-02': -0.5, '2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}

2. Correlation

corr(x, y, interval=3)
{'2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}
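
The outputs above can be checked by hand. The following is a minimal pure-Python sketch, not the csp implementation: `rolling_cov` and `rolling_corr` are hypothetical helpers, the inputs are plain lists indexed by tick position rather than time series, and the sketch assumes that min_window falls back to the interval when it is omitted, as the corr output above suggests.

```python
import math

def _window(values, i, interval):
    # Last `interval` values ending at index i (fewer while the window fills).
    return values[max(0, i + 1 - interval): i + 1]

def rolling_cov(x, y, interval, min_window=None, ddof=1):
    # Assumption: min_window defaults to the interval when not given.
    min_window = interval if min_window is None else min_window
    out = {}
    for i in range(len(x)):
        xs, ys = _window(x, i, interval), _window(y, i, interval)
        n = len(xs)
        if n < min_window or n - ddof <= 0:
            continue  # not enough data yet: no output for this tick
        mx, my = sum(xs) / n, sum(ys) / n
        out[i] = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - ddof)
    return out

def rolling_corr(x, y, interval, min_window=None):
    # Pearson correlation: covariance divided by the product of std devs.
    cov = rolling_cov(x, y, interval, min_window)
    var_x = rolling_cov(x, x, interval, min_window)
    var_y = rolling_cov(y, y, interval, min_window)
    return {i: cov[i] / math.sqrt(var_x[i] * var_y[i]) for i in cov}

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
cov_out = rolling_cov(x, y, interval=3, min_window=2)   # keys are tick indices
corr_out = rolling_corr(x, y, interval=3)
print(cov_out)
print(corr_out)
```

Index 1 corresponds to 2020-01-02, so the covariance series starts one tick earlier than the correlation series because of min_window=2.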

Skewness

skew(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, nan values are excluded from the window. If False, any nan value in the window makes the statistic for that window nan.
  • bias: if True, calculates a biased (unadjusted) skew. If False (default), calculates a Gaussian-unbiased measure.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted skew (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sample skew measures, using the adjusted Fisher–Pearson standardized moment coefficient.

Examples: skew

See Kurtosis.

Kurtosis

kurt(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    excess: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, nan values are excluded from the window. If False, any nan value in the window makes the statistic for that window nan.
  • excess: if True (default), uses the definition of excess kurtosis (kurt - 3). If False, uses the standard definition.
  • bias: if True, calculates a biased (unadjusted) kurtosis. If False (default), calculates a Gaussian-unbiased measure.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted kurtosis (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sample kurtosis measures: excess kurtosis by default (excess=True), bias-adjusted unless bias=True.

Examples: skew and kurt

Starttime: 2020-01-01 00:00:00

1. Skew

x = {'2020-01-01': 1, '2020-01-02': 2, ..., '2020-01-10': 10}
skew(x, interval=7)
{'2020-01-07': 0, '2020-01-08': 0, '2020-01-09': 0, '2020-01-10': 0}

2. Kurtosis

kurt(x, interval=7) # excess kurtosis
{'2020-01-07': -1.2, '2020-01-08': -1.2, '2020-01-09': -1.2, '2020-01-10': -1.2}
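
The values 0 and -1.2 can be reproduced with the standard bias-adjusted sample estimators. The sketch below is illustrative only: `sample_skew` and `sample_excess_kurt` are hypothetical helpers, and it assumes that the bias=False setting corresponds to the usual adjusted formulas (the same ones used by common statistics packages), which is consistent with the example output but is not confirmed by this page.

```python
import math

def central_moment(xs, k):
    # k-th central moment with divisor n (the biased moment).
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** k for v in xs) / n

def sample_skew(xs):
    # Adjusted Fisher-Pearson coefficient: G1 = sqrt(n(n-1)) / (n-2) * g1
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurt(xs):
    # Bias-adjusted excess kurtosis: G2 = (n-1)/((n-2)(n-3)) * ((n+1) g2 + 6)
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

x = list(range(1, 11))                        # 1, 2, ..., 10
windows = [x[i - 7:i] for i in range(7, 11)]  # four 7-tick windows
skews = [sample_skew(w) for w in windows]
kurts = [sample_excess_kurt(w) for w in windows]
print(skews)
print(kurts)
```

Every 7-tick window of evenly spaced values has the same shape, which is why both statistics are constant across the four outputs.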

Exponential Moving Statistics

Exponential Moving Average

ema(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[timedelta] = None,
    adjust: bool = True,
    horizon: int = None,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data

  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.

  • alpha: the EMA weight parameter specified directly. If adjust = True, EMA is calculated such that

    $$EMA(t) = \frac{\sum\limits_{i=0}^{n} (1-\alpha)^{i} x(t-i)}{\sum\limits_{i=0}^{n} (1-\alpha)^{i}}$$

    If adjust = False, EMA is calculated such that

    $$EMA(t) = (1-\alpha)EMA(t-1) + \alpha x(t)$$ $$EMA(t=0) = x(0)$$

    By default, adjust = True, to give better estimates for starting intervals.

    The following are alternative methods to specify the $\alpha$ parameter.

    • span: specify alpha in terms of span, such that

      $$\alpha = \frac{2}{span+1}$$

    • com: specify alpha in terms of centre of mass, such that

      $$\alpha = \frac{1}{1+com}$$

    • halflife: unlike the other parameters, halflife is a timedelta that specifies the half-life of the observation weights. It is useful when observations are irregularly spaced, since recent data is then weighted by elapsed time rather than by tick count. Let $t_{-1}$ be the time of the last observation.

      Then:

      $$\lambda(t)  = \exp\left(\frac{-(t-t_{-1})*\ln(2)}{halflife}\right)$$ $$EMA(t) = \frac{ \lambda(t)*EMA(t-1) + x(t)}{\text{normalization constant}}$$

      Note that the ignore_na flag has no effect when halflife is specified: the moving average is re-weighted using absolute time differences rather than tick counts, so the behavior is the same in both cases.

      Exactly one of alpha, span, com, halflife must be given.

  • adjust: if True, early observations are adjusted to give a more "smoothed" estimate of the EMA. The difference is that if adjust=True, then each new observation receives a relative weight of 1. If adjust = False, each new observation receives a relative weight of alpha.

    • adjust=True means that:

    $$EMA(t) = \frac{x(t)+(1-\alpha)x(t-1)+(1-\alpha)^2 x(t-2) + ... + (1-\alpha)^n x(t-n)}{1+(1-\alpha)+(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$

    • adjust=False means that:

    $$EMA(t) = \frac{\alpha * x(t) + \alpha * (1-\alpha) * x(t-1) + \alpha * (1-\alpha)^2 * x(t-2) + ... + \boldsymbol{(1-\alpha)^n x(0)}}{\alpha+\alpha*(1-\alpha)+\alpha*(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$

    $$\text{and thus } EMA(t=0) = x(0)$$

    Adjust only applies with tick specified intervals, not time specified intervals. Time specified intervals (i.e. half-life) do not need adjustment as they are, by definition, already adjusted.

  • horizon: the maximum number of ticks to use in the computation. For example, if horizon = 10, then only the 10 most recent data points are used. If not specified, all data points for x are used, with early ticks decaying exponentially in weighting. Horizon will be ignored with a half-life (time-based) interval.

    • If horizon is set to h, then even if x has more than h ticks, the EMA is computed from only the h most recent ticks. With adjust=True:

    $$EMA(t) = \frac{\sum_{i=0}^{h-1} (1-\alpha)^{i} x(t-i)}{\sum_{i=0}^{h-1} (1-\alpha)^{i}}$$

    • With adjust=False, the only difference is that the first-ever tick, while it is still in the window, receives an initial weight of 1 instead of $\alpha$ like the rest of the values.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position, and renormalized as such.

    • For example, let us consider a dataset (1,nan,2) using adjust=True.
      • If ignore_na=True then the weighting is based on relative position as such: $$EMA(t=2) = \frac{(1-\alpha)*1 + 2}{(1-\alpha)+1}$$
      • If ignore_na=False then the weighting is based on global position as such: $$EMA(t=2) = \frac{(1-\alpha)^2*1 + 2}{(1-\alpha)^2+1}$$
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned

    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:

    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation.

  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.

    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving averages over the interval.
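
As a small illustration of the parameter conversions and of the ignore_na weighting described in the arguments, the sketch below evaluates the (1, nan, 2) case for adjust=True. It is a pure-Python approximation under stated assumptions, not csp code: `alpha_from_span`, `alpha_from_com` and `adjusted_ema_latest` are hypothetical helper names, and the input is a plain list.

```python
import math

def alpha_from_span(span):
    # alpha = 2 / (span + 1)
    return 2.0 / (span + 1.0)

def alpha_from_com(com):
    # alpha = 1 / (1 + com)
    return 1.0 / (1.0 + com)

def adjusted_ema_latest(values, alpha, ignore_na):
    # Adjusted EMA at the latest tick; the newest observation has weight 1.
    num = den = 0.0
    age = 0
    for v in reversed(values):
        if math.isnan(v):
            if not ignore_na:
                age += 1      # global position: the nan still ages older data
            continue          # relative position: the nan is skipped entirely
        w = (1.0 - alpha) ** age
        num += w * v
        den += w
        age += 1
    return num / den

alpha = alpha_from_span(3)    # span=3 and com=1 both give alpha = 0.5
data = [1.0, math.nan, 2.0]
relative = adjusted_ema_latest(data, alpha, ignore_na=True)   # ((1-a)*1 + 2) / ((1-a) + 1)
global_ = adjusted_ema_latest(data, alpha, ignore_na=False)   # ((1-a)^2*1 + 2) / ((1-a)^2 + 1)
print(relative, global_)
```

With ignore_na=False the older value is discounted twice, so the result sits closer to the most recent observation.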

Examples: ema

Starttime: 2020-01-01 00:00:00

1. Unadjusted EMA

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
ema(x, alpha=0.1, adjust=False) # unadjusted
{'2020-01-01': 1.0, '2020-01-02': 1.1, '2020-01-03': 1.29, '2020-01-04': 1.561, '2020-01-05': 1.9049}

2. Adjusted EMA

ema(x, alpha=0.1, adjust=True)  # adjusted, default method
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.0701, '2020-01-04': 2.6313, '2020-01-05': 3.20971}

3. Finite horizon EMA

ema(x, alpha=0.1, adjust=True, horizon=2) # finite horizon
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.5263, '2020-01-04': 3.5263, '2020-01-05': 4.5263}

4. Time-based decay EMA

ema(x, halflife=timedelta(days=1)) # time-based
{'2020-01-01': 1.0, '2020-01-02': 1.6666, '2020-01-03': 2.4286, '2020-01-04': 3.2666, '2020-01-05': 4.1613}

5. Unadjusted EMA for NumPy array

x_np = {'2020-01-01': [1,2], '2020-01-02': [4,5], '2020-01-03': [7,8]}
ema(x_np, alpha=0.1, adjust=False)
{'2020-01-01': [1,2], '2020-01-02': [1.3,2.3], '2020-01-03': [1.87,2.87] }
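
The first four scalar examples can be reproduced with a few lines of plain Python. This is a sketch for checking the arithmetic, not the csp implementation: the three helper functions are hypothetical, inputs are plain lists, and the time-based variant assumes that each earlier observation is discounted by a factor of exp(-Δt·ln 2 / halflife), an assumption chosen because it reproduces the time-based output above.

```python
import math
from datetime import datetime, timedelta

def ema_unadjusted(values, alpha):
    # EMA(t) = (1 - alpha) * EMA(t-1) + alpha * x(t), with EMA(0) = x(0)
    out = []
    for v in values:
        out.append(v if not out else (1.0 - alpha) * out[-1] + alpha * v)
    return out

def ema_adjusted(values, alpha, horizon=None):
    # Weighted average with weights (1 - alpha)^i; i = 0 is the newest tick.
    out = []
    for t in range(len(values)):
        first = 0 if horizon is None else max(0, t + 1 - horizon)
        num = den = 0.0
        for j in range(first, t + 1):
            w = (1.0 - alpha) ** (t - j)
            num += w * values[j]
            den += w
        out.append(num / den)
    return out

def ema_halflife(times, values, halflife):
    # Assumed decay: older weights shrink by 2^(-elapsed / halflife).
    out, num, den, prev = [], 0.0, 0.0, None
    for t, v in zip(times, values):
        if prev is not None:
            decay = math.exp(-math.log(2.0) * ((t - prev) / halflife))
            num *= decay
            den *= decay
        num += v
        den += 1.0
        prev = t
        out.append(num / den)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
times = [datetime(2020, 1, d) for d in range(1, 6)]
unadjusted = ema_unadjusted(x, 0.1)
adjusted = ema_adjusted(x, 0.1)
finite = ema_adjusted(x, 0.1, horizon=2)
time_based = ema_halflife(times, x, timedelta(days=1))
print(unadjusted, adjusted, finite, time_based, sep="\n")
```

Because the ticks are exactly one half-life apart, each older observation in the time-based case is weighted by one half per elapsed day.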

Exponential Moving Variance

ema_var(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted variance. If False (default), the variance is normalized by a debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving variances over the interval.

Examples: ema_var

See Exponential Moving Standard Deviation

Exponential Moving Standard Deviation

ema_std(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0,
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted variance. If False (default), the variance is normalized by a debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving standard deviations over the interval.

Examples: Exp. Moving Variance and Standard Deviation

Starttime: 2020-01-01 00:00:00

1. Exp. Moving Standard Deviation

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
ema_std(x, min_periods=2, span=20, adjust=False, bias=False, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.11636, '2020-01-04': 1.11636, '2020-01-05': 1.937005}

2. Exp. Moving Variance

ema_var(x, min_periods=2, span=20, adjust=False, bias=True, ignore_na=False)
{'2020-01-02': 0.086168, '2020-01-03': 0.390588, '2020-01-04': 0.390588, '2020-01-05': 1.644124}
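
The numbers in both examples can be reproduced by tracking the observation weights explicitly. The sketch below is a hedged illustration in pure Python, not the library's algorithm: `ew_var_unadjusted` is a hypothetical helper, and it assumes (a) adjust=False weights as described for the EMA, (b) with ignore_na=False a nan tick ages the existing weights without adding an observation, and (c) the debiasing factor for normalized weights w is 1 / (1 - Σw²). These assumptions were chosen because they match the outputs shown above.

```python
import math

def ew_var_unadjusted(values, alpha, bias):
    # Exponentially weighted variance, adjust=False, ignore_na=False.
    weights, points, out = [], [], []
    for v in values:
        # Every tick (nan or not) discounts the existing weights.
        weights = [w * (1.0 - alpha) for w in weights]
        if not math.isnan(v):
            # The first-ever observation gets weight 1, later ones alpha.
            weights.append(alpha if points else 1.0)
            points.append(v)
        if not points:
            out.append(math.nan)
            continue
        total = sum(weights)
        norm = [w / total for w in weights]
        mean = sum(w * p for w, p in zip(norm, points))
        var = sum(w * (p - mean) ** 2 for w, p in zip(norm, points))
        if not bias:
            denom = 1.0 - sum(w * w for w in norm)  # assumed debiasing factor
            var = var / denom if denom > 0 else math.nan
        out.append(var)
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
alpha = 2.0 / (20 + 1)                      # span=20
biased_var = ew_var_unadjusted(x, alpha, bias=True)
unbiased_std = [math.sqrt(v) if not math.isnan(v) else math.nan
                for v in ew_var_unadjusted(x, alpha, bias=False)]
print(biased_var[1:])     # compare with the ema_var example (min_periods=2)
print(unbiased_std[1:])   # compare with the ema_std example
```

The value on 2020-01-04 repeats the previous one because the nan tick rescales all weights by the same factor, leaving the normalized weights unchanged.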

Exponential Moving Covariance

ema_cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, the exponential-moving covariance is calculated element-wise with the corresponding values in y.
  • y: time-series data which ticks in-sequence with x
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted covariance. If False (default), the covariance is normalized by a debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving covariance over the interval.
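
To illustrate the idea for the biased (bias=True), adjusted (adjust=True) case, the following pure-Python sketch weights each pair of observations with the same (1 - alpha)^i weights used by the adjusted EMA. It is an assumption-laden illustration rather than csp's implementation: `ew_cov_biased` is a hypothetical helper, the inputs are plain lists without nan values, and the result has not been compared against csp output.

```python
def ew_cov_biased(x, y, alpha):
    # Biased exponentially weighted covariance at every tick (adjust=True).
    out = []
    for t in range(len(x)):
        w = [(1.0 - alpha) ** (t - j) for j in range(t + 1)]  # newest weight = 1
        total = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
        cov = sum(wi * (xi - mean_x) * (yi - mean_y)
                  for wi, xi, yi in zip(w, x, y)) / total
        out.append(cov)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]       # y = 6 - x, perfectly anti-correlated
cov_xy = ew_cov_biased(x, y, alpha=0.1)
var_x = ew_cov_biased(x, x, alpha=0.1)
print(cov_xy)
print(var_x)
```

Since y is an exact decreasing linear function of x, the covariance at every tick is the negative of the exponentially weighted variance of x, which gives a quick sanity check on the weighting.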

NumPy Specific Statistics

Covariance Matrix

cov_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

  • x: the time-series of dimension (N,) arrays which represent N variables
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
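  • ddof: delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of observations in the window.
  • ignore_na: if True, nan values are excluded from the window. If False, any nan value in the window makes the statistic for that window nan.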
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted covariance matrix (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of (potentially weighted) covariance matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Correlation Matrix

corr_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

  • x: the time-series of dimension (N,) arrays which represent N variables
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, nan values are excluded from the window. If False, any nan value in the window makes the statistic for that window nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted correlation matrix (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of (potentially weighted) correlation matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Examples: Covariance and Correlation Matrices

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': np.array([0., 0., 0.]), '2020-01-02': np.array([1., -1., 2.]), '2020-01-03': np.array([2., -2., 4.])}
cov_matrix(x, 3, ddof=1)
{'2020-01-03': np.array([[1, -1, 2],
                         [-1, 1, -2],
                         [2, -2, 4]])}

2. Correlation

corr_matrix(x, 3)
{'2020-01-03': np.array([[1, -1, 1],
                         [-1, 1, -1],
                         [1, -1, 1]])}
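
The matrices above can be checked with a short pure-Python sketch. `cov_matrix_sketch` and `corr_matrix_sketch` are hypothetical helpers operating on a list of rows (one row per tick), not the csp nodes themselves. With the sample divisor (ddof=1) the sketch yields the covariance matrix shown above; with ddof=0 every entry is scaled by (N - 1) / N = 2/3 for this three-tick window.

```python
import math

def cov_matrix_sketch(rows, ddof=1):
    # rows: one length-k observation per tick; returns a k x k nested list.
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - ddof)
             for b in range(k)]
            for a in range(k)]

def corr_matrix_sketch(rows):
    # Correlation = covariance scaled by both standard deviations.
    c = cov_matrix_sketch(rows)
    k = len(c)
    return [[c[a][b] / math.sqrt(c[a][a] * c[b][b]) for b in range(k)]
            for a in range(k)]

rows = [[0.0, 0.0, 0.0], [1.0, -1.0, 2.0], [2.0, -2.0, 4.0]]
sample_cov = cov_matrix_sketch(rows, ddof=1)
population_cov = cov_matrix_sketch(rows, ddof=0)
corr = corr_matrix_sketch(rows)
print(sample_cov)
print(population_cov)
print(corr)
```

The correlation matrix does not depend on ddof, because the divisor cancels between the covariance and the two standard deviations.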

NumPy Conversions

list_to_numpy(x: [ts[float]], fillna: bool = False) → ts[np.ndarray]

Args:

  • x: a listbasket of time series
  • fillna: If False, unticked elements are treated as NaN. If True, unticked elements will hold their previous value in the array.

Returns:

  • a NumPy 1D array where each value corresponds to the element of the listbasket with the same index

numpy_to_list(x: ts[np.ndarray], n: int) → [ts[float]]

Args:

  • x: a NumPy array valued time series
  • n: the number of output channels in the listbasket

Returns:

  • a listbasket where each value corresponds to the element of the array with the same index

Examples: NumPy Conversions

Starttime: 2020-01-01 00:00:00

1. List to NumPy

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
list_to_numpy([x1,x2], fillna=False)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, np.nan], '2020-01-03': [3, 3.5]} # no x2 tick on day 2
list_to_numpy([x1,x2], fillna=True)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, 1.5], '2020-01-03': [3, 3.5]} # holds x2 value for day 2

2. NumPy to list

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
numpy_to_list(x_np, 2)
[
    {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 5},
    {'2020-01-01': 2, '2020-01-02': 4, '2020-01-03': 6}
]
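
The fillna behavior can be imitated on plain dictionaries to make the difference concrete. The following is a minimal sketch of the semantics only, not the csp implementation: list_to_rows and rows_to_list are hypothetical helpers, dictionaries stand in for time series, and Python lists stand in for NumPy arrays.

```python
import math

def list_to_rows(series, fillna=False):
    """Combine dict-based series into one row per timestamp (hypothetical helper)."""
    timestamps = sorted(set().union(*series))
    last_seen = [math.nan] * len(series)  # most recent value of each series
    rows = {}
    for ts in timestamps:
        row = []
        for i, s in enumerate(series):
            if ts in s:
                last_seen[i] = s[ts]
                row.append(s[ts])
            else:
                # Unticked element: hold the previous value or emit NaN.
                row.append(last_seen[i] if fillna else math.nan)
        rows[ts] = row
    return rows

def rows_to_list(rows, n):
    """Split each row back into n per-element series (hypothetical helper)."""
    return [{ts: row[i] for ts, row in rows.items()} for i in range(n)]

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}

no_fill = list_to_rows([x1, x2], fillna=False)
held = list_to_rows([x1, x2], fillna=True)
round_trip = rows_to_list(held, 2)

print(no_fill)
print(held)
print(round_trip)
```

With fillna=True the round trip returns x1 unchanged, while the second series gains a held value of 1.5 on 2020-01-02.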

Cross-Sectional Statistics

Cross Sectional

cross_sectional(
    x: ts[Union[float,np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    as_numpy: bool = False,
    sampler: ts[object] = None,
    reset: ts[object] = None
) → ts[Union[np.ndarray, List[float], List[np.ndarray]]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
  • as_numpy: if True, the data will be returned as a NumPy array instead of a list.
    • For a single-valued time series, this is a one-dimensional NumPy array
    • For a NumPy array time series, this is a NumPy array of one extra dimension
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
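
The interval and min_window rules can be illustrated without csp. The sketch below is an assumption-laden model of the windowing described above, not the library's implementation: tick_windows and time_window_at are hypothetical helpers, and trigger, sampler and reset are ignored. The tick-count version keeps the last interval values and emits output once min_window values are present. The time-based version keeps values whose timestamp t satisfies now - interval < t <= now, which is what "non-inclusive at left endpoint" means here.

```python
from collections import deque
from datetime import datetime, timedelta

def tick_windows(ticks, interval, min_window):
    """Tick-count window: emit the window once it holds min_window values."""
    window = deque(maxlen=interval)  # oldest value drops out automatically
    out = {}
    for ts, value in ticks:
        window.append(value)
        if len(window) >= min_window:
            out[ts] = list(window)
    return out

def time_window_at(ticks, now, interval):
    """Time window: keep values with now - interval < t <= now."""
    return [v for t, v in ticks if now - interval < t <= now]

daily = [('2020-01-01', 1), ('2020-01-02', 2), ('2020-01-03', 3),
         ('2020-01-04', 4), ('2020-01-05', 5)]
by_count = tick_windows(daily, interval=3, min_window=2)

start = datetime(2020, 1, 1)
per_second = [(start + timedelta(seconds=s), s) for s in range(4)]
by_time = time_window_at(per_second, start + timedelta(seconds=2),
                         timedelta(seconds=2))

print(by_count)
print(by_time)  # the tick at exactly now - interval is excluded
```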

Returns:

  • a time-series where each tick contains all the data of x currently within the interval. Use this for custom cross-sectional calculations

Examples: Cross-sectional calculations

Starttime: 2020-01-01 00:00:00

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
cs = cross_sectional(x, interval=3, min_window=2)
cs
{'2020-01-02': [1,2], '2020-01-03': [1,2,3], '2020-01-04': [2,3,4], '2020-01-05': [3,4,5]}

Calculate a cross-sectional mean

cs_mean = csp.apply(cs, lambda v: sum(v)/len(v), float)
cs_mean
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 3.0, '2020-01-05': 4.0}

Get the results as a NumPy array

cs = cross_sectional(x, interval=3, min_window=2, as_numpy=True)
cs
{'2020-01-02': np.array([1,2]), '2020-01-03': np.array([1,2,3]), '2020-01-04': np.array([2,3,4]), '2020-01-05': np.array([3,4,5])}
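
When x is itself array-valued and as_numpy=True, each output tick has one extra dimension. The sketch below shows how such a window could be consumed with plain NumPy. It assumes the extra dimension is the leading (tick) axis; that is an assumption of this sketch, since the description above only states that one dimension is added. The window is built by hand with np.stack rather than produced by csp.

```python
import numpy as np

# Three array-valued ticks currently inside the interval (illustrative data).
ticks_in_window = [
    np.array([1.0, 2.0]),
    np.array([3.0, 4.0]),
    np.array([5.0, 6.0]),
]

# Assumed layout: axis 0 indexes ticks, axis 1 indexes array elements.
window = np.stack(ticks_in_window)

# Per-element statistic over the window (reduces the tick axis).
mean_per_element = window.mean(axis=0)

# Per-tick statistic across elements (reduces the element axis).
range_per_tick = window.max(axis=1) - window.min(axis=1)

print(window.shape)
print(mean_per_element)
print(range_per_tick)
```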