Statistical Nodes API - Point72/csp GitHub Wiki

This page contains the documentation for the csp.stats library. The stats library contains functions to calculate statistics on time series data over rolling windows.

Base Statistics:

count: counts the number of data ticks within a given interval
unique: counts the number of unique values within a given interval
sum: rolling sum of values within a given interval
prod: rolling product of values within a given interval
first: the earliest value still within the interval
last: the last value of the interval
mean: the mean of values within the interval
gmean: the geometric mean of values within the interval

Order Statistics:

max: the maximum value within the interval
min: the minimum value within the interval
median: the median value within the interval
quantile: the quantile value within the interval
argmin: the time at which the minimum interval value ticked
argmax: the time at which the maximum interval value ticked
rank: the time series rank of the most recent tick in the interval

Moment-Based Statistics:

var: variance of the time series within the interval
stddev: standard deviation within the interval
sem: standard error within the interval
cov: covariance between two in-sequence time series within the interval
corr: correlation between two in-sequence time series within the interval
skew: skewness of the time series within the interval
kurt: kurtosis (or excess kurtosis) of the time series within the interval

Exponential Moving Statistics:

ema: exponential moving average, with numerous different variations available
ema_var: exponential moving variance
ema_std: exponential moving standard deviation
ema_cov: exponential moving covariance between two in-sequence time series

NumPy Specific Statistics:

cov_matrix: covariance matrix between N time-series (in a NumPy array) over a rolling time interval
corr_matrix: normalized correlation matrix between N time-series (in a NumPy array) over a rolling time interval
list_to_numpy: converts a listbasket of time-series into a NumPy array
numpy_to_list: converts a NumPy array time-series into a listbasket

Cross-Sectional Statistics:

cross_sectional: receive all data within the current window for a cross-sectional calculation

Base Statistics

Count

count(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data.
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, ignores NaN values in the window (does not count them). If False, NaN values make the count NaN.
- By default, ignore_na is True
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- By default, there is no reset series.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
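To make the timedelta case concrete, here is a minimal plain-Python sketch of a window that is open at the left endpoint and closed at the right. It only illustrates the rule described for `interval` and is not csp's implementation; `ticks_in_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def ticks_in_window(ticks, now, interval):
    """Return the (time, value) pairs with now - interval < time <= now."""
    return [(t, v) for t, v in ticks if now - interval < t <= now]

# One tick per day from 2020-01-01 to 2020-01-05
ticks = [(datetime(2020, 1, d), float(d)) for d in range(1, 6)]
window = ticks_in_window(ticks, now=datetime(2020, 1, 5), interval=timedelta(days=3))
days_kept = [t.day for t, _ in window]
print(days_kept)  # the tick exactly 3 days old (Jan 2) is excluded
```

With a 3-day interval evaluated on 2020-01-05, the ticks from 2020-01-03 onward are kept, which matches the triggering example for `count` further down.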

Returns:

A time-series of how many data points are currently in the interval. If a tick count is used, then it is necessarily less than or equal to the interval.

Examples: `count`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
count(x, interval=3)

# NaN is not counted
{'2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 2}

2. Including NaN

count(x, interval=3, ignore_na=False)

{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)

{'2020-01-03': 3, '2020-01-05': 2}

4. Sampling

sampler = {'2020-01-01': True, '2020-01-02': True, '2020-01-03': True, '2020-01-05': True, '2020-01-06': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), sampler=sampler)

{'2020-01-03': 3, '2020-01-05': 2}

Note: the x value at 2020-01-04 is ignored completely since sampler does not tick, while the value at 2020-01-06 is treated as NaN.
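The three sampler rules can be sketched in plain Python as a merge of the two tick streams. This is an illustration under the rules listed for `sampler`, not csp code; `apply_sampler` is a hypothetical helper.

```python
import math

def apply_sampler(x_ticks, sampler_times):
    """Merge x with a sampler following the three rules listed for `sampler`."""
    effective = {}
    for t in sorted(set(x_ticks) | set(sampler_times)):
        if t not in sampler_times:
            continue  # x ticked without the sampler: the tick is ignored
        # sampler ticked: use x's value if x also ticked, otherwise NaN
        effective[t] = x_ticks.get(t, math.nan)
    return effective

x = {'2020-01-01': 1.0, '2020-01-02': 2.0, '2020-01-03': 3.0,
     '2020-01-04': math.nan, '2020-01-05': 5.0}
sampler = {'2020-01-01', '2020-01-02', '2020-01-03', '2020-01-05', '2020-01-06'}
effective = apply_sampler(x, sampler)
print(list(effective))
```

The 2020-01-04 tick disappears from the effective stream, and 2020-01-06 appears with a NaN value that `ignore_na` then decides how to handle.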

5. Reset

reset = {'2020-01-04': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), reset=reset)

{'2020-01-03': 3, '2020-01-04': 0, '2020-01-05': 1}

Note: the window data is reset at 2020-01-04, and its value is NaN, so the count is 0

6. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
count(x_np, interval=3, min_window=1)

{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,2]} # count is per element
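The tick-count examples 1 and 2 above can be cross-checked with a short reference sketch in plain Python. It assumes one output per input tick once `min_window` ticks have been seen; `rolling_count` is a hypothetical name and this is not how csp computes the statistic internally.

```python
import math

def rolling_count(values, interval, min_window=None, ignore_na=True):
    """Tick-based rolling count over the last `interval` ticks."""
    if min_window is None:
        min_window = interval  # documented default
    out = []
    for seen in range(1, len(values) + 1):
        if seen < min_window:
            continue  # not enough ticks yet, no output
        window = values[max(0, seen - interval):seen]
        n_nan = sum(1 for v in window if math.isnan(v))
        if n_nan and not ignore_na:
            out.append(math.nan)  # a NaN in the window poisons the count
        else:
            out.append(len(window) - n_nan)
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
default = rolling_count(x, interval=3)
with_nan = rolling_count(x, interval=3, ignore_na=False)
print(default, with_nan)
```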

Unique

unique(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    precision: int = 10
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- By default, there is no reset series.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
precision: the decimal place precision at which two floats are considered non-unique. For example, if precision=2, then 2.001 and 2.002 would be considered non-unique.
- By default, precision is set to 10 decimal places.
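One way to read the `precision` rule is that values are compared after rounding to that many decimal places. The sketch below works under that assumption (the text does not say whether csp rounds or truncates), and `unique_count` is a hypothetical helper.

```python
import math

def unique_count(window, precision=10):
    """Count distinct non-NaN values after rounding to `precision` decimals."""
    rounded = {round(v, precision) for v in window if not math.isnan(v)}
    return len(rounded)

coarse = unique_count([2.001, 2.002], precision=2)  # both round to 2.0
fine = unique_count([2.001, 2.002], precision=3)    # kept distinct
with_nan = unique_count([2.0, 3.0, math.nan])       # NaN is not counted
print(coarse, fine, with_nan)
```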

Returns:

a time-series of how many unique (excluding nan) values are currently in the interval

Examples: `unique`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 2, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 3}
unique(x, interval=3, min_window=2)

{'2020-01-02': 1, '2020-01-03': 2, '2020-01-04': 2, '2020-01-05': 1}

2. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
unique(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)

{'2020-01-03': 2, '2020-01-05': 1}

3. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
unique(x_np, interval=3, min_window=1)

{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,1]} # unique is per element

Sum

sum(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    precise: bool = False,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
precise: if True we use a more numerically stable implementation (Kahan) which is less efficient
ignore_na: if True, does not include any nan values in the window. If False, nan values are included and will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted sum (optional).
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
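The `precise` flag refers to Kahan (compensated) summation. The following is a generic textbook sketch of that algorithm in plain Python, shown only to illustrate why it is more stable than a naive running sum; it is not csp's implementation, and `kahan_sum` is a hypothetical name.

```python
import math

def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total

# Many tiny terms added to 1.0: each one is below half an ulp of 1.0
values = [1.0] + [1e-16] * 100_000

naive = 0.0
for v in values:
    naive += v  # every tiny term is rounded away

exact = math.fsum(values)        # correctly rounded reference
compensated = kahan_sum(values)
print(naive, compensated, exact)
```

The naive loop never moves off 1.0, while the compensated sum stays close to the correctly rounded result, at the cost of a few extra floating-point operations per tick.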

Returns:

a time-series of rolling sums over the interval

Examples: `sum`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
sum(x, interval=3)

{'2020-01-03': 6, '2020-01-04': 5, '2020-01-05': 8}

2. Including NaNs

sum(x, interval=3, min_window=2, ignore_na=False)

{'2020-01-02': 3, '2020-01-03': 6, '2020-01-04': nan, '2020-01-05': nan}

3. Weighted single input

weights = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-04': 3}
sum(x, interval=3, weights=weights)

{'2020-01-03': 11, '2020-01-04': 10, '2020-01-05': 21} # 21 = 5x3 + 3x2
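The weighted result above is consistent with each observation being paired with the most recent weight tick, so the weight of 2 from 2020-01-02 also applies on 2020-01-03. A plain-Python check under that assumption follows; `weighted_rolling_sum` is a hypothetical helper, not part of csp.

```python
import math

x = {'2020-01-01': 1.0, '2020-01-02': 2.0, '2020-01-03': 3.0,
     '2020-01-04': math.nan, '2020-01-05': 5.0}
weights = {'2020-01-01': 1.0, '2020-01-02': 2.0, '2020-01-04': 3.0}

def weighted_rolling_sum(x, weights, interval):
    """Tick-based weighted sum; NaN observations are skipped (ignore_na=True)."""
    pairs = []
    current_w = math.nan
    out = {}
    for t in sorted(x):
        current_w = weights.get(t, current_w)  # carry the last weight forward
        pairs.append((x[t], current_w))
        if len(pairs) >= interval:
            window = pairs[-interval:]
            out[t] = sum(v * w for v, w in window if not math.isnan(v))
    return out

result = weighted_rolling_sum(x, weights, interval=3)
print(result)
```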

4. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
sum(x_np, interval=3, min_window=1)

{'2020-01-01': [1,1], '2020-01-02': [3,1], '2020-01-03': [6,2]}

5. NumPy weighted sum

np_weights = {'2020-01-01': [1,2], '2020-01-02': [2,1]}
sum(x_np, interval=3, min_window=1, weights=np_weights)

{'2020-01-01': [1,2], '2020-01-02': [5,2], '2020-01-03': [11,3]} # weights applied elementwise

Product

prod(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, does not include any nan values in the window. If False, nan values are included and will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling products over the interval. The computation is unstable for large products and windows.

Examples: `prod`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
prod(x, interval=3, min_window=2, ignore_na=True)

{'2020-01-02': 2, '2020-01-03': 6, '2020-01-04': 6, '2020-01-05': 15}

2. NumPy

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
prod(x_np, 3, 2)

{'2020-01-02': [3,8], '2020-01-03': [15,48]}

First

first(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    ignore_na: bool = True
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
ignore_na: if True, will return the first non-nan value in the window. If False, will return the first value in the window

Returns:

a time-series of the earliest (non-nan) value still within the given interval

Examples: `first`

See last

Last

last(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, will return the last non-nan value in the window. If False, will return the last value in the window
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of the most recent value within the given interval

Examples: `first` and `last`

Starttime: 2020-01-01 00:00:00

1. Default - first

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
first(x, interval=3)

{'2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN - last

last(x, interval=3, ignore_na=False)

{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. Triggering - last

trigger = {'2020-01-03': True, '2020-01-04': True}
last(x, interval=timedelta(days=3), ignore_na=True, trigger=trigger)

{'2020-01-03': 3, '2020-01-04': 3}

4. NumPy - first

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
first(x_np, interval=2)

# first non-nan value
{'2020-01-02': [1,1], '2020-01-03': [2,3]}

Mean

mean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted mean (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling means over the interval. The mean is updated incrementally rather than from a running sum, so overflow is not an issue.
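One standard way to maintain a rolling mean without storing a sum is to nudge the current mean whenever a value enters or leaves the window. The sketch below uses that update rule on a tick-based window and skips NaN values; it is an assumption about the general technique, not a copy of csp's code, and `rolling_mean` is a hypothetical name.

```python
import math
from collections import deque

def rolling_mean(values, interval, min_window):
    """Tick-based rolling mean using incremental add/remove updates."""
    window = deque()
    mean, n = 0.0, 0  # n counts the non-NaN values currently in the window
    out = []
    for seen, v in enumerate(values, start=1):
        window.append(v)
        if not math.isnan(v):
            n += 1
            mean += (v - mean) / n          # value enters the window
        if len(window) > interval:
            old = window.popleft()
            if not math.isnan(old):
                n -= 1
                if n:
                    mean -= (old - mean) / n  # value leaves the window
                else:
                    mean = 0.0
        if seen >= min_window:
            out.append(mean if n else math.nan)
    return out

means = rolling_mean([1.0, 2.0, 3.0, math.nan, 5.0], interval=3, min_window=2)
print(means)
```

Up to floating-point rounding, the output agrees with the default-behavior example for `mean` (1.5, 2.0, 2.5, 4.0).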

Examples: `mean`

See gmean

Geometric Mean

gmean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling geometric means over the interval. Requires a strictly positive-valued input.
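The positivity requirement follows from the usual definition of the geometric mean as the exponential of the mean of logarithms. A minimal plain-Python sketch of that definition for a single window is shown here; `geometric_mean` is a hypothetical helper and says nothing about how csp computes the value internally.

```python
import math

def geometric_mean(window):
    """exp(mean(log(v))) over the non-NaN values of one window."""
    valid = [v for v in window if not math.isnan(v)]
    if not valid:
        return math.nan
    if any(v <= 0 for v in valid):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))

g1 = round(geometric_mean([1.0, 2.0, 3.0]), 3)        # cube root of 6
g2 = round(geometric_mean([3.0, math.nan, 5.0]), 3)   # square root of 15
print(g1, g2)
```

These two windows are the ones the `gmean` triggering example evaluates on 2020-01-03 and 2020-01-05.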

Examples: `mean` and `gmean`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
mean(x, interval=3, min_window=2)

{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 2.5, '2020-01-05': 4.0}

2. Including NaN

mean(x, interval=3, min_window=2, ignore_na=False)

{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan}

3. Geometric mean

trigger = {'2020-01-03': True, '2020-01-05': True}
gmean(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)

{'2020-01-03': 1.817, '2020-01-05': 3.873}

4. Weighted mean

weights = {'2020-01-01': 1, '2020-01-03': 2}
mean(x, interval=3, min_window=2, ignore_na=True, weights=weights)

{'2020-01-02': 1.5, '2020-01-03': 2.25, '2020-01-04': 2.667, '2020-01-05': 4.0}

Note: the first two observations get relative weight of 1, then the last three get relative weight of 2

5. NumPy weighted mean

x_np = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 2., 2.], '2020-01-03': [3., 3., 3.]}
np_weights = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 1., 2.], '2020-01-03': [3., 1., 3.]}
mean(x_np, 3, 2, weights=np_weights)

{'2020-01-02': [1.667, 1.5, 1.667], '2020-01-03': [2.333, 2.0, 2.333]}

Order Statistics

Maximum

max(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling maximums over the interval.

Examples: `max`

See min

Minimum

min(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling minimums over the interval.
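
The three sampler rules listed in the arguments can be sketched in plain Python. This is only an illustration of the documented semantics, not csp code: `apply_sampler` and its dict/list inputs are hypothetical names introduced here, and the real library operates on time-series edges rather than dictionaries.

```python
import math

def apply_sampler(x_ticks, sampler_times):
    """Return the series a statistic would effectively see (hypothetical helper).

    x_ticks: dict mapping time -> value, one entry per tick of x
    sampler_times: iterable of times at which the sampler ticks
    """
    sampler_times = set(sampler_times)
    effective = {}
    for t in sorted(set(x_ticks) | sampler_times):
        if t in x_ticks and t in sampler_times:
            effective[t] = x_ticks[t]   # x and sampler both tick: value is used
        elif t in sampler_times:
            effective[t] = math.nan     # sampler ticks alone: treated as NaN
        # x ticks alone: the tick is ignored
    return effective

x = {"2020-01-01": 1.0, "2020-01-02": 2.0, "2020-01-04": 4.0}
sampler = ["2020-01-01", "2020-01-03", "2020-01-04"]
effective = apply_sampler(x, sampler)
```

Here the tick on 2020-01-02 is dropped because the sampler did not tick, while 2020-01-03 becomes a NaN entry that is then handled according to ignore_na.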

Examples: `max` and `min`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
min(x, interval=3, min_window=2)

{'2020-01-02': 1, '2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN

max(x, interval=3, min_window=2, ignore_na=False)

{'2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. NumPy example

x_np = {'2020-01-01': [2,3], '2020-01-02': [6,1], '2020-01-03': [1,9]}
min(x_np, interval=timedelta(days=3), min_window=timedelta(days=1))

{'2020-01-02': [2,1], '2020-01-03': [1,1]}
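
As a cross-check of examples 1 and 2 above, the tick-count window logic can be reproduced in plain Python. This is a minimal sketch under the assumption that the window holds the last `interval` ticks and that output starts once `min_window` ticks have arrived; `rolling_extreme` is a hypothetical helper, not part of csp.

```python
import math

def rolling_extreme(values, interval, min_window, pick, ignore_na=True):
    """Apply `pick` (min or max) over the last `interval` ticks of `values`.

    Nothing is emitted until `min_window` ticks have been seen.
    """
    out = {}
    times = list(values)
    for i, t in enumerate(times):
        if i + 1 < min_window:
            continue
        window = [values[k] for k in times[max(0, i + 1 - interval): i + 1]]
        if not ignore_na and any(math.isnan(v) for v in window):
            out[t] = math.nan           # a NaN in the window poisons the result
            continue
        valid = [v for v in window if not math.isnan(v)]
        out[t] = pick(valid) if valid else math.nan
    return out

x = {"2020-01-01": 1, "2020-01-02": 2, "2020-01-03": 3,
     "2020-01-04": math.nan, "2020-01-05": 5}
rolling_min = rolling_extreme(x, 3, 2, min)
rolling_max = rolling_extreme(x, 3, 2, max, ignore_na=False)
```

`rolling_min` matches the output of example 1 and `rolling_max` matches example 2, with NaN on the last two dates.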

Median

median(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling medians over the interval. Uses midpoint interpolation if there are an even number of samples.

Examples: `median`

See quantile

Quantile

quantile(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    quant: Union[float, List[float]] = None,
    min_window: Union[timedelta, int] = None,
    interpolate: str = "linear",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → Union[ts[Union[float, np.ndarray]], [ts[Union[float, np.ndarray]]]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
quant: the quantile to calculate, which must be between 0 and 1
- If provided a list, then all quantiles will be calculated for the list.
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
interpolate: the interpolation method to use when the quantile does not correspond to an individual value. Must be one of the following options:
- "linear": interpolates linearly between the two closest values. For example, the 0.333 quantile of (1,2) with linear interpolation is 1.333.
- "lower": returns the lower of the two closest values.
- "higher": returns the higher of the two closest values.
- "midpoint": returns the midpoint between the two closest values. For example, the 0.333 quantile of (1,2) with midpoint interpolation is 1.5.
- "nearest": returns the value at the nearest position. For example, the 0.333 quantile of (1,2) with nearest interpolation is 1. In cases of ties, the higher value is returned.
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series or list-basket of time-series of rolling quantiles over the interval.
- If the quant parameter is a list then a list-basket will be returned.
- If it is a float then a time-series will be returned.
- The order of quantiles in the list-basket is equal to the order of the input.
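
The five interpolate options described in the arguments can be illustrated on a single sorted window. The sketch below is plain Python, not the csp implementation; `quantile_of` is a hypothetical name, and it assumes the quantile position is `quant * (n - 1)` in the sorted data, which agrees with the (1,2) examples given for each option.

```python
import math

def quantile_of(sorted_values, quant, interpolate="linear"):
    """Quantile of an already-sorted, NaN-free list (hypothetical helper)."""
    n = len(sorted_values)
    pos = quant * (n - 1)               # fractional index into the sorted data
    lo_idx = math.floor(pos)
    hi_idx = min(lo_idx + 1, n - 1)
    frac = pos - lo_idx
    lo, hi = sorted_values[lo_idx], sorted_values[hi_idx]
    if frac == 0:
        return lo                       # quantile lands exactly on a value
    if interpolate == "linear":
        return lo + frac * (hi - lo)
    if interpolate == "lower":
        return lo
    if interpolate == "higher":
        return hi
    if interpolate == "midpoint":
        return (lo + hi) / 2
    if interpolate == "nearest":
        return hi if frac >= 0.5 else lo   # ties go to the higher value
    raise ValueError(f"unknown interpolate option: {interpolate!r}")

linear = quantile_of([1, 2], 0.333)                 # approximately 1.333
midpoint = quantile_of([1, 2], 0.333, "midpoint")   # 1.5
nearest = quantile_of([1, 2], 0.333, "nearest")     # 1
```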

Examples: `median` and `quantile`

Starttime: 2020-01-01 00:00:00

1. Median

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
median(x, interval=3, min_window=2)

{'2020-01-02': 1.5, '2020-01-03': 2, '2020-01-04': 2.5, '2020-01-05': 4}

2. Quantile with multiple values

quantile(x, interval=3, quant=[0.25, 0.5, 0.75], min_window=2, ignore_na=False)

[
    {'2020-01-02': 1.25, '2020-01-03': 1.5, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.75, '2020-01-03': 2.5, '2020-01-04': nan, '2020-01-05': nan}
]

3. Quantile with trigger

trigger = {'2020-01-03': True, '2020-01-05': True}
quantile(x, interval=timedelta(days=3), quant=0.333, min_window=timedelta(days=2), interpolate="midpoint", ignore_na=True, trigger=trigger)

{'2020-01-03': 1.5, '2020-01-05': 4}

4. NumPy array with multiple quantiles

x_np = {'2020-01-01': [1,2,3], '2020-01-02': [2,3,4], '2020-01-03': [3,4,5]}
quantile(x_np, interval=3, quant=[0.25,0.5,0.75], min_window=1)

# this is a listbasket of NumPy array time series
[
    {'2020-01-01': [1,2,3], '2020-01-02': [1.25, 2.25, 3.25], '2020-01-03': [1.5, 2.5, 3.5]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.5, 2.5, 3.5], '2020-01-03': [2., 3., 4.]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.75, 2.75, 3.75], '2020-01-03': [2.5, 3.5, 4.5]}
]

Argmin

argmin(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid.

Returns:

a time-series of rolling argmin values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: `argmin`

See argmax

Argmax

argmax(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling argmax values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: `argmax` and `argmin`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 4}
argmax(x, 3)

{'2020-01-03': '2020-01-02', '2020-01-04': '2020-01-02', '2020-01-05': '2020-01-05'}

argmin(x, 3)

{'2020-01-03': '2020-01-03', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [2,1], '2020-01-03': [3,0]}
argmax(x_np, 3, 2)

{'2020-01-02': ['2020-01-02', '2020-01-01'], '2020-01-03': ['2020-01-03', '2020-01-01']}

argmin(x_np, 3, 2)

{'2020-01-02': ['2020-01-01', '2020-01-02'], '2020-01-03': ['2020-01-01', '2020-01-03']}

3. return_most_recent=False

argmin(x, 3, return_most_recent=False)

{'2020-01-03': '2020-01-01', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'} # Note how the first element is '2020-01-01', not '2020-01-03'
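
The tie-breaking behaviour in examples 1 and 3 can be reproduced with a short plain-Python sketch. `rolling_arg` is a hypothetical helper, not csp code. It passes min_window explicitly as 3 to match the documented outputs, and it skips empty windows instead of returning the default 1970 timestamp.

```python
import math

def rolling_arg(values, interval, min_window, pick, return_most_recent=True):
    """Time at which `pick` (min or max) of the last `interval` ticks occurred."""
    out = {}
    times = list(values)
    for i, t in enumerate(times):
        if i + 1 < min_window:
            continue
        # keep (time, value) pairs for the non-NaN ticks in the window
        window = [(k, values[k]) for k in times[max(0, i + 1 - interval): i + 1]
                  if not math.isnan(values[k])]
        if not window:
            continue                    # simplification: no default timestamp
        target = pick(v for _, v in window)
        hits = [k for k, v in window if v == target]
        out[t] = hits[-1] if return_most_recent else hits[0]
    return out

x = {"2020-01-01": 1, "2020-01-02": 2, "2020-01-03": 1,
     "2020-01-04": math.nan, "2020-01-05": 4}
arg_max = rolling_arg(x, 3, 3, max)
arg_min_recent = rolling_arg(x, 3, 3, min)
arg_min_oldest = rolling_arg(x, 3, 3, min, return_most_recent=False)
```

On 2020-01-03 the minimum value 1 occurs twice in the window, so the two argmin variants differ only on that date.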

Rank

rank(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    method: str = "min",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    na_option: str = "keep"
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
method: the method to use to rank groups of records that have the same value
- "min": the lowest rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1
- "max": the highest rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=2
- "avg": the average rank in the group is returned i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1.5
- By default, the "min" method is used.
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
na_option: how to rank a nan value when it is the last value to be ranked
- "keep": return a nan rank for a nan value
- "last": rank the last non-nan value present in the interval
- By default, the "keep" option is used.

Returns:

a time-series of rolling ranks over the interval, where a rank of 0 means that the current (last) ticked value is the smallest in the given interval.

Examples: `rank`

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': 5, '2020-01-05': 4}
rank(x, 5, min_window=3)

{'2020-01-03': 1, '2020-01-04': 3, '2020-01-05': 3}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,2], '2020-01-03': [2,1]}
rank(x_np, 3, 2)

# Note how the second element at '2020-01-02' is 0, not 1, as by default the "min" method is used
{'2020-01-02': [1, 0], '2020-01-03': [1, 0]}

3. "keep" vs "last" NaN option

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': nan, '2020-01-05': 4}
rank(x, 5, min_window=3, na_option="keep")

{'2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 3}

rank(x, 5, min_window=3, na_option="last")

# the last valid value, 2, is ranked at '2020-01-04', giving a rank of 1
{'2020-01-03': 1, '2020-01-04': 1, '2020-01-05': 3}
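
Examples 1 and 3 can be checked with a plain-Python sketch of the default "min" method, where the rank is the count of window values strictly smaller than the ranked value. `rolling_rank_min` is a hypothetical helper for illustration only; it does not cover the "max" or "avg" methods and is not the csp implementation.

```python
import math

def rolling_rank_min(values, interval, min_window, na_option="keep"):
    """0-based rank of the latest tick within the last `interval` ticks."""
    out = {}
    times = list(values)
    for i, t in enumerate(times):
        if i + 1 < min_window:
            continue
        window = [values[k] for k in times[max(0, i + 1 - interval): i + 1]]
        valid = [v for v in window if not math.isnan(v)]
        last = window[-1]
        if math.isnan(last):
            if na_option == "keep" or not valid:
                out[t] = math.nan       # "keep": a NaN tick gets a NaN rank
                continue
            last = valid[-1]            # "last": rank the last non-NaN value
        out[t] = sum(1 for v in valid if v < last)
    return out

x = {"2020-01-01": 1, "2020-01-02": 3, "2020-01-03": 2,
     "2020-01-04": math.nan, "2020-01-05": 4}
keep = rolling_rank_min(x, 5, 3, na_option="keep")
last = rolling_rank_min(x, 5, 3, na_option="last")
```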

Moment-Based Statistics

Variance

var(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ddof: delta degrees of freedom. Example: if ddof=1, then normalization term is 1/(N-1). If ddof=0, then 1/N.
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted variance (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling variances over the interval. If there are insufficient samples for the given ddof, no output is generated. Since the smart mean is being used, overflow is not a problem.

Examples: `var`

See Standard Error.

Standard Deviation

stddev(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ddof: delta degrees of freedom
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted standard deviation (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling standard deviations over the interval. If there are insufficient samples for the given ddof, no output is generated.

Examples: `stddev`

See Standard Error.

Standard Error

sem(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ddof: delta degrees of freedom
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted standard error (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling standard errors

Examples: Variance, Standard Deviation, Standard Error

Starttime: 2020-01-01 00:00:00

1. Variance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
var(x, interval=3, min_window=2)

{'2020-01-02': 0.5, '2020-01-03': 1.0, '2020-01-04': 0.5, '2020-01-05': 2.0}

2. Biased variance

var(x, interval=3, min_window=2, ddof=0, ignore_na=True) # biased

{'2020-01-02': 0.25, '2020-01-03': 0.666, '2020-01-04': 0.25, '2020-01-05': 1.0}

3. Standard deviation including NaNs

stddev(x, interval=3, min_window=2, ignore_na=False)

{'2020-01-02': 0.707, '2020-01-03': 1.0, '2020-01-04': nan, '2020-01-05': nan}

4. Standard error with triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
sem(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)

{'2020-01-03': 0.707, '2020-01-05': 1.0}
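
The variance and standard deviation examples (1 to 3) above can be verified with a plain-Python sketch of the ddof normalization, in which the sum of squared deviations is divided by N - ddof. `rolling_var` is a hypothetical helper that ignores weights, sampler, reset and recalc; it is not how csp computes the statistic internally.

```python
import math

def rolling_var(values, interval, min_window, ddof=1, ignore_na=True):
    """Variance of the last `interval` ticks, normalized by (N - ddof)."""
    out = {}
    times = list(values)
    for i, t in enumerate(times):
        if i + 1 < min_window:
            continue
        window = [values[k] for k in times[max(0, i + 1 - interval): i + 1]]
        if not ignore_na and any(math.isnan(v) for v in window):
            out[t] = math.nan
            continue
        valid = [v for v in window if not math.isnan(v)]
        n = len(valid)
        if n - ddof <= 0:
            continue                    # insufficient samples: no output
        mean = sum(valid) / n
        out[t] = sum((v - mean) ** 2 for v in valid) / (n - ddof)
    return out

x = {"2020-01-01": 1, "2020-01-02": 2, "2020-01-03": 3,
     "2020-01-04": math.nan, "2020-01-05": 5}
unbiased = rolling_var(x, 3, 2)             # example 1
biased = rolling_var(x, 3, 2, ddof=0)       # example 2
std_with_nan = {t: math.sqrt(v)             # example 3
                for t, v in rolling_var(x, 3, 2, ignore_na=False).items()}
```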

Covariance

cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: time-series data. If x is of type np.ndarray, then the covariance calculation is performed element-wise with the corresponding values in y.
y: time-series data that ticks in sequence with x
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ddof: delta degrees of freedom
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted covariance (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling covariances between x and y

Examples: `cov`

See Correlation.

Correlation

corr(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: time-series data. If x is of type np.ndarray, then the correlation calculation is performed element-wise with the corresponding values in y.
y: time-series data that ticks in sequence with x
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted correlation (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling Pearson correlation coefficients between x and y

Examples: Covariance and Correlation

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
y = {'2020-01-01': 5, '2020-01-02': 4, '2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 1}
cov(x, y, interval=3, min_window=2)

{'2020-01-02': -0.5, '2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}

2. Correlation

corr(x, y, interval=3)

{'2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}
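
As a cross-check on the outputs above, the rolling values can be recomputed by brute force in plain Python. This is only a sketch, not how csp computes them: `rolling_cov_corr` is a hypothetical helper, and it assumes the sample (n - 1) covariance divisor, windows of at least two points, and that an omitted min_window behaves like min_window=interval.

```python
from math import sqrt

def rolling_cov_corr(xs, ys, interval, min_window):
    """Brute-force rolling covariance and Pearson correlation over tick windows.

    Returns two lists aligned with the input; entries are None until
    min_window ticks are available.
    """
    covs, corrs = [], []
    for end in range(1, len(xs) + 1):
        wx = xs[max(0, end - interval):end]
        wy = ys[max(0, end - interval):end]
        n = len(wx)
        if n < min_window:
            covs.append(None)
            corrs.append(None)
            continue
        mx, my = sum(wx) / n, sum(wy) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
        sxx = sum((a - mx) ** 2 for a in wx)
        syy = sum((b - my) ** 2 for b in wy)
        covs.append(sxy / (n - 1))            # sample covariance
        corrs.append(sxy / sqrt(sxx * syy))   # Pearson correlation
    return covs, corrs

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
covs, _ = rolling_cov_corr(x, y, interval=3, min_window=2)
_, corrs = rolling_cov_corr(x, y, interval=3, min_window=3)
print(covs)
print(corrs)
```

The first list lines up with the `cov` output (no value on the first day), the second with the `corr` output (no value on the first two days).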

Skewness

skew(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
bias: if True, calculates a biased (unadjusted) skew. If false (default), calculates a Gaussian-unbiased measure.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted skew (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling sample skew measures, using the adjusted Fisher–Pearson standardized moment coefficient.
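
The three sampler rules listed in the arguments can be sketched in plain Python. This is an illustration of the documented behavior only, not csp code: `apply_sampler` is a hypothetical helper, and ticks are modeled as integer times.

```python
import math

def apply_sampler(x_ticks, sampler_ticks):
    """Return the values that would enter the window, keyed by time.

    x_ticks: dict mapping time -> value; sampler_ticks: set of times.
    """
    sampled = {}
    for t in sorted(set(x_ticks) | set(sampler_ticks)):
        if t in sampler_ticks:
            # x and sampler tick together: valid value.
            # sampler ticks alone: the missing x tick is treated as NaN.
            sampled[t] = x_ticks.get(t, math.nan)
        # x ticks alone: the value is ignored entirely.
    return sampled

x_ticks = {1: 10.0, 2: 20.0, 4: 40.0}
sampler_ticks = {1, 3, 4}
sampled = apply_sampler(x_ticks, sampler_ticks)
print(sampled)
```

Here the tick at time 2 is dropped, and time 3 contributes a NaN that is then handled according to ignore_na.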

Examples: `skew`

See Kurtosis.

Kurtosis

kurt(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    excess: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
excess: if True (default) uses the definition of excess kurtosis (kurt - 3). If false, uses the standard definition.
bias: if True, calculates a biased (unadjusted) kurtosis. If false (default), calculates a Gaussian-unbiased measure.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted kurtosis (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of rolling sample kurtosis measures, using the adjusted Fisher–Pearson standardized moment coefficient.

Examples: `skew` and `kurt`

Starttime: 2020-01-01 00:00:00

1. Skew

x = {'2020-01-01': 1, '2020-01-02': 2, ..., '2020-01-10': 10}
skew(x, interval=7)

{'2020-01-07': 0, '2020-01-08': 0, '2020-01-09': 0, '2020-01-10': 0}

2. Kurtosis

kurt(x, interval=7) # excess kurtosis

{'2020-01-07': -1.2, '2020-01-08': -1.2, '2020-01-09': -1.2, '2020-01-10': -1.2}
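
The values above can be reproduced with the standard bias-adjusted sample estimators. The sketch below is plain Python rather than csp code; `sample_skew` and `sample_excess_kurtosis` are hypothetical helpers, and it assumes that bias=False corresponds to these textbook adjustments, which do match the documented outputs for this data.

```python
from math import sqrt

def central_moment(window, order):
    """Population central moment of the given order."""
    mean = sum(window) / len(window)
    return sum((v - mean) ** order for v in window) / len(window)

def sample_skew(window):
    """Bias-adjusted skewness (needs at least 3 points)."""
    n = len(window)
    g1 = central_moment(window, 3) / central_moment(window, 2) ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(window):
    """Bias-adjusted excess kurtosis (needs at least 4 points)."""
    n = len(window)
    g2 = central_moment(window, 4) / central_moment(window, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

x = list(range(1, 11))
skews = [sample_skew(x[end - 7:end]) for end in range(7, 11)]
kurts = [sample_excess_kurtosis(x[end - 7:end]) for end in range(7, 11)]
print(skews)
print([round(k, 6) for k in kurts])
```

Every 7-tick window of this series is evenly spaced, so the skew is zero and the excess kurtosis is the same in each window.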

Exponential Moving Statistics

Exponential Moving Average

ema(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[timedelta] = None,
    adjust: bool = True,
    horizon: int = None,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
alpha: the EMA weight parameter specified directly. If adjust = True, EMA is calculated such that

$$EMA(t) = \frac{\sum\limits_{i=0}^{n} (1-\alpha)^{i} x(t-i)}{\sum\limits_{i=0}^{n} (1-\alpha)^{i}}$$

If adjust = False, EMA is calculated such that

$$EMA(t) = (1-\alpha)EMA(t-1) + \alpha x(t)$$ $$EMA(t=0) = x(0)$$

By default, adjust = True, to give better estimates for starting intervals.

The following are alternative methods to specify the $\alpha$ parameter.
- span: specify alpha in terms of span, such that
  
  $$\alpha = \frac{2}{span+1}$$
- com: specify alpha in terms of centre of mass, such that
  
  $$\alpha = \frac{1}{1+com}$$
- halflife: Halflife is different from the other parameters. Half-life is a timedelta argument that specifies the half-life of observation weights. Half-life is useful when observations are irregularly spaced and a better estimate is needed to properly weight more recent data. Let $t_{-1}$ be the time of the last observation.
  
  Then:
  
  $$\lambda(t) = 1 - \exp(\frac{-(t-t_{-1})*\ln(2)}{halflife})$$ $$EMA(t) = \frac{ \lambda(t)*EMA(t-1) + x(t)}{\text{normalization constant}}$$
  
  Note that the ignore_na flag has no effect when a halflife is specified. The behavior is the same in both cases, since an absolute time interval, not a tick count, is used to re-weight the moving average.
  
  Exactly one of alpha, span, com, halflife must be given
adjust: if True, early observations are adjusted to give a more "smoothed" estimate of the EMA. The difference is that if adjust=True, then each new observation receives a relative weight of 1. If adjust = False, each new observation receives a relative weight of alpha.
- adjust=True means that:
$$EMA(t) = \frac{x(t)+(1-\alpha)x(t-1)+(1-\alpha)^2 x(t-2) + ... + (1-\alpha)^n x(t-n)}{1+(1-\alpha)+(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$
- adjust=False means that:
$$EMA(t) = \frac{\alpha * x(t) + \alpha * (1-\alpha) * x(t-1) + \alpha * (1-\alpha)^2 * x(t-2) + ... + \boldsymbol{(1-\alpha)^n x(0)}}{\alpha+\alpha*(1-\alpha)+\alpha*(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$

$$\text{and thus } EMA(t=0) = x(0)$$

adjust only applies to tick-specified decay (alpha, span, com), not to time-specified decay. Time-specified decay (i.e. half-life) does not need adjustment because it is, by definition, already adjusted.
horizon: the maximum number of ticks to use in the computation. For example, if horizon = 10, then only the 10 most recent data points are used. If not specified, all data points for x are used, with early ticks decaying exponentially in weighting. Horizon will be ignored with a half-life (time-based) interval.
- If horizon is set to h, then even if x has more than h ticks, the EMA is computed from only the h most recent ticks. With adjust=True:
$$EMA(t) = \frac{\sum_{i=0}^{h-1} (1-\alpha)^{i} x(t-i)}{\sum_{i=0}^{h-1} (1-\alpha)^{i}}$$
- The only difference if adjust=False is that the first ever tick, while in the window, receives weight 1 at the start instead of weight $\alpha$ like the rest of the values.
ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position, and renormalized as such.
- For example, let us consider a dataset (1,nan,2) using adjust=True.
  - If ignore_na=True then the weighting is based on relative position as such: $$EMA(t=2) = \frac{(1-\alpha)*1 + 2}{(1-\alpha)+1}$$
  - If ignore_na=False then the weighting is based on global position as such: $$EMA(t=2) = \frac{(1-\alpha)^2*1 + 2}{(1-\alpha)^2+1}$$
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation.
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of exponentially-weighted moving averages over the interval.
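
The tick-based parameterizations described in the arguments all reduce to a single alpha. A minimal sketch of that mapping follows; `resolve_alpha` is a hypothetical helper, not part of csp, and halflife is left out because it is time-based rather than tick-based.

```python
def resolve_alpha(alpha=None, span=None, com=None):
    """Convert one tick-based decay parameter into alpha."""
    given = [p for p in (alpha, span, com) if p is not None]
    if len(given) != 1:
        raise ValueError("exactly one of alpha, span, com must be given")
    if span is not None:
        return 2.0 / (span + 1.0)   # alpha = 2 / (span + 1)
    if com is not None:
        return 1.0 / (1.0 + com)    # alpha = 1 / (1 + com)
    return float(alpha)

print(resolve_alpha(span=19))  # 0.1
print(resolve_alpha(com=9))    # 0.1
```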

Examples: `ema`

Starttime: 2020-01-01 00:00:00

1. Unadjusted EMA

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
ema(x, alpha=0.1, adjust=False) # unadjusted

{'2020-01-01': 1.0, '2020-01-02': 1.1, '2020-01-03': 1.29, '2020-01-04': 1.561, '2020-01-05': 1.9049}
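
The unadjusted recursion from the alpha description can be sketched in a few lines of plain Python to reproduce the output above. `ema_unadjusted` is a hypothetical helper, and NaN handling is deliberately left out.

```python
def ema_unadjusted(values, alpha):
    """EMA(t) = (1 - alpha) * EMA(t-1) + alpha * x(t), seeded with x(0)."""
    out = []
    ema = None
    for v in values:
        ema = v if ema is None else (1 - alpha) * ema + alpha * v
        out.append(ema)
    return out

unadjusted = ema_unadjusted([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.1)
print([round(v, 4) for v in unadjusted])  # [1.0, 1.1, 1.29, 1.561, 1.9049]
```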

2. Adjusted EMA

ema(x, alpha=0.1, adjust=True)  # adjusted, default method

{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.0701, '2020-01-04': 2.6313, '2020-01-05': 3.20971}
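
The adjusted form can be computed incrementally by tracking the weighted sum and the weight total separately. The sketch below is plain Python with a hypothetical `ema_adjusted` helper. It also illustrates the ignore_na rule for the (1, nan, 2) case from the arguments, under the assumption that a NaN tick repeats the previous output.

```python
import math

def ema_adjusted(values, alpha, ignore_na=False):
    """Adjusted EMA: the newest point has weight 1, older points decay by (1 - alpha)."""
    out = []
    num = den = 0.0
    seen = False
    for v in values:
        if math.isnan(v):
            if seen and not ignore_na:
                # Global position: existing weights keep decaying across the gap.
                num *= 1 - alpha
                den *= 1 - alpha
            out.append(num / den if seen else math.nan)
            continue
        num = (1 - alpha) * num + v
        den = (1 - alpha) * den + 1.0
        seen = True
        out.append(num / den)
    return out

adjusted = ema_adjusted([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.1)
relative = ema_adjusted([1.0, math.nan, 2.0], alpha=0.1, ignore_na=True)
absolute = ema_adjusted([1.0, math.nan, 2.0], alpha=0.1, ignore_na=False)
print([round(v, 4) for v in adjusted])
print(round(relative[-1], 6), round(absolute[-1], 6))
```

With ignore_na=True the last value is (0.9 * 1 + 2) / 1.9, and with ignore_na=False it is (0.81 * 1 + 2) / 1.81, matching the two formulas given for ignore_na.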

3. Finite horizon EMA

ema(x, alpha=0.1, adjust=True, horizon=2) # finite horizon

{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.5263, '2020-01-04': 3.5263, '2020-01-05': 4.5263}
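
A finite horizon simply truncates the weighted sum to the most recent ticks. The brute-force sketch below (plain Python, hypothetical `ema_adjusted_horizon` helper, no NaN handling) reproduces the output above by treating horizon as the number of most recent ticks kept.

```python
def ema_adjusted_horizon(values, alpha, horizon):
    """Adjusted EMA over only the `horizon` most recent ticks."""
    out = []
    for end in range(1, len(values) + 1):
        window = values[max(0, end - horizon):end]
        # The oldest value in the window has the largest age and smallest weight.
        weights = [(1 - alpha) ** age for age in range(len(window) - 1, -1, -1)]
        out.append(sum(w * v for w, v in zip(weights, window)) / sum(weights))
    return out

finite = ema_adjusted_horizon([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.1, horizon=2)
print([round(v, 4) for v in finite])  # [1.0, 1.5263, 2.5263, 3.5263, 4.5263]
```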

4. Time-based decay EMA

ema(x, halflife=timedelta(days=1)) # time-based

{'2020-01-01': 1.0, '2020-01-02': 1.6666, '2020-01-03': 2.4286, '2020-01-04': 3.2666, '2020-01-05': 4.1613}
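
For the daily data above, a time-based decay can be sketched as follows. The helper `ema_halflife` is hypothetical, and the decay factor `0.5 ** (elapsed / halflife)` is an assumption about how elapsed time maps to weight. With daily ticks and a one-day half-life it equals 0.5 at every step, which reproduces the documented values up to rounding.

```python
from datetime import datetime, timedelta

def ema_halflife(ticks, halflife):
    """Time-decayed EMA over (datetime, value) pairs given in time order."""
    out = []
    num = den = 0.0
    last_time = None
    for when, value in ticks:
        if last_time is not None:
            # timedelta / timedelta yields a float number of half-lives.
            decay = 0.5 ** ((when - last_time) / halflife)
            num *= decay
            den *= decay
        num += value
        den += 1.0
        last_time = when
        out.append((when, num / den))
    return out

start = datetime(2020, 1, 1)
ticks = [(start + timedelta(days=i), float(i + 1)) for i in range(5)]
halflife_ema = ema_halflife(ticks, halflife=timedelta(days=1))
print([round(v, 4) for _, v in halflife_ema])
```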

5. Unadjusted EMA for NumPy array

x_np = {'2020-01-01': [1,2], '2020-01-02': [4,5], '2020-01-03': [7,8]}
ema(x_np, alpha=0.1, adjust=False)

{'2020-01-01': [1,2], '2020-01-02': [1.3,2.3], '2020-01-03': [1.87,2.87] }

Exponential Moving Variance

ema_var(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
alpha, span, com, halflife: as described in EMA
adjust: as specified in EMA
horizon: as specified in EMA.
bias: if True, uses a biased population weighted variance. If false, normalized by a proper debiasing factor.
ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of exponentially-weighted moving variances over the interval.

Examples: `ema_var`

See Exponential Moving Standard Deviation

Exponential Moving Standard Deviation

ema_std(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0,
) → ts[Union[float, np.ndarray]]

Args:

x: the time-series data
min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
alpha, span, com, halflife: as described in EMA
adjust: as specified in EMA
horizon: as specified in EMA.
bias: if True, uses a biased population weighted variance. If False, the result is normalized by a debiasing factor.
ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of exponentially-weighted moving standard deviations over the interval.

Examples: Exp. Moving Variance and Standard Deviation

Starttime: 2020-01-01 00:00:00

1. Exp. Moving Standard Deviation

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
ema_std(x, min_periods=2, span=20, adjust=False, bias=False, ignore_na=False)

{'2020-01-02': 0.707, '2020-01-03': 1.11636, '2020-01-04': 1.11636, '2020-01-05': 1.937005}

2. Exp. Moving Variance

ema_var(x, min_periods=2, span=20, adjust=False, bias=True, ignore_na=False)

{'2020-01-02': 0.086168, '2020-01-03': 0.390588, '2020-01-04': 0.390588, '2020-01-05': 1.644124}
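
Both outputs above can be reproduced from explicit weights. The sketch below is plain Python, not csp code, with a hypothetical `ema_var_unadjusted` helper. It assumes adjust=False weighting (first point weight 1, later points weight alpha), global-position decay across the NaN tick (ignore_na=False), and the usual reliability-weights debiasing factor.

```python
import math

def ema_var_unadjusted(values, alpha, bias):
    """Exponentially weighted variance with adjust=False and ignore_na=False."""
    out = []
    weights, points = [], []
    for v in values:
        # Every tick, NaN or not, ages the existing weights by one step.
        weights = [w * (1 - alpha) for w in weights]
        if not math.isnan(v):
            weights.append(alpha if points else 1.0)
            points.append(v)
        if len(points) < 2:          # plays the role of min_periods=2
            out.append(math.nan)
            continue
        total = sum(weights)
        mean = sum(w * p for w, p in zip(weights, points)) / total
        var = sum(w * (p - mean) ** 2 for w, p in zip(weights, points)) / total
        if not bias:
            var *= total ** 2 / (total ** 2 - sum(w * w for w in weights))
        out.append(var)
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
alpha = 2.0 / (20 + 1)               # span=20
biased_var = ema_var_unadjusted(x, alpha, bias=True)
unbiased_std = [math.sqrt(v) for v in ema_var_unadjusted(x, alpha, bias=False)]
print([round(v, 6) for v in biased_var[1:]])
print([round(v, 5) for v in unbiased_std[1:]])
```

On the NaN tick all weights shrink by the same factor, so the variance is unchanged from the previous day, which is why the 2020-01-04 values repeat.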

Exponential Moving Covariance

ema_cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

x: time-series data. If x is of type np.ndarray, the exponential-moving covariance is calculated element-wise with the corresponding values in y.
y: time-series data which ticks in-sequence with x
min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
alpha, span, com, halflife: as described in EMA
adjust: as specified in EMA
horizon: as specified in EMA.
bias: if True, uses a biased population weighted covariance. If False, the result is normalized by a debiasing factor.
ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of exponentially-weighted moving covariance over the interval.

NumPy Specific Statistics

Covariance Matrix

cov_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

x: the time-series of dimension (N,) arrays which represent N variables
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ddof: delta degrees of freedom
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted covariance matrix (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of (potentially weighted) covariance matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Correlation Matrix

corr_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

x: the time-series of dimension (N,) arrays which represent N variables
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
weights: a time-series of weights for each observation in x, used to calculate a weighted correlation matrix (optional). Weights do not need to be normalized.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

a time-series of (potentially weighted) correlation matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Examples: Covariance and Correlation Matrices

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': np.array([0., 0., 0.]), '2020-01-02': np.array([1., -1., 2.]), '2020-01-03': np.array([2., -2., 4.])}
cov_matrix(x, 3, ddof=1)

{'2020-01-03': np.array([[1, -1, 2],
                         [-1, 1, -2],
                         [2, -2, 4]])}

2. Correlation

corr_matrix(x, 3)

{'2020-01-03': np.array([[1, -1, 1],
                         [-1, 1, -1],
                         [1, -1, 1]])}
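
The two matrices can be checked by hand with a short plain-Python sketch. `cov_matrix_rows` and `corr_matrix_rows` are hypothetical helpers operating on a list of rows, one row per tick, rather than on a csp time series. Note that the n - 1 divisor (ddof=1) reproduces the covariance matrix shown above, whereas dividing by n (ddof=0) would give 2/3 for the first diagonal entry.

```python
from math import sqrt

def cov_matrix_rows(rows, ddof=1):
    """Covariance matrix of the columns of `rows` (one row per observation)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - ddof)
         for j in range(k)]
        for i in range(k)
    ]

def corr_matrix_rows(rows):
    """Correlation matrix: covariance scaled by the column standard deviations."""
    cov = cov_matrix_rows(rows)
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

rows = [[0.0, 0.0, 0.0], [1.0, -1.0, 2.0], [2.0, -2.0, 4.0]]
cov = cov_matrix_rows(rows, ddof=1)
corr = corr_matrix_rows(rows)
print(cov)
print(corr)
```

The three columns are exact multiples of one another, which is why every correlation is either 1 or -1.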

NumPy Conversions

list_to_numpy(x: [ts[float]], fillna: bool = False) → ts[np.ndarray]

Args:

x: a listbasket of time series
fillna: If False, unticked elements are treated as NaN. If True, unticked elements will hold their previous value in the array.

Returns:

a NumPy 1D array where each value corresponds to the element of the listbasket with the same index

numpy_to_list(x: ts[np.ndarray], n: int) → [ts[float]]

Args:

x: a NumPy array valued time series
n: the number of output channels in the listbasket

Returns:

a listbasket where each value corresponds to the element of the array with the same index

Examples: NumPy Conversions

Starttime: 2020-01-01 00:00:00

1. List to NumPy

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
list_to_numpy([x1,x2], fillna=False)

{'2020-01-01': [1, 1.5], '2020-01-02': [2, np.nan], '2020-01-03': [3, 3.5]} # no x2 tick on day 2

list_to_numpy([x1,x2], fillna=True)

{'2020-01-01': [1, 1.5], '2020-01-02': [2, 1.5], '2020-01-03': [3, 3.5]} # holds x2 value for day 2

2. NumPy to list

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
numpy_to_list(x_np, 2)

[
    {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 5},
    {'2020-01-01': 2, '2020-01-02': 4, '2020-01-03': 6}
]
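
The fillna behavior and the round trip back to a list can be mimicked on plain dictionaries. The sketch below is not csp code: `list_to_rows` and `rows_to_list` are hypothetical helpers, and each series is modeled as a dict from date string to value.

```python
import math

def list_to_rows(series_list, fillna=False):
    """Combine several {time: value} series into {time: [v0, v1, ...]}."""
    times = sorted(set().union(*series_list))
    last = [math.nan] * len(series_list)
    rows = {}
    for t in times:
        row = []
        for i, series in enumerate(series_list):
            if t in series:
                last[i] = series[t]
                row.append(series[t])
            else:
                # Unticked element: hold the previous value or emit NaN.
                row.append(last[i] if fillna else math.nan)
        rows[t] = row
    return rows

def rows_to_list(rows, n):
    """Split {time: [v0, ..., v(n-1)]} back into n separate series."""
    return [{t: row[i] for t, row in rows.items()} for i in range(n)]

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
held = list_to_rows([x1, x2], fillna=True)
gaps = list_to_rows([x1, x2], fillna=False)
back = rows_to_list(held, 2)
print(held)
print(gaps)
```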

Cross-Sectional Statistics

Cross Sectional

cross_sectional(
    x: ts[Union[float,np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    as_numpy: bool = False,
    sampler: ts[object] = None,
    reset: ts[object] = None
) → ts[Union[np.ndarray, List[float], List[np.ndarray]]]

Args:

x: the time-series data
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
as_numpy: if True, the data will be returned as a NumPy array instead of a list.
- For a single-valued time series, this is a one-dimensional NumPy array
- For a NumPy array time series, this is a NumPy array of one extra dimension
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation

Returns:

a time-series where each tick contains all the data of x currently within the interval. Use this for custom cross-sectional calculations
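
To illustrate what "currently within the interval" means for a timedelta interval, here is a small pure-Python sketch. It assumes the window at time `now` is the half-open range (now - interval, now], following the "non-inclusive at left endpoint" note above. The helper `time_window` is a hypothetical name and is not part of csp.

```python
from datetime import datetime, timedelta

def time_window(ticks, now, interval):
    """Return the values whose timestamps fall in (now - interval, now]."""
    left = now - interval
    # strict inequality on the left: a tick exactly at the left endpoint is excluded
    return [value for t, value in ticks if left < t <= now]

start = datetime(2020, 1, 1)
# ticks at 0s, 30s, 60s and 90s after the start time; the value is the offset in seconds
ticks = [(start + timedelta(seconds=s), s) for s in (0, 30, 60, 90)]

window = time_window(
    ticks,
    now=start + timedelta(seconds=60),
    interval=timedelta(seconds=60),
)
```

With a 60s interval evaluated at 60s, the tick at 0s sits exactly on the left endpoint and is left out, so `window` holds the values from 30s and 60s only.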

Examples: Cross-sectional calculations

Starttime: 2020-01-01 00:00:00

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
cs = cross_sectional(x, interval=3, min_window=2)
cs

{'2020-01-02': [1,2], '2020-01-03': [1,2,3], '2020-01-04': [2,3,4], '2020-01-05': [3,4,5]}

Calculate a cross-sectional mean

cs_mean = csp.apply(cs, lambda v: sum(v)/len(v), float)
cs_mean

{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 3.0, '2020-01-05': 4.0}

Get the results as a NumPy array

cs = cross_sectional(x, interval=3, min_window=2, as_numpy=True)
cs

{'2020-01-02': np.array([1,2]), '2020-01-03': np.array([1,2,3]), '2020-01-04': np.array([2,3,4]), '2020-01-05': np.array([3,4,5])}
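
For readers who want to check the outputs above by hand, the tick-count behaviour can be emulated in plain Python. This is only a sketch of the documented semantics (interval=3, min_window=2), not the csp implementation, and `rolling_cross_section` is a hypothetical helper name.

```python
def rolling_cross_section(values, interval, min_window):
    """Emulate a tick-count rolling window over a list of ticked values.

    interval: maximum number of most recent ticks kept in the window.
    min_window: number of ticks required before any output is produced.
    """
    out = []
    for i in range(len(values)):
        seen = i + 1                      # ticks received so far
        if seen < min_window:
            continue                      # not enough data yet: no output
        # keep at most the last `interval` ticks
        out.append(values[max(0, seen - interval):seen])
    return out

windows = rolling_cross_section([1, 2, 3, 4, 5], interval=3, min_window=2)
# the same cross-sectional mean as the lambda passed to csp.apply above
means = [sum(w) / len(w) for w in windows]
```

The first tick produces no output because min_window is 2; after that each window holds up to the last three values, which reproduces the lists and means shown in the examples.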
