Statistical Nodes API - Point72/csp GitHub Wiki
This page contains the documentation for the csp.stats library. The stats library contains functions to calculate statistics on time series data over rolling windows.
Base Statistics:
- count: counts the number of data ticks within a given interval
- unique: counts the number of unique values within a given interval
- sum: rolling sum of values within a given interval
- prod: rolling product of values within a given interval
- first: the earliest value still within the interval
- last: the last value of the interval
- mean: the mean of values within the interval
- gmean: the geometric mean of values within the interval
Order Statistics:
- max: the maximum value within the interval
- min: the minimum value within the interval
- median: the median value within the interval
- quantile: the quantile value within the interval
- argmin: the time at which the minimum interval value ticked
- argmax: the time at which the maximum interval value ticked
- rank: the time series rank of the most recent tick in the interval
Moment-Based Statistics:
- var: variance of the time series within the interval
- stddev: standard deviation within the interval
- sem: standard error within the interval
- cov: covariance between two in-sequence time series within the interval
- corr: correlation between two in-sequence time series within the interval
- skew: skewness of the time series within the interval
- kurt: kurtosis (or excess kurtosis) of the time series within the interval
Exponential Moving Statistics:
- ema: exponential moving average, with numerous different variations available
- ema_var: exponential moving variance
- ema_std: exponential moving standard deviation
- ema_cov: exponential moving covariance between two in-sequence time series
NumPy Specific Statistics:
- cov_matrix: covariance matrix between N time-series (in a NumPy array) over a rolling time interval
- corr_matrix: normalized correlation matrix between N time-series (in a NumPy array) over a rolling time interval
- list_to_numpy: converts a listbasket of time-series into a NumPy array
- numpy_to_list: converts a NumPy array time-series into a listbasket
Cross-Sectional Statistics:
- cross_sectional: receive all data within the current window for a cross-sectional calculation
count(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data.
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
-
ignore_na: if True, ignores NaN values in the window (does not count them). If False, NaN values make the count NaN.
- By default, ignore_na is True
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
-
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- By default, there is no reset series.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- A time-series of how many data points are currently in the interval. If a tick count is used, then it is necessarily less than or equal to the interval.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
count(x, interval=3)
# NaN is not counted
{'2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 2}
2. Including NaN
count(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}
3. Triggering
trigger = {'2020-01-03': True, '2020-01-05': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-05': 2}
4. Sampling
sampler = {'2020-01-01': True, '2020-01-02': True, '2020-01-03': True, '2020-01-05': True, '2020-01-06': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), sampler=sampler)
{'2020-01-03': 3, '2020-01-05': 2}
Note: the x value at 2020-01-04 is ignored completely since sampler does not tick, while the value at 2020-01-06 is treated as NaN.
5. Reset
reset = {'2020-01-04': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), reset=reset)
{'2020-01-03': 3, '2020-01-04': 0, '2020-01-05': 1}
Note: the window data is reset at 2020-01-04, and its value is NaN, so the count is 0
6. NumPy
x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
count(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,2]} # count is per element
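The tick-count behavior in examples 1 and 2 can be sketched in plain Python. This is a minimal illustration of the documented semantics, not the csp implementation; the helper name rolling_count and the list-of-pairs input format are assumptions made for the example.

```python
import math
from collections import deque


def rolling_count(values, interval, min_window=None, ignore_na=True):
    """Tick-based rolling count over (timestamp, value) pairs (hypothetical helper)."""
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)  # keeps only the last `interval` ticks
    out = {}
    for ts, v in values:
        window.append(v)
        if len(window) < min_window:
            continue  # not enough ticks yet, so nothing is output
        n_nan = sum(1 for w in window if math.isnan(w))
        if n_nan and not ignore_na:
            out[ts] = math.nan  # a NaN in the window makes the count NaN
        else:
            out[ts] = len(window) - n_nan  # NaN values are simply not counted
    return out


x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0),
     ("2020-01-04", math.nan), ("2020-01-05", 5.0)]
result = rolling_count(x, interval=3)
strict = rolling_count(x, interval=3, ignore_na=False)
```

Here result matches the output of example 1, and strict matches example 2.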
unique(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0,
precision: int = 10
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
-
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- By default, there is no reset series.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
-
precision: the decimal place precision at which two floats are considered non-unique. For example, if precision=2, then 2.001 and 2.002 would be considered non-unique.
- By default, precision is set to 10 decimal places.
Returns:
- a time-series of how many unique (excluding nan) values are currently in the interval
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 2, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 3}
unique(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 2, '2020-01-04': 2, '2020-01-05': 1}
2. Triggering
trigger = {'2020-01-03': True, '2020-01-05': True}
unique(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 2, '2020-01-05': 1}
3. NumPy
x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
unique(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,1]} # unique is per element
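The effect of the precision argument can be illustrated with a small pure-Python sketch. It assumes that values are compared after rounding to precision decimal places; whether csp rounds or truncates is not stated above, and the helper name rolling_unique is hypothetical.

```python
import math
from collections import deque


def rolling_unique(values, interval, min_window=None, precision=10):
    """Tick-based rolling count of distinct non-NaN values (hypothetical helper)."""
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)
    out = {}
    for ts, v in values:
        window.append(v)
        if len(window) < min_window:
            continue
        # Assumption: two floats are "the same" if they agree after rounding.
        distinct = {round(w, precision) for w in window if not math.isnan(w)}
        out[ts] = len(distinct)
    return out


x = [("2020-01-01", 2.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0),
     ("2020-01-04", math.nan), ("2020-01-05", 3.0)]
result = rolling_unique(x, interval=3, min_window=2)

close = [("a", 2.001), ("b", 2.002)]
coarse = rolling_unique(close, interval=2, precision=2)  # values collapse to 2.0
fine = rolling_unique(close, interval=2, precision=3)    # values stay distinct
```

With these inputs, result reproduces example 1, coarse reports one unique value, and fine reports two.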
sum(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
precise: bool = False,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
-
x: the time-series data. Can either be a ts[Union[float, np.ndarray]] or ts[np.ndarray].
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- precise: if True we use a more numerically stable implementation (Kahan) which is less efficient
- ignore_na: if True, does not include any nan values in the window. If false, nan values are included and will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted sum (optional).
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling sums over the interval
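The precise option refers to Kahan (compensated) summation. The sketch below shows the general technique on a plain list. It is not csp's code, and the function name kahan_sum is an assumption for illustration.

```python
def kahan_sum(values):
    """Compensated summation: carries the rounding error of each addition."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost in this step
        total = t
    return total


# One large value followed by many values too small to change it on their own.
data = [1.0] + [1e-16] * 100_000

naive = 0.0
for v in data:
    naive += v  # each 1e-16 is rounded away, so naive stays at 1.0

compensated = kahan_sum(data)  # stays close to the true sum, 1 + 1e-11
```

The extra arithmetic per tick is the efficiency cost mentioned for precise=True.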
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
sum(x, interval=3)
{'2020-01-03': 6, '2020-01-04': 5, '2020-01-05': 8}
2. Including NaNs
sum(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 3, '2020-01-03': 6, '2020-01-04': nan, '2020-01-05': nan}
3. Weighted single input
weights = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-04': 3}
sum(x, interval=3, weights=weights)
{'2020-01-03': 11, '2020-01-04': 10, '2020-01-05': 21} # 21 = 5x3 + 3x2
4. NumPy
x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
sum(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [3,1], '2020-01-03': [6,2]}
5. NumPy weighted sum
np_weights = {'2020-01-01': [1,2], '2020-01-02': [2,1]}
sum(x_np, interval=3, min_window=1, weights=np_weights)
{'2020-01-01': [1,2], '2020-01-02': [5,2], '2020-01-03': [11,3]} # weights applied elementwise
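The weighted single-input example (example 3) can be traced with a short sketch. It assumes each x tick is paired with the most recent weight value at the time it arrives, which is consistent with the outputs shown above. The helper name rolling_weighted_sum and the dict-of-weights input are hypothetical.

```python
import math
from collections import deque


def rolling_weighted_sum(x, weights, interval):
    """Tick-based weighted rolling sum that skips NaN values (hypothetical helper)."""
    window = deque(maxlen=interval)
    current_weight = math.nan
    out = {}
    for ts, v in x:
        # Assumption: the last weight that ticked applies until a new one arrives.
        current_weight = weights.get(ts, current_weight)
        window.append((v, current_weight))
        if len(window) < interval:
            continue
        out[ts] = sum(v_i * w_i for v_i, w_i in window if not math.isnan(v_i))
    return out


x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0),
     ("2020-01-04", math.nan), ("2020-01-05", 5.0)]
weights = {"2020-01-01": 1.0, "2020-01-02": 2.0, "2020-01-04": 3.0}
result = rolling_weighted_sum(x, weights, interval=3)
```

For instance, the value at 2020-01-03 is 1x1 + 2x2 + 3x2 = 11, and the value at 2020-01-05 is 3x2 + 5x3 = 21.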
prod(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- ignore_na: if True, does not include any nan values in the window. If false, nan values are included and will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling products over the interval. The computation is unstable for large products and windows.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
prod(x, interval=3, min_window=2, ignore_na=True)
{'2020-01-02': 2, '2020-01-03': 6, '2020-01-04': 6, '2020-01-05': 15}
2. NumPy
x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
prod(x_np, 3, 2)
{'2020-01-02': [3,8], '2020-01-03': [15,48]}
first(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0,
ignore_na: bool = True
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
- ignore_na: if True, will return the first non-nan value in the window. If False, will return the first value in the window
Returns:
- a time-series of the earliest (non-nan) value still within the given interval
See last
last(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- ignore_na: if True, will return the last non-nan value in the window. If False, will return the last value in the window
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of the most recent value within the given interval
Starttime: 2020-01-01 00:00:00
1. Default - first
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
first(x, interval=3)
{'2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}
2. Including NaN - last
last(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}
3. Triggering - last
trigger = {'2020-01-03': True, '2020-01-04': True}
last(x, interval=timedelta(days=3), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-04': 3}
4. NumPy - first
x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
first(x_np, interval=2)
# first non-nan value
{'2020-01-02': [1,1], '2020-01-03': [2,3]}
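For the default ignore_na=True case, first and last amount to picking the earliest and latest non-NaN values in the window. The sketch below covers only that case and only tick-count windows. The helper name rolling_first_last is an assumption, and this is not csp's implementation.

```python
import math
from collections import deque


def rolling_first_last(values, interval):
    """Earliest and latest non-NaN value per tick window (hypothetical helper)."""
    window = deque(maxlen=interval)
    first, last = {}, {}
    for ts, v in values:
        window.append(v)
        if len(window) < interval:
            continue
        valid = [w for w in window if not math.isnan(w)]
        first[ts] = valid[0] if valid else math.nan
        last[ts] = valid[-1] if valid else math.nan
    return first, last


x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 3.0),
     ("2020-01-04", math.nan), ("2020-01-05", 5.0)]
first, last = rolling_first_last(x, interval=3)
```

With this input, first reproduces example 1, and last holds the value 3 at 2020-01-04 because the NaN tick is skipped.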
mean(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
-
x: the time-series data. Can either be a ts[Union[float, np.ndarray]] or a ts[np.ndarray].
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted mean (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling means over the interval. The mean is updated incrementally and no running sums are kept, so overflow is not an issue.
See gmean
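One common way to maintain a rolling mean without storing a running sum is to adjust the mean directly as values enter and leave the window. The sketch below shows that general technique. It is an assumption about what the note on overflow refers to, not a copy of csp's code, and the names RunningMean and rolling_mean are hypothetical.

```python
from collections import deque


class RunningMean:
    """Mean maintained by incremental updates instead of a stored sum."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def add(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n

    def remove(self, x):
        if self.n <= 1:
            self.n, self.mean = 0, 0.0
            return
        # Inverse of add(): new_mean = old_mean - (x - old_mean) / (n - 1)
        self.mean -= (x - self.mean) / (self.n - 1)
        self.n -= 1


def rolling_mean(values, interval):
    window = deque()
    running = RunningMean()
    out = []
    for v in values:
        window.append(v)
        running.add(v)
        if len(window) > interval:
            running.remove(window.popleft())  # oldest tick leaves the window
        if len(window) == interval:
            out.append(running.mean)
    return out


means = rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0], interval=3)
```

The intermediate quantities never exceed the scale of the data itself, which is why large inputs do not overflow the way an accumulated sum could.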
gmean(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling geometric means over the interval. Requires a strictly positive-valued input.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
mean(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 2.5, '2020-01-05': 4.0}
2. Including NaN
mean(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan}
3. Geometric mean
trigger = {'2020-01-03': True, '2020-01-05': True}
gmean(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 1.817, '2020-01-05': 3.873}
4. Weighted mean
weights = {'2020-01-01': 1, '2020-01-03': 2}
mean(x, interval=3, min_window=2, ignore_na=True, weights=weights)
{'2020-01-02': 1.5, '2020-01-03': 2.25, '2020-01-04': 2.667, '2020-01-05': 4.0}
Note: the first two observations get relative weight of 1, then the last three get relative weight of 2
5. NumPy weighted mean
x_np = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 2., 2.], '2020-01-03': [3., 3., 3.]}
np_weights = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 1., 2.], '2020-01-03': [3., 1., 3.]}
mean(x_np, 3, 2, weights=np_weights)
{'2020-01-02': [1.667, 1.5, 1.667], '2020-01-03': [2.333, 2.0, 2.333]}
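The arithmetic behind the weighted mean and geometric mean examples can be checked on individual windows. The sketch below applies the standard formulas to the window contents directly; the function names weighted_mean and geometric_mean are assumptions, and NaN values are skipped as with ignore_na=True.

```python
import math


def weighted_mean(values, weights):
    """Sum of v*w divided by the sum of w, skipping NaN values."""
    pairs = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    total_weight = sum(w for _, w in pairs)
    if not total_weight:
        return math.nan
    return sum(v * w for v, w in pairs) / total_weight


def geometric_mean(values):
    """exp of the mean of logs; requires strictly positive inputs."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return math.nan
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


# Windows taken from the examples above (x = 1, 2, 3, nan, 5).
wm_jan3 = weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])       # (1 + 2 + 6) / 4
wm_jan4 = weighted_mean([2.0, 3.0, math.nan], [1.0, 2.0, 2.0])  # (2 + 6) / 3
gm_jan3 = geometric_mean([1.0, 2.0, 3.0])                       # cube root of 6
gm_jan5 = geometric_mean([3.0, math.nan, 5.0])                  # square root of 15
```

Because the weighted mean divides by the total weight, the weights do not need to be normalized.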
max(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- By default, the min_window is equal to the interval
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling maximums over the interval.
See min
min(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling minimums over the interval.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
min(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}
2. Including NaN
max(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}
3. NumPy example
x_np = {'2020-01-01': [2,3], '2020-01-02': [6,1], '2020-01-03': [1,9]}
min(x_np, interval=timedelta(days=3), min_window=timedelta(days=1))
{'2020-01-02': [2,1], '2020-01-03': [1,1]}
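The tick-based window logic in examples 1 and 2 can be sketched in plain Python. This is only an illustration of the documented semantics, not the csp implementation: `rolling_extreme` is a hypothetical helper, and `None` stands in for "no output yet".

```python
import math
from collections import deque


def rolling_extreme(values, interval, min_window, ignore_na=True, pick=min):
    """Tick-based rolling min/max over a list of floats (hypothetical helper)."""
    window = deque(maxlen=interval)  # holds at most the last `interval` ticks
    out = []
    for v in values:
        window.append(v)
        if len(window) < min_window:
            out.append(None)  # fewer than min_window ticks: no output yet
            continue
        valid = [w for w in window if not math.isnan(w)]
        if not valid or (not ignore_na and len(valid) < len(window)):
            out.append(math.nan)  # a NaN invalidates the whole window
        else:
            out.append(pick(valid))
    return out


x = [1.0, 2.0, 3.0, math.nan, 5.0]
rolling_min = rolling_extreme(x, interval=3, min_window=2)
rolling_max = rolling_extreme(x, interval=3, min_window=2, ignore_na=False, pick=max)
print(rolling_min)  # [None, 1.0, 1.0, 2.0, 3.0]
print(rolling_max)  # [None, 2.0, 3.0, nan, nan]
```

The two printed lists line up with the outputs of examples 1 and 2, with the first entry corresponding to the tick that is suppressed by min_window=2.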
median(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling medians over the interval. Uses midpoint interpolation if there is an even number of samples.
See quantile
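As a rough illustration of midpoint interpolation for an even number of samples, the sketch below computes a tick-based rolling median in plain Python. It is not the csp implementation; `rolling_median` is a hypothetical helper and results are keyed by tick index rather than timestamp.

```python
import math
from statistics import median


def rolling_median(values, interval, min_window):
    """Tick-based rolling median that skips NaN values (hypothetical helper)."""
    out = {}
    for i in range(len(values)):
        window = values[max(0, i - interval + 1): i + 1]
        if len(window) < min_window:
            continue  # not enough ticks yet, so nothing is output
        valid = [v for v in window if not math.isnan(v)]
        # statistics.median averages the two middle values for even counts
        out[i] = median(valid) if valid else math.nan
    return out


medians = rolling_median([1.0, 2.0, 3.0, math.nan, 5.0], interval=3, min_window=2)
print(medians)  # {1: 1.5, 2: 2.0, 3: 2.5, 4: 4.0}
```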
quantile(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
quant: Union[float, List[float]] = None,
min_window: Union[timedelta, int] = None,
interpolate: str = "linear",
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → Union[ts[Union[float, np.ndarray]], [ts[Union[float, np.ndarray]]]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- quant: the quantile to calculate, which must be between 0 and 1
  - If provided a list, then all quantiles will be calculated for the list.
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- interpolate: the interpolation method to use when the quantile does not correspond to an individual value. Must be one of the following options:
  - "linear": interpolates linearly between the two closest values. For example, the 0.333 quantile of (1,2) with linear interpolation is 1.333.
  - "lower": returns the lower of the two closest values.
  - "higher": returns the higher of the two closest values.
  - "midpoint": returns the midpoint between the two closest values. For example, the 0.333 quantile of (1,2) with midpoint interpolation is 1.5.
  - "nearest": returns the value at the nearest position. For example, the 0.333 quantile of (1,2) with nearest interpolation is 1. In cases of ties, the higher value is returned.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series or list-basket of time-series of rolling quantiles over the interval.
- If the quant parameter is a list then a list-basket will be returned.
- If it is a float then a time-series will be returned.
- The order of quantiles in the list-basket is equal to the order of the input.
Starttime: 2020-01-01 00:00:00
1. Median
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
median(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2, '2020-01-04': 2.5, '2020-01-05': 4}
2. Quantile with multiple values
quantile(x, interval=3, quant=[0.25, 0.5, 0.75], min_window=2, ignore_na=False)
[
{'2020-01-02': 1.25, '2020-01-03': 1.5, '2020-01-04': nan, '2020-01-05': nan},
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan},
{'2020-01-02': 1.75, '2020-01-03': 2.5, '2020-01-04': nan, '2020-01-05': nan}
]
3. Quantile with trigger
trigger = {'2020-01-03': True, '2020-01-05': True}
quantile(x, interval=timedelta(days=3), quant=0.333, min_window=timedelta(days=2), interpolate="midpoint", ignore_na=True, trigger=trigger)
{'2020-01-03': 1.5, '2020-01-05': 4}
4. NumPy array with multiple quantiles
x_np = {'2020-01-01': [1,2,3], '2020-01-02': [2,3,4], '2020-01-03': [3,4,5]}
quantile(x_np, interval=3, quant=[0.25,0.5,0.75], min_window=1)
# this is a list-basket of NumPy array time series
[
{'2020-01-01': [1,2,3], '2020-01-02': [1.25, 2.25, 3.25], '2020-01-03': [1.5, 2.5, 3.5]},
{'2020-01-01': [1,2,3], '2020-01-02': [1.5, 2.5, 3.5], '2020-01-03': [2., 3., 4.]},
{'2020-01-01': [1,2,3], '2020-01-02': [1.75, 2.75, 3.75], '2020-01-03': [2.5, 3.5, 4.5]}
]
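The interpolation options can be illustrated on a single sorted window with a short plain-Python sketch. It assumes the quantile position is `quant * (n - 1)` in the sorted data, which is consistent with the worked values above. `window_quantile` is a hypothetical helper, not part of csp.

```python
import math


def window_quantile(sorted_vals, quant, interpolate="linear"):
    """Quantile of one sorted window (hypothetical helper, not the csp code)."""
    pos = quant * (len(sorted_vals) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    a, b = sorted_vals[lo], sorted_vals[hi]  # the two closest values
    if interpolate == "linear":
        return a + frac * (b - a)
    if interpolate == "lower":
        return a
    if interpolate == "higher":
        return b
    if interpolate == "midpoint":
        return (a + b) / 2
    if interpolate == "nearest":
        return b if frac >= 0.5 else a  # ties resolve to the higher value
    raise ValueError(f"unknown interpolation method: {interpolate}")


for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(method, window_quantile([1, 2], 0.333, method))
```

With the window (1,2) and quant=0.333 this gives approximately 1.333 for "linear", 1.5 for "midpoint" and 1 for "nearest", matching the values quoted in the interpolate argument description.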
argmin(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
return_most_recent: bool = True,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- return_most_recent: if True, in the case of a tie, the most recent time will be returned. If False, the least recent time will be returned.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid.
Returns:
- a time-series of rolling argmin values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-01-01 00:00:00' is returned.
See argmax
argmax(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
return_most_recent: bool = True,
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- return_most_recent: if True, in the case of a tie, the most recent time will be returned. If False, the least recent time will be returned.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling argmax values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-01-01 00:00:00' is returned.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 4}
argmax(x, 3)
{'2020-01-03': '2020-01-02', '2020-01-04': '2020-01-02', '2020-01-05': '2020-01-05'}
argmin(x, 3)
{'2020-01-03': '2020-01-03', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'}
2. NumPy example
x_np = {'2020-01-01': [1,2], '2020-01-02': [2,1], '2020-01-03': [3,0]}
argmax(x_np, 3, 2)
{'2020-01-02': ['2020-01-02', '2020-01-01'], '2020-01-03': ['2020-01-03', '2020-01-01']}
argmin(x_np, 3, 2)
{'2020-01-02': ['2020-01-01', '2020-01-02'], '2020-01-03': ['2020-01-01', '2020-01-03']}
3. return_most_recent=False
argmin(x, 3, return_most_recent=False)
{'2020-01-03': '2020-01-01', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'} # Note how the first element is '2020-01-01', not '2020-01-03'
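The tie-breaking behavior of return_most_recent can be sketched in plain Python as follows. This is a simplified illustration, not the csp implementation: `rolling_argmin` is a hypothetical helper, timestamps are plain strings, and output is assumed to start once the window holds `interval` ticks, as in the examples above.

```python
import math


def rolling_argmin(ticks, interval, return_most_recent=True):
    """Tick-based rolling argmin over (time, value) pairs (hypothetical helper)."""
    out = {}
    for i in range(interval - 1, len(ticks)):
        window = ticks[i - interval + 1: i + 1]
        valid = [(t, v) for t, v in window if not math.isnan(v)]  # ignore_na=True
        if not valid:
            continue  # csp documents an epoch default here; this sketch just skips
        lowest = min(v for _, v in valid)
        tied_times = [t for t, v in valid if v == lowest]  # oldest first
        out[ticks[i][0]] = tied_times[-1] if return_most_recent else tied_times[0]
    return out


x = [("2020-01-01", 1.0), ("2020-01-02", 2.0), ("2020-01-03", 1.0),
     ("2020-01-04", math.nan), ("2020-01-05", 4.0)]
most_recent = rolling_argmin(x, 3)
least_recent = rolling_argmin(x, 3, return_most_recent=False)
print(most_recent)
print(least_recent)
```

The two results differ only at '2020-01-03', where the value 1 appears twice in the window.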
rank(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
method: str = "min",
ignore_na: bool = True,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
min_data_points: int = 0,
na_option: str = "keep"
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- method: the method to use to rank groups of records that have the same value. Ranks are zero-based.
  - "min": the lowest rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1
  - "max": the highest rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=2
  - "avg": the average rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1.5
  - By default, the "min" method is used.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, nan is returned.
- na_option: how to rank a nan value when it is the last value to be ranked
  - "keep": return a nan rank for a nan value
  - "last": rank the last non-nan value present in the interval
  - By default, the "keep" option is used.
Returns:
- a time-series of rolling ranks over the interval, where a rank of 0 means that the current (last) ticked value is the smallest in the given interval.
Starttime: 2020-01-01 00:00:00
1. Default behavior
x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': 5, '2020-01-05': 4}
rank(x, 5, min_window=3)
{'2020-01-03': 1, '2020-01-04': 3, '2020-01-05': 3}
2. NumPy example
x_np = {'2020-01-01': [1,2], '2020-01-02': [3,2], '2020-01-03': [2,1]}
rank(x_np, 3, 2)
# Note how the second element at '2020-01-02' is 0, not 1, as by default the "min" method is used
{'2020-01-02': [1, 0], '2020-01-03': [1, 0]}
3. "keep" vs "last" NaN option
x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': nan, '2020-01-05': 4}
rank(x, 5, min_window=3, na_option="keep")
{'2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 3}
rank(x, 5, min_window=3, na_option="last")
# the last valid value, 2, is ranked at '2020-01-04', giving a rank of 1
{'2020-01-03': 1, '2020-01-04': 1, '2020-01-05': 3}
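One way to picture the zero-based rank and the tie-handling methods is the plain-Python sketch below. It is an illustration under the assumption that the rank counts how many window values fall below the last tick; `rank_last` is a hypothetical helper and NaN handling is left out.

```python
def rank_last(window, method="min"):
    """Zero-based rank of the last value in a window (hypothetical helper)."""
    last = window[-1]
    below = sum(1 for v in window if v < last)  # values strictly smaller
    ties = sum(1 for v in window if v == last)  # includes the last tick itself
    if method == "min":
        return below  # lowest rank in the tied group
    if method == "max":
        return below + ties - 1  # highest rank in the tied group
    if method == "avg":
        return below + (ties - 1) / 2  # mean of the tied ranks
    raise ValueError(f"unknown method: {method}")


x = [1, 3, 2, 5, 4]
# Expanding windows of at least 3 ticks, as in rank(x, 5, min_window=3)
ranks = [rank_last(x[: i + 1]) for i in range(2, len(x))]
print(ranks)  # [1, 3, 3]
print(rank_last([2, 2], "min"), rank_last([2, 2], "max"))  # 0 1
```

The last line mirrors the NumPy example, where a window of [2, 2] ranks as 0 under "min" and would rank as 1 under "max".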
var(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ddof: int = 1,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- ddof: delta degrees of freedom. Example: if ddof=1, then the normalization term is 1/(N-1). If ddof=0, then it is 1/N.
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted variance (optional). Weights do not need to be normalized.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling variance over the interval. If there are insufficient samples for the given ddof, no output is generated. Because the smart mean is used, overflow is not a problem.
See Standard Error.
stddev(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ddof: int = 1,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- ddof: delta degrees of freedom
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted standard deviation (optional). Weights do not need to be normalized.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling standard deviations over the interval. If there are insufficient samples for the given ddof, no output is generated.
See Standard Error.
sem(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ddof: int = 1,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- ddof: delta degrees of freedom
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted standard error (optional). Weights do not need to be normalized.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling standard errors
Starttime: 2020-01-01 00:00:00
1. Variance
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
var(x, interval=3, min_window=2)
{'2020-01-02': 0.5, '2020-01-03': 1.0, '2020-01-04': 0.5, '2020-01-05': 2.0}
2. Biased variance
var(x, interval=3, min_window=2, ddof=0, ignore_na=True) # biased
{'2020-01-02': 0.25, '2020-01-03': 0.666, '2020-01-04': 0.25, '2020-01-05': 1.0}
3. Standard deviation including NaNs
stddev(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.0, '2020-01-04': nan, '2020-01-05': nan}
4. Standard error with triggering
trigger = {'2020-01-03': True, '2020-01-05': True}
sem(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 0.707, '2020-01-05': 1.0}
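The effect of ddof in examples 1 to 3 can be checked with a small plain-Python sketch that recomputes each window from scratch. It is not how csp computes the statistic internally; `window_var` is a hypothetical helper and the windows are written out by hand for interval=3, min_window=2.

```python
import math


def window_var(window, ddof=1):
    """Variance of one window, skipping NaN values (hypothetical helper)."""
    valid = [v for v in window if not math.isnan(v)]
    n = len(valid)
    if n - ddof <= 0:
        return None  # insufficient samples for this ddof: no output
    mean = sum(valid) / n
    return sum((v - mean) ** 2 for v in valid) / (n - ddof)


nan = math.nan
windows = [[1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, nan], [3.0, nan, 5.0]]
unbiased = [window_var(w) for w in windows]
biased = [window_var(w, ddof=0) for w in windows]
print(unbiased)  # [0.5, 1.0, 0.5, 2.0]
print([round(v, 3) for v in biased])  # [0.25, 0.667, 0.25, 1.0]
print(round(math.sqrt(unbiased[0]), 3))  # 0.707, the first stddev value
```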
cov(
x: ts[Union[float, np.ndarray]],
y: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ddof: int = 1,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: time-series data. If x is of type np.ndarray, then the covariance calculation is performed element-wise with the corresponding values in y.
- y: time-series data that ticks in sequence with x
- interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
  - if an int, represents the number of ticks to use
  - if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
- min_window: the minimum allowable interval to use before outputting data
  - If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
  - If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
- ddof: delta degrees of freedom
- ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
- trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
  - By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted covariance (optional). Weights do not need to be normalized.
- sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
  - If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
  - If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
  - If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
  - By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling covariances between x and y
See Correlation.
corr(
x: ts[Union[float, np.ndarray]],
y: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: time-series data. If x is of type np.ndarray, then the correlation calculation is performed element-wise with the corresponding values in y.
- y: time-series data that ticks in sequence with x
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted correlation (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and *sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling Pearson correlation coefficients between x and y
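The three sampler cases described in the arguments can be sketched in plain Python. This is only an illustration of the rules, not csp code: the helper name `apply_sampler` and the dict-of-timestamps representation are assumptions made for the example.

```python
import math


def apply_sampler(x_ticks, sampler_times):
    """Return the series a statistic would effectively see (hypothetical helper).

    x_ticks: dict mapping timestamp -> value, one entry per x tick
    sampler_times: iterable of timestamps at which the sampler ticked
    """
    sampled = {}
    for t in sorted(set(sampler_times)):
        # x and sampler tick together -> the value is used
        # sampler ticks alone -> NaN, later handled by ignore_na
        sampled[t] = x_ticks.get(t, math.nan)
    # x ticks without a sampler tick never enter `sampled` (they are ignored)
    return sampled


x = {1: 10.0, 2: 20.0, 4: 40.0}
result = apply_sampler(x, sampler_times=[1, 3, 4])
```

Here the x tick at time 2 is dropped because the sampler did not tick, and time 3 becomes NaN because the sampler ticked without x.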
Starttime: 2020-01-01 00:00:00
1. Covariance
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
y = {'2020-01-01': 5, '2020-01-02': 4, '2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 1}
cov(x, y, interval=3, min_window=2)
{'2020-01-02': -0.5, '2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}
2. Correlation
corr(x, y, interval=3)
{'2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}
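The numbers in the example can be checked by hand with a small tick-count rolling window. The sketch below is plain Python, not the csp implementation; `rolling_cov_corr` is a hypothetical helper, and it assumes a sample covariance (ddof=1) and a min_window that defaults to the interval, both of which are inferred from the example output.

```python
from collections import deque
import math


def rolling_cov_corr(xs, ys, interval, min_window=None):
    """Tick-count rolling sample covariance and Pearson correlation."""
    # Assumption: min_window defaults to the full interval
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)
    covs, corrs = [], []
    for pair in zip(xs, ys):
        window.append(pair)
        n = len(window)
        if n < max(min_window, 2):  # need at least two points for ddof=1
            continue
        mx = sum(a for a, _ in window) / n
        my = sum(b for _, b in window) / n
        sxy = sum((a - mx) * (b - my) for a, b in window)
        sxx = sum((a - mx) ** 2 for a, _ in window)
        syy = sum((b - my) ** 2 for _, b in window)
        covs.append(sxy / (n - 1))
        corrs.append(sxy / math.sqrt(sxx * syy))
    return covs, corrs


x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
covs, _ = rolling_cov_corr(x, y, interval=3, min_window=2)
_, corrs = rolling_cov_corr(x, y, interval=3)
```

With min_window=2 the covariance starts on the second tick, while the correlation call waits for a full three-tick window.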
skew(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
bias: bool = False,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
- bias: if True, calculates a biased (unadjusted) skew. If false (default), calculates a Gaussian-unbiased measure.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted skew (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling sample skew measures, using the adjusted Fisher–Pearson standardized moment coefficient.
See Kurtosis.
kurt(
x: ts[Union[float, np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
excess: bool = True,
bias: bool = False,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
- excess: if True (default) uses the definition of excess kurtosis (kurt - 3). If false, uses the standard definition.
- bias: if True, calculates a biased (unadjusted) kurtosis. If false (default), calculates a Gaussian-unbiased measure.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted kurtosis (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of rolling sample kurtosis measures, using the adjusted Fisher–Pearson standardized moment coefficient.
Starttime: 2020-01-01 00:00:00
1. Skew
x = {'2020-01-01': 1, '2020-01-02': 2, ..., '2020-01-10': 10}
skew(x, interval=7)
{'2020-01-07': 0, '2020-01-08': 0, '2020-01-09': 0, '2020-01-10': 0}
2. Kurtosis
kurt(x, interval=7) # excess kurtosis
{'2020-01-07': -1.2, '2020-01-08': -1.2, '2020-01-09': -1.2, '2020-01-10': -1.2}
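To see where 0 and -1.2 come from, the sketch below evaluates the standard bias-adjusted sample skew and excess kurtosis on a single seven-tick window. It assumes the default bias=False and excess=True settings correspond to these textbook formulas; the function names are hypothetical and this is not the csp implementation.

```python
import math


def sample_skew(values):
    """Adjusted Fisher-Pearson skew: sqrt(n(n-1))/(n-2) * m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis built from g2 = m4 / m2^2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


window = list(range(1, 8))  # the first full 7-tick window: 1..7
skew_value = sample_skew(window)
kurt_value = sample_excess_kurtosis(window)
```

Every later window in the example is a shifted copy of 1..7, so both statistics stay constant as the window rolls.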
ema(
x: ts[Union[float, np.ndarray]],
min_periods: int = 1,
alpha: Optional[float] = None,
span: Optional[float] = None,
com: Optional[float] = None,
halflife: Optional[timedelta] = None,
adjust: bool = True,
horizon: int = None,
ignore_na: bool = False,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
-
x: the time-series data
-
min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
-
alpha: the EMA weight parameter specified directly. If adjust=True, EMA is calculated over the n+1 ticks seen so far such that
$$EMA(t) = \frac{\sum\limits_{i=0}^{n} (1-\alpha)^{i} x(t-i)}{\sum\limits_{i=0}^{n} (1-\alpha)^{i}}$$
If adjust=False, EMA is calculated such that
$$EMA(t) = (1-\alpha)EMA(t-1) + \alpha x(t)$$
$$EMA(t=0) = x(0)$$
By default, adjust=True, to give better estimates for starting intervals.
The following are alternative methods to specify the $\alpha$ parameter.
-
span: specify alpha in terms of span, such that
$$\alpha = \frac{2}{span+1}$$
-
com: specify alpha in terms of centre of mass, such that
$$\alpha = \frac{1}{1+com}$$
-
halflife: unlike the other parameters, halflife is a timedelta argument that specifies the half-life of observation weights. It is useful when observations are irregularly spaced and a better estimate is needed to properly weight more recent data. Let $t_{-1}$ be the time of the last observation. Then:
$$\lambda(t) = 1 - \exp(\frac{-(t-t_{-1})*\ln(2)}{halflife})$$
$$EMA(t) = \frac{ \lambda(t)*EMA(t-1) + x(t)}{\text{normalization constant}}$$
Note that the ignore_na flag does not matter if a halflife interval is specified. The behavior is the same in both cases, since an absolute time interval is being used to re-weight the moving average, not a tick interval.
Exactly one of alpha, span, com, halflife must be given.
-
adjust: if True, early observations are adjusted to give a more "smoothed" estimate of the EMA. The difference is that if adjust=True, then each new observation receives a relative weight of 1. If adjust=False, each new observation receives a relative weight of alpha.
- adjust=True means that:
$$EMA(t) = \frac{x(t)+(1-\alpha)x(t-1)+(1-\alpha)^2 x(t-2) + ... + (1-\alpha)^n x(t-n)}{1+(1-\alpha)+(1-\alpha)^2 + ... + (1-\alpha)^n}$$
- adjust=False means that:
$$EMA(t) = \frac{\alpha * x(t) + \alpha * (1-\alpha) * x(t-1) + \alpha * (1-\alpha)^2 * x(t-2) + ... + \boldsymbol{(1-\alpha)^n x(0)}}{\alpha+\alpha*(1-\alpha)+\alpha*(1-\alpha)^2 + ... + (1-\alpha)^n}$$
$$\text{and thus } EMA(t=0) = x(0)$$
Adjust only applies with tick-specified intervals, not time-specified intervals. Time-specified intervals (i.e. half-life) do not need adjustment as they are, by definition, already adjusted.
-
horizon: the maximum number of ticks to use in the computation. For example, if horizon = 10, then only the 10 most recent data points are used. If not specified, all data points for x are used, with early ticks decaying exponentially in weighting. Horizon will be ignored with a half-life (time-based) interval.
- If horizon is set to h, then even if x has more than h ticks the EMA will be computed over only the h most recent ticks. With adjust=True:
$$EMA(t) = \frac{\sum_{i=0}^{h-1} (1-\alpha)^{i} x(t-i)}{\sum_{i=0}^{h-1} (1-\alpha)^{i}}$$
- The only difference if adjust=False is that the first ever tick, while in the window, receives weight 1 at the start instead of weight $\alpha$ like the rest of the values.
-
ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position, and renormalized as such.
- For example, let us consider a dataset (1,nan,2) using adjust=True.
  - If ignore_na=True then the weighting is based on relative position as such:
$$EMA(t=2) = \frac{(1-\alpha)*1 + 2}{(1-\alpha)+1}$$
  - If ignore_na=False then the weighting is based on global position as such:
$$EMA(t=2) = \frac{(1-\alpha)^2*1 + 2}{(1-\alpha)^2+1}$$
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
-
reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation.
-
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
-
min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of exponentially-weighted moving averages over the interval.
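The tick-based formulas can be sketched in a few lines of plain Python. This is an illustration of the adjust=True and adjust=False definitions and of the span and com conversions, not the csp implementation; `alpha_from` and `ema_sketch` are hypothetical names, and NaN handling, horizon and halflife are left out.

```python
def alpha_from(alpha=None, span=None, com=None):
    """Resolve the EMA weight from exactly one of alpha, span or com."""
    if sum(p is not None for p in (alpha, span, com)) != 1:
        raise ValueError("give exactly one of alpha, span, com")
    if span is not None:
        return 2.0 / (span + 1.0)
    if com is not None:
        return 1.0 / (1.0 + com)
    return float(alpha)


def ema_sketch(values, alpha, adjust=True):
    out = []
    num = den = 0.0  # running weighted sum and weight total (adjust=True)
    prev = None      # previous EMA value (adjust=False)
    for v in values:
        if adjust:
            # every older observation decays by (1 - alpha); the new one gets weight 1
            num = (1.0 - alpha) * num + v
            den = (1.0 - alpha) * den + 1.0
            out.append(num / den)
        else:
            # recursive form, seeded with the first observation
            prev = v if prev is None else (1.0 - alpha) * prev + alpha * v
            out.append(prev)
    return out


x = [1, 2, 3, 4, 5]
unadjusted = ema_sketch(x, alpha_from(alpha=0.1), adjust=False)
adjusted = ema_sketch(x, alpha_from(span=19), adjust=True)  # span=19 gives alpha=0.1
```

Both lists agree with the unadjusted and adjusted examples that follow, up to rounding.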
Starttime: 2020-01-01 00:00:00
1. Unadjusted EMA
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
ema(x, alpha=0.1, adjust=False) # unadjusted
{'2020-01-01': 1.0, '2020-01-02': 1.1, '2020-01-03': 1.29, '2020-01-04': 1.561, '2020-01-05': 1.9049}
2. Adjusted EMA
ema(x, alpha=0.1, adjust=True) # adjusted, default method
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.0701, '2020-01-04': 2.6313, '2020-01-05': 3.20971}
3. Finite horizon EMA
ema(x, alpha=0.1, adjust=True, horizon=2) # finite horizon
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.5263, '2020-01-04': 3.5263, '2020-01-05': 4.5263}
4. Time-based decay EMA
ema(x, halflife=timedelta(days=1)) # time-based
{'2020-01-01': 1.0, '2020-01-02': 1.6666, '2020-01-03': 2.4286, '2020-01-04': 3.2666, '2020-01-05': 4.1613}
5. Unadjusted EMA for NumPy array
x_np = {'2020-01-01': [1,2], '2020-01-02': [4,5], '2020-01-03': [7,8]}
ema(x_np, alpha=0.1, adjust=False)
{'2020-01-01': [1,2], '2020-01-02': [1.3,2.3], '2020-01-03': [1.87,2.87] }
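Examples 3 and 4 can be reproduced with the sketch below. It is plain Python rather than csp code, and the helper names are hypothetical. The time-based version assumes that every past observation's weight decays by a factor of exp(-Δt·ln(2)/halflife) between ticks and that each new observation enters with weight 1; that assumption matches the example values but is not taken from the csp source.

```python
import math
from datetime import datetime, timedelta


def ema_finite_horizon(values, alpha, horizon):
    """adjust=True EMA restricted to the `horizon` most recent ticks."""
    out = []
    for t in range(len(values)):
        recent = values[max(0, t - horizon + 1): t + 1]
        # the newest value gets weight 1, the one before it (1 - alpha), and so on
        weights = [(1.0 - alpha) ** (len(recent) - 1 - i) for i in range(len(recent))]
        out.append(sum(w * v for w, v in zip(weights, recent)) / sum(weights))
    return out


def ema_halflife(times, values, halflife):
    """Time-decayed EMA: weights halve after every `halflife` of elapsed time."""
    out = []
    num = den = 0.0
    last = None
    for t, v in zip(times, values):
        if last is not None:
            decay = math.exp(-((t - last) / halflife) * math.log(2))
            num *= decay
            den *= decay
        num += v
        den += 1.0
        last = t
        out.append(num / den)
    return out


x = [1, 2, 3, 4, 5]
days = [datetime(2020, 1, d) for d in range(1, 6)]
finite = ema_finite_horizon(x, alpha=0.1, horizon=2)
timed = ema_halflife(days, x, halflife=timedelta(days=1))
```

Because the ticks are exactly one half-life apart, each older observation carries half the weight of the one after it.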
ema_var(
x: ts[Union[float, np.ndarray]],
min_periods: int = 1,
alpha: Optional[float] = None,
span: Optional[float] = None,
com: Optional[float] = None,
halflife: Optional[Union[float, timedelta]] = None,
adjust: bool = True,
horizon: int = None,
bias: bool = False,
ignore_na: bool = False,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
- alpha, span, com, halflife: as described in EMA
- adjust: as specified in EMA
- horizon: as specified in EMA.
- bias: if True, uses a biased population weighted variance. If false, normalized by a proper debiasing factor.
- ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
-
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of exponentially-weighted moving variances over the interval.
See Exponential Moving Standard Deviation
ema_std(
x: ts[Union[float, np.ndarray]],
min_periods: int = 1,
alpha: Optional[float] = None,
span: Optional[float] = None,
com: Optional[float] = None,
halflife: Optional[Union[float, timedelta]] = None,
adjust: bool = True,
horizon: int = None,
bias: bool = False,
ignore_na: bool = False,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0,
) → ts[Union[float, np.ndarray]]
Args:
- x: the time-series data
- min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
- alpha, span, com, halflife: as described in EMA
- adjust: as specified in EMA
- horizon: as specified in EMA.
- bias: if True, uses a biased population weighted variance. If false, normalized by debiasing factor
- ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
-
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of exponentially-weighted moving standard deviations over the interval.
Starttime: 2020-01-01 00:00:00
1. Exp. Moving Standard Deviation
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
ema_std(x, min_periods=2, span=20, adjust=False, bias=False, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.11636, '2020-01-04': 1.11636, '2020-01-05': 1.937005}
2. Exp. Moving Variance
ema_var(x, min_periods=2, span=20, adjust=False, bias=True, ignore_na=False)
{'2020-01-02': 0.086168, '2020-01-03': 0.390588, '2020-01-04': 0.390588, '2020-01-05': 1.644124}
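The values in both examples can be traced with explicit observation weights. The sketch below is plain Python and only covers adjust=False with ignore_na=False; `ema_var_sketch` is a hypothetical helper. It assumes the debiasing factor is 1 / (1 - Σw²) for weights normalized to sum to 1, which is the convention that reproduces the numbers above.

```python
import math


def ema_var_sketch(values, alpha, bias, min_periods=1):
    """EMA variance for adjust=False, ignore_na=False (illustrative only)."""
    obs, weights = [], []  # valid observations and their normalized weights
    old_wt = 1.0           # total weight currently carried by past observations
    out = []
    for v in values:
        is_obs = not math.isnan(v)
        if obs:
            # ignore_na=False: history decays on every tick, NaN ticks included
            old_wt *= 1.0 - alpha
            if is_obs:
                total = old_wt + alpha
                weights = [w * old_wt / total for w in weights] + [alpha / total]
                obs.append(v)
                old_wt = 1.0
        elif is_obs:
            obs, weights = [v], [1.0]
        if len(obs) < min_periods:
            continue
        mean = sum(w * o for w, o in zip(weights, obs))
        var = sum(w * (o - mean) ** 2 for w, o in zip(weights, obs))
        if not bias:
            denom = 1.0 - sum(w * w for w in weights)
            var = var / denom if denom > 0 else math.nan
        out.append(var)
    return out


x = [1.0, 2.0, 3.0, math.nan, 5.0]
alpha = 2.0 / (20 + 1)  # span=20
biased_var = ema_var_sketch(x, alpha, bias=True, min_periods=2)
unbiased_std = [math.sqrt(v) for v in ema_var_sketch(x, alpha, bias=False, min_periods=2)]
```

On the NaN tick the statistic is simply repeated, but the extra decay step is why the final tick weights the older data by (1-α)² rather than (1-α).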
ema_cov(
x: ts[Union[float, np.ndarray]],
y: ts[Union[float, np.ndarray]],
min_periods: int = 1,
alpha: Optional[float] = None,
span: Optional[float] = None,
com: Optional[float] = None,
halflife: Optional[Union[float, timedelta]] = None,
adjust: bool = True,
horizon: int = None,
bias: bool = False,
ignore_na: bool = False,
trigger: ts[object] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[Union[float, np.ndarray]]
Args:
- x: time-series data. If x is of type np.ndarray, the exponential-moving covariance is calculated element-wise with the corresponding values in y.
- y: time-series data which ticks in-sequence with x
- min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
- alpha, span, com, halflife: as described in EMA
- adjust: as specified in EMA
- horizon: as specified in EMA.
- bias: if True, uses a biased population weighted covariance. If false, normalized by debiasing factor
- ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
-
recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
- Note: only valid when a finite-horizon EMA is used.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of exponentially-weighted moving covariance over the interval.
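As a rough illustration of what an exponentially-weighted covariance computes, the sketch below builds the adjust=True weights explicitly and applies them to both series. It is plain Python, not the csp implementation: `ema_cov_sketch` is a hypothetical name, NaN handling is omitted, and the debiasing factor (Σw)² / ((Σw)² - Σw²) is an assumption carried over from the common reliability-weights convention.

```python
import math


def ema_cov_sketch(xs, ys, alpha, bias=False):
    """Exponentially-weighted covariance with adjust=True weights (1 - alpha)^k."""
    out = []
    for t in range(len(xs)):
        # weight 1 for the newest pair, (1 - alpha) for the one before it, ...
        w = [(1.0 - alpha) ** (t - i) for i in range(t + 1)]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys)) / sw
        if not bias:
            sw2 = sum(wi * wi for wi in w)
            denom = sw * sw - sw2
            cov = cov * sw * sw / denom if denom > 0 else math.nan
        out.append(cov)
    return out


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x, so cov(x, y) should be 2 * var(x)
cov_xy = ema_cov_sketch(x, y, alpha=0.1)
var_x = ema_cov_sketch(x, x, alpha=0.1)
```

With a single observation the debiased value is undefined, so the first entry is NaN under these assumptions.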
cov_matrix(
x: ts[np.ndarray],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ddof: int = 1,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[np.ndarray]
Args:
-
x: the time-series of dimension (N,) arrays which represent N variables
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ddof: delta degrees of freedom
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted covariance matrix (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of (potentially weighted) covariance matrices, each of which is a NumpyNDArray of dimensionality (N,N)
corr_matrix(
x: ts[np.ndarray],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
ignore_na: bool = True,
trigger: ts[object] = None,
weights: ts[Union[float, np.ndarray]] = None,
sampler: ts[object] = None,
reset: ts[object] = None,
recalc: ts[object] = None,
min_data_points: int = 0
) → ts[np.ndarray]
Args:
-
x: the time-series of dimension (N,) arrays which represent N variables
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
- ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
- weights: a time-series of weights for each observation in x, used to calculate a weighted correlation matrix (optional). Weights do not need to be normalized.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
- recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
- min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
Returns:
- a time-series of (potentially weighted) correlation matrices, each of which is a NumpyNDArray of dimensionality (N,N)
Starttime: 2020-01-01 00:00:00
1. Covariance
x = {'2020-01-01': np.array([0., 0., 0.]), '2020-01-02': np.array([1., -1., 2.]), '2020-01-03': np.array([2., -2., 4.])}
cov_matrix(x, 3, ddof=1)
{'2020-01-03': np.array([[1, -1, 2],
[-1, 1, -2],
[2, -2, 4]])}
2. Correlation
corr_matrix(x, 3)
{'2020-01-03': np.array([[1, -1, 1],
[-1, 1, -1],
[1, -1, 1]])}
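The matrices in the example can be verified by hand for the single three-tick window. The sketch below uses nested Python lists in place of NumPy arrays so that it runs with the standard library only; the function names are hypothetical and this is not the csp implementation. Note that the covariance matrix shown in the example corresponds to ddof=1.

```python
import math


def cov_matrix_sketch(rows, ddof=1):
    """Covariance matrix of the columns of `rows` (one row per tick)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - ddof)
         for j in range(k)]
        for i in range(k)
    ]


def corr_matrix_sketch(rows):
    """Correlation matrix: covariance normalized by the standard deviations."""
    c = cov_matrix_sketch(rows)
    sd = [math.sqrt(c[i][i]) for i in range(len(c))]
    return [[c[i][j] / (sd[i] * sd[j]) for j in range(len(c))] for i in range(len(c))]


window = [[0.0, 0.0, 0.0], [1.0, -1.0, 2.0], [2.0, -2.0, 4.0]]
cov = cov_matrix_sketch(window, ddof=1)
corr = corr_matrix_sketch(window)
```

The three variables are exact linear multiples of each other, which is why every correlation is either 1 or -1.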
list_to_numpy(x: [ts[float]], fillna: bool = False) → ts[np.ndarray]
Args:
- x: a listbasket of time series
- fillna: If False, unticked elements are treated as NaN. If True, unticked elements will hold their previous value in the array.
Returns:
- a NumPy 1D array where each value corresponds to the element of the listbasket with the same index
numpy_to_list(x: ts[np.ndarray], n: int) → [ts[float]]
Args:
- x: a NumPy array valued time series
- n: the number of output channels in the listbasket
Returns:
- a listbasket where each value corresponds to the element of the array with the same index
Starttime: 2020-01-01 00:00:00
1. List to NumPy
x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
list_to_numpy([x1,x2], fillna=False)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, np.nan], '2020-01-03': [3, 3.5]} # no x2 tick on day 2
list_to_numpy([x1,x2], fillna=True)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, 1.5], '2020-01-03': [3, 3.5]} # holds x2 value for day 2
2. NumPy to list
x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
numpy_to_list(x_np, 2)
[
{'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 5},
{'2020-01-01': 2, '2020-01-02': 4, '2020-01-03': 6}
]
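The effect of fillna can be mimicked with ordinary dicts. The sketch below is not csp code: `list_to_array_sketch` is a hypothetical helper, each series is represented as a dict of date string to value, and plain lists stand in for NumPy arrays.

```python
import math


def list_to_array_sketch(series, fillna=False):
    """Combine several tick dicts into one row per timestamp."""
    times = sorted(set().union(*series))  # every time at which any series ticked
    last = [math.nan] * len(series)       # last seen value of each series
    out = {}
    for t in times:
        row = []
        for i, s in enumerate(series):
            if t in s:
                last[i] = s[t]
                row.append(s[t])
            else:
                # unticked element: hold the previous value or emit NaN
                row.append(last[i] if fillna else math.nan)
        out[t] = row
    return out


x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
no_fill = list_to_array_sketch([x1, x2], fillna=False)
filled = list_to_array_sketch([x1, x2], fillna=True)
```

The only difference between the two results is the second element on 2020-01-02, where x2 did not tick.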
cross_sectional(
x: ts[Union[float,np.ndarray]],
interval: Union[timedelta, int] = None,
min_window: Union[timedelta, int] = None,
trigger: ts[object] = None,
as_numpy: bool = False,
sampler: ts[object] = None,
reset: ts[object] = None
) → ts[Union[np.ndarray, List[float], List[np.ndarray]]]
Args:
- x: the time-series data
-
interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
- if an int, represents the number of ticks to use
- if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
-
min_window: the minimum allowable interval to use before outputting data
- If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
- If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
-
as_numpy: if True, the data will be returned as a NumPy array instead of a list.
- For a single-valued time series, this is a one-dimensional NumPy array
- For a NumPy array time series, this is a NumPy array of one extra dimension
-
trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
- By default, the trigger is the series itself.
-
sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
- If x ticks *and* sampler ticks, then the x tick is considered valid and is used.
- If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
- If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
- By default, the sampler is the series itself.
- reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
Returns:
- a time-series where each tick contains all the data of x currently within the interval. Use this for custom cross-sectional calculations
Starttime: 2020-01-01 00:00:00
x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
cs = cross_sectional(x, interval=3, min_window=2)
cs
{'2020-01-02': [1,2], '2020-01-03': [1,2,3], '2020-01-04': [2,3,4], '2020-01-05': [3,4,5]}
Calculate a cross-sectional mean
cs_mean = csp.apply(cs, lambda v: sum(v)/len(v), float)
cs_mean
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 3.0, '2020-01-05': 4.0}
Get the results as a NumPy array
cs = cross_sectional(x, interval=3, min_window=2, as_numpy=True)
cs
{'2020-01-02': np.array([1,2]), '2020-01-03': np.array([1,2,3]), '2020-01-04': np.array([2,3,4]), '2020-01-05': np.array([3,4,5])}
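The windowing behind these outputs can be sketched without csp. The helpers below are hypothetical and written in plain Python: one emits the contents of a tick-count window, assuming min_window defaults to the interval, and the other shows how a timedelta interval excludes its left endpoint.

```python
from collections import deque
from datetime import datetime, timedelta


def tick_window_snapshots(values, interval, min_window=None):
    """Contents of a tick-count rolling window after each tick."""
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)  # oldest tick drops out automatically
    out = []
    for v in values:
        window.append(v)
        if len(window) >= min_window:
            out.append(list(window))
    return out


def time_window(ticks, now, interval):
    """Values of (time, value) ticks inside the half-open range (now - interval, now]."""
    return [v for t, v in ticks if now - interval < t <= now]


snapshots = tick_window_snapshots([1, 2, 3, 4, 5], interval=3, min_window=2)
means = [sum(s) / len(s) for s in snapshots]

ticks = [(datetime(2020, 1, d), d) for d in (1, 2, 3)]
recent = time_window(ticks, now=datetime(2020, 1, 3), interval=timedelta(days=2))
```

In the time-based call, the tick on 2020-01-01 sits exactly on the left endpoint of the two-day interval and is therefore excluded.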