Time Slicer Average - electricitymaps/electricitymaps-contrib GitHub Wiki
Model overview
The Time Slicer Average (TSA) is an estimation model that does not require a training phase, as it uses the dataset directly to compute estimates for missing data points. It does this by using neighbouring observations taken from “slices” of time to compute an average, which it uses to fill in the missing data.
Detailed description
The easiest way to illustrate how the model works is with a simple base case. Let’s consider an arbitrary hourly gap in a dataset:
The TSA model looks for nearby data points observed at the same hour of the day as the missing data point, but on different days in the same month.
It then computes the average of these selected points and fills in the gap with that average.
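The base case above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names, the dictionary-based data layout and the toy values are all assumptions made for the example, and timestamps are assumed to be aligned to the hour.

```python
from datetime import datetime
from statistics import mean


def same_hour_same_month(observations, missing_ts):
    """Select observed values at the same hour of day as `missing_ts`,
    taken from other days of the same calendar month.

    `observations` is a hypothetical {datetime: float} mapping of hourly data.
    """
    return [
        value
        for ts, value in observations.items()
        if value is not None
        and (ts.year, ts.month, ts.hour)
        == (missing_ts.year, missing_ts.month, missing_ts.hour)
        and ts.day != missing_ts.day
    ]


def fill_base_case(observations, missing_ts):
    """Return the average of the selected slice, or None if it is empty."""
    selected = same_hour_same_month(observations, missing_ts)
    return mean(selected) if selected else None


# Toy data (made up): the value for 12 March at 14:00 is missing.
observations = {
    datetime(2024, 3, 10, 14): 100.0,
    datetime(2024, 3, 11, 14): 110.0,
    datetime(2024, 3, 13, 14): 120.0,
    datetime(2024, 3, 11, 15): 500.0,  # different hour, ignored
    datetime(2024, 4, 12, 14): 900.0,  # different month, ignored
}

estimate = fill_base_case(observations, datetime(2024, 3, 12, 14))
print(estimate)
```

Only the three 14:00 values from March contribute to the estimate; the 15:00 value and the April value fall outside the selected slice.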
The model applies this process to all missing data. When there aren’t enough points in the selected “slices” of time (because of other gaps in those “slices”), the model progressively iterates over wider windows of time until the selected “slices” contain at least 3 observations within a window. It then uses the average of those observations in the chosen window to fill in the missing data.
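The widening step might look like the sketch below. The minimum of 3 observations comes from the description above; the sequence of window sizes (`window_days`), the symmetric window around the missing timestamp, and the function name are assumptions chosen for illustration, since the actual windows used by the model are not listed here.

```python
from datetime import datetime, timedelta
from statistics import mean

MIN_OBSERVATIONS = 3  # threshold stated in the model description


def fill_with_widening_window(observations, missing_ts, window_days=(7, 14, 30, 60)):
    """Try progressively wider windows until enough same-hour values exist.

    `observations` is a hypothetical {datetime: float} mapping of hourly data.
    `window_days` is an assumed sequence of window half-widths in days.
    Returns (estimate, window_used), or (None, None) if no window suffices.
    """
    for days in window_days:
        half_width = timedelta(days=days)
        selected = [
            value
            for ts, value in observations.items()
            if value is not None
            and ts != missing_ts
            and ts.hour == missing_ts.hour
            and abs(ts - missing_ts) <= half_width
        ]
        if len(selected) >= MIN_OBSERVATIONS:
            return mean(selected), days
    return None, None


# Toy data (made up): only two same-hour values lie within 7 days of the gap.
observations = {
    datetime(2024, 3, 13, 6): 40.0,  # 2 days before
    datetime(2024, 3, 20, 6): 50.0,  # 5 days after
    datetime(2024, 3, 3, 6): 60.0,   # 12 days before
    datetime(2024, 4, 4, 6): 90.0,   # 20 days after
}

result = fill_with_widening_window(observations, datetime(2024, 3, 15, 6))
print(result)
```

In this example the 7-day window yields only two observations, so the search moves on to the 14-day window, where a third value becomes available and the average is taken.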
Note: the same mechanism can be applied if the gap is open-ended. In that case, the average is computed from values preceding the open-ended gap.
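For the open-ended case, a rough sketch is to restrict the selection to observations that precede the start of the gap. The `lookback_days` parameter and the function name are hypothetical, and the minimum-observation check is left out to keep the example short.

```python
from datetime import datetime, timedelta
from statistics import mean


def fill_open_ended_gap(observations, gap_start, gap_hours, lookback_days=30):
    """Estimate each missing hour from earlier values at the same hour of day.

    `observations` is a hypothetical {datetime: float} mapping of hourly data.
    `lookback_days` is an assumed limit on how far back to look.
    """
    earliest = gap_start - timedelta(days=lookback_days)
    estimates = {}
    for offset in range(gap_hours):
        ts = gap_start + timedelta(hours=offset)
        # Only values strictly before the gap are eligible.
        selected = [
            value
            for t, value in observations.items()
            if value is not None and earliest <= t < gap_start and t.hour == ts.hour
        ]
        estimates[ts] = mean(selected) if selected else None
    return estimates


# Toy history (made up): three full days of hourly data, then the data stops.
observations = {
    datetime(2024, 5, day, hour): float(10 * day + hour)
    for day in (1, 2, 3)
    for hour in range(24)
}

estimates = fill_open_ended_gap(observations, datetime(2024, 5, 4, 0), gap_hours=2)
print(estimates)
```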
Detrended extension
Sometimes, the estimates generated by the TSA model are not continuous with the bounds of the data gap. That happens when the data points used to generate the average are not at the same "level" as the bounds of the data gap that needs to be filled. The following is a typical example:
To align, or "detrend", the estimates we use the following procedure,
which ensures that the estimates are continuous with the bounds of the observed gaps.