Forecast Evaluation

Forecasts are evaluated against reported surveillance data to assess their performance across countries and over time.

Currently we consider two evaluation metrics: the Weighted Interval Score (WIS) and the Absolute Error (AE).

Weighted Interval Score

The Weighted Interval Score is a proper scoring rule designed for quantile forecasts. The score decomposes into a component measuring the sharpness of the forecast (the width of its prediction intervals) and penalties for overestimation and underestimation. For a single central $(1-\alpha)$ prediction interval, the interval score is calculated as follows:

$$IS_{\alpha}(F, y) = (u - l) + \frac{2}{\alpha} \times (l-y) \times \mathbb{I}(y < l) + \frac{2}{\alpha} \times (y - u) \times \mathbb{I}(y > u)$$

Where $\mathbb{I}$ is the indicator function, $y$ is the reported value, and $l$ and $u$ are the lower and upper bounds of the prediction interval. For $K$ prediction intervals and the median $m$, the WIS is defined as:

$$WIS_{\alpha_{0:K}}(F, y) = \frac{1}{K + 1/2} \times (w_{0} \times |y - m| + \sum_{k}[w_k \times IS_{\alpha_k}(F, y)])$$

Where $w_k$ is the weight of each prediction interval. Here we set $w_k = \alpha_k / 2$ and $w_0 = 0.5$.
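The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the hub's actual evaluation code: the function names, the dictionary layout mapping each $\alpha_k$ to its $(l, u)$ bounds, and the numbers in the example are all assumptions made for this sketch.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    width = upper - lower  # sharpness term (u - l)
    # Penalty when the reported value falls below the lower bound.
    penalty_below = (2.0 / alpha) * (lower - y) if y < lower else 0.0
    # Penalty when the reported value falls above the upper bound.
    penalty_above = (2.0 / alpha) * (y - upper) if y > upper else 0.0
    return width + penalty_below + penalty_above


def weighted_interval_score(median, intervals, y):
    """WIS from a median and a dict {alpha_k: (lower_k, upper_k)} (hypothetical layout)."""
    k = len(intervals)
    total = 0.5 * abs(y - median)  # w_0 = 1/2 applied to the median term
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)  # w_k = alpha_k / 2
    return total / (k + 0.5)


# Illustrative numbers only: a 50% and a 90% interval around a median of 100.
score = weighted_interval_score(
    median=100,
    intervals={0.5: (90, 110), 0.1: (70, 140)},
    y=120,
)
```

With these illustrative inputs the reported value lies outside the 50% interval but inside the 90% interval, so only the narrower interval contributes a penalty, and `score` comes out at approximately 11.4. With no intervals at all ($K = 0$) the function returns $|y - m|$.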

Absolute Error

The Absolute Error is defined as the absolute difference between the predicted median and the reported surveillance data: $$AE = |A_t - m_t|$$ where $A_t$ is the reported value and $m_t$ is the predicted median at time $t$.
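As a small illustration, the Absolute Error can be computed for a sequence of forecast dates as follows. The function name and the weekly values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def absolute_error(reported, median):
    """Absolute difference between the reported value and the predicted median."""
    return abs(reported - median)


# Hypothetical reported values and predicted medians for three weeks.
reported_values = [120, 150, 90]
predicted_medians = [100, 155, 90]

errors = [absolute_error(a, m) for a, m in zip(reported_values, predicted_medians)]
# errors is [20, 5, 0]
```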

How to Interpret Forecast Evaluation

For both the WIS and the AE we compute the base-2 logarithm of the ratio between the metric of the baseline model and that of the selected model. For the WIS, for example, this is defined as:

$$\log_2(WIS_{baseline} / WIS_{model})$$

Where $WIS_{model}$ is the WIS of the model under consideration and $WIS_{baseline}$ is the WIS of a naive baseline model. An analogous quantity can be computed for the Absolute Error. It follows that positive (negative) values indicate better (worse) performance with respect to the baseline (i.e., the log ratio associated with the baseline model is $0$). Because of the logarithmic transformation, this metric is symmetric around zero: a model whose score is half that of the baseline gets $+1$, while a model whose score is twice that of the baseline gets $-1$, so improvements and deteriorations of the same relative size are represented equally.
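The log ratio and its symmetry can be checked with a short Python sketch. The function name and the scores are hypothetical, and rejecting non-positive scores is an assumption of this sketch, since the logarithm of the ratio is undefined when either score is zero.

```python
import math


def log2_relative_score(baseline_score, model_score):
    """Base-2 log ratio of the baseline score to the model score (WIS or AE)."""
    # Assumption of this sketch: scores must be strictly positive.
    if baseline_score <= 0 or model_score <= 0:
        raise ValueError("scores must be strictly positive")
    return math.log2(baseline_score / model_score)


# Hypothetical scores: lower WIS or AE means a better forecast.
better = log2_relative_score(baseline_score=20.0, model_score=10.0)  # model halves the error
worse = log2_relative_score(baseline_score=10.0, model_score=20.0)   # model doubles the error
same = log2_relative_score(baseline_score=10.0, model_score=10.0)    # baseline against itself
```

Here `better` is `1.0`, `worse` is `-1.0`, and `same` is `0.0`, matching the interpretation given above.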
