Submission Format - european-modelling-hubs/RespiCast-SyndromicIndicators GitHub Wiki
Each forecast should be stored as a comma-separated value (CSV) file in your model-output/team-model folder.
The CSV file must use a standardised file name, and contain specific variable names and values which identify the forecast you are submitting. This allows us to evaluate and compare across forecasts. The automatic check validates both the filename and file contents to ensure the file can be used in the visualization and ensemble forecasting.
File name
Each forecast file within the subdirectory should have the following name format:
YYYY-MM-DD-team-model.csv
The date YYYY-MM-DD is the origin date of the forecast (i.e., last day of submission window). The team and model in this file name must match the name of the model-output directory this file is in (and correspond to the team_abbr and model_abbr parameters in the metadata file).
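The naming rule above can be checked locally before submitting. The following is a minimal sketch, not the hub's actual validator: the function name `check_forecast_path` and the regular expression are assumptions made for illustration, and the check against the metadata file is left out.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Assumed pattern: an ISO date, then "<team>-<model>", then ".csv".
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)\.csv$")


def check_forecast_path(path: str) -> date:
    """Return the origin date if the file name matches its parent folder."""
    p = PurePosixPath(path)
    match = FILENAME_RE.match(p.name)
    if match is None:
        raise ValueError(f"{p.name!r} does not look like YYYY-MM-DD-team-model.csv")
    # date.fromisoformat raises ValueError for impossible dates such as 2023-13-01.
    origin = date.fromisoformat(match.group(1))
    if match.group(2) != p.parent.name:
        raise ValueError(
            f"{match.group(2)!r} does not match folder {p.parent.name!r}"
        )
    return origin
```

For example, `check_forecast_path("model-output/teamA-modelB/2023-12-06-teamA-modelB.csv")` returns the origin date, where `teamA-modelB` is a made-up folder name.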
File format
Required variables
The CSV file must contain only the following columns (in any order). No additional columns are allowed.
column | column type | description |
---|---|---|
origin_date | date | Date as YYYY-MM-DD, last day of submission window (Wednesday) |
target | string | Value must be one of: "ILI incidence", "ARI incidence" |
target_end_date | date | Date in format YYYY-MM-DD: the last day of the target week (Sunday) |
horizon | integer | Week ahead from -1 to 4, i.e. target week of the forecast, starting from the week corresponding to the last ERVISS data update |
location | string | An ISO-2 or ISO 3166-2:GB country code |
output_type | string | One of "quantile" or "median" |
output_type_id | string | When output_type = "quantile", one of the 23 accepted quantiles. When output_type = "median", shall be an empty string |
value | decimal | The forecasted incidence, a non-negative number of new ILI/ARI cases per 100,000 in the target week and output type specified |
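The "exactly these columns, in any order" rule can be expressed as a set comparison on the CSV header. This is a sketch under the assumption that the header is the first line of the file; `check_columns` is a hypothetical helper, not part of the hub's tooling.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "origin_date", "target", "target_end_date", "horizon",
    "location", "output_type", "output_type_id", "value",
}


def check_columns(csv_text: str) -> None:
    """Raise ValueError unless the header holds exactly the required columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if len(header) != len(set(header)):
        raise ValueError("duplicate column names in header")
    missing = REQUIRED_COLUMNS - set(header)
    extra = set(header) - REQUIRED_COLUMNS
    if missing or extra:
        raise ValueError(f"missing: {sorted(missing)}, unexpected: {sorted(extra)}")
```

Comparing sets rather than lists keeps the check independent of column order, which the format explicitly allows.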
Notes on each variable
origin_date
This should match the date in the filename (see above). The date must use the format YYYY-MM-DD and represents the origin date of the forecast (i.e., last day of submission window).
Note: A file with origin_date and target_end_date for each submission round is provided here
target
Values in the target column must be character (string) and must be equal to "ILI incidence" or "ARI incidence".
target_end_date
Values in the target_end_date column must be a date in the format YYYY-MM-DD.
This is the date for the forecast target and is the Sunday that ends the target week. We provide a template CSV to convert between an ISO week and its end date.
Note: A file with origin_date and target_end_date for each submission round is provided here
horizon
Values in the horizon column must be an integer indicating how many weeks ahead the forecast refers to. The horizon is computed relative to the week of the last data update. Consult the forecasting_weeks file for the mapping between each origin_date and the dates its horizons refer to. Starting with forecasting round 11 of the 2023-2024 season, horizons 0 and -1 are also accepted. These horizons are likewise computed relative to the week of the last available ground truth data point for the target.
We use the ISO week format. Each week starts on Monday and ends on Sunday. For more details, see the template CSV file that converts between dates and ISO weeks.
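The ISO week convention (Monday to Sunday) makes it easy to find the Sunday that closes any given week. The sketch below only illustrates that convention; the helper names are hypothetical, and it does not derive horizons from an origin_date, for which the forecasting_weeks file remains the reference.

```python
from datetime import date, timedelta


def week_end(d: date) -> date:
    """Return the Sunday that closes the ISO week containing d."""
    # isoweekday() is 1 for Monday and 7 for Sunday.
    return d + timedelta(days=7 - d.isoweekday())


def is_week_end(d: date) -> bool:
    """True if d is a Sunday, as required for target_end_date."""
    return d.isoweekday() == 7
```

A Sunday maps to itself, so `week_end` can be applied to a target_end_date without changing it.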
location
Values in the location column must be one of the ISO 3166-1 alpha-2 (ISO-2) geocodes for EU countries or an ISO 3166-2:GB extended geocode for UK countries. We provide a geocode file to convert between country names and ISO-2 codes or, if using R, you can use the countrycode package.
output_type
Values in the output_type column must be one of:
- “median”
- “quantile”
This value indicates whether that row corresponds to a median forecast or a quantile forecast. Median forecasts are used in visualisation as point values, while quantile forecasts are used in visualisation and in ensemble construction, as long as all the quantiles given below are present.
Forecasts must include exactly 1 “median” forecast for each unique combination of location, target, and horizon.
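One way to check the "exactly one median" rule is to count median rows per (location, target, horizon) key. This is a sketch that assumes the rows have already been read into dictionaries keyed by column name (for instance with `csv.DictReader`); `median_problems` is a hypothetical helper.

```python
from collections import Counter


def median_problems(rows):
    """Return the (location, target, horizon) keys that lack exactly one median row."""
    all_keys = set()
    medians = Counter()
    for row in rows:
        key = (row["location"], row["target"], row["horizon"])
        all_keys.add(key)
        if row["output_type"] == "median":
            medians[key] += 1
    # A Counter returns 0 for keys it has never seen.
    return sorted(key for key in all_keys if medians[key] != 1)
```

An empty result means every combination present in the file has exactly one median row.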
output_type_id
When output_type is set to “quantile”, then output_type_id must be one of the 23 accepted quantiles in the format "0.###". Teams should provide the following 23 quantiles:
c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
i.e.
0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500
0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
When output_type is set to “median”, output_type_id must be an empty string.
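For teams not working in R, the same 23 levels can be generated in Python and formatted with three decimals. This is a sketch: the names `QUANTILE_IDS` and `is_valid_output_type_id` are hypothetical, and it assumes the three-decimal strings listed above are the expected spelling (whether shorter spellings such as "0.25" are also accepted is not stated here).

```python
# Python counterpart of c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99).
QUANTILE_LEVELS = [0.01, 0.025] + [i / 20 for i in range(1, 20)] + [0.975, 0.99]

# Three-decimal strings such as "0.010" and "0.975".
QUANTILE_IDS = [f"{q:.3f}" for q in QUANTILE_LEVELS]


def is_valid_output_type_id(output_type: str, output_type_id: str) -> bool:
    """Check output_type_id against the rule for the given output_type."""
    if output_type == "median":
        return output_type_id == ""
    if output_type == "quantile":
        return output_type_id in QUANTILE_IDS
    return False
```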
value
Values in this column must be non-negative decimal values.
- For a “median” prediction, value is simply the value of that point prediction for the target, location, and horizon associated with that row.
- For a “quantile” prediction, value is the inverse of the cumulative distribution function (CDF) for the target, location, horizon, and quantile associated with that row.
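To make the inverse-CDF definition concrete, the sketch below evaluates quantile values for a predictive distribution. The normal distribution and its parameters are purely illustrative assumptions, not a recommended forecasting model, and `quantile_values` is a hypothetical helper. Clipping at zero is one simple way to satisfy the non-negativity requirement.

```python
from statistics import NormalDist


def quantile_values(dist: NormalDist, levels):
    """Evaluate the inverse CDF at each level, clipped to be non-negative."""
    return [max(0.0, dist.inv_cdf(p)) for p in levels]


# Illustrative predictive distribution for one target, location, and horizon.
predictive = NormalDist(mu=0.7, sigma=0.2)
levels = [0.025, 0.25, 0.5, 0.75, 0.975]
values = quantile_values(predictive, levels)
```

Because a CDF is non-decreasing, the resulting values never decrease as the quantile level increases, and the value at level 0.5 is the median of the distribution.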
Example
The following shows a few lines from an example CSV file complying with the required format:
origin_date,target,target_end_date,horizon,location,output_type,output_type_id,value
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.975,0.973104
2023-12-05,ILI incidence,2023-12-10,2,IT,quantile,0.975,0.982182
2023-12-05,ILI incidence,2023-12-17,3,IT,quantile,0.975,0.99
2023-12-05,ILI incidence,2023-12-24,4,IT,quantile,0.975,1.084212
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.250,0.5046
2023-12-05,ILI incidence,2023-12-03,1,IT,median,,0.701233
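As a sanity check, rows like those above can be read with Python's `csv` module and tested against the per-column rules. This is a rough sketch rather than the hub's automatic check: `row_problems` is a hypothetical helper, it covers only some of the rules, and malformed numbers or dates raise `ValueError` instead of being reported.

```python
import csv
import io
from datetime import date

# Three rows copied from the example file.
EXAMPLE = """\
origin_date,target,target_end_date,horizon,location,output_type,output_type_id,value
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.975,0.973104
2023-12-05,ILI incidence,2023-12-03,1,IT,quantile,0.250,0.5046
2023-12-05,ILI incidence,2023-12-03,1,IT,median,,0.701233
"""

TARGETS = {"ILI incidence", "ARI incidence"}


def row_problems(row: dict) -> list:
    """Return a list of human-readable problems found in one row."""
    problems = []
    if row["target"] not in TARGETS:
        problems.append("unknown target")
    if not -1 <= int(row["horizon"]) <= 4:
        problems.append("horizon outside -1..4")
    if date.fromisoformat(row["target_end_date"]).isoweekday() != 7:
        problems.append("target_end_date is not a Sunday")
    if row["output_type"] == "median":
        if row["output_type_id"] != "":
            problems.append("median rows need an empty output_type_id")
    elif row["output_type"] != "quantile":
        problems.append("unknown output_type")
    if float(row["value"]) < 0:
        problems.append("negative value")
    return problems


rows = list(csv.DictReader(io.StringIO(EXAMPLE)))
```

Calling `row_problems` on each element of `rows` returns an empty list for all three example rows.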