CRPS Model support implementation design decisions - conrad-blucher-institute/semaphore GitHub Wiki

This page documents the design decisions made for the Semaphore and Flare changes that support CRPS-type models.

Database changes

See also issue 784.

We will create a new outputs table, with an associated TOAST table, for storing outputs. The value column will use the BYTEA data type. We will also remove the ensemble ID column, because the new prediction shape makes it irrelevant.

The data value for a model will be stored in an NDArray with the shape (m, i, o), where m = number of model members, i = number of input vectors, and o = number of outputs per model member run. The shape of the NDArray output for any given model makes explicit which type of model we are dealing with:

Older Models: (1,1,1) - one h5 file, 1 input vector, 1 output per model member run

MRE: (1,100,1) - one h5 file, 100 input vectors, 1 output per model member run

CRPS: (10,100,100) - 10 keras files, 100 input vectors, 100 outputs per model member run
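As an illustration of how an array with shape (members, input vectors, outputs per member run) could be turned into bytes for a BYTEA column and read back, here is a minimal sketch using NumPy. The function names `serialize_output` and `deserialize_output` are hypothetical, and the choice of NumPy's `.npy` format is an assumption; this page does not specify the serialization format Semaphore will use.

```python
import io

import numpy as np


def serialize_output(values: np.ndarray) -> bytes:
    """Pack a 3-D (members, inputs, outputs) array into bytes (hypothetical helper)."""
    if values.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {values.shape}")
    buffer = io.BytesIO()
    # np.save writes a small header holding dtype and shape,
    # so both survive the round trip without extra columns.
    np.save(buffer, values, allow_pickle=False)
    return buffer.getvalue()


def deserialize_output(payload: bytes) -> np.ndarray:
    """Rebuild the array from the stored bytes (hypothetical helper)."""
    return np.load(io.BytesIO(payload), allow_pickle=False)


# Placeholder arrays with the three shapes listed above.
older = np.zeros((1, 1, 1))
mre = np.zeros((1, 100, 1))
crps = np.zeros((10, 100, 100))

restored = deserialize_output(serialize_output(crps))
```

Because the shape travels with the payload, a reader of the row can tell the model type from `restored.shape` alone.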

For models such as the CRPS and MRE models, the full output for any given lead time will be stored in a single row in the outputs table. The database will be configured so that outputs for both the MRE and the CRPS models are stored in the TOAST table (the threshold will be set to just above the space needed to store a single output from any of the (1,1,1) models).
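To get a feel for why the threshold matters, the raw payload size of each shape can be estimated. This is a rough sketch that assumes 8-byte float64 elements (the element type is not stated on this page) and ignores any serialization header or compression; `raw_size_bytes` is a hypothetical helper.

```python
import numpy as np


def raw_size_bytes(shape, dtype=np.float64):
    """Bytes needed for the array data alone (no header, no compression)."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


shapes = {
    "older": (1, 1, 1),
    "mre": (1, 100, 1),
    "crps": (10, 100, 100),
}

# Assumption: float64 elements; a different dtype scales these numbers.
sizes = {name: raw_size_bytes(shape) for name, shape in shapes.items()}
# sizes == {"older": 8, "mre": 800, "crps": 800000}
```

Under these assumptions an MRE payload is about 800 bytes, which is below PostgreSQL's default TOAST threshold of roughly 2 kB, so the MRE outputs would only be moved to the TOAST table if the threshold is lowered as described above.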

Eventually, we will store statistics related to a model run (especially CRPS but possibly MRE as well) so that we do not have to deserialize the entire NDArray to extract values and calculate statistics.
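A sketch of what such precomputed statistics might look like follows. Which statistics will be stored is not decided on this page; the mean, median, and standard deviation here are assumptions, as is the choice to pool values across model members and outputs for each input vector. `summarize_run` is a hypothetical name.

```python
import numpy as np


def summarize_run(values: np.ndarray) -> dict:
    """Per-input-vector statistics for a (members, inputs, outputs) array.

    Pools over the member axis (0) and the output axis (2), leaving one
    value per input vector. The set of statistics is illustrative only.
    """
    pooled_axes = (0, 2)
    return {
        "mean": values.mean(axis=pooled_axes),
        "median": np.median(values, axis=pooled_axes),
        "std": values.std(axis=pooled_axes),
    }


# Synthetic CRPS-shaped data, used only to exercise the function.
rng = np.random.default_rng(seed=0)
crps_like = rng.normal(size=(10, 100, 100))
stats = summarize_run(crps_like)
```

Storing a small summary like this alongside the BYTEA value would let a consumer read the statistics without loading the full array.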

DSpec Changes