Cold Stunning CRPS model - Meeting notes - conrad-blucher-institute/semaphore GitHub Wiki

Overview

CRPS Definition

CRPS stands for Continuous Ranked Probability Score, which measures how well a predicted probability distribution matches an observed value. For our purposes, we need to know that the CRPS model is composed of 10 ensemble members (aka 10 separate .keras files). Each of these 10 members is fed 100 different input vectors, and for each input vector fed to each member, 100 outputs are made. From these 100 outputs, a vertical spread of probable values is made, where each value has the same probability of occurring (each output has a 1 in 100 chance of occurring in this case).

The equations and variables for inputs, outputs, models, and lead times are shown below:

  1. 1 CRPS ensemble model = 10 ensemble members (aka 10 .keras files)
  2. 1 member = 100 outputs per input vector
  3. We receive 100 prototypes from TWC. Each prototype is an input vector of values
  4. Therefore, 100 outputs * 100 input vectors = 10,000 outputs per member
  5. Then, 10,000 outputs per member * 10 members = 100,000 total outputs for a single lead time
  6. Lastly, 100,000 total outputs per lead time * X lead times = the total outputs across all members and lead times
  7. Or in full: X lead times * 10 CRPS members * 100 input vectors * 100 outputs per input vector
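The arithmetic above can be sketched in a few lines of Python; the variable names here are illustrative assumptions, not Semaphore identifiers:

```python
# Minimal sketch of the CRPS output-count arithmetic.
MEMBERS = 10              # .keras files per CRPS ensemble model
INPUT_VECTORS = 100       # prototypes received from TWC
OUTPUTS_PER_VECTOR = 100  # outputs per input vector, per member

outputs_per_member = OUTPUTS_PER_VECTOR * INPUT_VECTORS  # 10,000
outputs_per_lead_time = outputs_per_member * MEMBERS     # 100,000

def total_outputs(lead_times: int) -> int:
    """Total outputs across all members and lead times."""
    return lead_times * MEMBERS * INPUT_VECTORS * OUTPUTS_PER_VECTOR

print(outputs_per_member)     # 10000
print(outputs_per_lead_time)  # 100000
```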

Purpose of the model

The purpose of implementing the CRPS model is to introduce calibrated uncertainty quantification (UQ) for water temperature predictions at all lead times. In simpler terms, instead of saying "The temperature will be 80 degrees," we are trying to say "We are confident the temperature will be between 78-82 degrees." Uncertainty quantification simply means we account for unknown variables and parameters to produce a likely range of possible values instead of predicting an exact value.

How this differs from the current MRE cold stunning models

There are 3 main differences between CRPS and MRE:

  • The models themselves
  • The output of the models
  • The inputs for the models

Addressing the models themselves, each member/model has different weights and biases, meaning the same inputs fed to 2 different models will produce different outputs. Following this, the MRE models have a "multi-run single-model" behavior, where 1 model is run many times. In contrast, the CRPS model has a "multi-run multi-model" behavior, where many models (10 members/models) are each run many times.

For the output differences, the MRE models produce 1 output point per input vector, while the CRPS model produces 100 output points per input vector, per member.

Lastly, for the inputs, CRPS and MRE are actually alike: both use different input vectors per run of the model. For MRE, this means using 100 different input vectors and running the same model 100 times, while for CRPS it means using 100 different input vectors and running each of the 10 members 100 times.
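A hedged sketch of the two run patterns, using stand-in strings rather than real Semaphore model objects:

```python
# Contrast MRE's multi-run single-model with CRPS's multi-run multi-model.
input_vectors = [f"prototype_{i}" for i in range(100)]  # 100 TWC input vectors

# MRE: 1 model, run once per input vector -> 100 runs.
mre_runs = [("mre_model", vector) for vector in input_vectors]

# CRPS: 10 members, each run once per input vector -> 1,000 runs.
crps_members = [f"member_{m}" for m in range(10)]  # 10 .keras files
crps_runs = [(member, vector)
             for member in crps_members
             for vector in input_vectors]

print(len(mre_runs))   # 100
print(len(crps_runs))  # 1000
```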

Member Characteristics

All members of the CRPS model share the same architecture, meaning:

  • All members have the same code
  • All members have the same activation function
  • All members have the same loss function
  • All members have the same inputs

Lastly, each of the 10 ensemble members corresponds to 1 verified time.

Implementation

Inputs

We will need:

  • 24 hours of water and air temperatures ordered from the present to the past (descending order)
  • predicted air temperatures (same count as the lead time)
  • TWC inputs (100 different input vectors/prototypes)

Data Ordering and Validation

TBD.

Database and Architecture Considerations

We need to support multiple keras files per DSPEC. Some possible ways of doing this:

  • Ensemble group ID
  • Member group ID

For storage method considerations we have the options of:

  • storing 1 long row
  • storing multiple rows
  • making a new output table for large ensemble outputs

We are currently leaning towards storing a single run of one CRPS member as 1 row, meaning running all 10 members produces 10 rows.

For what we are actually storing, we have the options of:

  • storing raw predictions
  • storing post-processed predictions

Keras File Compatibility Implementation

See https://github.com/conrad-blucher-institute/semaphore/issues/773.

To support keras files, we have decided to follow the proposed solution below:

    """Private method to load a model saved as an h5 or keras file designated
    in the dspec file using TensorFlow/Keras
    """
    model_folder = construct_true_path(getenv('MODEL_FOLDER_PATH'))
    
    # Check if the path already has an extension
    if modelPath.endswith('.h5') or modelPath.endswith('.keras'):
        model_file_path = model_folder + modelPath
    else:
        # Try .keras first, then fall back to .h5
        keras_path = model_folder + modelPath + '.keras'
        h5_path = model_folder + modelPath + '.h5'
        
        if path.exists(keras_path):
            model_file_path = keras_path
        elif path.exists(h5_path):
            model_file_path = h5_path
        else:
            raise Semaphore_Exception(
                f'Model file for {modelPath} not found! '
                f'Searched for:\n  {keras_path}\n  {h5_path}'
            )
    
    if not path.exists(model_file_path): 
        raise Semaphore_Exception(f'Model file {modelPath} not found at {model_file_path}!')
    
    return load_model(model_file_path, compile=False)

Multi-Member DSPEC Implementation

See https://github.com/conrad-blucher-institute/semaphore/issues/775.

The proposed solution is to add something along the lines of

"ensembleInfo": {
    "isEnsemble": true,
    "memberCount": 30,
    "filePattern": "{modelFileName}_member{i}.keras"
}

to dspec files then refactoring dspecParser and make_prediction() to loop over the member count and load each member.
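A minimal sketch of how that filePattern could be expanded into per-member file names; the parsing helper and the "coldStunning" file name are assumptions, not Semaphore's actual dspecParser code:

```python
# Expand an "ensembleInfo" block into one file name per ensemble member.
import json

dspec_fragment = json.loads("""
{
    "ensembleInfo": {
        "isEnsemble": true,
        "memberCount": 30,
        "filePattern": "{modelFileName}_member{i}.keras"
    }
}
""")

def member_file_names(model_file_name: str, ensemble_info: dict) -> list:
    """Apply filePattern once per member index; single file if not an ensemble."""
    if not ensemble_info.get("isEnsemble", False):
        return [model_file_name]
    pattern = ensemble_info["filePattern"]
    return [pattern.format(modelFileName=model_file_name, i=i)
            for i in range(ensemble_info["memberCount"])]

names = member_file_names("coldStunning", dspec_fragment["ensembleInfo"])
print(names[0])    # coldStunning_member0.keras
print(len(names))  # 30
```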

Multi Model Data Handling Implementation

See https://github.com/conrad-blucher-institute/semaphore/issues/784.

The proposed solution is to back up and reset the database to be empty, starting fresh instead of doing a very large database migration. For actually storing data, we can pickle and unpickle data into a bytea column for efficient storage. This would also require refactoring semaphore's internals to deal with tensors as the data representation.

Some other considerations for this implementation are:

  • what to do with metadata? We only want to pickle the data itself since pickling is a costly operation
  • can a second outputs table be made to support more complex outputs while keeping the old outputs table? This would allow current models to function as is.
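A minimal sketch of the pickle-to-bytea idea, assuming the ensemble outputs are held as a tensor (a NumPy array stands in here; the actual table and column details are omitted):

```python
# Serialize a tensor of ensemble outputs to bytes for a bytea column,
# then restore it on the way back out.
import pickle
import numpy as np

# 10 members x 100 input vectors x 100 outputs for one lead time
outputs = np.zeros((10, 100, 100))

blob = pickle.dumps(outputs, protocol=pickle.HIGHEST_PROTOCOL)  # bytes -> bytea
restored = pickle.loads(blob)                                   # depickle

print(type(blob).__name__)  # bytes
print(restored.shape)       # (10, 100, 100)
```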

Visualizations

Fan/Ribbon Graph

This graph will have

  • 1 line that traces the median of all verified times. That is, the median of the 10,000 outputs per verified time.
  • a light-colored ribbon around the 5th/95th percentiles
  • a dark-colored ribbon around the 25th/75th percentiles
  • a potential NDFD predictions line

These metrics are computed over the 10,000 values for a single verified time.
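The ribbon statistics for one verified time can be sketched as percentile reductions, assuming the 10,000 output values are already collected into one flat array:

```python
# Median line plus the two percentile ribbons for a single verified time.
import numpy as np

rng = np.random.default_rng(0)
outputs = rng.normal(loc=80.0, scale=2.0, size=10_000)  # stand-in temperatures

median = np.percentile(outputs, 50)                    # traced median line
light_lo, light_hi = np.percentile(outputs, [5, 95])   # light-colored ribbon
dark_lo, dark_hi = np.percentile(outputs, [25, 75])    # dark-colored ribbon

print(dark_lo < median < dark_hi)  # True
```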

Box Plot

This graph will have a box plot composed of

  • the median
  • a box around the 25th/75th percentiles
  • fences stretching to the min and max

These metrics are computed over the 10,000 values for a single verified time.

TWC Spaghetti Graph

This graph will have 100 lines for the median of each of the 100 input vectors across all verified times.

Optional Model Member Spaghetti Graph

This graph will have 10 lines where each line represents the median of all outputs for a single member.

Other

References

Older Notes

Jeret Presentation Notes (CRPS_and_NLL_Experiments_Details.pptx):

  • Title: CRPS and NLL Experiments
  • CRPS -> Continuous Ranked Probability Score; NLL -> Negative Log Likelihood. The two use different activation functions.
  • 1 CRPS model = 10 members = 10 keras files
  • 1 NLL ensemble model = 30 members = 30 keras files
  • What makes members a part of a certain ensemble model? Each member of an ensemble model has the same architecture (code, activation function, loss function, same inputs); the difference is the random weights and biases from the model training.
  • Each ensemble model is for one time stamp.
  • Inputs: 24 hrs of water temp and air temp, current to past; predicted air temperature with the same number of values as the lead time. Is this also TWC stuff? -> need more clarification on this.
  • Ordering: it SEEMS everything is in ascending order. We have requested that they double check.
  • CRPS outputs 100 predictions per member; NLL outputs 1 prediction per member (the cold stunning team's slides are more detailed).

Dr. Tissot Notes:

  • "Input vector set" rather than "ensemble input"? "Distribution of input vectors" - since this denotes the togetherness of the input vectors. We perhaps need to come up with more terminology for these things (100 outputs, etc.) for discussion and also for how we store them in the database.
  • Will need to support providing more than one .keras file for one dspec. Ensemble group id and member group id?
  • To change the database architecture or to store the output in one long row? Two different output tables, one with the longer string for the ensemble outputs - is that worth it, or do we do it how we normally do? Then do a report on the projected DB growth.
  • Dr. Tissot thinks we need to stick with single-row output for a single model run, so 30 members would have 30 rows in the database. Something we would want to do is be able to identify whether a model run was part of something, e.g., an ensemble run.
  • Dr. Tissot is also wondering if we should calculate and store some values derived from the model outputs in the semaphore database (storing raw and post-processed predictions). CRPS uses a cron job to calculate the kinds of post-processing that the cold stunning team wants, stored in the semaphore database so that Flare isn't doing too much work when querying data from semaphore. There might be a way to trigger this right after the models run.

Dr. Tissot's dream is to have the spaghetti, the ribbon, and the box plot for this model, but she is unsure if the spaghetti diagram would be too visually busy with the way these models work. Leaning towards the CRPS model unless there are big advantages to the NLL in the semaphore implementation. Cold stunning might have specifications for the visualization of the data vis-à-vis color, etc. Transfer to Flare what's needed to do a box plot. Dr. Tissot will continue to think about the spaghetti.