Getting Started: Training and Forecasting with PARADIS

Welcome! This guide walks you through setting up your environment, training the model, and generating forecasts using PARADIS.

Pre-requisites

The required Python packages can be installed with pip install -r requirements.txt

Obtaining a compatible dataset

Download the original dataset from WeatherBench 2:

cd scripts
bash download_dataset.sh OUTPUT_DIR

where OUTPUT_DIR is the destination directory. Then preprocess the downloaded data (the script path below is relative to the repository root):

python scripts/preprocess_weatherbench_data.py -i /path/to/ERA5/5.625deg_wb2 -o /path/to/ERA5/5.625deg

Running PARADIS

Training and forecasting are both controlled by the configuration file in the config/ directory, which lists the default hyperparameters and options together with a short description of each.
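
The commands in the following sections change these defaults by passing dotted key=value pairs on the command line. As a rough mental model only, such overrides can be pictured as writes into a nested dictionary. The helper apply_overrides and the default values below are hypothetical and not part of PARADIS; the actual parsing and type conversion are handled by the project's own configuration machinery.

```python
def apply_overrides(config, overrides):
    """Apply dotted key=value overrides to a nested dict (illustrative only)."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections on demand
            node = node.setdefault(name, {})
        # Values stay as strings here; a real config system would convert types
        node[leaf] = raw_value
    return config


# Hypothetical defaults, standing in for the contents of the config file
defaults = {"compute": {"batch_size": 1, "use_amp": False}}
config = apply_overrides(
    defaults, ["compute.batch_size=32", "training.max_epochs=30"]
)
print(config)
```

Each dot in a key selects one level of nesting, so compute.batch_size=32 changes the batch_size entry of the compute section and leaves the other entries of that section untouched.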

Training

An example script that launches a low-resolution training run:

# Define the dataset path
root_dir=PATH/TO/DATASET
python train.py \
    dataset.root_dir="${root_dir}" \
    dataset.n_time_inputs=2 \
    compute.batch_size=32 \
    compute.use_amp=True \
    compute.num_devices=1 \
    compute.num_workers=10 \
    training.log_every_n_steps=10 \
    training.print_losses=False \
    training.max_epochs=30 \
    training.dataset.start_date=2010-01-01 \
    training.dataset.end_date=2015-12-31 \
    training.validation_dataset.start_date=2020-01-01 \
    training.validation_dataset.end_date=2020-12-31 \
    training.optimizer.lr=3e-3 \
    training.scheduler.wsd.warmup=0.1 \
    training.scheduler.wsd.decay=0.2 \
    training.loss_function.type=reversed_huber \
    normalization.standard=true
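
The options training.scheduler.wsd.warmup and training.scheduler.wsd.decay suggest a warmup-stable-decay (WSD) learning-rate schedule. The sketch below is not the PARADIS implementation. It assumes that both values are fractions of the total number of training steps and that the warmup and decay ramps are linear; check the configuration file for the actual definition.

```python
def wsd_learning_rate(step, total_steps, peak_lr=3e-3, warmup=0.1, decay=0.2):
    """Warmup-stable-decay schedule.

    Assumptions (not confirmed by the PARADIS source): `warmup` and `decay`
    are fractions of `total_steps`, and both ramps are linear.
    """
    warmup_steps = int(warmup * total_steps)
    decay_steps = int(decay * total_steps)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return peak_lr * step / warmup_steps
    if step < decay_start or decay_steps == 0:
        # Stable phase: hold the peak learning rate
        return peak_lr
    # Linear ramp from the peak down to 0 at the final step
    return peak_lr * (total_steps - step) / decay_steps


for step in (0, 50, 100, 500, 900, 1000):
    print(step, wsd_learning_rate(step, total_steps=1000))
```

Under these assumptions, a 1000-step run ramps up over the first 100 steps, holds 3e-3 until step 800, and decays over the last 200 steps.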

For faster training at low resolutions, you may set the options training.dataset.preload=True and training.validation_dataset.preload=True to keep the dataset in CPU memory and avoid frequent disk reads.
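
Whether preloading is feasible depends on how much CPU memory the selected date range needs. The back-of-the-envelope estimate below assumes 6-hourly samples on a 32 x 64 grid (5.625 degrees) stored as 32-bit floats. The channel count n_channels=80 is a placeholder; replace it with the number of variables and levels in your preprocessed dataset.

```python
from datetime import date


def preload_size_gib(start, end, n_channels, n_lat=32, n_lon=64,
                     samples_per_day=4, bytes_per_value=4):
    """Rough memory footprint, in GiB, of a dataset held fully in memory."""
    n_days = (end - start).days + 1  # both end points included
    n_values = n_days * samples_per_day * n_channels * n_lat * n_lon
    return n_values * bytes_per_value / 2**30


# Training period used in the example above; n_channels is a placeholder
train_gib = preload_size_gib(date(2010, 1, 1), date(2015, 12, 31), n_channels=80)
print(f"Estimated training set size: {train_gib:.1f} GiB")
```

The estimate covers the raw arrays only; data loader workers and intermediate copies can add to the actual memory use.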

Forecasting

The following script generates forecasts for the year 2020 with the model trained above and stores the results in results/forecast.zarr. It requires a checkpoint produced by the training step.

root_dir=PATH/TO/DATASET
checkpoint_path=PATH/TO/CHECKPOINT
python forecast.py \
    dataset.root_dir="${root_dir}" \
    dataset.n_time_inputs=2 \
    model.forecast_steps=1 \
    compute.use_amp=True \
    compute.num_devices=1 \
    compute.num_workers=5 \
    forecast.enable=true \
    forecast.start_date=2020-01-01T00:00:00 \
    forecast.end_date=2020-12-31T00:00:00 \
    forecast.output_file='results/forecast.zarr' \
    training.dataset.start_date=2020-01-01 \
    training.dataset.end_date=2021-01-02 \
    init.checkpoint_path="${checkpoint_path}"

Visualizing Results & Post-Processing

A notebook with detailed steps for computing the root-mean-square error (RMSE) is available in the scripts directory.
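
For orientation, the metric itself can be sketched in a few lines. The example below is independent of the notebook and uses only the standard library. It assumes a single 2-D field on a regular latitude-longitude grid and applies cosine-of-latitude area weights, a common convention for global fields that the notebook may or may not follow. The function name weighted_rmse and the sample values are illustrative.

```python
import math


def weighted_rmse(forecast, truth, latitudes_deg):
    """RMSE over a lat-lon field with cos(latitude) area weights.

    `forecast` and `truth` are lists of rows, one row per latitude.
    """
    weighted_sq_err = 0.0
    weight_sum = 0.0
    for f_row, t_row, lat in zip(forecast, truth, latitudes_deg):
        # Grid cells shrink toward the poles, so weight each row by cos(lat)
        w = math.cos(math.radians(lat))
        for f, t in zip(f_row, t_row):
            weighted_sq_err += w * (f - t) ** 2
            weight_sum += w
    return math.sqrt(weighted_sq_err / weight_sum)


# Tiny made-up temperature fields (K) on a 3 x 2 grid
lats = [-45.0, 0.0, 45.0]
truth = [[280.0, 281.0], [300.0, 301.0], [279.0, 278.0]]
forecast = [[281.0, 282.0], [301.0, 302.0], [280.0, 279.0]]
print(weighted_rmse(forecast, truth, lats))
```

With these weights, an error near the equator contributes more to the score than an error of the same size at high latitudes.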