Post processing for concatenating model outputs along spatial dimensions - uaf-arctic-eco-modeling/dvm-dos-tem GitHub Wiki

For a "poor man's parallelization" we can write a script that modifies the run mask and starts many model runs, each with a different subset of the run mask enabled. One challenge with this approach is that we end up with many model output files that need to be joined together to fill in the entire output grid. For example, if we had a 10x10 grid cell run and broke it down into 25 separate 2x2 runs, we would end up with 25 directories of output, each with a different 2x2 region of the grid filled in. In this workflow, we will set up the experiment to

  1. generate some sample outputs
  2. work on developing the process of combining the output files
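
To make the tiling idea from the introduction concrete, here is a minimal Python sketch of how a driver script might enumerate the sub-regions. The function name `tile_bounds` is hypothetical and not part of dvm-dos-tem; it only computes index ranges and does not touch any run mask.

```python
def tile_bounds(ny, nx, tile_y, tile_x):
    """Yield (y_start, y_stop, x_start, x_stop) for each tile of an ny-by-nx grid.

    Stops are exclusive, so range(y_start, y_stop) lists the rows in a tile.
    Tiles on the bottom/right edge are clipped if the grid does not divide evenly.
    """
    for y0 in range(0, ny, tile_y):
        for x0 in range(0, nx, tile_x):
            yield (y0, min(y0 + tile_y, ny), x0, min(x0 + tile_x, nx))


# The 10x10 example from the text, split into 2x2 runs.
tiles = list(tile_bounds(10, 10, 2, 2))
print(len(tiles))   # number of separate runs
print(tiles[0])     # first tile (upper left)
print(tiles[-1])    # last tile (lower right)
```

For the 10x10 grid this gives 25 tiles. Each tuple could then drive one set of run-mask edits and one output directory.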

Setup

We will run this on atlas, although it could be done on a smaller machine. First sign in to atlas and set up your environment so you can build and run dvmdostem:

$ ssh -X atlas
$ cd /atlas_scratch/tcarman2/dvm-dos-tem
tcarman2@atlas [output-utils-and-plots]$ source env-setup-scripts/setup-env-for-atlas.sh

Next checkout the desired branch for your experiment and compile the code:

tcarman2@atlas [output-utils-and-plots]$ git checkout master
tcarman2@atlas [master]$ make -j16

Next make some directories for the experiments. In this case we are going to do 2 runs, one with the upper left 3x3 pixels enabled in the run mask and the other with the lower right 2x2 pixels enabled in the run mask. So I will make 2 directories for keeping the outputs:

tcarman2@atlas [master]$ mkdir ../upper-left-outputs
tcarman2@atlas [master]$ mkdir ../lower-right-outputs

Next, I will make two "staging" directories where I can keep the custom run masks for each run. I am not sure if this is strictly necessary, but I will do it to be sure that the two runs don't conflict with each other.

tcarman2@atlas [master]$ mkdir ../staging-upper-left-outputs
tcarman2@atlas [master]$ mkdir ../staging-lower-right-outputs

Now copy the run mask into the staging directories:

tcarman2@atlas [master]$ cp DATA/Toolik_10x10/run-mask.nc ../staging-upper-left-outputs
tcarman2@atlas [master]$ cp DATA/Toolik_10x10/run-mask.nc ../staging-lower-right-outputs

Now I am going to modify each of these copied run masks to get what I want. First reset each mask to all zeros, then we'll turn back on the pixels we want.

tcarman2@atlas [master]$ ./scripts/runmask-util.py --reset ../staging-upper-left-outputs/run-mask.nc
tcarman2@atlas [master]$ ./scripts/runmask-util.py --reset ../staging-lower-right-outputs/run-mask.nc

tcarman2@atlas [master]$ for y in $(seq 0 2); do for x in $(seq 0 2); do ./scripts/runmask-util.py --yx $y $x ../staging-upper-left-outputs/run-mask.nc; done; done;
tcarman2@atlas [master]$ for y in $(seq 8 9); do for x in $(seq 8 9); do ./scripts/runmask-util.py --yx $y $x ../staging-lower-right-outputs/run-mask.nc; done; done;

We'd better check to see what we've done:

tcarman2@atlas [master]$ ./scripts/runmask-util.py --show ../staging-lower-right-outputs/run-mask.nc 
========== BEFORE ==================================
** Keep in mind that in this display the origin is the upper 
** left of the grid! This is opposite of the way that ncdump 
** and ncview display data (origin is lower left)!!

'../staging-lower-right-outputs/run-mask.nc'
<type 'netCDF4._netCDF4.Variable'>
int64 run(Y, X)
unlimited dimensions: 
current shape = (10, 10)
filling on, default _FillValue of -9223372036854775806 used

[[0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]]
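
As a cross-check on the display above, the following Python sketch rebuilds the masks that the two shell loops were meant to produce and confirms that no pixel is enabled in both. It works on plain nested lists rather than the netCDF files, and `expected_mask` is a hypothetical helper, not part of runmask-util.py.

```python
def expected_mask(ny, nx, ys, xs):
    """Return an ny-by-nx nested list with 1 at every (y, x) in ys x xs, else 0."""
    rows = [[0] * nx for _ in range(ny)]
    for y in ys:
        for x in xs:
            rows[y][x] = 1
    return rows


# `seq 0 2` covers 0, 1, 2 and `seq 8 9` covers 8, 9.
upper_left = expected_mask(10, 10, range(0, 3), range(0, 3))
lower_right = expected_mask(10, 10, range(8, 10), range(8, 10))

# Print in the same orientation as --show (origin at the upper left).
for row in lower_right:
    print("[" + " ".join(str(v) for v in row) + "]")

# Count pixels that are enabled in both masks; this should be zero.
overlap = sum(a & b
              for row_a, row_b in zip(upper_left, lower_right)
              for a, b in zip(row_a, row_b))
print("overlapping pixels:", overlap)
```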

Looks pretty good. Now we need to modify the config file and start a run, and then modify the config file again and start the second run. We'll start with the lower right:

Starting the runs

tcarman2@atlas [master]$ vim config/config.js
# Set the lines for run mask and the output directory accordingly:
"runmask_file":       "/atlas_scratch/tcarman2/staging-lower-right-outputs/run-mask.nc",
"output_dir":         "/atlas_scratch/tcarman2/lower-right-outputs/",
# Save and QUIT
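
If the config edits are to be scripted rather than done by hand in vim, one possible approach is a plain text substitution on the two keys. The sketch below assumes each key appears as `"key": "value"` in the file. It does not parse the file as JSON, since that would only work if the file is strict JSON. The helper name `set_config_value` and the sample text are hypothetical.

```python
import re


def set_config_value(config_text, key, new_value):
    """Replace the quoted string value of the first `"key": "..."` entry.

    Spacing after the colon is kept as-is. Raises KeyError if the key is absent.
    """
    pattern = re.compile(r'("' + re.escape(key) + r'"\s*:\s*)"[^"]*"')
    new_text, count = pattern.subn(
        lambda m: m.group(1) + '"' + new_value + '"', config_text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


# Hypothetical excerpt standing in for the real config file contents.
sample = '''
  "runmask_file":       "some/old/run-mask.nc",
  "output_dir":         "some/old/output/",
'''

updated = set_config_value(
    sample, "runmask_file",
    "/atlas_scratch/tcarman2/staging-upper-left-outputs/run-mask.nc")
updated = set_config_value(
    updated, "output_dir", "/atlas_scratch/tcarman2/upper-left-outputs/")
print(updated)
```

In a real driver script the text would be read from and written back to config/config.js. Keep a backup copy of the file before overwriting it.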

OK, next up is submitting a job to Slurm. For this we'll write a script and submit it with sbatch. Here is the script:

tcarman2@atlas [master]$ cat srunner.sh
#!/bin/bash -l

# Job name, for clarity
#SBATCH --job-name="pmp-lower-right"

# Reservation
#SBATCH --reservation=snap_8 

# Partition specification
#SBATCH -p main

# Number of MPI tasks
#SBATCH -n 1

echo $SBATCH_RESERVATION
echo $SLURM_JOB_NODELIST

# Load up my custom paths stuff
module purge
module load jsoncpp/1.8.1-foss-2016a netCDF/4.4.0-foss-2016a Boost/1.55.0-foss-2016a-Python-2.7.11
module load netcdf4-python/1.2.2-foss-2016a-Python-2.7.11

mpirun -n 1 ./dvmdostem -l disabled --max-output-volume 25GB -p 100 -e 1000 -s 250 -t 109 -n 91

Pretty straightforward. Next we simply submit that with sbatch as follows:

tcarman2@atlas [master]$ sbatch srunner.sh
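
When there are many sub-regions, the job script can be generated per run instead of being edited by hand. The following is a minimal sketch that fills in only the job name. Everything else is copied from srunner.sh above. The names `SRUNNER_TEMPLATE` and `render_srunner` and the job name "pmp-upper-left" are hypothetical.

```python
# Template copied from srunner.sh; only the job name is a placeholder.
SRUNNER_TEMPLATE = """#!/bin/bash -l

#SBATCH --job-name="{job_name}"
#SBATCH --reservation=snap_8
#SBATCH -p main
#SBATCH -n 1

echo $SBATCH_RESERVATION
echo $SLURM_JOB_NODELIST

module purge
module load jsoncpp/1.8.1-foss-2016a netCDF/4.4.0-foss-2016a Boost/1.55.0-foss-2016a-Python-2.7.11
module load netcdf4-python/1.2.2-foss-2016a-Python-2.7.11

mpirun -n 1 ./dvmdostem -l disabled --max-output-volume 25GB -p 100 -e 1000 -s 250 -t 109 -n 91
"""


def render_srunner(job_name):
    """Return the text of a job script with the given Slurm job name."""
    if '"' in job_name or "\n" in job_name:
        raise ValueError("job name must not contain quotes or newlines")
    return SRUNNER_TEMPLATE.format(job_name=job_name)


script = render_srunner("pmp-upper-left")
print(script.splitlines()[2])  # the --job-name line
```

The rendered text would be written to a file and passed to sbatch exactly as shown above. The config file still has to point at the matching run mask and output directory before each submission.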

And we can check on the job using squeue or by using the "years since disturbance monitoring" script:

tcarman2@atlas [master]$ squeue
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       3812271      main     bash malindgr  R 2-02:55:46      1 atlas09
       3812272      main     bash malindgr  R 1-16:13:48      1 atlas10
       3812277      main pmp-lowe tcarman2  R       1:22      1 atlas01

To use the monitoring script we have to set the directory so it knows where to look for outputs, specifically the YSD file. This is what it looks like when the run is done:

tcarman2@atlas [master]$ watch ./scripts/ysdmon.py --setd /atlas_scratch/tcarman2/lower-right-outputs/
Every 2.0s: ./scripts/ysdmon.py --setd /atlas_scratch/tcarman2/lower-right-outputs/                                                                                                                     Fri Feb 16 14:15:52 2018

Opening dataset:  /atlas_scratch/tcarman2/lower-right-outputs/YSD_yearly_eq.nc
[[-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- 1100 1100]
 [-- -- -- -- -- -- -- -- 1100 1100]]
Opening dataset:  /atlas_scratch/tcarman2/lower-right-outputs/YSD_yearly_sp.nc
[[-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- 1350 1350]
 [-- -- -- -- -- -- -- -- 1350 1350]]
Opening dataset:  /atlas_scratch/tcarman2/lower-right-outputs/YSD_yearly_tr.nc
[[-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- 1459 1459]
 [-- -- -- -- -- -- -- -- 1459 1459]]
Opening dataset:  /atlas_scratch/tcarman2/lower-right-outputs/YSD_yearly_sc.nc
[[-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- -- --]
 [-- -- -- -- -- -- -- -- 1550 1550]
 [-- -- -- -- -- -- -- -- 1550 1550]]

Looks good. When you have the first run going, you can repeat it for the other corner of the grid using the other staging and output directories that we created.

Saving the outputs

When the runs are done, I move the outputs over to the /workspace... directory to free up room on /atlas_scratch. Before doing that, I copy run-mask.nc and the input vegetation.nc into each output directory. We've found these files handy to have around for plotting, and they make it less ambiguous where the output data came from:

tcarman2@atlas [master]$ cp DATA/Toolik_10x10/vegetation.nc ../lower-right-outputs/
tcarman2@atlas [master]$ cp DATA/Toolik_10x10/vegetation.nc ../upper-left-outputs/

tcarman2@atlas [master]$ cp ../staging-lower-right-outputs/run-mask.nc ../lower-right-outputs/
tcarman2@atlas [master]$ cp ../staging-upper-left-outputs/run-mask.nc ../upper-left-outputs/

tcarman2@atlas [master]$ mkdir /workspace/Shared/Tech_Projects/dvmdostem/poor_man_parallel_test

tcarman2@atlas [master]$ cp -r ../upper-left-outputs /workspace/Shared/Tech_Projects/dvmdostem/poor_man_parallel_test/
tcarman2@atlas [master]$ cp -r ../lower-right-outputs /workspace/Shared/Tech_Projects/dvmdostem/poor_man_parallel_test/

Post-processing

Need to try using the NCO tools (netCDF Operators) to stitch the files together...
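
Whatever tool ends up doing the stitching, the underlying operation is a merge of grids that each have data in a different region and nothing elsewhere. The sketch below shows that merge on plain nested lists, with `None` standing in for the masked ("--") cells. It does not read the netCDF files, and the helper names and the upper-left values are stand-ins, not real model output.

```python
def stitch(tiles):
    """Merge equally shaped 2-D grids (nested lists) into one grid.

    None marks a cell with no data. Raises ValueError if shapes differ or
    if two tiles both hold data for the same cell.
    """
    ny, nx = len(tiles[0]), len(tiles[0][0])
    merged = [[None] * nx for _ in range(ny)]
    for tile in tiles:
        if len(tile) != ny or any(len(row) != nx for row in tile):
            raise ValueError("all tiles must cover the full grid shape")
        for y in range(ny):
            for x in range(nx):
                value = tile[y][x]
                if value is None:
                    continue
                if merged[y][x] is not None:
                    raise ValueError(
                        "cell (%d, %d) filled by more than one tile" % (y, x))
                merged[y][x] = value
    return merged


def filled_block(ny, nx, ys, xs, value):
    """Build a stand-in output grid: `value` inside ys x xs, None elsewhere."""
    grid = [[None] * nx for _ in range(ny)]
    for y in ys:
        for x in xs:
            grid[y][x] = value
    return grid


# Stand-ins for one output variable from each of the two runs.
upper_left = filled_block(10, 10, range(0, 3), range(0, 3), 1100)
lower_right = filled_block(10, 10, range(8, 10), range(8, 10), 1100)

merged = stitch([upper_left, lower_right])
for row in merged:
    print(" ".join("--" if v is None else str(v) for v in row))
```

Raising an error on overlap is a deliberate choice here: if two runs ever wrote to the same pixel, that most likely means a run mask was set up incorrectly, and silently picking one value would hide the problem.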