3 Running your first scenario permutation - jayfuhrman/gcam-hpc-starterpack GitHub Wiki

To run parallelized GCAM scenario permutations, we use two XML file types:

1. base configuration file

This file specifies paths to a base set of input files, as well as general model settings (e.g., how many model periods to run) to be read in for ALL scenarios in your permutation.

2. permutation file

This file specifies groups (ComponentSets) of additional sets of input files (FileSets) to be read in "on top of" your base configuration file. Each combination of one FileSet from every ComponentSet becomes its own scenario. For example, to run a sensitivity analysis of high and low solar PV costs, create two FileSets whose sub-elements contain paths to the modified input files for high-cost and low-cost solar PV. These FileSets should in turn be sub-elements of a ComponentSet for SolarPV, which contains all the FileSets you want (in this case 2, for a total of 2 scenarios). If this is confusing, an example file, "batch_exemplar.xml", is provided in the configuration-sets directory.

You can also create an additional ComponentSet which contains, for example, high, medium, and low costs for offshore wind. This allows you to permute all combinations of cost assumptions for SolarPV (2) and offshore wind (3), for a total of 2 × 3 = 6 scenarios. You can add even more ComponentSets, but note that the number of scenarios is the product of the FileSet counts in each ComponentSet, so it grows quickly. Think carefully about the design of your permutations to keep the number of simultaneous runs down to what you actually need. If you want to do 10 or fewer runs, this is generally not an issue, but Rivanna is a shared computing resource, so larger sets of simultaneous runs may spend a longer time in the queue before they start, or error out altogether with a message like: "requested node configuration is not available". This generally indicates you should reduce the number of simultaneous runs in your permutation.
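The way ComponentSets multiply can be sketched in a few lines of Python. The ComponentSet and FileSet names below are hypothetical placeholders (they are not taken from batch_exemplar.xml), and joining the names with underscores is only for display, not a claim about how the scripts name scenarios.

```python
from itertools import product
from math import prod

# Hypothetical ComponentSets, each mapped to the names of its FileSets.
component_sets = {
    "SolarPV": ["solar_high", "solar_low"],
    "OffshoreWind": ["wind_high", "wind_med", "wind_low"],
}


def enumerate_scenarios(sets):
    """Return one tuple per scenario: one FileSet picked from each ComponentSet."""
    return list(product(*sets.values()))


def count_scenarios(sets):
    """The scenario count is the product of the FileSet counts."""
    return prod(len(file_sets) for file_sets in sets.values())


if __name__ == "__main__":
    for combo in enumerate_scenarios(component_sets):
        print("_".join(combo))
    print(count_scenarios(component_sets), "scenarios")
```

Adding a third ComponentSet with, say, 4 FileSets would multiply the total again, to 24 runs, which is why it pays to count before submitting.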

In this repository's configuration-sets directory, two additional files are provided.

To run the GCAM reference scenario, use configuration_empty_scenario_components.xml as your base configuration file and reference_batch.xml as your permutation file (even though in this case you will only be doing one run). This setup should work as of GCAM version 5.3, but you can adapt it to other versions by copying the input file paths from the ScenarioComponents in exe/configuration_ref.xml into the reference_batch.xml file provided here.

3. Starting a new set of runs from the command line

You can start your first run from a terminal window:

cd GCAM-core

./master.sh configuration-sets/[your base configuration file] configuration-sets/[your permutation file]

This starts a program that walks you through generating a separate configuration file for each scenario and confirming the number of scenarios you're running (generally, hit "y" at each step). At this stage it is a good idea to make sure the number of runs you expect matches the number the program reports. If not, you can cancel (Ctrl+C) and modify your permutation file. After you've confirmed "y" at each step, the program will copy all the files you need to your "scratch" directory and begin the job.
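One way to double-check the scenario count before answering "y" is to count the FileSets in each ComponentSet of your permutation file yourself. The sketch below assumes only that FileSet elements are direct children of ComponentSet elements, as described above. The root element name, the name attributes, and the sample XML are hypothetical, so compare against batch_exemplar.xml before relying on it.

```python
import xml.etree.ElementTree as ET
from math import prod

# Hypothetical permutation file contents; a real file will look different.
SAMPLE_XML = """
<Permutation>
  <ComponentSet name="SolarPV">
    <FileSet name="solar_high"/>
    <FileSet name="solar_low"/>
  </ComponentSet>
  <ComponentSet name="OffshoreWind">
    <FileSet name="wind_high"/>
    <FileSet name="wind_med"/>
    <FileSet name="wind_low"/>
  </ComponentSet>
</Permutation>
"""


def expected_scenarios(root):
    """Multiply the number of FileSet children across all ComponentSets."""
    counts = [len(cs.findall("FileSet")) for cs in root.iter("ComponentSet")]
    return prod(counts) if counts else 0


if __name__ == "__main__":
    # For a real file, replace the next line with:
    #   root = ET.parse("configuration-sets/[your permutation file]").getroot()
    root = ET.fromstring(SAMPLE_XML)
    print(expected_scenarios(root), "scenarios expected")
```

If the number printed here differs from what the program reports, trust neither until you have looked at the permutation file again.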

If everything works, the script will output the message "we're off and running with job #########".

After a few minutes, you may want to confirm that the runs have started and all files have been read in successfully. Check for a set of new "exe_n" directories in scratch (each containing an individual scenario) and/or open the "output_n.txt" file inside one of them. Note the difference in directory paths to access scratch. If the model errored out because of a mistyped file path, it's a lot better to find out now than the next morning, when you were hoping to have results to process!
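Checking each exe_n directory by hand gets tedious with many scenarios. A minimal sketch of automating the check follows. It assumes that each exe_n directory holds a file matching output_*.txt and that a failed read shows up as a line containing the word "error" (case-insensitive). Both the file pattern and the keyword are assumptions, so adjust them to what your output files actually contain.

```python
import sys
from pathlib import Path


def check_runs(scratch_dir, keyword="error"):
    """Map each exe_* directory name to 'missing output', 'flagged', or 'ok'."""
    status = {}
    for run_dir in sorted(Path(scratch_dir).glob("exe_*")):
        if not run_dir.is_dir():
            continue
        outputs = list(run_dir.glob("output_*.txt"))
        if not outputs:
            # The run may not have started yet, or failed before writing output.
            status[run_dir.name] = "missing output"
            continue
        text = "".join(p.read_text(errors="replace") for p in outputs)
        status[run_dir.name] = "flagged" if keyword in text.lower() else "ok"
    return status


if __name__ == "__main__":
    # Pass the path to your scratch directory as the first argument;
    # with no argument, the current directory is scanned.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, state in check_runs(target).items():
        print(name, state)
```

A "flagged" result only means the keyword appeared somewhere in the output, so open the file to see what actually happened.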

After a few hours (or overnight) the runs should have completed, and any queries you specified will be written into separate CSV files in each exe_n directory. In the meantime, you can check the status of your model runs on Rivanna using the command squeue -u [your computing id]

Note that each time you do a new scenario permutation, all exe_n directories will get overwritten by your new runs. If you want to come back to the data from your current run later for further processing, you may want to create an "archive" or "old_runs" directory in scratch and cut and paste all of the exe_n directories you want to keep there to "hide" them from being overwritten by your new runs (this may be easier to do in FastX). These directories have many GB of data in each one, so "cutting" as opposed to copying and pasting into your archive folder will be much faster.
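If you prefer a script to cutting and pasting, the same archiving step can be sketched in Python. The function name and the default "old_runs" directory name are illustrative choices, and the sketch assumes the archive directory lives inside scratch, so that each move stays on one filesystem and is a fast rename rather than a copy.

```python
import shutil
from pathlib import Path


def archive_runs(scratch_dir, archive_name="old_runs"):
    """Move every exe_* directory in scratch_dir into scratch_dir/archive_name."""
    scratch = Path(scratch_dir)
    archive = scratch / archive_name
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for run_dir in sorted(scratch.glob("exe_*")):
        if not run_dir.is_dir():
            continue
        target = archive / run_dir.name
        if target.exists():
            # Refuse to mix a new run into a previously archived one.
            raise FileExistsError(f"{target} already exists; choose another archive_name")
        # Within a single filesystem this is a rename, so no data is copied.
        shutil.move(str(run_dir), str(target))
        moved.append(run_dir.name)
    return moved


# Example call (the path and archive name are placeholders):
#   archive_runs("/path/to/your/scratch", archive_name="old_runs/solar_sensitivity")
```

Giving each batch its own archive_name keeps runs from different permutations apart, since every permutation reuses the same exe_n directory names.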