Performance Notes - ESCOMP/CTSM GitHub Wiki

CTSM Performance/Cost

Keith ran a set of simulations with ctsm1.0.dev105 in July 2020 to obtain cost and performance numbers. The results are shown below.

CTSM1.0.DEV105 Cost and TOT Run Time

CLM5 Performance/Cost as a function of NTASKS

In April 2018, Keith Oleson analyzed the cost and performance of CLM5 for various compsets and resolutions (1deg and 4x5). Three ensembles of runs were conducted at 1deg and one ensemble at 4x5. Here are the results of that analysis.

The first table below shows Cost and LND run time for various configurations and resolutions at the default setting for NTASKS (60 for 1deg, 50 for 2deg, 16 for 4x5). One ensemble member was run for each.

The second table/plot shows Cost and TOT (total), ATM, LND run time for the BGC/Crop configuration at 1deg for various NTASKS. The values shown are averages over three ensembles of runs.

The third table/plot shows Cost and TOT (total), ATM, LND run time for the BGC configuration at 4x5 for various NTASKS (one ensemble member).
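As a rough illustration of how cost relates to NTASKS and run time, the sketch below assumes that Cost is expressed in pe-hrs per simulated year and that run time is given in seconds per model day on a 365-day calendar. The function names and the sample numbers are hypothetical; they are not taken from the tables on this page.

```python
SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365  # assumption: no-leap calendar


def cost_pe_hrs_per_year(ntasks, sec_per_mday, nthreads=1):
    """Approximate cost in pe-hrs per simulated year.

    Assumes cost = processing elements * wallclock hours per simulated year.
    """
    wallclock_hours_per_year = sec_per_mday * DAYS_PER_YEAR / SECONDS_PER_HOUR
    return ntasks * nthreads * wallclock_hours_per_year


def throughput_years_per_day(sec_per_mday):
    """Simulated years per wallclock day for a given run time in sec/mday."""
    return SECONDS_PER_DAY / (sec_per_mday * DAYS_PER_YEAR)


# Hypothetical inputs, for illustration only.
example_cost = cost_pe_hrs_per_year(ntasks=60, sec_per_mday=2.0)
example_throughput = throughput_years_per_day(sec_per_mday=2.0)

# With perfect scaling, doubling NTASKS halves the run time and leaves the
# cost unchanged; in practice run time shrinks by less, so cost rises.
ideal_scaled_cost = cost_pe_hrs_per_year(ntasks=120, sec_per_mday=1.0)

if __name__ == "__main__":
    print(f"cost: {example_cost:.2f} pe-hrs/simulated year")
    print(f"throughput: {example_throughput:.1f} simulated years/day")
```

Under these assumptions, a cost that grows with NTASKS indicates the run time is not falling in proportion to the added tasks.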

Default CLM5 (NTASKS=60) Cost and LND Run time

CLM5 Cost and Performance at 1deg

CLM5 Cost and Performance at 4x5

Cost increases from CLM4 to CLM5

On November 12, 2015, Bill Sacks analyzed the cost increases from CLM4 to what was then the out-of-the-box version of CLM5. Here are the results of that analysis:

In clm4_5_3_r149, CLM4CN took 0.328 sec/mday, whereas the default CLM5 configuration - CLM50%BGC-CROP with CISM1 (glc_mec) - took 4.002 sec/mday. This is an increase of 12.2x. This can be broken down as follows:

(Timings are 'LND Run' times from 20-day, cold start 1850 runs at f09_g16 with no output (REST_OPTION=never, but no other changes, so hbuf is still activated). The runs used 600 tasks and 1 thread, except for datm, which had 30 tasks with ROOTPE_ATM=600. Unless stated otherwise, timings were done from clm4_5_3_r149.)

  • As of clm4_0_60 (clm4.5 first brought to trunk), CLM45CN was 1.89x the cost of CLM4CN. (Note: For the clm4_0_60 run with CLM45CN, I used the surface dataset created in clm4_0_80; earlier f09 surface datasets did not appear to be set up properly for CLM4.5.)

  • The addition of the fire model (in clm4_0_80) made CLM45CN performance significantly worse, mainly due to the cost of reading the lightning stream. This seems to be the main factor responsible for the cost increase between clm4_0_60 and the cesm1.2.0 release (roughly clm4_5_07); however, it may not have been the sole factor. The cost increase for a CLM45CN run between clm4_0_60 and cesm1.2.0 was 1.45x. However, this is not an apples-to-apples comparison, because there were likely Machines and other changes between these two points.

  • Adding the additional memory needed for dynamic landunits increased the cost by about 1.1x for non-crop runs (this was done in clm4_5_43).

    • Update: I think the 1.1x number was generated from a CLM45BGC run, so it may be an overestimate for CLM45CN.

  • As of clm4_5_3_r149, CLM45CN is 2.70x the cost of CLM4CN. This is slightly less than you would get by multiplying the above numbers (1.89*1.45*1.1 = 3.01).

  • As of clm4_5_3_r149, CLM45BGC is a further 1.35x the cost of CLM45CN

  • CLM50BGC is only 1.074x the cost of CLM45BGC, if using the old vertical soil layer structure

  • The new vertical soil layer structure (CLM5 default) increases the cost by 1.31x

  • Adding crop (CLM50%BGC-CROP relative to CLM50%BGC) increases the cost by 2.27x. Much of this is due to 0-weight (inactive) crop columns, which are added for the sake of dynamic landunits. If you just allocate memory for non-zero-weight crop columns, the cost increase due to crop is more like 1.5x.

  • Adding glc_mec increases the cost by 1.045x

  • From a first pass, there do not appear to have been other major contributors to the cost increase. In particular, it appears that the major refactorings done to CLM45 in the last couple of years have NOT had a significant performance impact. (However, my rough analysis could have missed changes, particularly if there were some increases compensated for by other decreases.)

Note that, by construction, multiplying the above factors gives 2.70*1.35*1.074*1.31*2.27*1.045 ≈ 12.2, which is the total cost increase from CLM4CN to the present default configuration.
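The arithmetic in this breakdown can be checked with a short script. The timings and factors below are copied from the text above; the variable names and the "crop without inactive columns" estimate (which swaps the 2.27x crop factor for the rough 1.5x figure quoted above) are illustrative only.

```python
import math

# 'LND Run' timings quoted above, in sec/mday.
CLM4CN_SEC_PER_MDAY = 0.328
CLM5_DEFAULT_SEC_PER_MDAY = 4.002

# Multiplicative cost factors from the list above.
FACTORS = {
    "CLM4CN -> CLM45CN": 2.70,
    "CLM45CN -> CLM45BGC": 1.35,
    "CLM45BGC -> CLM50BGC (old soil layers)": 1.074,
    "new vertical soil layer structure": 1.31,
    "crop": 2.27,
    "glc_mec": 1.045,
}

measured_ratio = CLM5_DEFAULT_SEC_PER_MDAY / CLM4CN_SEC_PER_MDAY
factor_product = math.prod(FACTORS.values())

# Intermediate check: the CLM4CN -> CLM45CN steps multiply to slightly
# more than the 2.70x that was measured directly.
early_product = 1.89 * 1.45 * 1.1

# Rough what-if: replace the crop factor with the ~1.5x quoted for
# allocating memory only for non-zero-weight crop columns.
ROUGH_ACTIVE_ONLY_CROP_FACTOR = 1.5
what_if_product = factor_product / FACTORS["crop"] * ROUGH_ACTIVE_ONLY_CROP_FACTOR

if __name__ == "__main__":
    print(f"measured ratio:     {measured_ratio:.2f}x")
    print(f"product of factors: {factor_product:.2f}x")
    print(f"1.89*1.45*1.1:      {early_product:.2f}x")
    print(f"what-if (crop 1.5): {what_if_product:.2f}x")
```

The product of the rounded factors comes to about 12.17, against a measured ratio of about 12.20; both round to the 12.2x quoted above, and the small gap is rounding in the individual factors.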

List of expected and potentially unavoidable cost increases

(From Dave Lawrence 2015-11-11)

  • Lake model (6x more lake points, though some of these replace wetland; lake+wetland points combined increase by 2.5x)

  • Methane (new model, could it be optimized?)

  • Crop model (new capability will incur a cost, but could potentially be reduced)

  • Nearly 2x more urban columns