The Charm++ runtime system has tools to help analyze parallel performance. The main tool is Projections, whose use is described in the Charm++ manual. A key step for getting Projections data is adding the "--enable-tracing" flag to the charm build command.

Table of Contents
Compiling for Projections
Running for Projections
Running the Projections GUI
Getting Load Balancing Information
Using TAU Performance Analysis with ChaNGa

Compiling for Projections

The ChaNGa "configure" script has an "--enable-projections" option, which adds the link options ChaNGa needs to produce Projections data when it is run.

Running for Projections

When the Projections-capable executable is run, it generates ".log.gz" files, one for each processor, and a ".sts" file. By default, these files end up in the directory in which the executable resides. Note that they can get quite large, so keep the run short. To change the directory where these files go, use the option +traceroot DIR. Also, to prevent Projections logging from impacting the performance of the program, use the option +logsize <number of log entries> to increase the size of the logging buffer. The default is currently 1,000,000 entries, which means approximately 80-90 MB of each core's memory is reserved for Projections buffers. To determine the log size actually needed, first make a run and examine the log files as follows.

Run grep ^8 *.log (or zgrep ^8 *.log.gz if the logs are still compressed). If anything shows up, at least one processor was forced to flush its performance logs.
If something does show up, zcat *.log.gz | wc -l will tell you how big a +logsize to use to prevent log flushing from impacting performance.

ChaNGa will also report at the end of the run whether logsize needs to be increased.
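The two checks above can also be scripted. The following is a minimal sketch in Python, not part of ChaNGa or Charm++: the function names are hypothetical, and it assumes, as the grep command does, that a line beginning with the character 8 marks a forced flush. The 85 bytes per entry figure is only the midpoint of the 80-90 MB per 1,000,000 entries quoted above.

```python
import glob
import gzip
import os


def scan_projections_logs(trace_dir="."):
    """Summarize the *.log.gz files found in trace_dir (hypothetical helper)."""
    per_file = {}
    flushed = []
    for path in sorted(glob.glob(os.path.join(trace_dir, "*.log.gz"))):
        entries = 0
        saw_flush = False
        with gzip.open(path, "rt", errors="replace") as log:
            for line in log:
                entries += 1
                # Same test as `grep ^8`: the line starts with the character 8.
                if line.startswith("8"):
                    saw_flush = True
        per_file[path] = entries
        if saw_flush:
            flushed.append(path)
    return {
        # Roughly what `zcat *.log.gz | wc -l` prints.
        "total_lines": sum(per_file.values()),
        # Largest single log, for comparison with the total.
        "max_lines_per_file": max(per_file.values(), default=0),
        # Logs containing at least one line that starts with 8.
        "flushed_files": flushed,
    }


def approx_buffer_mb(logsize, bytes_per_entry=85):
    """Rough per-core buffer size in MB; bytes_per_entry is an assumption."""
    return logsize * bytes_per_entry / 1e6


if __name__ == "__main__":
    summary = scan_projections_logs(".")
    print(summary)
    print("approx. buffer per core (MB):", approx_buffer_mb(summary["total_lines"]))
```

If "flushed_files" is non-empty, "total_lines" is the value the procedure above suggests passing to +logsize, and approx_buffer_mb gives a rough idea of the memory that choice reserves on each core.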

For -tracemode summary, the corresponding option is +bincount <number of bins>. The default is 10,000. To determine whether a re-bin will be forced mid-run (and hence affect the processor's performance), it is enough to know how long the application runs. As long as <number of bins> × <bin size> (default 1 ms) is longer than the application's run time, no mid-run interference will occur. So, in the default case, an application can run for as long as 10 seconds without a re-bin.
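The bin arithmetic is simple enough to check by hand, but a short Python sketch makes the relationship explicit. The function names are hypothetical, and the 1 ms default bin size is taken from the text above.

```python
import math


def max_runtime_seconds(bincount=10_000, bin_ms=1.0):
    """Longest run (in seconds) covered by bincount bins of bin_ms each."""
    return bincount * bin_ms / 1000.0


def min_bincount(run_seconds, bin_ms=1.0):
    """Smallest number of bins whose total span covers run_seconds."""
    return math.ceil(run_seconds * 1000.0 / bin_ms)


if __name__ == "__main__":
    print(max_runtime_seconds())   # default case: 10,000 bins of 1 ms
    print(min_bincount(60))        # bins needed for a one-minute run
```

For a run expected to last about a minute, this suggests a +bincount of at least 60,000; choosing a somewhat larger value leaves a margin if the run takes longer than expected.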

Running the Projections GUI

This is a Java program that can be started with charm/tools/projections/bin/projections. The .sts file can be given as an argument.

The menu items under Tools include:

Graph: This is very memory intensive. It plots processor usage or messages as a function of either processor or interval. (Not particularly useful.)

Timelines: This is also very memory intensive. For each processor, this gives a timeline of entry methods that were executed. This is useful to see exactly the sequence of events on each processor.

Usage Profile: This gives a profile of the processor utilization over the selected interval. As well as a bar graph, it can give a table of the utilization of each entry point.

Overview: This gives a processor utilization overview. As a function of time and processor number, the utilization will be shown as a color. Colors can also designate the entry point being executed. Note that when you switch from utilization to entry points, the log files get reread, and this takes time.

Time Profile Graph: This gives an overall time profile of entry points being executed. As a function of time, the execution time spent in each entry point, summed across all selected processors, is plotted, with each entry point getting a different color.

Getting Load Balancing Information

Dynamic load balancing is a key feature of Charm++ available to ChaNGa. To get information on what the load balancer is doing, use the following option.

+LBDebug <number> where the higher the number, the more debugging information you will receive on stderr and stdout.

The type of information probably varies with the choice of load balancer, but for RefineLB, one gets an estimate of the load on each processor, with the background load in parentheses. One also gets a report on which pieces migrate to which processors, and a mapping of pieces to processors.

Using TAU Performance Analysis with ChaNGa

TAU is a parallel performance analysis tool that can also be used with ChaNGa. TAU has a graphical user interface that allows the user to quickly identify performance bottlenecks.

In order to use it, the TAU source must be downloaded from the TAU website, and Charm++ needs to be built against the TAU libraries. See the build instructions for NAMD on the TAU wiki.

Performance Analysis - N-BodyShop/changa GitHub Wiki
