Intel Diagnostic Tools - firemodels/fds GitHub Wiki

There are several packages that are part of the Intel oneAPI HPC Toolkit that analyze code performance and structure. Each is described briefly below. At NIST, we typically use the command-line (cl) version of each package to "collect" information during the FDS simulation, and then we use the graphical user interface (gui) to analyze the results.

Intel Inspector on a Linux Cluster:

Intel Inspector is a tool that can help detect improperly coded OpenMP directives.

Compile FDS in debug mode with the following options: -shared-intel -check none -O1 -g.
Add the string inspxe-cl -collect ti2 -result-dir my_results -- just before the name of the executable in the srun command in the SLURM batch script. This invokes the command-line (cl) version of Intel Inspector. The argument ti2 selects a particular level of analysis detail, and my_results is the name of the directory that is created within the current directory to hold the analysis information. You can call it anything you want. Note that the name of the cluster node that runs the case is appended to the directory name.
Run the case using the sbatch command to launch the run script. Make sure that the case is relatively small and that you run just a few time steps. Inspector takes a long time to check for race conditions.
When the case is done, open up the Inspector graphical interface with the command inspxe-gui. You'll need to be at the console of the head node of the cluster to make this work.
Assuming the GUI opens, look at the list of errors and where in the code they occur. Typically, errors occur where the same variable is "touched" by multiple OpenMP threads. If you do not have access to the GUI, analyze the results using the command-line form of Inspector by typing:
```
inspxe-cl -report problems -r my_results
```
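The cluster steps above can be sketched as a short shell snippet that writes a SLURM batch script and then prints one report command per result directory. This is only an illustration: the executable path, input file name, job name, and task and thread counts are placeholders, not values taken from the FDS repository.

```shell
#!/bin/bash
# Sketch only: FDS_EXE, CASE, and the #SBATCH values are placeholders.
FDS_EXE="/path/to/fds_debug_executable"
CASE="my_job.fds"
RESULT_DIR="my_results"

# Write a batch script with inspxe-cl placed just before the executable.
cat > inspector_job.sh <<EOF
#!/bin/bash
#SBATCH --job-name=inspector_test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=4
srun inspxe-cl -collect ti2 -result-dir ${RESULT_DIR} -- ${FDS_EXE} ${CASE}
EOF

# The node name is appended to the result directory, so match by prefix
# and print one report command per directory found.
list_report_commands() {
  local dir
  for dir in "$1"*; do
    if [ -d "$dir" ]; then
      echo "inspxe-cl -report problems -r $dir"
    fi
  done
}

echo "Submit with: sbatch inspector_job.sh"
list_report_commands "$RESULT_DIR"
```

Before the job has run, the function prints nothing because no result directory exists yet. Run the last line again after the job finishes to get the exact report commands.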

Intel Inspector on a workstation (from the Graphical User Interface):

Compile FDS in debug mode with the following options: -shared-intel -check none -O0 -g. To check for data races, you can also use -O1.
Change to the directory where your case is and type inspxe-gui.
Go to the File tab and select new project. In the pop-up window, add the FDS executable with its full path in the Application window and your FDS input file name in Application parameters. In the User-defined environment variables, select modify and add a variable OMP_NUM_THREADS with a value of 2 or 4. Select Ok. Change the Working Directory to the directory where you have your FDS input file.
Go to the File tab and select new analysis. In the Analysis Type drop down menu select Threading Error Analysis. Move the analysis dial to Detect Deadlocks and Data Races. Click START.
Once the case has run (this might take a long time depending on the case and computer), check the results for detected data races.
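If no display is available on the workstation, roughly the same analysis could be started from the command line. The sketch below only composes and prints the command. It assumes that the ti2 analysis type used in the cluster instructions above corresponds to the Detect Deadlocks and Data Races setting, and the executable path and input file name are placeholders.

```shell
#!/bin/bash
# Placeholders: replace with your debug executable and input file.
FDS_EXE="/path/to/fds_debug_executable"
CASE="my_job.fds"
THREADS=2   # the GUI steps suggest 2 or 4 OpenMP threads

# Compose the command; paste the printed line into a terminal
# in the case directory to actually run it.
CMD="OMP_NUM_THREADS=${THREADS} inspxe-cl -collect ti2 -result-dir my_results -- ${FDS_EXE} ${CASE}"
echo "$CMD"
```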

Intel Trace Collector and Analyzer

The Intel Trace Collector and Analyzer are two separate programs with a single purpose: to enable you to visualize the work flow of each MPI process of an FDS simulation.

Compile the fully-optimized code with the option -tcollect.

Add these lines to the SLURM batch script for a job that uses multiple MPI processes:

export LD_PRELOAD=/opt/intel/oneapi/itac/latest/slib/libVT.so
export VT_LOGFILE_FORMAT=stfsingle
export VT_PCTRACE=5

Visualize the results by issuing this command at the console:
```
traceanalyzer job_name.stf
```
The main consideration in tracing FDS is that the trace file can become enormous if you run a long job and trace each and every function and subroutine call. To prevent this, there is a configuration file called fds_trace.conf in the directory Build/Scripts that contains a list of the main subroutines called in FDS. Only these subroutines are traced, keeping the trace file to a reasonable size and enabling you to more easily visualize the work flow. Make sure that the job only runs a handful of time steps, as there's no need to make the trace file bigger than it already is.
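A batch script that combines the settings above might look like the following sketch. The text does not say how fds_trace.conf is passed to the Trace Collector; the sketch assumes the VT_CONFIG environment variable is used for this. The repository location, executable path, input file name, job name, and task count are also placeholders.

```shell
#!/bin/bash
# Placeholders: FDS_REPO, FDS_EXE, CASE, and the #SBATCH values are assumptions.
FDS_REPO="$HOME/firemodels/fds"
FDS_EXE="/path/to/fds_executable_built_with_tcollect"
CASE="job_name.fds"

# Write the batch script; nothing is preloaded in the current shell.
cat > trace_job.sh <<EOF
#!/bin/bash
#SBATCH --job-name=trace_test
#SBATCH --ntasks=4
export LD_PRELOAD=/opt/intel/oneapi/itac/latest/slib/libVT.so
export VT_LOGFILE_FORMAT=stfsingle
export VT_PCTRACE=5
# Assumption: VT_CONFIG points the Trace Collector at the subroutine list.
export VT_CONFIG=${FDS_REPO}/Build/Scripts/fds_trace.conf
srun ${FDS_EXE} ${CASE}
EOF

echo "Submit with: sbatch trace_job.sh"
```

Writing the exports into the batch script, rather than setting them in the login shell, keeps LD_PRELOAD from affecting other commands on the head node.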

The most important graphic in the Trace Analyzer is the timeline. Get this from the Charts menu, Event Timeline. You will first see the entire timeline, but you can click and drag over shorter time intervals to see details. You will also notice that the first time you use the Trace Analyzer, everything is either colored red (MPI) or blue (Application). Go to the chart in the lower left corner and right click on the Groups, and choose to ungroup them. You should see the modules and subroutines you've chosen to trace. Keep ungrouping until you get down to the subroutine level. If you right-click again, you can choose to color the various routines, making it much easier to visualize. Your chosen color scheme will be saved in a file called .itarc in your home directory.

Intel Advisor

This folder contains the optimal build for Intel Advisor, which can assist in determining effective optimization locations for threading and vectorization.

Required Setup

Source advixe-vars.sh from the installation folder for Advisor. For example:

source /opt/intel19/advisor/advixe-vars.sh

Compile the advise version of FDS using the script make_fds.sh in this directory. The relevant compiler options are listed here.

Recommended Steps

For the most useful analysis, run advixe-gui on the platform used for collection, or at least make sure the source code and executable used for collection are accessible on the machine where Advisor is installed.

To set up, run advixe-gui before collecting data. X11 forwarding is necessary if logging in to a remote cluster. Inside the GUI, use the new project option to create a project for FDS, selecting the 'advise' version's executable as the target.

Collection

Start from the command you normally use to run FDS on your platform. mpiexec fds [test case] is a common form, so we use it as the model here.

Before the fds executable in the run command, for example between mpiexec and the executable, place advixe-cl -collect <analysis-type> [Optional actions] -- . Most frequently, analysis-type can be 'survey'. This obtains a basic look at the program, and can be followed up with 'suitability' after annotation. Note that the other analysis types have not been particularly useful or stable in FDS development. You can learn about annotating here.

Thus, a possible input could be:

mpiexec -np 1 advixe-cl -collect survey -- $HOME/firemodels/fds/Build/impi_intel_linux_64_advise/fds_impi_intel_linux_64_advise simple_test.fds
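So that the GUI project created during setup can find the collected data, the collection command can be pointed at that project's directory. The sketch below prints such a command using the -project-dir option of advixe-cl. The project directory name is hypothetical; use the one chosen when the project was created in advixe-gui.

```shell
#!/bin/bash
# Hypothetical project directory; match the one created in advixe-gui.
PROJECT_DIR="$HOME/advisor_projects/fds"
FDS_EXE="$HOME/firemodels/fds/Build/impi_intel_linux_64_advise/fds_impi_intel_linux_64_advise"
CASE="simple_test.fds"

# Compose the survey collection command; paste the printed line to run it.
CMD="mpiexec -np 1 advixe-cl -collect survey -project-dir ${PROJECT_DIR} -- ${FDS_EXE} ${CASE}"
echo "$CMD"
```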

Analysis

Graphical User Interface (Recommended)

The command-line interface for analysis has not been explored here, so advixe-gui is the recommended option. Open the project file with the GUI to analyze results.

IMPORTANT NOTE

Generally, unless you are directly interested in testing the suitability of several locations at once, or in vectorization, VTune Amplifier (now called VTune Profiler) is a more useful tool for FDS development. It can be found in the 'vtune' build folder.

Intel VTune Profiler

VTune Profiler is included in the Intel oneAPI Base Toolkit. It is most useful for profiling the code; that is, generating a list of the most frequently used subroutines. The easiest way to use it is to:

Compile the code in release mode; that is, with full optimizations. Add -g to the argument list so that VTune can have access to code line numbers.
Create a SLURM script and add the text below to the srun command between the srun options and the full path to the executable:
```
srun ... vtune -quiet -collect hotspots -trace-mpi -result-dir my_results <full path to fds executable> my_job.fds
```
When the job is done, issue the following command at the console:
```
vtune-gui
```
When the graphical user interface opens, click on the folder icon to the left that says something like "Open result". On the file menu, go into the directory you named to store the information. Open up the file with the "vtune" suffix. The most useful information is found by clicking the "Bottom-up" tab, which lists the most CPU-intensive routines.
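If the GUI is not available, a text report can be generated from the same result directory with the report action of the vtune command. The sketch below only prints the commands. It assumes the result directory is named my_results as in the collection step; check the actual directory name after the run, since a suffix may have been appended.

```shell
#!/bin/bash
# RESULT_DIR must match the directory created during collection.
RESULT_DIR="my_results"

# Compose text-report commands; paste the printed lines into a terminal to run them.
SUMMARY_CMD="vtune -report summary -result-dir ${RESULT_DIR}"
HOTSPOTS_CMD="vtune -report hotspots -result-dir ${RESULT_DIR}"
echo "$SUMMARY_CMD"
echo "$HOTSPOTS_CMD"
```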