OpenMP Notes - firemodels/fds GitHub Wiki

Starting with the release of FDS 6.1.0, the default version of FDS includes OpenMP parallelization. Unlike MPI parallelization, OpenMP does not require you to split the computational domain into individual meshes. However, because OpenMP is a shared-memory parallelization, it is limited to the resources of one machine, whereas MPI can take advantage of multiple machines connected over a network.

By default, an OpenMP version of FDS should use some fraction of the available cores associated with the CPU, usually four. FDS reports the number of available "threads" at the start of the run. To see this number without running a case, just type the name of the executable.

Limitations and recommended settings

Most processors today offer virtual threads, so-called hyperthreading or SMT. So far, all benchmarks performed have shown that hyperthreading is detrimental to OpenMP performance in FDS.

The degree of parallelization increases with larger cell counts, so larger simulations will see a greater speedup. At some point, however, the performance will top out. On a dual-socket Xeon X5570 this occurred somewhere between 0.5 and 2 million cells. Depending on cache sizes, memory bandwidth, and so on, this may be different for individual users.

The degree of parallelization, that is, the fraction of the run time that benefits from additional threads, lies somewhere between 40 and 80 percent. According to Amdahl's Law, you will see a stark decrease in the return on investment as you add more threads. In most cases your computational efficiency (speedup/threads) will drop below 50 percent once you pass four threads. If you can run two simulations at the same time with four threads each instead of one with eight threads, you will be making better use of your power bill.
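As a rough illustration of the argument above, Amdahl's Law can be written out for a parallel fraction p and N threads. The value p = 2/3 used below is an assumption, chosen only because it falls within the 40 to 80 percent range and reproduces a speedup of two on four threads. It is not a measured FDS value.

```latex
% Amdahl's Law: speedup S and efficiency E for parallel fraction p on N threads
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad E(N) = \frac{S(N)}{N}

% Illustrative case, assuming p = 2/3
S(4) = \frac{1}{\tfrac{1}{3} + \tfrac{1}{6}} = 2, \qquad E(4) = \frac{2}{4} = 50\%

S(8) = \frac{1}{\tfrac{1}{3} + \tfrac{1}{12}} = 2.4, \qquad E(8) = \frac{2.4}{8} = 30\%
```

Under this assumption, doubling the thread count from four to eight raises the speedup only from 2 to 2.4, which is why two four-thread runs side by side make better use of eight cores than one eight-thread run.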

When using MPI parallelization you can also use OpenMP. Here you will want to limit the number of threads used by each MPI process. With P as the number of MPI processes launched per machine, T as the number of threads per MPI process and C as the number of physical cores of your machine, choose T such that: P*T=C.
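The rule P*T=C can be sketched as a small shell helper. This is only an illustration: the function name threads_per_process is hypothetical, the core and process counts are example values, and the result is rounded down when C is not a multiple of P. The launch line is left as a comment because the launcher and executable names depend on your installation.

```shell
#!/bin/sh
# threads_per_process C P: print the largest T such that P*T <= C.
# C = physical cores on the machine, P = MPI processes on that machine.
# (Hypothetical helper, not part of FDS.)
threads_per_process() {
    cores=$1
    procs=$2
    if [ "$procs" -lt 1 ] || [ "$cores" -lt "$procs" ]; then
        echo "need at least one physical core per MPI process" >&2
        return 1
    fi
    echo $(( cores / procs ))
}

# Example values: 8 physical cores, 2 MPI processes on this machine.
OMP_NUM_THREADS=$(threads_per_process 8 2)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# The run would then be started with something like the following
# (not executed here; names are assumptions about your setup):
#   mpiexec -n 2 fds casename.fds
```

Count physical cores for C, not the logical cores reported when hyperthreading is enabled.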

Parallelization with MPI will usually deliver greater speedups than OpenMP given the same number of cores to run on. So if you can safely use MPI (and still obtain valid results), you should do so. If you have additional computational resources, you can add OpenMP parallelization to speed things up further.

To summarize:

  • MPI will usually give you a greater speedup
  • expect a speedup of two when using four threads
  • beyond four threads you won't see much improvement
  • don't use hyperthreading, it slows things down

Limiting number of threads for OpenMP

To limit the number of threads, you need to set an environment variable OMP_NUM_THREADS. See below how this works on Linux and Windows.

For Linux, to limit the number of threads to, say, 2, enter

export OMP_NUM_THREADS=2

Note that this only affects the current session. To make it the default, add this command to your shell startup script (for example, ~/.bashrc).
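On Linux the variable can also be set for a single command instead of the whole session, by placing the assignment in front of the command. The sketch below uses sh -c as a stand-in for the FDS executable so that it can be run anywhere.

```shell
# Session-wide: every program started from this shell sees the limit.
export OMP_NUM_THREADS=2
echo "session limit: $OMP_NUM_THREADS"

# Single command only: the assignment prefix applies to that one process.
# `sh -c` stands in for the FDS executable here.
OMP_NUM_THREADS=4 sh -c 'echo "command limit: $OMP_NUM_THREADS"'

# The session value is unchanged afterwards.
echo "session limit: $OMP_NUM_THREADS"
```

The prefix form is handy when comparing run times for different thread counts without changing the session default.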

For Windows, to limit the number of threads you have to create a new environment variable called OMP_NUM_THREADS. After saving the variable you have to restart your command line environment (normally no reboot is necessary). For a given session, you can just enter

set OMP_NUM_THREADS=2

Stacksize Issues

To run the OpenMP version of FDS, you usually have to set the amount of stack memory available to each OpenMP thread. On a Windows computer, go to "System Properties", then "Advanced", then "Environment Variables." Add the new system variable OMP_STACKSIZE with the value of 16M. If FDS-OpenMP does not work, use a higher value for OMP_STACKSIZE (200M seems to be a good value). You can also adjust the OMP_STACKSIZE by typing

set OMP_STACKSIZE=16M

(for 16M) on your Windows command line before you start FDS.
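OMP_STACKSIZE is a standard OpenMP environment variable, so on Linux the equivalent setting should be possible with export, in the same way as OMP_NUM_THREADS. A minimal sketch, using 16M as an example value:

```shell
# Set the stack size for OpenMP threads for this session.
export OMP_STACKSIZE=16M
echo "OMP_STACKSIZE=$OMP_STACKSIZE"
```

As with OMP_NUM_THREADS, this only affects the current session unless the command is added to the shell startup script.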

Error Messages

If Windows (64-bit system) reports error messages like

  • OMP: Error #136: Cannot create thread.
  • OMP: System error #8: Not enough storage is available to process this command.

try to reduce your OMP_STACKSIZE value if it is "large" (e.g. 1G). This has solved the problem for some tests.

Thread Checking

For those who have purchased the Intel Thread Checker, scripts to perform OpenMP thread checking, inspect_openmp.sh, and to report results, inspect_report.sh, are located in the Verification/Thread_Check directory of the GitHub repository firemodels/fds. To perform thread checking on the input file casename.fds, type:

inspect_openmp.sh casename.fds

This command should only be performed on cases that run for a VERY short time as the inspection process takes a long time. To output results from the thread checker, type

inspect_report.sh

Type inspect_openmp.sh -h or inspect_report.sh -h to output usage information.