# GPU
To enable GPU:

- Generate the `Makefile` and recompile the code (see Installation for details; a command sketch is given below):
  - Set `GPU_COMPUTE_CAPABILITY` according to the GPU architecture on your system in the configuration file
  - Generate the `Makefile` with `--gpu=true`
  - Recompile the code with `make clean; make`
- [Optional] Query the GPUs on your system
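For example, assuming the `configure.py` workflow described on the Installation page (the script name, the `--machine` option, and the machine configuration file are assumptions here; adapt them to your setup), the sequence might look like:

```bash
# 1. In your machine configuration file, set GPU_COMPUTE_CAPABILITY to match your GPU
#    (the value below is illustrative only)
#    GPU_COMPUTE_CAPABILITY    750

# 2. Generate the Makefile with GPU support enabled
python configure.py --machine=your_machine --gpu=true

# 3. Recompile the code
make clean
make
```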
Related options: `--gpu`

Parameters described on this page: `OPT__GPUID_SELECT`, `FLU_GPU_NPGROUP`, `POT_GPU_NPGROUP`, `CHE_GPU_NPGROUP`, `GPU_NSTREAM`

Other related parameters: none

Parameters below are shown in the format: **Name** (Valid Values) [Default Value]
- `OPT__GPUID_SELECT`
  - Description: See Set and Validate GPU IDs below.
  - Restriction: Must be smaller than the total number of GPUs in a node.
- `FLU_GPU_NPGROUP`
  - Description: Number of patch groups updated at a time by the GPU/CPU fluid solvers. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of `GPU_NSTREAM`.
- `POT_GPU_NPGROUP`
  - Description: Number of patch groups updated at a time by the GPU/CPU Poisson solvers. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of `GPU_NSTREAM`.
- `CHE_GPU_NPGROUP`
  - Description: Number of patch groups updated at a time by the GPU/CPU GRACKLE solvers. See also Performance Optimizations: GPU. The GPU version is currently not supported.
  - Restriction: none
- `GPU_NSTREAM`
  - Description: Number of CUDA streams for asynchronous memory copies between CPU and GPU. See also Performance Optimizations: GPU.
  - Restriction: See the restrictions on `FLU_GPU_NPGROUP` and `POT_GPU_NPGROUP`.
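For illustration, an `Input__Parameter` snippet consistent with the multiple-of-`GPU_NSTREAM` restriction above might look like the following (all values are illustrative only; see Performance Optimizations: GPU for tuning guidance):

```
OPT__GPUID_SELECT    -1       # set GPU ID by MPI rank (see below)
GPU_NSTREAM           4       # number of CUDA streams
FLU_GPU_NPGROUP      32       # multiple of GPU_NSTREAM
POT_GPU_NPGROUP      32       # multiple of GPU_NSTREAM
CHE_GPU_NPGROUP      32       # GRACKLE solver (GPU version not supported)
```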
## Query GPUs

To query all GPUs on a node, use the command `nvidia-smi`. Here is an example on a node with 2 Tesla K40m GPUs:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:05:00.0     Off |                    0 |
| N/A   28C    P0    72W / 235W |   1071MiB / 11439MiB |     30%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:42:00.0     Off |                    0 |
| N/A   26C    P0    75W / 235W |   1071MiB / 11439MiB |     36%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     35286    C   ./gamer                                       1067MiB |
|    1     35287    C   ./gamer                                       1067MiB |
+-----------------------------------------------------------------------------+
```
It shows that the CUDA device compute mode of both GPUs is set to `Default` (corresponding to `cudaComputeModeDefault`), and there are currently two running jobs using GPU IDs 0 and 1, respectively.
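Besides `nvidia-smi`, the compute mode can also be checked programmatically through the CUDA runtime. A minimal sketch (not part of GAMER):

```c
// Minimal sketch: query the compute mode of each visible GPU via the CUDA runtime
#include <cuda_runtime.h>
#include <stdio.h>

int main( void )
{
   int ngpu;
   cudaGetDeviceCount( &ngpu );

   for (int i=0; i<ngpu; i++)
   {
      cudaDeviceProp prop;
      cudaGetDeviceProperties( &prop, i );

      // prop.computeMode is one of cudaComputeModeDefault, cudaComputeModeExclusive,
      // cudaComputeModeProhibited, or cudaComputeModeExclusiveProcess
      printf( "GPU %d (%s): compute mode = %d\n", i, prop.name, prop.computeMode );
   }

   return 0;
}
```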
## Set and Validate GPU IDs

On a node with NGPU GPUs, each GPU has a unique ID in the range 0 to NGPU-1. GAMER uses the runtime parameter `OPT__GPUID_SELECT` to set the GPU ID associated with each MPI process.
- `OPT__GPUID_SELECT = -2`: set by the CUDA runtime. Typically, this option should work together with the `cudaComputeModeExclusive` CUDA device compute mode, by which different MPI ranks in the same node are automatically assigned different GPUs. Otherwise, all MPI ranks will use GPU 0, which is likely undesirable. The `cudaComputeModeExclusive` compute mode can be set by `nvidia-smi -c 1`, which requires root privileges.
- `OPT__GPUID_SELECT = -1`: set by MPI ranks. Specifically, the GPU ID is set to MPI_Rank % NGPU, where % is the integer modulus operator (see the sketch after this list). This is the recommended method when running on a system with multiple GPUs per node. However, one must be careful about the order of MPI ranks among different nodes to ensure full utilization of all GPUs. For example, if two MPI ranks with MPI_Rank = 0 and 2 run on a node with NGPU = 2, both ranks will access GPU 0 (since 0%2 and 2%2 are both 0) and GPU 1 will sit idle, which is undesirable. One straightforward solution is to adopt an "SMP-style" rank ordering, by which ranks are placed consecutively until a node is filled before moving on to the next node. A more detailed illustration can be found in the Blue Waters User Guide. Please also consult your system documentation.
- `OPT__GPUID_SELECT >= 0`: simply set the GPU ID to `OPT__GPUID_SELECT`. Valid inputs are 0 to NGPU-1.
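As an illustration only (this is not GAMER's actual source code), the mapping performed by `OPT__GPUID_SELECT = -1` amounts to the following, with NGPU taken here from `cudaGetDeviceCount()`:

```c
// Minimal sketch of the OPT__GPUID_SELECT = -1 mapping: GPU ID = MPI_Rank % NGPU
// (illustration only, not GAMER's actual implementation)
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main( int argc, char *argv[] )
{
   MPI_Init( &argc, &argv );

   int rank, ngpu;
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   cudaGetDeviceCount( &ngpu );        // NGPU: number of GPUs visible on this node

   const int gpu_id = rank % ngpu;     // note: uses the global rank, so full GPU
   cudaSetDevice( gpu_id );            // utilization depends on the rank ordering
                                       // discussed above

   printf( "MPI rank %d -> GPU %d (of %d)\n", rank, gpu_id, ngpu );

   MPI_Finalize();
   return 0;
}
```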
See also Hybrid MPI/OpenMP/GPU.
To validate the ID and configuration of the GPU adopted by each MPI process, search for the keyword "Device Diagnosis" in the log file `Record__Note` generated during the initialization of GAMER. You should see something like
```
Device Diagnosis
***********************************************************************************
MPI_Rank = 0, hostname = golub123, PID = 47842

CPU Info :
CPU Type     : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU MHz      : 2499.982
Cache Size   : 25600 KB
CPU Cores    : 10
Total Memory : 63.0 GB

GPU Info :
Number of GPUs                      : 2
GPU ID                              : 0
GPU Name                            : Tesla K40m
CUDA Driver Version                 : 8.0
CUDA Runtime Version                : 7.0
CUDA Major Revision Number          : 3
CUDA Minor Revision Number          : 5
Clock Rate                          : 0.745000 GHz
Global Memory Size                  : 11439 MB
Constant Memory Size                : 64 KB
Shared Memory Size per Block        : 48 KB
Number of Registers per Block       : 65536
Warp Size                           : 32
Number of Multiprocessors:          : 15
Number of Cores per Multiprocessor: : 192
Total Number of Cores:              : 2880
Max Number of Threads per Block     : 1024
Max Size of the Block X-Dimension   : 1024
Max Size of the Grid X-Dimension    : 2147483647
Concurrent Copy and Execution       : Yes
Concurrent Up/Downstream Copies     : Yes
Concurrent Kernel Execution         : Yes
GPU has ECC Support Enabled         : Yes
***********************************************************************************
```
This example shows that MPI rank 0 is using GPU 0 on the node `golub123`, which has 2 GPUs in total.
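To locate this block quickly, you can, for example, search the log file directly:

```bash
grep -A 40 "Device Diagnosis" Record__Note
```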