NCCL Tests
This page explains how to verify multi-node GPU communication using nccl-tests.
📦 Build
cd 10-nccl-tests
make build
This builds the Docker image containing nccl-tests.
▶️ Run
bash run_mpi.sh ./hostfile.example
- hostfile.example defines the nodes participating in the test:
  node0 slots=1
  node1 slots=1
  node2 slots=1
  node3 slots=1
- The script launches an AllReduce performance benchmark across nodes.
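Before launching, it can help to check how many ranks the hostfile will produce. The sketch below sums the slots= entries, assuming an Open MPI-style hostfile like the one above. count_slots is a hypothetical helper, not part of run_mpi.sh, and treating a host without slots= as one slot is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# count_slots FILE: print the total number of slots declared in a hostfile.
# Hypothetical helper for illustration; not part of run_mpi.sh.
count_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }        # skip comment lines and blank lines
        {
            slots = 1                  # assumption: a host with no slots= counts as 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); slots = kv[2] }
            total += slots
        }
        END { print total + 0 }
    ' "$1"
}

# Recreate the example hostfile in a temporary file and count its slots.
hostfile="$(mktemp)"
cat > "$hostfile" <<'EOF'
node0 slots=1
node1 slots=1
node2 slots=1
node3 slots=1
EOF

echo "total ranks: $(count_slots "$hostfile")"
```

With four nodes at one slot each, this reports 4 ranks, which should match the number of GPUs expected to take part in the AllReduce.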
📊 Example Output
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor)
#
# Rank Latency(us) Bus BW(GB/s)
0-1 11.25 162.3
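Output like this indicates that NCCL communication across the nodes is working properly. To confirm that traffic actually goes over RDMA (InfiniBand/RoCE) rather than falling back to TCP sockets, run with NCCL_DEBUG=INFO and check which network transport NCCL reports (for example NET/IB instead of NET/Socket).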
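For repeated runs, the result can be checked automatically. The sketch below assumes the three-column layout of the example output above (rank range, latency, bus bandwidth); the full nccl-tests table has more columns, so the column index would need adjusting. check_busbw and the 100 GB/s threshold are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# check_busbw MIN_GBPS FILE: succeed only if every data row reports a bus
# bandwidth (third column) of at least MIN_GBPS. Hypothetical helper.
check_busbw() {
    awk -v min="$1" '
        /^[ \t]*(#|$)/ { next }        # skip header/comment lines and blank lines
        NF >= 3 {
            rows++
            if ($3 + 0 < min + 0) { bad++; printf "low bus BW: %s\n", $0 }
        }
        END { if (rows == 0 || bad > 0) exit 1 }   # no data rows also counts as failure
    ' "$2"
}

# Save the example output to a temporary file and check it.
log="$(mktemp)"
cat > "$log" <<'EOF'
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor)
#
# Rank Latency(us) Bus BW(GB/s)
0-1 11.25 162.3
EOF

if check_busbw 100 "$log"; then
    echo "PASS"
else
    echo "FAIL"
fi
```

A suitable threshold depends on the interconnect and GPU count, so it is best taken from a known-good run on the same cluster.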