NCCL Tests

This page explains how to verify multi-node GPU communication using nccl-tests.


📦 Build

cd 10-nccl-tests
make build

This builds the Docker image containing nccl-tests.


▶️ Run

bash run_mpi.sh ./hostfile.example
  • hostfile.example defines the nodes participating in the test:

    node0 slots=1
    node1 slots=1
    node2 slots=1
    node3 slots=1
    
  • The script launches an AllReduce performance benchmark across the nodes listed in the hostfile.
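
As a rough illustration of what such a launch can look like, the sketch below counts the slots in a hostfile and prints (without running) a matching mpirun command. It is not the contents of run_mpi.sh: the helper names count_slots and print_launch_cmd, the ALL_REDUCE_BIN variable, and the NCCL_DEBUG=INFO setting are assumptions made for illustration, and the hostfile is assumed to use Open MPI syntax. The all_reduce_perf flags are chosen to match the minBytes 8, maxBytes 1073741824, factor 2, nGpus 1 header in the example output.

```shell
# Hypothetical sketch, not the actual run_mpi.sh.

# Sum the slots=N values in an Open MPI style hostfile.
# Simplification: a host line without slots=N is counted as one slot.
count_slots() {
  awk '
    NF == 0     { next }   # skip blank lines
    $1 ~ /^#/   { next }   # skip comment lines
    {
      n = 1
      for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
      total += n
    }
    END { print total + 0 }
  ' "$1"
}

# Print (do not execute) an mpirun command for an AllReduce sweep.
# ALL_REDUCE_BIN is an assumed override for the benchmark binary path.
print_launch_cmd() {
  hostfile=$1
  np=$(count_slots "$hostfile")
  echo "mpirun --hostfile $hostfile -np $np -x NCCL_DEBUG=INFO" \
       "${ALL_REDUCE_BIN:-all_reduce_perf} -b 8 -e 1G -f 2 -g 1"
}

# Usage (dry run):
#   print_launch_cmd ./hostfile.example
```

Printing the command instead of executing it makes it possible to check the rank count and flags before touching the cluster.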


📊 Example Output

# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor)
#
# Rank  Latency(us)   Bus BW(GB/s)
0-1    11.25         162.3

Output like this, with low latency and high bus bandwidth, indicates that NCCL communication over RDMA (InfiniBand or RoCE) is working properly.
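
When reading the Bus BW column, it can help to know how nccl-tests derives it for AllReduce: algorithm bandwidth is bytes divided by time, and bus bandwidth scales that by 2(n-1)/n for n ranks. The sketch below reproduces that arithmetic. The helper name allreduce_busbw is hypothetical, and the numbers in the usage comment are illustrative inputs, not measurements from this lab.

```shell
# Hypothetical helper: AllReduce bus bandwidth in GB/s.
# Arguments: message size in bytes, elapsed time in microseconds, rank count.
allreduce_busbw() {
  awk -v bytes="$1" -v us="$2" -v n="$3" 'BEGIN {
    algbw = (bytes / 1e9) / (us / 1e6)   # algorithm bandwidth, GB/s
    busbw = algbw * 2 * (n - 1) / n      # AllReduce correction factor
    printf "%.2f\n", busbw
  }'
}

# Usage: 1 GB reduced in 10 ms across 4 ranks
#   allreduce_busbw 1000000000 10000 4   # prints 150.00
```

Comparing this value against the line rate of the network adapters is one way to judge whether traffic is likely using RDMA rather than falling back to TCP sockets.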


🖼️ Architecture Reference