System Overview

Eureka

  • One Login node + 33 Computing nodes
  • Each computing node has
    • One AMD 16-core CPU (Ryzen Threadripper 2950X) with 128 GB memory
    • One NVIDIA GPU (GeForce RTX 2080 Super) with 8 GB memory
    • See System Specification for details
  • Operating system: CentOS Linux 7.7
  • Storage: ~350 TB
  • Interconnect: InfiniBand 100 Gb/s EDR

Spock

  • One Login node + 28 Computing nodes
  • Each computing node has
    • One AMD 32-core CPU (Ryzen Threadripper PRO 5975WX) with 256 GB memory
    • One NVIDIA GPU (GeForce RTX 3080 Ti) with 12 GB memory
    • See System Specification for details
  • Operating system: Ubuntu server 22.04
  • Storage: ~350 TB
  • Interconnect: InfiniBand 200 Gb/s HDR

Important notes about switching from Eureka to Spock

  • Performance: Spock should be about 2-3x faster for both CPU and GPU workloads
  • CPU RAM: 2x larger (128 GB → 256 GB)
  • GPU RAM: 1.5x larger (8 GB → 12 GB)
  • Interconnect bandwidth: 2x higher (100 Gb/s → 200 Gb/s)
  • Disk I/O bandwidth: 5-10x higher when using /projectV
  • Use environment modules to deploy different software tools (see the first sketch after this list)
  • For running GAMER (see the build-and-submit sketch after this list)
    • Job submission script: submit_spock.job
    • Configuration file: spock_intel.config
    • Edit generate_make.sh to use --machine=spock_intel and --gpu_arch=AMPERE
    • Set OMP_NTHREAD to 8 in Input__Parameter
    • [Optional] Change GPU_COMPUTE_CAPABILITY from 800 to 860 for GPU_ARCH == AMPERE (here) if you haven't updated to the latest main or psidm branch yet
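
As a quick illustration of the environment-modules workflow mentioned above, the commands below show the usual inspect/load/unload cycle. The module names are placeholders, since the actual list depends on what is installed on Spock; run module avail to see the real ones.

    # List every module installed on the cluster
    module avail

    # Load the tools you need before compiling or running
    # (module names below are hypothetical -- check `module avail` first)
    module load intel
    module load cuda

    # Inspect what is currently loaded, and clear everything
    # when switching toolchains
    module list
    module purge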
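
Putting the GAMER items together, here is a minimal end-to-end sketch. The sed patterns, the make invocation, and the use of sbatch as the submit command are assumptions, so check them against your GAMER checkout and the cluster's scheduler before running them.

    # 1. Point the build configuration at spock
    #    (a sketch -- verify the option names in your copy of generate_make.sh)
    sed -i 's/--machine=[A-Za-z_]*/--machine=spock_intel/' generate_make.sh
    sed -i 's/--gpu_arch=[A-Za-z]*/--gpu_arch=AMPERE/'     generate_make.sh
    #    [Optional] On older main/psidm branches, also change
    #    GPU_COMPUTE_CAPABILITY from 800 to 860 for GPU_ARCH == AMPERE

    # 2. Regenerate the Makefile and rebuild
    sh generate_make.sh
    make clean && make -j

    # 3. In Input__Parameter, set the OpenMP thread count:
    #       OMP_NTHREAD    8

    # 4. Submit with the spock job script
    #    (assuming a SLURM scheduler; substitute your scheduler's
    #    submit command otherwise)
    sbatch submit_spock.job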

Best Practice

See User Policy.
