System Overview

Eureka

  • One Login node + 33 Computing nodes
  • Each computing node has
    • One AMD 16-core CPU (Ryzen Threadripper 2950X) with 128 GB memory
    • One NVIDIA GPU (GeForce RTX 2080 Super) with 8 GB memory
    • See System Specification for details
  • Operating system: CentOS Linux 7.7
  • Storage: ~350 TB
  • Interconnect: InfiniBand 100 Gb/s EDR

Spock

  • One Login node + 28 Computing nodes
  • Each computing node has
    • One AMD 32-core CPU (Ryzen Threadripper PRO 5975WX) with 256 GB memory
    • One NVIDIA GPU (GeForce RTX 3080 Ti) with 12 GB memory
    • See System Specification for details
  • Operating system: Ubuntu server 22.04
  • Storage: ~350 TB
  • Interconnect: InfiniBand 200 Gb/s HDR

Important notes about switching from Eureka to Spock

  • Performance: Spock should be about 2-3x faster for both CPU and GPU workloads
  • CPU RAM: 2x larger (128 GB → 256 GB)
  • GPU RAM: 1.5x larger (8 GB → 12 GB)
  • Interconnect bandwidth: 2x higher (100 Gb/s → 200 Gb/s)
  • Disk I/O bandwidth: 5-10x higher when using /projectV
  • Use environment modules to deploy different software tools (see the first sketch after this list)
  • For running GAMER (see the build-and-submit sketch after this list)
    • Job submission script: submit_spock.job
    • Configuration file: spock_intel.config
    • Edit generate_make.sh to use --machine=spock_intel and --gpu_arch=AMPERE
    • Set OMP_NTHREAD to 8 in Input__Parameter
    • [Optional] Change GPU_COMPUTE_CAPABILITY from 800 to 860 for GPU_ARCH == AMPERE (here) if you haven't updated to the latest main or psidm branch yet
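
As a quick illustration of the environment-modules workflow mentioned above, the commands below show the usual inspect/load/unload cycle. The module names are placeholders, since the actual list depends on what is installed on Spock; run module avail to see the real ones.

    # List every module installed on the cluster
    module avail

    # Load the tools you need before compiling or running
    # (module names below are hypothetical -- check `module avail` first)
    module load intel
    module load cuda

    # Inspect what is currently loaded, and clear everything
    # when switching toolchains
    module list
    module purge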
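
Putting the GAMER items together, here is a minimal end-to-end sketch. The sed patterns, the make invocation, and the use of sbatch as the submit command are assumptions, so check them against your GAMER checkout and the cluster's scheduler before running them.

    # 1. Point the build configuration at spock
    #    (a sketch -- verify the option names in your copy of generate_make.sh)
    sed -i 's/--machine=[A-Za-z_]*/--machine=spock_intel/' generate_make.sh
    sed -i 's/--gpu_arch=[A-Za-z]*/--gpu_arch=AMPERE/'     generate_make.sh
    #    [Optional] On older main/psidm branches, also change
    #    GPU_COMPUTE_CAPABILITY from 800 to 860 for GPU_ARCH == AMPERE

    # 2. Regenerate the Makefile and rebuild
    sh generate_make.sh
    make clean && make -j

    # 3. In Input__Parameter, set the OpenMP thread count:
    #       OMP_NTHREAD    8

    # 4. Submit with the spock job script
    #    (assuming a SLURM scheduler; substitute your scheduler's
    #    submit command otherwise)
    sbatch submit_spock.job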

Best Practice

See User Policy.
