Fujitsu A64FX - AshokBhat/notes GitHub Wiki

About

A64FX deployments

| Organization | Timeline | Vendor | Nodes | Flops | Memory | Notes |
|---|---|---|---|---|---|---|
| LRZ | | HPE Cray CS500 | - | - | - | - |
| Nagoya University | July 2020 | Fujitsu | 2304 FX1000 | 7.782 PFLOPS | 72 TB | |
| Fugaku Supercomputer | 2021 | Fujitsu | 158,976 | 415 PFLOPS | HBM2 32 GB/node | |
| Canon | 2020 | Fujitsu | 192 FX1000 | 648 TFLOPS | 6 TB | |
| Sandia National Labs | Spring 2020 | Penguin Computing Inc. | | | | Fujitsu PRIMEHPC FX700 |
| LANL | 2020 | HPE | | | | HPE Apollo 80 |
| Uni of Bristol, Isambard 2 | Late 2020 | Cray/HPE | 72 nodes | | | |
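
As a quick sanity check on the table, the Nagoya University and Canon rows can be reduced to per-node figures. The sketch below is only illustrative: the dictionary layout and the `per_node` helper are hypothetical names, and it assumes the table's "TB" means 1024 GB.

```python
# Figures copied from the Nagoya University and Canon rows of the table.
deployments = {
    "Nagoya University": {"nodes": 2304, "tflops": 7782.0, "memory_tb": 72},
    "Canon": {"nodes": 192, "tflops": 648.0, "memory_tb": 6},
}


def per_node(system):
    """Return (TFLOPS per node, memory per node in GB).

    Assumption: 1 TB = 1024 GB, which the table does not state.
    """
    tflops = system["tflops"] / system["nodes"]
    memory_gb = system["memory_tb"] * 1024 / system["nodes"]
    return tflops, memory_gb


for name, system in deployments.items():
    tflops, memory_gb = per_node(system)
    print(f"{name}: {tflops:.3f} TFLOPS/node, {memory_gb:.0f} GB/node")
```

Under that assumption both systems come out at roughly 3.38 TFLOPS and 32 GB per node, which agrees with the 32 GB/node HBM2 figure listed for Fugaku.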

Sources

Software

Fujitsu software environment

Fujitsu MPI

  • Supports FX1000 and FX700

Fujitsu SSL2

  • BLAS (with some routines supporting FP16)
  • LAPACK

CASTEP also requires high-performance BLAS/LAPACK numerical libraries. We used the Fujitsu SSL2 libraries to provide these functions on the A64FX, MKL on the Intel-based systems, and the Arm Performance Libraries (Armpl) on the ThunderX2 system.

The Fujitsu maths libraries (SSL2) have been shown to be easy replacements for the Intel MKL and Arm performance libraries for some of the applications we have considered in this paper, but not for all requirements we encountered (i.e. FFTW for CASTEP). Therefore, some further work on optimised libraries for the A64FX system would be beneficial.

Fujitsu Compiler

  • Two compilers: a cross compiler that runs on the PRIMERGY, and the native compiler that runs on the PRIMEHPC FX1000.
  • C/C++ compiler - C11, C++14, and partial C++17 support. Can run in clang mode, compatible with Clang/LLVM.
  • Fortran Compiler - Supports Fortran 2018
  • OpenMP - 4.5 and partial 5.0 support

FFTW

CASTEP requires a high-performance FFT library to function. This is usually provided by FFTW3 or Intel MKL. Fujitsu kindly provided their early development version of FFTW3 for the A64FX platform.

Profiler and debugger

  • Provides a profiler and a debugger

Source

Systems

HPE Apollo 80 System

  • Chassis - 2U Configure-to-order Chassis with Rack Mount Rail Kit
  • Server tray options
    • 1U 2-node Blade A64FX 1.8GHz 48-core 32GB HBM M.2 Configure-to-order Server
    • 1U 2-node Blade A64FX 2.0GHz 48-core 32GB HBM M.2 Configure-to-order Server
  • Software
    • HPE Cray Programming Environment
    • HPE Message Passing Interface (MPI)
    • GNU compiler suite
    • Arm Allinea Studio
    • Arm Forge Professional
    • Rogue Wave Software® TotalView®
    • Mellanox HPC-X
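
The two server tray options above differ only in clock speed, so their theoretical peak double-precision throughput per node can be estimated from the core count and frequency. This is a rough sketch, not a vendor figure: the 32 FLOPs per cycle per core value is an assumption based on the A64FX having two 512-bit SVE FMA pipelines per core and is not stated in these notes, and `peak_tflops` is a hypothetical helper name.

```python
# Assumption (not from the notes): each A64FX core can retire 32
# double-precision FLOPs per cycle
# (2 FMA pipelines x 8 FP64 lanes per 512-bit vector x 2 ops per FMA).
FLOPS_PER_CYCLE_PER_CORE = 32


def peak_tflops(cores, clock_ghz):
    """Theoretical peak FP64 TFLOPS for one node with the given clock."""
    return cores * clock_ghz * FLOPS_PER_CYCLE_PER_CORE / 1000.0


# The two Apollo 80 tray options listed above: 48 cores at 1.8 or 2.0 GHz.
for clock_ghz in (1.8, 2.0):
    print(f"{clock_ghz} GHz: {peak_tflops(48, clock_ghz):.3f} TFLOPS per node")
```

These are upper bounds; sustained performance on real applications will be lower.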

FX1000 vs FX700

See also