Fujitsu A64FX
About
- Arm-based SoC from Fujitsu
A64FX deployments
Organization | Timeline | Vendor | Nodes / system | Flops | Memory | Notes |
---|---|---|---|---|---|---|
LRZ | - | HPE | Cray CS500 | - | - | |
Nagoya University | July 2020 | Fujitsu | 2,304 (FX1000) | 7.782 PFLOPS | 72 TB | |
Fugaku Supercomputer | 2021 | Fujitsu | 158,976 | 415 PFLOPS | 32 GB HBM2 per node | |
Canon | 2020 | Fujitsu | 192 (FX1000) | 648 TFLOPS | 6 TB | |
Sandia National Labs | Spring 2020 | Penguin Computing Inc. | Fujitsu PRIMEHPC FX700 | - | - | |
LANL | 2020 | HPE | HPE Apollo 80 | - | - | |
University of Bristol, Isambard 2 | Late 2020 | Cray/HPE | 72 | - | - | |
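
The Nagoya University and Canon rows give system-wide totals, so a quick per-node calculation is a useful consistency check on the table. The sketch below is only illustrative: the figures are copied from the two rows, the names `deployments` and `per_node` are made up for this example, and the conversion assumes 1 TB = 1024 GB.

```python
# Figures copied from the deployments table above.
# Assumption: memory totals use binary units (1 TB = 1024 GB).
deployments = {
    "Nagoya University": {"nodes": 2304, "tflops": 7782.0, "memory_tb": 72},
    "Canon": {"nodes": 192, "tflops": 648.0, "memory_tb": 6},
}


def per_node(entry):
    """Return (TFLOPS per node, GB of memory per node) for one table row."""
    tflops = entry["tflops"] / entry["nodes"]
    memory_gb = entry["memory_tb"] * 1024 / entry["nodes"]
    return tflops, memory_gb


per_node_figures = {name: per_node(entry) for name, entry in deployments.items()}

for name, (tflops, memory_gb) in per_node_figures.items():
    print(f"{name}: {tflops:.3f} TFLOPS/node, {memory_gb:.0f} GB/node")
```

Both systems come out at roughly 3.38 TFLOPS and 32 GB per node, which agrees with the 32 GB of HBM2 per node listed for Fugaku.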
Sources
- https://www.sandia.gov/news/publications/labnews/articles/2020/05-22/Fujitsu.html
- https://www.fujitsu.com/global/about/resources/news/press-releases/2020/0923-01.html
Software
Fujitsu software environment
Fujitsu MPI
- Supports FX1000 and FX700
Fujitsu SSL2
- BLAS (with some routines supporting FP16)
- LAPACK
The EPCC paper (linked under Source below) reports the following on SSL2:
"CASTEP also requires high-performance BLAS/LAPACK numerical libraries. We used the Fujitsu SSL2 libraries to provide these functions on the A64FX, MKL on the Intel based systems, and the Arm Performance Libraries (Armpl) on the ThunderX2 system."
"The Fujitsu maths libraries (SSL2) have been shown to be easy replacements for the Intel MKL and Arm performance libraries for some of the applications we have considered in this paper, but not for all requirements we encountered (i.e. FFTW for CASTEP). Therefore, some further work on optimised libraries for the A64FX system would be beneficial."
Fujitsu Compiler
- Two compilers: a cross compiler that runs on PRIMERGY servers, and a native compiler that runs on the PRIMEHPC FX1000.
- C/C++ compiler - C11, C++14, and partial C++17 support. Can run in clang mode, compatible with Clang/LLVM.
- Fortran compiler - Supports Fortran 2018
- OpenMP - 4.5 and partial 5.0 support
FFTW
The EPCC paper notes: "CASTEP requires a high-performance FFT library to function. This is usually provided by FFTW3 or Intel MKL. Fujitsu kindly provided their early development version of FFTW3 for the A64FX platform."
Profiler and debugger
- Provides a profiler and a debugger
Source
- https://arxiv.org/pdf/2009.11806.pdf - EPCC paper
Systems
HPE Apollo 80 System
- Chassis - 2U Configure-to-order Chassis with Rack Mount Rail Kit
- Server tray options
  - 1U 2-node Blade A64FX 1.8GHz 48-core 32GB HBM M.2 Configure-to-order Server
  - 1U 2-node Blade A64FX 2.0GHz 48-core 32GB HBM M.2 Configure-to-order Server
- Software
  - HPE Cray Programming Environment
  - HPE Message Passing Interface (MPI)
  - GNU compiler suite
  - Arm Allinea Studio
  - Arm Forge Professional
  - Rogue Wave Software® TotalView®
  - Mellanox HPC-X
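
The two server tray options differ only in clock speed, so their theoretical double-precision peak per node can be estimated with simple arithmetic. The sketch below rests on assumptions that are not stated in these notes: each A64FX core is taken to have two 512-bit SVE FMA pipelines, and each fused multiply-add is counted as two floating-point operations. The function name `peak_fp64_tflops` is made up for this example.

```python
# Assumptions (not from the notes above): two FMA pipelines per core,
# 512-bit SVE vectors, and one FMA counted as two floating-point operations.
CORES = 48                # compute cores per node, from the server tray options
FMA_PIPES_PER_CORE = 2    # assumed
FP64_LANES = 512 // 64    # 8 double-precision lanes in a 512-bit vector
FLOPS_PER_FMA = 2         # one multiply plus one add


def peak_fp64_tflops(clock_ghz, cores=CORES):
    """Theoretical FP64 peak in TFLOPS for one node at the given clock."""
    flops_per_cycle = cores * FMA_PIPES_PER_CORE * FP64_LANES * FLOPS_PER_FMA
    return flops_per_cycle * clock_ghz / 1000.0


for clock in (1.8, 2.0):
    print(f"{clock} GHz: {peak_fp64_tflops(clock):.3f} TFLOPS per node")
```

Under these assumptions the 1.8 GHz option peaks at about 2.76 TFLOPS and the 2.0 GHz option at about 3.07 TFLOPS per node. For comparison, the Nagoya University row in the deployments table works out to roughly 3.38 TFLOPS per node, which this model would reproduce at a clock of about 2.2 GHz.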