BLAS - shawfdong/hyades GitHub Wiki
BLAS (Basic Linear Algebra Subprograms) are a set of low-level subroutines that perform common linear algebra operations such as copying, vector scaling, vector dot products, linear combinations, and matrix multiplication. BLAS are used as a building block in higher-level math programming languages and libraries, including LAPACK, NumPy and R.
BLAS functionality is divided into three levels: 1, 2 and 3[1][2].
- Level 1
- vector-vector operations that are linear (O(n)) in data and linear (O(n)) in work.
- Level 2
- matrix-vector operations that are quadratic (O(n²)) in data and quadratic (O(n²)) in work.
- Level 3
- matrix-matrix operations that are quadratic (O(n²)) in data and cubic (O(n³)) in work.
BLAS routine names carry a one-letter prefix that indicates the precision of the data they operate on:

| Prefix | Precision |
| --- | --- |
| S | Real single precision |
| D | Real double precision |
| C | Complex single precision |
| Z | Complex double precision |
There are many implementations of BLAS available. We've installed a few on Hyades.
The official reference implementation on Netlib provides a platform-independent implementation of BLAS, but without any attempt at optimizing performance. It is written in Fortran 77.
Netlib BLAS can be downloaded separately, or as part of LAPACK.
Download LAPACK 3.5.0:
$ cd /scratch
$ wget http://www.netlib.org/lapack/lapack-3.5.0.tgz
$ tar xvfz lapack-3.5.0.tgz
$ cd lapack-3.5.0
Create the file make.inc (based on the provided make.inc.example):
SHELL = /bin/sh
FORTRAN = gfortran
OPTS = -O3 -frecursive -march=native -fPIC
DRVOPTS = $(OPTS)
NOOPT = -O0 -frecursive -fPIC
LOADER = gfortran
LOADOPTS =
TIMER = INT_ETIME
CC = gcc
CFLAGS = -O3 -march=native -fPIC
ARCH = ar
ARCHFLAGS= cr
RANLIB = ranlib
XBLASLIB =
BLASLIB = ../../libblas.a
LAPACKLIB = liblapack.a
TMGLIB = libtmg.a
LAPACKELIB = liblapacke.a
Compile BLAS:
$ make blaslib
Compile LAPACK:
$ make
Netlib BLAS and LAPACK are installed at /pfs/sw/serial/gcc/lapack-3.5.0.
To link with the Netlib BLAS library, using gfortran:
$ gfortran -o blaspgm.x blaspgm.f -L/pfs/sw/serial/gcc/lapack-3.5.0/lib -lblas
To link with the Netlib BLAS library, using the Intel Fortran Compiler:
$ ifort -o blaspgm.x blaspgm.f -L/pfs/sw/serial/gcc/lapack-3.5.0/lib -lblas -lgfortran
There are C and C++ interfaces to BLAS. It is also possible and popular to call the Fortran BLAS from C and C++.
Fortran subroutines are the equivalent of C functions returning void. When compiling, most Fortran compilers append an underscore (_) to the subroutine name[3]. For example[4]:
$ nm /pfs/sw/serial/gcc/lapack-3.5.0/lib/libblas.a | grep sgemm
sgemm.o:
0000000000000000 T sgemm_
To call, e.g., the Fortran subroutine sgemm (matrix matrix multiply) from C, first declare its prototype in the C code:
extern void sgemm_( char *, char *, int *, int *, int *, float *, float *, int *, float *, int *, float *, float *, int * );
To compile a C program and link with the Netlib Fortran BLAS library, use the following flags:
-L/pfs/sw/serial/gcc/lapack-3.5.0/lib -lblas -lgfortran
To call, e.g., the Fortran subroutine sgemm (matrix matrix multiply) from C++, first declare its prototype in the C++ code:
extern "C" void sgemm_( char *, char *, int *, int *, int *, float *, float *, int *, float *, int *, float *, float *, int * );
To compile a C++ program and link with the Netlib Fortran BLAS library, use the following flags:
-L/pfs/sw/serial/gcc/lapack-3.5.0/lib -lblas -lgfortran
Netlib also provides a reference implementation of the C interface to BLAS (CBLAS).
Download Netlib CBLAS tar ball:
$ cd /scratch
$ wget http://www.netlib.org/blas/blast-forum/cblas.tgz
$ tar xvfz cblas.tgz
$ cd CBLAS
Modify Makefile.in so that it reads as follows:
SHELL = /bin/sh
BLLIB = /pfs/sw/serial/gcc/lapack-3.5.0/lib/libblas.a
CBLIB = ../lib/libcblas.a
CC = gcc
FC = gfortran
LOADER = $(FC)
CFLAGS = -O3 -DADD_ -march=native -fPIC
FFLAGS = -O3 -march=native -fPIC
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
Compile CBLAS:
$ make
Install CBLAS:
$ cp -r include lib /pfs/sw/serial/gcc/lapack-3.5.0/
Netlib CBLAS is installed at /pfs/sw/serial/gcc/lapack-3.5.0 too.
To facilitate the usage of the Netlib libraries, I've created a module lapack/s_gcc_netlib_3.5.0 to set up their environment. If you load the module, you can use more concise commands to link with the Netlib libraries. For example:
$ module load lapack/s_gcc_netlib_3.5.0
$ gcc -o cblaspgm.x cblaspgm.c -lcblas -lblas -lgfortran
Main article: ATLAS
ATLAS (Automatically Tuned Linear Algebra Software) is an efficient, open-source, and complete implementation of the BLAS APIs for C and Fortran 77. It also implements a few routines from LAPACK. While its performance often trails that of specialized libraries written for one specific hardware platform, e.g., Intel MKL, it is a large improvement over the reference Netlib BLAS.
The ATLAS installation includes libraries for BLAS, CBLAS, LAPACK and ATLAS's clapack[5] (not to be confused with Netlib CLAPACK).
Main article: OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2. GotoBLAS, GotoBLAS2 and OpenBLAS are related implementations of the BLAS API with many hand-crafted optimizations for specific processor types. OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge, which is the processor of choice for the Hyades cluster. It claims to achieve performance comparable to the Intel MKL.
The OpenBLAS library libopenblas.a contains object code for all routines in BLAS, CBLAS, LAPACK, and LAPACKE.
Main article: Intel MKL
Intel MKL (Math Kernel Library) is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The routines in MKL are hand-optimized specifically for Intel processors.
Main article: GSL
GSL (GNU Scientific Library) is a numerical library for C and C++ programmers. It provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. GSL 1.16, compiled with GCC, is installed at /pfs/sw/serial/gcc/gsl-1.16.
GSL includes BLAS support. To use the CBLAS library provided by GSL, include the appropriate GSL header in your C/C++ code:
#include <gsl/gsl_cblas.h>
To compile and link with GSL:
$ gcc -o cblaspgm.x cblaspgm.c -I/pfs/sw/serial/gcc/gsl-1.16/include \ -L/pfs/sw/serial/gcc/gsl-1.16/lib -lgsl -lgslcblas
or
$ module load gsl $ gcc -o cblaspgm.x cblaspgm.c -lgsl -lgslcblas
Boost includes uBLAS, a C++ template class library that provides BLAS level 1, 2, 3 functionality for dense, packed and sparse matrices. The design and implementation unify mathematical notation via operator overloading and efficient code generation via expression templates[6].
There are a few uBLAS examples at http://www.guwi17.de/ublas/examples/. To compile, e.g., the C++ program for Example 6 (Solve a System of Linear Equations using GMRES):
$ g++ -o gmres.x main_gmres.cpp -I/pfs/sw/serial/gcc/boost-1.57.0/include
Note that uBLAS is a header-only library; no library needs to be linked.
The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It is not, however, a drop-in replacement for the standard BLAS. One must use the cuBLAS API or the newer cuBLAS-XT API to access the cuBLAS library. Consult the cuBLAS User Guide for details.
The NVBLAS library is a drop-in replacement for the standard BLAS. It can accelerate most BLAS Level-3 routines by dynamically routing BLAS calls to one or more NVIDIA GPUs present in the system, when the characteristics of the call suggest that it will run faster on a GPU. NVBLAS is built on top of the cuBLAS library using only the CUBLASXT API. NVBLAS also requires a CPU BLAS library to be present on the system. Consult the NVBLAS User Guide for details.
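As a sketch, NVBLAS is typically configured by pointing the NVBLAS_CONFIG_FILE environment variable at an nvblas.conf file. The keys below follow the NVBLAS User Guide; the CPU library path is a placeholder, since NVBLAS needs a shared CPU BLAS library to fall back on:

```
# nvblas.conf (sketch; see the NVBLAS User Guide for the full option list)

# Shared CPU BLAS library to fall back on (placeholder path)
NVBLAS_CPU_BLAS_LIB /path/to/libopenblas.so

# Route calls to all GPUs visible to the process
NVBLAS_GPU_LIST ALL

# Optional log file
NVBLAS_LOGFILE nvblas.log
```

An existing binary is then typically run with libnvblas.so preloaded (e.g. via LD_PRELOAD) so that BLAS calls are intercepted without relinking.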