LAPACK - shawfdong/hyades GitHub Wiki
LAPACK (Linear Algebra PACKage) is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision[1].
LAPACK routines are written so that as much as possible of the computation is performed by calls to the BLAS. Each BLAS and LAPACK routine comes in several versions, one for each precision (data type). The first letter of the subprogram name indicates the precision used:
| Prefix | Precision |
| --- | --- |
| S | Real single precision |
| D | Real double precision |
| C | Complex single precision |
| Z | Complex double precision |
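As a rough illustration of the prefix convention, the following C sketch maps the first letter of a routine name to the size of the matching C element type. The helper `lapack_elem_size` is a hypothetical name introduced here, not part of BLAS or LAPACK, and it assumes the name really does begin with a precision letter; auxiliary routines such as XERBLA do not, and the sketch returns 0 for them.

```c
#include <assert.h>
#include <complex.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a LAPACK routine): return the size in bytes of
   one matrix element for a routine name such as "DGETRF", or 0 if the
   first letter is not one of the four precision prefixes S, D, C, Z. */
size_t lapack_elem_size(const char *routine)
{
    assert(routine != NULL);
    switch (toupper((unsigned char)routine[0])) {
    case 'S': return sizeof(float);          /* real single precision    */
    case 'D': return sizeof(double);         /* real double precision    */
    case 'C': return sizeof(float complex);  /* complex single precision */
    case 'Z': return sizeof(double complex); /* complex double precision */
    default:  return 0;                      /* no precision prefix      */
    }
}
```

For example, `lapack_elem_size("DGETRF")` evaluates to `sizeof(double)`, and the lookup is case-insensitive, so `"sgemm"` gives `sizeof(float)`.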
We've installed a few LAPACK implementations on Hyades.
Netlib LAPACK 3.5.0 is installed at /pfs/sw/serial/gcc/lapack-3.5.0 on Hyades, following the procedures described in Netlib BLAS.
To facilitate the usage of the Netlib libraries, I've created a module lapack/s_gcc_netlib_3.5.0 to set up their environment. If you load the module, you can use more concise commands to link with the Netlib libraries.
$ module load lapack/s_gcc_netlib_3.5.0
To compile a Fortran program[2] and link with Netlib LAPACK, using gfortran:
$ gfortran -o lapackpgm.x lapackpgm.f -llapack -lblas
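To compile a Fortran program and link with Netlib LAPACK, using the Intel Fortran Compiler (the extra -lgfortran is there because the library was built with the GNU compilers and needs the GNU Fortran runtime):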
$ ifort -o lapackpgm.x lapackpgm.f -llapack -lblas -lgfortran
LAPACKE is a C interface to LAPACK. The interface is two-level. The high-level interface handles all workspace memory allocation internally, while the middle-level interface requires the user to provide workspace arrays as in the original FORTRAN interface. Both interfaces provide support for both column-major and row-major matrices[3].
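The difference between the two layouts comes down to how element (i, j) maps to an offset in a flat array. The sketch below is a standalone illustration, not LAPACKE code: `demo_layout`, `demo_index`, and `demo_row_to_col_major` are hypothetical names, indices are zero-based as in C, and the leading dimension `ld` is assumed to equal the row length (row-major) or the column length (column-major).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag for this sketch only. */
typedef enum { DEMO_ROW_MAJOR, DEMO_COL_MAJOR } demo_layout;

/* Offset of element (i, j) in a flat array with leading dimension ld.
   Row-major:    consecutive elements of a row are adjacent in memory.
   Column-major: consecutive elements of a column are adjacent (Fortran). */
size_t demo_index(demo_layout layout, size_t i, size_t j, size_t ld)
{
    return layout == DEMO_ROW_MAJOR ? i * ld + j : i + j * ld;
}

/* Copy an m-by-n row-major matrix into column-major storage. */
void demo_row_to_col_major(size_t m, size_t n,
                           const double *a_row, double *a_col)
{
    assert(a_row != NULL && a_col != NULL);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            a_col[demo_index(DEMO_COL_MAJOR, i, j, m)] =
                a_row[demo_index(DEMO_ROW_MAJOR, i, j, n)];
}
```

With the 2-by-3 matrix stored row-major as `{1, 2, 3, 4, 5, 6}`, the column-major copy is `{1, 4, 2, 5, 3, 6}`, which is the order a Fortran routine expects.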
The naming scheme for the high-level interface is to take the Fortran LAPACK routine name, make it lower case, and add the prefix LAPACKE_. For example, the LAPACK subroutine DGETRF becomes LAPACKE_dgetrf.
The naming scheme for the middle-level interface is to take the Fortran LAPACK routine name, make it lower case, then add the prefix LAPACKE_ and the suffix _work. For example, the LAPACK subroutine DGETRF becomes LAPACKE_dgetrf_work.
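The two naming rules above are mechanical enough to write down in a few lines of C. This is only a sketch of the string transformation; `lapacke_name` is a hypothetical helper, not something LAPACKE provides, and the 32-byte limit on the routine name is an arbitrary choice for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the LAPACKE function name for a Fortran
   LAPACK routine. With work == 0 the high-level name is produced, otherwise
   the middle-level name with the _work suffix. Returns the length of the
   full name (as snprintf does), or -1 if the input is too long. */
int lapacke_name(const char *fortran_name, int work,
                 char *out, size_t out_size)
{
    char lower[32];
    size_t len;

    assert(fortran_name != NULL && out != NULL);
    len = strlen(fortran_name);
    if (len >= sizeof lower)
        return -1;                    /* name too long for this sketch */
    for (size_t k = 0; k < len; ++k)  /* make the routine name lower case */
        lower[k] = (char)tolower((unsigned char)fortran_name[k]);
    lower[len] = '\0';
    return snprintf(out, out_size, "LAPACKE_%s%s",
                    lower, work ? "_work" : "");
}
```

Calling `lapacke_name("DGETRF", 0, buf, sizeof buf)` fills `buf` with `LAPACKE_dgetrf`; passing a nonzero `work` gives `LAPACKE_dgetrf_work`.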
LAPACKE is included in the Netlib LAPACK 3.5.0 package. To compile and install it:
$ cd /scratch/lapack-3.5.0
$ make lapackelib
$ cp liblapacke.a /pfs/sw/serial/gcc/lapack-3.5.0/lib
$ cp lapacke/include/* /pfs/sw/serial/gcc/lapack-3.5.0/include/
To compile the example LAPACKE program example_DGESV_rowmajor.c in the LAPACK 3.5.0 release:
$ module load lapack/s_gcc_netlib_3.5.0
$ gcc -o example_DGESV_rowmajor.x \
    example_DGESV_rowmajor.c lapacke_example_aux.c \
    -llapacke -llapack -lblas -lgfortran
or
$ icc -o example_DGESV_rowmajor.x \
    example_DGESV_rowmajor.c lapacke_example_aux.c \
    -llapacke -llapack -lblas -lgfortran
The CLAPACK library was built using a Fortran to C conversion utility called f2c. The entire Fortran LAPACK library (including BLAS) is run through f2c to obtain C code, and then modified to improve readability[4].
Download the tar ball for CLAPACK 3.2.1:
$ cd /scratch
$ wget http://www.netlib.org/clapack/clapack.tgz
$ tar xfz clapack.tgz
$ cd CLAPACK-3.2.1
Create the file make.inc (based on the provided make.inc.example):
SHELL = /bin/sh
CC = gcc -DNO_BLAS_WRAP
CFLAGS = -O3 -march=native -fPIC -I$(TOPDIR)/INCLUDE
LOADER = gcc
LOADOPTS =
NOOPT = -O0 -I$(TOPDIR)/INCLUDE
DRVCFLAGS = $(CFLAGS)
F2CCFLAGS = $(CFLAGS)
TIMER = INT_CPU_TIME
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
BLASLIB = ../../libf2cblas.a
XBLASLIB =
LAPACKLIB = libclapack.a
F2CLIB = ../../F2CLIBS/libf2c.a
TMGLIB = libtmg.a
Compile CLAPACK:
$ make
Install CLAPACK:
$ cp libf2cblas.a libclapack.a /pfs/sw/serial/gcc/lapack-3.5.0/lib/
$ cd INCLUDE
$ cp clapack.h f2c.h /pfs/sw/serial/gcc/lapack-3.5.0/include/
$ cd ../BLAS/WRAP/
$ cp fblaswr.h /pfs/sw/serial/gcc/lapack-3.5.0/include/f2cblas.h
Note:
- The naming scheme for CLAPACK and f2c'ed BLAS is to take the Fortran routine name, make it lower case, and add the suffix _. For example, the LAPACK subroutine DGETRF becomes dgetrf_.
- I didn't wrap BLAS here (CC is set with -DNO_BLAS_WRAP in make.inc), so the BLAS function names are of the form sdot_, not f2c_sdot.
- The f2c'ed BLAS, although written in C, is a different beast than CBLAS. The CBLAS function names are of the form cblas_sdot.
To compile a C program and link with CLAPACK and the f2c'ed BLAS:
$ module load lapack/s_gcc_netlib_3.5.0
$ gcc -o clapackpgm.x clapackpgm.c blaio.c -lclapack -lf2c -lf2cblas -lm
Note that, instead of using CLAPACK and the f2c'ed BLAS, you can call the Fortran routines in BLAS and LAPACK directly from your C programs.
Main article: ATLAS
ATLAS (Automatically Tuned Linear Algebra Software) is an efficient, complete open-source implementation of the BLAS APIs for C and Fortran 77. It also implements a few routines from LAPACK; for the rest, it uses the Netlib implementation. While its performance often trails that of specialized libraries written for one specific hardware platform, e.g., Intel MKL, it is a large improvement over the reference Netlib BLAS.
The ATLAS installation includes libraries for BLAS, CBLAS, LAPACK and ATLAS's clapack[5] (not to be confused with Netlib CLAPACK).
Main article: OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2. GotoBLAS, GotoBLAS2 and OpenBLAS are related implementations of the BLAS API with many hand-crafted optimizations for specific processor types. OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge, which is the processor of choice for the Hyades cluster. It claims to achieve performance comparable to the Intel MKL.
The OpenBLAS library libopenblas.a contains object code for all routines in BLAS, CBLAS, LAPACK, and LAPACKE.
Main article: Intel MKL
Intel MKL (Math Kernel Library) is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The routines in MKL are hand-optimized specifically for Intel processors.