Linear Algebra¶
Linear algebra is at the core of many of the algorithms used in High Performance Computing. Operations such as adding two vectors or multiplying two matrices have been highly optimized for the kinds of computers used for HPC today.
The term array is used generically to denote vectors, matrices, and even higher-dimensional arrays of numbers.
There are basically two classes of linear algebra algorithms: dense linear algebra deals with operations over arrays where most of the values are non-zero, while sparse linear algebra works with arrays where most of the values are zero. The algorithms on the two sides are radically different, starting with the way the arrays are stored in memory.
In the case of dense linear algebra there are well-established algorithms, and the de facto standard for many scientific applications is BLAS.
BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The routines are organized in three levels:
Level 1 BLAS perform scalar, vector, and vector-vector operations.
Level 2 BLAS perform matrix-vector operations.
Level 3 BLAS perform matrix-matrix operations.
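As a small illustration of the three levels, the sketch below calls one double-precision routine from each level through the low-level wrappers that Scipy exposes in scipy.linalg.blas (these dispatch to whatever BLAS implementation Scipy was built against; the routine names ddot, dgemv, and dgemm are the standard BLAS names):

import numpy as np
from scipy.linalg import blas

x = np.random.rand(1000)
y = np.random.rand(1000)
A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

d = blas.ddot(x, y)          # Level 1: dot product of two vectors
v = blas.dgemv(1.0, A, x)    # Level 2: matrix-vector product alpha*A@x
C = blas.dgemm(1.0, A, B)    # Level 3: matrix-matrix product alpha*A@B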
Because the BLAS are efficient, portable, and widely available, they are commonly used as the foundation for more complex linear algebra functionality like LAPACK.
LAPACK is a set of routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers.
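For instance, all of these operations are a single call away in Scipy, which forwards the work to the LAPACK library it was linked against (a minimal sketch with random matrices):

import numpy as np
from scipy import linalg

A = np.random.rand(500, 500)
b = np.random.rand(500)

x = linalg.solve(A, b)               # solve the linear system A x = b
sol, res, rank, sv = linalg.lstsq(A, b)  # least-squares solution
P, L, U = linalg.lu(A)               # LU factorization
Q, R = linalg.qr(A)                  # QR factorization
U2, s, Vt = linalg.svd(A)            # singular value decomposition
S = A @ A.T + 500 * np.eye(500)      # build a symmetric positive-definite matrix
C = linalg.cholesky(S, lower=True)   # Cholesky factorization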
There are several implementations of BLAS and LAPACK. To mention three examples: ATLAS, which automatically tunes its BLAS routines for the machine where it is built; OpenBLAS, a fork of GotoBLAS with hand-crafted optimizations for specific processor types; and Intel MKL, a library of optimized math routines whose core functions cover BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The routines in MKL are hand-optimized specifically for Intel processors.
The reference BLAS and LAPACK are available on most Linux distributions, named libblas.{a,so} and liblapack.{a,so}.
However, the reference versions of BLAS and LAPACK are typically the last packages that you want to use for dense linear algebra. In general OpenBLAS and MKL outperform the reference BLAS and LAPACK: they provide multithreading and they are optimized for modern multicore processors with vector extensions such as AVX.
On Spruce and Thorny we have at least two much better alternatives to the reference BLAS and LAPACK: Intel MKL and OpenBLAS.
Intel MKL can be used even if you compile with gcc or gfortran. On Thorny, OpenBLAS was compiled with all the AVX512 extensions available on that cluster's CPUs enabled.
OpenBLAS and MKL perform very similarly, with MKL being a bit faster but also a bit less numerically stable.
ATLAS is also an implementation of BLAS; it is part of the system and several other packages depend on it. ATLAS can be found here:
/usr/lib64/atlas/libsatlas.so.3
/usr/lib64/atlas/libsatlas.so.3.10
I do not recommend using it for serious linear algebra work: it performs worse than the two options above.
Notice that the standard compiler on Red Hat Enterprise Linux (RHEL) is GCC 4.x. Thorny runs RHEL 7.6, where the compiler is GCC 4.8; Spruce Knob runs RHEL 6.10, where the compiler is GCC 4.4.
GCC 4.x cannot generate optimized code for AVX512; that is the main reason to use newer compilers such as GCC 8.x when compiling scientific code.
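As a quick sanity check before compiling, you can list the AVX512 flags the CPU reports and then let a recent GCC target the host processor (mycode.c is a placeholder name):

$ grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u
$ gcc -O3 -march=native -c mycode.c

With GCC 8.x, -march=native enables the AVX512 instruction sets the build machine supports; GCC 4.x simply does not know about AVX512 and cannot emit those instructions.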
Compiling Reference BLAS and LAPACK¶
Now we will show how to compile the reference BLAS and LAPACK. The instructions were created for LAPACK 3.8.0.
Download the code¶
The reference BLAS and LAPACK are hosted on Netlib (http://www.netlib.org/lapack/). The sources for LAPACK also contain the reference BLAS, so all we need to download is the LAPACK package:
wget http://www.netlib.org/lapack/lapack-3.8.0.tar.gz
Uncompress and untar the file with:
tar -zxvf lapack-3.8.0.tar.gz
That will generate a folder called lapack-3.8.0 with all the sources inside.
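The tarball only contains the sources; a minimal sketch of building the reference libraries with make follows, assuming the gfortran defaults in make.inc.example are acceptable:

$ cd lapack-3.8.0
$ cp make.inc.example make.inc
$ make blaslib
$ make lapacklib

This should leave the static libraries (librefblas.a and liblapack.a) in the top-level directory; rename or symlink them if your build system expects libblas.a.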
Installing Numpy and Scipy using OpenBLAS¶
In the particular case of Python, BLAS/LAPACK are very important for two packages, Numpy and Scipy; much of the scientific code written in Python relies on them.
The various environment modules for Python include Numpy and Scipy, and those builds were specifically compiled to use OpenBLAS for the linear algebra routines.
The instructions below show how you can do the same for clean virtual environments where you want to install numpy and scipy.
Assume that you want to use the PyPy3 module:
$ module purge
$ module load lang/python/pypy3.6-7.1.1-portable
The first thing to do is to also load GCC 8.2 and OpenBLAS 0.3.7:
$ module load lang/gcc/8.2.0 libs/openblas/0.3.7_gcc82
We start by creating the virtual environment with virtualenv. Notice that PyPy comes with its own version of virtualenv called virtualenv-pypy:
$ virtualenv-pypy pypy3.6
Activate the environment, create a build folder inside and change to that folder:
$ source pypy3.6/bin/activate
$ mkdir pypy3.6/build
$ cd pypy3.6/build
Now we want to install numpy and scipy. Do not try to use pip install ...; it will bring in generic versions built against the reference BLAS and LAPACK.
Download and extract the packages manually:
$ wget https://github.com/numpy/numpy/releases/download/v1.17.3/numpy-1.17.3.tar.gz
$ wget https://github.com/scipy/scipy/releases/download/v1.3.1/scipy-1.3.1.tar.gz
$ tar -zxvf numpy-1.17.3.tar.gz
$ tar -zxvf scipy-1.3.1.tar.gz
First configure numpy. Go to the numpy-1.17.3 folder and create a file site.cfg with this content:
[ALL]
library_dirs = /shared/software/libs/openblas/0.3.7_gcc82/lib
include_dirs = /shared/software/libs/openblas/0.3.7_gcc82/include
[openblas]
libraries = openblas
library_dirs = /shared/software/libs/openblas/0.3.7_gcc82/lib
include_dirs = /shared/software/libs/openblas/0.3.7_gcc82/include
runtime_library_dirs = /shared/software/libs/openblas/0.3.7_gcc82/lib
You can build and install numpy with pip; just execute:
$ pip3 install .
It takes a while, and at the end you get numpy installed with the installation tracked by pip. The configuration and installation of Scipy is very similar:
$ cd ../scipy-1.3.1/
$ cp ../numpy-1.17.3/site.cfg .
$ pip3 install .
With numpy and scipy installed, we can add any other packages you plan to use in the virtual environment. As an example, we will install ipython and pytest:
$ pip3 install pytest ipython
(pypy3.6) 12:31:11-gufranco@trcis001:~/scratch/pypy3.6$ ipython
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: np.test()
NumPy version 1.17.3
NumPy relaxed strides checking option: True
................................................. [ 0%]
.................................................
...ssssssss...................................... [100%]
10090 passed, 116 skipped, 173 deselected, 18 xfailed, 4 xpassed in 274.53s (0:04:34)
The configuration actually used by numpy and scipy can be checked with the commands below:
(pypy3.6) 13:21:20-gufranco@trcis001:~/scratch/pypy3.6$ ipython
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import scipy
In [2]: scipy.__config__.show()
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
runtime_library_dirs = ['/shared/software/libs/openblas/0.3.7_gcc82/lib']
Multithreading with OpenBLAS¶
Another reason to prefer OpenBLAS or MKL over the reference BLAS and LAPACK is that the optimized versions provide multithreading, which allows several cores that share the same memory space to work on the arrays concurrently.
OpenBLAS was compiled with multithreading enabled, and the number of threads can be controlled with several environment variables:
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
See more about this here:
https://github.com/xianyi/OpenBLAS
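As a quick illustration, you can time the same matrix product under different thread counts; the matrix size here is arbitrary, and the variable must be exported before Python starts so that OpenBLAS reads it when the library is loaded:

$ export OPENBLAS_NUM_THREADS=1
$ python3 -c "import numpy, time; a = numpy.random.rand(3000, 3000); t = time.time(); a @ a; print(time.time() - t)"
$ export OPENBLAS_NUM_THREADS=4
$ python3 -c "import numpy, time; a = numpy.random.rand(3000, 3000); t = time.time(); a @ a; print(time.time() - t)"

With an OpenBLAS-backed numpy, the second run should be noticeably faster on a multicore node.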
In the case of MKL there are two threading layers, sequential (-lmkl_sequential) and threaded (-lmkl_intel_thread). Selecting the right combination of MKL libraries is complex, but I always use the Intel link advisor to help me:
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
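For reference, a link line the advisor typically produces for a 64-bit LP64 build with the Intel compilers and the threaded layer looks like this (a sketch only, with myprog.c as a placeholder; the exact set of libraries depends on the MKL version, compiler, and MPI, so always confirm with the advisor):

$ icc myprog.c -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

The sequential variant replaces -lmkl_intel_thread -liomp5 with -lmkl_sequential.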
The only case where I can imagine someone wanting to use the reference BLAS and LAPACK is to check a numerical issue against the optimized versions; in production, the reference versions should never be used.
ScaLAPACK 2.2.0 on Spruce with GCC 10.3.0¶
Modules:
$> module load lang/gcc/10.3.0 parallel/openmpi/3.1.6_gcc103 libs/openblas/0.3.19_gcc103 dev/cmake/3.23.0
Download the software:
$> wget https://github.com/Reference-ScaLAPACK/scalapack/archive/refs/tags/v2.2.0.tar.gz -O scalapack-2.2.0.tar.gz
$> tar -zxvf scalapack-2.2.0.tar.gz
$> cd scalapack-2.2.0
Use cmake from a separate build directory (the trailing .. on the cmake command points back to the source tree):
$> mkdir build
$> cd build
$> cmake -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_C_COMPILER=mpicc \
-DCMAKE_Fortran_FLAGS=-fallow-argument-mismatch \
-DCMAKE_INSTALL_PREFIX=/shared/software/libs/scalapack/2.2.0_gcc103_ompi316 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON ..
Run make:
$> make
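Since CMAKE_INSTALL_PREFIX was set in the cmake step, the library can then be installed under that prefix with:

$> make install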
ELPA 2021.11 on Spruce with GCC 10.3¶
Modules:
$> module load lang/gcc/10.3.0 libs/openblas/0.3.19_gcc103 libs/scalapack/2.2.0_gcc103_ompi407 \
dev/cmake/3.23.0 parallel/openmpi/4.0.7_gcc103
Configure from a build directory created inside the source tree (the script is invoked as ../configure):
$> mkdir build
$> cd build
$> ../configure --prefix=/shared/software/libs/elpa/2021.11_gcc103_ompi407 \
LDFLAGS="-L${MD_OPENBLAS}/lib -L${MD_SCALAPACK}/lib" \
--disable-sse --disable-avx --disable-avx2 --disable-avx512 CFLAGS=-fPIC
ELPA 2021.11 on Thorny with Intel 2022¶
Modules:
$> module load lang/gcc/9.3.0 parallel/cuda/11.7 compiler mpi mkl
Configure (again from a build directory inside the source tree):
$> export SCALAPACK="-L/gpfs20/shared/software/intel/oneapi/mkl/2022.1.0/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl"
$> ../configure --enable-avx512 --enable-nvidia-gpu CC=mpiicc CXX=mpiicpc FC=mpiifort SCALAPACK_LDFLAGS="$SCALAPACK" --with-NVIDIA-GPU-compute-capability=sm_61 --prefix=/shared/software/libs/elpa/2021.11.002_intel22_impi22
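After configure finishes, the usual steps complete the build (adjust the number of make jobs to the cores available):

$> make -j 8
$> make install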