Introduction - eth-cscs/DLA-interface GitHub Wiki

DLA-Interface project Wiki page.

Introduction

The DLA-Interface (Dense Linear Algebra) project is part of the PRACE-6IP WP8 Performance portable linear algebra project. The project leader is Raffaele Solcà of ETH Zurich, CSCS.

The time to solution of many scientific applications depends heavily on dense linear algebra operations. A typical example is found in materials science, where the computational time is dominated by complex linear algebra tasks such as solving the Hermitian eigenvalue problem. Similarly, in applications requiring non-linear optimization, the solution of dense linear systems plays a pivotal role. In both cases the operands are so large that the solver of choice must operate across multiple compute nodes.

Development of the Distributed Linear Algebra (DLA) interface started during the PRACE 5IP WP7 program and continued with the writing of a detailed guide that explains how to optimize the performance of the routines in the DLA interface. New C++ applications can fully profit from the HPX-SLATE interface, which is based on the C++ std::future object and reduces the amount of synchronization needed between distinct routines: fine-grained task dependencies expressed with futures allow a second routine to start before the first routine has completely finished.
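The effect of fine-grained future dependencies can be sketched with the C++ standard library alone. The example below is only an illustration under stated assumptions: `Tile`, `scale_tile`, `sum_tile`, and `scale_then_sum` are hypothetical names and not part of the DLA interface or of HPX-SLATE, and it uses `std::async` rather than HPX. Each tile of the second routine depends only on the future of the matching tile of the first routine, so it can run while other tiles of the first routine are still being processed.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical tile type: a contiguous block of matrix entries.
using Tile = std::vector<double>;

// Hypothetical "first routine" applied to one tile: scale it by alpha.
Tile scale_tile(Tile tile, double alpha) {
    for (double& x : tile) x *= alpha;
    return tile;
}

// Hypothetical "second routine" applied to one tile: sum its entries.
double sum_tile(const Tile& tile) {
    return std::accumulate(tile.begin(), tile.end(), 0.0);
}

// Chains the two routines tile by tile instead of placing a global
// barrier between them.
double scale_then_sum(const std::vector<Tile>& tiles, double alpha) {
    std::vector<std::future<double>> partial_sums;
    partial_sums.reserve(tiles.size());

    for (const Tile& tile : tiles) {
        // Stage 1: one asynchronous task per tile.
        std::shared_future<Tile> scaled =
            std::async(std::launch::async, scale_tile, tile, alpha).share();

        // Stage 2: depends only on the future of its own tile, so it can
        // proceed while stage 1 is still working on other tiles.
        partial_sums.push_back(std::async(std::launch::async, [scaled] {
            return sum_tile(scaled.get());
        }));
    }

    // The only full synchronization point is the final reduction.
    double total = 0.0;
    for (auto& partial : partial_sums) total += partial.get();
    return total;
}
```

With `std::async`, each stage 2 task occupies a thread while it waits on its input. A task-based runtime such as HPX would typically attach the second stage as a continuation instead, but the dependency structure is the same. On Linux this sketch usually needs to be linked with thread support (for example `-pthread`).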

The goal of this project is to provide a modern and efficient distributed linear algebra package based on HPX that can replace ScaLAPACK in scientific applications, and to help the developers of scientific applications adopt modern, performance-portable, distributed linear algebra libraries. Since independent tasks can easily be executed in parallel, libraries built on a task-based runtime can significantly improve the parallel efficiency of single function calls and avoid the fork-join synchronization of ScaLAPACK. In general, task-based runtimes simplify the scheduling of small, non-parallelizable, independent tasks on different cores; such tasks are otherwise typically executed sequentially.

As part of the PRACE-6IP project, the DLA interface will improve the routines that compute eigenvalues and eigenvectors by adding the NLAFET and ChASE eigenvalue solvers to the library, giving access to eigenvalue solvers optimized for distributed multi-GPU clusters. In addition, the project will improve some of the key kernels of the ChASE library (e.g. the skinny QR decomposition), extend its support to generalized Hermitian eigenproblems without requiring a Cholesky factorization, and add partial SVD computations.