Most of the linear algebra software developments made by the TOPAL team are part of the Inria Bordeaux solverstack project, which groups several projects together to provide large-scale linear algebra libraries.
Chameleon is a framework written in C that provides routines to solve dense general systems of linear equations, symmetric positive definite systems of linear equations, and linear least squares problems, using LU, Cholesky, QR and LQ factorizations. Real and complex arithmetic are supported in both single and double precision. It runs on Linux and Mac OS X machines (mainly tested on Intel x86-64 and IBM Power architectures). Chameleon is based on the PLASMA source code but is not limited to shared-memory environments and can exploit multiple GPUs. It is interfaced in a generic way with the StarPU, PaRSEC, QUARK and OpenMP runtime systems. This makes it possible to analyze, in a unified framework, how sequential task-based algorithms behave on different runtime system implementations. Using Chameleon with the StarPU or PaRSEC runtime systems makes it possible to exploit GPUs through kernels provided by cuBLAS, as well as clusters of interconnected nodes with distributed memory (using MPI).
- Written in C, Fortran interface, CMake build system
- Algorithms: GEMM, POTRF, GETRF, GEQRF, GESVD, …
- Matrix forms: general, symmetric, triangular
- Precisions: single, double, complex, double complex
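To make the notion of a sequential task-based algorithm concrete, the sketch below enumerates, in submission order, the tasks of a right-looking tile Cholesky factorization (lower triangular case) on an nt x nt grid of tiles. It is an illustration only and does not use Chameleon's API: the function name `tile_cholesky_tasks` and the tuple layout are hypothetical, and no numerical work is performed. A runtime system of the kind mentioned above can infer dependencies between such tasks from the tiles each one reads and updates.

```python
from collections import Counter


def tile_cholesky_tasks(nt):
    """List the tasks of a right-looking tile Cholesky (lower case).

    Hypothetical helper for illustration; not part of Chameleon.
    Each task is (kernel, tiles_read, tile_updated), with tiles given
    as (row, col) indices in an nt x nt grid of tiles.
    """
    tasks = []
    for k in range(nt):
        # Factorize the diagonal tile.
        tasks.append(("POTRF", [], (k, k)))
        # Triangular solves on the panel below the diagonal tile.
        for i in range(k + 1, nt):
            tasks.append(("TRSM", [(k, k)], (i, k)))
        # Update the trailing submatrix (lower part only).
        for i in range(k + 1, nt):
            tasks.append(("SYRK", [(i, k)], (i, i)))
            for j in range(k + 1, i):
                tasks.append(("GEMM", [(i, k), (j, k)], (i, j)))
    return tasks


if __name__ == "__main__":
    # Count how many tasks of each kernel a 4 x 4 tile grid generates.
    counts = Counter(kernel for kernel, _, _ in tile_cholesky_tasks(4))
    print(dict(counts))
```

The loop nest reads like the sequential algorithm, while the recorded tile accesses are the information that would let a runtime schedule independent tasks concurrently.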
PaStiX (Parallel Sparse matriX package) is a scientific library that provides a high-performance parallel solver for very large sparse linear systems based on direct methods. Numerical algorithms are implemented in single or double precision (real or complex) using LLt, LDLt and LU factorizations with static pivoting (for nonsymmetric matrices with a symmetric pattern). The solver also provides low-rank compression methods to reduce the memory footprint and/or the time-to-solution.
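As a reminder of what a direct method does, the following toy sketch factorizes a small dense symmetric matrix as LDLt without pivoting and then solves a system by forward substitution, diagonal scaling and backward substitution. It is not PaStiX code and ignores sparsity, parallelism and low-rank compression; the function names `ldlt_factor` and `ldlt_solve` are hypothetical.

```python
def ldlt_factor(a):
    """Factor a symmetric matrix as L * D * L^T without pivoting.

    Toy dense version for illustration (hypothetical helper).
    Returns (l, d): l is unit lower triangular, d is the diagonal of D.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d


def ldlt_solve(l, d, b):
    """Solve (L D L^T) x = b given the factors from ldlt_factor."""
    n = len(b)
    x = list(b)
    # Forward substitution: L y = b.
    for i in range(n):
        x[i] -= sum(l[i][k] * x[k] for k in range(i))
    # Diagonal scaling: D z = y.
    x = [x[i] / d[i] for i in range(n)]
    # Backward substitution: L^T x = z.
    for i in reversed(range(n)):
        x[i] -= sum(l[k][i] * x[k] for k in range(i + 1, n))
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]
    l, d = ldlt_factor(a)
    print(ldlt_solve(l, d, b))
```

Once the factors are computed they can be reused for several right-hand sides, which is one reason direct methods are attractive when many systems share the same matrix.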
Rotor: Rematerializing Optimally with pyTORch
Rotor is a tool designed to train very large deep neural networks on limited memory by optimally selecting which activations should be kept and which should be recomputed. This code is meant to replace the checkpointing utility available in PyTorch, by providing more efficient rematerialization strategies. The algorithm is easier to tune: the only required parameter is the available memory, instead of the number of segments.
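To illustrate why a memory budget can be a more convenient parameter than a number of segments, here is a toy cost model of uniform periodic checkpointing on a chain of layers. It is not Rotor's algorithm: it assumes every activation has unit size, that the forward pass keeps only the input of each segment, and that the backward pass recomputes one segment at a time. The function names are hypothetical.

```python
import math


def peak_activations(n_layers, segments):
    """Peak number of unit-size activations held at once (toy model).

    Counts the `segments` stored segment inputs plus the activations of
    the largest segment while it is recomputed for the backward pass.
    """
    if not 1 <= segments <= n_layers:
        raise ValueError("segments must be between 1 and n_layers")
    return segments + math.ceil(n_layers / segments)


def segments_within_budget(n_layers, budget):
    """Segment counts whose modeled peak fits in `budget` activations."""
    return [s for s in range(1, n_layers + 1)
            if peak_activations(n_layers, s) <= budget]


if __name__ == "__main__":
    # Modeled peak memory for a 16-layer chain and several segment counts.
    for s in (1, 2, 4, 8, 16):
        print(s, peak_activations(16, s))
    # Segment counts that fit in a budget of 8 activations.
    print(segments_within_budget(16, 8))
```

In this model the user has to search over segment counts to meet a memory limit, which is the tuning step that a budget-driven tool removes.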
Rockmate is a successor to the Rotor tool that applies to a wider range of neural networks, without requiring that they have a sequential structure. Rockmate can be applied to any neural network without modification, and is especially suited to GPT-like networks. Compared to Rotor, Rockmate provides improved rematerialization sequences, with lower computational overhead.