Master's thesis presentation. Alexander is advised by David Schneller and Prof. Dr. Michael Bader.
Alexander Puscas: Matrix Instruction Set Extensions to Improve Code Generation for Small Matrix Operations
SCCS Colloquium
Matrix multiplications are omnipresent in high-performance computing (HPC) tasks. Advancements in hardware have significantly lowered the time-to-solution in scientific applications such as tsunami and earthquake simulations. However, most applications are not optimized for the underlying hardware. In this thesis, we extend PSpaMM, an inline assembly code generator that creates specialized matrix multiplication kernels for various instruction set architectures, including Intel AVX and Arm Neon. The extension enables PSpaMM to generate kernels that target the Arm Scalable Matrix Extension (SME), using the two-dimensional matrix register and the outer product instruction introduced by this architecture extension. We benchmark the SME kernels generated by PSpaMM on the Apple M4 chip and compare their performance with that of kernels produced by the Arm Neon generator implemented in PSpaMM. Additionally, we measure the performance of kernels generated by LIBXSMM, a library for specialized matrix operations. We show that the generated SME kernels outperform the Neon kernels by a factor of up to 4.6 for double precision and up to 8.3 for single precision. Moreover, the PSpaMM SME kernels achieve performance similar to that of the kernels generated by LIBXSMM.
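To illustrate the outer product formulation mentioned in the abstract, the following is a minimal sketch in plain Python. It is not code generated by PSpaMM and does not use its API; the function name `matmul_outer_product` and the list-of-lists matrix layout are assumptions made for illustration only. Each step `k` adds the outer product of column `k` of A and row `k` of B to the whole result tile, which is roughly the role that an outer product instruction accumulating into a two-dimensional register would play.

```python
def matmul_outer_product(a, b):
    """Compute C = A * B as a sum of rank-1 (outer product) updates.

    Hypothetical helper for illustration; matrices are lists of rows.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of A and B do not match")

    # The accumulator tile plays the role of a 2D matrix register.
    c = [[0.0] * n for _ in range(m)]

    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # column k of A
        row = b[k]                         # row k of B
        # One outer product update touches every element of the tile.
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_outer_product(a, b))
```

Compared with the usual row-times-column loop order, this ordering performs one update of the full tile per value of `k`, so a kernel for a small matrix needs only as many outer product steps as the inner dimension.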