Previous talks at the SCCS Colloquium

Jonas Schreier: Optimization of small matrix multiplication kernels on Arm

Matrix multiplication is essential to many numerical applications, so efficient multiplication kernels are necessary to reach optimal performance. Most optimization efforts target x86 architectures, but recent developments have shown promise for porting traditionally x86-based computing workloads to Arm architectures. This work analyses different approaches to generating highly optimized kernels for small dense GEMM on Arm. The generated code is benchmarked on ThunderX2 (NEON vector extension) and Fujitsu A64FX (Scalable Vector Extension). We show that, while multiple possible solutions to the DGEMM problem on Arm exist, we found no general solution that performs best for all use cases. The largest deciding factor was the vector extension implemented by the respective target CPU.
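
For readers unfamiliar with the term, a small dense DGEMM computes C += A * B in double precision for small, fixed matrix dimensions. The following is a minimal reference sketch in C, not one of the generated kernels discussed in the talk. The sizes M, N, K, the column-major layout, and the function name dgemm_small_ref are assumptions chosen for illustration only.

```c
/* Hypothetical fixed sizes for illustration; not taken from the thesis. */
enum { M = 8, N = 8, K = 8 };

/* Reference small DGEMM: C += A * B, all matrices in column-major order.
 * A is M x K, B is K x N, C is M x N.
 * The innermost loop runs over contiguous elements of A and C, which is
 * the access pattern a vectorizing compiler typically needs. */
static void dgemm_small_ref(const double *A, const double *B, double *C)
{
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < K; ++k) {
            const double b = B[k + j * K];   /* scalar reused across a column */
            for (int i = 0; i < M; ++i) {
                C[i + j * M] += A[i + k * M] * b;
            }
        }
    }
}
```

Because the dimensions are compile-time constants, a compiler can fully unroll the loops and may map the inner loop to vector instructions. Which instructions are available (for example NEON or SVE) depends on the target CPU, which is the factor the abstract identifies as most decisive.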

Bachelor's thesis submission talk (Informatics). Jonas is advised by Lukas Krenz.