Bachelor's Thesis presentation. Timur is advised by Fabio Gratl.
Timur Eke: AutoPas on A64FX: Evaluation of Arm SVE Vectorization for Optimizing Molecular Dynamics Simulations
SCCS Colloquium
Molecular dynamics simulations of high-density, compute-intensive scenarios are well suited for SIMD vectorization. The simulation kernel of AutoPas, a particle simulation library, is already implemented using manual AVX2 vectorization for x86 architectures, as common compilers are unable to auto-vectorize the code.
The Fujitsu A64FX is an Arm CPU developed for Fugaku, the supercomputer of the RIKEN Center for Computational Science in Japan and currently the fastest in the world. To achieve peak performance, the A64FX supports Arm SVE, a recent SIMD instruction set extension featuring variable-length vectors and per-lane predication.
In this thesis, AutoPas is optimized to run on the A64FX. Specifically, the computation of the Lennard-Jones potential is manually vectorized for the Arm SVE instruction set. A perfect speedup of 8x is measured using appropriate benchmarks, and the performance is compared to the existing x86 implementation. Additional optimizations that hide instruction latency and exploit the instruction-level parallelism of the A64FX are evaluated, and a novel compaction approach is analyzed.
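To illustrate the kind of kernel being vectorized, the following is a minimal scalar sketch of a Lennard-Jones 12-6 force computation in portable C++. It is not the AutoPas implementation and uses no SVE intrinsics. The names ParticlesSoA, ljForceOnParticle and kLanes are hypothetical, and the lane count of 8 assumes double precision in a 512-bit vector. The inner loop mimics predicated execution: every lane is computed, and a mask zeroes the contribution of lanes that are out of range, refer to the particle itself, or lie beyond the cutoff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle storage; not AutoPas's actual types.
struct ParticlesSoA {
    std::vector<double> x, y, z;     // positions
    std::vector<double> fx, fy, fz;  // accumulated forces
};

// Adds to particle i the Lennard-Jones 12-6 force exerted by all other particles.
// U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6), hence
// F_i  = 24*eps/r^2 * (2*(sigma/r)^12 - (sigma/r)^6) * (r_i - r_j).
// Assumes no two distinct particles share the same position (r^2 > 0).
void ljForceOnParticle(ParticlesSoA &p, std::size_t i,
                       double epsilon, double sigma, double cutoff) {
    assert(i < p.x.size());
    constexpr std::size_t kLanes = 8;  // assumption: 8 doubles per 512-bit vector
    const double cutoffSq = cutoff * cutoff;
    const double sigmaSq = sigma * sigma;
    const std::size_t n = p.x.size();
    double fxAcc = 0.0, fyAcc = 0.0, fzAcc = 0.0;

    // Outer loop advances one "vector" of neighbours at a time.
    for (std::size_t base = 0; base < n; base += kLanes) {
        // Inner loop stands in for the lanes of a single SIMD instruction.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            const std::size_t j = base + lane;
            // Loop-tail and self-interaction predicate; inactive lanes load nothing.
            const bool active = (j < n) && (j != i);
            const double dx = active ? p.x[i] - p.x[j] : 0.0;
            const double dy = active ? p.y[i] - p.y[j] : 0.0;
            const double dz = active ? p.z[i] - p.z[j] : 0.0;
            const double r2 = dx * dx + dy * dy + dz * dz;

            // Cutoff predicate replaces a branch: masked lanes contribute zero.
            const bool mask = active && (r2 <= cutoffSq);
            const double invR2 = mask ? 1.0 / r2 : 0.0;
            const double s2 = sigmaSq * invR2;
            const double s6 = s2 * s2 * s2;
            const double scale = 24.0 * epsilon * invR2 * s6 * (2.0 * s6 - 1.0);

            fxAcc += scale * dx;
            fyAcc += scale * dy;
            fzAcc += scale * dz;
        }
    }
    p.fx[i] += fxAcc;
    p.fy[i] += fyAcc;
    p.fz[i] += fzAcc;
}
```

In an actual SVE version, the inner loop would presumably collapse into vector instructions governed by predicate registers, and the lane count would be queried at run time rather than fixed at compile time. The sketch only shows why replacing the cutoff branch with a mask makes the loop body uniform across lanes.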