Master’s thesis submission talk (Informatics). Michael is advised by Dr. Tobias Neckel and Dr. Felix Dietrich.
Michael Grad: Efficient Parallel Setup of Eigenvalue Problems in the Manifold Learning Framework Datafold
SCCS Colloquium
Building on a previous IDP that integrated an efficient, parallel, sparse SLEPc eigensolver into the Datafold framework, this thesis aims to improve runtime by optimizing code segments that have been shown to scale poorly, especially in parallel contexts. Previous benchmarks have shown that most of the runtime is spent on the distance calculations between the points in the dataset, on the eigensolver, or, for cases with a high number of data points, even on the determination of a good cut-off value (depending on the input dataset). This thesis identifies these bottlenecks and their respective sources and surveys frameworks that offer efficient, parallel solutions for them. The candidates are then implemented and benchmarked against each other and against the baseline in order to assess the improvements. To optimize the function that determines a good cut-off value, the DASK framework is used, which provides a more performant selection algorithm. To create a sparse eigenproblem, the distance calculations used so far consider only the neighbors that lie within a cut-off distance. Apart from fine-tuning this variant and testing various frameworks, an approximate kNN approach was tested and benchmarked. The DASK selection function runs up to five times faster than the numpy.partition(…) baseline, while the distance calculations are up to ten times faster (in Hypercube benchmarks).
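To make the two bottlenecks named above more concrete, the following is a minimal sketch of a selection-based cut-off estimate and a cut-off-based sparse distance matrix. It is not the thesis code or the Datafold implementation. The function names `estimate_cut_off` and `sparse_distances`, the heuristic (largest k-th nearest-neighbor distance over a random subsample), and all parameter values are assumptions chosen purely for illustration. The selection step uses `numpy.partition`, the baseline named in the abstract, and SciPy's `cKDTree` stands in for whichever distance backend is actually used.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist


def estimate_cut_off(points, k=10, n_subsample=100, seed=0):
    """Hypothetical heuristic: largest k-th nearest-neighbor distance
    over a random subsample of the points."""
    rng = np.random.default_rng(seed)
    n_subsample = min(n_subsample, len(points))
    chosen = rng.choice(len(points), size=n_subsample, replace=False)
    # Dense distances from the subsample to all points,
    # shape (n_subsample, n_points).
    distances = cdist(points[chosen], points)
    # np.partition places the k-th smallest entry of each row at index k
    # without a full sort. Index 0 holds the zero distance of each
    # subsample point to itself, so index k is its k-th nearest neighbor.
    kth_distances = np.partition(distances, k, axis=1)[:, k]
    return float(kth_distances.max())


def sparse_distances(points, cut_off):
    """Pairwise Euclidean distances, storing only pairs within cut_off."""
    tree = cKDTree(points)
    coo = tree.sparse_distance_matrix(tree, cut_off, output_type="coo_matrix")
    return coo.tocsr()


# Illustrative data: 500 random points in the 3D unit cube.
rng = np.random.default_rng(42)
points = rng.random((500, 3))

cut_off = estimate_cut_off(points, k=10)
distance_matrix = sparse_distances(points, cut_off)
```

In this sketch the selection step touches a dense block of distances, which hints at why it can dominate the runtime for large datasets, while the sparse matrix stores only the pairs that lie within the cut-off distance.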