Master's thesis presentation. Xing is advised by Marc Marot-Lassauzaie.
Xing Zhou: Evaluation of reduced-footprint memory layouts for the hyperbolic PDE solver ExaHyPE2
SCCS Colloquium
Graphics Processing Units (GPUs) are indispensable in modern high-performance computing (HPC) systems due to their exceptional parallel processing capabilities. However, their performance is often constrained by the overhead of memory transfers between the CPU and GPU, particularly in memory-bound tasks. This study investigates two optimization techniques—bit-level data packing and buffer aggregation—to improve memory transfer efficiency in ExaHyPE2, an open-source framework for solving hyperbolic partial differential equations.
Experimental results show that buffer aggregation is the most effective of the strategies tested for optimizing GPU kernel execution in ExaHyPE2, as it significantly reduces synchronization overhead. Multi-threading was also explored as a way to overlap memory transfers with kernel execution, but it yielded minimal benefit. Data packing alone did not substantially improve the performance of ExaHyPE2 kernels; it did, however, prove highly effective for memory-bound tasks, and it performed better when combined with buffer aggregation.
The proposed optimization strategies offer valuable insights into addressing memory transfer challenges and have broad applicability to HPC applications requiring efficient GPU offloading.
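To make the two techniques named in the abstract concrete, the following host-side C++ sketch illustrates the general ideas only. It is not taken from ExaHyPE2 or from the thesis: the names `packFlags`, `unpackFlag`, `AggregatedBuffer`, and `aggregate` are hypothetical, and Boolean flags merely stand in for whatever data the thesis actually packs. Bit-level packing stores one flag per bit instead of one per byte or word, and buffer aggregation copies many small per-patch arrays into one contiguous buffer so that, in principle, a single host-to-device transfer could replace many small ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack one Boolean flag per bit into 32-bit words.
std::vector<std::uint32_t> packFlags(const std::vector<bool>& flags) {
  std::vector<std::uint32_t> words((flags.size() + 31) / 32, 0u);
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (flags[i]) {
      words[i / 32] |= (std::uint32_t{1} << (i % 32));
    }
  }
  return words;
}

// Hypothetical helper: read flag i back out of the packed words.
bool unpackFlag(const std::vector<std::uint32_t>& words, std::size_t i) {
  assert(i / 32 < words.size());
  return ((words[i / 32] >> (i % 32)) & 1u) != 0u;
}

// Hypothetical aggregated layout: all patches back to back in `data`.
// offsets[k] is the start of patch k; the last entry is the total length.
struct AggregatedBuffer {
  std::vector<double> data;
  std::vector<std::size_t> offsets;
};

// Copy many small per-patch arrays into one contiguous buffer.
AggregatedBuffer aggregate(const std::vector<std::vector<double>>& patches) {
  AggregatedBuffer out;
  std::size_t total = 0;
  for (const auto& p : patches) {
    total += p.size();
  }
  out.data.reserve(total);
  out.offsets.reserve(patches.size() + 1);
  for (const auto& p : patches) {
    out.offsets.push_back(out.data.size());
    out.data.insert(out.data.end(), p.begin(), p.end());
  }
  out.offsets.push_back(out.data.size());
  return out;
}
```

In a GPU setting, `data` and `offsets` would be the only two arrays handed to the transfer layer, and a kernel would use `offsets[k]` to locate patch `k`. The actual transfer and synchronization calls depend on the offloading backend and are deliberately left out of this sketch.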