Master's thesis presentation. Jakob is advised by Dr. Thomas Pfeil, Lukas Wiest, and Prof. Dr. Felix Dietrich.
Jakob Taube: Analyzing Approximations of Computation in Inference Hardware for Transformers
In recent years, transformer architectures such as large language models have revolutionized the field of deep learning, achieving state-of-the-art performance across a variety of tasks. However, their computational demands incur significant operational costs at scale, for instance in data centers, and pose major challenges for deployment in resource-constrained environments such as edge devices. This work explores approximation techniques within inference hardware to enhance the computational efficiency of transformers by leveraging a configurable multiply-accumulate unit. In particular, we focus on the efficient approximation of multiplication using logarithmic number representations for neural networks, and we present several efficiency-oriented optimizations over IEEE floating-point summation, such as approximate alignment, alternative rounding schemes, and custom quantization types. For each method, we qualitatively analyze the theoretical power savings and quantitatively assess the accuracy trade-offs relative to exact high-precision computation. Experimental results reveal that certain approximations with significantly reduced computational complexity can be implemented with minimal accuracy loss, providing a practical pathway for designing power-efficient inference hardware tailored to transformers.
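To make the idea of multiplication via logarithmic numbers concrete, below is a minimal Python sketch of the classic Mitchell-style piecewise-linear approximation, in which the mantissa multiplication is replaced by an addition of approximate logarithms. The helper names (`approx_log2`, `approx_pow2`, `approx_mul`) are illustrative only and not taken from the thesis; actual inference hardware would operate directly on the exponent and mantissa bit fields rather than Python floats.

```python
import math

def approx_log2(x: float) -> float:
    """Piecewise-linear (Mitchell) approximation of log2(x) for x > 0.
    Writing x = 2**k * (1 + f) with 0 <= f < 1, the approximation is log2(x) ~ k + f."""
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= m < 1
    k, f = e - 1, 2.0 * m - 1.0    # k and f mirror the FP exponent and mantissa fields
    return k + f

def approx_pow2(y: float) -> float:
    """Inverse piecewise-linear approximation: 2**y ~ 2**floor(y) * (1 + frac(y))."""
    k = math.floor(y)
    return math.ldexp(1.0 + (y - k), k)

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b in the log domain: the multiplier becomes an adder."""
    return approx_pow2(approx_log2(a) + approx_log2(b))

if __name__ == "__main__":
    for a, b in [(3.0, 5.0), (1.5, 2.25), (7.3, 0.4)]:
        exact, approx = a * b, approx_mul(a, b)
        print(f"{a} * {b}: exact={exact:.4f}  approx={approx:.4f}  "
              f"rel. error={abs(approx - exact) / exact:.2%}")
```

For the plain Mitchell approximation the worst-case relative error per multiplication is roughly 11%; how such per-operation errors interact with the accuracy of full transformer inference, and what power savings they buy in hardware, is the kind of trade-off the thesis evaluates.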