Master's thesis presentation. Jakob is advised by Dr. Thomas Pfeil, Lukas Wiest, and Prof. Dr. Felix Dietrich.
Jakob Taube: Analyzing Approximations of Computation in Inference Hardware for Transformers
In recent years, transformer architectures such as large language models have revolutionized the field of deep learning, achieving state-of-the-art performance across a variety of tasks. However, their computational demands incur significant operational costs at scale, for instance in data centers, and pose major challenges for deployment in resource-constrained environments such as edge devices. This work explores approximation techniques within inference hardware to enhance the computational efficiency of transformers by leveraging a configurable multiply-accumulate unit. In particular, we focus on the efficient approximation of multiplication using logarithmic number representations for neural networks, and we present several efficiency-oriented optimizations over IEEE floating-point summation, such as approximate alignment, alternative rounding schemes, and custom quantization types. For each method, we qualitatively analyze the theoretical power savings and quantitatively assess the accuracy trade-offs relative to exact high-precision computation. Experimental results reveal that certain approximations with significantly reduced computational complexity can be implemented with minimal accuracy loss, providing a practical pathway for designing power-efficient inference hardware tailored to transformers.
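To make the idea of multiplication via logarithmic numbers concrete, below is a minimal Python sketch of the classic Mitchell-style piecewise-linear approximation, in which the mantissa multiplication is replaced by an addition of approximate logarithms. The helper names (`approx_log2`, `approx_pow2`, `approx_mul`) are illustrative only and not taken from the thesis; actual inference hardware would operate directly on the exponent and mantissa bit fields rather than Python floats.

```python
import math

def approx_log2(x: float) -> float:
    """Piecewise-linear (Mitchell) approximation of log2(x) for x > 0.
    Writing x = 2**k * (1 + f) with 0 <= f < 1, the approximation is log2(x) ~ k + f."""
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= m < 1
    k, f = e - 1, 2.0 * m - 1.0    # k and f mirror the FP exponent and mantissa fields
    return k + f

def approx_pow2(y: float) -> float:
    """Inverse piecewise-linear approximation: 2**y ~ 2**floor(y) * (1 + frac(y))."""
    k = math.floor(y)
    return math.ldexp(1.0 + (y - k), k)

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b in the log domain: the multiplier becomes an adder."""
    return approx_pow2(approx_log2(a) + approx_log2(b))

if __name__ == "__main__":
    for a, b in [(3.0, 5.0), (1.5, 2.25), (7.3, 0.4)]:
        exact, approx = a * b, approx_mul(a, b)
        print(f"{a} * {b}: exact={exact:.4f}  approx={approx:.4f}  "
              f"rel. error={abs(approx - exact) / exact:.2%}")
```

For the plain Mitchell approximation the worst-case relative error per multiplication is roughly 11%; how such per-operation errors interact with the accuracy of full transformer inference, and what power savings they buy in hardware, is the kind of trade-off the thesis evaluates.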