Transformer-based models, such as large language models, have revolutionized the field of deep learning, achieving state-of-the-art performance across a wide range of tasks. Despite their success, their computational demands result in substantial operational costs at scale and pose major challenges for deployment in resource-constrained environments. This thesis proposes approximation techniques within inference hardware to enhance the computational efficiency of transformers by leveraging a configurable multiply-accumulate unit. In particular, we focus on efficient approximation of multiplication using the logarithmic number system and present optimizations over IEEE floating-point summation, including approximate alignment, alternative rounding schemes, and custom quantization types. For each method, we analyze the theoretical power savings and accuracy trade-offs compared to high-precision computation. Experimental results show that certain approximations with greatly reduced computational complexity can be implemented with minimal accuracy loss, providing a practical pathway for designing power-efficient inference hardware tailored to transformers.
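To illustrate the central idea behind logarithmic-number-system multiplication mentioned above, the following sketch shows a Mitchell-style approximate multiplier in software. It is not the thesis's hardware design; it only demonstrates, under the assumption of positive IEEE-754 single-precision operands, how treating a float's bit pattern as a scaled, biased base-2 logarithm turns multiplication into integer addition.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit pattern of 1.0f, used as the bias offset. */
#define FP32_ONE_BITS 0x3F800000u

/* Reinterpret float bits as an integer (and back) without
 * violating strict aliasing. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Approximate product of two positive floats.
 * The bit pattern of a positive float is roughly a scaled, biased
 * log2 of its value, so adding the two patterns and removing one
 * bias approximates multiplication (Mitchell's approximation,
 * worst-case relative error around 11%). This sketch ignores signs,
 * zeros, infinities, and NaNs. */
static float approx_mul(float a, float b) {
    return u2f(f2u(a) + f2u(b) - FP32_ONE_BITS);
}

int main(void) {
    float a = 1.7f, b = 2.3f;
    printf("exact  : %f\n", a * b);            /* 3.910000 */
    printf("approx : %f\n", approx_mul(a, b)); /* close, within a few percent */
    return 0;
}
```

In hardware, the attraction of this scheme is that the mantissa multiplier disappears entirely: the approximate product requires only an integer adder over the operand encodings, which is the source of the power savings the abstract alludes to.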