The future of Artificial Intelligence and Machine Learning (AI/ML) inference lies in hardware systems composed of many interconnected chiplets. With the rapid advancement of 3D integration technologies, each chiplet is expected to contain increasingly large on-chip memory.
To meet the growing demands of state-of-the-art models, achieving low-latency or high-through\-put inference while minimizing energy consumption per input requires novel architectural and algorithmic solutions. These must jointly optimize inter-chiplet communication, chiplet-local memory provisioning, heterogeneous interconnect topologies (both inter- and intra-chiplet), and dynamic power management. This, in turn, necessitates fine-grained spatiotemporal coordination, managing hundreds of block-level wakeup and shutdown events over the course of a single inference pass.
In response, this thesis introduces an energy-aware optimization framework for multi-chiplet AI/ML systems, built on a flexible Mixed-Integer Quadratic Programming (MIQP) formulation. The approach addresses operator mapping and execution scheduling in tandem, and supports co-optimization across diverse hardware configurations and objective functions. As a result, our method yields provably optimal mappings and system configurations for large-scale models with tens of thousands of computational operators, including modern architectures such as LLaMA3-70B.
In benchmark scenarios, our optimized solutions achieve up to a 26.5× improvement in energy-delay product (EDP) over baseline approaches and consistently remain within 14.0\% of the theoretical optimum. Notably, these results are obtained within minutes, significantly faster than traditional heuristics or solvers, which often require hours or provide no performance guarantees.
As AI/ML models continue to scale, our solutions retain their efficiency: even in scenarios requiring significantly larger compute and memory footprints, the proposed framework delivers energy efficiency comparable to a hypothetical monolithic chip design with idealized integration. Experimental validation via cycle-accurate emulation and hardware prototype measurements confirms the accuracy of these analytical estimates, with observed results aligning within 2.7\% of the MIQP predictions.