Simultaneously meeting multiple, partially orthogonal optimization targets during thread scheduling on HPC and manycore platforms, such as maximizing CPU performance, meeting deadlines of time-critical tasks, minimizing power, and ensuring thermal resilience, is a major challenge because of the associated scalability and thread-management overhead.
We tackle these challenges by introducing the Thread Control Unit (TCU), a configurable, low-latency, low-overhead hardware thread mapper for the compute nodes of an HPC cluster. The TCU takes various sensor readings into account and can map threads to 4-16 CPUs of a compute node within a small, bounded number of clock cycles, operating in a round-robin, single-objective, or multi-objective manner.
The TCU design can consider not only load balancing and performance criteria but also physical constraints such as temperature limits, power budgets, and reliability. Evaluations of different mapping policies show that multi-objective thread mapping achieves about 10 to 40% lower mapping latency for periodic workloads than single-objective or round-robin policies; for bursty workloads under high load, a 20% reduction is achieved.
The TCU macro incurs a mere 9% hardware area overhead and achieves more than 150k thread mappings per second on an FPGA prototype of a RISC quad-core compute node running at a moderate 50 MHz. A 45 nm ASIC realization of the TCU can operate well above 1 GHz and supports up to 3.15 million thread mappings per second.
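To illustrate what a multi-objective mapping decision of this kind could look like, the following minimal C sketch scores candidate cores by a weighted combination of load, temperature, and power, with a hard thermal limit acting as a physical constraint. The struct fields, weights, thresholds, and function names are assumptions for illustration only, not the TCU's actual hardware logic.

```c
/*
 * Minimal sketch of a multi-objective core-selection policy, loosely
 * modeled on the behavior described above. All field names, weights,
 * and limits here are illustrative assumptions, not the TCU design.
 */
#include <stdio.h>

#define NUM_CORES 4  /* the TCU targets 4-16 CPUs per compute node */

/* Per-core sensor snapshot as a hardware mapper might see it. */
typedef struct {
    double load;        /* 0.0 (idle) .. 1.0 (fully loaded) */
    double temperature; /* degrees Celsius                   */
    double power;       /* watts                             */
} core_state_t;

/*
 * Pick the core with the lowest weighted cost over load, temperature,
 * and power. Cores at or above the thermal limit are excluded, which
 * mirrors the idea of honoring physical constraints alongside
 * performance criteria.
 */
static int select_core_multi_objective(const core_state_t cores[NUM_CORES],
                                       double w_load, double w_temp,
                                       double w_power, double temp_limit)
{
    int best = -1;
    double best_cost = 0.0;

    for (int i = 0; i < NUM_CORES; i++) {
        if (cores[i].temperature >= temp_limit)
            continue; /* thermal constraint: core is not a candidate */

        double cost = w_load  * cores[i].load
                    + w_temp  * (cores[i].temperature / temp_limit)
                    + w_power * cores[i].power;

        if (best < 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best; /* -1 if every core violates the thermal limit */
}

int main(void)
{
    core_state_t cores[NUM_CORES] = {
        { .load = 0.80, .temperature = 72.0, .power = 1.9 },
        { .load = 0.35, .temperature = 60.0, .power = 1.1 },
        { .load = 0.10, .temperature = 88.0, .power = 0.9 }, /* too hot */
        { .load = 0.55, .temperature = 65.0, .power = 1.4 },
    };

    int core = select_core_multi_objective(cores, 1.0, 0.5, 0.2, 85.0);
    printf("map next thread to core %d\n", core);
    return 0;
}
```

In this sketch the trade-off between objectives is fixed by the weights passed to the selector; a single-objective policy corresponds to setting all but one weight to zero, and a round-robin policy ignores the sensor data entirely.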