With a widening gap between processor speed and communication latencies, overlap of computation and communication becomes increasingly important. At the same time, variable CPU clock frequencies (e.g., with DVFS and Turbo Boost) and novel numerical techniques such as local time stepping make it challenging to balance parallel execution times, even when the computational load is balanced. This limits parallel efficiency. Emerging runtime systems may help to tackle these challenges. In this paper, we present a thorough study of four selected parallelization frameworks (Chameleon, HPX, Charm++ and UPC++) in a proxy application for solving the shallow water equations. In addition, we augment the traditional MPI baseline variant with support for these frameworks and evaluate them in detail with respect to strong scaling efficiency and load balancing for global and local time stepping.