As high-performance computing is starting to reach exascale and the number of interconnected nodes in supercomputers is ever increasing, parallel applications must be extremely scalable, a property which is limited by the degree of parallelism the underlying algorithms provide, but which can also suffer greatly from load imbalances and processor idle times caused by delays in data movement. For parallel runtime systems, it is highly desirable to promote the overlap of communication and computation, to distribute load evenly, and to aid programmers in designing applications that exploit the given hardware as well as possible. This work presents the different notions and paradigms used by the parallel runtime systems UPC++, Charm++ and AMPI to achieve these goals. An overview of previous benchmarks is given, which already show significant performance gains in some applications obtained by leveraging the features of the respective parallelization system. The Shallow Water Equations model is introduced, and the discretization, domain decomposition and parallelization strategy used to implement it are elaborated on. The actual implementations using each of the aforementioned frameworks are described; they differ only in the way data is communicated. Finally, the performance benchmarks of each implementation are presented, showing that all runtime systems consistently perform on par with the MPI reference implementation.