One of the main challenges in numerical computing on modern high-performance clusters for the simulation of real-world phenomena is the efficient handling and management of the simulation domain, which is usually distributed among computational resources. This includes the subdivision of the domain, its distribution, and the continued balancing of the workload per computational resource during runtime, all while minimising the communication between the participating units. Classical approaches may store the complete topology on each unit. This entails a significant memory requirement and expensive global communication to ensure consistency whenever the domain configuration is altered at runtime, for example through adaptive mesh refinement (AMR) or user interaction. Such an approach is therefore mostly limited to simple, uniform domain configurations with similar physical models. More sophisticated techniques use a subset of the participating processes as bookkeeping instances for the domain. While this reduces the memory requirement and the cost of global communication, the overhead of communicating with the central bookkeeping instances cannot be circumvented and grows with the size of the systems used. This work addresses these shortcomings by employing a decentralised approach to domain organisation. The essential idea is to limit the domain view of each participating unit to its direct neighbours. Data transfers and topology updates take place only between neighbours, so global updates are not necessary. Since the number of neighbours each subdomain can have is bounded from above regardless of the total domain size, this approach promises to scale even on the largest clusters.