High-performance computing is an important field of scientific computing in which many
problems can be sped up substantially through high levels of parallelization.
One framework for writing such parallel programs is the actor
model. This approach establishes the Single Program Multiple Data (SPMD) principle
through actors advancing the program and communicating with each other through
specified channels. Especially in exascale computing, undetected data corruptions
in an actor can have devastating effects on program executions. In order to detect
possible data corruptions, I propose to employ double redundancy through full replication
of actors. Redundantly computed results can be checked against each other to
find errors. Another important task in high-performance computing is balancing the
workload evenly between cores. While other approaches achieve promising results on
scenarios where imbalances are predictable, they cannot protect the program against
non-static and unpredictable imbalances. For these applications, the possibility of load
balancing through redundancy is explored. Here, when an actor is slowed down due
to imbalances, its replica can take over and complete the computations, reducing the
waiting times of neighboring actors. Using replication, errors within the actor model
were detected with high accuracy, at the cost of increased
runtime. Additionally, load balancing through redundancy substantially reduced the idle
time of actors in unbalanced scenarios.