Soft error rates are increasing as modern architectures require ever smaller feature sizes at ever lower voltages. Because HPC systems contain very large numbers of components, they are particularly vulnerable to soft errors. Hence, when designing applications that run for long periods on large machines, algorithmic resilience must be taken into account. In this paper we analyse the inherent resilience of a posteriori limiting procedures in the context of the explicit ADER DG hyperbolic PDE solver ExaHyPE. The a posteriori limiter checks element-local high-order DG solutions for physical admissibility, and can therefore be expected to also detect hardware-induced errors. Algorithmically, it can be interpreted as element-local checkpointing and restarting of the solver with a more robust finite volume scheme on a fine subgrid. We show that the limiter indeed increases the resilience of the DG algorithm, in particular detecting and correcting those faults that would otherwise lead to a fatal failure.
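The interpretation of the limiter as element-local checkpoint/restart can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not ExaHyPE's API: the admissibility test (rejecting NaN/Inf as a stand-in for a physical check, plus a relaxed discrete maximum principle with hypothetical tolerances `rel_tol` and `eps`), the helper names `flip_bit`, `is_admissible` and `limited_step`, and the toy update functions standing in for the ADER DG step and the fine-subgrid finite volume fallback. A single bit flip in one element's candidate solution is injected to show how the check triggers a local recomputation from the previous state.

```python
import math
import struct


def flip_bit(x, bit):
    """Flip one bit of a float64, mimicking a single-event upset (soft error)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def is_admissible(candidate, prev_min, prev_max, rel_tol=1e-3, eps=1e-4):
    """Assumed element-local admissibility test on one element's nodal values."""
    # Stand-in for a physical-admissibility check: NaN or Inf is never admissible.
    if not all(math.isfinite(v) for v in candidate):
        return False
    # Relaxed discrete maximum principle: tolerate small overshoots of previous bounds.
    delta = max(eps, rel_tol * (prev_max - prev_min))
    return all(prev_min - delta <= v <= prev_max + delta for v in candidate)


def limited_step(old, dg_step, fv_step, neighbours):
    """One time step with a posteriori limiting; returns new state and troubled elements."""
    new, troubled = [], []
    for i in range(len(old)):
        # Bounds from the element and its neighbours at the previous (checkpointed) step.
        patch = [v for j in neighbours(i) for v in old[j]]
        candidate = dg_step(old, i)
        if is_admissible(candidate, min(patch), max(patch)):
            new.append(candidate)
        else:
            # Element-local restart: discard the candidate, recompute from old with the fallback.
            new.append(fv_step(old, i))
            troubled.append(i)
    return new, troubled


# Toy usage (hypothetical): a constant state on 4 periodic elements, 2 nodes each.
n = 4
state = [[1.0, 1.0] for _ in range(n)]


def faulty_dg(old, i):
    # Toy "DG" update that keeps the constant state but corrupts element 2.
    cand = list(old[i])
    if i == 2:
        cand[0] = flip_bit(cand[0], 52)  # lowest exponent bit: 1.0 becomes 0.5
    return cand


def toy_fv(old, i):
    # Toy fallback: simply reuses the checkpointed element data.
    return list(old[i])


new, troubled = limited_step(state, faulty_dg, toy_fv, lambda i: [(i - 1) % n, i, (i + 1) % n])
print(troubled)  # [2]
print(new[2])  # [1.0, 1.0]
```

Because the fallback only recomputes the flagged element from data of the previous step, the corrupted value never propagates; a flip in a higher exponent bit (for example bit 62 of 1.0, which yields infinity) would be caught by the finiteness check instead of the bounds check.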