We use the DGX A100 platform to solve a 3D, fully coupled Earth-Air wave propagation model with the open-source software SeisSol. Each A100 GPU is managed by a single MPI process with two dedicated threads: one for control and one for progressing non-blocking MPI communication. In this work, we apply our new GPU code generation approach to batched GEMM computations. We also use CUDA graphs, created through stream capture, to reduce kernel launch overheads in combination with our Local Time Stepping (LTS) scheme. Additionally, we show that Singularity containerization incurs a negligible performance loss of ≈ 1.5% compared to a bare-metal installation of SeisSol for our scenario.
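For readers unfamiliar with the term, a batched GEMM applies the same small matrix product independently to many operands, for example to the element-local matrices of a discretized mesh. The CPU reference below is a minimal illustrative sketch of those semantics, not SeisSol's generated code. The function name `batchedGemmReference`, the column-major layout, double precision, and the pointer-array interface are all assumptions made for this example. A GPU code generator would map the outer batch loop onto a single kernel launch, for instance with one thread block per batch entry, instead of looping on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a batched GEMM (illustration only):
// for every batch entry b, C[b] = alpha * A[b] * B[b] + beta * C[b].
// Matrices are column-major: A[b] is m x k, B[b] is k x n, C[b] is m x n.
void batchedGemmReference(std::size_t m, std::size_t n, std::size_t k,
                          double alpha,
                          const std::vector<const double*>& A,
                          const std::vector<const double*>& B,
                          double beta,
                          const std::vector<double*>& C) {
  // Every batch entry needs one A, one B, and one C operand.
  assert(A.size() == C.size() && B.size() == C.size());

  for (std::size_t b = 0; b < C.size(); ++b) {      // independent batch entries
    for (std::size_t j = 0; j < n; ++j) {           // columns of C[b]
      for (std::size_t i = 0; i < m; ++i) {         // rows of C[b]
        double acc = 0.0;
        for (std::size_t p = 0; p < k; ++p) {
          acc += A[b][i + p * m] * B[b][p + j * k];
        }
        double& out = C[b][i + j * m];
        // As in BLAS, beta == 0 means C is write-only and is never read.
        out = (beta == 0.0) ? alpha * acc : alpha * acc + beta * out;
      }
    }
  }
}
```

Because the batch entries do not depend on each other, the outer loop is what a GPU kernel parallelizes; the per-entry matrices are typically too small to saturate a GPU on their own.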