Graphics Processing Units (GPUs) are indispensable in modern high-performance
computing (HPC) systems due to their exceptional parallel processing capabilities.
However, their performance is often constrained by the overhead of memory
transfers between the CPU and GPU, particularly in memory-bound tasks. This study
investigates two optimization techniques, bit-level data packing and buffer
aggregation, to improve memory transfer efficiency in ExaHyPE2, an open-source framework
for solving hyperbolic partial differential equations.
Experimental results demonstrate that buffer aggregation is the most effective strategy
for optimizing GPU kernel execution in ExaHyPE2, significantly reducing
synchronization overhead. Multi-threading was also explored to overlap memory transfers and
kernel execution, but it yielded minimal benefits. While data packing alone did not
show substantial performance improvements in ExaHyPE2 kernels, it proved highly
effective for memory-bound tasks and exhibited enhanced performance when combined
with buffer aggregation.
The proposed optimization strategies offer valuable insights into addressing memory
transfer challenges and have broad applicability to HPC applications requiring efficient
GPU offloading.