Modern GPU hardware offers substantial computational throughput for wave simulation, but realizing this potential while retaining the development productivity of a high-level language requires careful design choices. This thesis investigates the use of Julia with the ParallelStencil.jl framework to develop high-performance, hardware-agnostic finite difference solvers for scalar wave propagation and full-waveform inversion (FWI), targeting both CPU and GPU backends from a single codebase without hand-written CUDA kernels. Three finite difference schemes are implemented and benchmarked: a second-order central difference scheme, a scheme with fourth-order spatial accuracy, and a fourth-order Lax-Wendroff scheme. Performance is evaluated in terms of memory throughput, since all solvers operate in the memory-bandwidth-limited regime on modern GPU hardware. The Julia implementation approaches theoretical hardware limits and outperforms an equivalent Python/CuPy reference, particularly for 64-bit arithmetic and at moderate problem sizes where kernel launch latency is significant. On GPU, the solver achieves approximately 30× speedup over single-threaded CPU execution and 3× over multithreaded CPU execution. Higher-order schemes are shown to be significantly more computationally efficient than the second-order baseline for problems requiring low dispersion error, and this advantage grows substantially when the problem is extended to 3D. FWI is demonstrated using the L-BFGS optimizer, verifying end-to-end performance across all three solvers.