Molecular dynamics simulations of high-density, compute-intensive scenarios are well suited for SIMD vectorization. The simulation kernel of AutoPas, a particle simulation library, is already implemented with manual AVX2 vectorization for x86 architectures, as common compilers are unable to auto-vectorize the code.
The Fujitsu A64FX is an Arm CPU developed for the Fugaku supercomputer of the RIKEN Center for Computational Science in Japan, which leads several HPC performance rankings at the time of writing. To achieve peak performance, it supports Arm SVE, a novel SIMD instruction set extension featuring variable-length vectors and per-lane predication.
In this thesis, AutoPas is optimized to run on the A64FX. Specifically, the computation of the pairwise Lennard-Jones force is manually vectorized for the Arm SVE instruction set. Additional optimizations that hide instruction latency and exploit the instruction-level parallelism of the A64FX are evaluated, and the resulting performance differences are quantified and explained. A speedup factor of 9 over the unvectorized version is measured in appropriate simulation scenarios, and the performance is found to be comparable to that of the existing x86 implementation.