The Fujitsu A64FX and the inclusion of Arm’s SVE have shown promising results in the field
of HPC. Especially for numerical applications, the novel extension of the AArch64 ISA has
been shown to yield significant performance boosts for applications that are correctly ported
to Arm architectures. This work extends the matrix multiplication kernel generator PSpaMM
to generate SVE instructions and analyzes the resulting performance. We benchmark
multiplication kernels generated by PSpaMM containing SVE and NEON instructions, as
well as matrix multiplication kernels generated by LIBXSMM. We show that SVE-based
kernels can provide a speedup by a factor of 6.3 for small matrix multiplication
kernels when compared to PSpaMM’s NEON kernels. Benchmarks including dense-by-sparse
multiplication kernels show that the SVE kernels improve performance by a factor
of 3.8 compared to their NEON counterparts. Finally, we observe that PSpaMM’s SVE
generator is competitive in performance with the more heavily optimized math library LIBXSMM.