We present a distributed-memory algorithm for the hierarchical compression of symmetric positive definite (SPD) matrices. Our method builds on GOFMM, an algorithm introduced in doi:10.1145/3126908.3126921.
For many SPD matrices, GOFMM enables compression and approximate matrix-vector multiplication in O(N log N) time, as opposed to the quadratic work required for a dense matrix. However, GOFMM supports only shared-memory parallelism. In this paper, we extend the ideas of GOFMM to the distributed-memory setting using the Message Passing Interface (MPI), and we introduce an asynchronous algorithm for faster multiplication. We present usage scenarios with SPD matrices arising from graphs, neural networks, and covariance operators. We also compare with STRUMPACK, which, to our knowledge, is the only other parallel software that can compress arbitrary SPD matrices. In our largest run, we compressed a 67M-by-67M matrix in under three minutes and multiplied it with 512 vectors in under five seconds on 6,144 Intel Skylake cores.