Lattice Boltzmann methods (LBM) are used for massively parallel computational fluid dynamics simulations because they parallelize easily: the collision step is local in space and perfectly parallel, and the streaming step only transfers data between neighboring grid points. Current CPU hardware architectures focus on increasing parallelism through additional CPU cores and wider vector instruction sets. To benefit from these developments, parallel LBM schemes need to be designed with both forms of parallelism in mind. This paper presents a new LBM streaming scheme for directly addressed grids that can easily be vectorized automatically and is based on the A-A pattern streaming algorithm. Combined with several implementation techniques, the new algorithm achieves a speedup of more than a factor of three over an unvectorized implementation. The algorithm also offers implementation benefits over the A-A pattern algorithm.