Beside traditional direct solvers iterative methods offer an efficient alternative for the solution of systems of linear equations which arise in the solution of partial differential equations (PDEs). Among them, multigrid algorithms belong to the most efficient methods based on the number of operations required to achieve a good approximation of the solution. The relevance of the number of arithmetic operations performed by an application as a metric for the complexity of an algorithm wanes since the performance of modern computing systems nowadays is limited by memory latency and bandwidth. Consequently, almost all computer manufacturers nowadays equip their computers with cache-based hierarchical memory systems. Thus, the efficiency of multigrid methods is rather determined by good data locality, i.e. good utilization of data caches, than by the number of arithmetic operations. In this thesis, the cache and memory access behavior of multigrid methods is systematically analyzed for the first time. The analysis is based on an exhaustive study of modern microprocessor memory hierarchies. Detailed runtime as well as theoretical studies of the performance of these methods demonstrate the interaction between multigrid algorithms and deep memory hierarchies. In particular, issues involved with the multilevel nature of the memory hierarchy are addressed. Furthermore, delays due to main memory accesses are clearly revealed as the performance bottlenecks of multigrid methods and their components. Besides the performance bottlenecks, upper limits for the achievable performance of multigrid methods on RISC based microprocessors are determined by means of theoretical models. Based on the knowledge gained from the analysis of multigrid algorithms and microprocessor architectures, new data locality optimization techniques for multigrid methods are proposed. The techniques extend existing code and data layout restructuring techniques and are able to significantly improve data locality and consequently speed up the execution of multigrid algorithms by a multiple. With the improved data locality multigrid methods are able to utilize 15 to 30 per cent of the peak performance on a multitude of modern computer systems. The impact of the techniques is demonstrated with runtime and memory hierarchy behavior measurements as well as theoretical data locality examinations. The applicability of the techniques is demonstrated by means of the DiMEPACK library. DiMEPACK is a multigrid solver for two-dimensional problems with constant coefficients on structured grids. In this thesis, however, aspects of multigrid methods for three-dimensional problems and variable coefficients are discussed as well.
«
Beside traditional direct solvers iterative methods offer an efficient alternative for the solution of systems of linear equations which arise in the solution of partial differential equations (PDEs). Among them, multigrid algorithms belong to the most efficient methods based on the number of operations required to achieve a good approximation of the solution. The relevance of the number of arithmetic operations performed by an application as a metric for the complexity of an algorithm wanes sin...
»