The widespread use of shared memory programming in High Performance Computing (HPC) is currently hindered by two main factors: the limited scalability of architectures with hardware support for shared memory, and the abundance of existing programming models. Solving these issues requires a comprehensive shared memory framework that enables the use of shared memory on top of more scalable architectures and provides a user--friendly way to deal with the many different programming models. Driven by the first issue, a large number of so--called SoftWare Distributed Shared Memory (SW--DSM) systems have been developed. These systems rely solely on software components to create a transparent global virtual memory abstraction on highly scalable, loosely coupled architectures without any direct hardware support for shared memory. However, they often suffer from inherent performance problems and, in addition, do not address the second issue, the existence of (too) many shared memory programming models. On the contrary, the large amount of work done in the DSM area has produced a significant number of independent systems, each with its own API, thereby further worsening the situation. The work presented in this thesis therefore takes the idea of SW--DSM systems a step further by proposing a general and open shared memory framework called HAMSTER (Hybrid-dsm based Adaptive and Modular Shared memory archiTEctuRe). Instead of being tied to a single shared memory programming model or API, this framework provides a comprehensive set of shared memory services that enable the implementation of almost any shared memory programming model on top of a single core. These services are designed to minimize the complexity for target programming models, making the implementation of a large number of different models feasible.
These can include both existing and new application-- or application--domain--specific programming models, easing both the porting of existing applications and the parallelization of new ones. In addition, the HAMSTER framework avoids the typical performance problems of SW--DSM systems by relying on so--called NUMA (Non--Uniform Memory Access) architectures, which combine scalability and cost effectiveness with limited support for shared memory in the form of non--cache--coherent hardware DSM. Their capabilities are directly exploited by a new type of hybrid hardware/software DSM system, the core of the HAMSTER framework. This Hybrid--DSM approach closes the semantic gap between the global physical memory provided by the underlying hardware and the global virtual memory required for shared memory programming, enabling applications to benefit directly from the hardware support. On top of this Hybrid--DSM system, the HAMSTER framework defines and implements several independent and orthogonal management modules: separate modules for memory, consistency, synchronization, and task management, as well as for the control of the cluster and the global process abstraction. Each of these modules offers typical services required by implementations of shared memory programming models. Combined, they form the HAMSTER interface, which can then be used to implement shared memory programming models with little effort. This capability is demonstrated through the implementation of a number of selected shared memory programming models on top of the HAMSTER framework. These models range from transparently distributed thread models all the way to explicit put/get libraries, and also include various APIs from existing SW--DSM systems with different relaxed consistency models. They therefore cover the whole spectrum of shared memory programming models and underline the broad applicability of this approach.
The presented concepts are evaluated using a large number of different benchmarks and kernels, exposing the performance characteristics of the individual components. In addition, HAMSTER is used as the basis for the implementation or port of two real--world applications from the area of nuclear medical imaging, more precisely the reconstruction of PET images and their spectral analysis. These experiments cover both the porting of an existing shared memory application using a given DSM API and the parallelization of an application from scratch using a new, customized API. In both cases, the system provides an efficient platform resulting in very scalable execution. These experiments therefore demonstrate both the wide applicability and the efficiency of the overall HAMSTER framework.