The widespread use of shared memory programming in High Performance Computing (HPC) is currently hindered by two main factors: the limited scalability of architectures with hardware support for shared memory, and the abundance of existing programming models. Solving these issues requires a comprehensive shared memory framework that enables the use of shared memory on top of more scalable architectures and provides a user--friendly way to deal with the many different programming models. Driven by the first issue, a large number of so--called SoftWare Distributed Shared Memory (SW--DSM) systems have been developed. These systems rely solely on software components to create a transparent global virtual memory abstraction on highly scalable, loosely coupled architectures without any direct hardware support for shared memory. However, they often suffer from inherent performance problems and, in addition, do not address the second issue, the existence of (too) many shared memory programming models. On the contrary, the large amount of work done in the DSM area has produced a significant number of independent systems, each with its own API, thereby further worsening the situation. The work presented in this thesis therefore takes the idea of SW--DSM systems a step further by proposing a general and open shared memory framework called HAMSTER (Hybrid-dsm based Adaptive and Modular Shared memory archiTEctuRe). Instead of being tied to a single shared memory programming model or API, this framework provides a comprehensive set of shared memory services that enable the implementation of almost any shared memory programming model on top of a single core. These services are designed to minimize the complexity for target programming models, making the implementation of a large number of different models feasible.
These can include both existing and new application-- or application--domain--specific programming models, easing both the porting of existing applications and the parallelization of new ones. In addition, the HAMSTER framework avoids the typical performance problems of SW--DSM systems by relying on so--called NUMA (Non--Uniform Memory Access) architectures, which combine scalability and cost effectiveness with limited support for shared memory in the form of non--cache--coherent hardware DSM. Their capabilities are directly exploited by a new type of hybrid hardware/software DSM system, the core of the HAMSTER framework. This Hybrid--DSM approach closes the semantic gap between the global physical memory provided by the underlying hardware and the global virtual memory required for shared memory programming, enabling applications to benefit directly from the hardware support. On top of this Hybrid--DSM system, the HAMSTER framework defines and implements several independent and orthogonal management modules: separate modules for memory, consistency, synchronization, and task management, as well as for the control of the cluster and the global process abstraction. Each of these modules offers typical services required by implementations of shared memory programming models. Combined, they form the HAMSTER interface, which can then be used to implement shared memory programming models with little effort. This capability is demonstrated through the implementation of a number of selected shared memory programming models on top of the HAMSTER framework. These models range from transparently distributed thread models all the way to explicit put/get libraries, and also include various APIs from existing SW--DSM systems with different relaxed consistency models. They therefore cover the whole spectrum of shared memory programming models and underline the broad applicability of this approach.
The presented concepts are evaluated using a large number of different benchmarks and kernels, exposing the performance characteristics of the individual components. In addition, HAMSTER is used as the basis for the implementation or port of two real--world applications from the area of nuclear medical imaging, more precisely the reconstruction of PET images and their spectral analysis. These experiments cover both the porting of an existing shared memory application using a given DSM API and the parallelization of an application from scratch using a new, customized API. In both cases, the system provides an efficient platform resulting in very scalable execution. These experiments therefore demonstrate both the wide applicability and the efficiency of the overall HAMSTER framework.