The trend towards larger HPC systems and bigger jobs poses a new challenge for MPI, a widely adopted parallel programming model. The massive process space raises problems of scalability and resilience. In current MPI, all processes are added to MPI_COMM_WORLD during initialization, which increases initialization time and memory requirements at large scale. To overcome this issue, the MPI Forum is actively discussing a new approach, the Peer to Peer process model, in which an application is free to initialize MPI resources based on its communication requirements. The fundamental goal of this concept is to remove the scalability barriers by no longer requiring MPI_COMM_WORLD.
The Peer to Peer process model is implemented using MPI Sessions, which propose a fundamental change in how MPI processes are organized and addressed. In this thesis, we demonstrate and evaluate a working prototype of MPI Sessions. The core of the design adheres to the design decisions and suggestions made by the "Sessions working group" of the MPI Forum. Our work is implemented as a shared library in C that can be linked to any MPI application built with Open MPI. We demonstrate MPI Sessions with both static and dynamic process sets. In a static process set, processes remain in the same set throughout their lifetime, whereas in a dynamic process set, processes may join or leave the set at any time.
Besides addressing the scalability issue, MPI Sessions enable tighter integration of the application with the runtime system, such as resource managers and job launchers. In our design, we chose Flux, a recently developed, open-source, and scalable Resource and Job Management Software (RJMS), as the runtime system. Our library uses the distributed key-value store provided by Flux as a fundamental building block. Any MPI process can query this key-value store for runtime information, based on which MPI internal resources are allocated to meet its communication requirements.