Dynamic resource management is critical for improving efficiency in high-performance computing (HPC) environments, where workloads face fluctuating computational demands. Traditional HPC applications typically use the Message Passing Interface (MPI), which relies on static hardware resource allocation and therefore prevents applications from adjusting their resources during execution. This limitation leads to inefficiencies, particularly in multi-workload environments where resource availability and demand can vary significantly.
This work compares two elastic MPI implementations, FLEX-MPI and DynRes, that aim to introduce dynamic scaling into MPI-based applications. FLEX-MPI performs load balancing within a single MPI program under a fixed resource allocation, using real-time performance monitoring and hardware counters to redistribute workloads among existing processes. In contrast, DynRes enables full dynamic resource management, allowing applications to add or remove processes at runtime using MPI Sessions and PMIx.
Through a comparative analysis, this work evaluates the ease of integration, system compatibility, and performance impact of both approaches. FLEX-MPI offers a lightweight and easily adoptable solution for heterogeneous environments, while DynRes provides greater optimization potential at the cost of greater complexity due to deeper integration with resource managers and schedulers. The findings highlight the trade-offs between ease of integration and optimization potential. Ultimately, the best choice between the two depends on the target application’s need for adaptive load balancing versus full dynamic process scaling. Future work should explore hybrid approaches that combine the strengths of both methods to enhance MPI scalability in modern HPC environments.