Heterogeneous resources such as GPUs and FPGAs, together with ever-increasing node and core counts, complicate the design and implementation of HPC applications. The traditional bulk synchronous parallel model, and the static approach taken by applications that implement it, do not seem to fit these new circumstances. Although efficient applications can still be implemented this way, developers now have to depend on an ever-increasing number of frameworks and libraries, for example a combination of MPI for inter-node communication, OpenMP for intra-node parallelism and CUDA for GPU-specific components.
We propose Invasive Computing as an alternative. Invasive applications are implemented in invadeX10, an APGAS language, using an actor library, actorX10. Resources are allocated at execution time based on application requirements and the overall system situation. Actors schedule themselves based on the information available in their connected channels, which helps to avoid explicit global barriers. Invasive Computing requires support from the entire compute stack. To show the benefits of our approach, we created a demonstration system encompassing the entire compute stack, including custom hardware, operating system, compiler and applications.
For the application layer, we built SWE-X10, an actor-based and locally coordinated tsunami simulation. It uses actorX10 for coordination to implement a time-stepping scheme without global barriers. Furthermore, we introduced lazy activation of actors, where actors are gradually enabled once the wave reaches their part of the simulation domain. We also integrated SWE-X10 into the invasive stack. SWE-X10 consistently outperforms SWE, an earlier MPI+OpenMP-based version, in weak-scaling tests. Using lazy activation in a radial dam break test scenario, we were able to reduce the computation time (in core hours) by about 40%. Finally, using a custom-built instruction on the invasive platform prototype, we obtained a 3x speed-up of iterations for actors that use this instruction.
One of our goals is to make the actor-based computational paradigm available to a wider audience. Therefore, I plan to implement an actor framework in UPCxx during my stay at Berkeley Lab. As a first use case, the framework will be integrated with my tsunami application.