We present the design of a system-integrated load management facility for parallel applications that execute on networks of interconnected heterogeneous workstations. Owing to advances in hardware and software technologies, cluster-based parallel computing offers an attractive alternative to dedicated parallel machines. One of the biggest issues in such systems is the development of effective techniques for distributing the processes of parallel applications across multiple processors. In recent years, a large number of load management techniques have been developed for both parallel and distributed systems. However, transferring load management techniques from one platform to another is difficult because the implications differ from system to system. One of the main purposes of this design document is to uncover the detailed constraints that have to be considered during the decision phase of the load management cycle. On the one hand, the heterogeneity of the system and its time-sharing usage model introduce additional implications that are generally outside the control of load management systems, yet they have a major impact on the behaviour and functionality of a load manager. We classify heterogeneity according to its impact on load management design. On the other hand, management overhead might outweigh the expected performance benefits. Complex regulation operations such as process migration or reconfiguration of the set of execution nodes introduce management costs that have to be considered during the decision phase. We therefore introduce a new concept of cost-sensitive load management. The second part of the document covers the design of our approach, using a structured modeling technique that supports functional decomposition of the extended management control loop.