In this work, we develop concepts for implementing a logging and recovery component that handles node crashes in a shared-disk system environment. We build on several previously published strategies and adapt their algorithms to the specific needs of our system. Our environment is characterized as follows: the global lock manager is statically distributed among the system's nodes and employs a hierarchical synchronization protocol for efficient transaction processing. A local lock manager on each node administers local lock requests. Committed modifications to a data page are managed by the respective page owner, which may migrate dynamically across the system to allow maximum adaptability. These distribution aspects raise special problems for the logging component of the system, which this paper discusses in detail. We present two logging strategies and discuss their tradeoffs with respect to system performance, recovery costs, and reliability. Finally, we show that our recovery component is able to reconstruct any runtime system information, as well as corrupted pages, after a single-node or multiple-node failure, simply by collecting intact information residing on surviving nodes and reading the local log data from permanent storage.