Arrays of arbitrary size and dimensionality appear in a wide variety of database application fields, e.g., medical imaging, geographic information systems, and scientific simulations, as well as business-oriented applications such as Online Analytical Processing (OLAP) and data mining. Recently, the integration of a type constructor for such Multi-dimensional Discrete Data (MDD) into Database Management Systems, independent of both application domain and dimensionality, has received growing attention. Current scientific contributions in this area focus mainly on MDD algebra and specialized storage architectures. Since MDD values may reach several megabytes in size and operations on them are far more complex than operations on scalar values, their efficient evaluation becomes a critical factor for the overall query response time. Although the management of MDD values fundamentally shifts the demands on query processing, there has never been a systematic study of MDD-specific query optimization at both the logical and the physical level combined with efficient evaluation of MDD queries. This thesis aims to close this gap. We develop a generic Abstract Data Type (ADT) for MDD and integrate it into an adapted relational model by allowing the newly introduced MDD expressions in selection conditions and as parameters of the novel application operation, which is an extension of relational projection. With this model serving as a formal base, we provide a comprehensive list of algebraic transformation rules together with an optimization heuristic. We present specialized evaluation algorithms based on a tiled storage layout, which optimize array query processing in terms of both speed and memory usage. We then examine the MDD-specific cost structure of array query processing. The parameters chiefly responsible for this cost are summarized in the Array Cost Model, which is used, e.g., to make cost-based decisions among alternative evaluation plans.
The techniques presented are implemented in the operational Array DBMS RasDaMan. We outline the system architecture and describe in more detail the integration of the MDD ADT into the query language as well as the query processing module, including the optimizer and the executor. Finally, a performance study based on synthetic data as well as on real-life data from the European Computerized Human Brain Database Project (ECHBD) demonstrates the practical benefits of the presented techniques.