Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). Organizing fact tables with clustering multidimensional access methods (like the UB-Tree) are a promising approach to speed up star queries. However, the implementation into commercial products has not been done so far, since in addition to the clustering index organization, many parts of a database management system must be extended. For example, the query optimizer with corresponding cost model modifications must support the new organization and various optimization topics. In this thesis, we present EHC, the Encoding for Hierarchical Clustering in combination with UB-Trees. EHC enables the use of clustering index structures also for hierarchical data. EHC is extended to MHC, the multidimensional hierarchical clustering by combining multiple dimensions. Based on the concept of MHC, we develop a number of query optimization algorithms, in order to support hierarchical clustering with query processing. For this purpose, we present a complete abstract processing plan that captures all necessary steps in evaluating star queries in these environments. One important step in the query processing phase is, however, still a bottleneck: the residual join of results from the fact table with the dimension tables in combination with grouping and aggregation. This phase typically consumes between 50% and 80% of the overall processing time. In typical data warehouse scenarios pre-grouping methods only have a limited effect as the grouping is usually specified on the hierarchy levels of the dimension tables and not on the fact table itself. Therefore, we suggest a combination of hierarchical clustering and pre-grouping. Exploiting hierarchy semantics for the pre-grouping of fact table result tuples is several times faster than conventional query processing. The reason for this is that hierarchical pre-grouping reduces the number of join operations significantly. With this method even queries covering a large part of the fact table can be executed within a time span acceptable for interactive query processing. All these concepts have been implemented during this thesis into the commercial database management system Transbase® Hypercube and already run productive at a couple of customers of Transaction Software GmbH. During the implementation further problems occurred, like complex aggregate expressions, multiple query boxes, non-clustering dimensions, complex schemata, multi-fact-table-joins etc. For these problems, solutions are described and have been implemented. We further address some theoretical aspects of multiple hierarchies and dynamic changes of surrogates and a complete hierarchy model. Finally, we present measurement results of a complex real-world sales transaction data warehouse of an electronic retailer and of the APB standard benchmark for OLAP. These measurements show the benefit of the implemented methods compared to conventional state of the art techniques and database management systems.
«
Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). Organizing fact tables with clustering multidimensional access methods (like the UB-Tree) are a promising approach to speed up star queries. However, the implementation into commercial products has not been done so far, since in...
»