Given the highly complementary nature of data mining and data warehousing, it seems obvious that data mining should be performed as an integral part of the analysis process, directly on the data already in the warehouse. In this paper we focus on frequent itemset processing and a tight integration approach. We introduce a novel concept for calculating candidate supports, called StreamJoin, together with a corresponding pruning strategy that effectively reduces search complexity. We show how this approach can be efficiently embedded within a database engine and can thus exploit query optimization as well as parallel execution. Our approach avoids costly database scans, additional disk spooling, intermediate blocking, and preparatory phases. In contrast to other strategies, it yields uniform processing within a single query execution plan and can be easily expressed and referenced via SQL-like interfaces.