The SG++ Datamining Pipeline is a component of the SG++ Toolbox whose main purpose is to provide an interface for generating Machine Learning Models based on Sparse Grid Methods. These numerical techniques have proven successful in tasks involving large, high-dimensional data sets.
Until recently, the pipeline supported training only Sparse Grid based Density Estimation, Classification and Regression Models. In this thesis, we took on the task of integrating support for Clustering Models. We did so by implementing the Sparse Grid based Clustering Algorithm created by Peherstorfer, along with an augmentation designed by Fischer to generate a Hierarchical Clustering.
Additionally, we implemented a series of metrics for evaluating the quality of a clustering, and a series of processes that generate output from which graphical representations of these models can be produced. In this thesis, we present the design of our implementation and the results of a series of tests conducted to demonstrate and evaluate its functionality.