Integration and Visualization of Sparse-Grid based Clustering Methods in the SG++ Datamining Pipeline

Vincent Bennet Bautista Anguiano


Document type:: Master's thesis
Author(s):: Vincent Bennet Bautista Anguiano
Title:: Integration and Visualization of Sparse-Grid based Clustering Methods in the SG++ Datamining Pipeline
Translated title:: Integration und Visualisierung von dünngitter basierten Clustering Methoden in der SG++ Datamining Pipeline
Abstract:: The SG++ Datamining Pipeline is a component of the SG++ Toolbox whose main purpose is to provide an interface for generating Machine Learning Models based on Sparse Grid Methods. These numerical techniques have previously proven successful in tasks involving large, high-dimensional data sets. Until recently, the pipeline only supported training Sparse Grid-based Density Estimation, Classification and Regression Models. In this thesis, we took on the task of integrating support for Clustering Models. We did so by implementing the Sparse Grid-based Clustering Algorithm created by Peherstorfer, along with a special augmentation designed by Fischer to generate a Hierarchical Clustering. Additionally, we implemented a series of metrics used to evaluate the quality of the clustering, and a series of processes that generate output from which graphical representations of these models can be produced. We present the design of our implementation and the results of a series of tests conducted to demonstrate and evaluate its functionality.
Keywords:: Sparse Grids; Data Mining; Clustering; SG++
Supervisor:: Hans-Joachim Bungartz
Advisor:: Paul-Cristian Sarbu
Year:: 2020
Quarter:: Q2
Year / month:: 2020-04
Month:: Apr
Language:: en
University:: Technical University of Munich
Faculty:: Fakultät für Informatik

Chair:: Informatik 5 - Lehrstuhl für Scientific Computing (Prof. Bungartz)