Sparse Grid Density Estimation faces two challenges when applied to extremely
high-dimensional, clustered data. The first is the curse of dimensionality, which makes
the required computational effort grow exponentially with the number of dimensions.
Although Sparse Grids moderate the effect of dimensionality, a computation can still
reach a point at which it becomes infeasible. The second challenge is the location of
some grid points: at the beginning of the computation the grid spreads across the whole
domain, including regions that contain no data at all. With clustered data, all samples
lie around cluster centroids rather than being scattered over the domain, which
accentuates the placement of grid points where they are not really needed. These
superfluous grid points add computational cost without improving accuracy. Although
Adaptive Refinement has shown very good results, the inefficient grid points added at
the beginning are carried through the entire computation.
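For orientation, a standard complexity estimate from the sparse grid literature (not a result of this work): a regular full grid with mesh width $2^{-n}$ in $d$ dimensions requires $\mathcal{O}(2^{nd})$ grid points, whereas a sparse grid needs only $\mathcal{O}(2^{n}\, n^{d-1})$, which delays, but does not remove, the exponential dependence on $d$.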
To reduce these undesired effects, a pipeline is implemented. The pipeline is divided
into two parts: the first is a Locality Sensitive Hashing algorithm that uses Random
Projections to divide the data into subsets corresponding to clusters; the second applies
dimensionality reduction to embed the whole data set and each individual cluster into a
significantly lower-dimensional space, and then computes density estimations using Sparse
Grids. For each part, a dedicated set of metrics is introduced to measure the efficacy
of the implementation. Finally, the Conventional Approach, which uses the whole data
set, is compared to two methodologies that treat each individual cluster in parallel
(Cluster Analysis and Cluster Extraction).
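To make the two-part pipeline concrete, the following Python sketch illustrates the idea under stated assumptions: sign-based Random Projections as the LSH family and PCA as the dimensionality reduction (the concrete choices in this work may differ), with the final Sparse Grid density estimation step only indicated by a comment. All names (lsh_buckets, reduce_dim) and parameter values are illustrative.

    import numpy as np

    def lsh_buckets(X, n_projections=8, seed=0):
        """Bucket samples by the sign pattern of random projections
        (one hash bit per random hyperplane)."""
        rng = np.random.default_rng(seed)
        P = rng.standard_normal((X.shape[1], n_projections))
        bits = X @ P > 0                               # (n_samples, n_projections)
        keys = bits @ (1 << np.arange(n_projections))  # one integer key per sample
        buckets = {}
        for i, k in enumerate(keys):
            buckets.setdefault(int(k), []).append(i)
        return buckets

    def reduce_dim(X, d):
        """Project X onto its top-d principal components (PCA via SVD);
        assumed here as a stand-in for the actual embedding method."""
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:d].T

    # Usage sketch on placeholder data standing in for a real data set:
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 50))
    for key, idx in lsh_buckets(X, n_projections=4).items():
        if len(idx) > 10:                  # skip near-empty buckets
            Z = reduce_dim(X[idx], d=3)
            # Z would then feed a Sparse Grid density estimator (not shown).

Samples whose projections share the same sign pattern fall into the same bucket, so nearby points tend to collide, which is what lets the buckets approximate clusters before the per-cluster embedding step.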
Results for a real-world data set and a synthetic one show that the Conventional
Approach outperforms Cluster Analysis in most scenarios. However, Cluster Extraction
exhibits very good results compared to the other two methodologies in 60% to 80% of the
proposed numerical experiments. These promising results are consistent even in scenarios
where the numerical experiments have very similar computational cost.