Master's thesis submission talk (Informatics). Oriolson is advised by Severin Reiz.
Oriolson Rodriguez Ramirez: Integrated approach of Random Projections and Sparse Grids for Density estimation
SCCS Colloquium |
Sparse Grid Density Estimation faces two challenges when it is applied to extremely high-dimensional, clustered data: the curse of dimensionality and the placement of grid points. In clustered data, the samples concentrate around cluster centroids rather than spreading over the whole domain, which accentuates the problem of grid points being placed where they are not needed. These superfluous grid points add computational cost without improving accuracy. Although Adaptive Refinement has shown very good results, the computation still carries along the inefficient grid points introduced at the beginning.
To reduce these undesired effects, a two-part pipeline is implemented. The first part is a Locality Sensitive Hashing algorithm that uses Random Projections to divide the data into subsets that correspond to clusters. The second part applies dimensionality reduction to embed the whole data set and each cluster into a significantly lower-dimensional space, and then computes density estimates using Sparse Grids.
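The two parts of the pipeline can be illustrated with a minimal sketch. The abstract does not say which hash family or embedding the thesis uses, so everything below is an assumption: the hashing step uses sign random projections (random hyperplanes through the data mean), the embedding uses a Gaussian random projection, and the names `lsh_buckets` and `random_projection` are hypothetical. The final Sparse Grid density estimation on each embedded subset is left out.

```python
import numpy as np


def lsh_buckets(X, n_hyperplanes, rng):
    """Group row indices of X by the sign pattern of random projections.

    Assumption: random-hyperplane LSH on mean-centered data; the thesis
    may use a different hash family.
    """
    H = rng.standard_normal((X.shape[1], n_hyperplanes))
    # One bit per hyperplane: which side of the hyperplane the sample lies on.
    bits = ((X - X.mean(axis=0)) @ H > 0).astype(np.int64)
    # Encode each row's bit pattern as a single integer hash key.
    keys = bits @ (1 << np.arange(n_hyperplanes, dtype=np.int64))
    return {int(k): np.flatnonzero(keys == k) for k in np.unique(keys)}


def random_projection(X, target_dim, rng):
    """Embed the rows of X into target_dim dimensions (Gaussian projection)."""
    R = rng.standard_normal((X.shape[1], target_dim)) / np.sqrt(target_dim)
    return X @ R


# Synthetic clustered data: two Gaussian clusters in 50 dimensions.
rng = np.random.default_rng(0)
centers = np.stack([np.full(50, 5.0), np.full(50, -5.0)])
X = np.concatenate([c + rng.standard_normal((200, 50)) for c in centers])

# Part 1: split the data into subsets that ideally align with clusters.
buckets = lsh_buckets(X, n_hyperplanes=3, rng=rng)

# Part 2: embed the whole data set and each subset in a low-dimensional space.
Z_all = random_projection(X, target_dim=4, rng=rng)
Z_per_bucket = {k: random_projection(X[idx], 4, rng) for k, idx in buckets.items()}
```

Each entry of `Z_per_bucket`, as well as `Z_all`, would then be passed to a Sparse Grid density estimator. Nothing in this sketch guarantees that the buckets coincide with the true clusters; with few hyperplanes a cluster can be split across several buckets or merged with another.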
One of our methodologies exhibits very good results in 60% to 80% of the proposed numerical experiments, even in scenarios with very similar computational cost.