UMAP - /s (strangemonad's notes)

Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction method inspired by topology. Topology roughly answers the question "What's the geometry of the data?"

Idea: you can build up topological spaces by starting with 0-simplices (points) and adding higher-dimensional simplices wherever the lower-dimensional ones can be spanned. Simplices are glued together along face maps.

# Applying Simplicial Complexes to data

One way to do that is with Čech complexes.

![[_Media/Chech Complex.png]]

Nerve Thm: given a good cover, the Čech complex built from it is homotopy equivalent to the space being covered.

To be able to assume that the data is uniformly distributed on the manifold, you need an adaptive Riemannian [[metric]] (locally Euclidean, but the metric may vary over the manifold).

UMAP Adjunction Thm: Given functors

- $FinReal: sFuzz \to FinEPMet$
- $FinSing: FinEPMet \to sFuzz$

$FinReal \dashv FinSing$, i.e. FinReal is left adjoint to FinSing.

You can glue together a family of [[Simplicial Set]]s by taking the [[Co-limits]].

## Assessing goodness of fit

If we have a low-dimensional dataset, how good an approximation of the higher-dimensional dataset is it?

- Build up the same fuzzy simplicial topology for the low-dimensional data
- Minimize the cross-entropy between the two topologies

![[_Media/UMAP cross-entropy.png]]

You can also inverse-transform: map points in the low-dimensional embedding back to the high-dimensional space.

# References

- https://www.youtube.com/watch?v=FD8vKED4Mgc
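
# Sketch: fuzzy cross-entropy

The goodness-of-fit step in the notes above (minimize the cross-entropy between the high- and low-dimensional fuzzy topologies) can be sketched roughly as follows. This is a minimal sketch, not UMAP's actual implementation. It assumes each topology is reduced to a list of edge membership strengths in [0, 1], and it uses the fuzzy set cross-entropy as I recall it from the UMAP paper: for each edge, $w_h \log(w_h / w_l) + (1 - w_h) \log((1 - w_h) / (1 - w_l))$. The names `fuzzy_cross_entropy`, `w_high`, `w_low` and `eps` are hypothetical.

```python
import math


def fuzzy_cross_entropy(w_high, w_low, eps=1e-12):
    """Sum of per-edge cross-entropy terms between two fuzzy edge sets.

    w_high, w_low: equal-length sequences of membership strengths in [0, 1]
    for the same edges in the high- and low-dimensional representations.
    eps: clipping constant (an assumption of this sketch) that keeps the
    logarithms finite when a strength is exactly 0 or 1.
    """
    total = 0.0
    for wh, wl in zip(w_high, w_low):
        p = min(max(wh, eps), 1.0 - eps)
        q = min(max(wl, eps), 1.0 - eps)
        # First term penalizes edges present in high-dim but weak in low-dim.
        total += p * math.log(p / q)
        # Second term penalizes edges absent in high-dim but strong in low-dim.
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


# Usage: a layout whose edge strengths are closer to the originals scores lower.
high = [0.9, 0.1]
close_layout = [0.8, 0.2]
far_layout = [0.5, 0.5]
print(fuzzy_cross_entropy(high, close_layout) < fuzzy_cross_entropy(high, far_layout))
```

In practice the low-dimensional strengths depend on the embedding coordinates, and the optimizer moves those coordinates to reduce this quantity. That part is left out here.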