t-Distributed Stochastic Neighbor Embedding (t-SNE)

## Gotchas

- The hyperparameters really matter and strongly affect the output.
- Cluster sizes don't have any meaning.
- Distances between clusters ***might*** not mean anything.
- Randomness isn't uniform - randomness has clumps.

## Questions

- Why can't I back into a perplexity value that's mediated by the number of points?

## Raw notes

- non-linear, non-uniform, stochastic transformation
- t-SNE adapts its notion of distance to regional density variations in the dataset.
- as a result, densities will tend to be equalized by design
- perplexity roughly controls how many neighbors each point should consider itself close to
- it can be roughly interpreted as balancing the local and global geometry of the space
- how perplexity acts depends on the number of points
- There isn't a single perplexity value that works across all clusters in the underlying data
- You can still see the local clusters, but global geometry will get lost outside of the perplexity sweet spot.
- Randomness isn't uniform - randomness has clumps
- at different levels of perplexity you might see those clumps as what appear to be structured clusters.
- at higher perplexity, the visualization is quite good in some respects. High-dimensional normal distributions are very close to a uniform distribution on a sphere.

## Resources

- [How to Use t-SNE Effectively (distill.pub)](https://distill.pub/2016/misread-tsne/)

---

- Links: [[Student-T distribution]] [[Principal component analysis]] [[Dimensionality Reduction]] [[Data Visualization]]
- Created at: 2023-06-21
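
## Perplexity sketch

To make the "perplexity roughly controls how many neighbors" note concrete, here is a minimal sketch of how perplexity is usually defined for a single point: as `2 ** H`, where `H` is the entropy (in bits) of a Gaussian-weighted distribution over that point's neighbors. The function names, the toy distances, and the search bounds are illustrative assumptions, not taken from any particular t-SNE library.

```python
import math


def perplexity(sq_dists, sigma):
    """Perplexity (2 ** entropy) of one point's Gaussian neighbor distribution."""
    # Subtracting the smallest distance avoids underflow; it cancels when normalizing.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, iters=100):
    """Bisect (on a log scale) for the bandwidth sigma that hits the target perplexity."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the current bracket
        if perplexity(sq_dists, mid) < target:
            lo = mid  # distribution too peaked: widen the Gaussian
        else:
            hi = mid  # distribution too flat: narrow the Gaussian
    return math.sqrt(lo * hi)


# Toy data (assumed): squared distances from one point to 50 others.
sq_dists = [float(k) ** 2 for k in range(1, 51)]
for target in (5.0, 30.0):
    sigma = sigma_for_perplexity(sq_dists, target)
    print(f"target perplexity {target}: sigma = {sigma:.3f}")
```

If all neighbors are equally weighted, the perplexity equals the neighbor count, which is why it reads as an "effective number of neighbors". It also can never exceed the number of neighbors, which is one way the number of points enters the picture.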