## Gotchas
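- Hyperparameters (perplexity especially) strongly affect the output; the same data can look very different across settings.
- Cluster sizes in the plot don't mean anything, since t-SNE tends to equalize densities.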
- Distances between clusters ***might*** not mean anything
- Randomness isn't uniform - randomness has clumps
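
A quick way to convince myself of the "randomness has clumps" point, without t-SNE at all: drop uniform random points on the unit square and count how many land in each grid cell. This is only a sketch; the function name `grid_counts`, the grid size, and the seed are arbitrary choices of mine, not anything from the article.

```python
import random
from collections import Counter

def grid_counts(n_points=1000, cells_per_side=10, seed=0):
    """Drop uniform random points on the unit square and count points per grid cell."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_points):
        # random() returns a value in [0, 1), so the cell index is always valid.
        cell = (int(rng.random() * cells_per_side),
                int(rng.random() * cells_per_side))
        counts[cell] += 1
    # Include empty cells so the minimum is honest.
    return [counts.get((i, j), 0)
            for i in range(cells_per_side)
            for j in range(cells_per_side)]

counts = grid_counts()
print("average per cell:", sum(counts) / len(counts))
print("min, max per cell:", min(counts), max(counts))
```

The average is 10 points per cell, but individual cells end up well above and below that. Those dense cells are the kind of accidental clump that can be mistaken for structure.
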
## Questions
- Why can't I back into a perplexity value from the number of points?
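
Part of the link between perplexity and the number of points can be sketched in code. As I understand the original algorithm, each point gets a Gaussian neighbor distribution, and its bandwidth is tuned until the perplexity (2 to the power of the entropy) of that distribution matches the setting. The helper names and the toy distances below are hypothetical, and this is a minimal pure-Python sketch, not any library's implementation.

```python
import math

def conditional_perplexity(sq_dists, sigma):
    """Perplexity 2**H of the Gaussian neighbor distribution around one point.

    sq_dists: squared distances from the point to every *other* point.
    """
    # Shift by the smallest distance for numerical stability;
    # the shift cancels in the normalization.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, iters=100):
    """Bisect on sigma (on a log scale) until the perplexity matches the target."""
    if not 1.0 < target < len(sq_dists):
        raise ValueError("target must lie between 1 and the number of other points")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if conditional_perplexity(sq_dists, mid) < target:
            lo = mid  # too few effective neighbors: widen the Gaussian
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical toy data: squared distances from one point to its 9 neighbors.
sq_dists = [0.1, 0.2, 0.3, 0.5, 1.0, 4.0, 9.0, 16.0, 25.0]
sigma = sigma_for_perplexity(sq_dists, target=5.0)
print("sigma:", sigma)
print("perplexity at that sigma:", conditional_perplexity(sq_dists, sigma))
```

With a very wide Gaussian the distribution becomes uniform over the other points, so the perplexity can never exceed their count (9 here). The number of points gives a ceiling on perplexity, but it doesn't by itself pick a value below that ceiling.
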
## Raw notes
- non-linear, non-uniform, stochastic transformation
- t-SNE adapts its notion of distance to regional density variations in the dataset.
- densities will tend to be equalized by design
- perplexity is roughly a guess at how many close neighbors each point has
- it can be roughly interpreted as balancing local and global geometry of the space
- how perplexity acts depends on the number of points
- There isn't a single perplexity value that works across all clusters in the underlying data
- You can still see the local clusters but global geometry will get lost outside of the perplexity sweet spot.
- Randomness isn't uniform - randomness has clumps
- at some perplexity values those clumps can look like structured clusters.
- at higher perplexity, the visualization is quite good in some respects. High-dimensional normal distributions are very close to a uniform distribution on a sphere.
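
The last note, about high-dimensional normals sitting close to a sphere, is easy to check numerically. A small sketch, with the function name `norm_spread` and the sample sizes being my own arbitrary choices: sample standard normal vectors and look at how much their lengths vary relative to the average length.

```python
import math
import random
import statistics

def norm_spread(dim, n_samples=200, seed=0):
    """Return (mean, standard deviation) of the norms of standard normal samples."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(n_samples)
    ]
    return statistics.mean(norms), statistics.stdev(norms)

for dim in (2, 10, 100, 1000):
    mean, std = norm_spread(dim)
    # The ratio std / mean shrinks as dim grows: samples sit in a thin shell.
    print(f"dim={dim:5d}  mean norm={mean:7.2f}  relative spread={std / mean:.3f}")
```

As the dimension grows, the mean norm approaches the square root of the dimension while the relative spread shrinks, so nearly all points sit at about the same distance from the origin.
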
## Resources
- [How to Use t-SNE Effectively (distill.pub)](https://distill.pub/2016/misread-tsne/)
---
- Links: [[Student-T distribution]] [[Principal component analysis]] [[Dimensionality Reduction]] [[Data Visualization]]
- Created at: 2023-06-21