What Is t-SNE?

phoenix@Capricornus

Last modified: 2024-06-10 13:28:30

First published: 2024-06-08 10:46:21

Article link: https://blog.csdn.net/u013600306/article/details/139543712

t-SNE (tsne) is an algorithm for dimensionality reduction that is well-suited to visualizing high-dimensional data. The name stands for t-distributed Stochastic Neighbor Embedding. The idea is to embed high-dimensional points in low dimensions in a way that respects similarities between points. Nearby points in the high-dimensional space correspond to nearby embedded low-dimensional points, and distant points in high-dimensional space correspond to distant embedded low-dimensional points. (Generally, it is impossible to match distances exactly between high-dimensional and low-dimensional spaces.)
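
To make "respects similarities" concrete, here is a minimal sketch in plain Python. It relies on the standard t-SNE formulation, in which pairwise similarities are turned into probabilities (a Gaussian kernel in the high-dimensional space, a Student t kernel in the low-dimensional space) and the embedding is judged by the Kullback-Leibler divergence between the two. The fixed kernel width `sigma`, the single joint normalization, the toy data, and every function name are assumptions made for illustration; a real implementation such as tsne chooses the kernel width per point and optimizes the embedding rather than comparing two fixed candidates.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pair_probabilities(points, kernel):
    """Turn kernel weights over all ordered pairs (i != j) into probabilities."""
    n = len(points)
    weights = {
        (i, j): kernel(sq_dist(points[i], points[j]))
        for i in range(n) for j in range(n) if i != j
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def gaussian_kernel(d2, sigma=1.0):
    """High-dimensional similarity; the fixed sigma is a simplifying assumption."""
    return math.exp(-d2 / (2.0 * sigma ** 2))

def student_t_kernel(d2):
    """Low-dimensional similarity: Student t kernel with one degree of freedom."""
    return 1.0 / (1.0 + d2)

def kl_divergence(p, q):
    """KL(P || Q), summed over all pairs where P is nonzero."""
    return sum(v * math.log(v / q[pair]) for pair, v in p.items() if v > 0.0)

# Hypothetical data: two tight pairs of points, far apart, in three dimensions.
high_dim = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]

# Candidate 2-D embeddings: one keeps each pair together, one splits the pairs.
faithful = [(0.0, 0.0), (0.5, 0.0), (6.0, 0.0), (6.5, 0.0)]
scrambled = [(0.0, 0.0), (6.0, 0.0), (0.5, 0.0), (6.5, 0.0)]

p = pair_probabilities(high_dim, gaussian_kernel)
kl_faithful = kl_divergence(p, pair_probabilities(faithful, student_t_kernel))
kl_scrambled = kl_divergence(p, pair_probabilities(scrambled, student_t_kernel))

print(f"KL for faithful embedding:  {kl_faithful:.4f}")
print(f"KL for scrambled embedding: {kl_scrambled:.4f}")
```

The embedding that keeps neighbors together gives the smaller divergence, which is the sense in which a low-dimensional layout "respects" the high-dimensional similarities.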

Barnes-Hut Variation of t-SNE
To speed up the t-SNE algorithm and reduce its memory usage, tsne offers an approximate optimization scheme. The Barnes-Hut algorithm groups nearby points together to lower the complexity and memory usage of the t-SNE optimization step. It is an approximate optimizer, not an exact one. A nonnegative tuning parameter, Theta, controls the tradeoff between speed and accuracy: larger values of Theta give faster but less accurate optimization results. The algorithm is relatively insensitive to Theta values in the range (0.2, 0.8).

The Barnes-Hut algorithm groups nearby points in the low-dimensional space, and performs an approximate gradient descent based on these groups. The idea, originally used in astrophysics, is that the gradient is similar for nearby points, so the computations can be simplified.
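
The grouping idea can be sketched in a few lines of plain Python. The example below is a simplified scalar illustration, not the actual tsne gradient: it sums a Student t style weight from one target point to a small group of points, once exactly and once by treating the whole group as a single point at its centroid. The size-to-distance ratio compared against a Theta-like threshold is an assumed, commonly used form of the Barnes-Hut acceptance test, and all names and coordinates are hypothetical.

```python
import math

def kernel_term(target, point):
    """Weight 1 / (1 + squared distance) between two 2-D points."""
    dx = target[0] - point[0]
    dy = target[1] - point[1]
    return 1.0 / (1.0 + dx * dx + dy * dy)

def centroid(group):
    """Mean position of the points in the group."""
    n = len(group)
    return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n)

def exact_sum(target, group):
    """One kernel evaluation per point in the group."""
    return sum(kernel_term(target, p) for p in group)

def grouped_sum(target, group):
    """One kernel evaluation in total: group size times the kernel at the centroid."""
    return len(group) * kernel_term(target, centroid(group))

def size_to_distance_ratio(target, group):
    """Group width divided by the distance from the target to the group centroid."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    width = max(max(xs) - min(xs), max(ys) - min(ys))
    cx, cy = centroid(group)
    return width / math.hypot(target[0] - cx, target[1] - cy)

def relative_error(target, group):
    """Relative error of the grouped approximation against the exact sum."""
    exact = exact_sum(target, group)
    return abs(grouped_sum(target, group) - exact) / exact

target = (0.0, 0.0)
# Same group shape placed far from the target and right next to it.
far_group = [(9.9, 10.0), (10.1, 10.0), (10.0, 9.9), (10.0, 10.1)]
near_group = [(0.1, 0.0), (0.3, 0.0), (0.2, -0.1), (0.2, 0.1)]

theta = 0.5  # assumed threshold, playing the role of the Theta parameter
for name, group in (("far", far_group), ("near", near_group)):
    ratio = size_to_distance_ratio(target, group)
    verdict = "summarize as one point" if ratio < theta else "visit points individually"
    print(f"{name}: ratio={ratio:.3f}, error={relative_error(target, group):.2e}, {verdict}")
```

For the distant group the single-point summary is nearly exact, while for the adjacent group the error is noticeably larger. Raising the threshold accepts more groups as summaries, which is why larger Theta values trade accuracy for speed.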

Cannot Use Embedding to Classify New Data
Because t-SNE often separates data clusters well, it can seem that t-SNE can classify new data points. However, t-SNE cannot classify new points. The t-SNE embedding is a nonlinear map that is data-dependent. To embed a new point in the low-dimensional space, you cannot use the previous embedding as a map. Instead, run the entire algorithm again.

Performance Depends on Data Sizes and Algorithm
t-SNE can take a good deal of time to process data. If you have N data points in D dimensions that you want to map to a low-dimensional embedding Y, then:

Exact t-SNE takes on the order of D*N^2 operations.

Barnes-Hut t-SNE takes on the order of D*N*log(N)*exp(dimension(Y)) operations.

So for large data sets, where N is greater than 1000 or so, and where the embedding dimension is 2 or 3, the Barnes-Hut algorithm can be faster than the exact algorithm.
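
The two growth rates above can be compared with a short plain-Python calculation. This is only an order-of-magnitude sketch: it ignores constant factors, assumes the natural logarithm, and uses a hypothetical input dimension of 50, so the ratios indicate a trend rather than actual run times.

```python
import math

def exact_cost(n, d):
    """Order-of-magnitude operation count for exact t-SNE: D * N^2."""
    return d * n ** 2

def barnes_hut_cost(n, d, embed_dim):
    """Order-of-magnitude count for Barnes-Hut t-SNE: D * N * log(N) * exp(dim(Y))."""
    return d * n * math.log(n) * math.exp(embed_dim)

d = 50  # hypothetical number of input dimensions
for n in (100, 1_000, 10_000, 100_000):
    for embed_dim in (2, 3):
        ratio = exact_cost(n, d) / barnes_hut_cost(n, d, embed_dim)
        print(f"N={n:>6}, dim(Y)={embed_dim}: exact / Barnes-Hut estimate = {ratio:.1f}")
```

The ratio grows roughly like N / log(N), so the advantage of Barnes-Hut widens as the data set grows, and it shrinks by a factor of e for each extra embedding dimension.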
