A High-Dimensional Data Visualization Tool: t-SNE

福斯基

Last modified on 2022-05-30 21:48:06

Category: Astronomy Data. Tags: machine learning, deep learning, artificial intelligence

First published on 2022-05-08 00:42:50

Article link: https://blog.csdn.net/cool_bot/article/details/124615089

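t-SNE (Student t-distributed Stochastic Neighbor Embedding) is a tool for visualizing high-dimensional data, especially suited to cases where the data size is below 100. It maps high-dimensional data onto a 2- or 3-dimensional plane while preserving local structure. By addressing the "crowding problem", t-SNE avoids the difficulty of clustering high-dimensional data in a low-dimensional space, and it uses the Student t-distribution to optimize the similarities. The method performs well in classification tasks such as handwritten digit recognition and iris classification. Although t-SNE has weaknesses, such as poor performance on high-dimensional data and sensitivity to initial parameters, it is better than other dimensionality reduction methods at preserving local structure.

Summary generated by CSDN using AI technology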

Introduction

Based on Visualizing Data using t-SNE.
How to Use t-SNE Effectively
Example code

Student t-distributed Stochastic Neighbor Embedding (t-SNE) is a visualization tool for high-dimensional (high-dim) data that can also serve as a classifier (it performs well for $N < 100$): it embeds the high-dim data into a 2- or 3-dim space, where the data separate into clusters. It is an unsupervised Machine Learning (ML) technique.

Classification of 64-pixel handwritten digits

Classification of 64-pixel handwritten digits & classification of flower species (4 input features)

Classifying data is essentially a kind of dimensionality reduction. ML is used here to minimize a cost function that is non-linear and depends on a large number of parameters.

Stochastic Neighbor Embedding (SNE): the basic idea of SNE and its math

SNE is expected to find a faithful low-dim representation of high-dim data, one that preserves the small-distance (local) structure and still reflects the large-distance structure of the high-dim data.

conditional probability

Assume that one sample of the high-dim data can be written as a vector $x_i$ and that its mapped point in low-dim is $y_i$ . Then define two conditional probabilities $p_{j|i}, q_{j|i}$ :

$x_i \ \ \ \ \text{high-dim vector, a fixed point representing the real data in high-dim} \\ y_i \ \ \ \ \text{mapped low-dim vector, a movable map point in low-dim}$

$p_{j \mid i}=\frac{\exp \left(-\left\|x_{i}-x_{j}\right\|^{2} / 2 \sigma_{i}^{2}\right)}{\sum_{k \neq i} \exp \left(-\left\|x_{i}-x_{k}\right\|^{2} / 2 \sigma_{i}^{2}\right)}, \ \ p_{i \mid i}= 0$

$q_{j \mid i}=\frac{\exp \left(-\left\|y_{i}-y_{j}\right\|^{2}\right)}{\sum_{k \neq i} \exp \left(-\left\|y_{i}-y_{k}\right\|^{2}\right)}, \ \ \sigma = \frac{1}{\sqrt{2}} , \ \ q_{i \mid i}= 0$

where $\left\|x_{i}-x_{j}\right\|$ is the Euclidean distance. The similarity of datapoint $x_j$ to datapoint $x_i$ is the conditional probability, $p_{j|i}$ , that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$ . For example, for a large distance, $p_{j|i} \sim 0$ . The $\sigma_i$ here acts as a rescaling factor for the distances around $x_i$ .
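As a minimal sketch of the definition of $p_{j|i}$, the pure-Python snippet below computes the conditional probabilities for a toy data set. The function name `conditional_p`, the sample points and the choice $\sigma_i = 1$ are illustrative assumptions, not taken from the paper or from the linked example code.

```python
import math

def conditional_p(points, i, sigma):
    """Return the list [p_{j|i} for every j], with p_{i|i} = 0."""
    weights = []
    for j, xj in enumerate(points):
        if j == i:
            # A point never picks itself as a neighbor.
            weights.append(0.0)
        else:
            # Squared Euclidean distance ||x_i - x_j||^2.
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], xj))
            # Unnormalized Gaussian weight centered at x_i.
            weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: one near neighbor and one far neighbor of point 0.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
p = conditional_p(X, i=0, sigma=1.0)
print(p)
```

The near point receives almost all of the probability mass, while the far point gets a value close to zero, which is the behavior described for large distances.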

If $y_i$ is faithful to $x_i$ , then we should have $q_{j \mid i} \approx p_{j \mid i}$ .
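To make this condition concrete, here is a small sketch in which the high-dim points happen to lie on a straight line, so a 1-dim map can reproduce their distances exactly. The helper `cond_prob`, the data, and the assumption of a single common $\sigma$ for all points are hypothetical choices made for illustration. Rescaling the map coordinates by $1/(\sqrt{2}\sigma)$ absorbs the Gaussian width, so that $q_{j \mid i}$ and $p_{j \mid i}$ agree.

```python
import math

def cond_prob(points, i, two_sigma_sq):
    """Return [P(j | i) for all j] under a Gaussian centered at points[i]."""
    weights = []
    for j, pj in enumerate(points):
        if j == i:
            weights.append(0.0)
        else:
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], pj))
            weights.append(math.exp(-sq_dist / two_sigma_sq))
    total = sum(weights)
    return [w / total for w in weights]

sigma = 2.0            # assumed common sigma_i for every high-dim point
ts = (0.0, 1.0, 3.0)   # positions along the line (hypothetical data)

# High-dim points on a line in 3-D; the point for parameter t is 3*t from the origin.
X = [(t, 2.0 * t, 2.0 * t) for t in ts]

# Faithful 1-D map: position along the line, rescaled by 1 / (sqrt(2) * sigma).
scale = 3.0 / (math.sqrt(2.0) * sigma)
Y = [(scale * t,) for t in ts]

max_diff = 0.0
for i in range(len(X)):
    p = cond_prob(X, i, 2.0 * sigma ** 2)  # p_{j|i} with width sigma
    q = cond_prob(Y, i, 1.0)               # q_{j|i} with sigma = 1/sqrt(2)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(p, q)))

print(max_diff)  # difference between p and q, at the level of rounding error
```

For real data no exact low-dim copy of the distances exists in general, which is why the mismatch between $q_{j \mid i}$ and $p_{j \mid i}$ has to be measured and minimized.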

The Cost function

The cost function is the Kullback-Leibler divergence (also called relative entropy, and widely used in, for example, AdS/CFT holographic theory). In this case it is equal to the cross-entropy up to an additive constant: it is the cross-entropy between the two probability distributions minus the Shannon entropy of $P_i$ .
$C=\sum_{i} K L\left(P_{i} \| Q_{i}\right)=\sum_{i} \sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}$
Shannon entropy is $-\sum_{j} p_{j } \log {p_{j}}$
$\sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}} = \sum_{j} p_{j \mid i} \log {p_{j \mid i}} - \sum_{j} p_{j \mid i} \log {q_{j \mid i}}$
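The decomposition above can be checked numerically. The following sketch uses made-up distributions `P_i` and `Q_i` for a single point $i$ (all names and values are illustrative assumptions) and compares the KL divergence with the cross-entropy minus the Shannon entropy of $P_i$. Terms with $p_{j \mid i} = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_j p_j * log(p_j / q_j), skipping terms with p_j = 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0.0)

def shannon_entropy(p):
    """H(P) = -sum_j p_j * log(p_j)."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0.0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_j p_j * log(q_j)."""
    return -sum(pj * math.log(qj) for pj, qj in zip(p, q) if pj > 0.0)

# Hypothetical conditional distributions for one point i (entry i itself is 0).
P_i = [0.0, 0.7, 0.2, 0.1]
Q_i = [0.0, 0.5, 0.3, 0.2]

kl = kl_divergence(P_i, Q_i)
decomposed = cross_entropy(P_i, Q_i) - shannon_entropy(P_i)
print(kl, decomposed)
```

The full cost $C$ is the sum of such terms over all points $i$. Since the entropy of $P_i$ does not depend on the map points $y_i$ , minimizing $C$ over the map is the same as minimizing the cross-entropy term. The divergence is also not symmetric: swapping `P_i` and `Q_i` gives a different value.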