A Visualization Tool for High-Dimensional Data: t-SNE

t-SNE (Student t-distributed Stochastic Neighbor Embedding) is a tool for visualizing high-dimensional data, especially in the regime $N < 100$. It maps high-dimensional data onto a 2- or 3-dim plane while preserving local structure. By using a Student t-distribution for the low-dimensional similarities, t-SNE alleviates the "crowding problem" that makes clusters of high-dimensional data hard to separate in a low-dimensional space. The method performs well on tasks such as handwritten digit recognition and iris flower classification. Although t-SNE has weaknesses, such as degrading on very high-dimensional data and being sensitive to its initial parameters, it preserves local structure better than other dimensionality-reduction methods.

Introduction

Based on the following references:
Visualizing Data using t-SNE
How to Use t-SNE Effectively
example code

Student t-distribution Stochastic Neighbor Embedding (t-SNE) is a kind of visualization tool, and also a kind of classifier, for high-dimensional (high-dim) data (it performs well for $N < 100$). It can embed and cluster high-dim data into a 2- or 3-dim plane. It is a kind of unsupervised Machine Learning (ML) technique.

Figure: classification of 64-pixel handwritten digits.

Figure: classification of 64-pixel handwritten digits & flower species classification (4 input characteristics).
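For reference, here is a minimal sketch of such an experiment using scikit-learn's TSNE on its built-in 8×8 (64-pixel) digits dataset; the parameter values (`perplexity=30`, `init="pca"`) are illustrative choices, not necessarily the settings behind the figures above.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# 8x8 (64-pixel) handwritten digits, flattened to 64-dim vectors
digits = load_digits()
X, labels = digits.data, digits.target

# Embed the 64-dim data into a 2-dim plane
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
Y = tsne.fit_transform(X)

# Each digit class should show up as its own cluster in the plane
plt.scatter(Y[:, 0], Y[:, 1], c=labels, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.show()
```

The same call works for the 4-feature flower species data by swapping in `sklearn.datasets.load_iris()`.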

Classifying data is essentially a kind of dimensionality reduction; the ML here is used to minimize something (a cost function) that has many parameters and is non-linear.

Stochastic Neighbor Embedding (SNE): the basic idea of SNE and its math

SNE is expected to find a faithful low-dim representation of the high-dim data, one that preserves the small-distance (local) structure and still reflects the large distances of the high-dim data.

conditional probability

Assume one sample of the high-dim data can be written down as a vector $x_i$, and its mapped point in low-dim is $y_i$. Then define two conditional probabilities $p_{j\mid i}$ and $q_{j\mid i}$:

$$
\begin{aligned}
x_i &\quad \text{high-dim vector, a fixed point (the real data) in high-dim space} \\
y_i &\quad \text{low-dim vector, the movable mapped point in low-dim space}
\end{aligned}
$$

$$
p_{j \mid i}=\frac{\exp\left(-\left\|x_{i}-x_{j}\right\|^{2} / 2\sigma_{i}^{2}\right)}{\sum_{k \neq i} \exp\left(-\left\|x_{i}-x_{k}\right\|^{2} / 2\sigma_{i}^{2}\right)}, \quad p_{i \mid i}=0
$$

$$
q_{j \mid i}=\frac{\exp\left(-\left\|y_{i}-y_{j}\right\|^{2}\right)}{\sum_{k \neq i} \exp\left(-\left\|y_{i}-y_{k}\right\|^{2}\right)}, \quad \sigma=\frac{1}{\sqrt{2}}, \quad q_{i \mid i}=0
$$

where $\left\|x_{i}-x_{j}\right\|$ is the Euclidean distance. The *similarity* of datapoint $x_j$ to datapoint $x_i$ is the conditional probability, $p_{j\mid i}$, that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$. For example, for a large distance, $p_{j\mid i} \sim 0$. The $\sigma_i$ here performs a rescaling effect.
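As a minimal sketch of the formula for $p_{j\mid i}$ (not a reference implementation), the matrix of conditional probabilities can be computed with NumPy; the names `conditional_p`, `X`, and `sigma` are introduced here for illustration, and the per-point bandwidths $\sigma_i$ are simply passed in rather than tuned by the usual perplexity search.

```python
import numpy as np

def conditional_p(X, sigma):
    """Row-stochastic matrix P with P[i, j] = p_{j|i} for rows of X (n x d)."""
    # Squared Euclidean distances ||x_i - x_j||^2
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Gaussian affinities with per-point bandwidth sigma_i
    P = np.exp(-D / (2.0 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)           # p_{i|i} = 0
    P /= P.sum(axis=1, keepdims=True)  # normalize each row over k != i
    return P
```

Calling the same function on the low-dim points with a constant bandwidth $\sigma = 1/\sqrt{2}$ gives the matching $q_{j\mid i}$.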

If $y_i$ is faithful to $x_i$, we should have $q_{j \mid i} \approx p_{j \mid i}$.

The Cost function

It is the Kullback–Leibler divergence (which in this case equals the cross-entropy up to an additive constant; it is also named relative entropy and is widely used elsewhere, e.g. in AdS/CFT holographic theory; it equals the difference between the cross-entropy of the two probability distributions and the Shannon entropy of the first).
$$
C=\sum_{i} \mathrm{KL}\left(P_{i} \,\|\, Q_{i}\right)=\sum_{i} \sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}
$$

The Shannon entropy is $S = -\sum_{j} p_{j} \log p_{j}$, and

$$
\sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}} = \sum_{j} p_{j \mid i} \log p_{j \mid i} - \sum_{j} p_{j \mid i} \log q_{j \mid i}
$$
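A small sketch of this cost in code, assuming `P` and `Q` are the row-stochastic matrices produced by the hypothetical `conditional_p` helper above:

```python
import numpy as np

def sne_cost(P, Q, eps=1e-12):
    """KL-divergence cost C = sum_i KL(P_i || Q_i) for row-stochastic P and Q."""
    # eps keeps the logarithms finite where p_{j|i} or q_{j|i} is exactly zero
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))
```

Gradient descent on the map points $y_i$ then moves $Q$ toward $P$, which is how the cost function above gets minimized.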
