# PCA

PCA can be derived from two equivalent views:

(1) minimizing the reconstruction error;
(2) maximizing the variance after projection.

$$
\bm{z}_{i}=\mathbf{W}^{T}\bm{x}_{i} \quad \text{(encode)} \\
\hat{\bm{x}}_{i}=\mathbf{W}\bm{z}_{i} \quad \text{(decode)} \\
\hat{\bm{x}}_{i}=\mathbf{W}\mathbf{W}^{T}\bm{x}_{i}\approx\bm{x}_{i}
$$

$$
\hat{\bm{x}}_{i}=\mathbf{W}\bm{z}_{i}=\sum_{j=1}^{d'}z_{ij}\bm{w}_{j}
$$

$$
\begin{aligned}
\sum_{i=1}^{m}\|\mathbf{W}\bm{z}_{i}-\bm{x}_{i}\|^{2}_{2}
&=\sum_{i=1}^{m}\left[(\mathbf{W}\bm{z}_{i})^{T}(\mathbf{W}\bm{z}_{i})-2\bm{z}_{i}^{T}\mathbf{W}^{T}\bm{x}_{i}\right]+const \\
&=\sum_{i=1}^{m}\bm{z}_{i}^{T}\bm{z}_{i}-2\sum_{i=1}^{m}\bm{z}_{i}^{T}\mathbf{W}^{T}\bm{x}_{i}+const \\
&=-\sum_{i=1}^{m}\bm{z}_{i}^{T}\bm{z}_{i}+const
\end{aligned}
$$

where the last step uses $\bm{z}_{i}=\mathbf{W}^{T}\bm{x}_{i}$, so $\bm{z}_{i}^{T}\mathbf{W}^{T}\bm{x}_{i}=\bm{z}_{i}^{T}\bm{z}_{i}$.
$tr(\cdot)$ denotes the trace of a matrix, i.e., the sum of its diagonal entries. Since

$$
\bm{z}_{i}^{T}\bm{z}_{i}=tr(\bm{z}_{i}\bm{z}_{i}^{T})
$$

$$
\propto -tr\Big(\mathbf{W}^{T}\Big(\sum_{i=1}^{m}\bm{x}_{i}\bm{x}_{i}^{T}\Big)\mathbf{W}\Big)=-tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W})
$$

$$
\min_{\mathbf{W}}\; -tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W}) \quad s.t.\ \mathbf{W}^{T}\mathbf{W}=\mathbf{I}
$$

Equivalently, maximizing the variance of the projected points leads to the same problem:

$$
\max_{\mathbf{W}}\sum_{i=1}^{m}\bm{z}_{i}^{T}\bm{z}_{i}=\max_{\mathbf{W}}\, tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W}) \quad s.t.\ \mathbf{W}^{T}\mathbf{W}=\mathbf{I}
$$
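This equivalence is easy to check numerically: for any column-orthonormal $\mathbf{W}$, the reconstruction error equals the total energy of the centered data minus the projected variance, so minimizing one maximizes the other. A minimal NumPy sketch (the data and $\mathbf{W}$ below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))          # d=5 features, m=100 samples (columns)
X -= X.mean(axis=1, keepdims=True)     # center the samples

# any column-orthonormal W (d x d'), here from a QR decomposition
W, _ = np.linalg.qr(rng.normal(size=(5, 2)))

Z = W.T @ X                            # encode
X_hat = W @ Z                          # decode

recon_err = np.sum((X_hat - X) ** 2)
total = np.sum(X ** 2)
variance_term = np.trace(W.T @ X @ X.T @ W)

# reconstruction error = total energy - projected variance
assert np.isclose(recon_err, total - variance_term)
```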
## Solving the PCA objective

The constrained problem to solve is

$$
\max_{\mathbf{W}}\sum_{i=1}^{m}\bm{z}_{i}^{T}\bm{z}_{i}=\max_{\mathbf{W}}\, tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W}) \quad s.t.\ \mathbf{W}^{T}\mathbf{W}=\mathbf{I}
$$

$$
L(\mathbf{W})=tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W})-\bm{\lambda}(\mathbf{W}^{T}\mathbf{W}-\mathbf{I})
$$

where $\bm{\lambda}=diag(\lambda_{1},\lambda_{2},...,\lambda_{d'})$ and

$$
\bm{\lambda}(\mathbf{W}^{T}\mathbf{W}-\mathbf{I})=\sum_{i=1}^{d'}\lambda_{i}(\bm{w}_{i}^{T}\bm{w}_{i}-1)
$$

Using the matrix-derivative identity

$$
\frac{\partial\, tr(\mathbf{W}\mathbf{A}\mathbf{W}^{T})}{\partial \mathbf{W}}=2\mathbf{W}\mathbf{A},\quad \mathbf{A}=\mathbf{A}^{T},
$$
setting the gradient to zero gives

$$
\frac{\partial L}{\partial \mathbf{W}}=0 \Rightarrow \mathbf{X}\mathbf{X}^{T}\mathbf{W}=\mathbf{W}\bm{\lambda},\qquad \mathbf{X}\mathbf{X}^{T}\bm{w}_{i}=\lambda_{i}\bm{w}_{i},\ i=1,2,...,d'
$$
Because the $\bm{w}_{i},\ i=1,2,...,d'$ form an orthonormal basis, the $\lambda_{i}$ and $\bm{w}_{i}$ are eigenvalues and eigenvectors of $\mathbf{X}\mathbf{X}^{T}$.
$$
\mathbf{X}\mathbf{X}^{T}\mathbf{W}=\mathbf{W}\bm{\lambda}\Rightarrow \mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W}=\bm{\lambda},
$$

so the objective value is $tr(\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W})=\sum_{i=1}^{d'}\lambda_{i}$, which is maximized by keeping the eigenvectors of the $d'$ largest eigenvalues.
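This stationarity condition can be verified numerically: taking $\mathbf{W}$ as the top eigenvectors of $\mathbf{X}\mathbf{X}^{T}$ makes $\mathbf{W}^{T}\mathbf{X}\mathbf{X}^{T}\mathbf{W}$ diagonal with exactly the retained eigenvalues. A small check on random data ($d=4$, $d'=2$, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 200))          # d=4 features, m=200 samples (columns)
X -= X.mean(axis=1, keepdims=True)

S = X @ X.T                            # the matrix X X^T (symmetric PSD)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

d_prime = 2
W = eigvecs[:, :d_prime]               # top-d' eigenvectors

# W^T X X^T W equals the diagonal matrix of the retained eigenvalues
M = W.T @ S @ W
assert np.allclose(M, np.diag(eigvals[:d_prime]))
```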

The PCA algorithm:

---

1. Center the samples: $\bm{x}_{i}\leftarrow \bm{x}_{i}-\frac{1}{m}\sum_{j=1}^{m}\bm{x}_{j}$
2. Compute the covariance matrix $\mathbf{X}\mathbf{X}^{T}$ of the samples and perform an eigendecomposition.
3. Take the eigenvectors $\bm{w}_{1},\bm{w}_{2},...,\bm{w}_{d'}$ corresponding to the $d'$ largest eigenvalues; they form the columns of the projection matrix $\mathbf{W}$.

---
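The three steps above can be sketched directly in NumPy (a minimal illustration, with samples stored as columns as in the derivation; the function name `pca` is ours, not from any library):

```python
import numpy as np

def pca(X, d_prime):
    """X: (d, m) data matrix, one sample per column. Returns (W, Z)."""
    # 1. center the samples
    X = X - X.mean(axis=1, keepdims=True)
    # 2. eigendecompose X X^T
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    # 3. keep the eigenvectors of the d' largest eigenvalues
    order = np.argsort(eigvals)[::-1][:d_prime]
    W = eigvecs[:, order]
    return W, W.T @ X                  # projection matrix and encodings

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 50))
W, Z = pca(X, 2)
assert W.shape == (5, 2) and Z.shape == (2, 50)
assert np.allclose(W.T @ W, np.eye(2))   # W^T W = I holds by construction
```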

## Choosing the PCA hyperparameter

An advantage of PCA is that it has very few hyperparameters: only the dimension $d'$ of the low-dimensional space must be chosen. There are three common approaches:
(1) The user specifies $d'$ in advance.
(2) Cross-validation: reduce the data with different values of $d'$ and compare the performance of a k-nearest-neighbor classifier (or another cheap classifier) on the reduced samples.
(3) Set a reconstruction threshold $t$ and take the smallest $d'$ satisfying

$$
\frac{\sum_{i=1}^{d'}\lambda_{i}}{\sum_{i=1}^{d}\lambda_{i}}\geq t
$$
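The threshold rule is a one-liner given the eigenvalues; a small sketch (the helper name `choose_d_prime` is made up for illustration):

```python
import numpy as np

def choose_d_prime(eigvals, t=0.95):
    """Smallest d' whose top-d' eigenvalues account for a fraction >= t."""
    lam = np.sort(eigvals)[::-1]           # descending eigenvalues
    ratio = np.cumsum(lam) / np.sum(lam)   # cumulative explained fraction
    return int(np.argmax(ratio >= t) + 1)  # first index where threshold is met

# cumulative ratios are 0.4, 0.7, 0.9, 1.0, so t=0.85 requires d'=3
assert choose_d_prime(np.array([4.0, 3.0, 2.0, 1.0]), t=0.85) == 3
```

scikit-learn implements the same rule: passing a float in (0, 1) as `PCA(n_components=0.95)` keeps just enough components to explain that fraction of the variance.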

# t-SNE

PCA is a linear dimensionality-reduction method. t-SNE (t-distributed Stochastic Neighbor Embedding), by contrast, is a nonlinear method: it converts similarities between data points into joint probabilities and minimizes the KL divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data.

For example, comparing t-SNE and PCA on the iris dataset:

```python
from sklearn import datasets
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = datasets.load_iris()

X_tsne = TSNE(learning_rate=1000.0).fit_transform(iris.data)
X_pca = PCA().fit_transform(iris.data)
plt.figure(figsize=(10, 5))
plt.subplot(121)
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=iris.target)
plt.subplot(122)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=iris.target)
plt.show()
```
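To make the description above concrete, here is a simplified sketch of the two similarity matrices and the KL objective that t-SNE minimizes. It uses a fixed Gaussian bandwidth `sigma` instead of the per-point bandwidth that t-SNE actually finds by binary search on the perplexity, so it illustrates the objective, not the full algorithm:

```python
import numpy as np

def pairwise_sq_dists(X):
    s = np.sum(X ** 2, axis=1)
    return s[:, None] + s[None, :] - 2 * X @ X.T

def p_matrix(X, sigma=1.0):
    # high-dimensional similarities: Gaussian kernel, fixed bandwidth
    D = pairwise_sq_dists(X)
    P = np.exp(-D / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)   # conditional p_{j|i}
    return (P + P.T) / (2 * len(X))        # symmetrized joint p_{ij}

def q_matrix(Y):
    # low-dimensional similarities: Student-t kernel, one degree of freedom
    D = pairwise_sq_dists(Y)
    Q = 1.0 / (1.0 + D)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log((P[mask] + eps) / (Q[mask] + eps))))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))    # 20 points in 8 dimensions
Y = rng.normal(size=(20, 2))    # a candidate 2-D embedding
P, Q = p_matrix(X), q_matrix(Y)
assert np.isclose(P.sum(), 1.0) and np.isclose(Q.sum(), 1.0)
kl = kl_divergence(P, Q)        # t-SNE moves Y by gradient descent on this
assert kl >= 0.0
```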


### Visualizing CIFAR-100 embeddings with t-SNE

(1) Pick any 10 classes, extract their embeddings and labels (`target`), and save them as `.npy` files:

```python
import torch
import torch.backends.cudnn as cudnn

import os
import numpy as np

import models                      # project-local module defining resnet32
import torchvision
import torchvision.transforms as transforms

# dataset
num_classes = 100
device = 'cuda' if torch.cuda.is_available() else 'cpu'
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

testset = torchvision.datasets.CIFAR100(
    root=get_data_folder(),        # project-local helper returning the data dir
    train=False, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.5071, 0.4867, 0.4408],
                             [0.2675, 0.2565, 0.2761]),
    ]))
# batch_size=1 so each batch holds exactly one sample (target.item() below)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=1, shuffle=False,
    pin_memory=(torch.cuda.is_available()))

# --------------------------------------

model = getattr(models, 'resnet32')
net = model(num_classes=num_classes).to(device)
net = torch.nn.DataParallel(net)
cudnn.benchmark = True

def test():
    net.eval()
    # randomly pick 10 of the 100 classes
    classes = np.random.choice(np.arange(100), 10, replace=False)
    embeddings_list = []
    target_list = []
    with torch.no_grad():
        for batch_idx, (inputs, target) in enumerate(testloader):
            inputs, target = inputs.to(device), target.to(device)
            if np.sum(classes == target.item()) == 1:
                # this project's resnet32 returns (logits, embeddings)
                _, embeddings = net(inputs)
                embeddings_list.append(embeddings.squeeze(0).cpu().numpy())
                target_list.append(target.item())

    embeddings_list = np.asarray(embeddings_list)
    target_list = np.asarray(target_list)
    print(embeddings_list.shape)
    print(target_list.shape)
    np.save('embeddings.npy', embeddings_list)
    np.save('target.npy', target_list)

if __name__ == '__main__':
    checkpoint = torch.load('./checkpoint/' + model.__name__ + '_best.pth.tar',
                            map_location=torch.device('cpu'))
    # apply the trained weights (assuming they are stored under 'state_dict')
    net.load_state_dict(checkpoint['state_dict'])
    test()
```



(2) Load the embeddings and the corresponding `target` labels from the files and visualize them with t-SNE:

```python
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np

embeddings = np.load('embeddings.npy')
target = np.load('target.npy')

X_tsne = TSNE(learning_rate=200.0, perplexity=50).fit_transform(embeddings)
X_pca = PCA().fit_transform(embeddings)

plt.figure(figsize=(10, 5))
plt.subplot(121)
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=target)
plt.title('t-SNE')
plt.xticks([])
plt.yticks([])

plt.subplot(122)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=target)
plt.title('PCA')
plt.xticks([])
plt.yticks([])
plt.show()
```

To assign a fixed color to each of the 10 classes instead of relying on the default colormap:

```python
embeddings = np.load('baseline_embeddings.npy')
target = np.load('target.npy')

target_value = list(set(target))
colors = ['black', 'red', 'yellow', 'green', 'orange', 'blue',
          'magenta', 'slategray', 'cyan', 'aquamarine']
color_dict = {t: colors[i] for i, t in enumerate(target_value)}
print(color_dict)

X_tsne = TSNE(learning_rate=200.0, perplexity=30).fit_transform(embeddings)
plt.figure(figsize=(10, 5))

for i in range(len(target_value)):
    tmp_X = X_tsne[target == target_value[i]]
    plt.scatter(tmp_X[:, 0], tmp_X[:, 1], c=color_dict[target_value[i]])

plt.title('t-SNE')
plt.xticks([])
plt.yticks([])
plt.show()
```

