PCA (principal component analysis) is an unsupervised learning method for dimensionality reduction.
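For orientation (a standard summary added here, not from the original post; X is the N×M sample matrix and r the target dimension), the entire computation below is:

$$\tilde{X} = X - \tfrac{1}{N}\mathbf{1}_{N\times N}X, \qquad P = \tfrac{1}{N}\tilde{X}^{\top}\tilde{X} = VDV^{\top}, \qquad R = \tilde{X}\,V_{:,\,1:r}$$

i.e., center the data, eigendecompose the covariance matrix, and project onto the eigenvectors of the r largest eigenvalues.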
1. The MATLAB code is as follows:
clear all
clc
load hald                        % ingredients: N*M data matrix
coeff = pca(ingredients);        % eigenvector matrix, M*M
covx = cov(ingredients);         % covariance matrix, M*M
COEFF = pcacov(covx);            % eigenvector matrix, equivalent to coeff
data = ingredients;              % N rows (samples), M cols (features)
N = size(data, 1);               % number of samples
M = size(data, 2);               % feature dimension
A = ones(N, N);                  % all-ones matrix (not the identity)
data2 = data - (1/N)*A*data;     % centering: subtract each column mean
P = (1/N)*(data2'*data2);        % covariance matrix, M*M (1/N convention; note cov uses 1/(N-1))
[V, D] = eig(P);                 % eigendecomposition, P = V*D*V'; eigenvalues in ascending order
index = M:-1:1;                  % reversed index
D = diag(D);                     % extract the diagonal entries
D = D(index);                    % reorder descending
D = diag(D, 0);                  % eigenvalue matrix, sorted largest to smallest
V = V(:,index);                  % reorder the eigenvectors to match
d = diag(D)';                    % diagonal entries as a row vector
d = d / sum(d);                  % explained-variance ratio of each principal component
r = 2;                           % target dimension after reduction
V2 = V(:, 1:r);                  % eigenvectors of the top-r principal components
D2 = D(1:r, 1:r);                % eigenvalues of the top-r principal components
R = data2*V2;                    % reduced data, N*r
Original data (data):
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
Covariance matrix P:
31.9408 19.3136 -28.6627 -22.3077
19.3136 223.5148 -12.8107 -233.9231
-28.6627 -12.8107 37.8698 2.9231
-22.3077 -233.9231 2.9231 258.6154
Eigenvector matrix V:
-0.0678 0.6461 0.5674 0.5062
-0.6785 0.0200 -0.5440 0.4933
0.0290 -0.7553 0.4036 0.5156
0.7309 0.1085 -0.4684 0.4844
Eigenvalue matrix D:
477.9663 0 0 0
0 62.3044 0 0
0 0 11.4512 0
0 0 0 0.2189
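In the code, d = d / sum(d) turns these eigenvalues into explained-variance ratios. As a worked check (added here to connect with section 2): 477.9663 / (477.9663 + 62.3044 + 11.4512 + 0.2189) ≈ 0.8660, matching the ratio sklearn prints below; sklearn's explained_variance_ uses the 1/(N-1) convention instead, hence 477.9663 × 13/12 ≈ 517.80.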
Reduced data R:
36.8218 6.8709
29.6073 -4.6109
-12.9818 4.2049
23.7147 6.6341
-0.5532 4.4617
-10.8125 3.6466
-32.5882 -8.9798
22.6064 -10.7259
-9.2626 -8.9854
-3.2840 14.1573
9.2200 -12.3861
-25.5849 2.7817
-26.9032 2.9310
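To cross-check these results outside MATLAB, here is a minimal NumPy sketch of the same manual steps (an illustration added here, not from the original post; variable names mirror the MATLAB code, and any column of R may come back sign-flipped, since eigenvectors are only determined up to sign):

import numpy as np

data = np.array([[7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20],
                 [11, 31, 8, 47], [7, 52, 6, 33], [11, 55, 9, 22],
                 [3, 71, 17, 6], [1, 31, 22, 44], [2, 54, 18, 22],
                 [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
                 [10, 68, 8, 12]], dtype=float)
N = data.shape[0]
data2 = data - data.mean(axis=0)      # centering, same as data - (1/N)*A*data
P = (data2.T @ data2) / N             # covariance matrix with the 1/N convention
D, V = np.linalg.eigh(P)              # eigh: ascending eigenvalues for symmetric P
idx = np.argsort(D)[::-1]             # reorder to descending
D, V = D[idx], V[:, idx]
R = data2 @ V[:, :2]                  # reduced data, N*2
print(D)                              # ~ [477.97, 62.30, 11.45, 0.22]
print(R)                              # matches the MATLAB R up to per-column sign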
2. PCA dimensionality reduction with sklearn:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

def pca1206(n_comp):
    # X: sample features, Y: cluster labels; 1000 samples, 3 features each, 4 clusters
    # x, y = make_blobs(n_samples=1000, n_features=3,
    #                   centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
    #                   cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
    x = [[7, 26, 6, 60],
         [1, 29, 15, 52],
         [11, 56, 8, 20],
         [11, 31, 8, 47],
         [7, 52, 6, 33],
         [11, 55, 9, 22],
         [3, 71, 17, 6],
         [1, 31, 22, 44],
         [2, 54, 18, 22],
         [21, 47, 4, 26],
         [1, 40, 23, 34],
         [11, 66, 9, 12],
         [10, 68, 8, 12]]
    x = np.array(x)
    # fig = plt.figure()
    # ax = Axes3D(fig, rect=[0, 0, 1, 1], elev=30, azim=20)
    # plt.scatter(x[:, 0], x[:, 1], x[:, 2], marker='o')
    # plt.show()
    pca = PCA(n_components=n_comp)    # reduce to n_comp dimensions
    pca.fit(x)
    print('ratio: ', pca.explained_variance_ratio_)
    print('variance: ', pca.explained_variance_)
    print('n_components: ', pca.n_components_)
    x_new = pca.transform(x)
    print(x_new)
    # plt.scatter(x_new[:, 0], x_new[:, 1], marker='o')
    # plt.show()
    return 0

if __name__ == '__main__':
    print('Hello world!')
    pca1206(2)
Output:
Hello world!
ratio: [0.8659739 0.11288239]
variance: [517.79687807 67.49643605]
n_components: 2
[[ 36.821826 6.87087815]
[ 29.60727342 -4.61088196]
[-12.98177572 4.20491318]
[ 23.71472572 6.63405255]
[ -0.55319168 4.46173212]
[-10.81249083 3.64657117]
[-32.58816661 -8.97984628]
[ 22.6063955 -10.72590646]
[ -9.26258724 -8.98537335]
[ -3.28396933 14.15727734]
[ 9.22003112 -12.38608079]
[-25.58490852 2.78169315]
[-26.90316183 2.93097117]]
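As a follow-up usage note (an addition, not part of the original run): sklearn's PCA also offers inverse_transform, which maps the 2-D scores back into the original 4-D space. Since the first two components retain about 97.9% of the variance (0.8660 + 0.1129), the reconstruction stays close to the original data. A minimal sketch, appended inside pca1206 before the return:

x_back = pca.inverse_transform(x_new)    # N*2 scores -> N*4 reconstruction
print(np.abs(x - x_back).max())          # largest elementwise reconstruction error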