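The Feature Transformation section of the MATLAB documentation provides an introduction to principal component analysis (PCA) together with a worked example.
Introduction to PCA
A Principal Component Analysis Example
Computing the Components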
>> load cities
>> whos
  Name            Size       Bytes  Class     Attributes
  categories      9x14         252  char
  names         329x43       28294  char
  ratings       329x9        23688  double
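categories -- a character matrix containing the names of the indicators
names -- a character matrix containing the names of the 329 cities
ratings -- the data matrix, with 329 rows and 9 columns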
boxplot(ratings,'orientation','horizontal','labels',categories)
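Computing the principal components directly from the raw data is sometimes acceptable, and it is appropriate when all variables are measured in the same units. When the variables have different units, or when the variances of the columns differ widely, it is better to standardize the data first.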
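stdr = std(ratings);                          % standard deviation of each column
sr = ratings./repmat(stdr,329,1);             % scale each column to unit standard deviation
[coefs,scores,variances,t2] = princomp(sr);   % newer MATLAB releases use pca(sr) in place of princomp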
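To make the standardization step concrete, here is a minimal sketch that reproduces the same kind of computation with base functions only (`std`, `mean`, `svd`). The 6-by-3 matrix `X` is hypothetical stand-in data, not the cities ratings, and the names `coefs_svd`, `scores_svd` and `variances_svd` are made up for this illustration. The assumption is that `princomp` centers the data before decomposing it, so its outputs should agree with these up to the sign of each column.

```matlab
% Hypothetical data: three variables on very different scales.
X = [ 2 40 1000;
      4 35 3000;
      6 50 2000;
      8 45 5000;
     10 60 4000;
     12 55 6000];
n = size(X,1);

% Scale each column to unit standard deviation.
% bsxfun avoids hard-coding the row count, unlike repmat(stdr,329,1).
sX = bsxfun(@rdivide, X, std(X));

% Center the scaled data, then take the economy-size SVD.
Xc = bsxfun(@minus, sX, mean(sX));
[U,S,V] = svd(Xc,'econ');

coefs_svd     = V;                   % one column of coefficients per component
scores_svd    = Xc*V;                % data expressed in component coordinates
variances_svd = diag(S).^2/(n-1);    % variance explained by each component
```

Because every scaled column has unit variance, `sum(variances_svd)` equals the number of variables (3 here), which is why the nine variances in the cities example add up to 9.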
Component Coefficients
>> c3 = coefs(:,1:3)
c3 =
0.2064 -0.2178 0.6900
0.3565 -0.2506 0.2082
0.4602 0.2995 0.0073
0.2813 -0.3553 -0.1851
0.3512 0.1796 -0.1464
0.2753 0.4834 -0.2297
0.4631 0.1948 0.0265
0.3279 -0.3845 0.0509
0.1354 -0.4713 -0.6073
I = c3'*c3
I =
1.0000 -0.0000 -0.0000
-0.0000 1.0000 -0.0000
-0.0000 -0.0000 1.0000
Component Scores
plot(scores(:,1),scores(:,2),'+')
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
gname(names)
1. Close the figure above.
2. Redraw the plot:
plot(scores(:,1),scores(:,2),'+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');
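3. Run gname with no input arguments.
4. Click the outlying points to label them; each label is automatically the row number of that observation.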
metro = [43 65 179 213 234 270 314];
names(metro,:)
ans =
Boston, MA
Chicago, IL
Los Angeles, Long Beach, CA
New York, NY
Philadelphia, PA-NJ
San Francisco, CA
Washington, DC-MD-VA
rsubset = ratings;
nsubset = names;
nsubset(metro,:) = [];
rsubset(metro,:) = [];
size(rsubset)
ans =
322 9
Component Variances
variances
variances =
3.4083
1.2140
1.1415
0.9209
0.7533
0.6306
0.4930
0.3180
0.1204
percent_explained = 100*variances/sum(variances)
percent_explained =
37.8699
13.4886
12.6831
10.2324
8.3698
7.0062
5.4783
3.5338
1.3378
pareto(percent_explained)
xlabel('Principal Component')
ylabel('Variance Explained (%)')
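A common follow-up to the Pareto chart is to ask how many components are needed to reach a given share of the total variance. The sketch below uses the `variances` values printed above; the 80% cutoff (`threshold`) and the names `cum_explained` and `k` are assumptions chosen for illustration, not part of the original example.

```matlab
% Component variances as printed in the output above.
variances = [3.4083; 1.2140; 1.1415; 0.9209; 0.7533; ...
             0.6306; 0.4930; 0.3180; 0.1204];

percent_explained = 100*variances/sum(variances);
cum_explained = cumsum(percent_explained);    % running total, in percent

threshold = 80;                               % hypothetical cutoff, in percent
k = find(cum_explained >= threshold, 1);      % smallest number of components reaching it
```

With these numbers the first three components account for about 64% of the variance, and five components are needed to pass 80%.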
Hotelling's T²
[st2, index] = sort(t2,'descend'); % Sort in descending order.
extreme = index(1)
extreme =
213
names(extreme,:)
ans =
New York, NY
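Hotelling's T² measures how far each observation lies from the center of the data, which is why the largest value points to the most extreme city. As a rough sketch of what the statistic contains, it can be rebuilt from the scores and variances: each squared score is divided by the variance of its component and the results are summed across components. The data matrix `X` below is hypothetical, and `t2_manual` and `t2_maha` are illustrative names; the sketch assumes all components are retained.

```matlab
% Hypothetical 6-by-3 data matrix (not the cities data).
X = [ 2 4 1;
      4 3 3;
      6 5 2;
      8 4 5;
     10 6 4;
     12 5 6];
n = size(X,1);

% PCA via SVD of the centered data.
Xc = bsxfun(@minus, X, mean(X));
[~,S,V] = svd(Xc,'econ');
scores    = Xc*V;
variances = diag(S).^2/(n-1);

% T^2 for each row: sum over components of score^2 / variance.
t2_manual = sum(bsxfun(@rdivide, scores.^2, variances'), 2);

% Cross-check: with all components kept, this equals the squared
% Mahalanobis distance of each row from the column means.
t2_maha = sum((Xc/cov(X)).*Xc, 2);

% Row index of the most extreme observation.
[~, extreme] = max(t2_manual);
```

The Mahalanobis form makes it clear that T² does not depend on the rotation chosen by PCA, only on the covariance of the data.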
Visualizing the Results
biplot(coefs(:,1:2), 'scores',scores(:,1:2),...
'varlabels',categories);
axis([-.26 1 -.51 .51]);
biplot(coefs(:,1:3), 'scores',scores(:,1:3),...
'obslabels',names);
axis([-.26 1 -.51 .51 -.61 .81]);
view([30 40]);