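Principal component analysis (PCA) is something I use all the time.
While writing it out today, I found that my earlier understanding of some of the details was not thorough enough.
For example, the data in this exercise consists of 45 two-dimensional points. For PCA, we first compute the mean of all samples and then subtract it from every sample. Only the centered X can be used to compute the covariance matrix, which we then decompose with SVD. My understanding of U was lacking, though: the covariance matrix is square (2x2 here), and I had taken it for granted that each row of U was an eigenvector. It is not: each column of U is an eigenvector, paired with one eigenvalue from the decomposition. That is why the transpose of U is used in the projection later on.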
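One question: why did nothing look wrong when I used U*X instead?
The reason is that U is square, so U*X has valid dimensions, and for this data the U computed from the covariance matrix happens to equal its own transpose, so the displayed result is correct. Using it this way is still wrong, because in general U is not symmetric.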
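A minimal sketch of why this matters, on made-up data (the ellipse, the 30-degree rotation and the variable names `covRight` and `covWrong` are assumptions for illustration, not part of the exercise). Eigenvectors are only defined up to sign, and a 2x2 orthogonal matrix with determinant -1 is always symmetric, so the sketch flips one column when needed to get an equally valid U that is not symmetric:

```matlab
% Illustrative data (made up): points on an ellipse, rotated by 30 degrees.
t = (0:199) * 2*pi / 200;
R = [cos(pi/6) -sin(pi/6); sin(pi/6) cos(pi/6)];
x = R * [2*cos(t); 0.5*sin(t)];
x = x - repmat(mean(x, 2), 1, size(x, 2));   % center the data

sigma = (x * x') / size(x, 2);               % covariance matrix
[u, s, v] = svd(sigma);

% If u came out as a reflection (det = -1, hence symmetric), flip the sign
% of one eigenvector to get a rotation (det = +1), which is not symmetric here.
if det(u) < 0
    u(:, 2) = -u(:, 2);
end

covRight = (u' * x) * (u' * x)' / size(x, 2);   % projection onto the eigenvectors
covWrong = (u  * x) * (u  * x)' / size(x, 2);   % not a projection onto them

disp(norm(u - u'));          % clearly nonzero: u is not symmetric
disp(abs(covRight(1, 2)));   % close to 0: the components are decorrelated
disp(abs(covWrong(1, 2)));   % clearly nonzero: the components are still correlated
```

With a non-symmetric U, only `u' * x` yields a diagonal covariance matrix.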
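Whitening the data must satisfy two conditions:
First, the correlation between different features is minimal, close to 0. (After PCA the dimensions are essentially uncorrelated, because the data is expressed in an orthogonal eigenbasis, so the first condition is satisfied.)
Second, all features have the same variance (not necessarily 1). The common whitening operations are PCA whitening and ZCA whitening.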
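Both conditions can be read off the covariance matrix: the off-diagonal entries should be close to 0 and the diagonal entries should be equal. A small sketch on made-up data (the arrays `xRaw` and `xWhite` are illustrative assumptions, not the exercise data):

```matlab
% Illustrative zero-mean data (made up), one sample per column.
t = (0:199) * 2*pi / 200;
xRaw   = [2*cos(t) + 0.5*sin(t); cos(t)];   % correlated, unequal variances
xWhite = [cos(t); sin(t)];                  % uncorrelated, equal variances

covRaw   = (xRaw * xRaw') / numel(t);
covWhite = (xWhite * xWhite') / numel(t);

disp(covRaw);     % off-diagonal entries far from 0, diagonal entries differ
disp(covWhite);   % off-diagonal entries near 0, both diagonal entries about 0.5
```

Note that `xWhite` satisfies both conditions even though its variances are about 0.5 rather than 1.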
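PCA whitening: the data x is reduced by PCA to z. The dimensions of z are uncorrelated, which satisfies the first whitening condition. It then only remains to divide each dimension of z by its standard deviation, so that every dimension has variance 1, which means the variances are equal. The formula is: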
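Written out to match the Step 3 code later in this post, PCA whitening can be expressed as follows. Here $\lambda_i$ is the i-th diagonal entry of `s`, $x_{\mathrm{rot}} = U^T x$ is the rotated data, and $\epsilon$ is a small constant for numerical stability (the $\epsilon$ term is taken from the code, not from the description above):

```latex
x_{\mathrm{PCAwhite},i} = \frac{x_{\mathrm{rot},i}}{\sqrt{\lambda_i + \epsilon}}
```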
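ZCA whitening: the data x is first transformed by PCA into z, but without reducing the dimension, since all components are kept. This again satisfies the first whitening condition: the features are uncorrelated. Each dimension is then scaled to variance 1 in the same way, and finally the result is left-multiplied by the eigenvector matrix U.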
The ZCA whitening formula is:
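Matching the line `xZCAWhite = u*xPCAWhite` in the code that follows, and assuming all components are kept, ZCA whitening can be written as below, where $U$ is the eigenvector matrix, $\lambda_i$ are the eigenvalues and $\epsilon$ is the small regularization constant used in the code:

```latex
x_{\mathrm{ZCAwhite}} = U \, x_{\mathrm{PCAwhite}}
= U \, \mathrm{diag}\!\left(\frac{1}{\sqrt{\lambda_i + \epsilon}}\right) U^T x
```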
close all
%%================================================================
%% Step 0: Load data
% We have provided the code to load data from pcaData.txt into x.
% x is a 2 * 45 matrix, where the kth column x(:,k) corresponds to
% the kth data point.
% You do not need to change the code below.
x = load('pcaData.txt','-ascii');
figure(1);
scatter(x(1, :), x(2, :));
title('Raw data');
%%================================================================
%% Step 1a: Implement PCA to obtain U
% Implement PCA to obtain the rotation matrix U, which is the eigenbasis
% of sigma, the covariance matrix of the data.
% -------------------- YOUR CODE HERE --------------------
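u = zeros(size(x, 1)); % You need to compute this
[row, column] = size(x);                 % row = dimension, column = number of samples
x = x - repmat(mean(x, 2), 1, column);   % subtract the per-dimension mean from every sample
t = (1 / column) * (x * x');             % covariance matrix of the centered data
[u, s, v] = svd(t);                      % columns of u: eigenvectors; diag(s): eigenvalues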
% --------------------------------------------------------
hold on
plot([0 u(1,1)], [0 u(2,1)]);
plot([0 u(1,2)], [0 u(2,2)]);
scatter(x(1, :), x(2, :));
hold off
%%================================================================
%% Step 1b: Compute xRot, the projection on to the eigenbasis
% Now, compute xRot by projecting the data on to the basis defined
% by U. Visualize the points by performing a scatter plot.
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
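% Use u' rather than u: each column of u is an eigenvector (paired with one
% eigenvalue), so row i of u'*x is the coordinate along the i-th eigenvector.
xRot = u' * x;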
% --------------------------------------------------------
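% Optional sanity check (an addition, not part of the original exercise;
% covRot is a name introduced here): since the columns of u are the
% eigenvectors of the covariance matrix t, the covariance of xRot should
% be close to the diagonal matrix s.
covRot = (xRot * xRot') / size(xRot, 2);
disp(covRot);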
% Visualise the rotated data. The point cloud should now be aligned
% with the coordinate axes.
figure(2);
scatter(xRot(1, :), xRot(2, :));
title('xRot');
%%================================================================
%% Step 2: Reduce the number of dimensions from 2 to 1.
% Compute xRot again (this time projecting to 1 dimension).
% Then, compute xHat by projecting the xRot back onto the original axes
% to see the effect of dimension reduction
% -------------------- YOUR CODE HERE --------------------
k = 1; % Use k = 1 and project the data onto the first eigenbasis
xHat = zeros(size(x)); % You need to compute this
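u1 = u;
u1(:, k+1:end) = 0;   % keep only the first k eigenvectors
% u1'*x gives the coordinates in the eigenbasis (zero beyond the first k);
% multiplying by u maps them back onto the original axes.
xHat = u * u1' * x;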
% --------------------------------------------------------
figure(3);
scatter(xHat(1, :), xHat(2, :));
title('xHat');
%%================================================================
%% Step 3: PCA Whitening
% Compute xPCAWhite and plot the results.
epsilon = 1e-5;
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = zeros(size(x)); % You need to compute this
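% Divide each rotated component by its standard deviation sqrt(s(i,i));
% epsilon avoids division by a value close to zero.
for i = 1:size(x, 1)
    xPCAWhite(i, :) = xRot(i, :) / sqrt(s(i, i) + epsilon);
end
% Equivalent vectorized form:
% xPCAWhite = diag(1./sqrt(diag(s)+epsilon))*u'*x;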
% --------------------------------------------------------
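% Optional sanity check (an addition, not part of the original exercise;
% covPCAWhite is a name introduced here): the covariance of xPCAWhite
% should be close to the identity matrix, with diagonal entries slightly
% below 1 because of epsilon.
covPCAWhite = (xPCAWhite * xPCAWhite') / size(xPCAWhite, 2);
disp(covPCAWhite);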
figure(4);
scatter(xPCAWhite(1, :), xPCAWhite(2, :));
title('xPCAWhite');
%%================================================================
%% Step 4: ZCA Whitening
% Compute xZCAWhite and plot the results.
% -------------------- YOUR CODE HERE --------------------
xZCAWhite = zeros(size(x)); % You need to compute this
xZCAWhite = u * xPCAWhite;   % rotate the PCA-whitened data back to the original axes
% --------------------------------------------------------
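% Optional sanity check (an addition, not part of the original exercise;
% covZCAWhite is a name introduced here): rotating back with u keeps the
% data white, so this covariance should also be close to the identity.
covZCAWhite = (xZCAWhite * xZCAWhite') / size(xZCAWhite, 2);
disp(covZCAWhite);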
figure(5);
scatter(xZCAWhite(1, :), xZCAWhite(2, :));
title('xZCAWhite');
%% Congratulations! When you have reached this point, you are done!
% You can now move onto the next PCA exercise. :)