ML 14 Part 2: Principal Component Analysis

Given $m$ samples $x^{(1)}, \ldots, x^{(m)} \in \mathbb{R}^n$, normalized to zero mean.

The goal is to find the unit direction $u$ that maximizes the variance of the points' projections onto $u$:

$$\max_{\|u\|=1} \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)T}u\right)^2 = \max_{\|u\|=1} u^T \Sigma u, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)T}.$$

Adding a Lagrange multiplier for the constraint $\|u\|=1$ and setting the gradient to zero gives $\Sigma u = \lambda u$. So to speak, $u$ ought to be an eigenvector of the matrix $\Sigma$; and since the maximized variance equals $u^T \Sigma u = \lambda$, the best direction is the eigenvector with the largest eigenvalue.
Here we reduce the $n$-dimensional space to a $k$-dimensional subspace spanned by the basis $u_1, u_2, \ldots, u_k$, the top $k$ eigenvectors of $\Sigma$.

Given any $x$ in the $n$-dimensional space, we can express it in the $k$-dimensional space as $(u_1^T x,\; u_2^T x,\; \ldots,\; u_k^T x)$. In other words, we project $x \in \mathbb{R}^n$ onto the subspace spanned by $\{u_1, u_2, \ldots, u_k\}$, obtaining a coordinate vector in $\mathbb{R}^k$.
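
Putting the procedure together, here is a minimal MATLAB sketch (the toy data and names such as Xc, U, and Z are my own, not from the lecture notes):

X = randn(100, 5);                    % toy data: m = 100 samples in R^5
Xc = bsxfun(@minus, X, mean(X, 1));   % normalize to zero mean
Sigma = (Xc' * Xc) / size(X, 1);      % covariance matrix Sigma, n-by-n
[V, D] = eig(Sigma);                  % eigen-decomposition of Sigma
[~, idx] = sort(diag(D), 'descend');  % order by decreasing variance
k = 2;
U = V(:, idx(1:k));                   % top-k eigenvectors u1..uk
Z = Xc * U;                           % row i is (u1'*x_i, ..., uk'*x_i)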

To get a more intuitive feel for the PCA algorithm, we use MATLAB to draw some figures.


X0 = read_obj('football1.obj');   % alternative model: seahorse_extended.obj
vertex = X0.xyz';    % vertex coordinates, one row per vertex
faces  = X0.tri';    % triangle vertex indices
center0 = mean(vertex, 1);   % centroid of the mesh

% display the mesh
clf;
plot_mesh(vertex, faces);
shading interp;

[pc, scores, pcvars] = princomp(vertex);
u1 = pc(:,1);   % first principal direction
u2 = pc(:,2);   % second principal direction
u3 = pc(:,3);   % third principal direction

hold on
% draw a segment of length 20 from the centroid along each principal direction
center = [];
center = [center; center0];
center = [center; center0 + u1'*20];
center = [center; center0];
center = [center; center0 + u2'*20];
center = [center; center0];
center = [center; center0 + u3'*20];
plot3(center(:,1), center(:,2), center(:,3), '-*');

The core PCA algorithm is implemented in the princomp function. The documentation of princomp says:

[COEFF,SCORE,latent,tsquare] = princomp(X)
COEFF = princomp(X) performs principal components analysis (PCA) on the n-by-p data matrix X, and returns the principal component coefficients, also known as loadings. Rows of X correspond to observations, columns to variables. COEFF is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
The scores are the data formed by transforming the original data into the space of the principal components. The values of the vector latent are the variance of the columns of SCORE. Hotelling’s T^2 is a measure of the multivariate distance of each observation from the center of the data set.

Expressed in our own language: the columns of COEFF are the eigenvectors of the matrix $\Sigma$, which form the basis of the reduced space; "principal components" is the documentation's name for them.
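
As a sketch of this correspondence (reusing pc and pcvars from the script above; note that cov, like princomp, normalizes by m-1):

% Each column of COEFF is an eigenvector of the sample covariance matrix,
% and the corresponding entry of pcvars is its eigenvalue.
Sigma = cov(vertex);
residual = norm(Sigma * pc(:,1) - pcvars(1) * pc(:,1));   % ~0 up to round-off
fprintf('eigenvector check, residual = %g\n', residual);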

In our case, since the original space is 3-dimensional, the reduced space can have dimension at most 3.
Data in the original space are then mapped into the reduced space by taking the dot product with each principal component, which yields the coordinates of the new point.
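
As a quick sanity check (again reusing variables from the script above), SCORE is exactly the mean-centered data multiplied by COEFF:

% SCORE returned by princomp equals the centered data times COEFF.
centered = bsxfun(@minus, vertex, mean(vertex, 1));
err = norm(centered * pc - scores, 'fro');   % ~0 up to round-off
fprintf('projection check, error = %g\n', err);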

The resulting figure shows the mesh with the three principal directions drawn from its centroid:

[Figure: football mesh with its three principal axes]

Since I am fond of sharing, here is the complete MATLAB source code:

complete MATLAB source code

Application in face recognition

Assume faces are encoded as 100×100 images; each image then consists of 10000 pixels, so each image is a point $x \in \mathbb{R}^{10000}$. Every pixel stands for one dimension of the space.

Since each image can be expressed as a point in $\mathbb{R}^{10000}$, we can compute the top $k$ eigenvectors of $\Sigma$; each eigenvector can itself be viewed as a face, a so-called eigenface.

Say we get a new face image; we can then project it onto the eigenvectors to obtain its coordinates in the $k$-dimensional subspace. The closer the projections of two images are, the more alike the two faces are.
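
A minimal sketch of this pipeline (the toy data, the 20×20 image size, and names such as F, U, and train_coords are mine, chosen only to keep the example cheap and self-contained):

% Eigenface sketch: rows of F are training faces, flattened to 1-by-400
% (toy 20x20 images instead of 100x100 to keep the example fast).
F = rand(50, 400);                     % 50 hypothetical training faces
[pc, scores] = princomp(F, 'econ');    % princomp centers F internally
k = 20;
U = pc(:, 1:k);                        % top-k eigenvectors (eigenfaces)
train_coords = scores(:, 1:k);         % k-dim coordinates of each face

x = rand(1, 400);                      % a hypothetical new face image
x_coords = (x - mean(F, 1)) * U;       % project it into the subspace
d = sum(bsxfun(@minus, train_coords, x_coords).^2, 2);
[~, best] = min(d);                    % nearest training face = best match
fprintf('best match: training face %d\n', best);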
