ML 14 Part 2: Principal Component Analysis

Given $m$ samples $x^{(1)}, \ldots, x^{(m)} \in \mathbb{R}^n$, normalized to zero mean.

The goal is to find the unit direction $u$ that maximizes the variance of the points' projections onto $u$:

$$\max_{\|u\|=1} \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)T}u\right)^2 = \max_{\|u\|=1} u^T \Sigma u, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)T}.$$

Adding a Lagrange multiplier for the constraint $\|u\|=1$ and setting the gradient to zero gives $\Sigma u = \lambda u$. So to speak, $u$ ought to be an eigenvector of the matrix $\Sigma$; and since the maximized variance equals $u^T \Sigma u = \lambda$, the best direction is the eigenvector with the largest eigenvalue.
Here we reduce the $n$-dimensional space to a $k$-dimensional subspace spanned by the basis $u_1, u_2, \ldots, u_k$, the top $k$ eigenvectors of $\Sigma$.

Given any $x$ in the $n$-dimensional space, we can express it in the $k$-dimensional space as $(u_1^T x,\; u_2^T x,\; \ldots,\; u_k^T x)$. In other words, we project $x \in \mathbb{R}^n$ onto the subspace spanned by $\{u_1, u_2, \ldots, u_k\}$, obtaining a coordinate vector in $\mathbb{R}^k$.
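
Putting the procedure together, here is a minimal MATLAB sketch (the toy data and names such as Xc, U, and Z are my own, not from the lecture notes):

X = randn(100, 5);                    % toy data: m = 100 samples in R^5
Xc = bsxfun(@minus, X, mean(X, 1));   % normalize to zero mean
Sigma = (Xc' * Xc) / size(X, 1);      % covariance matrix Sigma, n-by-n
[V, D] = eig(Sigma);                  % eigen-decomposition of Sigma
[~, idx] = sort(diag(D), 'descend');  % order by decreasing variance
k = 2;
U = V(:, idx(1:k));                   % top-k eigenvectors u1..uk
Z = Xc * U;                           % row i is (u1'*x_i, ..., uk'*x_i)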

To get a more intuitive feel for the PCA algorithm, we use MATLAB to draw some figures.


X0 = read_obj('football1.obj');   % alternative model: seahorse_extended.obj
vertex = X0.xyz';    % vertex coordinates, one row per vertex
faces  = X0.tri';    % triangle vertex indices
center0 = mean(vertex, 1);   % centroid of the mesh

% display the mesh
clf;
plot_mesh(vertex, faces);
shading interp;

[pc, scores, pcvars] = princomp(vertex);
u1 = pc(:,1);   % first principal direction
u2 = pc(:,2);   % second principal direction
u3 = pc(:,3);   % third principal direction

hold on
% draw a segment of length 20 from the centroid along each principal direction
center = [];
center = [center; center0];
center = [center; center0 + u1'*20];
center = [center; center0];
center = [center; center0 + u2'*20];
center = [center; center0];
center = [center; center0 + u3'*20];
plot3(center(:,1), center(:,2), center(:,3), '-*');

The core PCA algorithm is implemented in the princomp function. The documentation of princomp says:

[COEFF,SCORE,latent,tsquare] = princomp(X)
COEFF = princomp(X) performs principal components analysis (PCA) on the n-by-p data matrix X, and returns the principal component coefficients, also known as loadings. Rows of X correspond to observations, columns to variables. COEFF is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
The scores are the data formed by transforming the original data into the space of the principal components. The values of the vector latent are the variance of the columns of SCORE. Hotelling’s T^2 is a measure of the multivariate distance of each observation from the center of the data set.

Expressed in our own language: the columns of COEFF are the eigenvectors of the matrix $\Sigma$, which form the basis of the reduced space; "principal components" is the documentation's name for them.
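
As a sketch of this correspondence (reusing pc and pcvars from the script above; note that cov, like princomp, normalizes by m-1):

% Each column of COEFF is an eigenvector of the sample covariance matrix,
% and the corresponding entry of pcvars is its eigenvalue.
Sigma = cov(vertex);
residual = norm(Sigma * pc(:,1) - pcvars(1) * pc(:,1));   % ~0 up to round-off
fprintf('eigenvector check, residual = %g\n', residual);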

In our case, since the original space is 3-dimensional, the reduced space can have dimension at most 3.
Data in the original space are then mapped into the reduced space by taking the dot product with each principal component, which yields the coordinates of the new point.
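
As a quick sanity check (again reusing variables from the script above), SCORE is exactly the mean-centered data multiplied by COEFF:

% SCORE returned by princomp equals the centered data times COEFF.
centered = bsxfun(@minus, vertex, mean(vertex, 1));
err = norm(centered * pc - scores, 'fro');   % ~0 up to round-off
fprintf('projection check, error = %g\n', err);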

The resulting figure shows the mesh with the three principal directions drawn from its centroid:

[Figure: football mesh with its three principal axes]

Since I am fond of sharing, here is the complete MATLAB source code:

complete MATLAB source code

Application in face recognition

Assume faces are encoded as 100×100 images; each image then consists of 10000 pixels, so each image is a point $x \in \mathbb{R}^{10000}$. Every pixel stands for one dimension of the space.

Since each image can be expressed as a point in $\mathbb{R}^{10000}$, we can compute the top $k$ eigenvectors of $\Sigma$; each eigenvector can itself be viewed as a face, a so-called eigenface.

Say we get a new face image; we can then project it onto the eigenvectors to obtain its coordinates in the $k$-dimensional subspace. The closer the projections of two images are, the more alike the two faces are.
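
A minimal sketch of this pipeline (the toy data, the 20×20 image size, and names such as F, U, and train_coords are mine, chosen only to keep the example cheap and self-contained):

% Eigenface sketch: rows of F are training faces, flattened to 1-by-400
% (toy 20x20 images instead of 100x100 to keep the example fast).
F = rand(50, 400);                     % 50 hypothetical training faces
[pc, scores] = princomp(F, 'econ');    % princomp centers F internally
k = 20;
U = pc(:, 1:k);                        % top-k eigenvectors (eigenfaces)
train_coords = scores(:, 1:k);         % k-dim coordinates of each face

x = rand(1, 400);                      % a hypothetical new face image
x_coords = (x - mean(F, 1)) * U;       % project it into the subspace
d = sum(bsxfun(@minus, train_coords, x_coords).^2, 2);
[~, best] = min(d);                    % nearest training face = best match
fprintf('best match: training face %d\n', best);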
