原
PCA的应用示例
在 PCA 详细算法介绍 (http://blog.csdn.net/watkinsong/article/details/38536463) 中, 因为篇幅问题 没有给出详细的代码示例, 这里给出代码示例。
通过对人脸图像进行降维深入了解PCA算得使用。
首先看一下数据集, 我们有12张人脸图像, 用10张人脸训练PCA降维矩阵, 剩下的两张可以用作测试。
需要特别注意: 只能使用训练集样本进行所有的PCA训练过程。
这里所有的代码都采用 octave实现, 跟matlab应该是一致的。
1. 加载图像
-
%% Initialization
-
clear ; close all;
clc
-
-
fprintf(
'this code will load 12 images and do PCA for each face.\n');
-
fprintf(
'10 images are used to train PCA and the other 2 images are used to test PCA.\n');
-
-
trainset = zeros(
10,
32 *
32); % image size
is :
32 *
32
-
m =
10; % number of samples
-
-
for i =
1 : m
-
img = imread(strcat(int2str(i),
'.bmp'));
-
img =
double(img);
-
trainset(i, :) = img(:);
-
end
2. 特征向量做 normalization
-
%% before training PCA,
do feature normalization
-
mu = mean(trainset);
-
trainset_norm = bsxfun(@minus, trainset, mu);
-
-
sigma = std(trainset_norm);
-
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);
3. 在做特征向量归一化的过程中, 我们为了以后使用归一化参数, 需要保存这些归一化参数。
比如这里可能需要保存mu 和 sigma, 这里我们已 mean face 的方式保存 mu, 因为本示例比较小, 所以没有保存 sigma, 这里保存mu 的目的也仅仅是 为了让大家看一下平均脸的样子。 如果在做项目的过程中, 可能训练PCA是分开进行的, 以后需要进行降维, 那么就需要保存这两个归一化参数。
-
%%
we could save the mean face mu to take a look the mean face
-
imwrite(
uint8(reshape(mu, 32, 32)), 'mf.bmp');
看一下由10张人脸生成的平均脸:
是不是比较丑? 因为人脸太少了, 再来看看由5000个人脸图像生成的平均脸:
4. 计算降维矩阵
-
%% compute reduce matrix
-
X = trainset_norm; % just
for convience
-
[
m, n] = size(X);
-
-
U = zeros(n);
-
S = zeros(n);
-
-
Cov =
1 / m * X
' * X;
-
[U, S, V] = svd(Cov);
-
fprintf('compute cov done.\n
');
5. 查看特征脸
降维矩阵U中的特征向量, 在关于人脸的降维中,又被称为特征脸, U 中的每个特征向量相当于找到的降维空间的一个方向。 利用U可以将特征映射到这个空间中。
这里我们把的U中的前几个特征向量保存下来, 看一下特征脸。 U 中的特征向量是按照特征值进行由大到小排序的, 这个排序的顺序也决定了对于降维的影响最大的向量放在最前面。
这里的 eigen face 和人脸的相似性比较高, 因为我们的样本数量比较少, 就10个样本。。。所以会出现这种相似度比较高的情况。
补充: 给出几张用5000张人脸图像训练得到的eigen face, 如下所示:
6. 降维
-
%% dimension reduction
-
k =
100; % reduce to
100 dimension
-
test = zeros(
2,
32 *
32);
-
for i =
1:
2
-
img = imread(strcat(int2str(i +
10),
'.bmp'));
-
img =
double(img);
-
test(i, :) = img(:);
-
end
-
-
% test
set need to
do normalization
-
test_norm = bsxfun(@minus, test, mu);
-
test_norm = bsxfun(@rdivide, test_norm, sigma);
-
-
% reduction
-
Uk = U(:,
1:k);
-
Z = test_norm * Uk;
-
fprintf(
'reduce done.\n');
7. 还原特征(Reconstruction)
-
%% reconstruction
-
%%
for the test
set images, we only minus the mean face,
-
% so
in the reconstruct process, we need
add the mean face back
-
Xp = Z * Uk
';
-
% show reconstructed face
-
for i = 1:5
-
face = Xp(i, :) + mu;
-
face = reshape((face), 32, 32);
-
imwrite(uint8(face), strcat('./reconstruct/
', int2str(4000 + i), '.bmp
'));
-
end
-
-
%% for the train set reconstruction, we minus the mean face and divide by standard deviation during the train
-
% so in the reconstruction process, we need to multiby standard deviation first,
-
% and then add the mean face back
-
trainset_re = trainset_norm * Uk; % reduction
-
trainset_re = trainset_re * Uk'; % reconstruction
-
for i =
1:
5
-
train = trainset_re(i, :);
-
train = train .* sigma;
-
train = train + mu;
-
train = reshape(train,
32,
32);
-
imwrite(uint8(train), strcat(
'./reconstruct/', int2str(i),
'train.bmp'));
-
end
注: 这里我使用了训练样本为4000张, 因为样本数量太少还原的效果很差。
看一下特征还原的效果: 左边为原始图像, 右边为还原的图像
对于测试样本还原, 测试样本在降维之前减去了mean face, 所以在还原之后还要加上mean face才是真正的还原的图像。
对于训练样本还原, 因为训练样本即减去了mean face 还除以了 standard deviation, 所以在计算得到还原的样本特征后, 还首先要将特征 按元素乘上 standard deviation, 也就是 .* , 然后再加上mean face才是最后得到的真实的还原的数据。
看以下几个关于训练样本的还原: 同样左边为原始图像, 右边为还原之后的图像
整个工程的全部代码:
-
%% Initialization
-
clear ; close all;
clc
-
-
fprintf(
'this code will load 12 images and do PCA for each face.\n');
-
fprintf(
'10 images are used to train PCA and the other 2 images are used to test PCA.\n');
-
-
m =
4000; % number of samples
-
trainset = zeros(m,
32 *
32); % image size
is :
32 *
32
-
-
for i =
1 : m
-
img = imread(strcat(
'./img/', int2str(i),
'.bmp'));
-
img =
double(img);
-
trainset(i, :) = img(:);
-
end
-
-
-
%% before training PCA,
do feature normalization
-
mu = mean(trainset);
-
trainset_norm = bsxfun(@minus, trainset, mu);
-
-
sigma = std(trainset_norm);
-
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);
-
-
%%
we could save the mean face mu to take a look the mean face
-
imwrite(
uint8(reshape(mu, 32, 32)), 'meanface.bmp');
-
fprintf(
'mean face saved. paused\n');
-
pause;
-
-
%% compute reduce matrix
-
X = trainset_norm; % just
for convience
-
[
m, n] = size(X);
-
-
U = zeros(n);
-
S = zeros(n);
-
-
Cov =
1 / m * X
' * X;
-
[U, S, V] = svd(Cov);
-
fprintf('compute cov done.\n
');
-
-
%% save eigen face
-
for i = 1:10
-
ef = U(:, i)';
-
img = ef;
-
minVal = min(img);
-
img = img - minVal;
-
max_val = max(abs(img));
-
img = img / max_val;
-
img = reshape(img,
32,
32);
-
imwrite(img, strcat(
'eigenface', int2str(i),
'.bmp'));
-
end
-
-
fprintf(
'eigen face saved, paused.\n');
-
pause;
-
-
%% dimension reduction
-
k =
100; % reduce to
100 dimension
-
test = zeros(
10,
32 *
32);
-
for i =
4001:
4010
-
img = imread(strcat(
'./img/', int2str(i),
'.bmp'));
-
img =
double(img);
-
test(i -
4000, :) = img(:);
-
end
-
-
% test
set need to
do normalization
-
test = bsxfun(@minus, test, mu);
-
-
% reduction
-
Uk = U(:,
1:k);
-
Z = test * Uk;
-
fprintf(
'reduce done.\n');
-
-
%% reconstruction
-
%%
for the test
set images, we only minus the mean face,
-
% so
in the reconstruct process, we need
add the mean face back
-
Xp = Z * Uk
';
-
% show reconstructed face
-
for i = 1:5
-
face = Xp(i, :) + mu;
-
face = reshape((face), 32, 32);
-
imwrite(uint8(face), strcat('./reconstruct/
', int2str(4000 + i), '.bmp
'));
-
end
-
-
%% for the train set reconstruction, we minus the mean face and divide by standard deviation during the train
-
% so in the reconstruction process, we need to multiby standard deviation first,
-
% and then add the mean face back
-
trainset_re = trainset_norm * Uk; % reduction
-
trainset_re = trainset_re * Uk'; % reconstruction
-
for i =
1:
5
-
train = trainset_re(i, :);
-
train = train .* sigma;
-
train = train + mu;
-
train = reshape(train,
32,
32);
-
imwrite(uint8(train), strcat(
'./reconstruct/', int2str(i),
'train.bmp'));
-
end
-
-
fprintf(
'job done.\n');
版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/weixingstudio/article/details/38539289