The concept of LDA
Linear Discriminant Analysis (LDA) is a supervised dimensionality-reduction technique. Unlike the unsupervised PCA, which looks for the directions of maximum variance in the data to use as principal-component axes, LDA looks for the feature subspace that is optimal for classification. The idea of LDA can be summed up in one sentence: "after projection, within-class variance is minimal and between-class variance is maximal." In other words, we project the data into a lower-dimensional space so that the projected points of each class stay as close together as possible, while the centers of different classes stay as far apart as possible.
This may still feel abstract, so let us start with the simplest case. Suppose we have two classes of data, red and blue, as shown in the figure below. The features are two-dimensional, and we want to project the data onto a one-dimensional line so that the projected points of each class are as close together as possible, while the centers of the red and blue classes are as far apart as possible.
The figure shows two candidate projections; which one better satisfies our criterion? Intuitively, the right-hand projection is better than the left-hand one: in the right-hand figure the red points and the blue points each cluster tightly and the two class centers are clearly separated, i.e. within-class variance is small and between-class variance is large, whereas in the left-hand figure the points mix at the boundary. That is the core idea of LDA. Of course, in practice our data usually has more than two classes and more than two dimensions, and the projection target is generally not a line but a lower-dimensional hyperplane.
Mathematical preliminaries
1. The Rayleigh quotient
We first look at the definition of the Rayleigh quotient. The Rayleigh quotient is a function $R(A,x)$ of the form:

$$R(A,x)=\frac{x^HAx}{x^Hx}$$

where $x$ is a nonzero vector and $A$ is an $n \times n$ Hermitian matrix. A Hermitian matrix is a matrix that equals its own conjugate transpose, i.e. $A^H=A$. If $A$ is real, then any matrix satisfying $A^T=A$ is Hermitian.

The Rayleigh quotient has a very important property: its maximum value equals the largest eigenvalue of $A$, and its minimum value equals the smallest eigenvalue, that is,

$$\lambda_{\min} \leq \frac{x^HAx}{x^Hx} \leq \lambda_{\max}$$

We omit the proof here. When the vector $x$ is normalized, i.e. when $x^Hx=1$, the Rayleigh quotient reduces to $R(A,x)=x^HAx$, a form that appears in both spectral clustering and PCA.
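This property is easy to verify numerically. Below is a minimal sketch of my own (not part of the derivation), for the special case of a real symmetric $A$, where $x^H$ reduces to $x^T$:

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # real symmetric, hence Hermitian

lams = np.linalg.eigvalsh(A)           # eigenvalues in ascending order
for _ in range(1000):
    x = rng.standard_normal(5)
    R = (x @ A @ x) / (x @ x)          # Rayleigh quotient R(A, x)
    assert lams[0] - 1e-9 <= R <= lams[-1] + 1e-9

# The extremes are attained at the corresponding eigenvectors:
lams, V = np.linalg.eigh(A)
print(np.isclose(V[:, -1] @ A @ V[:, -1], lams[-1]))   # True: the max of R(A, x)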
2. The generalized Rayleigh quotient
The generalized Rayleigh quotient is a function $R(A,B,x)$ of the form:

$$R(A,B,x)=\frac{x^HAx}{x^HBx}$$

where $x$ is a nonzero vector, $A$ and $B$ are $n \times n$ Hermitian matrices, and $B$ is positive definite. What are its maximum and minimum values? A simple change of variables reduces it to an ordinary Rayleigh quotient. Let $x=B^{-1/2}x'$. The denominator becomes:

$$x^HBx=x'^H(B^{-1/2})^HBB^{-1/2}x'=x'^Hx'$$

and the numerator becomes:

$$x^HAx=x'^HB^{-1/2}AB^{-1/2}x'$$

so $R(A,B,x)$ turns into $R(A,B,x')$:

$$R(A,B,x')=\frac{x'^HB^{-1/2}AB^{-1/2}x'}{x'^Hx'}$$

By the property of the Rayleigh quotient above, we immediately see that the maximum of $R(A,B,x')$ is the largest eigenvalue of the matrix $B^{-1/2}AB^{-1/2}$, or equivalently the largest eigenvalue of $B^{-1}A$, while the minimum is the smallest eigenvalue of $B^{-1}A$.
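As a quick sanity check of this conclusion (my own illustration, not part of the original derivation), the sketch below builds a random symmetric $A$ and positive-definite $B$ and compares the generalized Rayleigh quotient at the top generalized eigenvector with the largest eigenvalue of $B^{-1}A$; scipy.linalg.eigh solves the generalized problem $Av=\lambda Bv$ directly:

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                        # real symmetric (Hermitian)
N = rng.standard_normal((6, 6))
B = N @ N.T + 6 * np.eye(6)              # symmetric positive definite

vals, vecs = eigh(A, B)                  # solves A v = lambda B v, ascending eigenvalues
v = vecs[:, -1]                          # eigenvector of the largest eigenvalue

R = (v @ A @ v) / (v @ B @ v)            # generalized Rayleigh quotient R(A, B, v)
print(np.isclose(R, vals[-1]))           # True: this is the maximum of R(A, B, x)
# The same values are exactly the eigenvalues of B^{-1} A:
print(np.allclose(np.sort(np.linalg.eigvals(np.linalg.inv(B) @ A).real), vals))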
Derivation of LDA
Suppose we have a dataset $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}$, where each sample $x_i$ is an $n$-dimensional vector and $y_i \in \{C_1,C_2,\ldots,C_k\}$. Define $N_j$ as the number of samples in class $j$, with $j \in \{1,2,\ldots,k\}$; $X_j$ as the set of samples of class $j$; $\mu_j$ as the mean vector of class $j$; and $\Sigma_j$ ($j \in \{1,2,\ldots,k\}$) as the covariance (scatter) matrix of class $j$. Then:

$$\mu_j=\frac{1}{N_j}\sum_{x \in X_j}x$$

$$\Sigma_j=\sum_{x \in X_j}(x-\mu_j)(x-\mu_j)^T$$
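To make these two definitions concrete, here is a small illustrative snippet (the toy X and y below are made-up data, not from any of the face datasets used later) that computes $\mu_j$ and $\Sigma_j$ for each class:

import numpy as np

# Toy labelled data: 6 samples, 2 features, 2 classes (for illustration only)
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [4.0, 0.5], [4.2, 0.7], [3.8, 0.3]])
y = np.array([0, 0, 0, 1, 1, 1])

for j in np.unique(y):
    X_j = X[y == j]                      # X_j: the samples of class j
    mu_j = X_j.mean(axis=0)              # class mean vector mu_j
    centered = X_j - mu_j
    Sigma_j = centered.T @ centered      # class scatter matrix (note: no 1/N_j factor)
    print(j, mu_j, Sigma_j, sep="\n")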
Suppose the projection direction is a vector $w$. For any sample $x_i$, its projection onto $w$ is $w^Tx_i$. We want the projected points of each class to stay close together, i.e. we want the within-class scatter after projection,

$$\sum_{i=1}^{c}\sum_{j=1}^{n_i}\|w^T(x_j^i-\mu_i)\|_2^2,$$

to be as small as possible (here the classes are indexed $i=1,\ldots,c$, class $i$ contains $n_i$ samples, and $x_j^i$ denotes the $j$-th sample of class $i$). On the other hand, we want different classes to be as far apart as possible, to make them easy to separate. Looking at the class centers, i.e. the means $\mu_1,\mu_2,\ldots,\mu_c$, we want to maximize

$$\sum_{i=1}^{c}n_i\|w^T(\mu_i-\mu)\|_2^2,$$

where $\mu$ is the mean of all samples.
Combining the two, we obtain the optimization objective:

$$\arg\max_w J(w)=\frac{\sum_{i=1}^{c}n_i\|w^T(\mu_i-\mu)\|_2^2}{\sum_{i=1}^{c}\sum_{j=1}^{n_i}\|w^T(x_j^i-\mu_i)\|_2^2}$$
For convenience, we define the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$:

$$S_w=\sum_{i=1}^{c}\sum_{j=1}^{n_i}(x_j^i-\mu_i)(x_j^i-\mu_i)^T$$

$$S_b=\sum_{i=1}^{c}n_i(\mu_i-\mu)(\mu_i-\mu)^T$$
The optimization objective can then be rewritten as:

$$\arg\max_w J(w)=\frac{\mathrm{tr}(w^TS_bw)}{\mathrm{tr}(w^TS_ww)}$$

(For a single direction $w$ the traces are just the scalars $w^TS_bw$ and $w^TS_ww$; the trace form also covers projecting onto several directions at once, with the vector $w$ replaced by a matrix $W$.)
Using a standard trick from constrained optimization (the value of $J(w)$ is invariant to rescaling $w$), we impose the constraint $w^TS_ww=1$, and the method of Lagrange multipliers gives:

$$J(w)=w^TS_bw-\lambda(w^TS_ww-1)$$

$$\frac{dJ}{dw}=2S_bw-2\lambda S_ww=0$$

so that:

$$S_bw=\lambda S_ww$$

If $S_w$ is invertible, this reduces to an ordinary eigenvalue problem:

$$S_w^{-1}S_bw=\lambda w$$
In practice, however, $S_w$ is often not invertible. There are two common remedies (a minimal sketch of the whole procedure follows this list):
- Replace $S_w$ by $S_w+\gamma I$, where $\gamma$ is a very small positive number; the regularized matrix is then guaranteed to be invertible.
- First reduce the data with PCA, so that $S_w$ computed on the reduced data is invertible, and then apply LDA.
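Before moving on to the full face-recognition code, here is a minimal NumPy sketch of the procedure derived above (an illustrative outline of my own, not the implementations below): compute $S_w$ and $S_b$, regularize $S_w$ with $\gamma I$ as in the first remedy, and take the eigenvectors of $S_w^{-1}S_b$ with the largest eigenvalues as the projection directions:

import numpy as np

def lda_fit(X, y, d, gamma=1e-7):
    """X: (n_samples, n_features), y: integer labels; returns W of shape (n_features, d)."""
    n_features = X.shape[1]
    mu = X.mean(axis=0)                              # overall mean of all samples
    Sw = np.zeros((n_features, n_features))
    Sb = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        Sw += (X_c - mu_c).T @ (X_c - mu_c)          # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(X_c) * (diff @ diff.T)             # between-class scatter
    Sw += gamma * np.eye(n_features)                 # remedy 1: make S_w invertible
    # S_w^{-1} S_b is not symmetric in general, so use eig rather than eigh
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1]           # sort eigenvalues descending
    return eigvecs[:, order[:d]].real                # top-d projection directions

# Usage: project onto at most (number of classes - 1) meaningful dimensions
# X_low = X @ lda_fit(X, y, d=2)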
MATLAB implementation (face recognition example)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Load data %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
all_face = load("ORL4646.mat").ORL4646; % all face images
face_train = []; % training faces
train_classes = []; % training labels
face_test = []; % test faces
test_classes = []; % test labels
% 40 subjects in total; the first 5 images of each subject are used for
% training and the remaining 5 for testing
human_num = 40; % number of subjects in the dataset
every_num = 10; % images per subject
row = 46; % image height in pixels
col = 46; % image width in pixels
train_num = 5; % training images per subject
test_num = 5; % test images per subject
for i = 0 : human_num - 1
    for j = 1 : train_num
        tmp = all_face(:, :, i*every_num+j);
        new_face = reshape(tmp, row*col, 1); % stack the pixel matrix into a column vector
        face_train = [face_train new_face]; % append to the training set
    end
    for k = train_num + 1 : every_num % start after the training images so the sets do not overlap
        tmp = all_face(:, :, i*every_num+k);
        new_face = reshape(tmp, row*col, 1); % stack the pixel matrix into a column vector
        face_test = [face_test new_face]; % append to the test set
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% PCA dimensionality reduction %%%%%%%%%%%%%%%%
PCAtrain_mean = mean(face_train,2); % overall mean of the training faces
PCAdimension = row*col; % number of PCA components kept (here: all, i.e. no reduction)
PCA_mean = face_train(:, 1:train_num * human_num) - PCAtrain_mean; % centered training data
[PCAV, PCAD] = eig(PCA_mean * PCA_mean'); % eigendecomposition of the scatter matrix
[PCAd, PCAind] = sort(diag(PCAD), "descend"); % sort eigenvalues in descending order
Ds = PCAD(PCAind, PCAind);
Vs = PCAV(:, PCAind);
A = Vs(:, 1:PCAdimension); % PCA projection matrix (a no-op while PCAdimension = row*col)
LDAface_train = face_train;
LDAface_test = face_test;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Compute Sb and Sw %%%%%%%%%%%%%%%%%%%%%%%%%%
LDAtrain_mean = mean(LDAface_train,2); % overall mean for LDA
Sb = 0; % between-class scatter matrix
Sw = 0; % within-class scatter matrix
N_j = train_num; % samples per class
avcolumn = zeros(PCAdimension, human_num); % per-class mean vectors, one column per class
for i = 1:human_num
    avcolumn(:,i) = mean(LDAface_train(:,(i-1)*N_j+1:i*N_j),2);
end
for i = 0 : (human_num - 1)
    for k = 1 : N_j
        temp2 = LDAface_train(:, i*N_j+k) - avcolumn(:,i+1); % deviation from the class mean
        Sw = Sw + temp2 * temp2';
    end
end
St = zeros(PCAdimension,PCAdimension); % total scatter matrix
for i = 1:N_j*human_num
    St = St + (LDAface_train(:,i)-LDAtrain_mean)*(LDAface_train(:,i)-LDAtrain_mean)';
end
Sb = St - Sw; % uses the identity St = Sw + Sb, proved at the end of this post
%[U, Sigma, DD] = svd(Sw);
%Sigma_diag = diag(Sigma);
%Sigma = Sigma + (1e-11)*eye(size(Sigma, 1), size(Sigma, 2));
%Sw = U * Sigma * DD;
Sw = Sw + (1e-7)*eye(size(Sw, 1), size(Sw, 2)); % regularize: Sw + gamma*I (remedy 1)
%%%%%%%%%%%%%%%% Method 1 %%%%%%%%%%%%%%%%%%%%
[V, D] = eig(Sw\Sb); % eigendecomposition of Sw^{-1}*Sb
[LDAd, LDAind] = sort(diag(D), "descend"); % sort eigenvectors by descending eigenvalue
%%%%%%%%%%%%%%%% Method 2 %%%%%%%%%%%%%%%%%%%%
%Sb = Sb + (1e-7)*eye(size(Sb, 1), size(Sb, 2));
%[V, D] = eig(Sb\Sw); % reciprocal criterion: take the smallest eigenvalues instead
%[LDAd, LDAind] = sort(diag(D));
%%%%%%%%%%%%%%%% Method 3 %%%%%%%%%%%%%%%%%%%%
%[V, D] = eig(Sb - Sw); % difference criterion, sorted by descending eigenvalue
%[LDAd, LDAind] = sort(diag(D),"descend");
Ds = D(LDAind, LDAind);
LDAeigenVector = V(:, LDAind);
%dimension = 39;
%W = eigenVector(:, 1 :dimension); % transpose of the projection matrix
%FisherFaces = W' * face_train;
dimensions = [];
rates = [];
for d = 3:3 % loop kept so a range of dimensions can be swept; only d = 3 here for the 3-D plot
    dimension = d * 1; % dimensionality after projection
    dimensions = [dimensions dimension];
    W = LDAeigenVector(:, 1 :dimension);
    FisherFaces = W' * LDAface_train;
    TestFaces = W' * LDAface_test;
    % 3-D scatter plot of the first three subjects' projected training faces
    for i = 0 : 2
        tmp = FisherFaces(:, (i*10+1):(i+1)*10);
        %tmp = FisherFaces(:, (i*10+1):(i*10+1));
        color = rand(1, 3);
        plot3(tmp(1,:),tmp(2,:),tmp(3,:),'marker','.','linestyle','none','color',color, 'MarkerSize',10);
        hold on;
    end
    hold on;
    axes("Position",[0.58 0.45 0.05 0.05]);imshow(reshape(face_train(:,1),46,46),[]);
    hold on;
    axes("Position",[0.3 0.68 0.05 0.05]);imshow(reshape(face_train(:,11),46,46),[]);
    hold on;
    axes("Position",[0.3 0.48 0.05 0.05]);imshow(reshape(face_train(:,21),46,46),[]);
    %subplot(7, 7, d)
    %imshow(reshape(W*FisherFaces(:, 1), row, col), []);
    correct_sum = 0;
    for test_index = 1 : size(TestFaces, 2)
        test_face = TestFaces(:, test_index);
        Euc_dist = [];
        for i = 1 : size(FisherFaces, 2)
            tmp = ( norm(test_face - FisherFaces(:, i), 2))^2; % squared L2 distance
            Euc_dist = [Euc_dist tmp]; % store distances to all training faces
        end
        [Euc_dist_min , Recognized_index] = min(Euc_dist); % nearest training face
        disp([Recognized_index test_index]);
        if ceil(test_index/test_num) == ceil(Recognized_index/train_num) % same subject?
            correct_sum = correct_sum + 1;
        end
    end
    rate = correct_sum / size(TestFaces, 2); % recognition accuracy
    rates = [rates rate];
end
% Plot accuracy against dimensionality
%disp(rates);
%p = plot(dimensions, rates,'k-o','linewidth', 2, 'markersize', 4);
%p.Color = "blue";
%grid on;
%xlabel('dimension','fontname','标楷体', 'fontweight', 'bold', 'fontsize', 12 );
%ylabel('recognition rate', 'fontname', '标楷体', 'fontweight', 'bold', 'fontsize',12);
%xlim([1, human_num-1]);
%ylim([0, 1]);
Python implementation (face recognition example)
import scipy.io as scio
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from mpl_toolkits.mplot3d import Axes3D
def getdata(eachNum):
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\ORL4646.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['ORL4646']  # drop the header/version metadata, keep only the face data
    datamatrix = np.reshape(datamatrix, (46 * 46, 400))  # flatten each face into a column
    datamatrix = np.array(datamatrix)
    datamatrix = np.transpose(datamatrix)  # one face per row
    train_face = []; test_face = []; train_label = []; test_label = []
    for i in range(40):
        for j in range(10):
            if j < eachNum:
                train_face.append(datamatrix[i*10+j])
                train_label.append(i+1)
            else:
                test_face.append(datamatrix[i*10+j])
                test_label.append(i+1)
    train_face = np.asarray(train_face)
    train_label = np.asarray(train_label)
    test_face = np.asarray(test_face)
    test_label = np.asarray(test_label)
    '''
    # Alternative dataset: YaleB_32x32
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\YaleB_32x32.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['fea']
    train_face = []; test_face = []; train_label = []; test_label = []
    for i in range(38):
        for j in range(10):
            if j < eachNum:
                train_face.append(datamatrix[i * 64 + j])
                train_label.append(i + 1)
            else:
                test_face.append(datamatrix[i * 64 + j])
                test_label.append(i + 1)
    train_face = np.asarray(train_face)
    train_label = np.asarray(train_label)
    test_face = np.asarray(test_face)
    test_label = np.asarray(test_label)
    '''
    '''
    # Alternative dataset: Yale5040165
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\Yale5040165.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['Yale5040165']
    datamatrix = np.reshape(datamatrix, (50 * 40, 165))  # flatten each face into a column
    datamatrix = np.array(datamatrix)
    datamatrix = np.transpose(datamatrix)
    train_face = []; test_face = []; train_label = []; test_label = []
    for i in range(15):
        for j in range(11):
            if j < eachNum:
                train_face.append(datamatrix[i * 11 + j])
                train_label.append(i + 1)
            else:
                test_face.append(datamatrix[i * 11 + j])
                test_label.append(i + 1)
    train_face = np.asarray(train_face)
    train_label = np.asarray(train_label)
    test_face = np.asarray(test_face)
    test_label = np.asarray(test_label)
    return train_face, train_label, test_face, test_label, datamatrix
    '''
    return train_face, train_label, test_face, test_label, datamatrix
def getdataByRecog(eachNum):
    '''
    # Alternative dataset: ORL4646
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\ORL4646.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['ORL4646']  # drop the header/version metadata, keep only the face data
    datamatrix = np.reshape(datamatrix, (46 * 46, 400))  # flatten each face into a column
    datamatrix = np.array(datamatrix)
    datamatrix = np.transpose(datamatrix)
    trainMatrix = []
    testMatrix = []
    for i in range(40):          # ORL has 40 subjects
        for j in range(10):      # 10 images per subject
            if j < eachNum:      # the first eachNum images go to training
                trainMatrix.append(datamatrix[i*10 + j])
            else:                # the rest go to testing
                testMatrix.append(datamatrix[i*10 + j])
    trainMatrix = np.array(trainMatrix)
    testMatrix = np.array(testMatrix)
    #print(trainMatrix.shape)
    #print(testMatrix.shape)
    return trainMatrix, testMatrix
    '''
    '''
    # Alternative dataset: YaleB_32x32
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\YaleB_32x32.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['fea']
    trainMatrix = []
    testMatrix = []
    for i in range(38):          # YaleB has 38 subjects
        for j in range(10):      # 64 images per subject; only the first 10 are used
            if j < eachNum:      # the first eachNum images go to training
                trainMatrix.append(datamatrix[i * 64 + j])
            else:                # the rest go to testing
                testMatrix.append(datamatrix[i * 64 + j])
    trainMatrix = np.array(trainMatrix)
    testMatrix = np.array(testMatrix)
    return trainMatrix, testMatrix
    '''
    dataFile = 'D:\大学作业\大二下机器学习\算法常用数据\Yale5040165.mat'
    data = scio.loadmat(dataFile)
    datamatrix = data['Yale5040165']
    datamatrix = np.reshape(datamatrix, (50 * 40, 165))  # flatten each face into a column
    datamatrix = np.array(datamatrix)
    datamatrix = np.transpose(datamatrix)
    trainMatrix = []
    testMatrix = []
    for i in range(15):          # Yale has 15 subjects
        for j in range(11):      # 11 images per subject
            if j < eachNum:      # the first eachNum images go to training
                trainMatrix.append(datamatrix[i * 11 + j])
            else:                # the rest go to testing
                testMatrix.append(datamatrix[i * 11 + j])
    trainMatrix = np.array(trainMatrix)
    testMatrix = np.array(testMatrix)
    return trainMatrix, testMatrix
def LDA(face, label, k):
    x, y = face.shape
    classes = np.unique(label)
    meanAll = face.mean(axis=0)                      # overall mean
    Sb = np.zeros((y, y), dtype=np.float32)          # between-class scatter
    Sw = np.zeros((y, y), dtype=np.float32)          # within-class scatter
    for i in classes:
        face_i = face[np.where(label == i)[0], :]    # samples of class i
        mean_i = face_i.mean(axis=0)                 # class mean
        n = face_i.shape[0]
        Sw = Sw + np.dot((face_i - mean_i).T, (face_i - mean_i))
        diff = (mean_i - meanAll).reshape(-1, 1)     # column vector, so the product below is an outer product
        Sb = Sb + n * np.dot(diff, diff.T)
    #print("Sw");print(Sw);print("end");print("Sb");print(Sb);print("end")
    Sw = Sw + np.eye(y) / 500                        # regularize: Sw + gamma*I (remedy 1)
    matrix = np.linalg.inv(Sw) @ Sb
    eigenvalue, eigenvector = np.linalg.eig(matrix)  # Sw^{-1}Sb is not symmetric, so use eig
    #print("vec");print(eigenvector);print("end")
    index = np.argsort(eigenvalue.real)              # sort by eigenvalue
    index = index[:-(k + 1):-1]                      # take the k largest
    select = eigenvector[:, index].real
    return select
def pca(matrix, k):
    matrix = np.float32(np.mat(matrix))
    meanMatrix = np.mean(matrix, axis=0)
    matrix = matrix - meanMatrix                      # center the data
    covMatrix = np.cov(matrix.T, rowvar=1)            # covariance matrix (features x features)
    eigenvalue, eigenvector = np.linalg.eigh(np.mat(covMatrix))  # symmetric, so eigh is safe
    index = np.argsort(eigenvalue)                    # sort eigenvalues ascending
    index = index[:-(k+1):-1]                         # take the k largest
    select = eigenvector[:, index]                    # the corresponding eigenvectors
    return select.T
def pca_recog(matrix, k):
    matrix = np.float32(np.mat(matrix))
    eachMean = []
    #eachMean = np.array(eachMean)
    for i in range(matrix.shape[0]):
        eachMean.append(matrix[i].mean(axis=0))       # each training face as a (1, features) row
    eachMean = np.array(eachMean)
    eachMean = eachMean.reshape(eachMean.shape[0], eachMean.shape[2])
    matrix = matrix.T
    meanMatrix = matrix.mean(axis=1)                  # per-feature mean
    matrix = matrix - meanMatrix                      # center the data
    u, sigma, v = np.linalg.svd(matrix)               # PCA via SVD
    w = u[:, :k]                                      # top-k principal directions
    w = np.array(w)
    eachMean = np.array(eachMean)
    return w, eachMean
def recognize_pca():
    eachNum = 5  # training images per subject
    trainMatrix, testMatrix = getdataByRecog(eachNum)
    #print(testMatrix.shape)
    trainNum = trainMatrix.shape[0]
    testNum = testMatrix.shape[0]
    acc = []
    dimen = []
    w, eachMean = pca_recog(trainMatrix, 180)
    project = []
    for each in eachMean:                             # project every training face
        tmp = np.dot(w.T, each.T)
        project.append(tmp.T)
    project = np.array(project)
    correct = 0
    for i in range(testNum):
        tmp = np.reshape(testMatrix[i], (50*40, 1))   # Yale images are 50x40 pixels
        test = np.dot(w.T, tmp)                       # project the test face
        disMatrix = []
        for j in range(project.shape[0]):
            distance = np.linalg.norm(test.T - project[j])
            disMatrix.append(distance)
        classes = int(disMatrix.index(min(disMatrix)) / eachNum)  # eachNum training faces per subject
        if classes == int(i / (11 - eachNum)):        # Yale has 11 images per subject, so 11-eachNum test faces each
            correct += 1
    accuracy = float(correct) / testNum
    print(accuracy)
def KNN_classify(face, train_face, train_label, k):
    dis = np.sum(np.power((train_face - face), 2), axis=1)   # squared distances to all training faces
    index = np.argsort(dis)[:k]                              # the k nearest neighbours
    weight = []
    for i in range(k):
        # distance-based weight in [0, 1]: the nearest neighbour gets 1, the farthest gets 0
        weight.append((dis[index[k-1]] - dis[index[i]]) / (dis[index[k-1]] - dis[index[0]]))
    count = [0 for i in range(train_label.max() + 1)]        # votes per class label
    tmp = 0
    for i in index:
        count[train_label[i]] += 1 + weight[tmp]             # weighted vote
        tmp += 1
    label = np.argmax(count)
    return label

def KNN(train_faces, train_labels, test_faces, test_labels, k):
    sum = test_faces.shape[0]
    err = 0
    for i in range(sum):
        count = KNN_classify(test_faces[i], train_faces, train_labels, k)
        if count != test_labels[i]:
            err += 1
    acc = (sum - err) / sum
    return acc
def view():
    eachNum = 7  # training images per subject
    train_face, train_label, test_face, test_label, face = getdata(eachNum)
    W = LDA(train_face, train_label, 3)
    # print(W)
    project = np.dot(face, W)          # project all faces onto the 3 LDA directions
    print(project.shape)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    colors = ['b', 'g', 'r']
    for i in range(3):                 # plot the first three subjects
        tmpx = []
        tmpy = []
        tmpz = []
        for j in range(10):
            tmpx.append(project[10 * i + j][0])
            tmpy.append(project[10 * i + j][1])
            tmpz.append(project[10 * i + j][2])
        ax.scatter(tmpx, tmpy, tmpz, c=colors[i])
    # ax.set_title("LDA")
    plt.show()
    matrixW = pca(train_face, 3)       # 3 components, to match the 3-D plot below
    matrixW = np.array(matrixW)
    # print(matrixW)
    project_pca = np.dot(matrixW, face.T).T
    # print(project_pca.shape)
    fig = plt.figure()
    ax1 = fig.add_subplot(111, projection='3d')
    colors = ['b', 'g', 'r']
    for i in range(3):                 # plot the first three subjects
        tmpx = []
        tmpy = []
        tmpz = []
        for j in range(10):
            tmpx.append(project_pca[10 * i + j][0])
            tmpy.append(project_pca[10 * i + j][1])
            tmpz.append(project_pca[10 * i + j][2])
        ax1.scatter(tmpx, tmpy, tmpz, c=colors[i])
    # ax1.set_title("PCA")
    plt.show()
def rebuild():
    eachNum = 7  # training images per subject
    train_face, train_label, test_face, test_label, face = getdata(eachNum)
    print(face[1].shape)
    W = LDA(train_face, train_label, 39)
    lowmatrix = np.mat(face[10]) * W       # (1, 2116) x (2116, d): project to d dimensions
    fisherface = lowmatrix * W.T           # (1, d) x (d, 2116): reconstruct the face
    fisherface = np.reshape(fisherface, (46, 46))
    plt.figure()
    plt.axis('off')
    plt.imshow(fisherface, cmap=plt.cm.gray)
    plt.show()
    W_pca = pca(train_face, 180)
    lowmatrix_pca = np.mat(face[10]) * W_pca.T
    eigenface = lowmatrix_pca * W_pca
    eigenface = np.reshape(eigenface, (46, 46))
    plt.figure()
    plt.axis('off')
    plt.imshow(eigenface, cmap=plt.cm.gray)
    plt.show()
    plt.figure()
    plt.axis('off')
    plt.imshow(face[10].reshape(46, 46), cmap=plt.cm.gray)
    plt.show()
def recognize_lda():
    eachNum = 3  # training images per subject
    train_face, train_label, test_face, test_label, face = getdata(eachNum)
    W = LDA(train_face, train_label, 39)
    train = np.dot(train_face, W)          # project the training faces
    test = np.dot(test_face, W)            # project the test faces
    acc = KNN(train, train_label, test, test_label, 3)
    print("acc=", acc)

if __name__ == "__main__":
    #view()
    #rebuild()
    #recognize_lda()
    recognize_pca()
Related proof: $S_t=S_w+S_b$
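The identity follows by expanding the total scatter matrix $S_t=\sum_{i=1}^{c}\sum_{j=1}^{n_i}(x_j^i-\mu)(x_j^i-\mu)^T$ around the class means. Writing $x_j^i-\mu=(x_j^i-\mu_i)+(\mu_i-\mu)$ and expanding the product gives

$$S_t=\sum_{i=1}^{c}\sum_{j=1}^{n_i}(x_j^i-\mu_i)(x_j^i-\mu_i)^T+\sum_{i=1}^{c}n_i(\mu_i-\mu)(\mu_i-\mu)^T+\text{cross terms}.$$

The cross terms vanish because $\sum_{j=1}^{n_i}(x_j^i-\mu_i)=0$ within every class $i$, leaving exactly $S_t=S_w+S_b$. This identity is why the MATLAB code above can compute $S_b$ as $S_t-S_w$.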
Main advantages of LDA
- LDA can exploit class-label prior knowledge during dimensionality reduction, whereas unsupervised methods such as PCA cannot use any label information.
- LDA tends to outperform PCA-style methods when the class-discriminative information lies in the means of the samples rather than in their variances.
Main drawbacks of LDA
- LDA is not well suited to reducing the dimensionality of samples that are far from Gaussian-distributed (PCA shares this problem).
- LDA can reduce the dimensionality to at most k-1, where k is the number of classes; if more than k-1 dimensions are needed, plain LDA cannot be used, although various LDA extensions work around this limit.
- LDA performs poorly when the class-discriminative information lies in the variances rather than the means.
- LDA may overfit the data.
Final remarks
This post mainly summarizes my own study notes; parts of it draw on related material found online, and any infringing content will be removed upon request. You are welcome to repost this article, but please include a link to the original. If you have any questions while reading, feel free to reach out.