Linear Discriminant Analysis (LDA)
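LDA and PCA are both data-compression (dimensionality-reduction) algorithms. PCA is unsupervised while LDA is supervised, so the two emphasize different things: PCA minimizes the reconstruction error between the original data and the compressed data, whereas LDA cares about how well the classes remain separated after compression.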
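As the figure above shows, LDA looks for a projection direction along which the class centers are spread as far apart as possible while the samples of each class stay tightly clustered. If PCA's criterion is minimum reconstruction error, LDA's criterion is to minimize the within-class variance and maximize the distance between class means.
So how do we choose this projection direction? Let us start from the mathematics.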
Suppose the samples fall into $K$ classes containing $N_1, N_2, \cdots, N_K$ samples respectively, and let $\boldsymbol{x}_k^1, \boldsymbol{x}_k^2, \cdots, \boldsymbol{x}_k^{N_k}$ be the samples of class $k$. For any sample $\boldsymbol{x}$, let $\widetilde{\boldsymbol{x}}$ denote its projection onto the direction $\boldsymbol{u}$.
Then $\widetilde{\boldsymbol{x}} = \langle \boldsymbol{x}, \boldsymbol{u} \rangle \boldsymbol{u} = (\boldsymbol{x}^T \boldsymbol{u}) \boldsymbol{u}$.
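The projection formula can be tried out numerically. The snippet below is a minimal sketch using NumPy; the function name `project` and the sample vectors are made up for illustration. Note that $(\boldsymbol{x}^T \boldsymbol{u}) \boldsymbol{u}$ is the orthogonal projection onto the line spanned by $\boldsymbol{u}$ only when $\boldsymbol{u}$ has unit length, which the example assumes.

```python
import numpy as np

def project(x, u):
    """Return (x^T u) u, the projected point used in the text."""
    return (x @ u) * u

# Illustrative values (assumptions, not taken from the article).
u = np.array([1.0, 0.0])   # unit-length projection direction
x = np.array([3.0, 4.0])   # one sample

x_tilde = project(x, u)    # keeps only the component of x along u
```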
How do we describe the within-class variance?
We first describe the variance of the projected samples of class $k$.
$$
\begin{aligned}
S_k &= \frac{1}{N_k} \sum_{\boldsymbol{x} \in D_k} (\widetilde{\boldsymbol{x}} - \widetilde{\boldsymbol{m}}_k)^T (\widetilde{\boldsymbol{x}} - \widetilde{\boldsymbol{m}}_k) \\
&= \frac{1}{N_k} \sum_{\boldsymbol{x} \in D_k} \big[(\boldsymbol{x}^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_k^T \boldsymbol{u})\boldsymbol{u}\big]^T \big[(\boldsymbol{x}^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_k^T \boldsymbol{u})\boldsymbol{u}\big] \\
&= \frac{1}{N_k} \sum_{\boldsymbol{x} \in D_k} \Big[(\boldsymbol{x}^T \boldsymbol{u})^2 \boldsymbol{u}^T \boldsymbol{u} - 2(\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u})\boldsymbol{u}^T \boldsymbol{u} + (\boldsymbol{m}_k^T \boldsymbol{u})^2 \boldsymbol{u}^T \boldsymbol{u}\Big] \\
&= \frac{1}{N_k} \boldsymbol{u}^T \boldsymbol{u} \sum_{\boldsymbol{x} \in D_k} \Big[(\boldsymbol{x}^T \boldsymbol{u})^2 - 2(\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u}) + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= \frac{a}{N_k} \sum_{\boldsymbol{x} \in D_k} \Big[(\boldsymbol{x}^T \boldsymbol{u})^2 - 2(\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u}) + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\Big[\frac{\sum_{\boldsymbol{x} \in D_k} (\boldsymbol{x}^T \boldsymbol{u})^2}{N_k} - 2\frac{\sum_{\boldsymbol{x} \in D_k} (\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u})}{N_k} + \frac{\sum_{\boldsymbol{x} \in D_k} (\boldsymbol{m}_k^T \boldsymbol{u})^2}{N_k}\Big] \\
&= a\Big[\frac{\sum_{\boldsymbol{x} \in D_k} (\boldsymbol{x}^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u})}{N_k} - 2\frac{\sum_{\boldsymbol{x} \in D_k} (\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u})}{N_k} + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\Big[\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{u}^T \boldsymbol{x} \boldsymbol{x}^T \boldsymbol{u}}{N_k} - 2\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x}^T}{N_k} \boldsymbol{u}\, \boldsymbol{m}_k^T \boldsymbol{u} + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\Big[\boldsymbol{u}^T \frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x} \boldsymbol{x}^T}{N_k} \boldsymbol{u} - \boldsymbol{u}^T \boldsymbol{m}_k \boldsymbol{m}_k^T \boldsymbol{u}\Big] \\
&= a\, \boldsymbol{u}^T \Big(\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x} \boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) \boldsymbol{u}
\end{aligned}
$$
Here $D_k$ is the set of samples in class $k$, $\widetilde{\boldsymbol{m}}_k = (\boldsymbol{m}_k^T \boldsymbol{u})\boldsymbol{u}$ is the center of the projected samples, and the center of the original samples is $\boldsymbol{m}_k = \frac{1}{N_k} \sum_{\boldsymbol{x} \in D_k} \boldsymbol{x}$. Since only the direction of $\boldsymbol{u}$ matters, we may fix its squared length as $\boldsymbol{u}^T \boldsymbol{u} = a$.
For the algorithm as a whole, the total within-class variance over all classes after projection is

$$
\begin{aligned}
\sum_{k=1}^K S_k &= a \sum_{k=1}^K \boldsymbol{u}^T \Big(\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x} \boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) \boldsymbol{u} \\
&= a\, \boldsymbol{u}^T \Big[\sum_{k=1}^K \Big(\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x} \boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big)\Big] \boldsymbol{u}
\end{aligned}
$$
Let $\mathbf{S_w} = \displaystyle \sum_{k=1}^{K} \Big(\frac{\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x} \boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big)$. Then

$$
\sum_{k=1}^K S_k = a\, \boldsymbol{u}^T \mathbf{S_w} \boldsymbol{u}
$$
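The within-class result can be checked numerically. The sketch below builds $\mathbf{S_w}$ from its definition and compares $a\, \boldsymbol{u}^T \mathbf{S_w} \boldsymbol{u}$ with the variance computed directly from the projected samples. The helper name `within_class_scatter`, the random two-class data, and the choice of $\boldsymbol{u}$ are assumptions made purely for illustration.

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes of ((1/N_k) * sum of x x^T  -  m_k m_k^T)."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for k in np.unique(y):
        X_k = X[y == k]                      # samples of class k, one per row
        m_k = X_k.mean(axis=0)               # class center m_k
        S_w += X_k.T @ X_k / len(X_k) - np.outer(m_k, m_k)
    return S_w

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (30, 2))])
y = np.array([0] * 20 + [1] * 30)

u = np.array([1.0, 2.0])                     # arbitrary direction, not unit length
a = u @ u                                    # a = u^T u

S_w = within_class_scatter(X, y)
closed_form = a * (u @ S_w @ u)              # a * u^T S_w u

# Direct computation: project each sample, then average squared distances
# to the projected class center, and sum over classes.
direct = 0.0
for k in np.unique(y):
    P = np.outer(X[y == k] @ u, u)           # each row is (x^T u) u
    direct += np.mean(np.sum((P - P.mean(axis=0)) ** 2, axis=1))
```

If the derivation holds, `closed_form` and `direct` agree up to floating-point error.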
How do we describe the between-class distance?
For the distance between any two classes $i$ and $j$ after projection, we have

$$
\begin{aligned}
S_{i,j} &= (\widetilde{\boldsymbol{m}}_i - \widetilde{\boldsymbol{m}}_j)^T (\widetilde{\boldsymbol{m}}_i - \widetilde{\boldsymbol{m}}_j) \\
&= \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}\big]^T \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}\big] \\
&= \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u}^T - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}^T\big] \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}\big] \\
&= \big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big]\, \boldsymbol{u}^T \boldsymbol{u}\, \big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big] \\
&= a\, \big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big] \big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big] \\
&= a\, \boldsymbol{u}^T (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}
\end{aligned}
$$
The total distance over all pairs of distinct classes after projection is

$$
\begin{aligned}
\sum_{i \neq j} S_{i,j} &= \sum_{i \neq j} a\, \boldsymbol{u}^T (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u} \\
&= a\, \boldsymbol{u}^T \Big[\sum_{i \neq j} (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T\Big] \boldsymbol{u}
\end{aligned}
$$
Let $\mathbf{S_b} = \displaystyle \sum_{i \neq j} (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T$. Then

$$
\sum_{i \neq j} S_{i,j} = a\, \boldsymbol{u}^T \mathbf{S_b} \boldsymbol{u}
$$
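To make the definition of $\mathbf{S_b}$ concrete, here is a small sketch that accumulates the outer products of all pairwise differences of class means. The function name `between_class_scatter` and the three example means are invented for illustration. The sum is assumed to run over ordered pairs $(i, j)$ with $i \neq j$; summing over unordered pairs instead would only halve $\mathbf{S_b}$ and leave the optimal direction unchanged.

```python
import numpy as np

def between_class_scatter(means):
    """S_b = sum over ordered pairs i != j of (m_i - m_j)(m_i - m_j)^T."""
    K, d = means.shape
    S_b = np.zeros((d, d))
    for i in range(K):
        for j in range(K):
            if i != j:
                diff = means[i] - means[j]
                S_b += np.outer(diff, diff)
    return S_b

# Three illustrative class centers in 2-D (assumed values).
means = np.array([[0.0, 0.0],
                  [2.0, 0.0],
                  [0.0, 1.0]])
S_b = between_class_scatter(means)

# The quadratic form a * u^T S_b u then gives the total projected distance.
u = np.array([1.0, 0.0])
total_distance = (u @ u) * (u @ S_b @ u)
```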
The LDA optimization objective
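Following the LDA criterion, we want a small within-class variance and a large between-class distance, so we take their ratio as the objective (the common factor $a$ cancels):

$$
\min_{\boldsymbol{u}} J(\boldsymbol{u}) = \frac{\boldsymbol{u}^T \mathbf{S_w} \boldsymbol{u}}{\boldsymbol{u}^T \mathbf{S_b} \boldsymbol{u}}
$$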
$J(\boldsymbol{u})$ does not change when $\boldsymbol{u}$ is rescaled, so to carry out the minimization we may fix $\boldsymbol{u}^T \mathbf{S_b} \boldsymbol{u} = 1$, which gives the constrained problem

$$
\begin{aligned}
\min_{\boldsymbol{u}} \quad & \boldsymbol{u}^T \mathbf{S_w} \boldsymbol{u} \\
\text{s.t.} \quad & \boldsymbol{u}^T \mathbf{S_b} \boldsymbol{u} = 1
\end{aligned}
$$
Applying the method of Lagrange multipliers gives

$$
L(\boldsymbol{u}, \lambda) = \boldsymbol{u}^T \mathbf{S_w} \boldsymbol{u} + \lambda (1 - \boldsymbol{u}^T \mathbf{S_b} \boldsymbol{u})
$$
Setting the gradient with respect to $\boldsymbol{u}$ to zero yields

$$
\frac{\partial L}{\partial \boldsymbol{u}} = 2\mathbf{S_w} \boldsymbol{u} - 2\lambda \mathbf{S_b} \boldsymbol{u} = 0 \;\rightarrow\; \mathbf{S_w} \boldsymbol{u} = \lambda \mathbf{S_b} \boldsymbol{u}
$$
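That is, assuming $\mathbf{S_b}$ is invertible, $\mathbf{S_b}^{-1} \mathbf{S_w} \boldsymbol{u} = \lambda \boldsymbol{u}$, so the projection direction is an eigenvector of $\mathbf{S_b}^{-1} \mathbf{S_w}$. Left-multiplying $\mathbf{S_w} \boldsymbol{u} = \lambda \mathbf{S_b} \boldsymbol{u}$ by $\boldsymbol{u}^T$ shows that $J(\boldsymbol{u}) = \lambda$ at such a point, so the minimizer is the eigenvector belonging to the smallest eigenvalue.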
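The final step can be sketched in NumPy as follows. Because $\mathbf{S_b}$ is built from differences of $K$ class means, its rank is at most $K-1$ and it is often singular, so this sketch does not invert it. Instead it assumes $\mathbf{S_w}$ is positive definite and solves the equivalent problem $\mathbf{S_b} \boldsymbol{u} = \mu \mathbf{S_w} \boldsymbol{u}$ with $\mu = 1/\lambda$; the largest $\mu$ then corresponds to the smallest $\lambda$. The function name `lda_direction` and the example matrices are hypothetical.

```python
import numpy as np

def lda_direction(S_w, S_b):
    """Return a unit vector u minimizing (u^T S_w u) / (u^T S_b u).

    Assumes S_w is symmetric positive definite (needed for the Cholesky step).
    """
    L = np.linalg.cholesky(S_w)              # S_w = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T                # symmetric matrix with the same mu's
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    v = eigvecs[:, -1]                       # eigenvector of the largest mu
    u = L_inv.T @ v                          # map back: S_b u = mu S_w u
    return u / np.linalg.norm(u)

# Illustrative two-class example (assumed values).
S_w = np.array([[1.0, 0.0],
                [0.0, 4.0]])
diff = np.array([2.0, 2.0])                  # m_1 - m_0
S_b = 2.0 * np.outer(diff, diff)             # both ordered pairs (0,1) and (1,0)

u = lda_direction(S_w, S_b)
ratio = (u @ S_w @ u) / (u @ S_b @ u)        # value of J(u) at the solution
```

Going through the Cholesky factor keeps the eigenproblem symmetric, so `np.linalg.eigh` can be used and the eigenvalues are guaranteed to be real.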