3. LDA and GDA
3.1 Linear Discriminant Analysis
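Find a direction vector $\boldsymbol v$ such that, after projection onto it:
- the distances between the class means are as large as possible;
- the distance from each sample to the mean of its own class is as small as possible.
In other words, increase the separation between the class means and make each class more tightly clustered. The goal is to reduce the overlap between the projected classes and thereby improve separability.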
Notation: $L$ is the number of classes; $N_i$ is the number of samples in class $i$; $N$ is the total number of samples; $\boldsymbol x^{(i)}_j$ is the $j$-th sample of class $i$.
Projecting every sample onto the direction vector $\boldsymbol v$ gives the scalars $\boldsymbol v^T\boldsymbol x^{(1)}_1,\cdots,\boldsymbol v^T\boldsymbol x^{(1)}_{N_1};\ \boldsymbol v^T\boldsymbol x^{(2)}_1,\cdots,\boldsymbol v^T\boldsymbol x^{(2)}_{N_2};\ \cdots;\ \boldsymbol v^T\boldsymbol x^{(L)}_1,\cdots,\boldsymbol v^T\boldsymbol x^{(L)}_{N_L}$.
The mean of each projected class is
$$
\overline{\boldsymbol m}_i=\frac{1}{N_i}\sum_{j=1}^{N_i}\boldsymbol v^T\boldsymbol x^{(i)}_j=\boldsymbol v^T\left(\frac{1}{N_i}\sum_{j=1}^{N_i}\boldsymbol x^{(i)}_j\right)=\boldsymbol v^T\boldsymbol m_i
$$
where $\boldsymbol m_i$ is the mean of class $i$ in the original space.
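The identity above says that projecting and averaging commute. A minimal numerical sketch, assuming NumPy and made-up random data (the array names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X_i = rng.normal(size=(5, 3))   # 5 samples of one class, one per row (made-up data)
v = rng.normal(size=3)          # an arbitrary direction vector

# (1/N_i) * sum_j v^T x_j : average of the projected samples
mean_of_projections = (X_i @ v).mean()
# v^T m_i : projection of the class mean
projection_of_mean = v @ X_i.mean(axis=0)
```

Both quantities agree up to floating-point rounding, which is why the rest of the derivation can work with $\boldsymbol m_i$ in the original space.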
Next, the weighted sum of squared distances between the projected class means is
$$
\begin{aligned}
\sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\overline{\boldsymbol m}_i-\overline{\boldsymbol m}_j)^2
&= \sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\overline{\boldsymbol m}_i-\overline{\boldsymbol m}_j)(\overline{\boldsymbol m}_i-\overline{\boldsymbol m}_j)^T \\
&= \sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\boldsymbol v^T\boldsymbol m_i-\boldsymbol v^T\boldsymbol m_j)(\boldsymbol v^T\boldsymbol m_i-\boldsymbol v^T\boldsymbol m_j)^T \\
&= \sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}\boldsymbol v^T(\boldsymbol m_i-\boldsymbol m_j)(\boldsymbol m_i-\boldsymbol m_j)^T\boldsymbol v \\
&= \boldsymbol v^T\left(\sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\boldsymbol m_i-\boldsymbol m_j)(\boldsymbol m_i-\boldsymbol m_j)^T\right)\boldsymbol v \\
&= \boldsymbol v^TS^{LDA}_{b}\boldsymbol v
\end{aligned}
$$
The matrix $S^{LDA}_b$ can be simplified. Summing over all ordered pairs $(i,j)$ counts each unordered pair twice, and the terms with $i=j$ vanish, which gives the factor $\frac{1}{2}$:
$$
\begin{aligned}
S^{LDA}_b
&= \sum^{L-1}_{i=1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\boldsymbol m_i-\boldsymbol m_j)(\boldsymbol m_i-\boldsymbol m_j)^T\\
&= \frac{1}{2}\sum^{L}_{i=1}\sum^{L}_{j=1}\frac{N_i}{N}\frac{N_j}{N}(\boldsymbol m_i-\boldsymbol m_j)(\boldsymbol m_i-\boldsymbol m_j)^T\\
&= \frac{1}{2}\sum^{L}_{i=1}\sum^{L}_{j=1}\frac{N_i}{N}\frac{N_j}{N}\left(\boldsymbol m_i\boldsymbol m_i^T-\boldsymbol m_i\boldsymbol m_j^T-\boldsymbol m_j\boldsymbol m_i^T+\boldsymbol m_j\boldsymbol m_j^T\right)\\
&= \frac{1}{2}\left(
\sum_{i=1}^{L}\sum_{j=1}^{L}\frac{N_i}{N}\frac{N_j}{N}\boldsymbol m_i\boldsymbol m_i^T
-\sum_{i=1}^{L}\sum_{j=1}^{L}\frac{N_i}{N}\frac{N_j}{N}\boldsymbol m_i\boldsymbol m_j^T
-\sum_{i=1}^{L}\sum_{j=1}^{L}\frac{N_i}{N}\frac{N_j}{N}\boldsymbol m_j\boldsymbol m_i^T
+\sum_{i=1}^{L}\sum_{j=1}^{L}\frac{N_i}{N}\frac{N_j}{N}\boldsymbol m_j\boldsymbol m_j^T
\right)\\
&= \frac{1}{2}\left(
\sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i\boldsymbol m_i^T\sum_{j=1}^{L}\frac{N_j}{N}
-\sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i\sum_{j=1}^{L}\frac{N_j}{N}\boldsymbol m_j^T
-\sum_{j=1}^{L}\frac{N_j}{N}\boldsymbol m_j\sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i^T
+\sum_{i=1}^{L}\frac{N_i}{N}\sum_{j=1}^{L}\frac{N_j}{N}\boldsymbol m_j\boldsymbol m_j^T
\right)\\
&= \frac{1}{2}\left(
\sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i\boldsymbol m_i^T-\boldsymbol m_0\boldsymbol m_0^T-\boldsymbol m_0\boldsymbol m_0^T+\sum_{j=1}^{L}\frac{N_j}{N}\boldsymbol m_j\boldsymbol m_j^T
\right)\\
&= \sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i\boldsymbol m_i^T-\boldsymbol m_0\boldsymbol m_0^T\\
&= \sum_{i=1}^{L}\frac{N_i}{N}(\boldsymbol m_i-\boldsymbol m_0)(\boldsymbol m_i-\boldsymbol m_0)^T
\end{aligned}
$$
The last step is the matrix analogue of $E[(x-\bar x)^2]=E[x^2]-\bar x^2$.
Here $\boldsymbol m_0$ is the mean of all samples:
$$
\boldsymbol m_0=\sum_{i=1}^{L}\frac{N_i}{N}\boldsymbol m_i=\sum_{i=1}^{L}\frac{N_i}{N}\sum_{k=1}^{N_i}\frac{1}{N_i}\boldsymbol x_k^{(i)}=\sum_{i=1}^{L}\sum_{k=1}^{N_i}\frac{1}{N}\boldsymbol x_k^{(i)}
$$
In summary, the between-class scatter matrix is:
$$
S^{LDA}_b=\sum_{i=1}^{L-1}\sum^{L}_{j=i+1}\frac{N_i}{N}\frac{N_j}{N}(\boldsymbol m_i-\boldsymbol m_j)(\boldsymbol m_i-\boldsymbol m_j)^T=\sum_{i=1}^{L}\frac{N_i}{N}(\boldsymbol m_i-\boldsymbol m_0)(\boldsymbol m_i-\boldsymbol m_0)^T
$$
Each term measures how far the centroid of one cluster lies from the centroid of the whole data set, weighted by that cluster's mass fraction $N_i/N$.
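The equality of the pairwise form and the centroid form of $S^{LDA}_b$ can be checked numerically. The sketch below assumes NumPy and three made-up Gaussian classes; the variable names are illustrative and not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [4, 6, 5]                                   # N_i for L = 3 classes (made-up)
classes = [rng.normal(loc=3.0 * i, size=(n, 2)) for i, n in enumerate(sizes)]
N = sum(sizes)
means = [c.mean(axis=0) for c in classes]           # class means m_i
m0 = sum(n / N * m for n, m in zip(sizes, means))   # overall mean m_0

# Pairwise form: sum over i < j of (N_i/N)(N_j/N)(m_i - m_j)(m_i - m_j)^T
Sb_pairwise = np.zeros((2, 2))
for i in range(len(classes)):
    for j in range(i + 1, len(classes)):
        d = means[i] - means[j]
        Sb_pairwise += sizes[i] / N * sizes[j] / N * np.outer(d, d)

# Centroid form: sum over i of (N_i/N)(m_i - m_0)(m_i - m_0)^T
Sb_centroid = sum(n / N * np.outer(m - m0, m - m0) for n, m in zip(sizes, means))
```

The centroid form needs only $L$ outer products instead of $L(L-1)/2$, which is why it is the one usually implemented.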
The sum of the within-class variances after projection is
$$
\begin{aligned}
\sum_{i=1}^{L}\sum_{j=1}^{N_i}\frac{1}{N}(\boldsymbol v^T\boldsymbol x_j^{(i)}-\overline{\boldsymbol m}_i)^2
&= \sum_{i=1}^{L}\sum_{j=1}^{N_i}\frac{1}{N}(\boldsymbol v^T\boldsymbol x_j^{(i)}-\boldsymbol v^T\boldsymbol m_i)(\boldsymbol v^T\boldsymbol x_j^{(i)}-\boldsymbol v^T\boldsymbol m_i)^T\\
&= \boldsymbol v^T\left(\sum_{i=1}^{L}\sum_{j=1}^{N_i}\frac{1}{N}(\boldsymbol x_j^{(i)}-\boldsymbol m_i)(\boldsymbol x_j^{(i)}-\boldsymbol m_i)^T\right)\boldsymbol v\\
&= \boldsymbol v^TS^{LDA}_w\boldsymbol v
\end{aligned}
$$
Hence the within-class scatter matrix is:
$$
S^{LDA}_w=\sum_{i=1}^{L}\sum_{j=1}^{N_i}\frac{1}{N}(\boldsymbol x_j^{(i)}-\boldsymbol m_i)(\boldsymbol x_j^{(i)}-\boldsymbol m_i)^T
$$
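As a sanity check on this definition, the quadratic form $\boldsymbol v^TS^{LDA}_w\boldsymbol v$ should equal the within-class variance computed directly from the projected scalars. A short sketch, assuming NumPy and made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
classes = [rng.normal(loc=2.0 * i, size=(n, 3)) for i, n in enumerate([5, 7])]
N = sum(len(c) for c in classes)
v = rng.normal(size=3)                      # an arbitrary direction

# S_w = (1/N) * sum_i sum_j (x_j - m_i)(x_j - m_i)^T
Sw = np.zeros((3, 3))
for c in classes:
    centered = c - c.mean(axis=0)           # subtract the class mean m_i from each row
    Sw += centered.T @ centered / N

# Within-class variance computed directly from the projections v^T x_j
projected = sum((((c @ v) - (c @ v).mean()) ** 2).sum() for c in classes) / N
quadratic_form = v @ Sw @ v
```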
The first discriminant direction is obtained by maximizing the ratio of between-class to within-class scatter. Because the ratio does not change when $\boldsymbol v$ is rescaled, the denominator can be fixed to 1:
$$
{\color{red} \boldsymbol v=\mathop{\arg\max}_{\boldsymbol v\in\mathbb{R}^d}\frac{\boldsymbol v^TS^{LDA}_b\boldsymbol v}{\boldsymbol v^TS^{LDA}_w\boldsymbol v}=\mathop{\arg\max}_{\boldsymbol v^TS_w^{LDA}\boldsymbol v=1}\boldsymbol v^TS_b^{LDA}\boldsymbol v}.
$$
Using the method of Lagrange multipliers, define
$$
f(\boldsymbol v,\lambda)=\boldsymbol v^TS_b^{LDA}\boldsymbol v-\lambda(\boldsymbol v^TS_w^{LDA}\boldsymbol v-1)
$$
Setting the partial derivatives to zero (and assuming $S_w^{LDA}$ is invertible) gives
$$
\begin{aligned}
\frac{\partial f}{\partial \boldsymbol v}&=2S_b^{LDA}\boldsymbol v-2\lambda S_w^{LDA}\boldsymbol v=0 \Leftrightarrow {\color{red}(S_w^{LDA})^{-1}S_b^{LDA}\boldsymbol v=\lambda \boldsymbol v}\\
\frac{\partial f}{\partial \lambda}&=-(\boldsymbol v^TS_w^{LDA}\boldsymbol v-1)=0 \Leftrightarrow \boldsymbol v^TS_w^{LDA}\boldsymbol v=1
\end{aligned}
$$
When both conditions hold, $\boldsymbol v^TS^{LDA}_b\boldsymbol v=\lambda \boldsymbol v^TS^{LDA}_w\boldsymbol v=\lambda$, so the objective value equals the eigenvalue and the maximum is attained at the largest $\lambda$.
In summary, finding the first discriminant direction is equivalent to solving the following generalized eigenvalue problem for its largest eigenvalue:
$$
S^{LDA}_b\boldsymbol u=\lambda S^{LDA}_w\boldsymbol u,\qquad \boldsymbol v=\frac{1}{\sqrt{\boldsymbol u^TS^{LDA}_w\boldsymbol u}}\boldsymbol u
$$
where the normalization in the second expression ensures $\boldsymbol v^TS_w^{LDA}\boldsymbol v=1$.
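One way to solve this generalized eigenvalue problem with NumPy alone is to whiten with a Cholesky factor of $S^{LDA}_w$ and then solve an ordinary symmetric eigenproblem. This is a sketch under the assumption that $S^{LDA}_w$ is symmetric positive definite. The function name `lda_direction` and the test data are hypothetical, and the Cholesky route is an implementation choice that does not come from the original text.

```python
import numpy as np

def lda_direction(Sb, Sw):
    """Return (lambda_max, v) with Sb v = lambda Sw v and v^T Sw v = 1."""
    # Assumes Sw is symmetric positive definite, so Sw = C C^T (Cholesky).
    C = np.linalg.cholesky(Sw)
    C_inv = np.linalg.inv(C)
    # With u = C^-T y the problem becomes (C^-1 Sb C^-T) y = lambda y.
    M = C_inv @ Sb @ C_inv.T
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    u = C_inv.T @ eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    v = u / np.sqrt(u @ Sw @ u)                 # enforce v^T Sw v = 1
    return eigvals[-1], v

rng = np.random.default_rng(3)
sizes = [30, 30, 30]
classes = [rng.normal(loc=[2.0 * i, -1.0 * i, 0.0], size=(n, 3))
           for i, n in enumerate(sizes)]
N = sum(sizes)
m0 = np.vstack(classes).mean(axis=0)
Sb = sum(len(c) / N * np.outer(c.mean(axis=0) - m0, c.mean(axis=0) - m0)
         for c in classes)
Sw = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in classes) / N

lam, v = lda_direction(Sb, Sw)
```

After the call, `v @ Sb @ v` equals `lam` up to rounding, matching the observation that the objective value at the optimum is the eigenvalue itself.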
3.2 Generalized Discriminant Analysis
- $L$: number of classes;
- $N_i$: number of samples in class $i$;
- $N$: total number of samples;
- $\phi(\boldsymbol x^{(i)}_j)$: image in the feature space $H$ of the $j$-th sample of class $i$;
- $X^T_i=[\phi(\boldsymbol x^{(i)}_1),\cdots,\phi(\boldsymbol x^{(i)}_{N_i})]$;
- $X^T=[X^T_1,\cdots,X^T_L]$.

Assume that the mapped samples have zero mean in the space $H$: $\boldsymbol m_0=0$.
The between-class scatter matrix is then:
$$
S^{GDA}_b=\sum_{i=1}^L\frac{N_i}{N}(\boldsymbol m_i-\boldsymbol m_0)(\boldsymbol m_i-\boldsymbol m_0)^T=\sum_{i=1}^L\frac{N_i}{N}\boldsymbol m_i\boldsymbol m_i^T
$$
The within-class scatter matrix is:
$$
S^{GDA}_w=\sum_{i=1}^L\sum_{j=1}^{N_i}\frac{1}{N}\phi(\boldsymbol x^{(i)}_j)\phi(\boldsymbol x^{(i)}_j)^T
$$
The class mean in $H$ and its outer product can be written in matrix form:
$$
\boldsymbol m_i=\frac{1}{N_i}\sum_{j=1}^{N_i}\phi(\boldsymbol x^{(i)}_j)=\frac{1}{N_i}[\phi(\boldsymbol x^{(i)}_1),\cdots,\phi(\boldsymbol x^{(i)}_{N_i})]\begin{bmatrix}1\\\vdots\\1\end{bmatrix}=\frac{1}{N_i}X^T_i1_{N_i\times1}
$$
$$
\boldsymbol m_i\boldsymbol m_i^T=\frac{1}{N_i^2}X^T_i1_{N_i\times1}1_{1\times N_i}X_i=\frac{1}{N_i}X^T_iB_iX_i
$$
where
$$
B_i=\frac{1}{N_i}1_{N_i\times N_i}
$$
is the $N_i\times N_i$ matrix whose entries all equal $1/N_i$. The between-class scatter matrix is therefore:
$$
{\color{red}S^{GDA}_b}=\sum_{i=1}^L\frac{N_i}{N}\boldsymbol m_i\boldsymbol m_i^T=\frac{1}{N}\sum_{i=1}^LX^T_iB_iX_i=\frac{1}{N}
\begin{bmatrix} X^T_1 & \cdots & X^T_L \end{bmatrix}
\begin{bmatrix} B_1 & & 0\\ & \ddots &\\ 0 & & B_L \end{bmatrix}
\begin{bmatrix} X_1 \\ \vdots \\ X_L \end{bmatrix}
=\frac{1}{N}X^TBX
$$
where $B=\mathrm{diag}(B_1,\cdots,B_L)$ is block diagonal.
The within-class scatter matrix is:
$$
\begin{aligned}
{\color{red}S^{GDA}_w}&=\sum_{i=1}^L\sum_{j=1}^{N_i}\frac{1}{N}\phi(\boldsymbol x^{(i)}_j)\phi(\boldsymbol x^{(i)}_j)^T\\
&=\frac{1}{N}\sum_{i=1}^L
\begin{bmatrix} \phi(\boldsymbol x^{(i)}_1) & \cdots & \phi(\boldsymbol x^{(i)}_{N_i}) \end{bmatrix}
\begin{bmatrix} \phi(\boldsymbol x^{(i)}_1)^T \\ \vdots \\ \phi(\boldsymbol x^{(i)}_{N_i})^T \end{bmatrix}\\
&=\frac{1}{N}\sum_{i=1}^LX^T_iX_i\\
&=\frac{1}{N}
\begin{bmatrix} X_1^T & \cdots & X^T_L \end{bmatrix}
\begin{bmatrix} X_1 \\ \vdots \\ X_L \end{bmatrix}\\
&=\frac{1}{N}X^TX
\end{aligned}
$$
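The two matrix forms can be compared against their summation definitions when the feature map is known explicitly. The sketch below assumes NumPy and a hypothetical feature map $\phi(\boldsymbol x)=(x_1,x_2,x_1x_2)$, chosen only for illustration. The mapped samples are centered so that $\boldsymbol m_0=0$ holds.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = [3, 4, 5]                              # N_i, samples stored class by class
N = sum(sizes)
raw = rng.normal(size=(N, 2))
# Hypothetical explicit feature map phi(x) = (x1, x2, x1*x2), for illustration only
Phi = np.column_stack([raw[:, 0], raw[:, 1], raw[:, 0] * raw[:, 1]])
X = Phi - Phi.mean(axis=0)                     # rows are phi(x)^T, centered so m_0 = 0

# B = diag(B_1, ..., B_L) with B_i = (1/N_i) * ones(N_i, N_i)
B = np.zeros((N, N))
start = 0
for n in sizes:
    B[start:start + n, start:start + n] = 1.0 / n
    start += n

Sb_matrix = X.T @ B @ X / N                    # (1/N) X^T B X
Sw_matrix = X.T @ X / N                        # (1/N) X^T X

# Reference values computed from the summation definitions
blocks = np.split(X, np.cumsum(sizes)[:-1])    # X_1, ..., X_L
Sb_sum = sum(len(b) / N * np.outer(b.mean(axis=0), b.mean(axis=0)) for b in blocks)
Sw_sum = sum(np.outer(x, x) for x in X) / N
```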
As in LDA, the projection direction solves a generalized eigenvalue problem:
$$
S^{GDA}_b\boldsymbol v=\lambda S^{GDA}_w \boldsymbol v,\quad \text{i.e.}\quad \left(\frac{1}{N}X^TBX\right)\boldsymbol v=\lambda \left(\frac{1}{N}X^TX\right)\boldsymbol v
$$
in which $X$ is unknown (it consists of the mapped samples $\phi(\boldsymbol x)$).
Assume that $\boldsymbol v$ can be written as a linear combination of the mapped samples, that is,
$$
\boldsymbol v=\sum_{i=1}^L\sum_{j=1}^{N_i}\alpha_j^{(i)}\phi(\boldsymbol x^{(i)}_j)=X^T\boldsymbol \alpha.
$$
Substituting this assumption into the eigenvalue equation (the common factor $\frac{1}{N}$ cancels), left-multiplying by $X$, and writing $K=XX^T$ for the kernel (Gram) matrix with entries $\kappa(\boldsymbol x_i,\boldsymbol x_j)=\phi(\boldsymbol x_i)^T\phi(\boldsymbol x_j)$:
$$
\begin{aligned}
&\Rightarrow X^TBXX^T\boldsymbol \alpha=\lambda X^TXX^T\boldsymbol \alpha\\
&\Rightarrow XX^TBXX^T\boldsymbol \alpha=\lambda XX^TXX^T\boldsymbol \alpha\\
&\Rightarrow (KBK)\boldsymbol \alpha=\lambda(KK)\boldsymbol \alpha
\end{aligned}
$$
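With a linear kernel, where $\phi$ is the identity, both the primal and the dual equations can be formed explicitly, which makes the substitution easy to verify. A sketch assuming NumPy and made-up centered data: the primal problem is solved first, then an $\boldsymbol\alpha$ with $X^T\boldsymbol\alpha=\boldsymbol v$ is recovered through the pseudo-inverse (one valid choice among many).

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [6, 6, 6]
N = sum(sizes)
X = np.vstack([rng.normal(loc=[3.0 * i, 1.0 * i], size=(n, 2))
               for i, n in enumerate(sizes)])
X = X - X.mean(axis=0)                         # m_0 = 0; phi is the identity here

B = np.zeros((N, N))
start = 0
for n in sizes:
    B[start:start + n, start:start + n] = 1.0 / n
    start += n

# Primal problem (X^T B X) v = lambda (X^T X) v as a standard eigenproblem
P = np.linalg.solve(X.T @ X, X.T @ B @ X)
eigvals, eigvecs = np.linalg.eig(P)
top = np.argmax(eigvals.real)
lam, v = eigvals.real[top], eigvecs[:, top].real

# Any alpha with X^T alpha = v works; the pseudo-inverse gives the minimum-norm one
alpha = np.linalg.pinv(X.T) @ v
K = X @ X.T                                    # linear-kernel Gram matrix
lhs = K @ B @ K @ alpha                        # (K B K) alpha
rhs = lam * (K @ K @ alpha)                    # lambda (K K) alpha
```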
Solving this equation yields $\boldsymbol \alpha$. A test sample $\boldsymbol x$ is then projected onto $\boldsymbol v=X^T\boldsymbol \alpha$ using only kernel evaluations, where $\boldsymbol x_1,\cdots,\boldsymbol x_N$ denote the training samples in the order in which they are stacked in $X$:
$$
\boldsymbol v^T\phi(\boldsymbol x)=(X^T\boldsymbol \alpha)^T\phi(\boldsymbol x)=\boldsymbol \alpha^T
\begin{bmatrix} \phi(\boldsymbol x_1)^T \\ \vdots \\ \phi(\boldsymbol x_N)^T \end{bmatrix}\phi(\boldsymbol x)
=\boldsymbol \alpha^T
\begin{bmatrix} \kappa(\boldsymbol x_1,\boldsymbol x) \\ \vdots \\ \kappa(\boldsymbol x_N,\boldsymbol x) \end{bmatrix}
$$
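Putting the pieces together, the following is a minimal sketch of fitting and projecting with an RBF kernel, assuming NumPy. Several parts are assumptions that do not appear in the original text: the function names `rbf_kernel`, `gda_fit` and `gda_project`, the kernel width `gamma`, the explicit centering of the kernel matrix (used to enforce $\boldsymbol m_0=0$), and the ridge term `eps` added to $KK$ so that it can be factorized.

```python
import numpy as np

def rbf_kernel(A, C, gamma=0.5):
    """kappa(a, c) = exp(-gamma * ||a - c||^2) for all rows of A and C."""
    sq = (A ** 2).sum(axis=1)[:, None] + (C ** 2).sum(axis=1)[None, :] - 2.0 * A @ C.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def gda_fit(X_train, sizes, gamma=0.5, eps=1e-3):
    """Solve (K B K) alpha = lambda (K K + eps I) alpha for the largest lambda."""
    N = len(X_train)
    K = rbf_kernel(X_train, X_train, gamma)
    J = np.eye(N) - np.full((N, N), 1.0 / N)
    Kc = J @ K @ J                              # centered kernel, so that m_0 = 0 in H
    B = np.zeros((N, N))
    start = 0
    for n in sizes:                             # samples must be ordered class by class
        B[start:start + n, start:start + n] = 1.0 / n
        start += n
    # Whitening with a Cholesky factor of the regularized right-hand matrix
    C = np.linalg.cholesky(Kc @ Kc + eps * np.eye(N))
    C_inv = np.linalg.inv(C)
    eigvals, eigvecs = np.linalg.eigh(C_inv @ (Kc @ B @ Kc) @ C_inv.T)
    alpha = C_inv.T @ eigvecs[:, -1]
    return alpha, eigvals[-1], K

def gda_project(X_new, X_train, alpha, K, gamma=0.5):
    """Return alpha^T [kappa(x_1, x), ..., kappa(x_N, x)]^T for each row x of X_new."""
    k = rbf_kernel(X_new, X_train, gamma)       # shape (M, N)
    # Apply the same feature-space centering that was used for the training kernel
    k_c = k - k.mean(axis=1, keepdims=True) - K.mean(axis=0) + K.mean()
    return k_c @ alpha

# Made-up data: two noisy concentric rings, 20 samples each
rng = np.random.default_rng(6)
sizes = [20, 20]
angles = rng.uniform(0.0, 2.0 * np.pi, size=40)
radii = np.concatenate([np.full(20, 1.0), np.full(20, 3.0)]) + rng.normal(scale=0.1, size=40)
X_train = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

alpha, lam, K = gda_fit(X_train, sizes)
scores = gda_project(X_train, X_train, alpha, K)
```

Here `scores` holds one scalar projection per training sample; new samples are projected the same way by passing them as `X_new`. The values of `gamma` and `eps` are placeholders and would normally be tuned.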
Exercise: in GDA, the within-class scatter matrix is
$$
S^{GDA}_w=\sum_{i=1}^L\sum_{j=1}^{N_i}\frac{1}{N}\phi(\boldsymbol x^{(i)}_j)\phi(\boldsymbol x^{(i)}_j)^T
$$
whereas in LDA (applied to the mapped samples) it is
$$
S^{LDA}_w=\sum_{i=1}^{L}\sum_{j=1}^{N_i}\frac{1}{N}(\phi(\boldsymbol x_j^{(i)})-\boldsymbol m_i)(\phi(\boldsymbol x_j^{(i)})-\boldsymbol m_i)^T,\qquad
\boldsymbol m_i=\frac{1}{N_i}\sum_{j=1}^{N_i}\phi(\boldsymbol x^{(i)}_j)
$$
Can the GDA within-class scatter matrix be derived from the LDA one, and can a matrix $W$ be found such that $S^{LDA}_w=X^TWX$?
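One candidate for the second part of the exercise, offered as a sketch and not as the author's intended answer, is $W=\frac{1}{N}(I-B)$. The rows of $(I-B_i)X_i$ are $\phi(\boldsymbol x^{(i)}_j)^T-\boldsymbol m_i^T$, and $I-B_i$ is symmetric and idempotent, so $X_i^T(I-B_i)X_i$ sums the centered outer products of class $i$. The check below assumes NumPy and made-up centered data standing in for the mapped samples.

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = [4, 5, 6]
N = sum(sizes)
X = rng.normal(size=(N, 3))
X = X - X.mean(axis=0)            # rows play the role of phi(x)^T, with m_0 = 0

B = np.zeros((N, N))
start = 0
for n in sizes:
    B[start:start + n, start:start + n] = 1.0 / n
    start += n

W = (np.eye(N) - B) / N           # candidate W (an assumption, not taken from the source)
Sw_candidate = X.T @ W @ X

# Reference: LDA within-class scatter from its summation definition
Sw_lda = np.zeros((3, 3))
for block in np.split(X, np.cumsum(sizes)[:-1]):
    centered = block - block.mean(axis=0)
    Sw_lda += centered.T @ centered / N

Sw_gda = X.T @ X / N              # (1/N) X^T X
Sb_gda = X.T @ B @ X / N          # (1/N) X^T B X
```

Under this candidate, $S^{LDA}_w=S^{GDA}_w-S^{GDA}_b$ when $\boldsymbol m_0=0$, which relates the two definitions compared in the exercise.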