Personal notes, not a tutorial.
$$W: D\times d$$
$$Z: d\times N$$
$$X: D\times N$$
Within-class scatter matrix, with $C$ classes, $M_i$ samples in class $i$, class mean $\mu_i$ and weight $p(i,j)$. Here $(v)^2$ is shorthand for the outer product $vv^T$.
$$S_W=\sum_{i=1}^{C}\sum_{j=1}^{M_i}p(i,j)(x_j^{(i)}-\mu_i)^2$$
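A minimal NumPy sketch of the within-class scatter, reading the square as an outer product. The function name `within_class_scatter` and the default weight $p(i,j)=1/N$ are assumptions for illustration; the notes do not fix $p$.

```python
import numpy as np

def within_class_scatter(X, y, weights=None):
    """X: (D, N) data, one sample per column; y: (N,) integer class labels.

    weights: optional (N,) per-sample weights playing the role of p(i, j).
    Defaults to 1/N, which is an assumption (the notes leave p unspecified).
    """
    D, N = X.shape
    if weights is None:
        weights = np.full(N, 1.0 / N)
    S_W = np.zeros((D, D))
    for c in np.unique(y):
        mask = y == c
        Xc = X[:, mask]                          # (D, M_c) samples of class c
        mu_c = Xc.mean(axis=1, keepdims=True)    # (D, 1) class mean
        diff = Xc - mu_c
        # sum_j p(i, j) * diff_j diff_j^T, written as one matrix product
        S_W += (diff * weights[mask]) @ diff.T
    return S_W

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 12))
y = np.repeat([0, 1, 2], 4)
S_W = within_class_scatter(X, y)
print(S_W.shape)
```

With a single class and the default weights, this reduces to the biased sample covariance of `X`.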
Within-class scatter matrix after projection
$$\begin{aligned} S_W'''&=\sum_{i=1}^{C}\sum_{j=1}^{M_i}p(i,j)(W^Tx_j^{(i)}-W^T\mu_i)^2 \\ &=\sum_{i=1}^{C}\sum_{j=1}^{M_i}p(i,j)(W^T(x_j^{(i)}-\mu_i))^2 \\ &=\sum_{i=1}^{C}\sum_{j=1}^{M_i}p(i,j)(W^T(x_j^{(i)}-\mu_i)(x_j^{(i)}-\mu_i)^TW) \\ &=W^T\sum_{i=1}^{C}\sum_{j=1}^{M_i}p(i,j)(x_j^{(i)}-\mu_i)(x_j^{(i)}-\mu_i)^TW \\ &=W^TS_WW \end{aligned}$$
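The identity $S_W'''=W^TS_WW$ can be checked numerically: compute the scatter of the projected data directly and compare it with $W^TS_WW$. The helper `scatter_within`, the uniform weight $p(i,j)=1/N$ and the random test data are assumptions made for this sketch.

```python
import numpy as np

def scatter_within(X, y):
    """Within-class scatter of X (features x samples), assuming p(i, j) = 1/N."""
    n_features, N = X.shape
    S = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[:, y == c]
        diff = Xc - Xc.mean(axis=1, keepdims=True)
        S += diff @ diff.T / N
    return S

rng = np.random.default_rng(0)
D, d, N = 5, 2, 30
X = rng.normal(size=(D, N))
y = np.repeat([0, 1, 2], N // 3)
W = rng.normal(size=(D, d))               # arbitrary projection, not yet optimized

lhs = scatter_within(W.T @ X, y)          # scatter measured after projecting
rhs = W.T @ scatter_within(X, y) @ W      # W^T S_W W
print(np.allclose(lhs, rhs))
```

The two sides agree because projecting is linear, so the mean of the projected class equals the projection of the class mean.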
Between-class scatter matrix. Both indices run over classes here, so the inner sum goes up to $C$.
$$S_B=\sum_{i=1}^{C}\sum_{j=1}^{C}p(i,j)(\mu_i-\mu_j)^2$$
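A sketch of the pairwise between-class scatter in NumPy. It assumes that $j$ indexes classes (so $\mu_j$ is a class mean) and that $p(i,j)=P_iP_j$, the product of the empirical class priors. Both are assumptions for illustration, as is the function name `between_class_scatter`.

```python
import numpy as np

def between_class_scatter(X, y):
    """X: (D, N) data, one sample per column; y: (N,) integer class labels."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                       # P_i, assumed weights
    means = np.stack([X[:, y == c].mean(axis=1) for c in classes], axis=1)  # (D, C)
    D = X.shape[0]
    S_B = np.zeros((D, D))
    for i in range(len(classes)):
        for j in range(len(classes)):
            diff = (means[:, i] - means[:, j])[:, None]  # (D, 1) column vector
            S_B += priors[i] * priors[j] * (diff @ diff.T)
    return S_B

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 9))
y = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2])
S_B = between_class_scatter(X, y)
print(S_B.shape)
```

Under this choice of weights, the pairwise form equals twice the more common form $\sum_i P_i(\mu_i-\mu)(\mu_i-\mu)^T$ with $\mu=\sum_i P_i\mu_i$, so the two differ only by a constant factor.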
Between-class scatter matrix after projection
$$\begin{aligned} S_B'''&=\sum_{i=1}^{C}\sum_{j=1}^{C}p(i,j)(W^T\mu_i-W^T\mu_j)^2 \\ &=\sum_{i=1}^{C}\sum_{j=1}^{C}p(i,j)(W^T(\mu_i-\mu_j))^2 \\ &=\sum_{i=1}^{C}\sum_{j=1}^{C}p(i,j)(W^T(\mu_i-\mu_j)(\mu_i-\mu_j)^TW) \\ &=W^T\sum_{i=1}^{C}\sum_{j=1}^{C}p(i,j)(\mu_i-\mu_j)(\mu_i-\mu_j)^TW \\ &=W^TS_BW \end{aligned}$$
Maximize the projected between-class scatter and minimize the projected within-class scatter:
$$\max |W^TS_BW|$$
$$\min |W^TS_WW|$$
Both goals are combined into a single objective:
$$\max \frac{|W^TS_BW|}{|W^TS_WW|}$$
Without loss of generality, we can set $|W^TS_WW|=1$, so the problem becomes
$$\max |W^TS_BW|$$
$$\text{s.t. } |W^TS_WW|=1$$
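Fixing $|W^TS_WW|=1$ loses nothing because the ratio of determinants does not change when $W$ is replaced by $WA$ for any invertible $d\times d$ matrix $A$: both determinants pick up the same factor $\det(A)^2$. The sketch below checks this numerically. The names `fisher_ratio` and `normalize`, and the random positive-definite matrices standing in for $S_B$ and $S_W$, are assumptions for illustration.

```python
import numpy as np

def fisher_ratio(W, S_B, S_W):
    """Ratio of determinants |W^T S_B W| / |W^T S_W W|."""
    return np.linalg.det(W.T @ S_B @ W) / np.linalg.det(W.T @ S_W @ W)

def normalize(W, S_W):
    """Rescale W by a scalar so that |W^T S_W W| = 1 (assumes the determinant is positive)."""
    d = W.shape[1]
    det = np.linalg.det(W.T @ S_W @ W)
    return W / det ** (1.0 / (2 * d))

rng = np.random.default_rng(0)
A1 = rng.normal(size=(4, 4))
A2 = rng.normal(size=(4, 4))
S_W = A1 @ A1.T + 4 * np.eye(4)     # stand-in positive-definite within-class scatter
S_B = A2 @ A2.T + np.eye(4)         # stand-in positive-definite between-class scatter
W = rng.normal(size=(4, 2))
A = np.array([[2.0, 1.0], [0.5, 3.0]])   # invertible 2x2 mixing matrix

print(np.isclose(fisher_ratio(W, S_B, S_W), fisher_ratio(W @ A, S_B, S_W)))
W_n = normalize(W, S_W)
print(np.isclose(np.linalg.det(W_n.T @ S_W @ W_n), 1.0))
```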
Lagrangian, treating the determinants as the scalar quadratic forms they reduce to for a single projection direction:
$$L(W,\lambda)=W^TS_BW-\lambda (W^TS_WW-1)$$
$$\frac{\partial L(W,\lambda)}{\partial W}=2S_BW-2\lambda S_WW$$
Setting $\frac{\partial L(W,\lambda)}{\partial W}=0$ gives
$$S_BW=\lambda S_WW$$
and, when $S_W$ is invertible,
$$S_W^{-1}S_BW=\lambda W$$
So the columns of $W$ are eigenvectors of $S_W^{-1}S_B$; to maximize the objective, take those with the $d$ largest eigenvalues.
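A minimal sketch of solving $S_W^{-1}S_BW=\lambda W$ with NumPy. The function name `lda_directions` is hypothetical, and the matrices below are random stand-ins rather than scatter matrices from real data. `np.linalg.solve` is used instead of forming $S_W^{-1}$ explicitly, which is a design choice of this sketch and not something the notes prescribe.

```python
import numpy as np

def lda_directions(S_W, S_B, d):
    """Return the d largest eigenvalues and matching eigenvectors of S_W^{-1} S_B."""
    M = np.linalg.solve(S_W, S_B)            # same as inv(S_W) @ S_B, but more stable
    eigvals, eigvecs = np.linalg.eig(M)
    # M is not symmetric, so eig may return a complex dtype whose imaginary
    # parts are only rounding noise; keep the real parts.
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1][:d]    # indices of the d largest eigenvalues
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 2))
S_W = A @ A.T + 4 * np.eye(4)    # stand-in positive-definite within-class scatter
S_B = B @ B.T                    # stand-in rank-2 between-class scatter
lam, W = lda_directions(S_W, S_B, d=2)

# Each column w should satisfy S_B w = lambda * S_W w.
print(np.allclose(S_B @ W, (S_W @ W) * lam))
```

Since $S_B$ is built from $C$ class means, it has rank at most $C-1$, so only up to $C-1$ eigenvalues are nonzero and choosing $d>C-1$ adds no discriminative directions.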
The projected data is then
$$Z=W^TX$$
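Putting the pieces together, here is an end-to-end sketch on synthetic two-class data. The function `fit_lda`, the weights ($p(i,j)=1/N$ for $S_W$ and $P_iP_j$ for $S_B$) and the toy Gaussian data are all assumptions made for illustration.

```python
import numpy as np

def fit_lda(X, y, d):
    """X: (D, N) data, one sample per column; returns W of shape (D, d)."""
    D, N = X.shape
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / N
    means = np.stack([X[:, y == c].mean(axis=1) for c in classes], axis=1)  # (D, C)

    # Within-class scatter with uniform weights 1/N (assumed).
    S_W = np.zeros((D, D))
    for k, c in enumerate(classes):
        diff = X[:, y == c] - means[:, [k]]
        S_W += diff @ diff.T / N

    # Pairwise between-class scatter with weights P_i * P_j (assumed).
    pair = means[:, :, None] - means[:, None, :]           # (D, C, C)
    S_B = np.einsum("aij,bij,i,j->ab", pair, pair, priors, priors)

    # Eigenvectors of S_W^{-1} S_B with the d largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1][:d]
    return eigvecs[:, order].real

rng = np.random.default_rng(0)
shift = np.array([[3.0], [3.0], [0.0]])                    # offset of the second class
X = np.hstack([rng.normal(size=(3, 50)), rng.normal(size=(3, 50)) + shift])
y = np.repeat([0, 1], 50)

W = fit_lda(X, y, d=1)
Z = W.T @ X                                                # (d, N) projected data
print(W.shape, Z.shape)  # (3, 1) (1, 100)
```

The shapes match the dimensions listed at the top of the notes: $W$ is $D\times d$, $X$ is $D\times N$ and $Z$ is $d\times N$.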