Definition of Linear Discriminant Analysis
Suppose we have a sample matrix $X_{N\times p}$ of the following form:
$$X=\left( x_{1} \ x_{2} \ \dots \ x_{N}\right)^{T} =\left( \begin{matrix} x^T_1 \\ x^T_2 \\ \vdots \\ x^T_N \\ \end{matrix} \right)_{N \times p} = \left( \begin{matrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{Np} \\ \end{matrix} \right )_{N\times p}$$
and a label vector $Y_{N\times 1}$ of the following form:
$$Y =\left( \begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \\ \end{matrix} \right )_{N \times 1}$$
$X$ and $Y$ together form the sample points $\left\{ \left( x_i,y_i\right) \right\}_{i=1}^{N}$, where $x_i\in \mathbb{R}^p$ and $y_i \in \left\{ +1,-1\right\}$.
The class $x_{c1}=\left\{ x_i|y_i=+1 \right\}$ contains $N_1$ samples and the class $x_{c2}=\left\{ x_i|y_i=-1 \right\}$ contains $N_2$ samples, with $N_1+N_2=N$.
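As a minimal sketch of this setup, the snippet below stores a small made-up data set as rows of $X$ with labels in $Y$ and splits it into the two classes. The data values and the names `X`, `Y`, `x_c1`, `x_c2` are illustrative assumptions, not part of the original notes; plain Python lists are used only to keep the example dependency-free.

```python
# Toy data set (assumed for illustration): N = 6 samples, p = 2 features.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
Y = [+1, +1, +1, -1, -1, -1]

# Split the samples by label: x_c1 holds y_i = +1, x_c2 holds y_i = -1.
x_c1 = [x for x, y in zip(X, Y) if y == +1]
x_c2 = [x for x, y in zip(X, Y) if y == -1]

N, N1, N2 = len(X), len(x_c1), len(x_c2)
print(N1, N2, N1 + N2 == N)
```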
Linear discriminant analysis looks for a projection in which the within-class scatter is small and the between-class separation is large. Projecting each sample onto a direction $\omega$, we define:
$$\begin{aligned} z_i&=\omega ^Tx_i \\ \bar{z}&=\frac{1}{N}\sum_{i=1}^{N}z_i= \frac{1}{N}\sum_{i=1}^{N}\omega ^Tx_i \\ S_z&=\frac{1}{N}\sum_{i=1}^{N}(z_i-\bar{z})(z_i-\bar{z})^T = \frac{1}{N}\sum_{i=1}^{N}\left(\omega ^Tx_i-\frac{1}{N}\sum_{j=1}^{N}\omega ^Tx_j\right)\left(\omega ^Tx_i-\frac{1}{N}\sum_{j=1}^{N}\omega ^Tx_j\right)^T \end{aligned}$$
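To make the three definitions concrete, here is a small sketch that projects a made-up data set onto one direction and computes the mean and variance of the projections. The data, the direction `w`, and the helper name `project` are assumptions chosen for illustration.

```python
# Toy data (assumed for illustration): six points in R^2.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
w = (1.0, 0.0)  # an arbitrary projection direction

def project(w, x):
    """Return the scalar projection z = w^T x."""
    return sum(wj * xj for wj, xj in zip(w, x))

z = [project(w, x) for x in X]                       # z_i = w^T x_i
z_bar = sum(z) / len(z)                              # mean of the projections
S_z = sum((zi - z_bar) ** 2 for zi in z) / len(z)    # variance of the projections

print(z, z_bar, S_z)
```

Because each $z_i$ is a scalar, the product $(z_i-\bar{z})(z_i-\bar{z})^T$ reduces to a plain square in the code.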
Then for the classes $c_1$ and $c_2$ (each sum runs over the samples of the corresponding class) we have:
$c_1$:
$$\bar{z_1}=\frac{1}{N_1}\sum_{i=1}^{N_1}\omega ^Tx_i \\ S_{z1}= \frac{1}{N_1}\sum_{i=1}^{N_1}(\omega ^Tx_i-\bar{z_1})(\omega ^Tx_i-\bar{z_1})^T$$
$c_2$:
$$\bar{z_2}=\frac{1}{N_2}\sum_{i=1}^{N_2}\omega ^Tx_i \\ S_{z2}= \frac{1}{N_2}\sum_{i=1}^{N_2}(\omega ^Tx_i-\bar{z_2})(\omega ^Tx_i-\bar{z_2})^T$$
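The per-class statistics can be sketched in the same way. The data set and the helper `projected_stats` below are hypothetical; the helper simply applies the mean and variance formulas to the samples of one class.

```python
# Toy data (assumed for illustration) with labels +1 / -1.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
Y = [+1, +1, +1, -1, -1, -1]
w = (1.0, 0.0)  # an arbitrary projection direction

def projected_stats(w, points):
    """Return (mean, variance) of the projections w^T x over the given points."""
    z = [sum(wj * xj for wj, xj in zip(w, x)) for x in points]
    mean = sum(z) / len(z)
    var = sum((zi - mean) ** 2 for zi in z) / len(z)
    return mean, var

c1 = [x for x, y in zip(X, Y) if y == +1]
c2 = [x for x, y in zip(X, Y) if y == -1]

z1_bar, S_z1 = projected_stats(w, c1)
z2_bar, S_z2 = projected_stats(w, c2)
print(z1_bar, S_z1, z2_bar, S_z2)
```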
Since linear discriminant analysis aims for small within-class scatter and large between-class separation, we define:
- Between-class: $(\bar{z_1}-\bar{z_2})^2$
- Within-class: $S_{z1}+S_{z2}$
Based on this property we can construct the objective function, which is to be maximized with respect to $\omega$:
$$J(\omega)=\frac{(\bar{z_1}-\bar{z_2})^2}{S_{z1}+S_{z2}}$$
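A short sketch of this ratio, evaluated directly from the projected samples for a few candidate directions. The two toy classes and the function names are assumptions for illustration; a larger value of `fisher_objective` means the projected classes are better separated relative to their spread.

```python
# Toy classes (assumed for illustration).
C1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]   # samples with y = +1
C2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]   # samples with y = -1

def projected_stats(w, points):
    """Return (mean, variance) of the projections w^T x."""
    z = [sum(wj * xj for wj, xj in zip(w, x)) for x in points]
    mean = sum(z) / len(z)
    var = sum((zi - mean) ** 2 for zi in z) / len(z)
    return mean, var

def fisher_objective(w):
    """J(w) = (z1_bar - z2_bar)^2 / (S_z1 + S_z2)."""
    z1_bar, S_z1 = projected_stats(w, C1)
    z2_bar, S_z2 = projected_stats(w, C2)
    return (z1_bar - z2_bar) ** 2 / (S_z1 + S_z2)

for w in [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]:
    print(w, fisher_objective(w))
```

On this data the direction $(2,1)$ scores higher than either coordinate axis, which is the kind of comparison the maximization formalizes.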
For the numerator:
$$\begin{aligned} (\bar{z_1}-\bar{z_2})^2 &=\left(\frac{1}{N_1}\sum_{i=1}^{N_1}\omega ^Tx_i-\frac{1}{N_2}\sum_{i=1}^{N_2}\omega ^Tx_i\right)^2\\ &=\left\{\omega^T\left(\frac{1}{N_1}\sum_{i=1}^{N_1}x_i-\frac{1}{N_2}\sum_{i=1}^{N_2}x_i\right)\right\}^2\\ &=\omega^T(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega \end{aligned}$$
where $\bar{x}_{c1}$ and $\bar{x}_{c2}$ denote the mean vectors of the two classes.
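The identity for the numerator can be checked numerically. The sketch below, on assumed toy data and with hypothetical helper names, computes the squared difference of the projected means and the quadratic form built from the outer product of the mean difference, and prints both.

```python
# Toy classes and direction (assumed for illustration).
C1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
C2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
w = (2.0, 1.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mean_vector(points):
    n = len(points)
    return [sum(x[j] for x in points) / n for j in range(len(points[0]))]

m1, m2 = mean_vector(C1), mean_vector(C2)
d = [a - b for a, b in zip(m1, m2)]           # x_bar_c1 - x_bar_c2

# Left-hand side: squared difference of the projected class means.
z1_bar = sum(dot(w, x) for x in C1) / len(C1)
z2_bar = sum(dot(w, x) for x in C2) / len(C2)
lhs = (z1_bar - z2_bar) ** 2

# Right-hand side: w^T (d d^T) w, with the outer product built explicitly.
outer = [[di * dj for dj in d] for di in d]
rhs = dot(w, [dot(row, w) for row in outer])

print(lhs, rhs)
```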
For the denominator:
$$S_{z1}+S_{z2} =\frac{1}{N_1}\sum_{i=1}^{N_1}(\omega ^Tx_i-\bar{z_1})(\omega ^Tx_i-\bar{z_1})^T+\frac{1}{N_2}\sum_{i=1}^{N_2}(\omega ^Tx_i-\bar{z_2})(\omega ^Tx_i-\bar{z_2})^T$$
Here, using $\bar{z_1}=\omega^T\bar{x}_{c1}$ with $\bar{x}_{c1}=\frac{1}{N_1}\sum_{i=1}^{N_1}x_i$:
$$\begin{aligned} S_{z1} &= \frac{1}{N_1}\sum_{i=1}^{N_1}(\omega ^Tx_i-\bar{z_1})(\omega ^Tx_i-\bar{z_1})^T\\ &= \frac{1}{N_1}\sum_{i=1}^{N_1}\omega ^T(x_i-\bar{x}_{c1})(x_i-\bar{x}_{c1})^T\omega\\ &= \omega ^T\left\{\frac{1}{N_1}\sum_{i=1}^{N_1}(x_i-\bar{x}_{c1})(x_i-\bar{x}_{c1})^T\right\}\omega\\ &=\omega^TS_{c1}\omega \end{aligned}$$
where $S_{c1}$ is the covariance matrix of class $c_1$.
Similarly:
$$S_{z2} =\omega^TS_{c2}\omega$$
It follows that:
$$S_{z1}+S_{z2} = \omega^T(S_{c1}+S_{c2})\omega$$
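The same kind of numerical check works for the denominator. In this sketch the helper `scatter` (a hypothetical name) builds the class covariance matrix with the $1/N_c$ normalization used above, and the sum of projected variances is compared with the quadratic form in $S_{c1}+S_{c2}$. The data and direction are assumed for illustration.

```python
# Toy classes and direction (assumed for illustration).
C1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
C2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
w = (2.0, 1.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def scatter(points):
    """Class covariance S_c = (1/N_c) * sum_i (x_i - mean)(x_i - mean)^T."""
    n, p = len(points), len(points[0])
    mean = [sum(x[j] for x in points) / n for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for x in points:
        dev = [xj - mj for xj, mj in zip(x, mean)]
        for a in range(p):
            for b in range(p):
                S[a][b] += dev[a] * dev[b] / n
    return S

def projected_variance(w, points):
    z = [dot(w, x) for x in points]
    z_bar = sum(z) / len(z)
    return sum((zi - z_bar) ** 2 for zi in z) / len(z)

S_sum = [[a + b for a, b in zip(r1, r2)]
         for r1, r2 in zip(scatter(C1), scatter(C2))]

lhs = projected_variance(w, C1) + projected_variance(w, C2)   # S_z1 + S_z2
rhs = dot(w, [dot(row, w) for row in S_sum])                  # w^T (S_c1 + S_c2) w
print(lhs, rhs)
```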
In summary:
$$J(\omega) =\frac{(\bar{z_1}-\bar{z_2})^2}{S_{z1}+S_{z2}} =\frac{\omega^T(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega}{\omega^T(S_{c1}+S_{c2})\omega}$$
Solving the Linear Discriminant Analysis Model
We now solve for the $\omega$ that maximizes $J(\omega)$. Let:
$$S_b = (\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T \quad \text{(between-class scatter)}\\ S_w = S_{c1}+S_{c2} \quad \text{(within-class scatter)}$$
Then:
$$J(\omega) =\frac{\omega^TS_b\omega}{\omega^TS_w\omega} =\omega^TS_b\omega(\omega^TS_w\omega)^{-1}$$
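In this matrix form the objective depends only on the direction of $\omega$, not on its length, because any scale factor cancels between numerator and denominator. The sketch below illustrates this with hard-coded $2\times 2$ matrices; their values are assumptions (they correspond to a toy data set with mean difference $(-5,-4)$), and `quad_form` and `J` are hypothetical helper names.

```python
# Scatter matrices of a toy data set (assumed for illustration).
S_b = [[25.0, 20.0], [20.0, 16.0]]      # outer product of d = (-5, -4) with itself
S_w = [[4 / 3, 2 / 3], [2 / 3, 4 / 3]]  # sum of the two class covariances

def quad_form(w, S):
    """Return the scalar w^T S w."""
    n = len(w)
    return sum(w[a] * S[a][b] * w[b] for a in range(n) for b in range(n))

def J(w):
    """Objective in matrix form: (w^T S_b w) / (w^T S_w w)."""
    return quad_form(w, S_b) / quad_form(w, S_w)

w = (2.0, 1.0)
scaled = tuple(-3.5 * wi for wi in w)   # same line through the origin, different length and sign

print(J(w), J(scaled))
```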
Differentiating $J(\omega)$ with respect to $\omega$ and setting the result to zero gives:
$$\frac{\partial{J(\omega)}}{\partial{\omega}} =2S_b\omega(\omega^TS_w\omega)^{-1}-2\,\omega^TS_b\omega(\omega^TS_w\omega)^{-2}S_w\omega=0$$
Dividing by the common factor 2, this is equivalent to:
$$S_b\omega(\omega^TS_w\omega)^{-1}-\omega^TS_b\omega(\omega^TS_w\omega)^{-2}S_w\omega=0$$
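One way to gain confidence in the derivative is a finite-difference check. The sketch below writes the gradient with the factor 2 that comes from differentiating the quadratic forms (this factor does not change where the gradient vanishes) and compares it with central differences. The matrices are assumed toy values and all function names are hypothetical.

```python
# Scatter matrices of a toy data set (assumed for illustration).
S_b = [[25.0, 20.0], [20.0, 16.0]]
S_w = [[4 / 3, 2 / 3], [2 / 3, 4 / 3]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mat_vec(S, w):
    return [dot(row, w) for row in S]

def J(w):
    return dot(w, mat_vec(S_b, w)) / dot(w, mat_vec(S_w, w))

def grad_J(w):
    """Analytic gradient: 2 S_b w / (w^T S_w w) - 2 (w^T S_b w) S_w w / (w^T S_w w)^2."""
    Sb_w, Sw_w = mat_vec(S_b, w), mat_vec(S_w, w)
    num, den = dot(w, Sb_w), dot(w, Sw_w)
    return [2 * b / den - 2 * num * s / den ** 2 for b, s in zip(Sb_w, Sw_w)]

def numeric_grad(w, h=1e-6):
    """Central-difference approximation of the gradient of J."""
    g = []
    for k in range(len(w)):
        up, down = list(w), list(w)
        up[k] += h
        down[k] -= h
        g.append((J(up) - J(down)) / (2 * h))
    return g

w = [1.0, 0.0]
print(grad_J(w), numeric_grad(w))
print(grad_J([2.0, 1.0]))   # close to zero: (2, 1) is a stationary direction for this data
```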
Multiplying both sides by $(\omega^TS_w\omega)^{2}$ and rearranging gives:
$$S_b\omega\,(\omega^TS_w\omega)=(\omega^TS_b\omega)\,S_w\omega$$
Here $\omega^TS_w\omega$ and $\omega^TS_b\omega$ are both real scalars, so they can be moved freely within the products. Assuming $S_w$ is invertible, we obtain:
$$S_w\omega=S_b\omega\frac{\omega^TS_w\omega}{\omega^TS_b\omega}\\ \omega = \frac{\omega^TS_w\omega}{\omega^TS_b\omega}S_w^{-1}S_b\omega \propto S_w^{-1}S_b\omega$$
Substituting the definition of $S_b$:
$$\omega \propto S_w^{-1} (\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega$$
Here $(\bar{x}_{c1}-\bar{x}_{c2})^T\omega$ is a real scalar, so it does not affect the direction of $\omega$. Therefore:
$$\omega \propto S_w^{-1} (\bar{x}_{c1}-\bar{x}_{c2})$$
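The closed-form direction can be sketched for two-dimensional data as follows. The toy classes are assumptions, the helper names are hypothetical, and the explicit $2\times 2$ inverse is used only to keep the example dependency-free; a linear-algebra library would normally solve $S_w\omega=\bar{x}_{c1}-\bar{x}_{c2}$ instead of forming the inverse.

```python
from math import sqrt

# Toy classes (assumed for illustration).
C1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
C2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mean_vector(points):
    n = len(points)
    return [sum(x[j] for x in points) / n for j in range(2)]

def scatter(points):
    """Class covariance S_c = (1/N_c) * sum_i (x_i - mean)(x_i - mean)^T (2x2)."""
    n, mean = len(points), mean_vector(points)
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in points:
        dev = [x[0] - mean[0], x[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                S[a][b] += dev[a] * dev[b] / n
    return S

def inverse_2x2(S):
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

diff = [a - b for a, b in zip(mean_vector(C1), mean_vector(C2))]
S_w = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(scatter(C1), scatter(C2))]

w = [dot(row, diff) for row in inverse_2x2(S_w)]   # S_w^{-1} (x_bar_c1 - x_bar_c2)
norm = sqrt(dot(w, w))
w_unit = [wi / norm for wi in w]                   # rescaled so that ||w|| = 1
print(w, w_unit)
```

Only the direction matters, so `w` and `w_unit` describe the same solution.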
If we further assume that $S_w^{-1}$ is diagonal and isotropic, then $S_w^{-1}\propto I$ (the identity matrix), and in this case:
$$\omega \propto (\bar{x}_{c1}-\bar{x}_{c2})$$
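The following sketch contrasts the two situations. With an isotropic within-class matrix the mean difference itself attains the maximum of the objective; with a correlated one it falls slightly short of the direction $S_w^{-1}(\bar{x}_{c1}-\bar{x}_{c2})$. All matrices and vectors are assumed toy values, and `J` is a hypothetical helper.

```python
d = (-5.0, -4.0)                                  # assumed mean difference x_bar_c1 - x_bar_c2
S_b = [[di * dj for dj in d] for di in d]         # between-class scatter d d^T

S_w_iso = [[2.0, 0.0], [0.0, 2.0]]                # isotropic case: S_w proportional to I
S_w_corr = [[4 / 3, 2 / 3], [2 / 3, 4 / 3]]       # correlated features
w_opt_corr = (-3.0, -1.5)                         # S_w_corr^{-1} d, computed by hand

def quad_form(w, S):
    n = len(w)
    return sum(w[a] * S[a][b] * w[b] for a in range(n) for b in range(n))

def J(w, S_w):
    return quad_form(w, S_b) / quad_form(w, S_w)

print(J(d, S_w_iso))                              # mean difference, isotropic S_w
print(J(d, S_w_corr), J(w_opt_corr, S_w_corr))    # mean difference vs. S_w^{-1} d
```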
This completes the solution for the direction of the model parameter $\omega$ (its scale is fixed by assuming $||\omega||^2=1$).
Afterword
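Linear discriminant analysis (LDA) has significant limitations (these will be added later), but it is a highly representative classification method and can serve as a performance baseline for other classification methods.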
References
1. 机器学习白板推导 (Machine Learning Whiteboard Derivation series)