Machine Learning: Linear Classification with Linear Discriminant Analysis

Definition of Linear Discriminant Analysis

Suppose we have a sample matrix $X_{N\times p}$ of the form:

$$X=\left( x_{1} \ x_{2} \ \dots\ x_{N}\right)^{T} =\left( \begin{matrix} x^T_1 \\ x^T_2 \\ \vdots \\ x^T_N \end{matrix} \right)_{N \times p} = \left( \begin{matrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{Np} \end{matrix} \right)_{N\times p}$$

and a label vector $Y_{N\times 1}$ of the form:

$$Y =\left( \begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{matrix} \right)_{N \times 1}$$

$X$ and $Y$ together form the sample set $\left\{ (x_i, y_i) \right\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \{+1, -1\}$.
There are $N_1$ samples in class $c_1$, $x_{c1}=\{ x_i \mid y_i=+1 \}$, and $N_2$ samples in class $c_2$, $x_{c2}=\{ x_i \mid y_i=-1 \}$, with $N_1+N_2=N$.
Linear discriminant analysis seeks a projection direction along which the within-class scatter is small and the between-class separation is large. For a direction $\omega$, define the projections and their statistics:

$$z_i=\omega^T x_i$$
$$\bar{z}=\frac{1}{N}\sum_{i=1}^{N}z_i=\frac{1}{N}\sum_{i=1}^{N}\omega^T x_i$$
$$S_z=\frac{1}{N}\sum_{i=1}^{N}(z_i-\bar{z})(z_i-\bar{z})^T=\frac{1}{N}\sum_{i=1}^{N}\left(\omega^T x_i-\bar{z}\right)\left(\omega^T x_i-\bar{z}\right)^T$$
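The projected statistics above can be sketched numerically. A minimal example, assuming synthetic data (all names and values here are illustrative, not from the original derivation):

```python
import numpy as np

# Illustrative sketch: project N samples in R^p onto a direction w,
# then compute the projected mean z_bar and scatter S_z defined above.
rng = np.random.default_rng(0)
N, p = 100, 2
X = rng.normal(size=(N, p))        # sample matrix X_{N x p}
w = np.array([1.0, 0.5])
w = w / np.linalg.norm(w)          # unit direction, ||w|| = 1

z = X @ w                          # z_i = w^T x_i, one scalar per sample
z_bar = z.mean()                   # projected mean
S_z = np.mean((z - z_bar) ** 2)    # projected scatter (scalar, since z_i is 1-D)
```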

For classes $c_1$ and $c_2$ (with the sums taken over the samples of each class), this gives:

$c_1$:
$$\bar{z}_1=\frac{1}{N_1}\sum_{i=1}^{N_1}\omega^T x_i,\qquad S_{z1}=\frac{1}{N_1}\sum_{i=1}^{N_1}\left(\omega^T x_i-\bar{z}_1\right)\left(\omega^T x_i-\bar{z}_1\right)^T$$

$c_2$:
$$\bar{z}_2=\frac{1}{N_2}\sum_{i=1}^{N_2}\omega^T x_i,\qquad S_{z2}=\frac{1}{N_2}\sum_{i=1}^{N_2}\left(\omega^T x_i-\bar{z}_2\right)\left(\omega^T x_i-\bar{z}_2\right)^T$$

Restating the goal of large between-class separation and small within-class scatter, define:

  • between-class separation: $(\bar{z}_1-\bar{z}_2)^2$
  • within-class scatter: $S_{z1}+S_{z2}$

Based on this property we construct the objective:
$$J(\omega)=\frac{(\bar{z}_1-\bar{z}_2)^2}{S_{z1}+S_{z2}}$$
The numerator expands as:
$$(\bar{z}_1-\bar{z}_2)^2 =\left(\frac{1}{N_1}\sum_{i=1}^{N_1}\omega^T x_i-\frac{1}{N_2}\sum_{i=1}^{N_2}\omega^T x_i\right)^2 =\left\{\omega^T\left(\frac{1}{N_1}\sum_{i=1}^{N_1}x_i-\frac{1}{N_2}\sum_{i=1}^{N_2}x_i\right)\right\}^2 =\omega^T(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega$$
and the denominator as:
$$S_{z1}+S_{z2}=\frac{1}{N_1}\sum_{i=1}^{N_1}\left(\omega^T x_i-\bar{z}_1\right)\left(\omega^T x_i-\bar{z}_1\right)^T+\frac{1}{N_2}\sum_{i=1}^{N_2}\left(\omega^T x_i-\bar{z}_2\right)\left(\omega^T x_i-\bar{z}_2\right)^T$$
where
$$S_{z1}=\frac{1}{N_1}\sum_{i=1}^{N_1}\left(\omega^T x_i-\bar{z}_1\right)\left(\omega^T x_i-\bar{z}_1\right)^T=\frac{1}{N_1}\sum_{i=1}^{N_1}\omega^T(x_i-\bar{x}_{c1})(x_i-\bar{x}_{c1})^T\omega=\omega^T\left\{\frac{1}{N_1}\sum_{i=1}^{N_1}(x_i-\bar{x}_{c1})(x_i-\bar{x}_{c1})^T\right\}\omega=\omega^T S_{c1}\omega$$
Similarly:
$$S_{z2}=\omega^T S_{c2}\omega$$
so that:
$$S_{z1}+S_{z2}=\omega^T(S_{c1}+S_{c2})\omega$$
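This identity between the projected scatters and the quadratic form in $\omega$ can be verified numerically; a quick sketch on made-up data (the helper `projected_scatter` is my own, not from the post):

```python
import numpy as np

# Check numerically: S_z1 + S_z2 == w^T (S_c1 + S_c2) w.
rng = np.random.default_rng(3)
X1 = rng.normal(size=(40, 3))      # class c1 samples
X2 = rng.normal(size=(60, 3))      # class c2 samples
w = rng.normal(size=3)             # arbitrary direction

def projected_scatter(X, w):
    """(1/N) * sum_i (w^T x_i - z_bar)^2 for the samples in X."""
    z = X @ w
    return np.mean((z - z.mean()) ** 2)

lhs = projected_scatter(X1, w) + projected_scatter(X2, w)
# bias=True gives the 1/N normalization used in S_c1, S_c2 above
S_c1 = np.cov(X1, rowvar=False, bias=True)
S_c2 = np.cov(X2, rowvar=False, bias=True)
rhs = w @ (S_c1 + S_c2) @ w
# lhs and rhs agree up to floating-point error
```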
Putting the two together:
$$J(\omega)=\frac{(\bar{z}_1-\bar{z}_2)^2}{S_{z1}+S_{z2}}=\frac{\omega^T(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega}{\omega^T(S_{c1}+S_{c2})\omega}$$
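As a sketch, this final form of $J(\omega)$ can be evaluated directly; the function and the two-class test data below are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))   # class c1
X2 = rng.normal(loc=[3.0, 3.0], size=(60, 2))   # class c2

def fisher_criterion(w, X1, X2):
    """J(w) = (w^T S_b w) / (w^T S_w w) with the scatter matrices above."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m1 - m2).reshape(-1, 1)
    S_b = d @ d.T                                # between-class scatter
    S_c1 = np.cov(X1, rowvar=False, bias=True)   # 1/N1 normalization
    S_c2 = np.cov(X2, rowvar=False, bias=True)
    S_w = S_c1 + S_c2                            # within-class scatter
    w = w.reshape(-1, 1)
    return float((w.T @ S_b @ w) / (w.T @ S_w @ w))

# The direction along the mean difference scores far higher than one
# orthogonal to it, which is exactly what the criterion rewards.
J_good = fisher_criterion(np.array([1.0, 1.0]), X1, X2)
J_bad = fisher_criterion(np.array([1.0, -1.0]), X1, X2)
```

Note that $J(\omega)$ is scale-invariant in $\omega$, so only the direction of the argument matters.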

Solving the Linear Discriminant Analysis Model

We now maximize $J(\omega)$. Let:
$$S_b=(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\quad\text{(between-class scatter)}$$
$$S_w=S_{c1}+S_{c2}\quad\text{(within-class scatter)}$$
Then:
$$J(\omega)=\frac{\omega^T S_b\omega}{\omega^T S_w\omega}=\omega^T S_b\omega\,(\omega^T S_w\omega)^{-1}$$
Differentiating $J(\omega)$ with respect to $\omega$ and setting the derivative to zero (the common factors of 2 from the quadratic forms cancel and are omitted):
$$\frac{\partial J(\omega)}{\partial\omega}=S_b\omega\,(\omega^T S_w\omega)^{-1}-\omega^T S_b\omega\,(\omega^T S_w\omega)^{-2}S_w\omega=0$$
Multiplying both sides by $(\omega^T S_w\omega)^{2}$ gives:
$$S_b\omega\,\omega^T S_w\omega=\omega^T S_b\omega\,S_w\omega$$
Since $\omega^T S_w\omega$ and $\omega^T S_b\omega$ are scalars, they can be moved freely, so:
$$S_w\omega=\frac{\omega^T S_w\omega}{\omega^T S_b\omega}S_b\omega$$
$$\omega=\frac{\omega^T S_w\omega}{\omega^T S_b\omega}S_w^{-1}S_b\omega\propto S_w^{-1}S_b\omega$$
Substituting the definition of $S_b$:
$$\omega\propto S_w^{-1}(\bar{x}_{c1}-\bar{x}_{c2})(\bar{x}_{c1}-\bar{x}_{c2})^T\omega$$
Since $(\bar{x}_{c1}-\bar{x}_{c2})^T\omega$ is a scalar and does not affect the direction of $\omega$, we obtain:
$$\omega\propto S_w^{-1}(\bar{x}_{c1}-\bar{x}_{c2})$$
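A minimal sketch of this closed-form direction on synthetic two-class data (all data and names here are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], size=(200, 2))  # class c1 (y = +1)
X2 = rng.normal(loc=[4.0, 1.0], size=(200, 2))  # class c2 (y = -1)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)       # x_bar_c1, x_bar_c2
S_w = (np.cov(X1, rowvar=False, bias=True)
       + np.cov(X2, rowvar=False, bias=True))   # within-class scatter S_c1 + S_c2
w = np.linalg.solve(S_w, m1 - m2)               # w = S_w^{-1} (x_bar_c1 - x_bar_c2)
w = w / np.linalg.norm(w)                       # keep only the direction, ||w|| = 1

# Classify by projecting onto w and thresholding at the midpoint of the
# projected class means: class c1 projects to the higher side.
threshold = ((m1 + m2) / 2) @ w
acc_c1 = np.mean(X1 @ w > threshold)            # fraction of c1 classified correctly
acc_c2 = np.mean(X2 @ w < threshold)            # fraction of c2 classified correctly
```

Using `np.linalg.solve` rather than forming $S_w^{-1}$ explicitly is the usual numerically safer choice.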
If we further assume that $S_w$ is diagonal and isotropic, then $S_w^{-1}\propto I$ (the identity matrix), and:
$$\omega\propto\bar{x}_{c1}-\bar{x}_{c2}$$
This completes the solution for the direction of the model parameter $\omega$ (taking $\|\omega\|^2=1$).

Postscript

Linear discriminant analysis (LDA) has significant limitations (to be discussed in a later post), but it is a highly representative classification method and serves as a useful performance baseline for other classifiers.

References

1. 机器学习白板推导 (Machine Learning Whiteboard Derivations)
