[Supervised Learning] Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA)

Both LDA and PCA are dimensionality-reduction (data-compression) algorithms, but PCA is unsupervised while LDA is supervised, so they emphasize different things depending on the task: PCA cares about the minimal reconstruction error between the original data and the compressed data, whereas LDA cares about how well the classes can be distinguished after compression.

[Figure: LDA seeks a projection direction that spreads the class centers apart while keeping each class tightly clustered]

As the figure above suggests, LDA looks for a projection direction along which the class centers are as spread out as possible while the samples within each class stay as tightly clustered as possible. If PCA's optimization criterion is minimal reconstruction error, then LDA's criterion is to minimize the within-class variance and maximize the separation between class means.

So how should we choose this projection direction? Let us start from the underlying mathematics.

Suppose the data fall into $K$ classes, with $N_1, N_2, \cdots, N_K$ samples respectively, and let $\boldsymbol{x}_k^1, \boldsymbol{x}_k^2, \cdots, \boldsymbol{x}_k^{N_k}$ denote the samples of the $k$-th class. For any sample $\boldsymbol{x}$, let $\widetilde{\boldsymbol{x}}$ denote its projected point.

[Figure: the projection $\widetilde{\boldsymbol{x}}$ of a sample $\boldsymbol{x}$ onto the direction $\boldsymbol{u}$]

Then $\widetilde{\boldsymbol{x}} = \langle\boldsymbol{x},\boldsymbol{u}\rangle\,\boldsymbol{u} = (\boldsymbol{x}^T\boldsymbol{u})\,\boldsymbol{u}$, where $\boldsymbol{u}$ is the direction we project onto.
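As a quick sanity check on this formula, here is a minimal NumPy sketch (the helper name `project` is my own); it normalizes $\boldsymbol{u}$ so that $(\boldsymbol{x}^T\boldsymbol{u})\boldsymbol{u}$ is the true orthogonal projection:

```python
import numpy as np

def project(x, u):
    """Project sample x onto direction u: (x^T u) u for unit-norm u."""
    u = u / np.linalg.norm(u)    # keep only the direction of u
    return (x @ u) * u

x = np.array([3.0, 1.0])
u = np.array([1.0, 1.0])
print(project(x, u))             # [2. 2.] -- the foot of the perpendicular from x onto span{u}
```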

How do we describe the within-class variance?

Let us first describe the variance of the projected samples of the $k$-th class:
$$
\begin{aligned}
S_k &= \frac{1}{N_k}\sum_{\widetilde{\boldsymbol{x}} \in D_k}(\widetilde{\boldsymbol{x}}-\widetilde{\boldsymbol{m}}_k)^T (\widetilde{\boldsymbol{x}}-\widetilde{\boldsymbol{m}}_k) \\
&= \frac{1}{N_k} \sum_{\boldsymbol{x} \in D_k} \big[(\boldsymbol{x}^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_k^T \boldsymbol{u})\boldsymbol{u}\big]^T \big[(\boldsymbol{x}^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_k^T \boldsymbol{u})\boldsymbol{u}\big] \\
&= \frac{1}{N_k} \sum \Big[(\boldsymbol{x}^T \boldsymbol{u})^2\,\boldsymbol{u}^T \boldsymbol{u} - 2(\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u})\,\boldsymbol{u}^T \boldsymbol{u} + (\boldsymbol{m}_k^T \boldsymbol{u})^2\,\boldsymbol{u}^T \boldsymbol{u}\Big] \\
&= \frac{a}{N_k} \sum \Big[(\boldsymbol{x}^T \boldsymbol{u})^2 - 2(\boldsymbol{m}_k^T \boldsymbol{u})(\boldsymbol{x}^T \boldsymbol{u}) + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\Big[\frac{\sum \boldsymbol{u}^T\boldsymbol{x}\boldsymbol{x}^T\boldsymbol{u}}{N_k} - 2(\boldsymbol{m}_k^T \boldsymbol{u})\frac{\sum \boldsymbol{x}^T}{N_k}\boldsymbol{u} + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\Big[\boldsymbol{u}^T \frac{\sum \boldsymbol{x}\boldsymbol{x}^T}{N_k}\boldsymbol{u} - 2(\boldsymbol{m}_k^T \boldsymbol{u})^2 + (\boldsymbol{m}_k^T \boldsymbol{u})^2\Big] \\
&= a\,\boldsymbol{u}^T \Big(\frac{\sum \boldsymbol{x}\boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) \boldsymbol{u}
\end{aligned}
$$
where $D_k$ denotes the sample set of the $k$-th class, $\widetilde{\boldsymbol{m}}_k$ is the projected class center, and $\boldsymbol{m}_k = \frac{1}{N_k}\sum_{\boldsymbol{x} \in D_k} \boldsymbol{x}$ is the original class center. Since only the direction of $\boldsymbol{u}$ matters, we do not fix its length and simply write $\boldsymbol{u}^T \boldsymbol{u} = a$.

For the algorithm as a whole, the total within-class variance after projection is
$$
\begin{aligned}
\sum_{k=1}^K S_k &= a \sum_{k=1}^K \boldsymbol{u}^T \Big(\frac{\sum \boldsymbol{x}\boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) \boldsymbol{u} \\
&= a\,\boldsymbol{u}^T \sum_{k=1}^K \Big(\frac{\sum \boldsymbol{x}\boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) \boldsymbol{u}
\end{aligned}
$$
Writing $\displaystyle\sum_{k=1}^{K}\Big(\frac{\sum \boldsymbol{x}\boldsymbol{x}^T}{N_k} - \boldsymbol{m}_k \boldsymbol{m}_k^T\Big) = \mathbf{S_w}$ (the within-class scatter matrix), we have
$$
\sum_{k=1}^K S_k = a\,\boldsymbol{u}^T \mathbf{S_w}\,\boldsymbol{u}
$$
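To make the algebra concrete, here is a minimal NumPy sketch that computes $\mathbf{S_w}$ in exactly the form just derived; the function name `within_class_scatter` is my own, not from any particular library:

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum_k [ (1/N_k) * sum_{x in D_k} x x^T - m_k m_k^T ],
    i.e. the sum of per-class covariance matrices derived above."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]                       # samples of class k, shape (N_k, d)
        mk = Xk.mean(axis=0)                 # class center m_k
        Sw += (Xk.T @ Xk) / len(Xk) - np.outer(mk, mk)
    return Sw
```

Each summand is just the (biased) covariance matrix of class $k$, which matches the interpretation of $S_k$ as a within-class variance.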

How do we describe the between-class distance?

For the squared distance between any two projected class centers, we have
$$
\begin{aligned}
S_{i,j} &= (\widetilde{\boldsymbol{m}}_i - \widetilde{\boldsymbol{m}}_j)^T (\widetilde{\boldsymbol{m}}_i - \widetilde{\boldsymbol{m}}_j)\\
&= \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}\big]^T \big[(\boldsymbol{m}_i^T \boldsymbol{u})\boldsymbol{u} - (\boldsymbol{m}_j^T \boldsymbol{u})\boldsymbol{u}\big]\\
&= \big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big]\,\boldsymbol{u}^T\boldsymbol{u}\,\big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big]\\
&= a\,\big[(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}\big]^2\\
&= a\,\boldsymbol{u}^T(\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u}
\end{aligned}
$$
Summing over all pairs of classes, the total between-class separation after projection is
$$
\begin{aligned}
\sum_{i \neq j} S_{i,j} &= \sum_{i \neq j} a\,\boldsymbol{u}^T (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \boldsymbol{u} \\
&= a\,\boldsymbol{u}^T \Big[\sum_{i \neq j} (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T \Big]\boldsymbol{u}
\end{aligned}
$$
Writing $\displaystyle\sum_{i \neq j} (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T = \mathbf{S_b}$ (the between-class scatter matrix), we have
$$
\sum_{i \neq j} S_{i,j} = a\,\boldsymbol{u}^T \mathbf{S_b}\,\boldsymbol{u}
$$
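And a matching sketch for $\mathbf{S_b}$, summing over all ordered pairs $i \neq j$ exactly as in the definition above (again, `between_class_scatter` is my own name):

```python
import numpy as np

def between_class_scatter(X, y):
    """S_b = sum over ordered pairs i != j of (m_i - m_j)(m_i - m_j)^T."""
    means = [X[y == k].mean(axis=0) for k in np.unique(y)]
    Sb = np.zeros((X.shape[1], X.shape[1]))
    for i, mi in enumerate(means):
        for j, mj in enumerate(means):
            if i != j:                            # each unordered pair is counted twice,
                Sb += np.outer(mi - mj, mi - mj)  # exactly as the sum is written
    return Sb
```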

The LDA optimization objective

Therefore, following the LDA criterion, we pose the optimization problem (the factor $a$ appears in both the within-class and between-class terms, so it cancels in the ratio):
$$
\min_{\boldsymbol{u}} J(\boldsymbol{u}) = \frac{\boldsymbol{u}^T \mathbf{S_w}\,\boldsymbol{u}}{\boldsymbol{u}^T \mathbf{S_b}\,\boldsymbol{u}}
$$
Since $J$ is unchanged when $\boldsymbol{u}$ is rescaled, we are free to fix the denominator, $\boldsymbol{u}^T \mathbf{S_b}\,\boldsymbol{u} = 1$, which turns the minimization into the constrained problem
$$
\begin{cases}
\min\ \boldsymbol{u}^T \mathbf{S_w}\,\boldsymbol{u} \\
\text{s.t.}\ \boldsymbol{u}^T \mathbf{S_b}\,\boldsymbol{u} = 1
\end{cases}
$$
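The scale invariance that justifies this constraint is easy to verify numerically; here is a tiny sketch with made-up positive-definite stand-ins for the two scatter matrices (the helper `J` is mine):

```python
import numpy as np

def J(u, Sw, Sb):
    """The Rayleigh-quotient objective u^T Sw u / u^T Sb u."""
    return (u @ Sw @ u) / (u @ Sb @ u)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
Sw, Sb = A @ A.T + np.eye(2), B @ B.T + np.eye(2)    # random SPD stand-ins
u = rng.normal(size=2)
print(np.isclose(J(u, Sw, Sb), J(5.0 * u, Sw, Sb)))  # True: J(c*u) = J(u)
```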
Applying the method of Lagrange multipliers gives
$$
L(\boldsymbol{u},\lambda) = \boldsymbol{u}^T \mathbf{S_w}\,\boldsymbol{u} + \lambda\,(1 - \boldsymbol{u}^T \mathbf{S_b}\,\boldsymbol{u})
$$
Setting the gradient to zero,
$$
\frac{\partial L}{\partial \boldsymbol{u}} = 0 \;\Longrightarrow\; \mathbf{S_w}\,\boldsymbol{u} = \lambda\,\mathbf{S_b}\,\boldsymbol{u}
$$
That is, $\mathbf{S_b}^{-1} \mathbf{S_w}\,\boldsymbol{u} = \lambda\,\boldsymbol{u}$ (assuming $\mathbf{S_b}$ is invertible): the projection direction is an eigenvector of $\mathbf{S_b}^{-1} \mathbf{S_w}$. At any such stationary point $J(\boldsymbol{u}) = \boldsymbol{u}^T \mathbf{S_w}\,\boldsymbol{u} = \lambda$, so we take the eigenvector with the smallest eigenvalue.
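Putting everything together, here is a self-contained sketch of the whole procedure under the formulation above; all names are my own. Note that inverting $\mathbf{S_b}$ requires it to be nonsingular (its rank is at most $K-1$), which is why the toy data below uses three classes in two dimensions. Many textbooks instead maximize the reciprocal $\boldsymbol{u}^T\mathbf{S_b}\boldsymbol{u}/\boldsymbol{u}^T\mathbf{S_w}\boldsymbol{u}$ via the eigenvectors of $\mathbf{S_w}^{-1}\mathbf{S_b}$, which selects the same direction when both matrices are invertible.

```python
import numpy as np

def lda_direction(X, y):
    """Eigenvector of S_b^{-1} S_w with the smallest eigenvalue,
    i.e. the minimizer of the objective derived above."""
    classes = np.unique(y)
    d = X.shape[1]
    means = {k: X[y == k].mean(axis=0) for k in classes}
    # S_w = sum_k [ (1/N_k) sum x x^T - m_k m_k^T ]
    Sw = sum((X[y == k].T @ X[y == k]) / np.sum(y == k)
             - np.outer(means[k], means[k]) for k in classes)
    # S_b = sum over ordered pairs i != j of (m_i - m_j)(m_i - m_j)^T
    Sb = np.zeros((d, d))
    for i in classes:
        for j in classes:
            if i != j:
                Sb += np.outer(means[i] - means[j], means[i] - means[j])
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sb) @ Sw)
    u = eigvecs[:, np.argmin(eigvals.real)].real
    return u / np.linalg.norm(u)

# Toy check: three blobs with small x-spread and large y-spread, so the
# learned direction should point roughly along [1, 0] (up to sign).
rng = np.random.default_rng(1)
blobs = [rng.normal(c, [0.3, 1.5], size=(50, 2)) for c in ([0, 0], [4, 0], [0, 4])]
X = np.vstack(blobs)
y = np.repeat([0, 1, 2], 50)
print(lda_direction(X, y))
```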
