Machine Learning: Linear Classification with Gaussian Discriminant Analysis

Gaussian Discriminant Analysis: Model Definition

Gaussian discriminant analysis (GDA)

Suppose we have a sample matrix $X_{N\times p}$ of the form:

$$X=\left( x_{1}\ x_{2}\ \dots\ x_{N}\right)^{T} =\left( \begin{matrix} x^T_1 \\ x^T_2 \\ \vdots \\ x^T_N \end{matrix} \right)_{N \times p} = \left( \begin{matrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{Np} \end{matrix} \right)_{N\times p}$$

and a label vector $Y_{N\times 1}$ of the form:

$$Y =\left( \begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{matrix} \right)_{N \times 1}$$

Together, $X$ and $Y$ form the training set $\left\{ \left( x_i,y_i\right) \right\}_{i=1}^{N}$.

Like the other linear classifiers, Gaussian discriminant analysis is used for classification. For a two-class problem the label follows a Bernoulli distribution, so assume $Y$ is Bernoulli-distributed:

$$P(y_i=1)=\phi,\qquad P(y_i=0)=1-\phi$$

which can be written compactly as:

$$P(y_i)=\left\{\begin{matrix} \phi^{y_i}, & y_i=1 \\ (1-\phi)^{1-y_i}, & y_i=0 \end{matrix}\right. \;\Rightarrow\; P(y_i)=\phi^{y_i}(1-\phi)^{1-y_i}$$
Within each class, the samples are assumed to follow a Gaussian distribution with a shared covariance matrix $\Sigma$:

$$\left.\begin{matrix} x_i\mid y_i=1\sim N(\mu_1,\Sigma)\\ x_i\mid y_i=0 \sim N(\mu_2,\Sigma) \end{matrix}\right\} \Rightarrow P(x_i\mid y_i)=N(\mu_1,\Sigma)^{y_i}\cdot N(\mu_2,\Sigma)^{1-y_i}$$
Further assume $N_1$ samples have $y_i=1$ and $N_2$ samples have $y_i=0$, with $N_1+N_2=N$.

Given the training samples, we can estimate the prior probability and the mean and covariance of each Gaussian; a new sample is then assigned to a class by comparing the probabilities that it belongs to each class, computed via Bayes' rule:

$$P(y|x)=\frac{P(x|y)P(y)}{P(x)} \propto P(x|y)P(y)$$

For a new sample $x$, we compute $P(y=1\mid x)$ and $P(y=0\mid x)$ and assign $x$ to the class with the larger probability:

$$\hat{y} = \underset{y\in \left\{0,1\right\}}{\arg\max}\,P(y|x) = \underset{y\in \left\{0,1\right\}}{\arg\max}\,P(x|y)P(y)=\underset{y\in \left\{0,1\right\}}{\arg\max}\,P(x,y)$$
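The decision rule above can be sketched in Python. This is a minimal illustration that assumes the parameters $\phi,\mu_1,\mu_2,\Sigma$ have already been estimated; the toy parameter values below are made up, and `scipy` supplies the Gaussian density:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_predict(x, phi, mu1, mu2, Sigma):
    # Compare the joint probabilities P(x|y=1)P(y=1) and P(x|y=0)P(y=0);
    # the denominator P(x) is the same for both classes and can be dropped.
    p1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma) * phi
    p0 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma) * (1 - phi)
    return 1 if p1 > p0 else 0

# Toy parameters: two well-separated classes sharing an identity covariance.
phi, mu1, mu2, Sigma = 0.5, np.array([2.0, 2.0]), np.array([-2.0, -2.0]), np.eye(2)
print(gda_predict([2.0, 2.0], phi, mu1, mu2, Sigma))    # close to mu1 -> class 1
print(gda_predict([-2.0, -2.0], phi, mu1, mu2, Sigma))  # close to mu2 -> class 0
```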

The core task of GDA is therefore to estimate the unknowns $\mu_1,\mu_2,\Sigma,\phi$. We do so by maximizing the log-likelihood $L(\theta)$, where $\theta=(\mu_1,\mu_2,\Sigma,\phi)$:

$$L(\theta) = \log \prod_{i=1}^N P(x_i,y_i) =\log \prod_{i=1}^N P(x_i|y_i)P(y_i) =\sum_{i=1}^N \log P(x_i|y_i)+\sum_{i=1}^N \log P(y_i)$$

Substituting the probabilities gives:

$$L(\theta) = \sum_{i=1}^N \left[ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right]$$

Estimating $\phi$ in the GDA Model

To find $\phi$, note that $\phi$ appears only in the third term of $L(\theta)$, so let:

$$\Delta =\sum_{i=1}^N \log \phi^{y_i}(1-\phi)^{1-y_i} =\sum_{i=1}^N y_i\log \phi+\sum_{i=1}^N (1-y_i)\log (1-\phi)$$

Differentiating $\Delta$ with respect to $\phi$ and setting the derivative to zero:

$$\frac{\partial \Delta}{\partial \phi} =\sum_{i=1}^N \frac{y_i}{\phi}-\sum_{i=1}^N \frac{1-y_i}{1-\phi}=0 \;\Rightarrow\; \sum_{i=1}^N \left[ y_i(1-\phi)-\phi(1-y_i)\right] = 0 \;\Rightarrow\; \sum_{i=1}^N \left(y_i-\phi\right) = 0 \;\Rightarrow\; \sum_{i=1}^N y_i = N\phi$$

Therefore:

$$\hat{\phi} = \frac{1}{N}\sum_{i=1}^N y_i=\frac{N_1}{N}$$
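Numerically, $\hat{\phi}$ is just the fraction of positive labels, i.e. the mean of $y$. A quick check on made-up labels:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])   # toy labels: N1 = 3 positives out of N = 5
phi_hat = y.sum() / len(y)      # N1 / N
print(phi_hat)                  # 0.6, identical to y.mean()
```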

Estimating $\mu_1,\mu_2$ in the GDA Model

We derived above:

$$L(\theta) = \sum_{i=1}^N \left[ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right]$$

Observe that $\mu_1,\mu_2$ appear only in the first two terms, and $\mu_1$ only in the first, so let:

$$\begin{aligned}\Delta &=\sum_{i=1}^N \log N(\mu_1,\Sigma)^{y_i}\\ &=\sum_{i=1}^N y_i\log \left\{ \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left[ -\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1) \right]\right\}\\ &=\sum_{i=1}^N y_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2} \sum_{i=1}^N y_i (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\end{aligned}$$
Differentiating $\Delta$ with respect to $\mu_1$ and setting the derivative to zero:

$$\frac{\partial \Delta}{\partial \mu_1}=\frac{\partial}{\partial \mu_1}\left[-\frac{1}{2} \sum_{i=1}^N y_i (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right] = \sum_{i=1}^N y_i\, \Sigma^{-1}(x_i-\mu_1)=0$$

Since $\Sigma^{-1}$ is invertible, this implies:

$$\sum_{i=1}^N y_i (x_i-\mu_1)=0 \;\Rightarrow\; \sum_{i=1}^N y_i x_i=\sum_{i=1}^N y_i \mu_1=N_1\mu_1$$
Therefore:

$$\hat{\mu}_1=\frac{1}{N_1}\sum_{i=1}^N y_i x_i$$

By the same argument, for $\hat{\mu}_2$:

$$\hat{\mu}_2=\frac{1}{N_2}\sum_{i=1}^N (1-y_i) x_i$$
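The two estimators are simply the per-class sample means: the weighted sums $\sum y_i x_i$ and $\sum (1-y_i)x_i$ pick out each class's points. A small check on made-up data:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([1, 0, 1, 0])

N1, N2 = y.sum(), (1 - y).sum()
mu1_hat = (y @ X) / N1          # (1/N1) * sum_i y_i * x_i
mu2_hat = ((1 - y) @ X) / N2    # (1/N2) * sum_i (1 - y_i) * x_i

# Equivalent to the per-class means:
print(mu1_hat, X[y == 1].mean(axis=0))  # both [3. 4.]
print(mu2_hat, X[y == 0].mean(axis=0))  # both [5. 6.]
```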

Estimating $\Sigma$ in the GDA Model

We derived above:

$$L(\theta) = \sum_{i=1}^N \left[ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right]$$

This time let:

$$\Delta = \sum_{i=1}^N \left[ y_i\log N(\mu_1,\Sigma) +(1-y_i)\log N(\mu_2,\Sigma) \right]$$
To simplify the differentiation, rewrite $\Delta$ as a sum over the two classes:

$$\Delta = \Delta_1+\Delta_2= \sum_{x_i\in c_1}\log N(\mu_1,\Sigma)+ \sum_{x_i\in c_2}\log N(\mu_2,\Sigma)$$

Simplifying $\Delta_1$:

$$\begin{aligned}\Delta_1&= \sum_{i=1}^{N_1} \log \left\{ \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left[ -\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1) \right]\right\}\\ &=\sum_{i=1}^{N_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{i=1}^{N_1} (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\end{aligned}$$
To proceed, recall the following matrix identities:

$$\frac{\partial\, tr(AB)}{\partial A} = B^T,\qquad \frac{\partial |A|}{\partial A} = |A|A^{-1},\qquad tr(AB) = tr(BA),\qquad tr(ABC) = tr(CAB)=tr(BCA)$$
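The trace identities are easy to verify numerically on random matrices, which is a quick sanity check before relying on them in the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# tr(AB) = tr(BA)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))          # True
# tr(ABC) is invariant under cyclic permutation of the factors
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))  # True
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))  # True
```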
Since $(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)$ is a scalar, wrapping it in a trace, $tr[(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)]$, leaves its value unchanged. Applying the cyclic property of the trace, $\Delta_1$ becomes:

$$\begin{aligned}\Delta_1&=\sum_{i=1}^{N_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{i=1}^{N_1} tr\left[(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right]\\ &=\sum_{i=1}^{N_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{i=1}^{N_1} tr\left[(x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\right]\end{aligned}$$
With the within-class sample covariance $S_1=\frac{1}{N_1}\sum_{i=1}^{N_1} (x_i-\mu_1)(x_i-\mu_1)^T$, the expression becomes:

$$\begin{aligned}\Delta_1&=\sum_{i=1}^{N_1}\log (2\pi)^{-\frac{p}{2}}-\frac{1}{2}\sum_{i=1}^{N_1}\log|\Sigma|-\frac{1}{2}N_1\, tr\left[\frac{1}{N_1}\sum_{i=1}^{N_1} (x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\right]\\ &=C-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\, tr(S_1\Sigma^{-1})\end{aligned}$$
Therefore, dropping the constant:

$$\begin{aligned}\Delta &= \Delta_1+\Delta_2\\ &=-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\, tr(S_1\Sigma^{-1})-\frac{1}{2}N_2\log|\Sigma|-\frac{1}{2}N_2\, tr(S_2\Sigma^{-1})\\ &=-\frac{1}{2}\left[ N\log|\Sigma|+ N_1\, tr(S_1\Sigma^{-1})+ N_2\, tr(S_2\Sigma^{-1}) \right]\end{aligned}$$
Now differentiate $\Delta$ with respect to $\Sigma$ using the determinant and trace derivative formulas given above (in particular $\frac{\partial \log|\Sigma|}{\partial \Sigma}=\Sigma^{-1}$ and $\frac{\partial\, tr(S\Sigma^{-1})}{\partial \Sigma}=-\Sigma^{-1}S\Sigma^{-1}$) and set the result to zero:

$$\frac{\partial \Delta}{\partial \Sigma} =-\frac{1}{2}\left( N\Sigma^{-1}-N_1\Sigma^{-1}S_1\Sigma^{-1}-N_2\Sigma^{-1}S_2\Sigma^{-1}\right)=0$$

Multiplying by $\Sigma$ on both sides gives:

$$N\Sigma -(N_1S_1+N_2S_2)=0$$
Therefore:

$$\hat{\Sigma}=\frac{1}{N}(N_1S_1+N_2S_2)$$

This completes the derivation.
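Putting the four estimators together, a minimal fit might look like this. The function name and the toy data are hypothetical; $\hat{\Sigma}$ is computed as the weighted average $\frac{1}{N}(N_1S_1+N_2S_2)$ of the within-class scatter matrices:

```python
import numpy as np

def gda_fit(X, y):
    """MLE for (phi, mu1, mu2, Sigma) in two-class GDA with shared covariance."""
    N = len(y)
    X1, X2 = X[y == 1], X[y == 0]
    N1 = len(X1)
    phi = N1 / N                                 # N1 / N
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)  # per-class means
    # N1*S1 + N2*S2 = sum of within-class outer products of centered samples
    D1, D2 = X1 - mu1, X2 - mu2
    Sigma = (D1.T @ D1 + D2.T @ D2) / N
    return phi, mu1, mu2, Sigma

# Toy data: 50 points around (3, 3), 50 points around (-3, -3).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(3.0, 1.0, (50, 2)), rng.normal(-3.0, 1.0, (50, 2))])
y = np.array([1] * 50 + [0] * 50)
phi, mu1, mu2, Sigma = gda_fit(X, y)
print(phi)  # 0.5
```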

Postscript

With this, all components of $\theta=(\mu_1,\mu_2,\Sigma,\phi)$ have been estimated!
More to be added later.

