Gaussian Discriminant Analysis (GDA)


Building the Gaussian model

【Assumptions】: $y\sim \mathrm{Bernoulli}(\Phi)$, $x\mid y=0\sim N(\mu_0,\Sigma)$, $x\mid y=1\sim N(\mu_1,\Sigma)$. The two classes have different means but share a single covariance matrix $\Sigma$.
By Bayes' rule: $P(y\mid x)=\frac{p(x\mid y)p(y)}{p(x)}$
Because $p(x)$ does not depend on $y$, the predicted label is $\widehat{y}=\arg\max\limits_{y}\,p(y\mid x)=\arg\max\limits_{y}\,\frac{p(x\mid y)p(y)}{p(x)}=\arg\max\limits_{y}\,p(x\mid y)p(y)$
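
To make the generative assumptions concrete, here is a minimal sketch that draws synthetic data from the model. The function name `sample_gda`, the seed, and the parameter values are illustrative assumptions, not part of the original post.

```python
import numpy as np

def sample_gda(m, phi, mu0, mu1, sigma, seed=0):
    """Draw m pairs (x, y) from the GDA generative model (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # y ~ Bernoulli(phi)
    y = (rng.random(m) < phi).astype(int)
    # x | y ~ N(mu_y, sigma): pick the class mean per sample, then add shared-covariance noise
    means = np.where(y[:, None] == 1, mu1, mu0)
    noise = rng.multivariate_normal(np.zeros(len(mu0)), sigma, size=m)
    return means + noise, y

# Assumed example parameters: p = 2 features, 30% of samples in class 1
X, y = sample_gda(1000, 0.3, np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.eye(2))
print(X.shape, y.mean())
```

The fraction of ones in `y` should land near the chosen `phi`, and each class forms a cloud around its own mean with the same shape.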

【Parameter estimation】:

Construct the log-likelihood function:
$$\begin{aligned}
L(\Phi,\mu_0,\mu_1,\Sigma)&=\log\prod\limits_{i=1}^{m}P(x^{(i)},y^{(i)})=\log\prod\limits_{i=1}^{m}P(x^{(i)}\mid y^{(i)})P(y^{(i)})\\
&=\sum\limits_{i=1}^{m}\left(\log P(x^{(i)}\mid y^{(i)})+\log P(y^{(i)})\right)\\
&=\sum\limits_{i=1}^{m}\left[\log\left(P(x^{(i)}\mid y^{(i)}=0)^{1-y^{(i)}}\,P(x^{(i)}\mid y^{(i)}=1)^{y^{(i)}}\right)+\log P(y^{(i)})\right]\\
&=\sum\limits_{i=1}^{m}\left[(1-y^{(i)})\log P(x^{(i)}\mid y^{(i)}=0)+y^{(i)}\log P(x^{(i)}\mid y^{(i)}=1)+\log P(y^{(i)})\right]
\end{aligned}$$

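In the last line, the first term depends only on $\mu_0,\Sigma$, the second only on $\mu_1,\Sigma$, and the third only on $\Phi$, so each parameter can be estimated from the terms that contain it.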
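
The log-likelihood can also be evaluated numerically, which is handy for checking the closed-form estimates derived next. This is a sketch under the assumption that `X` is an `(m, p)` array and `y` a 0/1 integer vector; the function name `gda_log_likelihood` is hypothetical.

```python
import numpy as np

def gda_log_likelihood(X, y, phi, mu0, mu1, sigma):
    """Sum over samples of log P(x | y) + log P(y) for the shared-covariance model."""
    m, p = X.shape
    sigma_inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)          # log|Sigma|, numerically stable
    mu = np.where(y[:, None] == 1, mu1, mu0)      # mean of each sample's own class
    diff = X - mu
    # Per-sample quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
    log_px_given_y = -0.5 * (p * np.log(2 * np.pi) + logdet + quad)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return float(np.sum(log_px_given_y + log_py))

# Tiny assumed example: three 2-D points
X_demo = np.array([[0.0, 0.0], [2.0, 1.0], [0.5, -0.5]])
y_demo = np.array([0, 1, 0])
print(gda_log_likelihood(X_demo, y_demo, 0.3,
                         np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.eye(2)))
```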

【Solving for $\Phi$】:

$$\begin{aligned}
\frac{\partial L}{\partial\Phi}&=\frac{\partial}{\partial \Phi}\sum\limits_{i=1}^{m}\log P(y^{(i)})=\frac{\partial}{\partial \Phi}\sum\limits_{i=1}^{m}\log\left(\Phi^{y^{(i)}}(1-\Phi)^{1-y^{(i)}}\right)=\frac{\partial}{\partial \Phi}\sum\limits_{i=1}^{m}\left(y^{(i)}\log\Phi+(1-y^{(i)})\log(1-\Phi)\right)\\
&=\sum\limits_{i=1}^{m}\left(y^{(i)}\frac{1}{\Phi}-(1-y^{(i)})\frac{1}{1-\Phi}\right)\\
&=\sum\limits_{i=1}^{m}\left(I(y^{(i)}=1)\frac{1}{\Phi}-I(y^{(i)}=0)\frac{1}{1-\Phi}\right)=0
\end{aligned}$$

Solving gives: $\widehat{\Phi}=\frac{\sum\limits_{i=1}^{m}I(y^{(i)}=1)}{\sum\limits_{i=1}^{m}I(y^{(i)}=0)+\sum\limits_{i=1}^{m}I(y^{(i)}=1)}=\frac{\sum\limits_{i=1}^{m}I(y^{(i)}=1)}{m}$
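
In other words, $\widehat{\Phi}$ is simply the fraction of training samples labeled 1. A minimal sketch, with a made-up label vector as the only assumption:

```python
import numpy as np

# Assumed toy labels: 2 of the 5 samples belong to class 1
y = np.array([1, 0, 0, 1, 0])

# Mean of the indicator I(y == 1) is the MLE of Phi
phi_hat = np.mean(y == 1)
print(phi_hat)  # 0.4
```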

【Solving for $\mu_0,\mu_1$】:

$$\begin{aligned}
\frac{\partial L}{\partial \mu_0}&=\frac{\partial}{\partial \mu_0}\sum\limits_{i=1}^{m}(1-y^{(i)})\log P(x^{(i)}\mid y^{(i)}=0)=\frac{\partial}{\partial \mu_0}\sum\limits_{i=1}^{m}(1-y^{(i)})\left[\log\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}-\frac{1}{2}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)\right]\\
&=\sum\limits_{i=1}^{m}(1-y^{(i)})\Sigma^{-1}(x^{(i)}-\mu_0)=\Sigma^{-1}\sum\limits_{i=1}^{m}I(y^{(i)}=0)(x^{(i)}-\mu_0)=0
\end{aligned}$$

Here $p$ is the dimension of $x$. Since $\Sigma^{-1}$ is invertible, the sum $\sum\limits_{i=1}^{m}I(y^{(i)}=0)(x^{(i)}-\mu_0)$ must itself be zero.

Solving gives: $\widehat{\mu_0}=\frac{\sum\limits_{i=1}^{m}I(y^{(i)}=0)x^{(i)}}{\sum\limits_{i=1}^{m}I(y^{(i)}=0)}$

Likewise: $\widehat{\mu_1}=\frac{\sum\limits_{i=1}^{m}I(y^{(i)}=1)x^{(i)}}{\sum\limits_{i=1}^{m}I(y^{(i)}=1)}$
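
Each estimate is the average of the samples in its class. A small sketch with made-up data (the arrays below are assumptions for illustration):

```python
import numpy as np

# Assumed toy data: four 2-D samples, two per class
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

# Boolean masks play the role of the indicators I(y == 0) and I(y == 1)
mu0_hat = X[y == 0].mean(axis=0)  # average of rows 0 and 2 -> [3., 4.]
mu1_hat = X[y == 1].mean(axis=0)  # average of rows 1 and 3 -> [5., 6.]
print(mu0_hat, mu1_hat)
```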

【Solving for $\Sigma$】:

$a=\log\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}=-\frac{p}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma|$

$\Sigma$ appears only in the first two terms of the log-likelihood, which can be written as:

$$\begin{aligned}
&\sum\limits_{i=1}^{m}(1-y^{(i)})a+\sum\limits_{i=1}^{m}y^{(i)}a-\frac{1}{2}\sum\limits_{i=1}^{m}(1-y^{(i)})(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)-\frac{1}{2}\sum\limits_{i=1}^{m}y^{(i)}(x^{(i)}-\mu_1)^T\Sigma^{-1}(x^{(i)}-\mu_1)\\
&=\sum\limits_{i=1}^{m}a-\frac{1}{2}\sum\limits_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})
\end{aligned}$$

Using $\frac{\partial \log|\Sigma|}{\partial \Sigma}=\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}=\Sigma^{-1}$ and $\frac{\partial\, v^T\Sigma^{-1}v}{\partial \Sigma}=-\Sigma^{-1}vv^T\Sigma^{-1}$ for symmetric $\Sigma$:

$$\frac{\partial L}{\partial \Sigma}=m\left(-\frac{1}{2}\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}\right)+\frac{1}{2}\sum\limits_{i=1}^{m}\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}=0$$

Multiplying by $\Sigma$ on the left and on the right gives: $\widehat{\Sigma}=\frac{1}{m}\sum\limits_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$
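
The covariance estimate pools both classes: each sample is centered on its own class mean, and the outer products are averaged over all $m$ samples. A minimal sketch, where the function name `fit_shared_covariance` and the toy data are assumptions:

```python
import numpy as np

def fit_shared_covariance(X, y, mu0, mu1):
    """MLE of the shared covariance: mean outer product of class-centered samples."""
    mu = np.where(y[:, None] == 1, mu1, mu0)  # each row's own class mean
    diff = X - mu
    # diff.T @ diff sums the outer products (x - mu)(x - mu)^T over all samples
    return diff.T @ diff / X.shape[0]

# Assumed toy data: two square-shaped clusters centered at (1, 1) and (5, 5)
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
              [4.0, 4.0], [6.0, 4.0], [4.0, 6.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
mu0_hat = X[y == 0].mean(axis=0)
mu1_hat = X[y == 1].mean(axis=0)
sigma_hat = fit_shared_covariance(X, y, mu0_hat, mu1_hat)
print(sigma_hat)
```

Note the division by $m$ rather than $m-2$: this is the maximum-likelihood estimate from the derivation, not the unbiased pooled estimate.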

【Classification】:

With these parameters estimated, substitute a sample $x$ to compute the posteriors $p(y=1\mid x)$ and $p(y=0\mid x)$, compare them, and assign $x$ to the class with the larger posterior. The separating hyperplane of GDA is therefore the set of points where the two posteriors are equal:

$$p(y=1\mid x)=p(y=0\mid x)$$
$$p(x\mid y=0)p(y=0)=p(x\mid y=1)p(y=1)$$
$$(1-\Phi)\exp\left\{-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)\right\}=\Phi\exp\left\{-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)\right\}$$

The normalizing constant $\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}$ cancels because both classes share the same $\Sigma$.

Taking the logarithm of both sides and simplifying (the $x^T\Sigma^{-1}x$ terms cancel): $$x^T\Sigma^{-1}(\mu_1-\mu_0)+(\mu_1-\mu_0)^T\Sigma^{-1}x=\mu_1^{T}\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$$ Both terms on the left are scalars and, because $\Sigma^{-1}$ is symmetric, they are equal, so this simplifies further to: $$2x^T\Sigma^{-1}(\mu_1-\mu_0)=\mu_1^{T}\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$$
Let $A=2\Sigma^{-1}(\mu_1-\mu_0)=\begin{pmatrix}a_1&a_2&\dots&a_p\end{pmatrix}^T$ and $b=\mu_1^{T}\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$.

The hyperplane then reduces to $x^TA=b\Rightarrow a_1x_1+a_2x_2+\dots+a_px_p=b$
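
The linear boundary can be sketched in code as follows. It assumes the Gaussian exponent carries its usual factor of $-\frac{1}{2}$, so the prior enters $b$ as $2\log\frac{1-\Phi}{\Phi}$, and a sample is assigned to class 1 when $x^TA>b$. The function names `gda_boundary` and `gda_predict` and the parameter values are illustrative assumptions.

```python
import numpy as np

def gda_boundary(phi, mu0, mu1, sigma):
    """Return (A, b) of the separating hyperplane x^T A = b."""
    sigma_inv = np.linalg.inv(sigma)
    A = 2 * sigma_inv @ (mu1 - mu0)
    b = (mu1 @ sigma_inv @ mu1 - mu0 @ sigma_inv @ mu0
         + 2 * np.log((1 - phi) / phi))
    return A, b

def gda_predict(X, A, b):
    """Label 1 where p(y=1|x) > p(y=0|x), i.e. where x^T A > b."""
    return (X @ A > b).astype(int)

# Assumed parameters: equal priors, identity covariance, means 2 apart on the first axis
A, b = gda_boundary(0.5, np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.eye(2))
X_new = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 3.0], [1.5, -3.0]])
print(A, b)                      # boundary is 4*x1 = 4, i.e. x1 = 1
print(gda_predict(X_new, A, b))  # [0 1 0 1]
```

With equal priors and identity covariance the boundary is the perpendicular bisector of the segment joining the two means; raising `phi` moves it toward `mu0`, enlarging the region assigned to class 1.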
