Machine Learning: Whiteboard Derivations P4_5 (Gaussian Discriminant Analysis)

Gaussian Discriminant Analysis

Definition

$$\hat{y} = \arg\max_{y \in \lbrace 0,1 \rbrace} p(y|x)$$

By Bayes' theorem: $p(y|x)=\frac{p(x|y)p(y)}{p(x)}$

A generative model does not need the exact values of the two posteriors; it only needs to be able to tell which one is larger.

Because $p(x)$ does not depend on $y$, it is a constant in this comparison,

so $p(y|x) \propto p(x|y)p(y)$

and therefore: $\hat{y} = \arg\max_{y \in \lbrace 0,1 \rbrace} p(x|y)\,p(y)$

Assume $y$ follows a Bernoulli distribution:

$y \sim \mathrm{Bernoulli}(\phi)$, that is, $p(y)=\phi^{y}(1-\phi)^{1-y}$
Gaussian discriminant analysis assumes that the class-conditional distributions are Gaussian, with different means but a shared covariance matrix:

$x \mid y=1 \sim N(\mu_1,\Sigma)$

$x \mid y=0 \sim N(\mu_2,\Sigma)$
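The decision rule defined above can be sketched in NumPy. This is a minimal sketch that assumes the parameters are already known. The function names `log_gaussian` and `gda_predict` and all numeric values are illustrative choices, not part of the original derivation. It compares $\log p(x|y) + \log p(y)$ for the two classes, which gives the same decision as comparing the posteriors.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of N(mu, sigma) evaluated at a single point x."""
    p = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Mahalanobis term (x - mu)^T sigma^{-1} (x - mu), without forming the inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def gda_predict(x, phi, mu1, mu2, sigma):
    """Return 1 if p(x|y=1)p(y=1) > p(x|y=0)p(y=0), else 0."""
    score1 = log_gaussian(x, mu1, sigma) + np.log(phi)
    score0 = log_gaussian(x, mu2, sigma) + np.log(1 - phi)
    return int(score1 > score0)

# Hypothetical parameters, chosen only for illustration.
phi = 0.5
mu1 = np.array([2.0, 2.0])
mu2 = np.array([-2.0, -2.0])
sigma = np.eye(2)

print(gda_predict(np.array([1.5, 2.5]), phi, mu1, mu2, sigma))
print(gda_predict(np.array([-1.0, -3.0]), phi, mu1, mu2, sigma))
```

Working in log space avoids underflow when the densities are very small, and it does not change which class wins the comparison.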

Formulas

Log-likelihood (here $N(\mu,\Sigma)$ is shorthand for the Gaussian density evaluated at $x_i$):
$$\begin{aligned} L(\theta) &= \sum_{i=1}^N \log\left(p(x_i|y_i)p(y_i)\right) \\ &=\sum_{i=1}^N\left(\log p(x_i|y_i) + \log p(y_i)\right) \\ &=\sum_{i=1}^N\left(\log N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i} + \log \phi^{y_i}(1-\phi)^{1-y_i}\right) \\ &=\sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} + \log \phi^{y_i}(1-\phi)^{1-y_i}\right) \end{aligned}$$

$\theta=(\mu_1,\mu_2,\Sigma,\phi)$

$\hat{\theta}=\arg\max_{\theta}L(\theta)$
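As a rough illustration, $L(\theta)$ can be evaluated numerically for a given parameter set. The sketch below assumes `X` is an $N \times p$ array and `y` a vector of 0/1 labels. The name `gda_log_likelihood` and the toy data are hypothetical.

```python
import numpy as np

def gda_log_likelihood(X, y, phi, mu1, mu2, sigma):
    """L(theta) = sum_i [ log p(x_i | y_i) + log p(y_i) ]."""
    N, p = X.shape
    _, logdet = np.linalg.slogdet(sigma)
    # Each sample is centred on the mean of its own class.
    means = np.where(y[:, None] == 1, mu1, mu2)
    diff = X - means
    # Row-wise quadratic forms (x_i - mu)^T sigma^{-1} (x_i - mu).
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    log_px_given_y = -0.5 * (p * np.log(2 * np.pi) + logdet + maha)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return float(np.sum(log_px_given_y + log_py))

# Toy data, for illustration only.
X = np.array([[2.1, 1.9], [1.8, 2.2], [-2.0, -1.7], [-2.3, -2.1]])
y = np.array([1, 1, 0, 0])
mu_a, mu_b = np.array([2.0, 2.0]), np.array([-2.0, -2.0])

print(gda_log_likelihood(X, y, 0.5, mu_a, mu_b, np.eye(2)))
```

Parameters that fit the data better give a larger value, which is what the $\arg\max$ over $\theta$ looks for.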

Solving for the parameters

Split the samples into two classes:
$y=1$: $N_1$ samples
$y=0$: $N_2$ samples
$N=N_1+N_2$

$\phi$:

Define the part of $L(\theta)$ that depends on $\phi$:

$$L(\theta)_3= \sum_{i=1}^N \log \phi^{y_i}(1-\phi)^{1-y_i}$$

$$\frac{\partial L(\theta)_3}{\partial \phi}=\sum_{i=1}^N\left( y_i \frac{1}{\phi} +(1-y_i)\frac{1}{1- \phi}(-1)\right)=0$$

$$\sum_{i=1}^N\left( y_i (1- \phi) - (1-y_i)\phi\right)=0$$

$$\sum_{i=1}^N( y_i -\phi)=0$$

$$\sum_{i=1}^N y_i -N\phi=0$$

Therefore:

$$\hat{\phi}= \frac{1}{N} \sum_{i=1}^N y_i=\frac{N_1}{N}$$
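The closed form for $\hat{\phi}$ is simply the fraction of samples with $y=1$. A small sketch with made-up labels follows. The grid search is only a sanity check that the closed form sits at the maximum of the Bernoulli log-likelihood.

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 1])  # hypothetical labels
phi_hat = y.mean()                       # N_1 / N

def bernoulli_log_lik(phi, y):
    """L(theta)_3: the part of the log-likelihood that depends on phi."""
    return np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi))

# Coarse grid search as a sanity check on the closed form.
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax([bernoulli_log_lik(g, y) for g in grid])]

print(phi_hat, best)
```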

$\mu_1$:
Define the part of $L(\theta)$ that depends on $\mu_1$:

$$L(\theta)_1= \sum_{i=1}^N \log N(\mu_1,\Sigma)^{y_i}=\sum_{i=1}^N y_i\log\left[\frac{1}{(2 \pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right)\right]$$

$$\hat{\mu}_1=\arg\max_{\mu_1}L(\theta)_1=\arg\max_{\mu_1}\sum_{i=1}^N y_i\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right)$$

$$\begin{aligned} \Delta &= \sum_{i=1}^N y_i\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right) \\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left(x_i^T \Sigma^{-1} -\mu_1^T \Sigma^{-1}\right)(x_i-\mu_1) \\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left(x_i^T \Sigma^{-1} x_i-x_i^T \Sigma^{-1}\mu_1-\mu_1^T \Sigma^{-1}x_i +\mu_1^T \Sigma^{-1}\mu_1 \right)\\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left(x_i^T \Sigma^{-1} x_i-2\mu_1^T \Sigma^{-1}x_i +\mu_1^T \Sigma^{-1}\mu_1 \right) \end{aligned}$$

The last step uses $x_i^T\Sigma^{-1}\mu_1=\mu_1^T\Sigma^{-1}x_i$, which holds because both sides are scalars and $\Sigma^{-1}$ is symmetric.

$$\frac{\partial \Delta}{\partial \mu_1} = -\frac{1}{2} \sum_{i=1}^N y_i\left(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1\right)=0$$

$$\sum_{i=1}^N y_i\left(\Sigma^{-1}\mu_1-\Sigma^{-1}x_i\right)=0$$

$$\sum_{i=1}^N y_i(\mu_1-x_i)=0$$

$$\sum_{i=1}^N y_i\mu_1=\sum_{i=1}^N y_i x_i$$

$$\hat{\mu}_1=\frac{\sum_{i=1}^N y_ix_i}{\sum_{i=1}^N y_i}=\frac{\sum_{i=1}^N y_ix_i}{N_1}$$
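In other words, $\hat{\mu}_1$ is the average of the samples labelled $y=1$. The sketch below uses made-up data and computes it both from the weighted-sum formula and by direct selection. The estimate for $\mu_2$ is not derived in the text; it is included here on the assumption that the same argument applies with weights $1-y_i$.

```python
import numpy as np

# Made-up data: four samples in two dimensions.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([1, 0, 1, 0])

N1 = y.sum()
N2 = len(y) - N1

# sum_i y_i x_i / N_1, exactly as in the derived formula.
mu1_hat = (y[:, None] * X).sum(axis=0) / N1

# Assumed symmetric counterpart for class 0 (weights 1 - y_i).
mu2_hat = ((1 - y)[:, None] * X).sum(axis=0) / N2

# Equivalent and more idiomatic: average the rows of each class.
mu1_direct = X[y == 1].mean(axis=0)

print(mu1_hat, mu2_hat, mu1_direct)
```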

$\Sigma$:

Auxiliary identities:
$\frac{\partial\, \mathrm{tr}(AB)}{\partial A}=B^T$
$\frac{\partial|A|}{\partial A}=|A|\,A^{-1}$ (for symmetric $A$; in general $|A|\,(A^{-1})^T$)
$\mathrm{tr}(AB)=\mathrm{tr}(BA)$
$\mathrm{tr}(ABC)=\mathrm{tr}(CAB)=\mathrm{tr}(BCA)$
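These identities can be spot-checked numerically. The sketch below uses random matrices and a central finite-difference gradient. The helper `numerical_grad` is a hypothetical name introduced only for this check, and the determinant identity is tested on a symmetric positive-definite matrix, where $|A|\,A^{-1}$ and $|A|\,(A^{-1})^T$ coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

# Cyclic property of the trace.
t = np.trace(A @ B @ C)
cyc_ok = bool(np.isclose(t, np.trace(C @ A @ B)) and np.isclose(t, np.trace(B @ C @ A)))

def numerical_grad(f, M, eps=1e-6):
    """Central finite-difference gradient of a scalar function f with respect to matrix M."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = eps
            G[i, j] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

# d tr(AB) / dA = B^T
grad_tr = numerical_grad(lambda M: np.trace(M @ B), A)
tr_ok = bool(np.allclose(grad_tr, B.T, atol=1e-6))

# d |S| / dS = |S| S^{-1}, checked on a symmetric positive-definite S.
S = A @ A.T + 3 * np.eye(3)
grad_det = numerical_grad(np.linalg.det, S)
det_ok = bool(np.allclose(grad_det, np.linalg.det(S) * np.linalg.inv(S), rtol=1e-4, atol=1e-5))

print(cyc_ok, tr_ok, det_ok)
```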

$$L(\theta)_2 = \sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} \right)$$

$$\begin{aligned} \log N(\mu,\Sigma) &=\log \left[\frac{1}{(2 \pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\right] \\ &=\log\frac{1}{(2 \pi)^{\frac{p}{2}}}+\log |\Sigma|^{-\frac{1}{2}}-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \\ &=C- \frac{1}{2}\log|\Sigma|-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \end{aligned}$$

$$\begin{aligned} \sum_{i=1}^N \log N(\mu,\Sigma) &= \sum_{i=1}^N \left( C- \frac{1}{2}\log|\Sigma|-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \right) \\ &=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\sum_{i=1}^N (x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \end{aligned}$$
(the constant $NC$ is again written as $C$)
$(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)$ is a scalar ($1 \times 1$), so it equals its own trace:

$$(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)=\mathrm{tr}\left((x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right)$$

Sample covariance: $S=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T$
$$\begin{aligned} \sum_{i=1}^N \mathrm{tr} \left((x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \right) &= \sum_{i=1}^N \mathrm{tr} \left((x_i-\mu)(x_i-\mu)^T\Sigma^{-1} \right) \\ &=\mathrm{tr}\left( \sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T\Sigma^{-1} \right) \\ &=N\,\mathrm{tr}(S\Sigma^{-1}) \end{aligned}$$
Therefore:
$$\sum_{i=1}^N \log N(\mu,\Sigma) =C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N\,\mathrm{tr}(S\Sigma^{-1})$$
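The trace rewrite can be checked numerically. The sketch below uses random data and an arbitrary symmetric positive-definite matrix standing in for $\Sigma$ (both are assumptions for illustration). It compares the sum of quadratic forms with $N\,\mathrm{tr}(S\Sigma^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 3
X = rng.normal(size=(N, p))
mu = X.mean(axis=0)

# Any symmetric positive-definite matrix works as a stand-in for Sigma.
Sigma = np.cov(X, rowvar=False) + np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

diff = X - mu
lhs = sum(d @ Sigma_inv @ d for d in diff)   # sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
S = diff.T @ diff / N                         # sample covariance with the 1/N convention
rhs = N * np.trace(S @ Sigma_inv)

print(lhs, rhs)
```

The identity depends only on the cyclic property of the trace, so it holds for any choice of $\mu$, not just the sample mean.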

Applying this result to each class separately, with $S_1$ and $S_2$ the sample covariances of the $y=1$ and $y=0$ samples (computed around $\mu_1$ and $\mu_2$ respectively):

$$\begin{aligned} L(\theta)_2 & = \sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} \right) \\ &=-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\mathrm{tr}(S_1\Sigma^{-1})-\frac{1}{2}N_2\log|\Sigma|-\frac{1}{2}N_2\mathrm{tr}(S_2\Sigma^{-1})+C \\ &=-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N_1\mathrm{tr}(S_1\Sigma^{-1})-\frac{1}{2}N_2\mathrm{tr}(S_2\Sigma^{-1}) +C\\ &=-\frac{1}{2} \left( N\log|\Sigma|+N_1\mathrm{tr}(S_1\Sigma^{-1})+N_2\mathrm{tr}(S_2\Sigma^{-1}) \right) +C \end{aligned}$$

Using $\frac{\partial\,\mathrm{tr}(S\Sigma^{-1})}{\partial \Sigma}=-\Sigma^{-1}S\Sigma^{-1}$ for symmetric $S$ and $\Sigma$:

$$\begin{aligned} \frac{\partial L(\theta)_2 }{\partial \Sigma} &= -\frac{1}{2} \left( N\frac{1}{|\Sigma|} |\Sigma| \Sigma^{-1} -N_1\Sigma^{-1}S_1\Sigma^{-1} -N_2\Sigma^{-1}S_2\Sigma^{-1} \right) \\ & = -\frac{1}{2} \left( N \Sigma^{-1} -\Sigma^{-1}(N_1S_1+N_2S_2)\Sigma^{-1} \right) \\ &=0 \end{aligned}$$

$$N \Sigma^{-1} = \Sigma^{-1}(N_1S_1 + N_2S_2)\Sigma^{-1}$$

Multiplying by $\Sigma$ on the left and on the right:

$$N \Sigma = N_1S_1 + N_2S_2$$

$$\hat{\Sigma}=\frac{1}{N}(N_1S_1 + N_2S_2)$$
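Putting the closed-form estimates together gives a complete fitting routine. The sketch below is one possible arrangement, not a reference implementation. The name `fit_gda` and the synthetic data are hypothetical, and $\hat{\mu}_2$ is taken to be the class-0 sample mean by the same argument as for $\hat{\mu}_1$.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form estimates of (phi, mu1, mu2, Sigma) under the shared-covariance model."""
    X1, X2 = X[y == 1], X[y == 0]
    N1, N2 = len(X1), len(X2)
    N = N1 + N2
    phi = N1 / N
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Per-class sample covariances with the 1/N_k convention used in the derivation.
    S1 = (X1 - mu1).T @ (X1 - mu1) / N1
    S2 = (X2 - mu2).T @ (X2 - mu2) / N2
    # Pooled covariance: (N1 * S1 + N2 * S2) / N.
    sigma = (N1 * S1 + N2 * S2) / N
    return phi, mu1, mu2, sigma

# Synthetic data: two unit-covariance Gaussian blobs, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=2.0, size=(200, 2)),
               rng.normal(loc=-2.0, size=(300, 2))])
y = np.concatenate([np.ones(200, dtype=int), np.zeros(300, dtype=int)])

phi, mu1, mu2, sigma = fit_gda(X, y)
print(phi, mu1, mu2)
print(sigma)
```

The pooled estimate weights each class covariance by its sample count, so the larger class has more influence on $\hat{\Sigma}$.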
