【ML】_06_EM (Latent Variables)


 


 

【一】 Latent Variable Model

 

An example: a person's observed values might be [charity work, exercise, strong execution], while the corresponding unobserved values are [kindness, perseverance, erudition]. This is a causal relationship: [kindness, perseverance, erudition] ⇒ [charity work, exercise, strong execution]. The "cause" is the latent variable; the "effect" is the observation.

 

  • Complete Case ($X$, $Z$ known; $\theta$ unknown) -- MLE

$$
\ell(\theta; D) = \log P(X, Z \mid \theta) = \log P(Z \mid \theta_z) + \log P(X \mid Z, \theta_x)
$$

 

  • Incomplete Case ($X$ known; $Z$, $\theta$ unknown) -- EM

$$
\ell(\theta; D) = \log \sum_Z P(X, Z \mid \theta) = \log \sum_Z P(Z \mid \theta_z) P(X \mid Z, \theta_x)
$$
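As a concrete sketch, the two cases can be compared numerically for a tiny two-state model. The mixture weight `pi` and emission probabilities `p` below are illustrative assumptions, not values from the text:

```python
import math

# Toy latent-variable model (illustrative assumptions):
#   Z in {0, 1} with P(Z = 1) = pi,  X | Z ~ Bernoulli(p[Z])
pi = 0.4          # P(Z = 1)
p = [0.2, 0.9]    # P(X = 1 | Z = 0), P(X = 1 | Z = 1)

def complete_loglik(x, z):
    """Complete case: log P(X, Z | theta) = log P(Z | theta_z) + log P(X | Z, theta_x)."""
    log_pz = math.log(pi if z == 1 else 1 - pi)
    log_px = math.log(p[z] if x == 1 else 1 - p[z])
    return log_pz + log_px

def incomplete_loglik(x):
    """Incomplete case: log P(X | theta) = log sum_Z P(X, Z | theta)."""
    return math.log(sum(math.exp(complete_loglik(x, z)) for z in (0, 1)))

print(incomplete_loglik(1))  # log(0.6 * 0.2 + 0.4 * 0.9) = log(0.48)
```

The log of a sum in the incomplete case is what prevents the likelihood from splitting into independent terms, which is why MLE no longer has the simple additive form of the complete case.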

 


 

【二】 Expectation Maximization (EM Algorithm)

 

【Unsupervised】 An iterative algorithm for maximum likelihood estimation of the parameters of a probabilistic model with latent variables (E: take an expectation + M: maximize it)

 

  • Observed data & unobserved data

$$
Y = (Y_1, Y_2, \ldots, Y_n)^T \qquad Z = (Z_1, Z_2, \ldots, Z_n)^T
$$

 

  • Likelihood function of the observed data (MLE)

$$
L(\theta) = P(Y \mid \theta) = \sum_Z P(Z \mid \theta) P(Y \mid Z, \theta)
$$

 

  • Log maximum likelihood estimate of the model parameters $\theta$

$$
\hat{\theta} = \arg\max_\theta \; \log P(Y \mid \theta)
$$
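To see why directly maximizing $\log P(Y \mid \theta)$ is awkward, here is a brute-force grid search for a toy mixture in which only the mixture weight is unknown. The model, data, and probabilities are made up for illustration:

```python
import math

# Directly maximizing log P(Y | theta) by grid search (illustrative only).
# Assumed model: Z ~ Bernoulli(pi), Y | Z = k ~ Bernoulli(p[k]); pi is unknown.
Y = [1, 1, 0, 1, 0, 1, 1, 1]
p = [0.2, 0.9]

def loglik(pi):
    """log P(Y | pi) = sum_i log sum_Z P(Z | pi) P(Y_i | Z)."""
    total = 0.0
    for y in Y:
        # marginalize out Z inside the log -- the "log of a sum"
        s = (1 - pi) * (p[0] if y else 1 - p[0]) + pi * (p[1] if y else 1 - p[1])
        total += math.log(s)
    return total

pi_hat = max((i / 1000 for i in range(1, 1000)), key=loglik)
```

Grid search only works here because there is a single scalar parameter; with many parameters the log-of-sum couples them, and EM's iterative lower-bound maximization becomes the practical route.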

 


 

【三】 EM Algorithm from Scratch (Must Master)

 

  • Input: observed variable data $Y$, latent variable data $Z$, joint distribution $P(Y, Z \mid \theta)$, conditional distribution $P(Z \mid Y, \theta)$

    Output: model parameters $\theta$

 

  • Derivation

$$
\begin{aligned}
\theta^{n+1} &= \arg\max_\theta \; L(\theta) - L(\theta^n) \\
&= \arg\max_\theta \; \log P(Y \mid \theta) - \log P(Y \mid \theta^n) \\
&= \arg\max_\theta \; \log \sum_Z P(Y \mid Z, \theta) P(Z \mid \theta) - \log P(Y \mid \theta^n) \\
&= \arg\max_\theta \; \log \sum_Z P(Y \mid Z, \theta) P(Z \mid \theta) \cdot \frac{P(Z \mid Y, \theta^n)}{P(Z \mid Y, \theta^n)} - \log P(Y \mid \theta^n) \\
&= \arg\max_\theta \; \log \sum_Z P(Z \mid Y, \theta^n) \cdot \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^n)} - \log P(Y \mid \theta^n) \\
&\geq \arg\max_\theta \; \sum_Z P(Z \mid Y, \theta^n) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^n)} - \log P(Y \mid \theta^n) \quad \text{(Jensen's inequality)} \\
&= \arg\max_\theta \; \triangle(\theta \mid \theta^n) && (1)
\end{aligned}
$$

$$
\therefore \;\; L(\theta) - L(\theta^n) \geq \triangle(\theta \mid \theta^n) \;\;\Rightarrow\;\; \text{(maximize the lower bound)} \;\; L(\theta) \geq L(\theta^n) + \triangle(\theta \mid \theta^n) \tag{2}
$$

$$
\begin{aligned}
\therefore \;\; \theta^{n+1} &= \arg\max_\theta \; \bigl[\, L(\theta^n) + \triangle(\theta \mid \theta^n) \,\bigr] \\
&= \arg\max_\theta \; \Bigl[\, L(\theta^n) + \sum_Z P(Z \mid Y, \theta^n) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^n)} - \log P(Y \mid \theta^n) \,\Bigr] \\
&= \arg\max_\theta \; \Bigl[\, L(\theta^n) + \sum_Z P(Z \mid Y, \theta^n) \cdot \log \frac{P(Y \mid Z, \theta) P(Z \mid \theta)}{P(Z \mid Y, \theta^n) P(Y \mid \theta^n)} \,\Bigr] \\
&= \arg\max_\theta \; \Bigl[\, \sum_Z P(Z \mid Y, \theta^n) \cdot \log P(Y \mid Z, \theta) P(Z \mid \theta) \,\Bigr] \quad \text{(drop terms constant in } \theta \text{)} \\
&= \arg\max_\theta \; \Bigl[\, \sum_Z P(Z \mid Y, \theta^n) \cdot \log P(Y, Z \mid \theta) \,\Bigr] \\
&= \arg\max_\theta \; E_{Z \mid Y, \theta^n} \bigl[\, \log P(Y, Z \mid \theta) \,\bigr] && (3)
\end{aligned}
$$
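The expectation in (3) is usually written as the Q function, $Q(\theta, \theta^n) = E_{Z \mid Y, \theta^n}[\log P(Y, Z \mid \theta)]$, and it takes only a few lines of code: compute the posterior $P(Z \mid Y, \theta^n)$ with Bayes' rule, then weight the complete-data log-likelihood by it. The Bernoulli-mixture model and all numbers below are illustrative assumptions:

```python
import math

# Sketch of Q(theta, theta_n) = E_{Z|Y,theta_n}[log P(Y, Z | theta)]
# for an assumed two-component Bernoulli mixture; pi is the unknown parameter.
Y = [1, 0, 1, 1]
p = [0.2, 0.9]   # fixed emission probabilities P(Y = 1 | Z = 0 / 1)

def posterior(y, pi_n):
    """P(Z = 1 | Y = y, theta_n) via Bayes' rule."""
    a = pi_n * (p[1] if y else 1 - p[1])
    b = (1 - pi_n) * (p[0] if y else 1 - p[0])
    return a / (a + b)

def Q(pi, pi_n):
    """Expected complete-data log-likelihood under the old posterior."""
    total = 0.0
    for y in Y:
        w = posterior(y, pi_n)          # E-step weight for this data point
        total += w * (math.log(pi) + math.log(p[1] if y else 1 - p[1]))
        total += (1 - w) * (math.log(1 - pi) + math.log(p[0] if y else 1 - p[0]))
    return total
```

For this model the maximizer of $Q$ over `pi` has the closed form "mean of the posterior weights", which is exactly what the M-step computes.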

 
 

  • E-Step: (first solve for $Z$, i.e. take the expectation under the posterior $P(Z \mid Y, \theta^n)$)

$$
\arg\max_\theta \; \Bigl[\, E_{Z \mid Y, \theta^n} \bigl[\, \log P(Y, Z \mid \theta) \,\bigr] \,\Bigr] \tag{E}
$$

 

  • M-Step: (then solve for $\theta$ by maximizing that expectation)

$$
\arg\max_\theta \; \Bigl[\, E_{Z \mid Y, \theta^n} \bigl[\, \log P(Y, Z \mid \theta) \,\bigr] \,\Bigr] \tag{M}
$$

 
 

  • Repeat the E-Step and M-Step until convergence; the observed-data likelihood never decreases ($L(\theta^1) \leq L(\theta^2) \leq L(\theta^3) \leq \cdots \leq L(\theta^n)$)
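Putting the two steps together, here is a minimal sketch of the full loop that tracks $L(\theta^n)$ so the monotone increase can be checked. The two-component Bernoulli mixture and its data are assumptions chosen only to keep the example self-contained:

```python
import math

# Minimal EM loop for an assumed two-component Bernoulli mixture:
#   Z ~ Bernoulli(pi),  Y | Z = k ~ Bernoulli(p_k);  EM estimates pi, p0, p1.

def loglik(Y, pi, p0, p1):
    """Incomplete-data log-likelihood log P(Y | theta)."""
    return sum(math.log(pi * (p1 if y else 1 - p1)
                        + (1 - pi) * (p0 if y else 1 - p0)) for y in Y)

def em(Y, pi, p0, p1, n_iter=50):
    history = [loglik(Y, pi, p0, p1)]
    for _ in range(n_iter):
        # E-step: responsibilities w_i = P(Z_i = 1 | Y_i, theta^n)
        w = [pi * (p1 if y else 1 - p1)
             / (pi * (p1 if y else 1 - p1) + (1 - pi) * (p0 if y else 1 - p0))
             for y in Y]
        # M-step: closed-form argmax of E_{Z|Y,theta^n}[log P(Y, Z | theta)]
        pi = sum(w) / len(Y)
        p1 = sum(wi * y for wi, y in zip(w, Y)) / sum(w)
        p0 = sum((1 - wi) * y for wi, y in zip(w, Y)) / sum(1 - wi for wi in w)
        history.append(loglik(Y, pi, p0, p1))
    return pi, p0, p1, history
```

Each iteration is guaranteed not to decrease the incomplete-data log-likelihood, matching the convergence condition above, though EM may still stop at a local rather than global maximum.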