EM算法精解

用大众能看懂的文字来详细剖析EM(Expectation-Maximum) 算法。

一、问题描述

有A,B,C三枚硬币,正面朝上的概率分别是a, b, c,是未知参数,也是EM算法要估的参数。现在对三枚硬币进行试验,先抛A硬币,若正面朝上,则抛B硬币,否则抛C硬币。如果抛B硬币或者C硬币时,正面朝上记为1,反面朝上记为0。此为一次试验。现重复进行10次如上的试验,记录观测值为:1, 0, 1, 1, 0, 1, 0, 1, 1, 1。问:a, b, c分别是多少?令 θ θ θ=(a, b, c)。

二、变量定义

定义随机变量Xi为第i次试验的观测值,其中 X i ∈ { 0 , 1 } , i = 1 , 2 , … , 10 X_i \in \{0,1 \}, i=1,2,\ldots,10 Xi{0,1},i=1,2,,10。定义隐含随机变量 Z i Z_i Zi为第i次试验中A硬币正面朝上与否,其中 Z i ∈ { 0 , 1 } , i = 1 , 2 , … , 10 Z_i\in \{0,1\}, i=1,2,\ldots,10 Zi{0,1},i=1,2,,10,当A硬币正面朝上, Z i Z_i Zi为1,否则为0。其中 X i X_i Xi是已知, Z i Z_i Zi是未知。10次试验的观测值为 x i ∈ { 0 ,   1 } , i = 1 , 2 , … , 10 x_i\in\{0,\ 1\},i=1,2,\ldots,10 xi{0, 1},i=1,2,,10

三、似然函数

现有n=10个观测值,它们之间相互独立,故似然函数为:
L ( θ ) = ∏ i = 1 10 p ( X i = x i ;   θ ) = ∏ i = 1 10 [ a b x i ( 1 − b ) 1 − x i + ( 1 − a ) c x i ( 1 − c ) 1 − x i ] L\left(\theta\right)=\prod_{i=1}^{10}{p\left(X_i=x_i;\ \theta\right)=}\prod_{i=1}^{10} [ab^{x_i}(1-b)^{1-x_i} + (1-a)c^{x_i} (1-c)^{1-x_i}] L(θ)=i=110p(Xi=xi; θ)=i=110[abxi(1b)1xi+(1a)cxi(1c)1xi]
对数似然函数为:
l ( θ ) = l o g L ( θ ) = ∑ i = 1 10 l o g [ a b x i ( 1 − b ) 1 − x i + ( 1 − a ) c x i ( 1 − c ) 1 − x i ] l(\theta)= logL(\theta) = \sum_{i=1}^{10} log[ab^{x_i}(1-b)^{1-x_i} + (1-a)c^{x_i} (1-c)^{1-x_i}] l(θ)=logL(θ)=i=110log[abxi(1b)1xi+(1a)cxi(1c)1xi]
我们的目标是求 θ \theta θ,也即a, b, c三个介于0-1的数字使得对数似然函数最大,即:
θ ∗ = arg ⁡ max ⁡ θ l ( θ ) \theta^*= \arg \max \limits_\theta l(\theta) θ=argθmaxl(θ)
这种多个对数里面求和的累加是很难通过求偏导来算解析极值的。于是用Jensen不等式来进行转换,把对数里面的求和提到外面。

l ( θ ) = l o g L ( θ ) = l o g ( ∏ i = 1 10 p ( X i = x i ; θ ) ) = l o g ( ∏ i = 1 10 ∑ z = 0 1 p ( X i = x i , Z i = z ; θ ) ) 全 概 率 公 式 = ∑ i = 1 10 l o g ( ∑ z = 0 1 p ( X i = x i , Z i = z ; θ ) ) = ∑ i = 1 10 l o g ( ∑ z = 0 1 B ( Z i = z ; θ ) p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) ) 分 子 分 母 同 乘 B ( Z i = z ; θ ) , 其 中 ∑ z = 0 1 B ( Z i = z ; θ ) = 1 = ∑ i = 1 10 l o g [ E Z ( p ( X i = x i , Z i ; θ ) B ( Z i ; θ ) ) ] ≥ ∑ i = 1 10 E Z ( l o g ( p ( X i = x i , Z i ; θ ) B ( Z i ; θ ) ) ) 根 据 J e n s e n 不 等 式 = ∑ i = 1 10 ∑ z = 0 1 B ( Z i = z ; θ ) log ⁡ [ p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) ] l(\theta) = logL(\theta) = log(\prod_{i=1}^{10}p(X_i = x_i; \theta)) \\ = log(\prod_{i=1}^{10}\sum_{z=0}^1p(X_i=x_i, Z_i=z; \theta)) \quad 全概率公式 \\ = \sum_{i=1}^{10} log(\sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta)) \\ = \sum_{i=1}^{10} log(\sum_{z=0}^1 B(Z_i=z; \theta) \frac{p(X_i=x_i, Z_i = z; \theta)}{B(Z_i=z; \theta)}) \\ 分子分母同乘B(Z_i=z; \theta),其中 \sum_{z=0}^1 B(Z_i=z; \theta) =1\\ = \sum_{i=1}^{10} log[E_Z (\frac{p(X_i=x_i, Z_i ; \theta)}{B(Z_i; \theta)})] \\ \geq \sum_{i=1}^{10} E_Z(log(\frac{p(X_i=x_i, Z_i ; \theta)}{B(Z_i; \theta)})) \quad 根据Jensen不等式\\ = \sum_{i=1}^{10} \sum_{z=0}^1 B(Z_i=z; \theta) \log[\frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)}] l(θ)=logL(θ)=log(i=110p(Xi=xi;θ))=log(i=110z=01p(Xi=xi,Zi=z;θ))=i=110log(z=01p(Xi=xi,Zi=z;θ))=i=110log(z=01B(Zi=z;θ)B(Zi=z;θ)p(Xi=xi,Zi=z;θ))B(Zi=z;θ)z=01B(Zi=z;θ)=1=i=110log[EZ(B(Zi;θ)p(Xi=xi,Zi;θ))]i=110EZ(log(B(Zi;θ)p(Xi=xi,Zi;θ)))Jensen=i=110z=01B(Zi=z;θ)log[B(Zi=z;θ)p(Xi=xi,Zi=z;θ)]

若令 B ( Z i = z ; θ ) = p ( Z i = z ∣ X i = x i ; θ ) , Z i = 0 , 1 , i = 1 , 2 , … , 10 B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10 B(Zi=z;θ)=p(Zi=zXi=xi;θ),Zi=0,1,i=1,2,,10则上面等号成立(证明见附录一),即:
l ( θ ) = ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z ∣ X i = x i ; θ ) log ⁡ [ P ( X i = x i , Z i = z ; θ ) p ( Z i = z ∣ X i = x i ; θ ) ] l(\theta) = \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i =z| X_i=x_i;\theta) \log [\frac{P(X_i=x_i, Z_i=z;\theta) }{p(Z_i=z|X_i=x_i;\theta)}] l(θ)=i=110z=01p(Zi=zXi=xi;θ)log[p(Zi=zXi=xi;θ)P(Xi=xi,Zi=z;θ)]
也就是说,我们的目标变成了求 θ \theta θ,使得上述 l ( θ ) l(\theta) l(θ) 最大化,即:
θ ∗ = arg ⁡ max ⁡ θ ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z ∣ X i = x i ; θ ) l o g [ p ( X i = x i , Z i = z ; θ ) p ( Z i = z ∣ X i = x i ; θ ) ] \theta^* = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z|X_i =x_i; \theta) log[\frac{p(X_i=x_i, Z_i=z; \theta)}{p(Z_i=z|X_i=x_i; \theta)}] θ=argθmaxi=110z=01p(Zi=zXi=xi;θ)log[p(Zi=zXi=xi;θ)p(Xi=xi,Zi=z;θ)]

四、迭代求参数

接下来用迭代的方法来求参数 θ \theta θ使得 l ( θ ) l(\theta) l(θ)最大化。

  1. 首先初始化 θ j = θ 0 \theta_j = \theta_0 θj=θ0,例如 a = 0.4 , b = 0.5 , c = 0.6 a=0.4,b=0.5,c=0.6 a=0.4,b=0.5,c=0.6
  2. 则第 j + 1 j+1 j+1次迭代的 θ \theta θ如下:
    θ j + 1 = arg ⁡ max ⁡ θ ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z ∣ X i = x i ; θ j ) l o g [ p ( X i = x i , Z i = z ; θ ) p ( Z i = z ∣ X i = x i ; θ ) ] = arg ⁡ max ⁡ θ ∑ i = 1 10 ∑ z = 0 1 [ p ( Z i = z i ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = z ; θ ) − p ( Z i = z i ∣ X i = x i ; θ j ) log ⁡ p ( Z I = z ∣ X i = x i ; θ j ) ] \theta_{j+1} = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z|X_i=x_i; \theta_j) log[\frac{p(X_i=x_i, Z_i =z; \theta)}{p(Z_i =z|X_i=x_i;\theta)}] \\ = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 [p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) - p(Z_i=z_i|X_i=x_i; \theta_j) \log p(Z_I=z|X_i=x_i; \theta_j)] θj+1=argθmaxi=110z=01p(Zi=zXi=xi;θj)log[p(Zi=zXi=xi;θ)p(Xi=xi,Zi=z;θ)]=argθmaxi=110z=01[p(Zi=ziXi=xi;θj)logp(Xi=xi,Zi=z;θ)p(Zi=ziXi=xi;θj)logp(ZI=zXi=xi;θj)]
    去掉和 θ \theta θ无关的项,得:
    arg ⁡ max ⁡ θ ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = z ; θ ) \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) argθmaxi=110z=01p(Zi=ziXi=xi;θj)logp(Xi=xi,Zi=z;θ)
    ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = z ; θ ) \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) i=110z=01p(Zi=ziXi=xi;θj)logp(Xi=xi,Zi=z;θ) Q Q Q函数,记作 Q ( θ , θ j ) Q(\theta, \theta_j) Q(θ,θj)。因为 Q ( θ , θ j ) = ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = z ; θ ) = ∑ i = 1 10 E Z [ log ⁡ p ( X i = x i , Z i = z ; θ ) ] Q(\theta, \theta_j) = \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) =\sum_{i=1}^{10}E_Z[\log p(X_i=x_i, Z_i=z; \theta)] Q(θ,θj)=i=110z=01p(Zi=ziXi=xi;θj)logp(Xi=xi,Zi=z;θ)=i=110EZ[logp(Xi=xi,Zi=z;θ)],这就是EM算法中E的由来。
  3. 继而,求使得 Q ( θ , θ j ) Q(\theta, \theta_j) Q(θ,θj)最大化的 θ j + 1 \theta_{j+1} θj+1,即 θ j + 1 = arg ⁡ max ⁡ θ Q ( θ , θ j ) \theta_{j+1}= \arg \max \limits_\theta Q(\theta, \theta_j) θj+1=argθmaxQ(θ,θj),这就是M的由来。
    Q ( θ , θ j ) = ∑ i = 1 10 ∑ z = 0 1 [ p ( Z i = z ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = z ; θ ) ] = ∑ i = 1 10 [ p ( Z i = 0 ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 ∣ X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = 1 ; θ ) ] = ∑ i = 1 10 [ p ( Z i = 0 , X i = x i ; θ j ) p ( X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 , X i = x i ; θ j ) p ( X i = x i ; θ j ) log ⁡ p ( X i = x i , Z i = 1 ; θ ) ] 条 件 概 率 = ∑ i = 1 10 [ p ( Z i = 0 , X i = x i ; θ j ) ∑ z = 0 1 p ( X i = x i , Z i = z ; θ j ) log ⁡ p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 , X i = x i ; θ j ) ∑ z = 0 1 p ( X i = x i , Z i = z ; θ j ) log ⁡ p ( X i = x i , Z i = 1 ; θ ) ] = ∑ i = 1 10 ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i log ⁡ [ ( 1 − a j + 1 ) c j + 1 x i ( 1 − c j + 1 ) 1 − x i ] + a j b j x i ( 1 − b j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i log ⁡ [ a j + 1 b j + 1 x i ( 1 − b j + 1 ) 1 − x i ] Q(\theta, \theta_j) = \sum_{i=1}^{10} \sum_{z=0}^1 [p(Z_i=z|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta)] \\ = \sum_{i=1}^{10} [p(Z_i=0|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=0; \theta) + p(Z_i=1|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=1; \theta)] \\ =\sum_{i=1}^{10} [\frac{p(Z_i =0, X_i=x_i; \theta_j)}{p(X_i=x_i; \theta_j)} \log p(X_i=x_i, Z_i=0; \theta) + \frac{p(Z_i =1, X_i=x_i; \theta_j)}{p(X_i=x_i; \theta_j)} \log p(X_i=x_i, Z_i=1; \theta)] \quad 条件概率 \\ = \sum_{i=1}^{10} [\frac{p(Z_i =0, X_i=x_i; \theta_j)}{ \sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta_j) } \log p(X_i=x_i, Z_i=0; \theta) + \frac{p(Z_i =1, X_i=x_i; \theta_j)}{ \sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta_j) } \log p(X_i=x_i, Z_i=1; \theta)] \\ =\sum_{i=1}^{10} { \frac{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} } \log[(1-a_{j+1})c_{j+1}^{x_i}(1-c_{j+1})^{1-x_i}] + \\ \frac{ a_jb_j^{x_i}(1-b_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} } \log[a_{j+1}b_{j+1}^{x_i}(1-b_{j+1})^{1-x_i}] } Q(θ,θj)=i=110z=01[p(Zi=zXi=xi;θj)logp(Xi=xi,Zi=z;θ)]=i=110[p(Zi=0Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+p(Zi=1Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]=i=110[p(Xi=xi;θj)p(Zi=0,Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+p(Xi=xi;θj)p(Zi=1,Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]=i=110[z=01p(Xi=xi,Zi=z;θj)p(Zi=0,Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+z=01p(Xi=xi,Zi=z;θj)p(Zi=1,Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]=i=110(1aj)cjxi(1cj)1xi+ajbjxi(1bj)1xi(1aj)cjxi(1cj)1xilog[(1aj+1)cj+1xi(1cj+1)1xi]+(1aj)cjxi(1cj)1xi+ajbjxi(1bj)1xiajbjxi(1bj)1xilog[aj+1bj+1xi(1bj+1)1xi]
    m i = ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i , i = 1 , 2 , … , 10 m_i= \frac{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} }, i=1,2,\ldots,10 mi=(1aj)cjxi(1cj)1xi+ajbjxi(1bj)1xi(1aj)cjxi(1cj)1xi,i=1,2,,10,令 n i = a j b j x i ( 1 − b j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i , i = 1 , 2 , … , 10 n_i= \frac{ a_jb_j^{x_i}(1-b_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} }, i=1,2,\ldots,10 ni=(1aj)cjxi(1cj)1xi+ajbjxi(1bj)1xiajbjxi(1bj)1xi,i=1,2,,10,有 m i + n i = 1 , i = 1 , 2 , … , 10 m_i+n_i=1, i=1,2,\ldots,10 mi+ni=1,i=1,2,,10

Q Q Q函数中 a j , b j , c j a_j,b_j,c_j aj,bj,cj是上一轮迭代的结果,是已知的。其中 a j + 1 , b j + 1 , c j + 1 a_{j+1},b_{j+1},c_{j+1} aj+1,bj+1,cj+1是未知参数,现在分别对它们求偏导并等于 0 0 0来求使 Q Q Q函数最大的 a j + 1 , b j + 1 , c j + 1 a_{j+1},b_{j+1},c_{j+1} aj+1,bj+1,cj+1
∂ Q ( θ , θ j ) ∂ a j + 1 = ∑ i = 1 10 ( m i a j + 1 − 1 + n i a j + 1 ) = ∑ i = 1 10 a j + 1 m i + a j + 1 n i − n i ( a j + 1 − 1 ) a j + 1 \frac {\partial Q(\theta, \theta_j)} { \partial a_{j+1}} = \sum_{i=1}^{10} (\frac{m_i}{a_{j+1}-1} + \frac{n_i}{a_{j+1}}) = \sum_{i=1}^{10} \frac{a_{j+1}m_i + a_{j+1}n_i -n_i} {(a_{j+1} -1 )a_{j+1}} aj+1Q(θ,θj)=i=110(aj+11mi+aj+1ni)=i=110(aj+11)aj+1aj+1mi+aj+1nini
∂ Q ( θ , θ j ) ∂ a j + 1 = 0 \frac {\partial Q(\theta, \theta_j)} { \partial a_{j+1}} =0 aj+1Q(θ,θj)=0得: a j + 1 = 1 10 ∑ i = 1 10 n i a_{j+1}=\frac{1}{10} \sum_{i=1}^{10} n_i aj+1=101i=110ni
又:
∂ Q ( θ , θ j ) ∂ b j + 1 = ∑ i = 1 10 n i a j + 1 a j + 1 b j + 1 x i ( 1 − b j + 1 ) 1 − x i ( b j + 1 x i ( 1 − b j + 1 ) 1 − x i ) ′ = ∑ i = 1 10 n i ( x i b j + 1 − 1 − x i 1 − b j + 1 ) 根 据 求 导 乘 法 法 则 并 化 简 \frac {\partial Q(\theta, \theta_j)} { \partial b_{j+1}} = \sum_{i=1}^{10}n_i \frac{a_{j+1}}{a_{j+1}b_{j+1}^{x_i} (1-b_{j+1})^{1-x_i} } (b_{j+1}^{x_i} (1-b_{j+1})^{1-x_i} ) \prime \\ =\sum_{i=1}^{10} n_i (\frac{x_i}{b_{j+1}} -\frac{1-x_i}{1-b_{j+1}}) \quad 根据求导乘法法则并化简 bj+1Q(θ,θj)=i=110niaj+1bj+1xi(1bj+1)1xiaj+1(bj+1xi(1bj+1)1xi)=i=110ni(bj+1xi1bj+11xi)
∂ Q ( θ , θ j ) ∂ b j + 1 = 0 \frac {\partial Q(\theta, \theta_j)} { \partial b_{j+1}} =0 bj+1Q(θ,θj)=0得: b j + 1 = ∑ i = 1 10 n i x i ∑ i = 1 10 n i b_{j+1} = \frac{\sum_{i=1}^{10} n_ix_i} { \sum_{i=1}^{10} n_i} bj+1=i=110nii=110nixi
同理可得: c j + 1 = ∑ i = 1 10 m i x i ∑ i = 1 10 m i c_{j+1} = \frac{\sum_{i=1}^{10} m_ix_i} { \sum_{i=1}^{10} m_i} cj+1=i=110mii=110mixi
4. 重复2,3两步, 直至收敛,或者达到预设标准,最后的 θ \theta θ即为所求。

五、代码示例

import random
import math

a_true = 0.9 # A硬币正面朝上的概率是0.9
b_true = 0.1 # B硬币正面朝上的概率是0.1
c_true = 0.8 # C硬币正面朝上的概率是0.8
n = 5000 # 一共执行5000次试验

obs = []
for i in range(n):
    if random.random() <= a_true:
        obs.append(1) if random.random() <= b_true else obs.append(0)
    else:
        obs.append(1) if random.random() <= c_true else obs.append(0)
print(obs)
# 输出5000次试验的观察结果,1是正面朝上,0是 反面朝上
[1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, ...]5000# 初始化第一次的参数
thetaj = [0.45, 0.2, 0.98] # a0, b0, c0 = 0.45, 0.2, 0.98

# thetaj(s) are from the previous iteration or initialization
def em(thetaj, obs):
    m = []
    n = []
    for i in range(len(obs)):
        m_now = ( (1-thetaj[0])*math.pow(thetaj[2],obs[i])*math.pow(1-thetaj[2],1-obs[i]) )/( (1-thetaj[0])*math.pow(thetaj[2],obs[i])*math.pow(1-thetaj[2],1-obs[i]) + thetaj[0]*math.pow(thetaj[1],obs[i])*math.pow(1-thetaj[1],1-obs[i]) )
        m.append(m_now)
        n_now = 1 - m_now
        n.append(n_now)
  
    m_total, n_total, mx_total, nx_total = 0, 0, 0, 0
    for i in range(len(obs)):
        m_total += m[i]
        n_total += n[i]
        mx_total += m[i]*obs[i]
        nx_total += n[i]*obs[i]
  
    thetaj[0] = n_total/len(obs)
    thetaj[1] = nx_total/n_total
    thetaj[2] = mx_total/m_total

# 一共迭代1000次来估算参数
for j in range(1000):
    em(thetaj, obs)

print("Type: a,b,c")
print("True: " + str(a_true) + "," + str(b_true) + "," + str(c_true))
print("Initialization: " + str(a0) + "," + str(b0) + "," + str(c0))
print("Estimated: " + ",".join([str(i) for i in thetaj]))
# 输出
Type: a,b,c
True: 0.9,0.1,0.8
Initialization: 0.45,0.2,0.98
Estimated: 0.8250824695060178,0.030452224013034972,0.8602591368880216

六、扩展

类似的,可以扩展到其他分布类型,例如混合高斯。只需要根据分布来修改 Q ( θ , θ j ) Q(\theta, \theta_j) Q(θ,θj)中的 p ( Z i , X i ; θ j ) p(Z_i, X_i; \theta_j) p(Zi,Xi;θj)表达式,再分别对要估算的参数求偏导数就可以。例如现有某个年级100名学生的身高的测量值,要求估算出男生和女生身高的均值和方差,也即4个参数需要估计。假设男生和女生身高呈现正态分布。这里的 Z Z Z随机变量表示该观测值是来自男生还是女生,是一个 0 − 1 0-1 01分布。 X X X随机变量服从混合高斯分布。在给定 Z Z Z的前提下 X X X服从一维正态分布。

七、附录一

由Jensen不等式可知,若要上面的等号成立, p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) \frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)} B(Zi=z;θ)p(Xi=xi,Zi=z;θ)必须是一个常数。现在证明当 B ( Z i = z ; θ ) = p ( Z i = z ∣ X i = x i ; θ ) , Z i = 0 , 1 , i = 1 , 2 , … , 10 , 且 ∑ z = 0 1 B ( Z i = z ; θ ) = 1 B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10, 且\sum_{z=0}^1 B(Z_i=z; \theta) =1 B(Zi=z;θ)=p(Zi=zXi=xi;θ),Zi=0,1,i=1,2,,10,z=01B(Zi=z;θ)=1时, p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) \frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)} B(Zi=z;θ)p(Xi=xi,Zi=z;θ)是个常数。
p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) = c \frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)} = c B(Zi=z;θ)p(Xi=xi,Zi=z;θ)=c c c c是任意一个常数,则:
B ( Z i = z ; θ ) = p ( X i = x i , Z i = z ; θ ) c = p ( X i = x i , Z i = z ; θ ) ∑ z = 0 1 c B ( Z i = z ; θ ) = p ( X i = x i , Z i = z ; θ ) ∑ z = 0 1 p ( X i = x i , Z i = z ; θ ) = p ( X i = x i , Z i = z ; θ ) p ( X i = x i ; θ ) = p ( Z i = z ∣ X i = x i ; θ ) B(Z_i=z; \theta) = \frac {p(X_i=x_i, Z_i=z; \theta)} {c} = \frac {p(X_i=x_i, Z_i=z; \theta)} { \sum_{z=0}^1 c B(Z_i=z; \theta)} \\ = \frac {p(X_i=x_i, Z_i=z; \theta)} {\sum_{z=0}^1 p(X_i=x_i, Z_i=z; \theta)} \\ = \frac {p(X_i=x_i, Z_i=z; \theta)} {p(X_i=x_i; \theta)} \\ =p(Z_i=z|X_i=x_i; \theta) B(Zi=z;θ)=cp(Xi=xi,Zi=z;θ)=z=01cB(Zi=z;θ)p(Xi=xi,Zi=z;θ)=z=01p(Xi=xi,Zi=z;θ)p(Xi=xi,Zi=z;θ)=p(Xi=xi;θ)p(Xi=xi,Zi=z;θ)=p(Zi=zXi=xi;θ)
所以,当 B ( Z i = z ; θ ) = p ( Z i = z ∣ X i = x i ; θ ) , Z i = 0 , 1 , i = 1 , 2 , … , 10 B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10 B(Zi=z;θ)=p(Zi=zXi=xi;θ),Zi=0,1,i=1,2,,10时上述等号成立。

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值