用大众能看懂的文字来详细剖析EM(Expectation-Maximum) 算法。
一、问题描述
有A,B,C三枚硬币,正面朝上的概率分别是a, b, c,是未知参数,也是EM算法要估的参数。现在对三枚硬币进行试验,先抛A硬币,若正面朝上,则抛B硬币,否则抛C硬币。如果抛B硬币或者C硬币时,正面朝上记为1,反面朝上记为0。此为一次试验。现重复进行10次如上的试验,记录观测值为:1, 0, 1, 1, 0, 1, 0, 1, 1, 1。问:a, b, c分别是多少?令 θ θ θ=(a, b, c)。
二、变量定义
定义随机变量Xi为第i次试验的观测值,其中 X i ∈ { 0 , 1 } , i = 1 , 2 , … , 10 X_i \in \{0,1 \}, i=1,2,\ldots,10 Xi∈{0,1},i=1,2,…,10。定义隐含随机变量 Z i Z_i Zi为第i次试验中A硬币正面朝上与否,其中 Z i ∈ { 0 , 1 } , i = 1 , 2 , … , 10 Z_i\in \{0,1\}, i=1,2,\ldots,10 Zi∈{0,1},i=1,2,…,10,当A硬币正面朝上, Z i Z_i Zi为1,否则为0。其中 X i X_i Xi是已知, Z i Z_i Zi是未知。10次试验的观测值为 x i ∈ { 0 , 1 } , i = 1 , 2 , … , 10 x_i\in\{0,\ 1\},i=1,2,\ldots,10 xi∈{0, 1},i=1,2,…,10。
三、似然函数
现有n=10个观测值,它们之间相互独立,故似然函数为:
L
(
θ
)
=
∏
i
=
1
10
p
(
X
i
=
x
i
;
θ
)
=
∏
i
=
1
10
[
a
b
x
i
(
1
−
b
)
1
−
x
i
+
(
1
−
a
)
c
x
i
(
1
−
c
)
1
−
x
i
]
L\left(\theta\right)=\prod_{i=1}^{10}{p\left(X_i=x_i;\ \theta\right)=}\prod_{i=1}^{10} [ab^{x_i}(1-b)^{1-x_i} + (1-a)c^{x_i} (1-c)^{1-x_i}]
L(θ)=i=1∏10p(Xi=xi; θ)=i=1∏10[abxi(1−b)1−xi+(1−a)cxi(1−c)1−xi]
对数似然函数为:
l
(
θ
)
=
l
o
g
L
(
θ
)
=
∑
i
=
1
10
l
o
g
[
a
b
x
i
(
1
−
b
)
1
−
x
i
+
(
1
−
a
)
c
x
i
(
1
−
c
)
1
−
x
i
]
l(\theta)= logL(\theta) = \sum_{i=1}^{10} log[ab^{x_i}(1-b)^{1-x_i} + (1-a)c^{x_i} (1-c)^{1-x_i}]
l(θ)=logL(θ)=i=1∑10log[abxi(1−b)1−xi+(1−a)cxi(1−c)1−xi]
我们的目标是求
θ
\theta
θ,也即a, b, c三个介于0-1的数字使得对数似然函数最大,即:
θ
∗
=
arg
max
θ
l
(
θ
)
\theta^*= \arg \max \limits_\theta l(\theta)
θ∗=argθmaxl(θ)
这种多个对数里面求和的累加是很难通过求偏导来算解析极值的。于是用Jensen不等式来进行转换,把对数里面的求和提到外面。
l ( θ ) = l o g L ( θ ) = l o g ( ∏ i = 1 10 p ( X i = x i ; θ ) ) = l o g ( ∏ i = 1 10 ∑ z = 0 1 p ( X i = x i , Z i = z ; θ ) ) 全 概 率 公 式 = ∑ i = 1 10 l o g ( ∑ z = 0 1 p ( X i = x i , Z i = z ; θ ) ) = ∑ i = 1 10 l o g ( ∑ z = 0 1 B ( Z i = z ; θ ) p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) ) 分 子 分 母 同 乘 B ( Z i = z ; θ ) , 其 中 ∑ z = 0 1 B ( Z i = z ; θ ) = 1 = ∑ i = 1 10 l o g [ E Z ( p ( X i = x i , Z i ; θ ) B ( Z i ; θ ) ) ] ≥ ∑ i = 1 10 E Z ( l o g ( p ( X i = x i , Z i ; θ ) B ( Z i ; θ ) ) ) 根 据 J e n s e n 不 等 式 = ∑ i = 1 10 ∑ z = 0 1 B ( Z i = z ; θ ) log [ p ( X i = x i , Z i = z ; θ ) B ( Z i = z ; θ ) ] l(\theta) = logL(\theta) = log(\prod_{i=1}^{10}p(X_i = x_i; \theta)) \\ = log(\prod_{i=1}^{10}\sum_{z=0}^1p(X_i=x_i, Z_i=z; \theta)) \quad 全概率公式 \\ = \sum_{i=1}^{10} log(\sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta)) \\ = \sum_{i=1}^{10} log(\sum_{z=0}^1 B(Z_i=z; \theta) \frac{p(X_i=x_i, Z_i = z; \theta)}{B(Z_i=z; \theta)}) \\ 分子分母同乘B(Z_i=z; \theta),其中 \sum_{z=0}^1 B(Z_i=z; \theta) =1\\ = \sum_{i=1}^{10} log[E_Z (\frac{p(X_i=x_i, Z_i ; \theta)}{B(Z_i; \theta)})] \\ \geq \sum_{i=1}^{10} E_Z(log(\frac{p(X_i=x_i, Z_i ; \theta)}{B(Z_i; \theta)})) \quad 根据Jensen不等式\\ = \sum_{i=1}^{10} \sum_{z=0}^1 B(Z_i=z; \theta) \log[\frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)}] l(θ)=logL(θ)=log(i=1∏10p(Xi=xi;θ))=log(i=1∏10z=0∑1p(Xi=xi,Zi=z;θ))全概率公式=i=1∑10log(z=0∑1p(Xi=xi,Zi=z;θ))=i=1∑10log(z=0∑1B(Zi=z;θ)B(Zi=z;θ)p(Xi=xi,Zi=z;θ))分子分母同乘B(Zi=z;θ),其中z=0∑1B(Zi=z;θ)=1=i=1∑10log[EZ(B(Zi;θ)p(Xi=xi,Zi;θ))]≥i=1∑10EZ(log(B(Zi;θ)p(Xi=xi,Zi;θ)))根据Jensen不等式=i=1∑10z=0∑1B(Zi=z;θ)log[B(Zi=z;θ)p(Xi=xi,Zi=z;θ)]
若令
B
(
Z
i
=
z
;
θ
)
=
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
,
Z
i
=
0
,
1
,
i
=
1
,
2
,
…
,
10
B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10
B(Zi=z;θ)=p(Zi=z∣Xi=xi;θ),Zi=0,1,i=1,2,…,10则上面等号成立(证明见附录一),即:
l
(
θ
)
=
∑
i
=
1
10
∑
z
=
0
1
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
log
[
P
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
]
l(\theta) = \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i =z| X_i=x_i;\theta) \log [\frac{P(X_i=x_i, Z_i=z;\theta) }{p(Z_i=z|X_i=x_i;\theta)}]
l(θ)=i=1∑10z=0∑1p(Zi=z∣Xi=xi;θ)log[p(Zi=z∣Xi=xi;θ)P(Xi=xi,Zi=z;θ)]
也就是说,我们的目标变成了求
θ
\theta
θ,使得上述
l
(
θ
)
l(\theta)
l(θ) 最大化,即:
θ
∗
=
arg
max
θ
∑
i
=
1
10
∑
z
=
0
1
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
l
o
g
[
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
]
\theta^* = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z|X_i =x_i; \theta) log[\frac{p(X_i=x_i, Z_i=z; \theta)}{p(Z_i=z|X_i=x_i; \theta)}]
θ∗=argθmaxi=1∑10z=0∑1p(Zi=z∣Xi=xi;θ)log[p(Zi=z∣Xi=xi;θ)p(Xi=xi,Zi=z;θ)]
四、迭代求参数
接下来用迭代的方法来求参数 θ \theta θ使得 l ( θ ) l(\theta) l(θ)最大化。
- 首先初始化 θ j = θ 0 \theta_j = \theta_0 θj=θ0,例如 a = 0.4 , b = 0.5 , c = 0.6 a=0.4,b=0.5,c=0.6 a=0.4,b=0.5,c=0.6。
- 则第
j
+
1
j+1
j+1次迭代的
θ
\theta
θ如下:
θ j + 1 = arg max θ ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z ∣ X i = x i ; θ j ) l o g [ p ( X i = x i , Z i = z ; θ ) p ( Z i = z ∣ X i = x i ; θ ) ] = arg max θ ∑ i = 1 10 ∑ z = 0 1 [ p ( Z i = z i ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = z ; θ ) − p ( Z i = z i ∣ X i = x i ; θ j ) log p ( Z I = z ∣ X i = x i ; θ j ) ] \theta_{j+1} = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z|X_i=x_i; \theta_j) log[\frac{p(X_i=x_i, Z_i =z; \theta)}{p(Z_i =z|X_i=x_i;\theta)}] \\ = \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 [p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) - p(Z_i=z_i|X_i=x_i; \theta_j) \log p(Z_I=z|X_i=x_i; \theta_j)] θj+1=argθmaxi=1∑10z=0∑1p(Zi=z∣Xi=xi;θj)log[p(Zi=z∣Xi=xi;θ)p(Xi=xi,Zi=z;θ)]=argθmaxi=1∑10z=0∑1[p(Zi=zi∣Xi=xi;θj)logp(Xi=xi,Zi=z;θ)−p(Zi=zi∣Xi=xi;θj)logp(ZI=z∣Xi=xi;θj)]
去掉和 θ \theta θ无关的项,得:
arg max θ ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = z ; θ ) \arg \max \limits_\theta \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) argθmaxi=1∑10z=0∑1p(Zi=zi∣Xi=xi;θj)logp(Xi=xi,Zi=z;θ)
记 ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = z ; θ ) \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) ∑i=110∑z=01p(Zi=zi∣Xi=xi;θj)logp(Xi=xi,Zi=z;θ)为 Q Q Q函数,记作 Q ( θ , θ j ) Q(\theta, \theta_j) Q(θ,θj)。因为 Q ( θ , θ j ) = ∑ i = 1 10 ∑ z = 0 1 p ( Z i = z i ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = z ; θ ) = ∑ i = 1 10 E Z [ log p ( X i = x i , Z i = z ; θ ) ] Q(\theta, \theta_j) = \sum_{i=1}^{10} \sum_{z=0}^1 p(Z_i=z_i|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta) =\sum_{i=1}^{10}E_Z[\log p(X_i=x_i, Z_i=z; \theta)] Q(θ,θj)=∑i=110∑z=01p(Zi=zi∣Xi=xi;θj)logp(Xi=xi,Zi=z;θ)=∑i=110EZ[logp(Xi=xi,Zi=z;θ)],这就是EM算法中E的由来。 - 继而,求使得
Q
(
θ
,
θ
j
)
Q(\theta, \theta_j)
Q(θ,θj)最大化的
θ
j
+
1
\theta_{j+1}
θj+1,即
θ
j
+
1
=
arg
max
θ
Q
(
θ
,
θ
j
)
\theta_{j+1}= \arg \max \limits_\theta Q(\theta, \theta_j)
θj+1=argθmaxQ(θ,θj),这就是M的由来。
Q ( θ , θ j ) = ∑ i = 1 10 ∑ z = 0 1 [ p ( Z i = z ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = z ; θ ) ] = ∑ i = 1 10 [ p ( Z i = 0 ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 ∣ X i = x i ; θ j ) log p ( X i = x i , Z i = 1 ; θ ) ] = ∑ i = 1 10 [ p ( Z i = 0 , X i = x i ; θ j ) p ( X i = x i ; θ j ) log p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 , X i = x i ; θ j ) p ( X i = x i ; θ j ) log p ( X i = x i , Z i = 1 ; θ ) ] 条 件 概 率 = ∑ i = 1 10 [ p ( Z i = 0 , X i = x i ; θ j ) ∑ z = 0 1 p ( X i = x i , Z i = z ; θ j ) log p ( X i = x i , Z i = 0 ; θ ) + p ( Z i = 1 , X i = x i ; θ j ) ∑ z = 0 1 p ( X i = x i , Z i = z ; θ j ) log p ( X i = x i , Z i = 1 ; θ ) ] = ∑ i = 1 10 ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i log [ ( 1 − a j + 1 ) c j + 1 x i ( 1 − c j + 1 ) 1 − x i ] + a j b j x i ( 1 − b j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i log [ a j + 1 b j + 1 x i ( 1 − b j + 1 ) 1 − x i ] Q(\theta, \theta_j) = \sum_{i=1}^{10} \sum_{z=0}^1 [p(Z_i=z|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=z; \theta)] \\ = \sum_{i=1}^{10} [p(Z_i=0|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=0; \theta) + p(Z_i=1|X_i=x_i; \theta_j) \log p(X_i=x_i, Z_i=1; \theta)] \\ =\sum_{i=1}^{10} [\frac{p(Z_i =0, X_i=x_i; \theta_j)}{p(X_i=x_i; \theta_j)} \log p(X_i=x_i, Z_i=0; \theta) + \frac{p(Z_i =1, X_i=x_i; \theta_j)}{p(X_i=x_i; \theta_j)} \log p(X_i=x_i, Z_i=1; \theta)] \quad 条件概率 \\ = \sum_{i=1}^{10} [\frac{p(Z_i =0, X_i=x_i; \theta_j)}{ \sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta_j) } \log p(X_i=x_i, Z_i=0; \theta) + \frac{p(Z_i =1, X_i=x_i; \theta_j)}{ \sum_{z=0}^1 p(X_i=x_i, Z_i = z; \theta_j) } \log p(X_i=x_i, Z_i=1; \theta)] \\ =\sum_{i=1}^{10} { \frac{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} } \log[(1-a_{j+1})c_{j+1}^{x_i}(1-c_{j+1})^{1-x_i}] + \\ \frac{ a_jb_j^{x_i}(1-b_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} } \log[a_{j+1}b_{j+1}^{x_i}(1-b_{j+1})^{1-x_i}] } Q(θ,θj)=i=1∑10z=0∑1[p(Zi=z∣Xi=xi;θj)logp(Xi=xi,Zi=z;θ)]=i=1∑10[p(Zi=0∣Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+p(Zi=1∣Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]=i=1∑10[p(Xi=xi;θj)p(Zi=0,Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+p(Xi=xi;θj)p(Zi=1,Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]条件概率=i=1∑10[∑z=01p(Xi=xi,Zi=z;θj)p(Zi=0,Xi=xi;θj)logp(Xi=xi,Zi=0;θ)+∑z=01p(Xi=xi,Zi=z;θj)p(Zi=1,Xi=xi;θj)logp(Xi=xi,Zi=1;θ)]=i=1∑10(1−aj)cjxi(1−cj)1−xi+ajbjxi(1−bj)1−xi(1−aj)cjxi(1−cj)1−xilog[(1−aj+1)cj+1xi(1−cj+1)1−xi]+(1−aj)cjxi(1−cj)1−xi+ajbjxi(1−bj)1−xiajbjxi(1−bj)1−xilog[aj+1bj+1xi(1−bj+1)1−xi]
令 m i = ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i , i = 1 , 2 , … , 10 m_i= \frac{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} }, i=1,2,\ldots,10 mi=(1−aj)cjxi(1−cj)1−xi+ajbjxi(1−bj)1−xi(1−aj)cjxi(1−cj)1−xi,i=1,2,…,10,令 n i = a j b j x i ( 1 − b j ) 1 − x i ( 1 − a j ) c j x i ( 1 − c j ) 1 − x i + a j b j x i ( 1 − b j ) 1 − x i , i = 1 , 2 , … , 10 n_i= \frac{ a_jb_j^{x_i}(1-b_j)^{1-x_i} }{ (1-a_j)c_j^{x_i}(1-c_j)^{1-x_i} + a_jb_j^{x_i}(1-b_j)^{1-x_i} }, i=1,2,\ldots,10 ni=(1−aj)cjxi(1−cj)1−xi+ajbjxi(1−bj)1−xiajbjxi(1−bj)1−xi,i=1,2,…,10,有 m i + n i = 1 , i = 1 , 2 , … , 10 m_i+n_i=1, i=1,2,\ldots,10 mi+ni=1,i=1,2,…,10
Q
Q
Q函数中
a
j
,
b
j
,
c
j
a_j,b_j,c_j
aj,bj,cj是上一轮迭代的结果,是已知的。其中
a
j
+
1
,
b
j
+
1
,
c
j
+
1
a_{j+1},b_{j+1},c_{j+1}
aj+1,bj+1,cj+1是未知参数,现在分别对它们求偏导并等于
0
0
0来求使
Q
Q
Q函数最大的
a
j
+
1
,
b
j
+
1
,
c
j
+
1
a_{j+1},b_{j+1},c_{j+1}
aj+1,bj+1,cj+1。
∂
Q
(
θ
,
θ
j
)
∂
a
j
+
1
=
∑
i
=
1
10
(
m
i
a
j
+
1
−
1
+
n
i
a
j
+
1
)
=
∑
i
=
1
10
a
j
+
1
m
i
+
a
j
+
1
n
i
−
n
i
(
a
j
+
1
−
1
)
a
j
+
1
\frac {\partial Q(\theta, \theta_j)} { \partial a_{j+1}} = \sum_{i=1}^{10} (\frac{m_i}{a_{j+1}-1} + \frac{n_i}{a_{j+1}}) = \sum_{i=1}^{10} \frac{a_{j+1}m_i + a_{j+1}n_i -n_i} {(a_{j+1} -1 )a_{j+1}}
∂aj+1∂Q(θ,θj)=i=1∑10(aj+1−1mi+aj+1ni)=i=1∑10(aj+1−1)aj+1aj+1mi+aj+1ni−ni
令
∂
Q
(
θ
,
θ
j
)
∂
a
j
+
1
=
0
\frac {\partial Q(\theta, \theta_j)} { \partial a_{j+1}} =0
∂aj+1∂Q(θ,θj)=0得:
a
j
+
1
=
1
10
∑
i
=
1
10
n
i
a_{j+1}=\frac{1}{10} \sum_{i=1}^{10} n_i
aj+1=101∑i=110ni。
又:
∂
Q
(
θ
,
θ
j
)
∂
b
j
+
1
=
∑
i
=
1
10
n
i
a
j
+
1
a
j
+
1
b
j
+
1
x
i
(
1
−
b
j
+
1
)
1
−
x
i
(
b
j
+
1
x
i
(
1
−
b
j
+
1
)
1
−
x
i
)
′
=
∑
i
=
1
10
n
i
(
x
i
b
j
+
1
−
1
−
x
i
1
−
b
j
+
1
)
根
据
求
导
乘
法
法
则
并
化
简
\frac {\partial Q(\theta, \theta_j)} { \partial b_{j+1}} = \sum_{i=1}^{10}n_i \frac{a_{j+1}}{a_{j+1}b_{j+1}^{x_i} (1-b_{j+1})^{1-x_i} } (b_{j+1}^{x_i} (1-b_{j+1})^{1-x_i} ) \prime \\ =\sum_{i=1}^{10} n_i (\frac{x_i}{b_{j+1}} -\frac{1-x_i}{1-b_{j+1}}) \quad 根据求导乘法法则并化简
∂bj+1∂Q(θ,θj)=i=1∑10niaj+1bj+1xi(1−bj+1)1−xiaj+1(bj+1xi(1−bj+1)1−xi)′=i=1∑10ni(bj+1xi−1−bj+11−xi)根据求导乘法法则并化简
令
∂
Q
(
θ
,
θ
j
)
∂
b
j
+
1
=
0
\frac {\partial Q(\theta, \theta_j)} { \partial b_{j+1}} =0
∂bj+1∂Q(θ,θj)=0得:
b
j
+
1
=
∑
i
=
1
10
n
i
x
i
∑
i
=
1
10
n
i
b_{j+1} = \frac{\sum_{i=1}^{10} n_ix_i} { \sum_{i=1}^{10} n_i}
bj+1=∑i=110ni∑i=110nixi
同理可得:
c
j
+
1
=
∑
i
=
1
10
m
i
x
i
∑
i
=
1
10
m
i
c_{j+1} = \frac{\sum_{i=1}^{10} m_ix_i} { \sum_{i=1}^{10} m_i}
cj+1=∑i=110mi∑i=110mixi
4. 重复2,3两步, 直至收敛,或者达到预设标准,最后的
θ
\theta
θ即为所求。
五、代码示例
import random
import math
a_true = 0.9 # A硬币正面朝上的概率是0.9
b_true = 0.1 # B硬币正面朝上的概率是0.1
c_true = 0.8 # C硬币正面朝上的概率是0.8
n = 5000 # 一共执行5000次试验
obs = []
for i in range(n):
if random.random() <= a_true:
obs.append(1) if random.random() <= b_true else obs.append(0)
else:
obs.append(1) if random.random() <= c_true else obs.append(0)
print(obs)
# 输出5000次试验的观察结果,1是正面朝上,0是 反面朝上
[1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, ...] 共5000个
# 初始化第一次的参数
thetaj = [0.45, 0.2, 0.98] # a0, b0, c0 = 0.45, 0.2, 0.98
# thetaj(s) are from the previous iteration or initialization
def em(thetaj, obs):
m = []
n = []
for i in range(len(obs)):
m_now = ( (1-thetaj[0])*math.pow(thetaj[2],obs[i])*math.pow(1-thetaj[2],1-obs[i]) )/( (1-thetaj[0])*math.pow(thetaj[2],obs[i])*math.pow(1-thetaj[2],1-obs[i]) + thetaj[0]*math.pow(thetaj[1],obs[i])*math.pow(1-thetaj[1],1-obs[i]) )
m.append(m_now)
n_now = 1 - m_now
n.append(n_now)
m_total, n_total, mx_total, nx_total = 0, 0, 0, 0
for i in range(len(obs)):
m_total += m[i]
n_total += n[i]
mx_total += m[i]*obs[i]
nx_total += n[i]*obs[i]
thetaj[0] = n_total/len(obs)
thetaj[1] = nx_total/n_total
thetaj[2] = mx_total/m_total
# 一共迭代1000次来估算参数
for j in range(1000):
em(thetaj, obs)
print("Type: a,b,c")
print("True: " + str(a_true) + "," + str(b_true) + "," + str(c_true))
print("Initialization: " + str(a0) + "," + str(b0) + "," + str(c0))
print("Estimated: " + ",".join([str(i) for i in thetaj]))
# 输出
Type: a,b,c
True: 0.9,0.1,0.8
Initialization: 0.45,0.2,0.98
Estimated: 0.8250824695060178,0.030452224013034972,0.8602591368880216
六、扩展
类似的,可以扩展到其他分布类型,例如混合高斯。只需要根据分布来修改 Q ( θ , θ j ) Q(\theta, \theta_j) Q(θ,θj)中的 p ( Z i , X i ; θ j ) p(Z_i, X_i; \theta_j) p(Zi,Xi;θj)表达式,再分别对要估算的参数求偏导数就可以。例如现有某个年级100名学生的身高的测量值,要求估算出男生和女生身高的均值和方差,也即4个参数需要估计。假设男生和女生身高呈现正态分布。这里的 Z Z Z随机变量表示该观测值是来自男生还是女生,是一个 0 − 1 0-1 0−1分布。 X X X随机变量服从混合高斯分布。在给定 Z Z Z的前提下 X X X服从一维正态分布。
七、附录一
由Jensen不等式可知,若要上面的等号成立,
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
B
(
Z
i
=
z
;
θ
)
\frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)}
B(Zi=z;θ)p(Xi=xi,Zi=z;θ)必须是一个常数。现在证明当
B
(
Z
i
=
z
;
θ
)
=
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
,
Z
i
=
0
,
1
,
i
=
1
,
2
,
…
,
10
,
且
∑
z
=
0
1
B
(
Z
i
=
z
;
θ
)
=
1
B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10, 且\sum_{z=0}^1 B(Z_i=z; \theta) =1
B(Zi=z;θ)=p(Zi=z∣Xi=xi;θ),Zi=0,1,i=1,2,…,10,且∑z=01B(Zi=z;θ)=1时,
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
B
(
Z
i
=
z
;
θ
)
\frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)}
B(Zi=z;θ)p(Xi=xi,Zi=z;θ)是个常数。
设
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
B
(
Z
i
=
z
;
θ
)
=
c
\frac{p(X_i=x_i, Z_i=z; \theta)}{B(Z_i=z; \theta)} = c
B(Zi=z;θ)p(Xi=xi,Zi=z;θ)=c,
c
c
c是任意一个常数,则:
B
(
Z
i
=
z
;
θ
)
=
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
c
=
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
∑
z
=
0
1
c
B
(
Z
i
=
z
;
θ
)
=
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
∑
z
=
0
1
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
=
p
(
X
i
=
x
i
,
Z
i
=
z
;
θ
)
p
(
X
i
=
x
i
;
θ
)
=
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
B(Z_i=z; \theta) = \frac {p(X_i=x_i, Z_i=z; \theta)} {c} = \frac {p(X_i=x_i, Z_i=z; \theta)} { \sum_{z=0}^1 c B(Z_i=z; \theta)} \\ = \frac {p(X_i=x_i, Z_i=z; \theta)} {\sum_{z=0}^1 p(X_i=x_i, Z_i=z; \theta)} \\ = \frac {p(X_i=x_i, Z_i=z; \theta)} {p(X_i=x_i; \theta)} \\ =p(Z_i=z|X_i=x_i; \theta)
B(Zi=z;θ)=cp(Xi=xi,Zi=z;θ)=∑z=01cB(Zi=z;θ)p(Xi=xi,Zi=z;θ)=∑z=01p(Xi=xi,Zi=z;θ)p(Xi=xi,Zi=z;θ)=p(Xi=xi;θ)p(Xi=xi,Zi=z;θ)=p(Zi=z∣Xi=xi;θ)
所以,当
B
(
Z
i
=
z
;
θ
)
=
p
(
Z
i
=
z
∣
X
i
=
x
i
;
θ
)
,
Z
i
=
0
,
1
,
i
=
1
,
2
,
…
,
10
B(Z_i=z; \theta) = p(Z_i=z|X_i=x_i; \theta), Z_i=0,1,i=1,2,\ldots,10
B(Zi=z;θ)=p(Zi=z∣Xi=xi;θ),Zi=0,1,i=1,2,…,10时上述等号成立。