The EM Algorithm
The EM algorithm is an iterative method for estimating the parameters of probabilistic models that contain latent (hidden) variables. Each iteration alternates between two steps: the E-step (Expectation step) and the M-step (Maximization step). In the E-step, the posterior distribution of the latent variables is computed using the current parameter estimate; in the M-step, that posterior is used to maximize the expected complete-data log-likelihood, yielding a new parameter estimate. The two steps are repeated until convergence.
General procedure of the EM algorithm
The EM algorithm iteratively computes the maximum likelihood estimate of $L(\theta)=\log P(Y \mid \theta)$. Each iteration consists of two steps: the E-step, which takes an expectation, and the M-step, which performs a maximization.
Input: observed data $Y$, latent data $Z$, the joint distribution $P(Y, Z \mid \theta)$, and the conditional distribution $P(Z \mid Y, \theta)$;
Output: the model parameter $\theta$.
The iteration proceeds as follows:
(1) Choose an initial value $\theta^{(0)}$ for the parameter and start iterating;
(2) E-step: let $\theta^{(i)}$ be the estimate of $\theta$ at the $i$-th iteration. At the E-step of iteration $i+1$, compute

$$
\begin{aligned}
Q\left(\theta, \theta^{(i)}\right) & =E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\
& =\sum_Z \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right)
\end{aligned}
$$

Here $P\left(Z \mid Y, \theta^{(i)}\right)$ is the conditional distribution of the latent data $Z$ given the observed data $Y$ and the current parameter estimate $\theta^{(i)}$;
(3) M-step: find the $\theta$ that maximizes $Q\left(\theta, \theta^{(i)}\right)$; this becomes the parameter estimate of iteration $i+1$:

$$
\theta^{(i+1)}=\arg \max_{\theta} Q\left(\theta, \theta^{(i)}\right)
$$
(4) Substitute the resulting $\theta^{(i+1)}$ back in and repeat steps (2) and (3) until convergence.
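To make the iteration concrete, here is a minimal Python sketch of the generic EM loop. The names (`em`, `e_step`, `m_step`, `theta0`) are illustrative placeholders rather than part of the original procedure; all problem-specific work happens inside the two supplied callbacks, and $\theta$ is assumed to be a tuple of scalars.

```python
# A minimal sketch of the generic EM loop, assuming theta is a tuple of scalars
# and that the caller supplies problem-specific e_step / m_step functions.
def em(theta0, e_step, m_step, tol=1e-8, max_iter=1000):
    theta = theta0
    for _ in range(max_iter):
        posterior = e_step(theta)      # E-step: compute P(Z | Y, theta^(i))
        theta_new = m_step(posterior)  # M-step: argmax_theta Q(theta, theta^(i))
        # stop once the parameters stop changing noticeably
        if all(abs(a - b) < tol for a, b in zip(theta_new, theta)):
            theta = theta_new
            break
        theta = theta_new
    return theta
```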
The three-coin model
We first walk through an example of the EM algorithm (Example 9.1, page 175 of Li Hang's *Statistical Learning Methods*, 2nd edition). The book does not show how the parameter estimates are obtained, so a more elementary and detailed derivation is given here.
Example 9.1 (three-coin model). Suppose there are three coins, labelled A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. The experiment is as follows: first toss coin A; based on its outcome, select coin B if A lands heads and coin C if A lands tails; then toss the selected coin, recording heads as 1 and tails as 0. The experiment is repeated independently $n$ times (here $n=10$), with the following observations:
$$
1,\,1,\,0,\,1,\,0,\,0,\,1,\,0,\,1,\,1
$$
Assume we can only observe the final coin-toss outcomes, not the tossing process; that is, we never see whether coin A landed heads or tails in any trial. The question is how to estimate the probabilities of heads for the three coins, i.e. the parameters of the three-coin model, from a batch of such observations.
Expressed mathematically, the model is
$$
P(y \mid \theta)=\sum_z P(y, z \mid \theta)=\sum_z P(z \mid \theta)\, P(y \mid z, \theta)
$$
$$
\begin{cases}
p(y=1 \mid \theta)=\pi p+(1-\pi) q, & \text{heads observed} \\
p(y=0 \mid \theta)=\pi(1-p)+(1-\pi)(1-q), & \text{tails observed}
\end{cases}
$$

so that $p(y \mid \theta)=\pi p^y(1-p)^{1-y}+(1-\pi)\, q^y(1-q)^{1-y}$, with $y \in\{0,1\}$.
In this model, $y$ denotes the final observation of a single trial (1 or 0); the random variable $z$ is a latent variable indicating the unobserved outcome of coin A (heads or tails); and $\theta=(\pi, p, q)$ are the model parameters.
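As a quick illustration of this mixture form, the sketch below evaluates $p(y \mid \theta)$ directly from the formula above; the function name `p_y_given_theta` is made up for this example.

```python
# A small sketch evaluating p(y | theta) = pi*p^y*(1-p)^(1-y) + (1-pi)*q^y*(1-q)^(1-y)
def p_y_given_theta(y, pi_, p, q):
    return pi_ * p**y * (1 - p)**(1 - y) + (1 - pi_) * q**y * (1 - q)**(1 - y)

# Example: with pi = p = q = 0.5 every observation has probability 0.5.
print(p_y_given_theta(1, 0.5, 0.5, 0.5))  # 0.5
```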
The likelihood function of the observed data $Y$ is

$$
P(Y \mid \theta)=\sum_Z P(Z \mid \theta)\, P(Y \mid Z, \theta)
$$
$$
P(Y \mid \theta)=\prod_{j=1}^n\left[\pi p^{y_j}(1-p)^{1-y_j}+(1-\pi) q^{y_j}(1-q)^{1-y_j}\right]
$$
Our goal is the maximum likelihood estimate of the model parameters $\theta=(\pi, p, q)$:

$$
\hat{\theta}=\arg\max_{\theta} \log P(Y \mid \theta)
$$
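This observed-data log-likelihood is the quantity EM ultimately tries to increase, so it is also useful for monitoring convergence (it should never decrease across iterations). A small sketch for evaluating it, with illustrative names:

```python
# A sketch of the observed-data log-likelihood log P(Y | theta) for the
# three-coin model; useful for checking that EM iterations never decrease it.
import numpy as np

def log_likelihood(Y, pi_, p, q):
    Y = np.asarray(Y)
    probs = pi_ * p**Y * (1 - p)**(1 - Y) + (1 - pi_) * q**Y * (1 - q)**(1 - Y)
    return np.sum(np.log(probs))

Y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(log_likelihood(Y, 0.5, 0.5, 0.5))  # 10 * log(0.5) ~ -6.93
```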
We now solve the three-coin model above with the EM algorithm.
A quick review of conditional expectation
Conditional expectation is the expectation of a random variable given some conditioning information. Concretely, let $X$ and $Y$ be two discrete random variables and let $x_1, x_2, \ldots, x_n$ be the possible values of $X$. Given $Y=y$, the conditional expectation of $X$ is defined as

$$
E(X \mid Y=y)=\sum_{i=1}^n x_i\, P\left(X=x_i \mid Y=y\right)
$$

That is, each possible value $x_i$ of $X$ is weighted by the probability that $X$ takes that value given $Y=y$, and the weighted values are summed to give the expectation of $X$ under the condition $Y=y$.
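A tiny numeric sketch may help; the joint distribution below is made up purely to illustrate the definition.

```python
# Made-up joint distribution P(X=x, Y=y) for X, Y in {0, 1}
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

def cond_expectation(y):
    # E[X | Y=y] = sum_x x * P(X=x | Y=y)
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

print(cond_expectation(0))  # 0.3 / 0.4 = 0.75
print(cond_expectation(1))  # 0.4 / 0.6 ~ 0.667
```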
Applying EM to the three-coin model
Here we focus on unpacking the E-step and M-step formulas; the parameter $\theta$ obtained after the M-step is then fed into the next iteration.
E-step
We start from the $Q$ function introduced in the general procedure above (we use it without deriving it; see the references at the end of the article for a detailed derivation of the $Q$ function):
$$
\begin{aligned}
Q\left(\theta, \theta^{(i)}\right) & =E_Z\left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\
& =\sum_Z \log P(Y, Z \mid \theta)\, P\left(Z \mid Y, \theta^{(i)}\right)
\end{aligned}
$$
First consider the second factor, $P\left(Z \mid Y, \theta^{(i)}\right)$, split according to the value of $z$.
For the case $z=1$ (coin A lands heads; we write the latent variable as $z=1$, meaning the observation $y$ comes from coin B), we apply Bayes' rule:
$$
\begin{aligned}
P\left(z=1 \mid y, \theta^{(i)}\right)=\frac{P\left(z=1, y \mid \theta^{(i)}\right)}{P\left(y \mid \theta^{(i)}\right)}
& =\frac{P\left(z=1 \mid \theta^{(i)}\right) P\left(y \mid z=1, \theta^{(i)}\right)}{\sum_z P\left(z \mid \theta^{(i)}\right) P\left(y \mid z, \theta^{(i)}\right)} \\
& =\frac{P\left(z=1 \mid \theta^{(i)}\right) P\left(y \mid z=1, \theta^{(i)}\right)}{P\left(z=1 \mid \theta^{(i)}\right) P\left(y \mid z=1, \theta^{(i)}\right)+P\left(z=0 \mid \theta^{(i)}\right) P\left(y \mid z=0, \theta^{(i)}\right)} \\
& =\frac{\pi p^y(1-p)^{1-y}}{\pi p^y(1-p)^{1-y}+(1-\pi) q^y(1-q)^{1-y}}
\end{aligned}
$$

where $\pi$, $p$, $q$ denote the current values from $\theta^{(i)}$.
Similarly, for $z=0$:
$$
\begin{aligned}
P\left(z=0 \mid y, \theta^{(i)}\right)=\frac{P\left(z=0, y \mid \theta^{(i)}\right)}{P\left(y \mid \theta^{(i)}\right)}
& =\frac{P\left(z=0 \mid \theta^{(i)}\right) P\left(y \mid z=0, \theta^{(i)}\right)}{\sum_z P\left(z \mid \theta^{(i)}\right) P\left(y \mid z, \theta^{(i)}\right)} \\
& =\frac{P\left(z=0 \mid \theta^{(i)}\right) P\left(y \mid z=0, \theta^{(i)}\right)}{P\left(z=1 \mid \theta^{(i)}\right) P\left(y \mid z=1, \theta^{(i)}\right)+P\left(z=0 \mid \theta^{(i)}\right) P\left(y \mid z=0, \theta^{(i)}\right)} \\
& =\frac{(1-\pi) q^y(1-q)^{1-y}}{\pi p^y(1-p)^{1-y}+(1-\pi) q^y(1-q)^{1-y}}
\end{aligned}
$$
Let $\mu^{(i)}=P\left(z=1 \mid y, \theta^{(i)}\right)$ denote the probability that the observation $y$ comes from coin B (A lands heads); the probability that it comes from coin C (A lands tails) is then $1-\mu^{(i)}$.
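In code, this E-step quantity (often called the responsibility) follows directly from the Bayes derivation above. The sketch below uses illustrative names and assumes $\pi$, $p$, $q$ are the current estimates from $\theta^{(i)}$.

```python
# A sketch of the E-step for one observation: mu = P(z=1 | y, theta^(i))
def responsibility(y, pi_, p, q):
    b = pi_ * p**y * (1 - p)**(1 - y)        # joint contribution of coin B
    c = (1 - pi_) * q**y * (1 - q)**(1 - y)  # joint contribution of coin C
    return b / (b + c)

# With pi = p = q = 0.5 every observation gives mu = 0.5, matching the
# first iteration of the worked example later in this article.
print(responsibility(1, 0.5, 0.5, 0.5))  # 0.5
```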
We can now expand $Q\left(\theta, \theta^{(i)}\right)$ further (written here for a single observation $y$, with latent variable $z$):

$$
\begin{aligned}
Q\left(\theta, \theta^{(i)}\right) & =E_z\left[\log P(y, z \mid \theta) \mid y, \theta^{(i)}\right] \\
& =\sum_z \log P(y, z \mid \theta)\, P\left(z \mid y, \theta^{(i)}\right) \\
& =\log P(y, z=1 \mid \theta)\, P\left(z=1 \mid y, \theta^{(i)}\right)+\log P(y, z=0 \mid \theta)\, P\left(z=0 \mid y, \theta^{(i)}\right) \\
& =\log \left[P(z=1 \mid \theta) P(y \mid z=1, \theta)\right] \cdot \mu^{(i)}+\log \left[P(z=0 \mid \theta) P(y \mid z=0, \theta)\right] \cdot\left(1-\mu^{(i)}\right) \\
& =\mu^{(i)} \cdot \log \left[\pi p^y(1-p)^{1-y}\right]+\left(1-\mu^{(i)}\right) \cdot \log \left[(1-\pi) q^y(1-q)^{1-y}\right] \\
& =\mu^{(i)}\left[\log \pi+\log p^y(1-p)^{1-y}\right]+\left(1-\mu^{(i)}\right)\left[\log (1-\pi)+\log q^y(1-q)^{1-y}\right]
\end{aligned}
$$
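As a sanity check, the last line of this expansion can be evaluated numerically. The sketch below computes $Q$ for a single observation under assumed current values; the names are illustrative, and `math.log` would fail at boundary values such as $\pi \in \{0, 1\}$, which is acceptable for a sketch.

```python
# Evaluate the single-observation Q(theta, theta^(i)) from the last line above.
import math

def q_single(y, theta, mu):
    pi_, p, q = theta
    term_b = mu * (math.log(pi_) + y * math.log(p) + (1 - y) * math.log(1 - p))
    term_c = (1 - mu) * (math.log(1 - q_complement := 1 - (1 - q)) if False else
              math.log(1 - pi_) + y * math.log(q) + (1 - y) * math.log(1 - q)) if False else \
             (1 - mu) * (math.log(1 - pi_) + y * math.log(q) + (1 - y) * math.log(1 - q))
    return term_b + term_c

print(q_single(1, (0.5, 0.5, 0.5), 0.5))  # 2 * 0.5 * (log 0.5 + log 0.5) ~ -1.386
```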
M-step
Find the $\theta$ that maximizes $Q\left(\theta, \theta^{(i)}\right)$, giving the parameter estimate $\theta^{(i+1)}$ of iteration $i+1$. With $n$ observations $y_j$, $1 \leq j \leq n$, the $Q$ function over the whole sample can be written as $Q^{\prime}=\sum_{j=1}^{n} Q\left(\theta, \theta^{(i)}\right)$, where the $j$-th term is the single-observation expression above evaluated at $y_j$ and $\mu_j^{(i)}$. To find the $\theta=(\pi, p, q)$ that maximizes $Q^{\prime}$, we set the partial derivatives of $Q^{\prime}$ to $0$ and solve for each parameter in turn.
The derivation for $\pi$ is as follows:
$$
\begin{aligned}
\frac{\partial Q^{\prime}}{\partial \pi} &=\sum_{j=1}^{n} \frac{\partial}{\partial \pi}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right)\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\
&=\sum_{j=1}^{n} \frac{\partial}{\partial \pi}\left[\mu_j^{(i)} \log \pi+\left(1-\mu_j^{(i)}\right) \log (1-\pi)\right] \\
&=\sum_{j=1}^{n}\left[\frac{\mu_j^{(i)}}{\pi}+\frac{\mu_j^{(i)}-1}{1-\pi}\right]=0
\end{aligned}
$$

Multiplying through by $\pi(1-\pi)$ gives $\sum_{j=1}^{n}\left[(1-\pi) \mu_j^{(i)}+\pi\left(\mu_j^{(i)}-1\right)\right]=0$, i.e. $\sum_{j=1}^{n} \mu_j^{(i)}=\sum_{j=1}^{n} \pi$.
That is, $\sum_{j=1}^{n} \mu_j^{(i)}=n \pi$, and therefore

$$
\pi=\frac{1}{n} \sum_{j=1}^{n} \mu_j^{(i)}
$$
Following the same steps, setting $\sum_{j=1}^{n} \frac{\partial Q}{\partial p}=0$ and $\sum_{j=1}^{n} \frac{\partial Q}{\partial q}=0$ yields the remaining parameters, completing the full update of $\theta=(\pi, p, q)$ for the next round of iteration.
Setting $\sum_{j=1}^{n} \frac{\partial Q}{\partial p}=0$:

$$
\begin{aligned}
0 &=\sum_{j=1}^{n} \frac{\partial}{\partial p}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right)\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\
&=\sum_{j=1}^{n} \frac{\partial}{\partial p}\left[\mu_j^{(i)} \log p^{y_j}(1-p)^{1-y_j}\right] \\
&=\sum_{j=1}^{n} \frac{\partial}{\partial p}\left[y_j \mu_j^{(i)} \log p+\mu_j^{(i)}\left(1-y_j\right) \log (1-p)\right] \\
&=\sum_{j=1}^{n}\left[\frac{y_j \mu_j^{(i)}}{p}-\frac{\mu_j^{(i)}\left(1-y_j\right)}{1-p}\right]
\end{aligned}
$$

Multiplying through by $p(1-p)$ and simplifying gives $\sum_{j=1}^{n}\left[y_j \mu_j^{(i)}-p\, \mu_j^{(i)}\right]=0$.
That is, $\sum_{j=1}^{n} y_j \mu_j^{(i)}=\sum_{j=1}^{n} p\, \mu_j^{(i)}$, so

$$
p=\frac{\sum_{j=1}^{n} y_j \mu_j^{(i)}}{\sum_{j=1}^{n} \mu_j^{(i)}}
$$
Next, setting $\sum_{j=1}^{n} \frac{\partial Q}{\partial q}=0$:

$$
\begin{aligned}
0 &=\sum_{j=1}^{n} \frac{\partial}{\partial q}\left[\mu_j^{(i)}\left[\log \pi+\log p^{y_j}(1-p)^{1-y_j}\right]+\left(1-\mu_j^{(i)}\right)\left[\log (1-\pi)+\log q^{y_j}(1-q)^{1-y_j}\right]\right] \\
&=\sum_{j=1}^{n} \frac{\partial}{\partial q}\left[\left(1-\mu_j^{(i)}\right) y_j \log q+\left(1-\mu_j^{(i)}\right)\left(1-y_j\right) \log (1-q)\right] \\
&=\sum_{j=1}^{n}\left[\frac{\left(1-\mu_j^{(i)}\right) y_j}{q}-\frac{\left(1-\mu_j^{(i)}\right)\left(1-y_j\right)}{1-q}\right]
\end{aligned}
$$

Multiplying through by $q(1-q)$ and expanding:

$$
\begin{aligned}
0 &=\sum_{j=1}^{n}\left[y_j(1-q)-\mu_j^{(i)} y_j(1-q)-q\left(1-y_j\right)+q\, \mu_j^{(i)}\left(1-y_j\right)\right] \\
&=\sum_{j=1}^{n}\left[y_j-\mu_j^{(i)} y_j-q+\mu_j^{(i)} q\right]
\end{aligned}
$$
That is, $\sum_{j=1}^{n}\left(y_j-\mu_j^{(i)} y_j\right)=\sum_{j=1}^{n} q\left(1-\mu_j^{(i)}\right)$, so

$$
q=\frac{\sum_{j=1}^{n}\left(1-\mu_j^{(i)}\right) y_j}{\sum_{j=1}^{n}\left(1-\mu_j^{(i)}\right)}
$$
We have now obtained all three parameter updates for $\theta=(\pi, p, q)$:

$$
\pi=\frac{1}{n} \sum_{j=1}^{n} \mu_j^{(i)}, \qquad
p=\frac{\sum_{j=1}^{n} y_j \mu_j^{(i)}}{\sum_{j=1}^{n} \mu_j^{(i)}}, \qquad
q=\frac{\sum_{j=1}^{n}\left(1-\mu_j^{(i)}\right) y_j}{\sum_{j=1}^{n}\left(1-\mu_j^{(i)}\right)}
$$
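These closed-form updates are easy to implement for a whole sample. A minimal sketch, assuming `mu` holds the E-step responsibilities $\mu_j^{(i)}$ for each observation (names are illustrative):

```python
# A sketch of the M-step updates derived above, vectorized over the sample.
import numpy as np

def m_step(Y, mu):
    Y, mu = np.asarray(Y, dtype=float), np.asarray(mu, dtype=float)
    pi_ = mu.mean()                            # pi = (1/n) * sum_j mu_j
    p = np.sum(Y * mu) / np.sum(mu)            # p  = sum_j y_j*mu_j / sum_j mu_j
    q = np.sum(Y * (1 - mu)) / np.sum(1 - mu)  # q  = sum_j y_j*(1-mu_j) / sum_j (1-mu_j)
    return pi_, p, q
```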
Given these update rules and the sample $1, 1, 0, 1, 0, 0, 1, 0, 1, 1$,
we take initial parameters $\pi^{(0)}=0.5$, $p^{(0)}=0.5$, $q^{(0)}=0.5$.
From the expression for $\mu_j^{(i)}$, whether $y_j=0$ or $y_j=1$ we obtain $\mu_j^{(0)}=0.5$ for every $j$.
So for the next round

$$
\begin{aligned}
\pi^{(1)} & =\frac{1}{10}\left(\mu_1^{(0)}+\mu_2^{(0)}+\cdots+\mu_{10}^{(0)}\right)=0.5 \\
p^{(1)} & =\frac{y_1 \mu_1^{(0)}+y_2 \mu_2^{(0)}+\cdots+y_{10} \mu_{10}^{(0)}}{\mu_1^{(0)}+\mu_2^{(0)}+\cdots+\mu_{10}^{(0)}}=\frac{6 \times 0.5}{10 \times 0.5}=0.6 \\
q^{(1)} & =\frac{\left(1-\mu_1^{(0)}\right) y_1+\left(1-\mu_2^{(0)}\right) y_2+\cdots+\left(1-\mu_{10}^{(0)}\right) y_{10}}{\left(1-\mu_1^{(0)}\right)+\left(1-\mu_2^{(0)}\right)+\cdots+\left(1-\mu_{10}^{(0)}\right)}=\frac{6 \times 0.5}{10-0.5 \times 10}=0.6
\end{aligned}
$$
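The same first-iteration numbers can be reproduced in a few lines of NumPy, combining the E-step and M-step expressions above:

```python
# Verify the hand calculation: pi^(1) = 0.5, p^(1) = 0.6, q^(1) = 0.6
import numpy as np

Y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
pi_, p, q = 0.5, 0.5, 0.5
mu = pi_ * p**Y * (1 - p)**(1 - Y) / (
    pi_ * p**Y * (1 - p)**(1 - Y) + (1 - pi_) * q**Y * (1 - q)**(1 - Y))
print(mu.mean())                              # pi^(1) = 0.5
print(np.sum(Y * mu) / np.sum(mu))            # p^(1)  = 0.6
print(np.sum(Y * (1 - mu)) / np.sum(1 - mu))  # q^(1)  = 0.6
```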
Iterating in this way, the parameters $\theta=(\pi, p, q)$ eventually converge to final values, which we take as the estimated parameters of the three-coin model.
Simulating the experiment in Python to check that the method works
The code below generates experimental data, estimates the parameters with the EM algorithm, and then reruns the simulation with the estimated parameters to check the model's predictive ability.
```python
import random
import time

import numpy as np


def coin_experiment(pi, p, q, n):
    """
    Simulate the coin-tossing experiment.
    pi: probability that coin A lands heads
    p:  probability that coin B lands heads
    q:  probability that coin C lands heads
    n:  number of trials
    """
    results = []    # observed outcomes y_j
    results_A = []  # hidden outcomes of coin A (1 = heads, i.e. coin B chosen)
    results_B = []  # outcomes of coin B when it was chosen
    results_C = []  # outcomes of coin C when it was chosen
    for _ in range(n):
        # First toss coin A to decide which coin to toss next
        if random.random() < pi:
            coin = 'B'
            p_head = p
        else:
            coin = 'C'
            p_head = q
        # Toss the selected coin and record the observed outcome
        outcome = 1 if random.random() < p_head else 0
        results.append(outcome)
        # Record the same outcome against the coin that produced it
        if coin == 'B':
            results_B.append(outcome)
            results_A.append(1)
        else:
            results_C.append(outcome)
            results_A.append(0)
    # Empirical probabilities of heads for coins A, B, C
    p_A = sum(results_A) / len(results_A)
    p_B = sum(results_B) / len(results_B)
    p_C = sum(results_C) / len(results_C)
    return results, p_A, p_B, p_C


pi = 0.2
p = 0.3
q = 0.8
n = 100000

s = time.time()
print(f'Starting simulation with pi={pi}, p={p}, q={q} ...')
Y, pi, p, q = coin_experiment(pi, p, q, n)
Y = np.array(Y)
print(f'Simulation finished: {n} trials in {time.time()-s:.2f}s')
print(f'Empirical pi={pi}, p={p}, q={q}, overall P(Y=1)={Y.sum()/len(Y)}')

# Recover the parameters with EM, starting from arbitrary initial values
pi_0 = 0.9
p_0 = 0.8
q_0 = 0.9
episodes = 1000  # number of EM iterations
count = 1
while count <= episodes:
    # E-step: responsibilities mu_j = P(z=1 | y_j, theta^(i))
    mu = pi_0 * p_0**Y * (1-p_0)**(1-Y) / (
        pi_0 * p_0**Y * (1-p_0)**(1-Y) + (1-pi_0) * q_0**Y * (1-q_0)**(1-Y))
    # M-step: closed-form updates derived above
    pi = mu.sum() / n
    p = (Y * mu).sum() / mu.sum()
    q = ((1 - mu) * Y).sum() / (1 - mu).sum()
    if count % 100 == 0:
        print(f'Iteration {count}: estimated pi={pi}, p={p}, q={q}')
    pi_0 = pi
    p_0 = p
    q_0 = q
    count += 1

# Rerun the simulation with the estimated parameters
s = time.time()
print(f'Second simulation with estimated pi={pi}, p={p}, q={q} ...')
Y, pi, p, q = coin_experiment(pi, p, q, n)
Y = np.array(Y)
print(f'Simulation finished: {n} trials in {time.time()-s:.2f}s')
print(f'Empirical pi={pi}, p={p}, q={q}, overall P(Y=1)={Y.sum()/len(Y)}')
```
Looking at the run: with the true parameters pi = 0.2, p = 0.3, q = 0.8 we simulate 100000 trials. During the simulation the empirical pi, p, q stay very close to the values we set, confirming the data really were generated with those parameters, and the overall probability of y = 1 is about 0.696. After 1000 EM iterations the estimated parameters are pi = 0.91, p = 0.68, q = 0.83, which are far from the true parameters. However, rerunning 100000 trials with these estimates gives an overall probability of y = 1 of about 0.6983, very close to the true model. In other words, the observed outcomes alone do not identify the individual parameters (the EM solution also depends on the initial values), but the estimated parameters reproduce the observed distribution of y accurately, so the fitted model is still effective for simulating the data.
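The gap between the estimated and true parameters is not surprising: the observed data only constrain the marginal probability of heads, $P(y=1)=\pi p+(1-\pi) q$. A quick sketch, using the parameter values quoted above, shows that both parameter sets give nearly the same marginal:

```python
# Both the true parameters and the EM estimates reproduce almost the same P(y=1).
def p_heads(pi_, p, q):
    return pi_ * p + (1 - pi_) * q

print(p_heads(0.2, 0.3, 0.8))     # true parameters -> 0.70
print(p_heads(0.91, 0.68, 0.83))  # EM estimates    -> ~0.69
```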