For the answers to the other chapters, see my summary post (统计学习方法答案汇总); all the answers are my own.
9.1. Consider the three-coin model of Example 9.1. Keeping the observed data unchanged, choose different initial values, e.g. $\pi^{(0)}=0.46,\ p^{(0)}=0.55,\ q^{(0)}=0.67$, and find the maximum likelihood estimate of the model parameters $\theta = (\pi, p, q)$.
Solution:
If you simply plug into the formulas given in the book, this problem is easy; what matters is understanding how the EM algorithm applies to it. For the detailed derivation of the three-coin model, see my post 三硬币模型的具体推导.
```python
import numpy as np


def solution(data, pi, p, q):
    '''
    Parameters
    ----------
    data : numpy array
        The observed data.
    pi, p, q : float
        Initial values of the model parameters.

    Returns
    -------
    The model parameters after the iteration converges.
    '''
    eps = 1e-3
    n = len(data)  # number of observations
    pi_i, p_i, q_i = pi, p, q
    while True:
        # E-step: mu[j] is the probability that observation j came from coin B
        mu = np.zeros(shape=n)
        for j in range(n):
            num = pi_i * p_i ** data[j] * (1 - p_i) ** (1 - data[j])
            den = num + (1 - pi_i) * q_i ** data[j] * (1 - q_i) ** (1 - data[j])
            mu[j] = num / den
        # M-step: update the parameters
        pi_ = np.mean(mu)
        p_ = sum(mu * data) / sum(mu)
        q_ = sum((1 - mu) * data) / sum(1 - mu)
        if (pi_ - pi_i) ** 2 < eps and (p_ - p_i) ** 2 < eps and (q_ - q_i) ** 2 < eps:
            break
        pi_i, p_i, q_i = pi_, p_, q_
    return pi_, p_, q_


if __name__ == '__main__':
    data = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
    pi, p, q = 0.46, 0.55, 0.67
    r = solution(data, pi, p, q)
    print(r)
```
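Since both EM steps are elementwise operations over the observations, the inner loop over `j` can be collapsed into array operations. Below is a vectorized sketch of a single iteration (my own rewrite, not the book's; the name `em_step` is made up):

```python
import numpy as np

def em_step(y, pi, p, q):
    # E-step: mu[j] = P(hidden coin A came up heads | y[j])
    num = pi * p ** y * (1 - p) ** (1 - y)
    mu = num / (num + (1 - pi) * q ** y * (1 - q) ** (1 - y))
    # M-step: pi is the mean responsibility; p, q are responsibility-weighted means of y
    return mu.mean(), (mu * y).sum() / mu.sum(), ((1 - mu) * y).sum() / (1 - mu).sum()

y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
print(em_step(y, 0.46, 0.55, 0.67))
```

One property is visible directly in this form: if $p = q$, every responsibility $\mu_j$ equals $\pi$, so $\pi$ never moves and both $p$ and $q$ jump to the sample mean. This is one reason the choice of initial values matters in this exercise.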
9.2. Prove Lemma 9.2.
Proof:
We are given
$$\tilde{P}_{\theta}(Z) = P(Z\mid Y,\theta) \tag{1}$$
Substituting equation (1) into $F(\tilde{P}, \theta)$ gives
$$\begin{aligned}
F(\tilde{P}, \theta) &= E_{\tilde P}[\log P(Y,Z\mid\theta)] + H(\tilde P) \\
&= \sum_{Z}\log P(Y,Z\mid\theta)\,P(Z\mid Y,\theta) - \sum_{Z}\log P(Z\mid Y,\theta)\,P(Z\mid Y,\theta) \\
&= \sum_{Z}P(Z\mid Y,\theta)\log\frac{P(Y,Z\mid\theta)}{P(Z\mid Y,\theta)} \\
&= \sum_{Z}P(Z\mid Y,\theta)\log P(Y\mid\theta) \\
&= \log P(Y\mid\theta)\sum_{Z}P(Z\mid Y,\theta) = \log P(Y\mid\theta)\cdot 1 = \log P(Y\mid\theta)
\end{aligned}$$
where the fourth line uses $P(Y,Z\mid\theta)/P(Z\mid Y,\theta) = P(Y\mid\theta)$, and the last line uses the fact that $P(Z\mid Y,\theta)$ sums to one over $Z$.
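The identity is also easy to verify numerically for a latent variable with two states (the joint probabilities below are made-up numbers, purely for illustration):

```python
import numpy as np

# Check of Lemma 9.2: with P~(Z) = P(Z|Y,theta),
# F(P~, theta) = E_{P~}[log P(Y,Z|theta)] + H(P~) equals log P(Y|theta).
joint = np.array([0.12, 0.28])   # P(Y = y_obs, Z = z | theta) for z = 0, 1 (toy values)
p_y = joint.sum()                # P(Y = y_obs | theta)
post = joint / p_y               # P(Z | Y = y_obs, theta), i.e. the choice of P~
F = (post * np.log(joint)).sum() - (post * np.log(post)).sum()
print(F, np.log(p_y))            # both print log P(Y|theta)
```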
9.3. Given the observed data −67, −48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75, estimate the five parameters of a two-component Gaussian mixture model.
Solution: The Gaussian mixture model is very similar to the three-coin model; for the detailed derivation, see my post 三硬币模型的具体推导.
The code for this problem is given below.
```python
import numpy as np


def solution(observations, alpha, gauss_para1, gauss_para2):
    '''
    Parameters
    ----------
    observations : numpy array
        The observed data.
    alpha : float
        Probability of choosing the first Gaussian component.
    gauss_para1, gauss_para2 : list of float
        [mean, std] of the first / second Gaussian component.

    Returns
    -------
    The five parameters of the Gaussian mixture model.
    '''
    def density(y, mean, std):
        return np.exp(-(y - mean) ** 2 / (2 * std ** 2)) / ((2 * np.pi) ** 0.5 * std)

    n = len(observations)
    alpha_i = alpha
    gauss1_i = gauss_para1
    gauss2_i = gauss_para2
    while True:
        # E-step: gamma[j] is the responsibility of component 1 for observation j
        w1 = alpha_i * density(observations, gauss1_i[0], gauss1_i[1])
        w2 = (1 - alpha_i) * density(observations, gauss2_i[0], gauss2_i[1])
        gamma = w1 / (w1 + w2)
        # M-step: update the means first; each std is computed around its own
        # component's new mean (the original code mistakenly reused component 1's
        # mean in component 2's variance)
        mu1 = sum(gamma * observations) / sum(gamma)
        mu2 = sum((1 - gamma) * observations) / sum(1 - gamma)
        gauss1_ = [mu1, (sum(gamma * (observations - mu1) ** 2) / sum(gamma)) ** 0.5]
        gauss2_ = [mu2, (sum((1 - gamma) * (observations - mu2) ** 2) / sum(1 - gamma)) ** 0.5]
        alpha_ = sum(gamma) / n
        if (alpha_ - alpha_i) ** 2 + (gauss1_[0] - gauss1_i[0]) ** 2 + (gauss1_[1] - gauss1_i[1]) ** 2 \
                + (gauss2_[0] - gauss2_i[0]) ** 2 + (gauss2_[1] - gauss2_i[1]) ** 2 < 1e-5:
            break
        gauss1_i, gauss2_i, alpha_i = gauss1_, gauss2_, alpha_
    return alpha_, gauss1_, gauss2_


if __name__ == '__main__':
    # Only the probability of the first component is needed;
    # the second component then has probability 1 - alpha
    alpha = 0.5
    observations = np.array([-67, -48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75])
    # The two components must start from different parameters: if they are
    # initialized identically, every responsibility equals alpha and EM is
    # stuck at a symmetric fixed point where the components never separate.
    gauss_para1 = [-50., 20.]
    gauss_para2 = [30., 20.]
    r = solution(observations, alpha, gauss_para1, gauss_para2)
    print(r)
```
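A useful sanity check on any EM implementation is the monotonicity property: each iteration must not decrease the observed-data log-likelihood. The small helper below (my own addition, not part of the book's solution) computes that log-likelihood for given parameters, so it can be printed inside the loop to verify the iterates:

```python
import numpy as np

def gmm_loglik(y, alpha, g1, g2):
    """Observed-data log-likelihood of the two-component Gaussian mixture.

    g1 and g2 are (mean, std) pairs; alpha is the weight of the first component.
    """
    def pdf(y, mean, std):
        return np.exp(-(y - mean) ** 2 / (2 * std ** 2)) / (np.sqrt(2 * np.pi) * std)
    return np.log(alpha * pdf(y, *g1) + (1 - alpha) * pdf(y, *g2)).sum()

y = np.array([-67, -48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75])
print(gmm_loglik(y, 0.5, (-50., 20.), (30., 20.)))
```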
9.4. The EM algorithm can be applied to unsupervised learning with naive Bayes. Write out the algorithm.
Solution:
Naive Bayes is normally used for supervised learning with labeled data, e.g. classification. To turn it into an unsupervised method we must drop the labels, turning classification into clustering. In other words, each sample's label is treated as a latent variable; otherwise EM would not apply, since EM performs maximum likelihood (or maximum a posteriori) estimation for probabilistic models with latent variables.
Given the data set
$$T = \{x_{1},\dots,x_{N}\},\qquad x_{i} \in \mathbf{R}^{n},\ i = 1,2,\dots,N,$$
we use the EM algorithm to cluster the data in $T$. Suppose for simplicity that there are only two classes, {0, 1}, and model the latent class $Z$ with a Bernoulli distribution:

Z | 0 | 1 |
---|---|---|
P | p | 1-p |

The EM algorithm for this problem proceeds as follows:
- First compute, for a given sample $x_i$, the probability that $z = 0$:
$$\begin{aligned}
P(z=0\mid x_{i},\theta_k) &= \frac{P(z=0,\ x_{i}\mid\theta_k)}{P(x_{i}\mid\theta_k)} \\
&= \frac{P(z=0\mid\theta_k)P(x_{i}\mid z=0,\theta_k)}{P(z=0\mid\theta_k)P(x_{i}\mid z=0,\theta_k)+P(z=1\mid\theta_k)P(x_{i}\mid z=1,\theta_k)} \\
&= \frac{p_k\prod_{j=1}^{n}P_k(x_{i}^{j}\mid z=0)}{p_k\prod_{j=1}^{n}P_k(x_{i}^{j}\mid z=0)+(1-p_k)\prod_{j=1}^{n}P_k(x_{i}^{j}\mid z=1)}
\end{aligned}$$
where $\theta_k$ is the parameter from the previous iteration, which EM updates to step $k+1$, and the last equality uses the naive Bayes conditional-independence assumption $P(x_i\mid z,\theta_k)=\prod_{j=1}^{n}P_k(x_i^{j}\mid z)$.
- Next compute the Q function; this is the E-step of the EM algorithm:
$$\begin{aligned}
Q(\theta,\theta_k) &= E_{Z}[\log P(X,Z\mid\theta)\mid X,\theta_{k}] \\
&= \sum_{i=1}^{N}\Big[P(z=0\mid x_{i},\theta_{k})\log P(x_{i},z=0) + P(z=1\mid x_{i},\theta_{k})\log P(x_{i},z=1)\Big] \\
&= \sum_{i=1}^{N}\Big[P(z=0\mid x_{i},\theta_{k})\log P(z=0)P(x_{i}\mid z=0) + P(z=1\mid x_{i},\theta_{k})\log P(z=1)P(x_{i}\mid z=1)\Big] \\
&= \sum_{i=1}^{N}\Big[P(z=0\mid x_{i},\theta_{k})\big(\log p + \sum_{j=1}^{n}\log P(x_{i}^{j}\mid z=0)\big) + P(z=1\mid x_{i},\theta_{k})\big(\log(1-p)+\sum_{j=1}^{n}\log P(x_{i}^{j}\mid z=1)\big)\Big]
\end{aligned}$$
Here $\theta = \{p,\ P(x_{i}^{j}\mid z=0),\ P(x_{i}^{j}\mid z=1)\}$. The conditional probabilities $P(x_{i}^{j}\mid z=0)$ and $P(x_{i}^{j}\mid z=1)$ cannot be computed directly, but each feature $x_{i}^{j}$ takes only finitely many values, say $S_j$ of them (the supervised naive Bayes algorithm makes the same assumption), so we may treat these probabilities as variables with respect to which $Q$ can be differentiated. Note also that $P(z=1\mid x_{i},\theta_k) = 1 - P(z=0\mid x_{i},\theta_k)$, and both responsibilities are expressed through $P(x_{i}^{j}\mid z=0)$ and $P(x_{i}^{j}\mid z=1)$; finally, $P(z=0)=p$ is itself a variable updated by the iteration.
- Finally, the M-step computes
$$\theta_{k+1} = \underset{\theta}{\arg\max}\ Q(\theta,\theta_k).$$
That is, differentiate $Q(\theta,\theta_k)$ with respect to each component of $\theta = \{p,\ P(x_{i}^{j}\mid z=0),\ P(x_{i}^{j}\mid z=1)\}$, set the derivatives to zero, and solve to obtain the update of $\theta$ in terms of $\theta_k$. That completes the algorithm; the derivatives themselves are left to the reader (doge).
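The steps above can be sketched in code for the special case where every feature is binary ($S_j = 2$), so that each $P(x^{j}=1\mid z)$ is a single Bernoulli parameter. This is my own illustrative implementation (the name `nb_em` is made up), with a random initialization to break the symmetry between the two clusters:

```python
import numpy as np

def nb_em(X, n_iter=50, seed=0):
    """EM for unsupervised naive Bayes: two clusters, binary features.

    A sketch of the E/M steps derived above, assuming S_j = 2 for every
    feature; theta0[j] = P(x^j = 1 | z = 0) and theta1[j] likewise for z = 1.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    p = 0.5                               # P(z = 0)
    theta0 = rng.uniform(0.3, 0.7, n)     # random init breaks cluster symmetry
    theta1 = rng.uniform(0.3, 0.7, n)
    for _ in range(n_iter):
        # E-step: responsibility r[i] = P(z = 0 | x_i, theta_k)
        l0 = p * np.prod(theta0 ** X * (1 - theta0) ** (1 - X), axis=1)
        l1 = (1 - p) * np.prod(theta1 ** X * (1 - theta1) ** (1 - X), axis=1)
        r = l0 / (l0 + l1)
        # M-step: closed form from setting the derivatives of Q to zero
        p = r.mean()
        theta0 = r @ X / r.sum()
        theta1 = (1 - r) @ X / (1 - r).sum()
    return p, theta0, theta1
```

After fitting, sample $x_i$ is assigned to cluster 0 whenever $P(z=0\mid x_i,\theta) > 0.5$. Note that if both clusters start from identical parameters, every responsibility equals $p$ and the iteration can never separate them, exactly as in the Gaussian mixture of 9.3; hence the randomized initialization.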