[Problem 1] Maximum likelihood estimation can also be used to estimate prior probabilities. Suppose samples are drawn successively and independently from nature in state $\omega_i$, each state occurring with probability $P(\omega_i)$. If the $k$-th sample was drawn from state $\omega_i$, set $z_{ik}=1$; otherwise set $z_{ik}=0$.
- Prove that
$$P\left(z_{i1},\cdots,z_{in}\mid P(\omega_i)\right)=\prod_{k=1}^n P(\omega_i)^{z_{ik}}\left(1-P(\omega_i)\right)^{1-z_{ik}}$$
[Solution] Given that class $i$ has probability $P(\omega_i)$, the event $z_{i1}=1$, i.e. the first sample belongs to class $i$, has probability $P(\omega_i)$; likewise, the event $z_{i1}=0$, i.e. the first sample does not belong to class $i$, has probability $1-P(\omega_i)$. Combining the two cases into a single expression,
$$P\left(z_{i1}\mid P(\omega_i)\right)=P(\omega_i)^{z_{i1}}\left(1-P(\omega_i)\right)^{1-z_{i1}}$$
Therefore, since the $n$ samples are drawn independently,
$$\begin{aligned} P\left(z_{i1},\cdots,z_{in}\mid P(\omega_i)\right)&=P\left(z_{i1}\mid P(\omega_i)\right)P\left(z_{i2}\mid P(\omega_i)\right)\cdots P\left(z_{in}\mid P(\omega_i)\right)\\ &=\prod_{k=1}^n P(\omega_i)^{z_{ik}}\left(1-P(\omega_i)\right)^{1-z_{ik}} \end{aligned}$$
- Prove that the maximum likelihood estimate of $P(\omega_i)$ is
$$\hat{P}(\omega_i)=\frac{1}{n}\sum_{k=1}^n z_{ik}$$
and briefly interpret this result.
[Solution] From part (1), the log-likelihood function is
$$\ln P\left(z_{i1},\cdots,z_{in}\mid P(\omega_i)\right)=\sum_{k=1}^n z_{ik}\ln P(\omega_i)+\sum_{k=1}^n\left(1-z_{ik}\right)\ln\left(1-P(\omega_i)\right)$$
Setting the derivative of the log-likelihood with respect to $P(\omega_i)$ to zero,
$$\frac{\partial\ln P}{\partial P(\omega_i)}=\sum_{k=1}^n z_{ik}\frac{1}{P(\omega_i)}-\sum_{k=1}^n\left(1-z_{ik}\right)\frac{1}{1-P(\omega_i)}=0,$$
we obtain
$$\sum_{k=1}^n z_{ik}\left(1-P(\omega_i)\right)-\sum_{k=1}^n\left(1-z_{ik}\right)P(\omega_i)=0$$
Simplifying gives the maximum likelihood estimate
$$\hat{P}(\omega_i)=\frac{1}{n}\sum_{k=1}^n z_{ik}$$
This result says that the maximum likelihood estimate of a class's prior probability is simply the fraction of the samples that belong to that class.
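As a quick numerical sanity check (a sketch, not part of the original solution; the sample size of 1000 and the true prior 0.3 below are made-up illustration values), the closed-form estimate, i.e. the sample mean of the indicators $z_{ik}$, coincides with a brute-force maximization of the Bernoulli likelihood over a grid of candidate priors:

```python
import numpy as np

# Simulate indicator variables z_ik for a class whose (hypothetical) true prior is 0.3
rng = np.random.default_rng(0)
true_prior = 0.3
z = rng.random(1000) < true_prior          # z_ik = 1 iff sample k came from class omega_i

# Closed-form MLE: fraction of samples that belong to the class
p_hat = z.mean()

# Brute-force check: evaluate the log-likelihood on a grid of candidate priors
grid = np.linspace(1e-3, 1 - 1e-3, 999)
log_lik = z.sum() * np.log(grid) + (len(z) - z.sum()) * np.log(1 - grid)
p_grid = grid[np.argmax(log_lik)]

print(p_hat, p_grid)                       # both close to the true prior 0.3
```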
[Problem 2] Suppose $x$ has the uniform density
$$p(x\mid\theta)\sim U(0,\theta)=\begin{cases}1/\theta, & 0\le x\le\theta\\ 0, & \text{otherwise}\end{cases}$$
- Suppose $n$ samples $\mathcal{D}=\{x_1,\cdots,x_n\}$ are drawn independently according to $p(x\mid\theta)$. Show that the maximum likelihood estimate of $\theta$ is $\max[\mathcal{D}]$, the largest value in $\mathcal{D}$.
[Solution] Since the $n$ samples are independent and identically distributed,
$$P(\mathcal{D}\mid\theta)=\prod_{k=1}^n p(x_k\mid\theta)=\begin{cases}\dfrac{1}{\theta^n}, & 0\le x_1,x_2,\ldots,x_n\le\theta\\ 0, & \text{otherwise}\end{cases}$$
The log-likelihood function is
$$L(\mathcal{D}\mid\theta)=\ln P(\mathcal{D}\mid\theta)=\begin{cases}-n\ln\theta, & 0\le x_1,x_2,\ldots,x_n\le\theta\\ -\infty, & \text{otherwise}\end{cases}$$
Because $-n\ln\theta$ is decreasing in $\theta$, the likelihood grows as $\theta$ shrinks; but $\theta$ is constrained by $0\le x_1,x_2,\ldots,x_n\le\theta$. Hence the maximum likelihood estimate of $\theta$ is $\max[\mathcal{D}]$.
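A small numerical check of this conclusion (a sketch, not part of the original solution; the true value $\theta=3$ and the sample size $n=20$ are made up): over a grid of candidate $\theta$, the likelihood $\theta^{-n}$ restricted to $\theta\ge\max[\mathcal{D}]$ peaks at the sample maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 3.0, size=20)   # n = 20 draws from U(0, 3)

# Likelihood over a grid of candidate theta: theta^(-n) where theta >= max(D), else 0
grid = np.linspace(0.01, 5.0, 5000)
likelihood = np.where(grid >= samples.max(), grid ** (-len(samples)), 0.0)

print(grid[np.argmax(likelihood)], samples.max())   # equal up to the grid resolution
```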
- Suppose five samples $(n=5)$ are drawn from this distribution and $\max_k x_k=0.6$. Plot the likelihood $p(\mathcal{D}\mid\theta)$ on the interval $0\le\theta\le 1$, and explain why the values of the other four points are not needed.
[Solution] From part (1), the likelihood is
$$P(\mathcal{D}\mid\theta)=\begin{cases}\dfrac{1}{\theta^5}, & 0\le x_1,x_2,\ldots,x_5\le\theta\\ 0, & \text{otherwise}\end{cases}$$
Figure 1 shows the likelihood $p(\mathcal{D}\mid\theta)$ on the interval $[0,1]$. Because the likelihood depends on the samples only through the constraint $\theta\ge\max[\mathcal{D}]$, it can be written down without knowing the values of the other four points. (Without loss of generality, let $x_1=0.6$: for $\theta<0.6$ we have $p(x_1\mid\theta)=0$ and hence $p(\mathcal{D}\mid\theta)=0$; for $\theta\ge 0.6$, $p(\mathcal{D}\mid\theta)=(1/\theta)^5$.) The original figure was plotted with MATLAB.
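Since Figure 1 is not embedded here, the sketch below (a Python stand-in for the MATLAB plot mentioned above, assuming only `numpy` and `matplotlib`) regenerates the same curve: zero for $\theta<0.6$ and $\theta^{-5}$ for $\theta\ge 0.6$.

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0.01, 1.0, 500)
# p(D | theta) for n = 5 and max x_k = 0.6
likelihood = np.where(theta >= 0.6, theta ** (-5.0), 0.0)

plt.plot(theta, likelihood)
plt.xlabel(r"$\theta$")
plt.ylabel(r"$p(\mathcal{D} \mid \theta)$")
plt.title("Likelihood for n = 5, max x_k = 0.6")
plt.show()
```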
[Problem 3] One way to measure the distance between two different distributions defined on the same space is the Kullback-Leibler divergence (KL divergence for short):
$$D_{KL}\left(p_2(\mathbf{x})\|p_1(\mathbf{x})\right)=\int p_2(\mathbf{x})\ln\frac{p_2(\mathbf{x})}{p_1(\mathbf{x})}\,d\mathbf{x}$$
This measure does not satisfy the symmetry and triangle-inequality requirements that a true metric must obey. Suppose we use a normal distribution $p_1(\mathbf{x})\sim N(\boldsymbol{\mu},\Sigma)$ to approximate an arbitrary distribution $p_2(\mathbf{x})$. Show that the values producing the smallest KL divergence are the obvious ones:
$$\begin{aligned}\boldsymbol{\mu}&=\varepsilon_2[\mathbf{x}]\\ \Sigma&=\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right]\end{aligned}$$
where the expectations are taken with respect to the density $p_2(\mathbf{x})$.
[Solution] Substituting the normal density for $p_1(\mathbf{x})$ gives
$$D_{KL}\left(p_2(\mathbf{x})\|p_1(\mathbf{x})\right)=\int\Big[p_2(\mathbf{x})\ln p_2(\mathbf{x})+\tfrac{1}{2}p_2(\mathbf{x})\left(d\ln 2\pi+\ln|\Sigma|\right)+\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\,p_2(\mathbf{x})\Big]\,d\mathbf{x}$$
Ignoring the terms that do not depend on $\boldsymbol{\mu}$ or $\Sigma$, define
$$f(\boldsymbol{\mu},\Sigma)=\int p_2(\mathbf{x})\left(\ln|\Sigma|+(\mathbf{x}-\boldsymbol{\mu})^t\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)d\mathbf{x}$$
Taking partial derivatives with respect to $\boldsymbol{\mu}$ and $\Sigma$:
$$\frac{\partial f(\boldsymbol{\mu},\Sigma)}{\partial\boldsymbol{\mu}}=\left(\Sigma^{-1}+\Sigma^{-t}\right)\left(\boldsymbol{\mu}-\int\mathbf{x}\,p_2(\mathbf{x})\,d\mathbf{x}\right)=\left(\Sigma^{-1}+\Sigma^{-t}\right)\left(\boldsymbol{\mu}-\varepsilon_2[\mathbf{x}]\right)$$
$$\begin{aligned}\frac{\partial f(\boldsymbol{\mu},\Sigma)}{\partial\Sigma}&=\int p_2(\mathbf{x})\,\Sigma^{-t}+p_2(\mathbf{x})\left[-\Sigma^{-t}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\Sigma^{-t}\right]d\mathbf{x}\\ &=\Sigma^{-t}\cdot\int p_2(\mathbf{x})\left[\Sigma^t-(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right]\Sigma^{-t}\,d\mathbf{x}\\ &=\Sigma^{-t}\cdot\left(\Sigma-\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right]\right)\Sigma^{-t}\end{aligned}$$
Setting both partial derivatives to zero gives
$$\begin{aligned}\boldsymbol{\mu}&=\varepsilon_2[\mathbf{x}]\\ \Sigma&=\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right]\end{aligned}$$
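A numerical sanity check of this result (a sketch, not part of the original solution; the one-dimensional two-component mixture chosen for $p_2$ and the grid ranges are arbitrary): a brute-force search over $(\mu,\sigma)$ for the Gaussian minimizing $D_{KL}(p_2\|p_1)$ recovers the mean and variance of $p_2$.

```python
import numpy as np
from scipy.stats import norm

# p2: an equal-weight mixture of N(-1, 0.5^2) and N(2, 1^2), tabulated on a uniform grid
x = np.linspace(-6.0, 8.0, 2801)
dx = x[1] - x[0]
p2 = 0.5 * norm.pdf(x, -1.0, 0.5) + 0.5 * norm.pdf(x, 2.0, 1.0)

def kl_to_gaussian(mu, sigma):
    """Numerical D_KL(p2 || N(mu, sigma^2)) on the grid."""
    p1 = norm.pdf(x, mu, sigma)
    return np.sum(p2 * np.log(p2 / p1)) * dx

# Brute-force search over (mu, sigma)
mus = np.linspace(-1.0, 2.0, 61)
sigmas = np.linspace(0.5, 3.0, 51)
kl = np.array([[kl_to_gaussian(m, s) for s in sigmas] for m in mus])
i, j = np.unravel_index(np.argmin(kl), kl.shape)

mean_p2 = np.sum(x * p2) * dx
var_p2 = np.sum((x - mean_p2) ** 2 * p2) * dx
print(mus[i], mean_p2)           # both approximately 0.5
print(sigmas[j] ** 2, var_p2)    # both approximately 2.9
```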
[Problem 4] The samples in the data set $\mathcal{D}=\left\{\left(\begin{array}{l}1\\1\end{array}\right),\left(\begin{array}{l}3\\3\end{array}\right),\left(\begin{array}{l}2\\ *\end{array}\right)\right\}$ are drawn independently from the two-dimensional distribution $p(x_1,x_2)=p(x_1)\,p(x_2)$, where $*$ denotes a missing value and
$$p(x_1)=\begin{cases}\dfrac{1}{\theta_1}e^{-x_1/\theta_1}, & x_1\ge 0\\ 0, & \text{otherwise}\end{cases}$$
$$p(x_2)\sim U(0,\theta_2)=\begin{cases}\dfrac{1}{\theta_2}, & 0\le x_2\le\theta_2\\ 0, & \text{otherwise}\end{cases}$$
- Assume an initial estimate $\boldsymbol{\theta}^0=\left(\begin{array}{c}2\\4\end{array}\right)$ and compute $Q(\boldsymbol{\theta};\boldsymbol{\theta}^0)$ (the E step of the EM algorithm). Be careful to normalize the distribution.
[Solution] For the E step:
$$\begin{aligned}
Q\left(\boldsymbol{\theta};\boldsymbol{\theta}^0\right)&=\mathcal{E}_{x_{32}}\left[\ln p\left(\mathbf{x}_g,\mathbf{x}_b;\boldsymbol{\theta}\right)\mid\boldsymbol{\theta}^0,\mathcal{D}_g\right]\\
&=\int_{-\infty}^{\infty}\left[\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_3\mid\boldsymbol{\theta}\right)\right]p\left(x_{32}\mid\boldsymbol{\theta}^0;x_{31}=2\right)\mathrm{d}x_{32}\\
&=\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+\int_{-\infty}^{\infty}\ln p\left(\mathbf{x}_3\mid\boldsymbol{\theta}\right)\cdot p\left(x_{32}\mid\boldsymbol{\theta}^0;x_{31}=2\right)\mathrm{d}x_{32}\\
&=\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+\int_{-\infty}^{\infty}\ln p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}\right)\cdot\frac{p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)}{\underbrace{\int_{-\infty}^{\infty}p\left(\left(\begin{array}{c}2\\x_{32}'\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)\mathrm{d}x_{32}'}_{1/(2e)}}\,\mathrm{d}x_{32}\\
&=\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+2e\int_{-\infty}^{\infty}\ln p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}\right)\cdot p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)\mathrm{d}x_{32}\\
&=\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+C
\end{aligned}\tag{1}$$
where the normalization term in Eq. (1) is computed as
$$\begin{aligned}
\int_{-\infty}^{\infty}p\left(\left(\begin{array}{c}2\\x_{32}'\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)\mathrm{d}x_{32}'&=\int_{-\infty}^{\infty}p\left(x_{31}=2\mid\theta_1^0=2\right)\cdot p\left(x_{32}'\mid\theta_2^0=4\right)\mathrm{d}x_{32}'\\
&=\int_0^4\frac{1}{2}e^{-2/2}\cdot\frac{1}{4}\,\mathrm{d}x_{32}'\\
&=\frac{1}{2e}
\end{aligned}\tag{2}$$
The value of $C$ in Eq. (1) depends on $\theta_2$. Since the largest observed second component is $\max x_2=x_{22}=3$, we must have $\theta_2\ge 3$. The cases are as follows:
- $3\le\theta_2\le 4$:
$$\begin{aligned}C&=2e\int_0^{\theta_2}\ln p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}\right)\cdot p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)\mathrm{d}x_{32}\\ &=2e\int_0^{\theta_2}\ln\left(\frac{1}{\theta_1}e^{-2/\theta_1}\frac{1}{\theta_2}\right)\cdot\frac{1}{2}e^{-2/2}\,\frac{1}{4}\,\mathrm{d}x_{32}\\ &=\frac{\theta_2}{4}\ln\left(\frac{1}{\theta_1}e^{-2/\theta_1}\frac{1}{\theta_2}\right)\end{aligned}\tag{3}$$
- $\theta_2\ge 4$:
$$\begin{aligned}C&=2e\int_0^{4}\ln p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}\right)\cdot p\left(\left(\begin{array}{c}2\\x_{32}\end{array}\right)\Big|\,\boldsymbol{\theta}^0\right)\mathrm{d}x_{32}\\ &=2e\int_0^{4}\ln\left(\frac{1}{\theta_1}e^{-2/\theta_1}\frac{1}{\theta_2}\right)\cdot\frac{1}{2}e^{-2/2}\,\frac{1}{4}\,\mathrm{d}x_{32}\\ &=\ln\left(\frac{1}{\theta_1}e^{-2/\theta_1}\frac{1}{\theta_2}\right)\end{aligned}\tag{4}$$

Substituting the value of $C$ for each case into Eq. (1) gives
$$\begin{aligned}Q\left(\boldsymbol{\theta};\boldsymbol{\theta}^0\right)&=\ln p\left(\mathbf{x}_1\mid\boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2\mid\boldsymbol{\theta}\right)+C\\ &=\ln\left(\frac{1}{\theta_1}e^{-x_{11}/\theta_1}\frac{1}{\theta_2}\right)+\ln\left(\frac{1}{\theta_1}e^{-x_{21}/\theta_1}\frac{1}{\theta_2}\right)+C\\ &=\ln\left(\frac{1}{\theta_1}e^{-1/\theta_1}\frac{1}{\theta_2}\right)+\ln\left(\frac{1}{\theta_1}e^{-3/\theta_1}\frac{1}{\theta_2}\right)+C\\ &=-\frac{4}{\theta_1}-2\ln(\theta_1\theta_2)+C\end{aligned}\tag{5}$$
where the $C$ in Eq. (5) is given by the two cases, Eqs. (3) and (4). Simplifying yields
$$Q\left(\boldsymbol{\theta};\boldsymbol{\theta}^0\right)=\begin{cases}-3\ln(\theta_1\theta_2)-\dfrac{6}{\theta_1}, & \theta_2\ge 4\\[2mm] -\left(2+\dfrac{\theta_2}{4}\right)\ln(\theta_1\theta_2)-\left(4+\dfrac{\theta_2}{2}\right)\Big/\theta_1, & 3\le\theta_2\le 4\end{cases}$$
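The piecewise form above can be verified numerically (a sketch, not part of the original solution; the 200,000 Monte Carlo draws and the test points are arbitrary). The snippet draws $x_{32}$ from its conditional distribution $U(0,4)$ under $\boldsymbol{\theta}^0$ and, following the solution's convention of integrating only where $\ln p$ is finite, compares a Monte Carlo estimate of $Q(\boldsymbol{\theta};\boldsymbol{\theta}^0)$ with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
x32 = rng.uniform(0.0, 4.0, size=200_000)   # p(x32 | x31 = 2, theta0) = U(0, 4)

def q_monte_carlo(theta1, theta2):
    # Complete points (1, 1) and (3, 3)
    term12 = (-np.log(theta1) - 1.0 / theta1 - np.log(theta2)
              - np.log(theta1) - 3.0 / theta1 - np.log(theta2))
    # Incomplete point (2, x32): log-density where it is finite, dropped otherwise
    log_p3 = np.where(x32 <= theta2,
                      -np.log(theta1) - 2.0 / theta1 - np.log(theta2), 0.0)
    return term12 + log_p3.mean()

def q_closed_form(theta1, theta2):
    if theta2 >= 4.0:
        return -3.0 * np.log(theta1 * theta2) - 6.0 / theta1
    return -(2.0 + theta2 / 4.0) * np.log(theta1 * theta2) - (4.0 + theta2 / 2.0) / theta1

for t1, t2 in [(2.0, 3.0), (2.0, 4.0), (1.5, 3.5), (3.0, 5.0)]:
    # each pair agrees up to Monte Carlo error
    print(q_monte_carlo(t1, t2), q_closed_form(t1, t2))
```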
- Find the $\boldsymbol{\theta}$ that maximizes $Q(\boldsymbol{\theta};\boldsymbol{\theta}^0)$ (the M step of the EM algorithm).
[Solution] For the M step, the estimation criterion is
$$\hat{\boldsymbol{\theta}}=\arg\max_{\boldsymbol{\theta}}Q\left(\boldsymbol{\theta};\boldsymbol{\theta}^0\right)$$
- For $3\le\theta_2\le 4$: setting $\partial Q/\partial\theta_1=0$ gives $\theta_1=2$, and $Q$ is decreasing in $\theta_2$ on this interval, so the optimum lies at the boundary $\theta_2=3$. The maximum is attained at $\boldsymbol{\theta}=\left(\begin{array}{l}2\\3\end{array}\right)$, where $Q=-\frac{11}{4}\ln 6-\frac{11}{4}\approx-7.68$.
- For $\theta_2\ge 4$: setting $\partial Q/\partial\theta_1=0$ again gives $\theta_1=2$, and $Q$ is decreasing in $\theta_2$, so the optimum is $\boldsymbol{\theta}=\left(\begin{array}{l}2\\4\end{array}\right)$, where $Q=-3\ln 8-3\approx-9.24$.
Comparing the two cases, $Q$ is maximized at $\boldsymbol{\theta}=\left(\begin{array}{l}2\\3\end{array}\right)$.
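As a brute-force check of the M step (a sketch, not part of the original solution; the grid ranges and resolution are arbitrary), maximizing the piecewise $Q$ from the E step over a grid of $(\theta_1,\theta_2)$ with $\theta_2\ge 3$ lands on $\boldsymbol{\theta}\approx(2,3)$ and $Q\approx-7.68$, matching the case analysis above.

```python
import numpy as np

def q_closed_form(theta1, theta2):
    """Piecewise Q(theta; theta0) from the E step."""
    if theta2 >= 4.0:
        return -3.0 * np.log(theta1 * theta2) - 6.0 / theta1
    return -(2.0 + theta2 / 4.0) * np.log(theta1 * theta2) - (4.0 + theta2 / 2.0) / theta1

theta1_grid = np.linspace(0.5, 5.0, 451)     # step 0.01
theta2_grid = np.linspace(3.0, 6.0, 301)     # step 0.01, respecting theta2 >= 3
q = np.array([[q_closed_form(t1, t2) for t2 in theta2_grid] for t1 in theta1_grid])
i, j = np.unravel_index(np.argmax(q), q.shape)

print(theta1_grid[i], theta2_grid[j], q[i, j])   # approximately 2.0, 3.0, -7.68
```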
(There are two more problems after this, but they are a blind spot for me; if they show up on the exam I am out of luck.)