1. The EM Algorithm
Suppose we have a training set $\left\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\right\}$ of m independent samples, and we want to estimate the parameters of a model $p(x, z)$ for these data.
Take the log-likelihood:
$$
\begin{aligned}
l(\theta)&=\sum_{i=1}^{m} \log p\left(x^{(i)} ; \theta\right)\\
&=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right)
\end{aligned}
$$
Because z is a latent random variable, the parameters cannot be estimated directly. We therefore use the following strategy: construct a lower bound on $l(\theta)$, maximize that lower bound, and repeat until convergence to a local maximum.
Let $Q_i$ be some distribution over z, with $Q_i \geq 0$ and $\sum_{z} Q_{i}(z)=1$. Then:
$$
l(\theta)=\sum_{i=1}^{m} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right)
$$
$$
\begin{aligned}
&=\sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}\\
&\geq \sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}
\end{aligned}
$$
The inequality follows from Jensen's inequality, since log is concave and the inner sum is an expectation under $Q_i$.
To make the lower bound as tight as possible (equality in Jensen's inequality), we can require the ratio inside the expectation to be constant:
$$
\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}=c
$$
Analyzing further:
$$
\begin{aligned}
Q_{i}\left(z^{(i)}\right) &\propto p\left(x^{(i)}, z^{(i)} ; \theta\right), \quad \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right)=1 \\
Q_{i}\left(z^{(i)}\right)&=\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{\sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)} ; \theta\right)} \\
&=\frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{p\left(x^{(i)} ; \theta\right)} \\
&=p\left(z^{(i)} \mid x^{(i)} ; \theta\right)
\end{aligned}
$$
That is, the optimal $Q_i$ is the posterior distribution of $z^{(i)}$ given $x^{(i)}$.
The overall EM algorithm framework:
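Combining the two results above (choose $Q_i$ as the posterior, then maximize the resulting lower bound), the framework can be written as:
$$
\begin{aligned}
&\text{Repeat until convergence:}\\
&\quad \text{(E-step) For each } i:\quad Q_{i}\left(z^{(i)}\right):=p\left(z^{(i)} \mid x^{(i)} ; \theta\right)\\
&\quad \text{(M-step)}\quad \theta:=\arg \max _{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \theta\right)}{Q_{i}\left(z^{(i)}\right)}
\end{aligned}
$$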
2. Deriving the GMM from the Theory
Suppose the random variable X is a mixture of K Gaussian distributions, where the individual Gaussians are selected with probabilities $\varphi_{1}, \varphi_{2}, \cdots, \varphi_{K}$, and the i-th Gaussian has mean $\mu_i$ and covariance $\Sigma_i$. Given a sequence of observed samples $x_{1}, x_{2}, \ldots, x_{n}$ of X, we want to estimate the parameters $\varphi, \boldsymbol{\mu}, \boldsymbol{\Sigma}$.
E-step
$$
w_{j}^{(i)}=Q_{i}\left(z^{(i)}=j\right)=P\left(z^{(i)}=j \mid x^{(i)} ; \phi, \mu, \Sigma\right)
$$
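The E-step can be sketched directly from this formula: evaluate each Gaussian density times its mixing weight, then normalize per sample. This is a minimal sketch (the function name and array layout are our own assumptions, not from the post):

```python
import numpy as np


def e_step(X, phi, mu, Sigma):
    """E-step: responsibilities w[i, j] = P(z^(i) = j | x^(i); phi, mu, Sigma).

    Assumed shapes: X (m, n) samples, phi (k,) mixing weights,
    mu (k, n) means, Sigma (k, n, n) covariances.
    """
    m, n = X.shape
    k = phi.shape[0]
    w = np.empty((m, k))
    for j in range(k):
        diff = X - mu[j]                         # (m, n)
        inv = np.linalg.inv(Sigma[j])
        # Gaussian density N(x | mu_j, Sigma_j) times the prior phi_j
        norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma[j])))
        quad = np.einsum("ij,jk,ik->i", diff, inv, diff)
        w[:, j] = phi[j] * norm * np.exp(-0.5 * quad)
    w /= w.sum(axis=1, keepdims=True)            # divide by p(x^(i)) to normalize
    return w
```

Each row of the returned matrix sums to 1, matching $\sum_{j} w_{j}^{(i)} = 1$.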
M-step
Substituting the multinomial and Gaussian parameters into the lower bound:
$$
\begin{aligned}
&\sum_{i=1}^{m} \sum_{z^{(i)}} Q_{i}\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)} ; \phi, \mu, \Sigma\right)}{Q_{i}\left(z^{(i)}\right)} \\
&\quad=\sum_{i=1}^{m} \sum_{j=1}^{k} Q_{i}\left(z^{(i)}=j\right) \log \frac{p\left(x^{(i)} \mid z^{(i)}=j ; \mu, \Sigma\right) p\left(z^{(i)}=j ; \phi\right)}{Q_{i}\left(z^{(i)}=j\right)} \\
&\quad=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}}
\end{aligned}
$$
Taking the gradient with respect to the mean $\mu_l$ (only the quadratic term depends on it):
$$
\begin{aligned}
&\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}} \\
&\quad=-\nabla_{\mu_{l}} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right) \\
&\quad=\frac{1}{2} \sum_{i=1}^{m} w_{l}^{(i)} \nabla_{\mu_{l}}\left(2 \mu_{l}^{T} \Sigma_{l}^{-1} x^{(i)}-\mu_{l}^{T} \Sigma_{l}^{-1} \mu_{l}\right) \\
&\quad=\sum_{i=1}^{m} w_{l}^{(i)}\left(\Sigma_{l}^{-1} x^{(i)}-\Sigma_{l}^{-1} \mu_{l}\right)
\end{aligned}
$$
Setting this to zero and solving gives the mean update:
$$
\mu_{l}:=\frac{\sum_{i=1}^{m} w_{l}^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_{l}^{(i)}}
$$
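As a quick numerical sanity check (toy numbers of our own, not from the post), this update is simply a responsibility-weighted average of the samples:

```python
import numpy as np

# toy responsibilities w_l^(i) for one component l over three samples
w_l = np.array([0.2, 0.5, 0.3])
x = np.array([[0.0], [2.0], [4.0]])  # three 1-D samples x^(i)

# mu_l := sum_i w_l^(i) x^(i) / sum_i w_l^(i)
mu_l = (w_l[:, None] * x).sum(axis=0) / w_l.sum()
print(mu_l)  # the weighted average of the samples
```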
Taking the derivative with respect to the covariance and setting it to zero gives:
$$
\Sigma_{j}=\frac{\sum_{i=1}^{m} w_{j}^{(i)}\left(x^{(i)}-\mu_{j}\right)\left(x^{(i)}-\mu_{j}\right)^{T}}{\sum_{i=1}^{m} w_{j}^{(i)}}
$$
For the multinomial parameter $\phi$, consider the M-step objective and drop the terms that are constant with respect to $\phi$:
$$
\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \frac{\frac{1}{(2 \pi)^{n / 2}\left|\Sigma_{j}\right|^{1 / 2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu_{j}\right)^{T} \Sigma_{j}^{-1}\left(x^{(i)}-\mu_{j}\right)\right) \cdot \phi_{j}}{w_{j}^{(i)}}
$$
which leaves:
$$
\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}
$$
Lagrange multipliers
Since the multinomial probabilities must sum to 1, form the Lagrangian:
$$
\mathcal{L}(\phi)=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)} \log \phi_{j}+\beta\left(\sum_{j=1}^{k} \phi_{j}-1\right)
$$
Taking the partial derivative and setting it to zero:
$$
\begin{aligned}
\frac{\partial}{\partial \phi_{j}} \mathcal{L}(\phi)&=\sum_{i=1}^{m} \frac{w_{j}^{(i)}}{\phi_{j}}+\beta \\
-\beta&=\sum_{i=1}^{m} \sum_{j=1}^{k} w_{j}^{(i)}=\sum_{i=1}^{m} 1=m \\
\phi_{j}&:=\frac{1}{m} \sum_{i=1}^{m} w_{j}^{(i)}
\end{aligned}
$$
Here the second line follows by solving the first for $\phi_{j}$, summing over j, and using the constraints $\sum_{j} \phi_{j}=1$ and $\sum_{j} w_{j}^{(i)}=1$ (the responsibilities for each sample sum to 1).
In summary: each component k can be viewed as generating a weighted share of every data point, namely the set $\left\{\gamma(i, k) x_{i} \mid i=1,2, \cdots, N\right\}$, where $\gamma(i, k)$ is the responsibility of component k for point $x_i$ (the $w_{k}^{(i)}$ above). Since component k is itself a standard Gaussian, the results above give:
$$
\left\{\begin{array}{l}
\mu_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k) x_{i} \\
\Sigma_{k}=\frac{1}{N_{k}} \sum_{i=1}^{N} \gamma(i, k)\left(x_{i}-\mu_{k}\right)\left(x_{i}-\mu_{k}\right)^{T} \\
\pi_{k}=\frac{1}{N} \sum_{i=1}^{N} \gamma(i, k) \\
N_{k}=N \cdot \pi_{k}
\end{array}\right.
$$
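The full derivation above can be sketched as a working EM loop for a GMM. This is a minimal sketch under our own assumptions (initialization by randomly chosen data points, a small ridge added to each covariance for numerical stability), not a production implementation:

```python
import numpy as np


def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) for each row of X."""
    n = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    quad = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm


def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture, following the updates derived above."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    phi = np.full(k, 1.0 / k)                        # mixing weights phi_j
    mu = X[rng.choice(m, size=k, replace=False)]     # init means from the data
    Sigma = np.stack([np.cov(X.T).reshape(n, n) + 1e-6 * np.eye(n)] * k)
    for _ in range(n_iter):
        # E-step: responsibilities w[i, j] = Q_i(z^(i) = j)
        w = np.stack(
            [phi[j] * gaussian_pdf(X, mu[j], Sigma[j]) for j in range(k)], axis=1
        )
        w /= w.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for mu_j, Sigma_j, phi_j
        Nk = w.sum(axis=0)                           # effective counts N_k
        mu = (w.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(n)
        phi = Nk / m
    return phi, mu, Sigma
```

On well-separated data the loop recovers the component means, covariances, and mixing weights; the `phi` update is exactly $\frac{1}{m}\sum_i w_j^{(i)}$ from the Lagrangian derivation.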