Preface
I came across this model while reading papers on background-subtraction algorithms and found it unfamiliar, so these notes are a study of it.
1. Model Introduction
As shown in the figure, the density function of the data set along the horizontal axis can be viewed as the superposition of two Gaussian distributions. From a geometric point of view, a Gaussian mixture is a weighted average of several Gaussian densities:
$$
p(x)=\sum_{k=1}^K \alpha_k\, N(x \mid \mu_k,\Sigma_k),\qquad \sum_{k=1}^K \alpha_k=1
$$
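As a minimal numerical sketch of this weighted average (the weights, means, and covariances below are made-up example values, and `gmm_density` is a name introduced here for illustration), the density can be evaluated with `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up 1-D mixture of two Gaussians: weights, means, covariances
alphas = np.array([0.3, 0.7])                      # mixing weights, sum to 1
mus    = [np.array([0.0]), np.array([4.0])]
Sigmas = [np.array([[1.0]]), np.array([[0.5]])]

def gmm_density(x, alphas, mus, Sigmas):
    """p(x) = sum_k alpha_k * N(x | mu_k, Sigma_k)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=S)
               for a, m, S in zip(alphas, mus, Sigmas))

print(gmm_density(np.array([1.0]), alphas, mus, Sigmas))
```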
From the mixture-model (generative-model) perspective, x is the observed variable and z is the latent variable: a discrete random variable indicating which Gaussian component the sample x belongs to. Its distribution is shown in the table below.
| $z$ | $c_1$ | $c_2$ | … | $c_K$ |
|---|---|---|---|---|
| $p(z)$ | $p_1$ | $p_2$ | … | $p_K$ |
Probabilistic graphical model representation: $z \to x$, where $z$ follows the discrete distribution $p=\{p_1,p_2,\dots,p_K\}$.
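To make the generative reading concrete, here is a minimal ancestral-sampling sketch, assuming the same made-up parameters as above (`sample_gmm` is an illustrative name): first draw $z$ from the discrete distribution $p$, then draw $x$ from the Gaussian that $z$ selects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example parameters (same roles as p, mu_k, Sigma_k in the text)
p      = np.array([0.3, 0.7])
mus    = [np.array([0.0]), np.array([4.0])]
Sigmas = [np.array([[1.0]]), np.array([[0.5]])]

def sample_gmm(n):
    """Ancestral sampling: z ~ Categorical(p), then x ~ N(mu_z, Sigma_z)."""
    zs = rng.choice(len(p), size=n, p=p)                                   # latent component labels
    xs = np.array([rng.multivariate_normal(mus[z], Sigmas[z]) for z in zs])
    return xs, zs

xs, zs = sample_gmm(5)
print(xs.ravel(), zs)
```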
2. Maximum Likelihood
Decompose p(x) by marginalizing over the latent variable z:
$$
\begin{aligned}
p(x) &= \sum_z p(x,z) \\
&= \sum_{k=1}^K p(x, z=c_k) \\
&= \sum_{k=1}^K p(z=c_k)\cdot p(x \mid z=c_k) \\
&= \sum_{k=1}^K p_k \cdot N(x \mid \mu_k, \Sigma_k)
\end{aligned}
$$
Notation:
X: observed data, $X=(x_1,x_2,\dots,x_N)$
(X,Z):complete data
$\theta$: parameters, $\theta=\{p_1,p_2,\dots,p_K,\ \mu_1,\mu_2,\dots,\mu_K,\ \Sigma_1,\Sigma_2,\dots,\Sigma_K\}$
$$
\begin{aligned}
\hat{\theta} &= \arg\max_{\theta}\ \log p(X) \\
&= \arg\max_{\theta}\ \log \prod_{i=1}^N p(x_i) \\
&= \arg\max_{\theta}\ \sum_{i=1}^N \log p(x_i) \\
&= \arg\max_{\theta}\ \sum_{i=1}^N \log \sum_{k=1}^K p_k\, N(x_i \mid \mu_k, \Sigma_k)
\end{aligned}
$$

Because the logarithm is applied to a sum over components, setting the derivative to zero gives no closed-form solution, which is why the EM algorithm is used instead.
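As a sketch of evaluating this objective numerically with a stable log-sum-exp over components (the data, parameter values, and the `log_likelihood` name are illustrative, not from the original):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, p, mus, Sigmas):
    """sum_i log sum_k p_k * N(x_i | mu_k, Sigma_k), computed stably."""
    # log_comp[i, k] = log p_k + log N(x_i | mu_k, Sigma_k)
    log_comp = np.stack(
        [np.log(p[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
         for k in range(len(p))], axis=1)
    return logsumexp(log_comp, axis=1).sum()

# Made-up example data and parameters
X      = np.array([[0.2], [3.8], [4.1], [-0.5]])
p      = np.array([0.3, 0.7])
mus    = [np.array([0.0]), np.array([4.0])]
Sigmas = [np.array([[1.0]]), np.array([[0.5]])]
print(log_likelihood(X, p, mus, Sigmas))
```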
3. Solving with EM
The EM update rule: $\theta^{t+1}=\arg\max_{\theta} E_{z\mid x,\theta^t}\!\left[\log p(x,z\mid\theta)\right]$
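In words: form the expectation of the complete-data log-likelihood under the posterior given by the current parameters $\theta^t$ (E-step), maximize it over $\theta$ (M-step), and iterate. A minimal generic loop skeleton under that reading (the `e_step`, `m_step`, and `log_likelihood` callables are placeholders, to be filled with the concrete sketches further below):

```python
def fit_em(X, theta, e_step, m_step, log_likelihood, n_iter=100, tol=1e-6):
    """Generic EM loop: alternate E- and M-steps until the log-likelihood stops improving."""
    prev_ll = float("-inf")
    for _ in range(n_iter):
        gamma = e_step(X, theta)           # posterior p(z | x, theta^t)
        theta = m_step(X, gamma)           # theta^{t+1} = argmax_theta Q(theta, theta^t)
        ll = log_likelihood(X, theta)      # observed-data log-likelihood, for the convergence check
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return theta
```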
E-step
This expectation can be written compactly as the Q function $Q(\theta,\theta^t)$:
$$
\begin{aligned}
Q(\theta,\theta^t) &= \int_z \log p(x,z\mid\theta)\cdot p(z\mid x,\theta^t)\,dz \\
&= \sum_{i=1}^N \sum_{z_i} \log p(x_i,z_i\mid\theta)\cdot p(z_i\mid x_i,\theta^t) \\
&= \sum_{i=1}^N \sum_{z_i} \log\!\big[\, p_{z_i}\, N(x_i\mid\mu_{z_i},\Sigma_{z_i})\big]\cdot
\frac{p_{z_i}^t\, N(x_i\mid\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum_{k=1}^K p_k^t\, N(x_i\mid\mu_k^t,\Sigma_k^t)}
\end{aligned}
$$
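The last factor, $p(z_i=c_k\mid x_i,\theta^t)$, is the posterior responsibility of component $k$ for sample $x_i$. A minimal sketch of computing it with scipy (the `responsibilities` name and the example data/parameters are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, p, mus, Sigmas):
    """E-step: gamma[i, k] = p_k * N(x_i | mu_k, Sigma_k) / sum_j p_j * N(x_i | mu_j, Sigma_j)."""
    weighted = np.stack([p[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                         for k in range(len(p))], axis=1)   # shape (N, K)
    return weighted / weighted.sum(axis=1, keepdims=True)   # normalize over components

# Made-up example data and current parameters theta^t
X      = np.array([[0.2], [3.8], [4.1], [-0.5]])
p      = np.array([0.5, 0.5])
mus    = [np.array([0.0]), np.array([4.0])]
Sigmas = [np.array([[1.0]]), np.array([[1.0]])]
gamma = responsibilities(X, p, mus, Sigmas)
print(gamma)   # each row sums to 1
```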
Derivation of the second and third lines: the complete-data log-likelihood is a sum over the $N$ i.i.d. samples and the posterior factorizes across samples, so the integral over $z$ reduces to the double sum over $i$ and $z_i$; the third line then substitutes the model $p(x_i,z_i\mid\theta)=p_{z_i}\,N(x_i\mid\mu_{z_i},\Sigma_{z_i})$ and Bayes' rule $p(z_i\mid x_i,\theta^t)=\dfrac{p_{z_i}^t\, N(x_i\mid\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum_{k=1}^K p_k^t\, N(x_i\mid\mu_k^t,\Sigma_k^t)}$.
M-step:
$$
\theta^{t+1}=\arg\max_{\theta}\, Q(\theta,\theta^t)
$$
Solving for $p_k$ (only the terms of $Q$ containing $\log p_k$ matter):

$$
p_k^{t+1}=\arg\max_{p_k}\ \sum_{k=1}^K\sum_{i=1}^N \log p_k \cdot p(z_i=c_k\mid x_i,\theta^t),
\qquad \text{s.t. } \sum_{k=1}^K p_k=1
$$
Using the Lagrange multiplier method:
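A sketch of that step, introducing a multiplier $\lambda$ for the constraint $\sum_{k=1}^K p_k=1$:

$$
\begin{aligned}
\mathcal{L}(p,\lambda) &= \sum_{k=1}^K\sum_{i=1}^N \log p_k \cdot p(z_i=c_k\mid x_i,\theta^t) + \lambda\Big(\sum_{k=1}^K p_k-1\Big) \\
\frac{\partial \mathcal{L}}{\partial p_k} &= \sum_{i=1}^N \frac{1}{p_k}\, p(z_i=c_k\mid x_i,\theta^t) + \lambda = 0
\ \Longrightarrow\ p_k = -\frac{1}{\lambda}\sum_{i=1}^N p(z_i=c_k\mid x_i,\theta^t)
\end{aligned}
$$

Summing over $k$ and using $\sum_{k=1}^K p_k=1$ together with $\sum_{k=1}^K p(z_i=c_k\mid x_i,\theta^t)=1$ gives $\lambda=-N$, so

$$
p_k^{t+1}=\frac{1}{N}\sum_{i=1}^N p(z_i=c_k\mid x_i,\theta^t)
$$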
The other parameters ($\mu_k$ and $\Sigma_k$) are obtained in the same way from their own terms of $Q(\theta,\theta^t)$.
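For reference, writing $\gamma_{ik}=p(z_i=c_k\mid x_i,\theta^t)$, the same kind of maximization yields the standard closed-form updates (stated here without the derivation):

$$
\mu_k^{t+1}=\frac{\sum_{i=1}^N \gamma_{ik}\,x_i}{\sum_{i=1}^N \gamma_{ik}},\qquad
\Sigma_k^{t+1}=\frac{\sum_{i=1}^N \gamma_{ik}\,(x_i-\mu_k^{t+1})(x_i-\mu_k^{t+1})^T}{\sum_{i=1}^N \gamma_{ik}}
$$

A minimal M-step sketch that consumes the responsibilities from the E-step code above (`m_step` is a name introduced here; an illustrative implementation, not the original's code):

```python
import numpy as np

def m_step(X, gamma):
    """Closed-form M-step updates for p_k, mu_k, Sigma_k given responsibilities gamma of shape (N, K)."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                           # effective number of points per component
    p_new = Nk / N                                   # p_k^{t+1} = (1/N) * sum_i gamma_ik
    mus_new = [(gamma[:, k:k + 1] * X).sum(axis=0) / Nk[k] for k in range(K)]
    Sigmas_new = []
    for k in range(K):
        diff = X - mus_new[k]                        # (N, d) deviations from the new mean
        Sigmas_new.append((gamma[:, k:k + 1] * diff).T @ diff / Nk[k])
    return p_new, mus_new, Sigmas_new

# One EM iteration, reusing responsibilities() from the E-step sketch:
# gamma = responsibilities(X, p, mus, Sigmas)        # E-step
# p, mus, Sigmas = m_step(X, gamma)                  # M-step
```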