To overcome the unimodality of a single Gaussian model, we fit multimodal data with a weighted average of several Gaussians:
$$p(x)=\sum_{k=1}^K\alpha_k\mathcal{N}(\mu_k,\Sigma_k)$$
We introduce a latent variable z that indicates which Gaussian component the sample x belongs to. It is a discrete random variable:
$$p(z=i)=p_i,\quad\sum_{i=1}^Kp(z=i)=1$$
As a generative model, the Gaussian mixture model generates samples through the distribution of the latent variable z. Represented as a probabilistic graph:
Here, node z carries the categorical probabilities above, and x is the sample generated from the corresponding Gaussian. Marginalizing over z gives p(x):
$$p(x)=\sum_zp(x,z)=\sum_{k=1}^Kp(x,z=k)=\sum_{k=1}^Kp(z=k)p(x|z=k)$$
Therefore:
$$p(x)=\sum_{k=1}^Kp_k\mathcal{N}(x|\mu_k,\Sigma_k)$$
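The marginal density above, and the generative story of drawing z first and then x given z, can be sketched in a few lines. This is a minimal illustration with made-up parameter values (the weights, means, and covariances below are hypothetical, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 1-D mixture with K = 2 components.
p = np.array([0.3, 0.7])                    # mixing weights p_k, sum to 1
mu = np.array([[0.0], [5.0]])               # component means mu_k
Sigma = np.array([np.eye(1), 2 * np.eye(1)])  # component covariances Sigma_k

def gmm_pdf(x):
    """p(x) = sum_k p_k * N(x | mu_k, Sigma_k)."""
    return sum(p[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(p)))

# Generative sampling: draw z from the categorical p(z), then x | z.
rng = np.random.default_rng(0)
z = rng.choice(len(p), p=p)                  # z = k with probability p_k
x = rng.multivariate_normal(mu[z], Sigma[z]) # x ~ N(mu_z, Sigma_z)
```

The sampling step mirrors the probabilistic graph: the component index is drawn first, and only then does the chosen Gaussian produce the observation.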
Maximum Likelihood Estimation
The samples are $X=(x_1,x_2,\dots,x_N)$, and $(X,Z)$ is the complete data. The parameters are $\theta=\{p_1,p_2,\dots,p_K,\mu_1,\mu_2,\dots,\mu_K,\Sigma_1,\Sigma_2,\dots,\Sigma_K\}$. We estimate $\theta$ by maximum likelihood:
$$\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^N\log p(x_i)=\mathop{argmax}\limits_{\theta}\sum_{i=1}^N\log\sum_{k=1}^Kp_k\mathcal{N}(x_i|\mu_k,\Sigma_k)$$
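The log-likelihood itself is easy to evaluate numerically even though it has no closed-form maximizer; the inner sum over components is the term that blocks a closed form. A minimal sketch, using `scipy.special.logsumexp` for numerical stability and hypothetical illustration values:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

# Hypothetical parameters and data for illustration.
p = np.array([0.3, 0.7])
mu = np.array([[0.0], [5.0]])
Sigma = np.array([np.eye(1), np.eye(1)])
X = np.array([[0.1], [4.8], [5.2]])          # N = 3 samples

def log_likelihood(X, p, mu, Sigma):
    """sum_i log sum_k p_k N(x_i | mu_k, Sigma_k), computed stably."""
    # log_comp[i, k] = log p_k + log N(x_i | mu_k, Sigma_k)
    log_comp = np.stack(
        [np.log(p[k]) + multivariate_normal.logpdf(X, mu[k], Sigma[k])
         for k in range(len(p))], axis=1)
    # logsumexp over k realizes the inner sum inside the log.
    return logsumexp(log_comp, axis=1).sum()
```

Because the sum over k sits inside the logarithm, the gradient couples all components, which is exactly why the derivation turns to EM.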
Because of the sum inside the logarithm, setting the derivative of this expression to zero yields no closed-form solution, so we use the EM algorithm.
Solving the GMM with EM
The basic EM update is:
$$\theta^{t+1}=\mathop{argmax}\limits_{\theta}\mathbb{E}_{z|x,\theta^t}[\log p(x,z|\theta)]$$
Applying this to the GMM over the whole dataset:
$$Q(\theta,\theta^t)=\sum\limits_z\left[\log\prod\limits_{i=1}^Np(x_i,z_i|\theta)\right]\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)=\sum\limits_z\left[\sum\limits_{i=1}^N\log p(x_i,z_i|\theta)\right]\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)$$
Expanding the inner sum, the first term is:
$$\sum\limits_z\log p(x_1,z_1|\theta)\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)\sum\limits_{z_2,\cdots,z_N}\prod\limits_{i=2}^Np(z_i|x_i,\theta^t)\\=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)$$
The trailing sum of products equals 1, since each $p(z_i|x_i,\theta^t)$ is a probability distribution over $z_i$.
Similarly, Q can be written as:
$$Q(\theta,\theta^t)=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p(x_i,z_i|\theta)\,p(z_i|x_i,\theta^t)$$
For $p(x,z|\theta)$:
$$p(x,z|\theta)=p(z|\theta)p(x|z,\theta)=p_z\mathcal{N}(x|\mu_z,\Sigma_z)$$
Substituting into Q:
$$Q=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p_{z_i}\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\frac{p_{z_i}^t\mathcal{N}(x_i|\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum\limits_kp_k^t\mathcal{N}(x_i|\mu_k^t,\Sigma_k^t)}$$
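The fraction in the expression above is the posterior responsibility $p(z_i=k|x_i,\theta^t)$, which is exactly what the E-step computes. A minimal sketch of that computation (parameter names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, p, mu, Sigma):
    """E-step: gamma[i, k] = p(z_i = k | x_i, theta^t)
    = p_k N(x_i | mu_k, Sigma_k) / sum_j p_j N(x_i | mu_j, Sigma_j)."""
    # dens[i, k] = p_k * N(x_i | mu_k, Sigma_k), the numerator of the fraction
    dens = np.stack([p[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                     for k in range(len(p))], axis=1)
    # Normalize each row so the responsibilities for sample i sum to 1.
    return dens / dens.sum(axis=1, keepdims=True)
```

Each row of the returned matrix is a distribution over components, so the rows sum to 1 by construction.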
We now maximize Q. Writing $z_i=k$ and swapping the order of summation:
$$Q=\sum\limits_{k=1}^K\sum\limits_{i=1}^N[\log p_k+\log \mathcal{N}(x_i|\mu_k,\Sigma_k)]\,p(z_i=k|x_i,\theta^t)$$
- For $p_k^{t+1}$:
$$p_k^{t+1}=\mathop{argmax}\limits_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N[\log p_k+\log \mathcal{N}(x_i|\mu_k,\Sigma_k)]\,p(z_i=k|x_i,\theta^t)\quad s.t.\ \sum\limits_{k=1}^Kp_k=1$$
Since the Gaussian term does not depend on $p_k$, this reduces to:
$$p_k^{t+1}=\mathop{argmax}\limits_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=k|x_i,\theta^t)\quad s.t.\ \sum\limits_{k=1}^Kp_k=1$$
Introducing a Lagrange multiplier:
$$L(p_k,\lambda)=\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=k|x_i,\theta^t)-\lambda(1-\sum\limits_{k=1}^Kp_k)$$
Therefore:
$$\frac{\partial L}{\partial p_k}=\sum\limits_{i=1}^N\frac{1}{p_k}p(z_i=k|x_i,\theta^t)+\lambda=0\\
\Rightarrow \sum\limits_k\sum\limits_{i=1}^Np(z_i=k|x_i,\theta^t)+\lambda\sum\limits_kp_k=0\\
\Rightarrow\lambda=-N$$
Here the second line multiplies the first by $p_k$ and sums over k; the double sum equals N because $\sum_kp(z_i=k|x_i,\theta^t)=1$ for each i.
Hence:
$$p_k^{t+1}=\frac{1}{N}\sum\limits_{i=1}^Np(z_i=k|x_i,\theta^t)$$
- For $\mu_k,\Sigma_k$: these parameters are unconstrained, so we differentiate Q directly and set the derivatives to zero.
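Putting the pieces together, one full EM iteration can be sketched as follows. The $p_k$ update is the result just derived; the $\mu_k$ and $\Sigma_k$ updates are the standard responsibility-weighted mean and covariance that fall out of setting the unconstrained derivatives of Q to zero (the function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, p, mu, Sigma):
    """One EM iteration for a GMM with N-by-d data matrix X."""
    N, d = X.shape
    K = len(p)
    # E-step: gamma[i, k] = p(z_i = k | x_i, theta^t)
    dens = np.stack([p[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                     for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    Nk = gamma.sum(axis=0)                    # effective count per component
    # M-step
    p_new = Nk / N                            # p_k^{t+1} = (1/N) sum_i gamma_ik
    mu_new = (gamma.T @ X) / Nk[:, None]      # responsibility-weighted means
    Sigma_new = np.empty((K, d, d))
    for k in range(K):
        diff = X - mu_new[k]
        # responsibility-weighted covariance of component k
        Sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return p_new, mu_new, Sigma_new
```

Iterating `em_step` until the log-likelihood stops improving completes the algorithm; each iteration is guaranteed not to decrease the likelihood.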