I recently read the paper "Hierarchical Clustering of a Mixture Model" [1], a fairly old one (2005) by Jacob Goldberger and Sam Roweis of the Department of Computer Science, University of Toronto. It describes a simple method for clustering Gaussians, and recent work has applied it to the final stage of object detection, replacing NMS (non-maximum suppression) to obtain better bbox predictions. I tried it out, and the method does have its merits: after processing, mutual overlap among bboxes is greatly reduced; the method is also "softer" than the hard cutoff of NMS, and the resulting boxes tend to cover the target better than those kept by NMS. Below I summarize the method together with my own code and experiments.
I. Gaussian Clustering
The Gaussian mixture model (MoG, Gaussian Mixture Model) is a common parametric probability model with the following form:
f(y) = \sum_{i=1}^{k}\alpha_i N(y;\mu_i,\Sigma_i) = \sum_{i=1}^{k}\alpha_i f_i(y) \qquad (1)
f(y) is a mixture of k d-dimensional Gaussian distributions; each Gaussian f_i(y) has mean \mu_i and covariance \Sigma_i. Each individual Gaussian can be called a Gaussian kernel. Gaussian clustering means grouping these kernels so that the mixture can be approximated with fewer of them, a process described as follows:
f(y) = \sum_{i=1}^{k}\alpha_i f_i(y) \approx \sum_{j=1}^{m}\beta_j g_j(y), \quad \text{with } k > m \qquad (2)
In (2), g_j(y) is a Gaussian of the same dimension as f_i(y), and the mixture on the left of the approximation has more Gaussian kernels than the one on the right. Gaussian clustering, then, means fitting a Gaussian mixture with many kernels by a mixture with fewer kernels.
Fitting G(y) = \sum_{j=1}^{m}\beta_j g_j(y) to F(y) = \sum_{i=1}^{k}\alpha_i f_i(y) means finding the parameters that minimize the distance between the distribution G(y) and the distribution F(y). Let \theta be the set of adjustable parameters of G(y), so that G(y) can be written as G_{\theta}(y); the fitting problem is then:
\theta = \arg\min_{\theta} \mathrm{Distance}(G_{\theta}, F) \qquad (3)
G(y) and F(y) are two distributions, and the distance between probability distributions can be measured by the KL divergence:
KL(F \Vert G) = \mathbb{E}_{F}\left[\log\frac{p_f}{p_g}\right] \qquad (4)
With a distance that measures the quality of the fit (KL) and a controllable parametric model (G(y)), the fitting problem would seem to reduce directly to an optimization problem: substitute G_{\theta}(y) and F(y) into (3) and (4), take the partial derivatives of the KL distance with respect to the parameter set (\theta = \{\beta, \mu, \Sigma\}), set them to zero, and solve for the optimal parameters. Unfortunately, G_{\theta}(y) and F(y) are both Gaussian mixtures, so this has no closed-form solution and direct optimization is a dead end. What can be done?
Breakthrough 1:
The KL divergence between two Gaussian mixtures has no closed form, but the KL divergence between two single Gaussians does:
Let the two Gaussian distributions be N_1(\mu_1,\Sigma_1) and N_2(\mu_2,\Sigma_2); their KL divergence is:
KL(N_1 \Vert N_2) = \frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{Tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2) - d\right)
Proof:
The d-dimensional Gaussian density is:
N(x|u,\Sigma) = \frac{1}{(2\pi)^{d/2}\left|\Sigma\right|^{1/2}}\exp\left\{-\frac{1}{2}(x-u)^T\Sigma^{-1}(x-u)\right\}
Then the KL divergence between the two Gaussians is:
D_{KL}(P_1 \Vert P_2) = E_{P_1}\left[\log P_1 - \log P_2\right]
= \frac{1}{2} E_{P_1}\left[-\log|\Sigma_1| - (x-u_1)^T\Sigma_1^{-1}(x-u_1) + \log|\Sigma_2| + (x-u_2)^T\Sigma_2^{-1}(x-u_2)\right]
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} E_{P_1}\left[-(x-u_1)^T\Sigma_1^{-1}(x-u_1) + (x-u_2)^T\Sigma_2^{-1}(x-u_2)\right]
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} E_{P_1}\left\{-\mathrm{tr}\left[\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right] + \mathrm{tr}\left[\Sigma_2^{-1}(x-u_2)(x-u_2)^T\right]\right\}
The last step holds because (x-u_1)^T\Sigma_1^{-1}(x-u_1) is a scalar, so:
(x-u_1)^T\Sigma_1^{-1}(x-u_1) = \mathrm{tr}\left((x-u_1)^T\Sigma_1^{-1}(x-u_1)\right) = \mathrm{tr}\left(\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right)
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} E_{P_1}\left\{-\mathrm{tr}\left[\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right]\right\} + \frac{1}{2} E_{P_1}\left\{\mathrm{tr}\left[\Sigma_2^{-1}(x-u_2)(x-u_2)^T\right]\right\}
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}\mathrm{tr}\left\{E_{P_1}\left[\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right]\right\} + \frac{1}{2}\mathrm{tr}\left\{E_{P_1}\left[\Sigma_2^{-1}(x-u_2)(x-u_2)^T\right]\right\}
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}\mathrm{tr}\left\{\Sigma_1^{-1}E_{P_1}\left[(x-u_1)(x-u_1)^T\right]\right\} + \frac{1}{2}\mathrm{tr}\left\{E_{P_1}\left[\Sigma_2^{-1}(xx^T - u_2x^T - xu_2^T + u_2u_2^T)\right]\right\}
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}\mathrm{tr}\left\{\Sigma_1^{-1}\Sigma_1\right\} + \frac{1}{2}\mathrm{tr}\left\{\Sigma_2^{-1}E_{P_1}\left(xx^T - u_2x^T - xu_2^T + u_2u_2^T\right)\right\}
= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}d + \frac{1}{2}\mathrm{tr}\left\{\Sigma_2^{-1}\left(\Sigma_1 + u_1u_1^T - u_2u_1^T - u_1u_2^T + u_2u_2^T\right)\right\}
using the fact that for x \sim N_1(u_1,\Sigma_1):
E_{P_1}\left[xx^T\right] = \Sigma_1 + u_1u_1^T
= \frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + \mathrm{tr}\left[\Sigma_2^{-1}\left(u_1u_1^T - u_2u_1^T - u_1u_2^T + u_2u_2^T\right)\right]\right\}
= \frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + \mathrm{tr}\left[\Sigma_2^{-1}u_1u_1^T - \Sigma_2^{-1}u_2u_1^T - \Sigma_2^{-1}u_1u_2^T + \Sigma_2^{-1}u_2u_2^T\right]\right\}
= \frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + u_1^T\Sigma_2^{-1}u_1 - 2u_1^T\Sigma_2^{-1}u_2 + u_2^T\Sigma_2^{-1}u_2\right\}
= \frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + (u_2-u_1)^T\Sigma_2^{-1}(u_2-u_1)\right\}
This uses the fact that for x \sim N(u,\Sigma):
E\left(x^TAx\right) = \mathrm{tr}(A\Sigma) + u^TAu
This completes the proof.
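As a quick sanity check on the closed form (my own addition, not part of the original derivation), it can be compared against a Monte Carlo estimate of E_{P_1}[\log P_1 - \log P_2] using numpy and scipy; the sketch below assumes two arbitrary 2-D Gaussians:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_kl_closed_form(mu1, sigma1, mu2, sigma2):
    # KL(N1 || N2) for d-dimensional Gaussians, per the closed form above
    d = mu1.shape[0]
    sigma2_inv = np.linalg.inv(sigma2)
    diff = mu1 - mu2
    return 0.5 * (np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1))
                  + np.trace(sigma2_inv @ sigma1)
                  + diff @ sigma2_inv @ diff
                  - d)

mu1, sigma1 = np.array([1.0, 0.0]), np.array([[0.1, 0.0], [0.0, 0.3]])
mu2, sigma2 = np.array([0.0, 0.5]), np.array([[0.2, 0.0], [0.0, 0.1]])

# Monte Carlo estimate of E_{P1}[log p1 - log p2]
rng = np.random.default_rng(0)
x = rng.multivariate_normal(mu1, sigma1, size=200000)
mc = np.mean(multivariate_normal.logpdf(x, mu1, sigma1)
             - multivariate_normal.logpdf(x, mu2, sigma2))

print(gaussian_kl_closed_form(mu1, sigma1, mu2, sigma2), mc)  # the two values should agree closely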
Breakthrough 2:
After clustering, every f_i corresponds to exactly one g_j, while one g_j may correspond to several f_i; this is a many-to-one mapping, which can be defined as:
\pi: i \to j, \quad i \in \{1,2,\cdots,k\},\quad j \in \{1,2,\cdots,m\},\quad \text{and } k > m
With this mapping, a distance between the Gaussian mixtures F and G that does have a closed form can be defined:
d(F,G,\pi) = \sum_{i=1}^{k}\alpha_i\, KL(f_{i} \Vert g_{\pi(i)}) \qquad (5)
Gaussian clustering then amounts to minimizing (5).
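To make (5) concrete, here is a minimal sketch (my own illustration; the kl_matrix values are hypothetical stand-ins for the closed-form KLs above). The mapping \pi is just an array assigning each source kernel to a group, and d(F,G,\pi) is the weighted sum of per-kernel KLs:

import numpy as np

def mixture_distance(alphas, kl_matrix, pi):
    # d(F, G, pi) = sum_i alpha_i * KL(f_i || g_pi(i)), i.e. equation (5)
    # kl_matrix[i][j] = KL(f_i || g_j), precomputed with the closed form above
    return sum(a * kl_matrix[i][pi[i]] for i, a in enumerate(alphas))

# Toy example: k = 3 source kernels, m = 2 groups
alphas = [0.4, 0.1, 0.5]
kl_matrix = np.array([[0.2, 1.5],   # hypothetical KL values
                      [0.9, 0.3],
                      [1.1, 0.1]])
pi = np.argmin(kl_matrix, axis=1)   # nearest-group assignment: [0, 1, 1]
print(pi, mixture_distance(alphas, kl_matrix, pi))  # d = 0.4*0.2 + 0.1*0.3 + 0.5*0.1 = 0.16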
II. Clustering Iteration
Let F, with k Gaussian kernels, be the mixture distribution to be fitted, and let G, with m kernels, be the parametric mixture model.
1. Initialize the m Gaussian kernels of G;
2. Compute the distance between each f_i and each g_j and, following the nearest-neighbor rule, construct the mapping j = \pi(i); since \pi is many-to-one, \pi^{-1}(j) denotes the set of indices i assigned to group j (a preimage rather than a true inverse).
3. Compute the Gaussian kernel corresponding to each group_j (a small worked example follows this list):
u_j' = \frac{1}{\beta_j}\sum_{i\in\pi^{-1}(j)}\alpha_i u_i \qquad (6a)
\Sigma_j' = \frac{1}{\beta_j}\sum_{i\in\pi^{-1}(j)}\alpha_i\left(\Sigma_i + (u_i-u_j')(u_i-u_j')^T\right) \qquad (6b)
\beta_j = \sum_{i\in\pi^{-1}(j)}\alpha_i \qquad (6c)
4. Repeat steps 2 and 3 until convergence.
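As a minimal 1-D sketch of the step-3 merge (my own example, not from the paper): equations (6a)-(6c) are simply moment matching, so the merged kernel preserves the group's overall mean and variance:

import numpy as np

# Merge two 1-D kernels: f1 = N(0, 0.1) with alpha1 = 0.3, f2 = N(1, 0.2) with alpha2 = 0.7
alpha = np.array([0.3, 0.7])
mu    = np.array([0.0, 1.0])
var   = np.array([0.1, 0.2])

beta  = alpha.sum()                              # (6c): beta = 1.0
mu_g  = (alpha @ mu) / beta                      # (6a): 0.7
var_g = (alpha @ (var + (mu - mu_g)**2)) / beta  # (6b): 0.3*(0.1+0.49) + 0.7*(0.2+0.09) = 0.38
print(beta, mu_g, var_g)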
III. Experiments
Problem: let F be a mixture of 3 Gaussian kernels and let G have 2 kernels; fit F with G.
1. Defining the Gaussian kernel
import numpy as np
import matplotlib.pyplot as plt

class guassian_component:
    def __init__(self, miu1, miu2, var1, var2, co12=0, co21=0):
        self.miu = np.asarray([[miu1], [miu2]])
        self.sigma = np.zeros((2, 2))
        self.sigma[0][0] = var1
        self.sigma[1][1] = var2
        self.sigma[0][1] = co12
        self.sigma[1][0] = co21
    def getMiu(self):
        return self.miu
    def getSigma(self):
        return self.sigma
    def sigmaDet(self):
        return np.linalg.det(self.sigma)
    def sigmaInv(self):
        return np.linalg.inv(self.sigma)

def Gaussian2D(x, gaussian_ker):
    # density of a 2-D Gaussian at point x (x is a 2x1 column vector)
    y = x - gaussian_ker.getMiu()
    y = np.exp(-0.5 * np.matmul(np.matmul(y.transpose(), gaussian_ker.sigmaInv()), y))
    y = 1 / (2 * np.pi) / np.sqrt(np.abs(gaussian_ker.sigmaDet())) * y
    return y

def f_contour(x, y, guassian_ker):
    # evaluate the density over the (x, y) grid for contour plotting
    z = []
    for _y in y:
        for _x in x:
            _xy = np.asarray([[_x], [_y]])
            z.append(Gaussian2D(_xy, guassian_ker))
    z = np.asarray(z)
    return z.reshape((y.shape[0], x.shape[0]))  # rows follow y, columns follow x

dx = 0.01; dy = 0.01
x = np.arange(-2.0, 2.0, dx)
y = np.arange(-2.0, 2.0, dy)
X, Y = np.meshgrid(x, y)  # build the grid matrices
c1 = guassian_component(0, 0, 0.1, 0.2)
z = f_contour(x, y, c1)
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
C = ax.contour(X, Y, z, 8, colors='black')  # draw the contour lines in black
ax.contourf(X, Y, z, 8)                     # fill between contour lines with color
ax.clabel(C, inline=1, fontsize=10)         # inline labels on the contour lines
plt.show()
2. The Gaussian mixture distribution
c_a = guassian_component(1, 0, 0.1, 0.3)  # miu_x, miu_y, var_x, var_y
c_b = guassian_component(0, 0, 0.2, 0.2)
c_c = guassian_component(0, 0.5, 0.2, 0.1)
dx = 0.01; dy = 0.01
x = np.arange(-2.0, 2.0, dx)
y = np.arange(-2.0, 2.0, dy)
z_a = f_contour(x, y, c_a)
z_b = f_contour(x, y, c_b)
z_c = f_contour(x, y, c_c)
z = 0.4 * z_a + 0.1 * z_b + 0.5 * z_c  # mixture with weights 0.4, 0.1, 0.5
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
C = ax.contour(X, Y, z, 8, colors='black')  # draw the contour lines in black
ax.contourf(X, Y, z, 8)                     # fill between contour lines with color
ax.clabel(C, inline=1, fontsize=10)         # inline labels on the contour lines
plt.show()
3. Gaussian clustering
def gaussian_KL(f_ker, g_ker):
    # closed-form KL(f || g) for 2-D Gaussians; the constant -d (d = 2) does not
    # change the argmin over groups but is kept for consistency with the formula
    y = np.log(np.abs(g_ker.sigmaDet()) / np.abs(f_ker.sigmaDet())) + \
        np.matmul(np.matmul((f_ker.getMiu() - g_ker.getMiu()).transpose(), g_ker.sigmaInv()),
                  (f_ker.getMiu() - g_ker.getMiu())) + \
        np.trace(np.matmul(g_ker.sigmaInv(), f_ker.getSigma())) - 2
    return 0.5 * y

def new_GaussianComp(kernels, weights):
    '''
    Merge a group of kernels into a single kernel via moment matching (6a)-(6b).
    args:
        kernels: (list) [#num_kernels, ]
        weights: (list) [#num_kernels, ]
    returns:
        guassian_component
    '''
    if not weights:
        raise Exception("group is empty!")
    if len(weights) == 1:
        return kernels[0]
    miu = np.zeros((2, 1))
    sigma = np.zeros((2, 2))
    beta = 0
    for w, ker in zip(weights, kernels):
        miu += w * ker.getMiu()
        beta += w
    miu /= beta   # (6a)
    for w, ker in zip(weights, kernels):
        y = ker.getSigma() + np.matmul((ker.getMiu() - miu), (ker.getMiu() - miu).transpose())
        y *= w
        sigma += y
    sigma /= beta  # (6b)
    return guassian_component(miu[0][0], miu[1][0], sigma[0][0], sigma[1][1], sigma[0][1], sigma[1][0])

class GaussianCluster:
    def __init__(self, group_num, original_gaussians, weights):
        '''
        args:
            original_gaussians: (list) each element is a guassian_component
            weights: (list) the sum of all weights is 1.
        '''
        self.original = original_gaussians
        self.weights = weights
        self.group_num = group_num
        self.match_grid = np.zeros((group_num, len(weights)))     # one-hot, one group to several originals
        self.distance_grid = np.zeros((group_num, len(weights)))  # distances between original and group kernels
        # initialize group_kers from randomly selected original kernels
        idx = np.arange(len(weights))
        np.random.shuffle(idx)
        idx = idx[:group_num]
        self.group_kers = []   # new group kernels
        self.group_betas = []  # new mixture weights
        for i in idx:
            var1 = original_gaussians[i].getSigma()[0][0]
            var2 = original_gaussians[i].getSigma()[1][1]
            miu1 = original_gaussians[i].getMiu()[0][0] + np.random.randn() * np.sqrt(var1)
            miu2 = original_gaussians[i].getMiu()[1][0] + np.random.randn() * np.sqrt(var2)
            ker = guassian_component(miu1, miu2, var1 * 2, var2 * 2)
            self.group_kers.append(ker)
    def cal_distance(self):
        for i, g in enumerate(self.group_kers):
            for j, f in enumerate(self.original):
                self.distance_grid[i][j] = gaussian_KL(f, g)
    def cal_pie(self):
        # step 2: assign each original kernel to its nearest group kernel
        self.match_grid = np.zeros((self.group_num, len(self.weights)))
        self.cal_distance()
        pie = np.argmin(self.distance_grid, axis=0)
        for i, grp_idx in enumerate(pie):
            self.match_grid[grp_idx][i] = 1
    def cal_kernels(self):
        # step 3: recompute each group kernel from its assigned originals
        self.group_kers = []
        self.group_betas = []
        for i in range(self.group_num):
            group_j = self.match_grid[i]
            index = np.asarray([idx for idx in range(len(self.weights))])
            j_i = index[group_j > 0]
            if len(j_i) == 0:
                print('j_i is Empty!')
                # reinitialize this kernel from a randomly selected original kernel
                idx = np.arange(len(self.weights))
                np.random.shuffle(idx)
                idx = idx[0]
                var1 = self.original[idx].getSigma()[0][0]
                var2 = self.original[idx].getSigma()[1][1]
                miu1 = self.original[idx].getMiu()[0][0] + np.random.randn() * np.sqrt(var1)
                miu2 = self.original[idx].getMiu()[1][0] + np.random.randn() * np.sqrt(var2)
                ker = guassian_component(miu1, miu2, var1 * 2, var2 * 2)
                self.group_kers.append(ker)
                self.group_betas.append(0)
            else:
                gau = []
                wgt = []
                for idx in j_i:
                    gau.append(self.original[idx])
                    wgt.append(self.weights[idx])
                c_new = new_GaussianComp(gau, wgt)
                self.group_kers.append(c_new)
                self.group_betas.append(sum(wgt))
    def iteration(self, times):
        for _ in range(times):
            self.cal_pie()
            self.cal_kernels()
        return self.group_kers
    def getGroupKers(self):
        return self.group_kers
    def getGroupBetas(self):
        return self.group_betas

def mixture_for_contour(kernels, weights):
    dx = 0.01; dy = 0.01
    x = np.arange(-2.0, 2.0, dx)
    y = np.arange(-2.0, 2.0, dy)
    def f(x, y, guassian_kers, weights):
        # evaluate the mixture density over the (x, y) grid
        z = []
        for _y in y:
            for _x in x:
                mix = 0
                for ker, wgt in zip(guassian_kers, weights):
                    _xy = np.asarray([[_x], [_y]])
                    mix += wgt * Gaussian2D(_xy, ker)
                z.append(mix)
        z = np.asarray(z)
        return z.reshape((y.shape[0], x.shape[0]))
    z = f(x, y, kernels, weights)
    return x, y, z

kernels = [c_a, c_b, c_c]
weights = [0.4, 0.1, 0.5]
X0, Y0, Z0 = mixture_for_contour(kernels, weights)  # original mixture, for comparison
gCluster = GaussianCluster(2, kernels, weights)
kernels = gCluster.iteration(5)
weights = gCluster.getGroupBetas()
print(weights)
X, Y, Z = mixture_for_contour(kernels, weights)
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
C = ax.contour(X, Y, Z, 8, colors='black')     # contours of the clustered mixture
C = ax.contour(X0, Y0, Z0, 8, colors='black')  # contours of the original mixture
ax.contourf(X, Y, Z, 8)                        # fill between contour lines with color
ax.clabel(C, inline=1, fontsize=10)            # inline labels on the contour lines
plt.show()
IV. Summary
The method implements Gaussian clustering well, typically converging in only 5 to 6 iterations, but it also has some limitations:
1. Initialization has a large influence on the result;
2. Its single hyperparameter, the number of clusters, must be set in advance;
3. The implementation above is still somewhat slow when the number of clusters is large. Further optimizations are possible, e.g., skipping the distance computation between kernels whose means are far apart, and choosing the reinitialization values of dropped-out kernels more carefully.
Finally, this method can be used to merge the bboxes produced by object detection. Its results and those of NMS each have their strengths, and combining the two works even better; a rough sketch of this use follows.
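This sketch is my own illustration, not a recipe from the paper: box_to_kernel, kernel_to_box, and the variance scaling are hypothetical choices. Each detection box becomes a 2-D Gaussian kernel with the box center as mean and variances derived from the box size, detection scores become mixture weights, and the GaussianCluster class from section III merges overlapping boxes:

def box_to_kernel(box):
    # box = (x1, y1, x2, y2); hypothetical mapping: center as mean,
    # (side/4)^2 as variance so that +/-2 sigma roughly spans the box
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return guassian_component(cx, cy, ((x2 - x1) / 4) ** 2, ((y2 - y1) / 4) ** 2)

def kernel_to_box(ker):
    # inverse of the mapping above: +/-2 sigma around the mean
    cx, cy = ker.getMiu()[0][0], ker.getMiu()[1][0]
    wx = 2 * np.sqrt(ker.getSigma()[0][0])
    wy = 2 * np.sqrt(ker.getSigma()[1][1])
    return (cx - wx, cy - wy, cx + wx, cy + wy)

boxes = [(0.0, 0.0, 1.0, 1.0), (0.1, 0.1, 1.1, 1.1), (3.0, 3.0, 4.0, 4.0)]
scores = np.array([0.9, 0.8, 0.7])
weights = list(scores / scores.sum())  # normalize detection scores to mixture weights
kernels = [box_to_kernel(b) for b in boxes]
merged = GaussianCluster(2, kernels, weights).iteration(5)
print([kernel_to_box(k) for k in merged])  # two boxes: one merged, one kept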
[1] Jacob Goldberger, Sam Roweis. "Hierarchical Clustering of a Mixture Model". Advances in Neural Information Processing Systems, 2005.