An Implementation of Gaussian Mixture Clustering

I recently read the paper "Hierarchical Clustering of a Mixture Model" [1]. It is a fairly old paper (2005), by Jacob Goldberger and Sam Roweis, Department of Computer Science, University of Toronto. It describes a simple method for clustering Gaussians. Recently, some work has applied it to the final stage of object detection, replacing NMS (non-maximum suppression) to obtain better bounding-box predictions. I tried it myself, and the method has its merits: after processing, the mutual overlap among bboxes is greatly reduced, and compared with NMS it is "softer"; the resulting boxes tend to cover the target better than those kept by NMS. Below I summarize the method together with my own code and experiments.

I. Gaussian Clustering

A Gaussian mixture model (MoG / GMM) is a common parametric probability model of the form:

$$f(y)=\sum_{i=1}^{k}\alpha_i N(y;\mu_i,\Sigma_i)=\sum_{i=1}^{k}\alpha_i f_i(y)\qquad(1)$$

$f(y)$ is a mixture of $k$ d-dimensional Gaussians, where the mean and covariance of component $f_i(y)$ are $\mu_i$ and $\Sigma_i$. Each individual Gaussian can be called a Gaussian kernel; Gaussian clustering groups these kernels so that the mixture is approximated by fewer kernels:

$$f(y)=\sum_{i=1}^{k}\alpha_i f_i(y)\approx\sum_{j=1}^{m}\beta_j g_j(y),\quad k>m\qquad(2)$$

In (2), each $g_j(y)$ is a Gaussian of the same dimension as the $f_i(y)$, and the number of kernels on the left of the approximation exceeds the number on the right. Gaussian clustering, then, means fitting a mixture with many kernels by a mixture with fewer kernels.

Fitting $G(y)=\sum_{j=1}^{m}\beta_j g_j(y)$ to $F(y)=\sum_{i=1}^{k}\alpha_i f_i(y)$ means finding the parameters that minimize the distance between the distributions $G(y)$ and $F(y)$. Let $\theta$ denote the adjustable parameter set of $G(y)$, so that $G(y)$ can be written $G_\theta(y)$; the fitting problem is then:

$$\theta=\arg\min_{\theta}\mathrm{Distance}(G_\theta,F)\qquad(3)$$

Since $G(y)$ and $F(y)$ are probability distributions, their distance can be measured by the KL divergence:

$$KL(F\|G)=\mathbb{E}_{F}\!\left(\log\frac{p_f}{p_g}\right)\qquad(4)$$

With a distance (KL) to measure the fit and a parametric model ($G_\theta(y)$), the fitting task seems to reduce directly to an optimization problem: substitute $G_\theta(y)$ and $F(y)$ into (3) and (4), take the partial derivatives of the KL distance with respect to the parameter set ($\theta=\{\beta,\mu,\Sigma\}$), set them to zero, and solve for the optimal parameters. Unfortunately, because $G_\theta(y)$ and $F(y)$ are both Gaussian mixtures, this has no closed-form solution, so direct optimization is infeasible. What can be done?
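Since the KL divergence between two Gaussian mixtures has no closed form, in practice it can only be estimated numerically, e.g. by Monte Carlo sampling from F. A minimal sketch (the mixture parameters below are arbitrary illustrations, and scipy is assumed available):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def mixture_logpdf(x, weights, mus, sigmas):
    # log of sum_i w_i N(x; mu_i, Sigma_i), evaluated at each row of x
    comps = np.stack([w * multivariate_normal.pdf(x, m, s)
                      for w, m, s in zip(weights, mus, sigmas)])
    return np.log(comps.sum(axis=0))

def sample_mixture(n, weights, mus, sigmas):
    # draw n samples: pick a component, then sample from it
    ks = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(mus[k], sigmas[k]) for k in ks])

wF = [0.4, 0.6]; muF = [np.zeros(2), np.ones(2)]; SF = [np.eye(2)*0.2]*2
wG = [1.0];      muG = [np.full(2, 0.6)];         SG = [np.eye(2)*0.5]

x = sample_mixture(20_000, wF, muF, SF)
kl_est = np.mean(mixture_logpdf(x, wF, muF, SF) - mixture_logpdf(x, wG, muG, SG))
print(kl_est)  # Monte-Carlo estimate of KL(F||G); nonnegative up to sampling noise
```

This estimated quantity is exactly what lacks a closed form; the paper's trick is to avoid it altogether.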
Breakthrough 1:

The KL divergence between two Gaussian mixtures has no closed form, but between two single Gaussians it does. Let the two Gaussians be $N_1(\mu_1,\Sigma_1)$ and $N_2(\mu_2,\Sigma_2)$; their KL divergence is:

$$KL(N_1\|N_2)=\frac{1}{2}\left(\log\frac{|\Sigma_2|}{|\Sigma_1|}+\mathrm{Tr}(\Sigma_2^{-1}\Sigma_1)+(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)-d\right)$$
Proof:
The d-dimensional Gaussian density is

$$N(x\mid u,\Sigma)=\frac{1}{(2\pi)^{d/2}\left|\Sigma\right|^{1/2}}\exp\left\{-\frac{1}{2}(x-u)^T\Sigma^{-1}(x-u)\right\}$$

so the KL divergence between the two Gaussians is:

$$\begin{aligned}
D_{KL}(P_1\|P_2)&=E_{P_1}\left[\log P_1-\log P_2\right]\\
&=\frac{1}{2}E_{P_1}\left[-\log|\Sigma_1|-(x-u_1)^T\Sigma_1^{-1}(x-u_1)+\log|\Sigma_2|+(x-u_2)^T\Sigma_2^{-1}(x-u_2)\right]\\
&=\frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|}+\frac{1}{2}E_{P_1}\left[-(x-u_1)^T\Sigma_1^{-1}(x-u_1)+(x-u_2)^T\Sigma_2^{-1}(x-u_2)\right]\\
&=\frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|}+\frac{1}{2}E_{P_1}\left\{-\mathrm{tr}\left[\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right]+\mathrm{tr}\left[\Sigma_2^{-1}(x-u_2)(x-u_2)^T\right]\right\}
\end{aligned}$$

The last step holds because $(x-u_1)^T\Sigma_1^{-1}(x-u_1)$ is a scalar, so

$$(x-u_1)^T\Sigma_1^{-1}(x-u_1)=\mathrm{tr}\left((x-u_1)^T\Sigma_1^{-1}(x-u_1)\right)=\mathrm{tr}\left(\Sigma_1^{-1}(x-u_1)(x-u_1)^T\right)$$
Continuing:

$$\begin{aligned}
&=\frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|}-\frac{1}{2}\mathrm{tr}\left\{\Sigma_1^{-1}E_{P_1}\left[(x-u_1)(x-u_1)^T\right]\right\}+\frac{1}{2}\mathrm{tr}\left\{\Sigma_2^{-1}E_{P_1}\left[xx^T-u_2x^T-xu_2^T+u_2u_2^T\right]\right\}\\
&=\frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|}-\frac{d}{2}+\frac{1}{2}\mathrm{tr}\left\{\Sigma_2^{-1}\left(\Sigma_1+u_1u_1^T-u_2u_1^T-u_1u_2^T+u_2u_2^T\right)\right\}
\end{aligned}$$

using $\mathrm{tr}\left\{\Sigma_1^{-1}\Sigma_1\right\}=d$ and, since $x\sim N_1(u_1,\Sigma_1)$,

$$E_{P_1}\left[xx^T\right]=\Sigma_1+u_1u_1^T$$

Then:

$$\begin{aligned}
&=\frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|}-d+\mathrm{tr}(\Sigma_2^{-1}\Sigma_1)+\mathrm{tr}\left[\Sigma_2^{-1}\left(u_1u_1^T-u_2u_1^T-u_1u_2^T+u_2u_2^T\right)\right]\right\}\\
&=\frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|}-d+\mathrm{tr}(\Sigma_2^{-1}\Sigma_1)+u_1^T\Sigma_2^{-1}u_1-2u_1^T\Sigma_2^{-1}u_2+u_2^T\Sigma_2^{-1}u_2\right\}\\
&=\frac{1}{2}\left\{\log\frac{|\Sigma_2|}{|\Sigma_1|}-d+\mathrm{tr}(\Sigma_2^{-1}\Sigma_1)+(u_2-u_1)^T\Sigma_2^{-1}(u_2-u_1)\right\}
\end{aligned}$$

where we used, for $x\sim N(u,\Sigma)$,

$$E\left(x^TAx\right)=\mathrm{tr}(A\Sigma)+u^TAu$$

This completes the proof.
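As a numerical sanity check on this closed form (a sketch with arbitrary example parameters; scipy is assumed available), we can compare the formula against a Monte-Carlo estimate of $E_{N_1}[\log N_1-\log N_2]$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

def kl_closed_form(mu1, S1, mu2, S2):
    # KL(N1||N2) = 1/2 ( log|S2|/|S1| + tr(S2^-1 S1) + (mu1-mu2)^T S2^-1 (mu1-mu2) - d )
    d = len(mu1)
    S2inv = np.linalg.inv(S2)
    diff = mu1 - mu2
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1))
                  + np.trace(S2inv @ S1) + diff @ S2inv @ diff - d)

mu1, S1 = np.array([0.0, 0.0]), np.diag([0.1, 0.2])
mu2, S2 = np.array([1.0, 0.5]), np.diag([0.3, 0.1])

closed = kl_closed_form(mu1, S1, mu2, S2)
x = rng.multivariate_normal(mu1, S1, size=200_000)
mc = np.mean(multivariate_normal.logpdf(x, mu1, S1)
             - multivariate_normal.logpdf(x, mu2, S2))
print(closed, mc)  # the two values should agree closely
```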

Breakthrough 2:

After clustering, each $f_i$ corresponds to exactly one $g_j$, while one $g_j$ may correspond to several $f_i$. This is a many-to-one mapping, defined as:

$$\pi:i\to j,\quad i\in\{1,2,\cdots,k\},\ j\in\{1,2,\cdots,m\},\ \text{and}\ k>m$$

With this, define a distance between the Gaussian mixtures F and G that does have a closed form:

$$d(F,G,\pi)=\sum_{i=1}^{k}\alpha_i\,KL(f_i\Vert g_{\pi(i)})\qquad(5)$$

The clustering result is the mapping and kernels that minimize (5).
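To make (5) concrete, here is a small sketch (the kernels and weights are arbitrary illustrations) that builds the nearest-neighbor mapping π and evaluates the distance using the closed-form KL:

```python
import numpy as np

def kl(mu1, S1, mu2, S2):
    # closed-form KL(N1||N2), with d = len(mu1)
    S2inv = np.linalg.inv(S2)
    diff = mu1 - mu2
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1))
                  + np.trace(S2inv @ S1) + diff @ S2inv @ diff - len(mu1))

# F: three kernels with weights alpha; G: two kernels
alpha = [0.4, 0.1, 0.5]
F = [(np.array([1.0, 0.0]), np.diag([0.1, 0.3])),
     (np.array([0.0, 0.0]), np.diag([0.2, 0.2])),
     (np.array([0.0, 0.5]), np.diag([0.2, 0.1]))]
G = [(np.array([0.8, 0.0]), np.diag([0.2, 0.3])),
     (np.array([0.0, 0.4]), np.diag([0.2, 0.2]))]

# pi(i) = index of the nearest g_j in KL; then d(F,G,pi) as in (5)
pi = [min(range(len(G)), key=lambda j: kl(*f, *G[j])) for f in F]
d_FG = sum(a * kl(*f, *G[j]) for a, f, j in zip(alpha, F, pi))
print(pi, d_FG)
```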

II. The Clustering Iteration

Let F, the mixture to be fitted, have k Gaussian kernels, and let G, the parametric mixture model, have m kernels.
1. Initialize the m kernels of G;
2. Compute the distance between every $f_i$ and $g_j$ and, by the nearest-neighbor rule, build the mapping $j=\pi(i)$; $\pi^{-1}(j)$ denotes the set of indices $i$ assigned to group $j$.
3. Re-estimate the kernel of each group $j$:

$$u_j'=\frac{1}{\beta_j}\sum_{i\in\pi^{-1}(j)}\alpha_i u_i\qquad(6a)$$

$$\Sigma_j'=\frac{1}{\beta_j}\sum_{i\in\pi^{-1}(j)}\alpha_i\left(\Sigma_i+(u_i-u_j')(u_i-u_j')^T\right)\qquad(6b)$$

$$\beta_j=\sum_{i\in\pi^{-1}(j)}\alpha_i\qquad(6c)$$

4. Repeat steps 2 and 3 until convergence.
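Step 3 is a moment-matching merge: formulas (6a)-(6c) make the merged kernel reproduce the mean and covariance of its group. A minimal sketch with two arbitrary example kernels:

```python
import numpy as np

# Two kernels assigned to the same group j (weights, means, covariances are arbitrary examples)
alphas = np.array([0.3, 0.7])
mus    = [np.array([0.0, 0.0]), np.array([1.0, 0.5])]
Sigmas = [np.eye(2) * 0.1,      np.eye(2) * 0.2]

beta = alphas.sum()                                         # (6c) merged mixture weight
mu   = sum(a * m for a, m in zip(alphas, mus)) / beta       # (6a) merged mean
Sig  = sum(a * (S + np.outer(m - mu, m - mu))               # (6b) merged covariance:
           for a, m, S in zip(alphas, mus, Sigmas)) / beta  #      within- plus between-kernel spread
print(beta, mu, Sig)
```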

III. Experiments

Problem: let F be a mixture of 3 Gaussian kernels and let G have 2 kernels; fit F with G.
1. Defining the Gaussian kernel

import numpy as np
import matplotlib.pyplot as plt

class guassian_component:
    def __init__(self, miu1, miu2, var1, var2, co12=0, co21=0):
        # 2-D Gaussian kernel: mean (miu1, miu2), covariance [[var1, co12], [co21, var2]]
        self.miu = np.asarray([[miu1], [miu2]])
        self.sigma = np.zeros((2,2))
        self.sigma[0][0] = var1
        self.sigma[1][1] = var2
        self.sigma[0][1] = co12
        self.sigma[1][0] = co21

    def getMiu(self):
        return self.miu

    def getSigma(self):
        return self.sigma

    def sigmaDet(self):
        return np.linalg.det(self.sigma)

    def sigmaInv(self):
        return np.linalg.inv(self.sigma)

def Gaussian2D(x, gaussian_ker):  # density of a 2-D Gaussian at column vector x
    y = x-gaussian_ker.getMiu()
    y = np.exp(-0.5*np.matmul(np.matmul(y.transpose(),gaussian_ker.sigmaInv()),y))
    y = 1/(2*np.pi)/np.sqrt(np.abs(gaussian_ker.sigmaDet()))*y
    return y

def f_contour(x, y, guassian_ker):  # evaluate the kernel density over the (x, y) grid
    z = []
    for _y in y:
        for _x in x:
            _xy = np.asarray([[_x],[_y]])
            z.append(Gaussian2D(_xy, guassian_ker))
    z = np.asarray(z)
    return z.reshape((y.shape[0], x.shape[0]))  # rows follow y, columns follow x

dx=0.01; dy=0.01
x=np.arange(-2.0, 2.0, dx)
y=np.arange(-2.0, 2.0, dy)
X,Y=np.meshgrid(x,y)  # build the mesh grid
    
c1 = guassian_component(0,0,0.1,0.2)
z = f_contour(x,y, c1)

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(1,1,1)
C=ax.contour(X,Y,z,8,colors='black')  # draw the contour lines in black
ax.contourf(X,Y,z,8)  # fill between contour lines with color
ax.clabel(C,inline=1,fontsize=10)  # label the contour lines
plt.show()

[Figure: contour plot of a single Gaussian kernel]
2. The Gaussian mixture

c_a = guassian_component(1,0,0.1,0.3) # miu_x, miu_y, var_x, var_y
c_b = guassian_component(0,0,0.2,0.2)
c_c = guassian_component(0,0.5,0.2,0.1)

dx=0.01; dy=0.01
x=np.arange(-2.0, 2.0, dx)
y=np.arange(-2.0, 2.0, dy)

z_a = f_contour(x,y, c_a)
z_b = f_contour(x,y, c_b)
z_c = f_contour(x,y, c_c)

z = 0.4*z_a + 0.1*z_b + 0.5*z_c

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(1,1,1)
C=ax.contour(X,Y,z,8,colors='black')  # draw the contour lines in black
ax.contourf(X,Y,z,8)  # fill between contour lines with color
ax.clabel(C,inline=1,fontsize=10)  # label the contour lines
plt.show()

[Figure: contour plot of the 3-kernel mixture F]
3. Gaussian clustering

def gaussian_KL(f_ker, g_ker):
    # closed-form KL(f||g) between two 2-D Gaussians (d = 2)
    diff = f_ker.getMiu() - g_ker.getMiu()
    y = np.log(np.abs(g_ker.sigmaDet())/np.abs(f_ker.sigmaDet())) + \
        np.matmul(np.matmul(diff.transpose(), g_ker.sigmaInv()), diff) + \
        np.trace(np.matmul(g_ker.sigmaInv(), f_ker.getSigma())) \
        - 2  # the "- d" term; a constant, so it does not change the argmin
    return 0.5*y

def new_GaussianComp(kernels, weights):
    '''
    args:
        kernels: (list) [#num_kernels, ]
        weights: (list) [#num_kernels, ]
        
    returns:
        guassian_component
    '''
    if not weights:
        raise Exception("group is empty!")

    if len(weights) == 1:
        return kernels[0]
    
    
    miu = np.zeros((2,1))
    sigma = np.zeros((2,2))
    beta = 0
    for w,ker in zip(weights, kernels):
        miu += w*ker.getMiu()
        beta += w
    miu /= beta
    
    for w, ker in zip(weights, kernels):
        y = ker.getSigma()+np.matmul((ker.getMiu()-miu),(ker.getMiu()-miu).transpose())
        #print(ker.getSigma())
        #print(ker.getMiu()-miu)
        #print(y)
        #print()
        y *= w
        sigma += y
        
    sigma /= beta
    return guassian_component(miu[0][0],miu[1][0], sigma[0][0], sigma[1][1], sigma[0][1], sigma[1][0])


class GaussianCluster:
    def __init__(self, group_num, original_gaussians, weights):
        '''
        args:
            orginal_gaussians: (list) each of orginal_gaussian is guassian_component
            weights: (list) the sum of all weights is 1.
        '''
        self.original = original_gaussians
        self.weights = weights
        self.group_num = group_num
        
        self.match_grid = np.zeros((group_num, len(weights)))      # one-hot: one result kernel maps to several original kernels
        self.distance_grid = np.zeros((group_num, len(weights)))   # distances between original kernels and result kernels
        
        # initialize group_kers by some selected original-kernels
        idx = np.arange(len(weights))
        np.random.shuffle(idx)
        idx = idx[:group_num]
        
        self.group_kers = []     # new groups
        self.group_betas = []    # new mixture weights
        for i in idx:
            var1 = original_gaussians[i].getSigma()[0][0]
            var2 = original_gaussians[i].getSigma()[1][1]
            miu1 = original_gaussians[i].getMiu()[0][0]+np.random.randn()*np.sqrt(var1)
            miu2 = original_gaussians[i].getMiu()[1][0]+np.random.randn()*np.sqrt(var2)
            ker = guassian_component(miu1, miu2, var1*2,var2*2)
            
            self.group_kers.append(ker)
            
    def cal_distance(self):
        for i,g in enumerate(self.group_kers):
            for j,f in enumerate(self.original):
                self.distance_grid[i][j] = gaussian_KL(f,g)
                
    def cal_pie(self):
        self.match_grid = np.zeros((self.group_num, len(self.weights))) 
        self.cal_distance()
        pie = np.argmin(self.distance_grid, axis=0)
        for i,grp_idx in enumerate(pie):
            self.match_grid[grp_idx][i]=1
    
    def cal_kernels(self):
        self.group_kers = []
        self.group_betas = []
        for i in range(self.group_num):
            group_j = self.match_grid[i]
            index = np.arange(len(self.weights))
            j_i = index[group_j > 0]

            if len(j_i) == 0:
                print('j_i is Empty!')
                # reinitialize the kernel from a randomly selected original kernel
                idx = np.arange(len(self.weights))
                np.random.shuffle(idx)
                idx = idx[0]
                var1 = self.original[idx].getSigma()[0][0]
                var2 = self.original[idx].getSigma()[1][1]
                miu1 = self.original[idx].getMiu()[0][0] + np.random.randn()*np.sqrt(var1)
                miu2 = self.original[idx].getMiu()[1][0] + np.random.randn()*np.sqrt(var2)
                ker = guassian_component(miu1, miu2, var1*2, var2*2)

                self.group_kers.append(ker)
                self.group_betas.append(0)

            else:
                gau = []
                wgt = []
                for idx in j_i:
                    gau.append(self.original[idx])
                    wgt.append(self.weights[idx])
                c_new = new_GaussianComp(gau,wgt)
                self.group_kers.append(c_new)
                self.group_betas.append(sum(wgt))
            
    def iteration(self, times):
        for _ in range(times):
            self.cal_pie()
            self.cal_kernels()
        return self.group_kers
    
    def getGroupKers(self):
        return self.group_kers
    
    def getGroupBetas(self):
        return self.group_betas

def mixture_for_contour(kernels, weights):
    dx=0.01;dy=0.01
    x=np.arange(-2.0,2.0,dx)
    y=np.arange(-2.0,2.0,dx)

    def f(x, y, guassian_kers, weights):  # accumulate the mixture density over the (x, y) grid
        z = []
        for _y in y:
            for _x in x:
                mix = 0
                for ker,wgt in zip(guassian_kers, weights):
                    _xy = np.asarray([[_x],[_y]])
                    mix += wgt*Gaussian2D(_xy,ker)
                z.append(mix)
        z = np.asarray(z)
        return z.reshape((y.shape[0], x.shape[0]))  # rows follow y, columns follow x

    z = f(x,y, kernels, weights)
    return x,y,z

kernels = [c_a, c_b, c_c]
weights = [0.4, 0.1, 0.5]
gCluster = GaussianCluster(2, kernels, weights)

kernels = gCluster.iteration(5)
weights = gCluster.getGroupBetas()
print(weights)

X,Y,Z = mixture_for_contour(kernels, weights)
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(1,1,1)
C=ax.contour(X,Y,Z,8,colors='black')   # contours of the clustered mixture G
C0=ax.contour(X,Y,z,8,colors='gray')   # contours of the original mixture F (z computed in part 2)
ax.contourf(X,Y,Z,8)  # fill between contour lines with color
ax.clabel(C,inline=1,fontsize=10)  # label the contour lines
plt.show()

[Figure: contours of the fitted 2-kernel mixture G overlaid on the original mixture F]

IV. Summary

The method performs Gaussian clustering well, typically converging in only 5-6 iterations, but it has some limitations:
1. The initialization strongly affects the result;
2. It requires one hyperparameter to be set: the number of clusters;
3. The implementation above is still slow when the number of clusters is large. Further optimizations are possible, e.g. skipping the distance computation between kernels whose means are far apart, and choosing the reinitialization values of dropped kernels more carefully.
Finally, this method can be used to merge the bboxes produced by object detection. Its results and those of NMS each have their strengths; combining the two works even better.


[1] Jacob Goldberger, Sam Roweis. Hierarchical Clustering of a Mixture Model. 2005.
