How to merge two Gaussians?
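Given two Gaussians $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$ with weights $w_1$ and $w_2$, the merged mean and variance are:

$$
\hat{\mu} = \frac{w_1\mu_1 + w_2\mu_2}{w_1 + w_2}
$$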
$$
\sigma^2 = \frac{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + (w_1\mu_1 + w_2\mu_2)^2}{(w_1 + w_2)^2} - \hat{\mu}^2
$$
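One way to see where this formula comes from, under the assumption (not stated above) that the merged variable is the normalized weighted sum $X = (w_1X_1 + w_2X_2)/(w_1 + w_2)$ of independent $X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$, is to compute $\sigma^2 = \mathrm{E}[X^2] - \mathrm{E}[X]^2$ directly, using $\mathrm{E}[X_i^2] = \sigma_i^2 + \mu_i^2$ and $\mathrm{E}[X_1X_2] = \mu_1\mu_2$:

$$
\begin{aligned}
\mathrm{E}[X] &= \frac{w_1\mu_1 + w_2\mu_2}{w_1 + w_2} = \hat{\mu}, \\
\mathrm{E}[X^2] &= \frac{w_1^2(\sigma_1^2 + \mu_1^2) + 2w_1w_2\,\mu_1\mu_2 + w_2^2(\sigma_2^2 + \mu_2^2)}{(w_1 + w_2)^2}
= \frac{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + (w_1\mu_1 + w_2\mu_2)^2}{(w_1 + w_2)^2}, \\
\sigma^2 &= \mathrm{E}[X^2] - \hat{\mu}^2 = \frac{w_1^2\sigma_1^2 + w_2^2\sigma_2^2}{(w_1 + w_2)^2}.
\end{aligned}
$$

Under this reading the mean terms cancel, so the variance depends only on $\sigma_1^2$, $\sigma_2^2$ and the weights. If the goal is instead to replace the mixture $w_1\mathcal{N}(\mu_1, \sigma_1^2) + w_2\mathcal{N}(\mu_2, \sigma_2^2)$ by a single Gaussian with the same mean and variance (moment matching), the variance would be $\frac{w_1(\sigma_1^2 + \mu_1^2) + w_2(\sigma_2^2 + \mu_2^2)}{w_1 + w_2} - \hat{\mu}^2$ instead.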
In the multivariate case, suppose the covariance matrices $\Sigma_1$ and $\Sigma_2$ are diagonal:
$$
\Sigma = \frac{w_1^2\Sigma_1 + w_2^2\Sigma_2 + (w_1\mu_1 + w_2\mu_2)(w_1\mu_1 + w_2\mu_2)^T}{(w_1 + w_2)^2} - \hat{\mu}\hat{\mu}^T
$$
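With diagonal $\Sigma_1$ and $\Sigma_2$, the off-diagonal entries of $(w_1\mu_1 + w_2\mu_2)(w_1\mu_1 + w_2\mu_2)^T/(w_1 + w_2)^2 - \hat{\mu}\hat{\mu}^T$ cancel exactly (since $\hat{\mu} = (w_1\mu_1 + w_2\mu_2)/(w_1 + w_2)$), so the merged covariance is diagonal too and the formula can be applied one dimension at a time. A minimal Python sketch of that, storing each diagonal covariance as a plain list of variances; the function name `merge_diagonal_gaussians` is hypothetical:

```python
def merge_diagonal_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Merge two weighted Gaussians with diagonal covariances (hypothetical helper).

    mu1, mu2: mean vectors as lists; var1, var2: diagonals of Sigma_1, Sigma_2.
    Returns (mu_hat, var) where var is the diagonal of the merged Sigma.
    """
    total = w1 + w2
    # Merged mean: weighted average of the component means.
    mu_hat = [(w1 * a + w2 * b) / total for a, b in zip(mu1, mu2)]
    var = []
    for a, b, s1, s2, m in zip(mu1, mu2, var1, var2, mu_hat):
        # Diagonal entry of (w1^2 S1 + w2^2 S2 + s s^T) / (w1 + w2)^2 - mu_hat mu_hat^T,
        # where s = w1 * mu1 + w2 * mu2 (off-diagonal entries cancel to zero).
        s = w1 * a + w2 * b
        var.append((w1 ** 2 * s1 + w2 ** 2 * s2 + s ** 2) / total ** 2 - m ** 2)
    return mu_hat, var
```

For example, `merge_diagonal_gaussians(1.0, [0.0, 2.0], [1.0, 4.0], 1.0, [2.0, 2.0], [1.0, 4.0])` returns `([1.0, 2.0], [0.5, 2.0])`.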
EM for Gaussian Mixtures
Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters, namely the means and covariances of the components and the mixing coefficients (Pattern Recognition and Machine Learning, Chapter 9).
1. Initialize the means $\mu_k$, covariances $\Sigma_k$ and mixing coefficients $\pi_k$, and evaluate the initial value of the log likelihood.
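A minimal Python sketch of the initialization for a one-dimensional mixture, where each $\Sigma_k$ reduces to a scalar variance. The heuristic is an assumption, not something the text prescribes: $K$ data points sampled without replacement as the means, the overall data variance for every component, and uniform mixing coefficients. The function name `gmm_init` is hypothetical.

```python
import random

def gmm_init(xs, k, seed=0):
    """Return initial (means, variances, weights) for a 1-D K-component GMM."""
    rng = random.Random(seed)
    # Means: k data points drawn without replacement (one simple heuristic).
    means = rng.sample(xs, k)
    # Variances: every component starts with the overall variance of the data.
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    # Mixing coefficients: uniform, so they sum to 1.
    weights = [1.0 / k] * k
    return means, variances, weights
```

A common alternative is to run K-means first and take the initial parameters from its clusters.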
2. E step. Evaluate the responsibilities using the current parameter values:

$$
\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
$$
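A minimal Python sketch of the E step for a one-dimensional mixture (so $\Sigma_k$ is a scalar variance); the names `normal_pdf` and `e_step` are hypothetical:

```python
import math

def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(xs, means, variances, weights):
    """Return gamma[n][k], the responsibility of component k for point x_n."""
    gamma = []
    for x in xs:
        # Numerators pi_k * N(x_n | mu_k, sigma_k^2), one per component.
        num = [w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
        # Denominator: sum over j of pi_j * N(x_n | mu_j, sigma_j^2).
        total = sum(num)
        gamma.append([a / total for a in num])
    return gamma
```

With two equal-weight, unit-variance components at $-1$ and $1$, the point $x = 0$ is shared equally: `e_step([0.0], [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])` gives `[[0.5, 0.5]]`. For far-away points or high-dimensional data the densities can underflow to zero, so practical implementations usually compute responsibilities in log space with a log-sum-exp.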
3. M step. Re-estimate the parameters using the current responsibilities:

$$
\mu_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,x_n
$$

$$
\Sigma_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,(x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T
$$

$$
\pi_k^{\text{new}} = \frac{N_k}{N}
$$

where $N_k = \sum_{n=1}^{N}\gamma(z_{nk})$ is the effective number of points assigned to component $k$ and $N$ is the total number of data points.
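The same updates for a one-dimensional mixture, taking the responsibilities as a list of rows `gamma[n][k]`; in one dimension $(x_n - \mu_k)(x_n - \mu_k)^T$ is just a squared deviation. A minimal sketch with a hypothetical `m_step` function:

```python
def m_step(xs, gamma):
    """Re-estimate 1-D GMM parameters from responsibilities gamma[n][k].

    Returns (means, variances, weights) following the M-step updates.
    """
    n = len(xs)
    num_components = len(gamma[0])
    means, variances, weights = [], [], []
    for k in range(num_components):
        # N_k: effective number of points assigned to component k.
        n_k = sum(row[k] for row in gamma)
        # mu_k^new: responsibility-weighted mean of the data.
        mu = sum(row[k] * x for row, x in zip(gamma, xs)) / n_k
        # Sigma_k^new (1-D): responsibility-weighted squared deviation from mu_k^new.
        var = sum(row[k] * (x - mu) ** 2 for row, x in zip(gamma, xs)) / n_k
        means.append(mu)
        variances.append(var)
        # pi_k^new = N_k / N
        weights.append(n_k / n)
    return means, variances, weights
```

With hard 0/1 responsibilities this reduces to per-cluster sample means and (biased) variances: `m_step([0.0, 2.0, 10.0, 12.0], [[1, 0], [1, 0], [0, 1], [0, 1]])` returns `([1.0, 11.0], [1.0, 1.0], [0.5, 0.5])`.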
4. Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.

$$
\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
$$
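Putting the four steps together, a minimal self-contained Python sketch of EM for a one-dimensional mixture. The stopping rule (absolute change in log likelihood below `tol`), the `max_iter` cap and all function names are assumptions for illustration; the initial parameters are supplied by the caller.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(x | mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, means, variances, weights):
    """ln p(X | mu, Sigma, pi) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) )."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)))
        for x in xs
    )

def em_gmm_1d(xs, means, variances, weights, tol=1e-8, max_iter=500):
    """Run EM from the given initial parameters (step 1 is done by the caller).

    Stops when the log likelihood changes by less than tol, or after max_iter rounds.
    Returns (means, variances, weights, final log likelihood).
    """
    n = len(xs)
    ll_old = log_likelihood(xs, means, variances, weights)  # initial log likelihood
    ll_new = ll_old
    for _ in range(max_iter):
        # E step: responsibilities gamma[n][k] under the current parameters.
        gamma = []
        for x in xs:
            num = [w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
            total = sum(num)
            gamma.append([a / total for a in num])
        # M step: re-estimate means, variances and mixing coefficients.
        new_means, new_vars, new_weights = [], [], []
        for k in range(len(means)):
            n_k = sum(row[k] for row in gamma)
            mu = sum(row[k] * x for row, x in zip(gamma, xs)) / n_k
            new_means.append(mu)
            new_vars.append(sum(row[k] * (x - mu) ** 2 for row, x in zip(gamma, xs)) / n_k)
            new_weights.append(n_k / n)
        means, variances, weights = new_means, new_vars, new_weights
        # Evaluate the log likelihood and check for convergence.
        ll_new = log_likelihood(xs, means, variances, weights)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return means, variances, weights, ll_new
```

On two well-separated groups of points around $-2$ and $2$, starting from means $\pm 1$ with unit variances, the estimated means should end up close to $-2$ and $2$. Because the likelihood is unbounded when a component collapses onto a single point (its variance goes to zero), practical implementations usually guard against vanishing variances.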