Gaussian mixture model (GMM)
A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
Interpretation from geometry
$p(x)$ is a weighted sum of multiple Gaussian distributions:

$$p(x)=\sum_{k=1}^{K} \alpha_{k} \cdot \mathcal{N}\left(x \mid \mu_{k}, \Sigma_{k}\right)$$
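This weighted-sum view can be sketched directly in NumPy. The function names `gaussian_pdf` and `gmm_density` below are illustrative choices, not from the original text; this is a minimal sketch, not an optimized implementation:

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(x | mean, cov) for a d-dimensional point x."""
    d = mean.shape[0]
    diff = x - mean
    exponent = -0.5 * diff @ np.linalg.inv(cov) @ diff
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(exponent) / norm

def gmm_density(x, weights, means, covs):
    """p(x) = sum_k alpha_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))
```

Since the weights $\alpha_k$ sum to 1, the result is itself a valid probability density.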
Interpretation from mixture model
setup:

- The total number of Gaussian distributions, $K$.
- $x$, a sample (observed variable).
- $z$, the distribution from which the sample $x$ is drawn (a latent variable), where
  - $z \in \{c_1, c_2, \dots, c_K\}$,
  - $\sum_{k=1}^K p(z=c_k)= 1$. We denote $p(z=c_k)$ by $p_k$.
Mixture models are usually generative models, which means new data can be drawn from the model's distribution. Specifically, in the Gaussian Mixture Model (GMM), a new data point is generated by first selecting a class $c_k$ according to the probability distribution over the classes, and then drawing a value from the Gaussian distribution of that class. Therefore, we can write $p(x)$ as follows:
$$\begin{aligned} p(x) &= \sum_z p(x,z) \\ &= \sum_{k=1}^{K} p(x, z=c_k) \\ &= \sum_{k=1}^{K} p(z=c_k) \cdot p(x \mid z=c_k) \\ &= \sum_{k=1}^{K} p_k \cdot \mathcal{N}(x \mid \mu_{k}, \Sigma_{k}) \end{aligned}$$
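The two-step generative process just described (first sample a class $z$, then sample $x$ from that class's Gaussian) can be sketched as follows. This is an illustrative NumPy snippet with assumed names (`sample_gmm`, `seed`), not code from the original article:

```python
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points: z_i ~ Categorical(p_1..p_K), then x_i ~ N(mu_{z_i}, Sigma_{z_i})."""
    rng = np.random.default_rng(seed)
    # step 1: select a class c_k according to the class probabilities
    z = rng.choice(len(weights), size=n, p=weights)
    # step 2: draw a value from that class's Gaussian distribution
    x = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return x, z
```

Marginalizing out the sampled labels `z` recovers exactly the mixture density $p(x)$ derived above.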
We see that the two interpretations lead to the same result.
GMM Derivation
setup:
- $X$: observed data, where $X = (x_1, x_2, \dots, x_N)$.
- $\theta$: the parameters of the model, where $\theta=\left\{p_{1}, p_{2}, \cdots, p_{K}, \mu_{1}, \mu_{2}, \cdots, \mu_{K}, \Sigma_{1}, \Sigma_{2}, \cdots, \Sigma_{K}\right\}$.
- $p(x) = \sum_{k=1}^{K} p_k \cdot \mathcal{N}(x \mid \mu_{k}, \Sigma_{k})$.
- $p(x,z) = p(z) \cdot p(x \mid z) = p_z \cdot \mathcal{N}(x \mid \mu_{z}, \Sigma_{z})$.
- $p(z \mid x) = \dfrac{p(x,z)}{p(x)} = \dfrac{p_z \cdot \mathcal{N}(x \mid \mu_{z}, \Sigma_{z})}{\sum_{k=1}^K p_k \cdot \mathcal{N}(x \mid \mu_{k}, \Sigma_{k})}$, where the denominator sums over all $K$ components.
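The posterior $p(z \mid x)$ above (the "responsibility" each component takes for a point) can be computed directly from the last formula. Below is a 1-D sketch; the helper names `normal_pdf` and `responsibilities` are illustrative, not from the original:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def responsibilities(x, p, mu, var):
    """p(z=c_k | x) = p_k N(x|mu_k) / sum_j p_j N(x|mu_j), for all k."""
    joint = np.array([pk * normal_pdf(x, mk, vk)
                      for pk, mk, vk in zip(p, mu, var)])
    return joint / joint.sum()  # normalize by p(x)
```

By construction the responsibilities are nonnegative and sum to 1 over the components.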
Solve by MLE
Assuming the samples in $X$ are i.i.d., we have

$$\begin{aligned} \hat{\theta}_{MLE} &= \underset{\theta}{\operatorname{argmax}} \, p(X) \\ &=\underset{\theta}{\operatorname{argmax}} \log p(X) \\ &=\underset{\theta}{\operatorname{argmax}} \sum_{i=1}^{N} \log p\left(x_{i}\right) \\ &=\underset{\theta}{\operatorname{argmax}} \sum_{i=1}^{N} \log \left[\sum_{k=1}^{K} p_{k} \cdot \mathcal{N}\left(x_{i} \mid \mu_{k}, \Sigma_{k}\right)\right] \end{aligned}$$
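The objective in the last line is easy to evaluate numerically, even though the log of a sum admits no closed-form maximizer (which is why EM is typically used for GMMs). A minimal 1-D sketch, with illustrative names not taken from the original:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_log_likelihood(X, p, mu, var):
    """sum_i log[ sum_k p_k * N(x_i | mu_k, var_k) ]."""
    return sum(np.log(sum(pk * normal_pdf(x, mk, vk)
                          for pk, mk, vk in zip(p, mu, var)))
               for x in X)
```

In practice one would compute the inner sum in log space (log-sum-exp) for numerical stability; the plain form above mirrors the formula as written.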