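Computing the KL divergence between two Gaussian distributions is actually quite simple once you find the right approach.
1. One-dimensional Gaussian distributions
The KL divergence is defined as: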
$$
\begin{aligned}
KL(\mathcal{N}(\mu_1, \sigma_1^2) || \mathcal{N}(\mu_2, \sigma_2^2))
&= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \log \frac{\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}}{\frac{1}{\sqrt{2\pi}\sigma_2} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}} dx \\
&= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \Bigg[ \log \frac{\sigma_2}{\sigma_1} - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2} \Bigg] dx
\end{aligned}
$$
The first term is simple: the logarithm is a constant, and the density integrates to 1 over the whole real line:
$$
\begin{aligned}
\log \frac{\sigma_2}{\sigma_1} \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = \log \frac{\sigma_2}{\sigma_1}
\end{aligned}
$$
For the second term, the key is to recognize that the integral is the variance of the first distribution:
$$
\begin{aligned}
-\frac{1}{2\sigma_1^2} \int_x (x-\mu_1)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = -\frac{1}{2\sigma_1^2} \sigma_1^2 = -\frac{1}{2}
\end{aligned}
$$
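For the third term, expanding the square splits the integral into the second moment ($\sigma_1^2 + \mu_1^2$), the mean ($\mu_1$), and a constant, all taken under the first distribution, which gives: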
$$
\begin{aligned}
\frac{1}{2\sigma_2^2} \int_x (x-\mu_2)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx
&= \frac{1}{2\sigma_2^2} \int_x ( x^2 - 2\mu_2 x + \mu_2^2 ) \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx \\
&= \frac{\sigma_1^2 + \mu_1^2 - 2 \mu_1 \mu_2+ \mu_2^2}{2\sigma_2^2} = \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
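As a sanity check on the three terms, the sketch below compares each closed-form value against a direct numerical integration under the first distribution. It is only an illustration, not part of the original derivation: the helper names (`normal_pdf`, `expect_under_p`, `kl_terms_numeric`, `kl_terms_closed`), the midpoint rule, and the integration window of ±10 standard deviations are all assumptions chosen for simplicity.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def expect_under_p(f, mu1, sigma1, n=20_000, width=10.0):
    """Midpoint-rule estimate of the integral of f(x) * N(x; mu1, sigma1^2).

    The integration range mu1 +/- width * sigma1 is an assumption; the
    Gaussian tails outside it are negligible for this check.
    """
    lo = mu1 - width * sigma1
    h = 2 * width * sigma1 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f(x) * normal_pdf(x, mu1, sigma1)
    return total * h


def kl_terms_numeric(mu1, sigma1, mu2, sigma2):
    """Numerically integrate the three bracketed terms of the KL integrand."""
    t1 = expect_under_p(lambda x: math.log(sigma2 / sigma1), mu1, sigma1)
    t2 = expect_under_p(lambda x: -((x - mu1) ** 2) / (2 * sigma1 ** 2), mu1, sigma1)
    t3 = expect_under_p(lambda x: (x - mu2) ** 2 / (2 * sigma2 ** 2), mu1, sigma1)
    return t1, t2, t3


def kl_terms_closed(mu1, sigma1, mu2, sigma2):
    """Closed-form values of the same three terms, as derived in the text."""
    t1 = math.log(sigma2 / sigma1)
    t2 = -0.5
    t3 = (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
    return t1, t2, t3


if __name__ == "__main__":
    params = (0.5, 1.2, -1.0, 2.0)  # arbitrary example values: mu1, sigma1, mu2, sigma2
    for name, num, closed in zip(("term 1", "term 2", "term 3"),
                                 kl_terms_numeric(*params),
                                 kl_terms_closed(*params)):
        print(f"{name}: numeric={num:.8f} closed-form={closed:.8f}")
```

Since the three terms are exactly the pieces of the bracket in the definition, adding them gives the value of the full KL integral for the chosen parameters.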