The probability density function (PDF) of the univariate Gaussian distribution is defined as:

$$
f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}
$$
The formula above is the density of a variable $x\sim \mathcal{N}(\mu,\sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance.
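As a quick sanity check on the density, the following is a minimal Python sketch that evaluates the PDF and confirms numerically that it integrates to roughly 1 and has variance roughly $\sigma^2$. The function names `gaussian_pdf` and `integrate` are hypothetical helpers introduced only for this illustration, and the midpoint rule is just one simple way to approximate the integrals.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x. Note: sigma2 is the variance, not the std."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    norm = math.sqrt(2.0 * math.pi * sigma2)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / norm


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b] (illustrative helper)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    mu, sigma2 = 1.0, 4.0
    # Integrate over a range wide enough that the tails are negligible.
    lo, hi = mu - 30.0, mu + 30.0
    total = integrate(lambda x: gaussian_pdf(x, mu, sigma2), lo, hi)
    var = integrate(lambda x: (x - mu) ** 2 * gaussian_pdf(x, mu, sigma2), lo, hi)
    print("total mass:", total)  # should be close to 1
    print("variance:  ", var)    # should be close to sigma2
```

The second integral is the identity $\sigma^2=\int f(x)(x-\mu)^2dx$, which the KL derivation relies on.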
KL divergence between univariate Gaussian distributions
Let $p=\mathcal{N}(\mu_1,\sigma_1^2)$ and $q=\mathcal{N}(\mu_2,\sigma_2^2)$. Then

$$
\begin{aligned}
KL(p, q) &= \int p(x) \log \frac{p(x)}{q(x)}\,dx = -\int p(x) \log q(x)\,dx + \int p(x) \log p(x)\,dx \\
&= \int p(x)\left(-\frac{1}{2}\log 2\pi-\log \sigma_1-\frac{(x-\mu_1)^2}{2\sigma_1^2}+\frac{1}{2}\log 2\pi+\log \sigma_2+\frac{(x-\mu_2)^2}{2\sigma_2^2}\right)dx \\
&= \int p(x)\left(\log \frac{\sigma_2}{\sigma_1}-\frac{(x-\mu_1)^2}{2\sigma_1^2}+\frac{(x-\mu_2)^2}{2\sigma_2^2}\right)dx \\
&= \log \frac{\sigma_2}{\sigma_1}-\int p(x)\frac{(x-\mu_1)^2}{2\sigma_1^2}\,dx+\int p(x)\frac{(x-\mu_2)^2}{2\sigma_2^2}\,dx \quad \text{(since $\sigma_1^2=\int p(x)(x-\mu_1)^2dx$)} \\
&= \log \frac{\sigma_2}{\sigma_1}-\frac{1}{2}+\frac{1}{2\sigma_2^2}\int p(x)(x-\mu_1+\mu_1-\mu_2)^2\,dx \\
&= \log \frac{\sigma_2}{\sigma_1}-\frac{1}{2}+\frac{1}{2\sigma_2^2}\left\{\int p(x)(x-\mu_1)^2\,dx+\int p(x)(\mu_1-\mu_2)^2\,dx+2\int p(x)(x-\mu_1)(\mu_1-\mu_2)\,dx\right\} \\
&= \log \frac{\sigma_2}{\sigma_1}-\frac{1}{2}+\frac{1}{2\sigma_2^2}\left\{\int p(x)(x-\mu_1)^2\,dx+(\mu_1-\mu_2)^2\right\} \quad \text{(the cross term vanishes since $\int p(x)(x-\mu_1)\,dx=0$)} \\
&= \frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2} \left(1 + \log 2 \pi \sigma_1^2\right) \\
&= \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2} \\
&= \frac{1}{2}\left(\log \frac{\sigma_2^2}{\sigma_1^2}-1+\frac{\sigma_1^2 +(\mu_2 - \mu_1)^2}{\sigma_2^2}\right)
\end{aligned}
$$
The last line equals 0 if and only if $\mu_1=\mu_2$ and $\sigma_1=\sigma_2$.
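The closed form can be checked against a direct numerical evaluation of $\int p(x)\log\frac{p(x)}{q(x)}dx$. The sketch below is illustrative only: `kl_gauss`, `log_pdf`, and `kl_numeric` are hypothetical names, the parameters are standard deviations (not variances), and the midpoint rule with a fixed integration window is an assumed, simple choice of quadrature.

```python
import math


def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(p, q) for p = N(mu1, sigma1^2) and q = N(mu2, sigma2^2)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


def log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def kl_numeric(mu1, sigma1, mu2, sigma2, n=100000, width=12.0):
    """Midpoint-rule estimate of the KL integral over mu1 +/- width * sigma1."""
    a = mu1 - width * sigma1
    b = mu1 + width * sigma1
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        lp = log_pdf(x, mu1, sigma1)
        # Integrand: p(x) * (log p(x) - log q(x))
        total += math.exp(lp) * (lp - log_pdf(x, mu2, sigma2))
    return total * h


if __name__ == "__main__":
    args = (0.0, 1.0, 1.0, 2.0)
    print("closed form:", kl_gauss(*args))
    print("numerical:  ", kl_numeric(*args))
    print("identical distributions:", kl_gauss(0.0, 1.0, 0.0, 1.0))
```

The two printed values should agree closely, and the divergence is zero when both distributions share the same mean and standard deviation. Swapping the roles of $p$ and $q$ generally gives a different value, since the formula is not symmetric in its arguments.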
References:
[1]: https://stats.stackexchange.com/questions/7440/kl-divergence-between-two-univariate-gaussians
[2]: https://www.cnblogs.com/huangshiyu13/p/6898212.html