Deriving the KL Divergence Between Two Multivariate Gaussian Distributions

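The KL divergence between two multivariate Gaussian distributions is part of the loss function of a variational autoencoder (VAE). The goal of this derivation is to express that KL divergence in terms of the means and variances of the two Gaussians.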

Mean and variance of the univariate normal distribution

The probability density function of the univariate normal distribution is:

\displaystyle f(x) = \frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}
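As a quick numerical sanity check, the density can be coded directly and integrated with a simple midpoint rule. This is a minimal sketch. The helper names `normal_pdf` and `integrate`, the parameter values, and the truncation of the integral to \mu\pm12\sigma are all illustrative assumptions and are not part of the derivation.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, written exactly as f(x) above."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(g, a: float, b: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.5, 0.7
# Tails beyond 12 standard deviations are negligible in double precision.
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 12 * sigma, mu + 12 * sigma)
print(round(area, 6))  # → 1.0
```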

The mean of the univariate normal distribution is:

\begin{aligned} E(x)&=\int_{-\infty}^{+\infty}xf(x)dx\\ &= \int_{-\infty}^{+\infty}x\frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}dx\\ &= \int_{-\infty}^{+\infty}(x-\mu)\frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}dx+\mu\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}dx\\ &\overset{t=(x-\mu)}{=}\int_{-\infty}^{+\infty}t\frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{t^2}{2\sigma^2})}dt+\mu\int_{-\infty}^{+\infty}f(x)dx\\ &=0+\mu\cdot 1\\ &=\mu \end{aligned}

The first integral is zero because its integrand is an odd function of t. The second integral equals 1 because f(x) is a probability density.
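The two integrals in the mean derivation can be checked numerically in the same way. This is a minimal sketch that assumes the hypothetical helpers below (midpoint rule on \mu\pm12\sigma, with arbitrary \mu and \sigma). The term containing (x-\mu) should come out as zero up to rounding, and E(x) should come out as \mu.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mu, sigma = -2.0, 1.3
lo, hi = mu - 12 * sigma, mu + 12 * sigma

# Integral of (x - mu) f(x): the integrand is odd about mu, so this should vanish.
odd_term = integrate(lambda x: (x - mu) * normal_pdf(x, mu, sigma), lo, hi)
# E(x) = integral of x f(x): should equal mu.
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
print(f"{odd_term:.6f} {mean:.6f}")
```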

The variance of the univariate normal distribution is:

\begin{aligned} D(x)&=E[(x-\mu)^2]\\ &=\int_{-\infty}^{+\infty}(x-\mu)^2 \frac{1}{\sqrt{2\pi}\sigma}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}dx\\ &\overset{t=(x-\mu)}{=}\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{+\infty}t^2exp{(-\frac{t^2}{2\sigma^2})}dt \end{aligned}

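Because f(x) is a probability density, its integral is 1. Integrating by parts turns that integral into a second-moment integral, and the boundary term vanishes at \pm\infty: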

\displaystyle \begin{aligned} \int_{-\infty}^{+\infty}f(x)dx& =\frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{+\infty}exp{(-\frac{(x-\mu)^2}{2\sigma^2})}dx\\& \overset{t=(x-\mu)}{=}\frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{+\infty}exp{(-\frac{t^2}{2\sigma^2})}dt \\&=\frac{1}{\sqrt{2\pi}\sigma}(t\times exp{(-\frac{t^2}{2\sigma^2})}|_{-\infty}^{+\infty}- \int_{-\infty}^{+\infty}tdexp{(-\frac{t^2}{2\sigma^2})})\\ &= \int_{-\infty}^{+\infty}t^2\frac{1}{\sqrt{2\pi}\sigma^3}exp{(-\frac{t^2}{2\sigma^2})}dt\\ &=1 \end{aligned}

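Multiplying both ends of this chain by \sigma^2 gives \frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^{+\infty}t^2exp{(-\frac{t^2}{2\sigma^2})}dt=\sigma^2. This is exactly the last expression for D(x), so

D(x)=\sigma ^2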
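A similar sketch can confirm both the integration-by-parts identity and D(x)=\sigma^2 numerically. The helper names, the parameter values, and the \pm12\sigma truncation are again assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 0.5, 2.0
lo, hi = mu - 12 * sigma, mu + 12 * sigma

# D(x) = E[(x - mu)^2], computed directly from the density.
variance = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

# Right-hand side of the integration-by-parts step, which should equal 1.
ibp_identity = integrate(
    lambda t: t * t * math.exp(-t * t / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma ** 3),
    -12 * sigma,
    12 * sigma,
)
print(variance, sigma ** 2, ibp_identity)
```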

KL divergence between two multivariate Gaussian distributions

\begin{aligned} D_{KL}(q(z)||p(z)) &= \int_{z}q(z)\log{\frac{q(z)}{p(z)}}dz\\ &=\int_{z}[q(z)\log{q(z)}-q(z)\log{p(z)}]dz\\ &=\int_{z}q(z)\log{q(z)}dz-\int_{z}q(z)\log{p(z)}dz \end{aligned}
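Because D_{KL}(q||p)=E_{q(z)}[\log q(z)-\log p(z)], the definition can also be approximated by sampling from q. The sketch below does this for the one-dimensional case using Python's `random.Random.gauss`. The function names, sample count, and seed are illustrative assumptions. The result is only a Monte Carlo estimate, so it carries sampling noise.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """log N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def kl_monte_carlo(mu1, s1, mu2, s2, n=200_000, seed=0):
    """Estimate KL(q || p) = E_q[log q(z) - log p(z)] with z ~ q = N(mu1, s1^2), p = N(mu2, s2^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu1, s1)  # one sample from q
        total += log_normal_pdf(z, mu1, s1) - log_normal_pdf(z, mu2, s2)
    return total / n

estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
print(estimate)
```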

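Assume the latent variable z is high-dimensional, with dimension M, and that the components of each multivariate Gaussian are mutually independent. The covariance matrices are then diagonal: q(z)\sim N(z;\mu_1,\Sigma_1) and p(z)\sim N(z;\mu_2,\Sigma_2), with \Sigma_1=\mathrm{diag}(\sigma_{1,1}^2,\dots,\sigma_{1,M}^2) and \Sigma_2=\mathrm{diag}(\sigma_{2,1}^2,\dots,\sigma_{2,M}^2). Consequently q(z)=\prod_{m=1}^M q(z_m), so \log q(z)=\sum_{m=1}^M\log q(z_m), and likewise for p(z).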

First, compute the first term of the KL divergence:

\begin{aligned} \int_{z}q(z)\log{q(z)}dz &= \int_{z}N(z;\mu_1,\Sigma_1)\log{N(z;\mu_1,\Sigma_1)}dz\\ &=\sum_{m=1}^{M}E_{q(z_m)}\log { (\frac{1}{\sqrt{2\pi\sigma_{1,m}^2}}exp{(-\frac{(z_m-\mu_{1,m})^2}{2\sigma_{1,m}^2})})}\\ &=\sum_{m=1}^{M}E_{q(z_m)}(-\frac{1}{2}\log{(2\pi\sigma_{1,m}^2)}-\frac{(z_m-\mu_{1,m})^2}{2\sigma_{1,m}^2})\\ &=\sum_{m=1}^M[-\frac{1}{2}\log{(2\pi\sigma_{1,m}^2)}-E_{q(z_m)}\frac{(z_m-\mu_{1,m})^2}{2\sigma_{1,m}^2}]\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{1,m}^2}-\frac{1}{2}\sum_{m=1}^ME_{q(z_m)}\frac{(z_m-\mu_{1,m})^2}{\sigma_{1,m}^2}\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{1,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{1,m}^2}E_{q(z_m)}(z_m-\mu_{1,m})^2\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{1,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{1,m}^2}\sigma_{1,m}^2\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M(\log{\sigma_{1,m}^2}+1) \end{aligned}
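The closed form for the first term can be compared against direct numerical integration of q(z)\log q(z). Under the independence assumption, that integral factorises into a sum of one-dimensional integrals. This sketch uses hypothetical function names and arbitrary parameters. As the derivation shows, the result does not depend on the means \mu_{1,m}.

```python
import math

def neg_entropy_closed_form(sigmas):
    """-M/2 log(2 pi) - 1/2 sum_m (log sigma_m^2 + 1): the last line of the derivation."""
    M = len(sigmas)
    return -0.5 * M * math.log(2 * math.pi) - 0.5 * sum(math.log(s * s) + 1 for s in sigmas)

def neg_entropy_numeric(mus, sigmas, n=20000):
    """Sum over dimensions of the integral of q(z_m) log q(z_m), via the midpoint rule."""
    total = 0.0
    for mu, s in zip(mus, sigmas):
        lo, hi = mu - 12 * s, mu + 12 * s
        h = (hi - lo) / n
        for i in range(n):
            z = lo + (i + 0.5) * h
            log_q = -0.5 * math.log(2 * math.pi * s * s) - (z - mu) ** 2 / (2 * s * s)
            total += h * math.exp(log_q) * log_q  # q(z) log q(z) dz
    return total

mus, sigmas = [0.3, -1.0, 2.0], [0.5, 1.0, 3.0]
numeric = neg_entropy_numeric(mus, sigmas)
closed = neg_entropy_closed_form(sigmas)
print(numeric, closed)
```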

Next, compute the second term of the KL divergence:

\begin{aligned} \int _{z}q(z)\log{p(z)}dz &= \int _{z} N(z;\mu_1,\Sigma_1)\log{N(z;\mu_2,\Sigma_2)}dz\\ &=\sum_{m=1}^{M}E_{q(z_m)}\log { (\frac{1}{\sqrt{2\pi\sigma_{2,m}^2}}exp{(-\frac{(z_m-\mu_{2,m})^2}{2\sigma_{2,m}^2})})}\\ &=\sum_{m=1}^M[-\frac{1}{2}\log{(2\pi\sigma_{2,m}^2)}-E_{q(z_m)}\frac{(z_m-\mu_{2,m})^2}{2\sigma_{2,m}^2}]\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{2,m}^2}E_{q(z_m)}(z_m-\mu_{2,m})^2\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{2,m}^2}E_{q(z_m)}(z_m^2-2z_m\mu_{2,m}+\mu_{2,m}^2)\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{2,m}^2}(\sigma_{1,m}^2+\mu_{1,m}^2-2\mu_{1,m}\mu_{2,m}+\mu_{2,m}^2)\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M\frac{1}{\sigma_{2,m}^2}[\sigma_{1,m}^2+(\mu_{1,m}-\mu_{2,m})^2]\\ &=-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M(\frac{\sigma_{1,m}^2}{\sigma_{2,m}^2}+\frac{(\mu_{1,m}-\mu_{2,m})^2}{\sigma_{2,m}^2}) \end{aligned}

Here we used E_{q(z_m)}z_m=\mu_{1,m} and E_{q(z_m)}z_m^2=\sigma_{1,m}^2+\mu_{1,m}^2. The second identity follows from D(x)=E(x^2)-[E(x)]^2, which for z_m reads \sigma_{1,m}^2=E_{q(z_m)}z_m^2-\mu_{1,m}^2.
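The second term, and the identity E_{q(z_m)}z_m^2=\sigma_{1,m}^2+\mu_{1,m}^2 used in it, can be checked the same way by numerically integrating \log p(z_m) against q(z_m). This is a minimal sketch. The function names, parameter values, and the \pm12\sigma_{1,m} integration range are assumptions.

```python
import math

def gaussian_pdf(z, mu, s):
    """Density of N(mu, s^2) at z."""
    return math.exp(-(z - mu) ** 2 / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)

def expect_under_q(g, mu, s, n=20000):
    """Midpoint-rule approximation of E_{N(mu, s^2)}[g(z)] over [mu - 12s, mu + 12s]."""
    lo, hi = mu - 12 * s, mu + 12 * s
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        total += g(z) * gaussian_pdf(z, mu, s)
    return h * total

def cross_term_closed_form(mu1, s1, mu2, s2):
    """Last line of the second-term derivation."""
    total = -0.5 * len(mu1) * math.log(2 * math.pi)
    for a, sa, b, sb in zip(mu1, s1, mu2, s2):
        total -= 0.5 * math.log(sb ** 2) + 0.5 * (sa ** 2 + (a - b) ** 2) / sb ** 2
    return total

def cross_term_numeric(mu1, s1, mu2, s2):
    """Sum over m of E_{q(z_m)}[log p(z_m)], integrated numerically."""
    total = 0.0
    for a, sa, b, sb in zip(mu1, s1, mu2, s2):
        def log_p(z, b=b, sb=sb):
            return -0.5 * math.log(2 * math.pi * sb ** 2) - (z - b) ** 2 / (2 * sb ** 2)
        total += expect_under_q(log_p, a, sa)
    return total

mu1, s1 = [0.3, -1.0], [0.5, 2.0]
mu2, s2 = [1.0, 0.5], [1.5, 0.8]
second_moment = expect_under_q(lambda z: z * z, mu1[0], s1[0])
numeric = cross_term_numeric(mu1, s1, mu2, s2)
closed = cross_term_closed_form(mu1, s1, mu2, s2)
print(second_moment, s1[0] ** 2 + mu1[0] ** 2)
print(numeric, closed)
```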

Combining the two terms gives

\begin{aligned} D_{KL}(q(z)||p(z))&= \int_{z}q(z)\log{q(z)}dz-\int_{z}q(z)\log{p(z)}dz\\ &= -\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M(\log{\sigma_{1,m}^2}+1)-[-\frac{M}{2}\log{2\pi}-\frac{1}{2}\sum_{m=1}^M\log{\sigma_{2,m}^2}-\frac{1}{2}\sum_{m=1}^M(\frac{\sigma_{1,m}^2}{\sigma_{2,m}^2}+\frac{(\mu_{1,m}-\mu_{2,m})^2}{\sigma_{2,m}^2})]\\ &=\frac{1}{2}\sum_{m=1}^{M}[\log{\frac{\sigma_{2,m}^2}{\sigma_{1,m}^2}}+\frac{\sigma_{1,m}^2}{\sigma_{2,m}^2}+\frac{(\mu_{1,m}-\mu_{2,m})^2}{\sigma_{2,m}^2}-1] \end{aligned}
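The final formula translates directly into code. To match the formula, the sketch below takes per-dimension variances \sigma^2 rather than standard deviations. The function name and argument order are hypothetical. The usage line shows that the divergence of a distribution from itself is zero.

```python
import math

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """KL(N(mu1, diag(var1)) || N(mu2, diag(var2))) from the closed form above.

    All arguments are equal-length sequences; var1/var2 hold variances sigma^2.
    """
    return 0.5 * sum(
        math.log(v2 / v1) + v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

print(kl_diag_gaussians([0.0], [1.0], [0.0], [1.0]))  # → 0.0
```

Swapping the two distributions generally changes the value, because the KL divergence is not symmetric.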

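In a variational autoencoder, q(z)\sim N(z;\mu_I,\sigma_I^2I) and p(z)\sim N(z;0,I). Substituting \mu_{1,m}=\mu_{I,m}, \sigma_{1,m}^2=\sigma_I^2, \mu_{2,m}=0 and \sigma_{2,m}^2=1 into the result above gives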

\begin{aligned} D_{KL}(q(z)||p(z))&=\frac{1}{2}\sum_{m=1}^{M}(-\log{\sigma_{I}^2}+\sigma_{I}^2+\mu_{I,m}^2-1)\\ &=\frac{1}{2}(-\log{|\sigma_{I}^2I|}+tr(\sigma_{I}^2I)+\mu_I^T\mu_I-M) \end{aligned}
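In code, the VAE term is often written in terms of the log-variance \log\sigma^2, which an encoder can output as an unconstrained real number. Treat that parameterisation and the function names below as assumptions of this sketch. The sum form works per dimension, so it also covers a diagonal covariance with a separate \sigma_{I,m}^2 for each dimension. The second function evaluates the determinant/trace form for the isotropic case \sigma_I^2I used above, and the two results should agree.

```python
import math

def vae_kl_sum_form(mu, log_var):
    """1/2 * sum_m (-log sigma_m^2 + sigma_m^2 + mu_m^2 - 1), where log_var[m] = log sigma_m^2."""
    return 0.5 * sum(-lv + math.exp(lv) + m * m - 1.0 for m, lv in zip(mu, log_var))

def vae_kl_matrix_form_isotropic(mu, sigma2):
    """1/2 * (-log|sigma^2 I| + tr(sigma^2 I) + mu^T mu - M) for covariance sigma^2 I."""
    M = len(mu)
    log_det = M * math.log(sigma2)  # |sigma^2 I| = (sigma^2)^M
    trace = M * sigma2              # tr(sigma^2 I) = M sigma^2
    return 0.5 * (-log_det + trace + sum(m * m for m in mu) - M)

mu = [0.5, -1.0, 2.0]
sigma2 = 0.25
kl_sum = vae_kl_sum_form(mu, [math.log(sigma2)] * len(mu))
kl_matrix = vae_kl_matrix_form_isotropic(mu, sigma2)
print(kl_sum, kl_matrix)
```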

