A Summary of the KL Divergence (KLD) Between Gaussian Distributions


Prerequisites

Matrix inner product

Given two $m\times n$ matrices $\mathbf A$ and $\mathbf B$, their matrix inner product (also called the Frobenius inner product) is defined as:
$$\langle\mathbf A,\mathbf B\rangle=\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}b_{ij}=\operatorname{tr}(\mathbf A^T\mathbf B)$$
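The equality between the element-wise sum and the trace form can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`frobenius_inner`, `transpose`, `matmul`, `trace`) and the sample matrices are illustrative assumptions, not part of the original post.

```python
def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain triple-loop matrix product of a (p x k) and b (k x q)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical 2 x 3 sample matrices
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[0.5, -1.0, 2.0], [1.5, 0.0, -2.0]]

lhs = frobenius_inner(A, B)           # sum_ij a_ij * b_ij
rhs = trace(matmul(transpose(A), B))  # tr(A^T B)
print(lhs, rhs)                       # both are -1.5
```

Computing the sum directly costs $O(mn)$ operations, whereas forming $\mathbf A^T\mathbf B$ first costs $O(mn^2)$, so the element-wise form is usually the one to implement.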

Distributions and expectations

Given a random vector $\mathbf x\sim\mathcal{N}(\boldsymbol \mu,\boldsymbol \Sigma)$ and a square matrix $\mathbf A$ of matching size, the following identities hold:

  1. $E[\mathbf x\mathbf x^T]=\boldsymbol \Sigma+\boldsymbol \mu\boldsymbol \mu^T$
  2. $E[\mathbf x^T\mathbf A\mathbf x]=\operatorname{tr}(\mathbf A\boldsymbol \Sigma)+\boldsymbol \mu^T\mathbf A\boldsymbol \mu$
    Proof: $E[\mathbf x^T\mathbf A\mathbf x]=E[\operatorname{tr}(\mathbf x^T\mathbf A\mathbf x)]=E[\operatorname{tr}(\mathbf A\mathbf x\mathbf x^T)]=\operatorname{tr}(\mathbf AE[\mathbf x\mathbf x^T])=\operatorname{tr}(\mathbf A(\boldsymbol \Sigma+\boldsymbol \mu\boldsymbol \mu^T))=\operatorname{tr}(\mathbf A\boldsymbol \Sigma)+\boldsymbol \mu^T\mathbf A\boldsymbol \mu$
  3. $E[(\mathbf x-\boldsymbol \mu_1)^T\mathbf A(\mathbf x-\boldsymbol \mu_1)]=\operatorname{tr}(\mathbf A\boldsymbol \Sigma)+(\boldsymbol \mu-\boldsymbol \mu_1)^T\mathbf A(\boldsymbol \mu-\boldsymbol \mu_1)$
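The third identity is listed without a proof. One way to obtain it, sketched here under the assumption that $\boldsymbol \mu_1$ is a fixed (non-random) vector, is to shift the variable and reuse identity 2:

$$
\begin{aligned}
\mathbf y &= \mathbf x-\boldsymbol \mu_1 \sim \mathcal{N}(\boldsymbol \mu-\boldsymbol \mu_1,\ \boldsymbol \Sigma) \\
E[(\mathbf x-\boldsymbol \mu_1)^T\mathbf A(\mathbf x-\boldsymbol \mu_1)] &= E[\mathbf y^T\mathbf A\mathbf y] \\
&= \operatorname{tr}(\mathbf A\boldsymbol \Sigma)+(\boldsymbol \mu-\boldsymbol \mu_1)^T\mathbf A(\boldsymbol \mu-\boldsymbol \mu_1)
\end{aligned}
$$

Two special cases are worth noting: with $\boldsymbol \mu_1=\boldsymbol \mu$ the quadratic term vanishes and only $\operatorname{tr}(\mathbf A\boldsymbol \Sigma)$ remains, and if additionally $\mathbf A=\boldsymbol \Sigma^{-1}$ the result is $\operatorname{tr}(\mathbf I_n)=n$ for an $n$-dimensional $\mathbf x$.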

Definition of the KLD

Given two continuous probability distributions with probability density functions $p(x)$ and $q(x)$, their KLD is defined as:
$$D_{KL}(P||Q)=\int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx$$
For a discrete variable with two probability distributions $P(x)$ and $Q(x)$, the KLD is defined as:
$$D_{KL}(P||Q)=\sum_x P(x)\log\left(\frac{P(x)}{Q(x)}\right)$$
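The discrete definition translates almost directly into code. The sketch below is an illustration only: the function name `kl_divergence_discrete`, the two sample distributions, and the conventions for zero probabilities (a term with $P(x)=0$ contributes nothing; $Q(x)=0$ with $P(x)>0$ gives infinity) are assumptions added here, and the natural logarithm is used so the result is in nats.

```python
import math

def kl_divergence_discrete(p, q):
    """D_KL(P||Q) in nats for two discrete distributions given as probability lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue          # convention: 0 * log(0 / q) = 0
        if q_i == 0.0:
            return math.inf   # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical two-outcome distributions
P = [0.5, 0.5]
Q = [0.9, 0.1]

kl_pq = kl_divergence_discrete(P, Q)
kl_qp = kl_divergence_discrete(Q, P)
print(kl_pq, kl_qp)  # the two directions differ: the KLD is not symmetric
```

Swapping the arguments changes the value, which is why the KLD is a divergence rather than a distance.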

Univariate Gaussian distributions

Assume both continuous distributions are Gaussian, where $P$ has mean $\mu_1$ and variance $\sigma_1^2$, and $Q$ has mean $\mu_2$ and variance $\sigma_2^2$. The corresponding KLD can then be derived as follows, where $Var(x)=\sigma_1^2$ is the variance of $x$ under $p$:
$$
\begin{aligned}
D_{KL}(P||Q)&=\int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx \\
&=\int \frac{1}{\sqrt{2\pi\sigma_1^2}}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\left[\log\left(\frac{\sigma_2}{\sigma_1}\right)-\frac{(x-\mu_1)^2}{2\sigma_1^2}+\frac{(x-\mu_2)^2}{2\sigma_2^2}\right]dx \\
&=\log\left(\frac{\sigma_2}{\sigma_1}\right)-\frac{Var(x)}{2\sigma_1^2}+\frac{Var(x)+(\mu_1-\mu_2)^2}{2\sigma_2^2} \\
&=\log\left(\frac{\sigma_2}{\sigma_1}\right)-\frac{1}{2}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
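As a sanity check, the closed-form result can be compared with a direct numerical evaluation of the defining integral. This is a rough sketch, not a reference implementation: the function names, the midpoint-rule integrator, its integration range of 12 standard deviations around $\mu_1$, and the sample parameters are all assumptions made for illustration. The `sigma` arguments are standard deviations, not variances.

```python
import math

def kl_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form D_KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)); sigmas are standard deviations."""
    return (math.log(sigma2 / sigma1) - 0.5
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2))

def log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def kl_gauss_1d_numeric(mu1, sigma1, mu2, sigma2, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of the integral over mu1 +/- half_width * sigma1."""
    lo = mu1 - half_width * sigma1
    dx = 2.0 * half_width * sigma1 / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        lp = log_pdf(x, mu1, sigma1)
        # integrand: p(x) * (log p(x) - log q(x))
        total += math.exp(lp) * (lp - log_pdf(x, mu2, sigma2)) * dx
    return total

# Hypothetical parameters: P = N(0.5, 1.2^2), Q = N(0.0, 0.9^2)
closed = kl_gauss_1d(0.5, 1.2, 0.0, 0.9)
numeric = kl_gauss_1d_numeric(0.5, 1.2, 0.0, 0.9)
print(closed, numeric)  # the two values should agree to several decimal places
```

Working with log-densities inside the integrand avoids dividing two very small density values in the tails.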

Multivariate Gaussian distributions

For an $n$-dimensional random vector $\mathbf x$, assume $P$ and $Q$ are the distributions $\mathcal{N}(\boldsymbol \mu_1,\boldsymbol \Sigma_1)$ and $\mathcal{N}(\boldsymbol \mu_2,\boldsymbol \Sigma_2)$ respectively. Their KLD is derived as follows, with all expectations taken under $p$:
$$
\begin{aligned}
&D_{KL}(P||Q)\\
=&\int_{\mathbb{R}^n} p(\mathbf x)\log\left(\frac{p(\mathbf x)}{q(\mathbf x)}\right)d\mathbf x \\
=&\int p(\mathbf x)\log(p(\mathbf x))d\mathbf x-\int p(\mathbf x)\log(q(\mathbf x))d\mathbf x \\
=&-\frac{1}{2}\left(\log((2\pi)^n|\boldsymbol \Sigma_1|)+E[(\mathbf x-\boldsymbol \mu_1)^T\boldsymbol \Sigma_1^{-1}(\mathbf x-\boldsymbol \mu_1)]\right)+\frac{1}{2}\left(\log((2\pi)^n|\boldsymbol \Sigma_2|)+E[(\mathbf x-\boldsymbol \mu_2)^T\boldsymbol \Sigma_2^{-1}(\mathbf x-\boldsymbol \mu_2)]\right) \\
=&\frac{1}{2}\left[\log\left(\frac{|\boldsymbol \Sigma_2|}{|\boldsymbol \Sigma_1|}\right)-n+\operatorname{tr}(\boldsymbol \Sigma_2^{-1}\boldsymbol \Sigma_1)+(\boldsymbol \mu_1-\boldsymbol \mu_2)^T\boldsymbol \Sigma_2^{-1}(\boldsymbol \mu_1-\boldsymbol \mu_2)\right]\\
=&\frac{1}{2}\left[\langle\boldsymbol \Sigma_2^{-1},\boldsymbol \Sigma_1\rangle+\|\boldsymbol \mu_1-\boldsymbol \mu_2\|^2_{\boldsymbol \Sigma_2^{-1}}-\log(|\boldsymbol \Sigma_2|^{-1}|\boldsymbol \Sigma_1|)-n\right]
\end{aligned}
$$
The fourth line follows from the third identity in the expectations section: the first expectation equals $\operatorname{tr}(\boldsymbol \Sigma_1^{-1}\boldsymbol \Sigma_1)=n$, and the second equals $\operatorname{tr}(\boldsymbol \Sigma_2^{-1}\boldsymbol \Sigma_1)+(\boldsymbol \mu_1-\boldsymbol \mu_2)^T\boldsymbol \Sigma_2^{-1}(\boldsymbol \mu_1-\boldsymbol \mu_2)$. In the last line, $\|\mathbf v\|^2_{\mathbf M}=\mathbf v^T\mathbf M\mathbf v$, and the inner product equals the trace because $\boldsymbol \Sigma_2^{-1}$ is symmetric.
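When both covariance matrices are diagonal, the determinant, trace, and quadratic terms all split into per-dimension sums, so the formula can be evaluated without any matrix routines. The sketch below is an illustrative assumption rather than the post's own method: the function name `kl_gauss_diag` is hypothetical, and its `std1` and `std2` arguments are lists of per-dimension standard deviations. The sample values are those of case 2 in the test section, read as standard deviations because the MATLAB function squares its inputs.

```python
import math

def kl_gauss_diag(mu1, std1, mu2, std2):
    """D_KL between Gaussians with diagonal covariances diag(std1^2) and diag(std2^2)."""
    n = len(mu1)
    # tr(Sigma2^{-1} Sigma1) for diagonal matrices
    trace_term = sum((s1 / s2) ** 2 for s1, s2 in zip(std1, std2))
    # (mu1 - mu2)^T Sigma2^{-1} (mu1 - mu2)
    maha_term = sum(((m1 - m2) / s2) ** 2 for m1, m2, s2 in zip(mu1, mu2, std2))
    # log(|Sigma2| / |Sigma1|), summed per dimension to avoid forming determinants
    logdet_term = sum(2.0 * math.log(s2 / s1) for s1, s2 in zip(std1, std2))
    return 0.5 * (logdet_term - n + trace_term + maha_term)

kld_case2 = kl_gauss_diag([0.5, 1.0], [1.2, 0.8], [0.0, 1.5], [0.9, 1.1])
print(round(kld_case2, 5))  # 0.44175
```

Summing log-ratios per dimension also sidesteps the underflow that a product of many small variances can cause in high dimensions.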

Test and verification

% Generate sample data.
% Note: sigma_p and sigma_q hold per-dimension standard deviations on the
% diagonal; cal_KLD squares them element-wise to obtain the covariances.
% case 1: identical distributions
mu_p = [0.5, 1.0]';
sigma_p = diag([1.2, 0.8]);
mu_q = [0.5, 1.0]';
sigma_q = diag([1.2, 0.8]);

% case 2: different means and standard deviations
% mu_p = [0.5, 1.0]';
% sigma_p = diag([1.2, 0.8]);
% mu_q = [0.0, 1.5]';
% sigma_q = diag([0.9, 1.1]);

% Calculate KL divergence D_KL(P||Q)
kld = cal_KLD(mu_p, sigma_p, mu_q, sigma_q);

% Print the result
disp(['KL divergence: ', num2str(kld)]);

% case 1 output: 0
% case 2 output: 0.44175

function kld = cal_KLD(mu_p, sigma_p, mu_q, sigma_q)
    % Small constant guarding log() against a zero determinant
    % (named tiny_val to avoid shadowing MATLAB's built-in eps)
    tiny_val = 1e-8;

    % Element-wise square: valid only for diagonal standard-deviation matrices
    cov_p = sigma_p .^ 2;
    cov_q = sigma_q .^ 2;

    delta_u = mu_q - mu_p;
    term1 = trace(cov_q \ cov_p);           % tr(Sigma_q^{-1} * Sigma_p)
    term2 = delta_u' * (cov_q \ delta_u);   % squared Mahalanobis distance
    term3 = -length(mu_p);                  % -n
    term4 = log(det(cov_q) + tiny_val) - log(det(cov_p) + tiny_val);
    kld = 0.5 * (term1 + term2 + term3 + term4);
    kld = max(kld, 0);                      % clip tiny negative round-off
end

