The KL divergence between two Gaussian distributions is actually simple to derive, as long as you find the right approach.
1. Univariate Gaussian distribution
By definition, the KL divergence between two univariate Gaussians is:
$$
\begin{aligned}
KL(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)) &= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \log \frac{\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}}{\frac{1}{\sqrt{2\pi}\sigma_2} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}} dx \\
&= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \Bigg[ \log \frac{\sigma_2}{\sigma_1} - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2} \Bigg] dx
\end{aligned}
$$
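As a sanity check on the definition, the integral can be evaluated numerically. The sketch below is illustrative only: the function name `kl_numeric`, the trapezoidal rule, and the integration window of 12 standard deviations around $\mu_1$ are assumptions made here, not part of the derivation. It integrates the second form of the integrand, where the log-ratio has already been expanded into the three bracketed terms.

```python
import math

def kl_numeric(mu1, sigma1, mu2, sigma2, half_width=12.0, n=20001):
    """Approximate KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) with the trapezoidal rule."""
    lo = mu1 - half_width * sigma1
    hi = mu1 + half_width * sigma1
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        # Density of the first Gaussian at x.
        p = math.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2)) / (math.sqrt(2 * math.pi) * sigma1)
        # log p(x) - log q(x), expanded as in the bracket of the derivation.
        log_ratio = (math.log(sigma2 / sigma1)
                     - (x - mu1) ** 2 / (2 * sigma1 ** 2)
                     + (x - mu2) ** 2 / (2 * sigma2 ** 2))
        # Trapezoidal rule: endpoints get half weight.
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * p * log_ratio
    return total * h

print(kl_numeric(0.0, 1.0, 1.0, 2.0))
```

Passing identical parameters for both distributions should give zero, and swapping the two distributions generally changes the value, because the KL divergence is not symmetric.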
The first term is simple: it only uses the fact that a probability density integrates to 1:
$$
\log \frac{\sigma_2}{\sigma_1} \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = \log \frac{\sigma_2}{\sigma_1}
$$
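For the second term, the key is to recognize that the integral is the variance of $\mathcal{N}(\mu_1, \sigma_1^2)$: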
$$
-\frac{1}{2\sigma_1^2} \int_x (x-\mu_1)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = -\frac{1}{2\sigma_1^2} \sigma_1^2 = -\frac{1}{2}
$$
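In the third term, expanding the square leaves the second moment ($\sigma_1^2 + \mu_1^2$), the mean ($\mu_1$), and a constant inside the integral, which gives: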
$$
\begin{aligned}
\frac{1}{2\sigma_2^2} \int_x (x-\mu_2)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx &= \frac{1}{2\sigma_2^2} \int_x ( x^2 - 2\mu_2 x + \mu_2^2 ) \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx \\
&= \frac{\sigma_1^2 + \mu_1^2 - 2 \mu_1 \mu_2 + \mu_2^2}{2\sigma_2^2} = \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
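A small trick also simplifies it: expand the square around $\mu_1$, so that the first term is the variance, the second term is an odd function about $\mu_1$ whose integral over the whole real line is 0, and the third term is a constant that can be pulled out as a coefficient: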
$$
\begin{aligned}
\frac{1}{2\sigma_2^2} \int_x (x-\mu_2)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx &= \frac{1}{2\sigma_2^2} \int_x \big[ (x-\mu_1)^2 + 2(\mu_1 - \mu_2)(x - \mu_1) + (\mu_1 - \mu_2)^2 \big] \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx \\
&= \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
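The three term values derived above can be checked numerically. The following is a minimal sketch under stated assumptions: the helper `expect_under_p`, the trapezoidal rule, the 12-standard-deviation window, and the parameter values are all chosen here for illustration. Each term is computed as an expectation under $\mathcal{N}(\mu_1, \sigma_1^2)$ and printed next to its closed-form value.

```python
import math

def expect_under_p(f, mu1, sigma1, half_width=12.0, n=20001):
    """Trapezoidal estimate of E[f(x)] for x ~ N(mu1, sigma1^2)."""
    lo = mu1 - half_width * sigma1
    h = 2 * half_width * sigma1 / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        p = math.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2)) / (math.sqrt(2 * math.pi) * sigma1)
        # Endpoints get half weight in the trapezoidal rule.
        total += (0.5 if i in (0, n - 1) else 1.0) * p * f(x)
    return total * h

# Illustrative parameters (hypothetical values, not from the text).
mu1, sigma1, mu2, sigma2 = 0.5, 1.5, -1.0, 2.0

# First term: a constant times the total probability mass.
term1 = expect_under_p(lambda x: math.log(sigma2 / sigma1), mu1, sigma1)
# Second term: minus the variance divided by 2 * sigma1^2.
term2 = expect_under_p(lambda x: -(x - mu1) ** 2 / (2 * sigma1 ** 2), mu1, sigma1)
# Third term: squared distance to mu2, averaged under the first Gaussian.
term3 = expect_under_p(lambda x: (x - mu2) ** 2 / (2 * sigma2 ** 2), mu1, sigma1)

print(term1, math.log(sigma2 / sigma1))
print(term2, -0.5)
print(term3, (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2))
```

Each printed pair should agree to many decimal places; the second term in particular stays at $-\frac{1}{2}$ regardless of the parameters.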
Collecting the three terms, the KL divergence between two univariate Gaussians is:
$$
KL(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)) = \log \frac{\sigma_2}{\sigma_1} - \frac{1}{2} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
$$
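The closed form translates directly into code. This is a small sketch, with the function name `kl_gauss_1d` and the input validation being choices made here rather than anything prescribed by the derivation; it takes standard deviations, not variances.

```python
import math

def kl_gauss_1d(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) from the closed-form result."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma2 / sigma1)
            - 0.5
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2))

print(kl_gauss_1d(0.0, 1.0, 0.0, 1.0))  # identical distributions
print(kl_gauss_1d(0.0, 1.0, 1.0, 2.0))
print(kl_gauss_1d(1.0, 2.0, 0.0, 1.0))  # same pair with the arguments swapped
```

The first call returns zero because the two distributions coincide. The last two calls return different values, which illustrates that the KL divergence is not symmetric in its arguments.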
2. Multivariate Gaussian distribution
The density of a $K$-dimensional Gaussian with mean $\mu$ and covariance matrix $\Sigma$ is:
$$
\mathcal{N}(x | \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}
$$
Substituting this density into the definition of the KL divergence gives:
$$
\begin{aligned}
KL(\mathcal{N}(x | \mu_1, \Sigma_1) \,\|\, \mathcal{N}(x | \mu_2, \Sigma_2)) &= \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} \log \frac{\frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)}}{\frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_2|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)}} dx_1 \cdots dx_K \\
&= \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} \Bigg[ \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + \frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) \Bigg] dx_1 \cdots dx_K
\end{aligned}
$$
As before, compute the three terms separately. The first term:
$$
\frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} dx_1 \cdots dx_K = \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|}
$$
The second term:
$$
-\frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) \, dx_1 \cdots dx_K
$$
$\Sigma_1$ is a symmetric positive definite matrix (it must be invertible for the density to exist), so let $\Sigma_1^{-1} = U^T U$ and $y = U(x - \mu_1)$. Since the matrix of a linear transformation is also its Jacobian matrix, $dy_1 \cdots dy_K = |U| dx_1 \cdots dx_K$, where $|U|$ is taken as the absolute value of the determinant. From $|\Sigma_1^{-1}| = |U|^2$ it follows that $|\Sigma_1^{-\frac{1}{2}}| = |\Sigma_1|^{-\frac{1}{2}} = |U|$, and therefore the second term becomes
$$
\begin{aligned}
&-\frac{1}{2 |\Sigma_1|^{\frac{1}{2}}} \int_{y_1} \cdots \int_{y_K} \frac{1}{(2\pi)^{\frac{K}{2}}} e^{-\frac{1}{2} y^T y} \, y^T y \, |U|^{-1} dy_1 \cdots dy_K \\
&= -\frac{1}{2 |\Sigma_1|^{\frac{1}{2}}} |\Sigma_1|^{\frac{1}{2}} \cdot K = -\frac{K}{2}
\end{aligned}
$$
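The factor $K$ appears because, after the change of variables, $y$ follows a standard $K$-dimensional Gaussian, so $y^T y$ is a sum of $K$ squared standard normal variables with expectation $K$. The sketch below illustrates this with NumPy. The choice of $U$ as the inverse of the Cholesky factor is an assumption made here (any $U$ with $U^T U = \Sigma_1^{-1}$ works), and the values of `mu1` and `Sigma1` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (hypothetical values).
mu1 = np.array([1.0, -2.0, 0.5])
Sigma1 = np.array([[2.0, 0.3, 0.1],
                   [0.3, 1.0, 0.2],
                   [0.1, 0.2, 0.5]])
K = mu1.size

# Cholesky factor: Sigma1 = L @ L.T, so U = inv(L) satisfies U.T @ U = inv(Sigma1).
L = np.linalg.cholesky(Sigma1)
U = np.linalg.inv(L)

factorization_ok = np.allclose(U.T @ U, np.linalg.inv(Sigma1))
# |U| = |Sigma1|^(-1/2), as used in the change of variables.
det_ok = np.isclose(np.linalg.det(U), np.linalg.det(Sigma1) ** -0.5)

# Sample x ~ N(mu1, Sigma1) and whiten: each row of y is U @ (x - mu1).
x = rng.multivariate_normal(mu1, Sigma1, size=200_000)
y = (x - mu1) @ U.T
mean_sq_norm = np.mean(np.sum(y ** 2, axis=1))

print(factorization_ok, det_ok)
print(mean_sq_norm, K)  # Monte Carlo estimate of E[y^T y] next to K
```

The Monte Carlo estimate fluctuates with the seed and sample size, so it should only be expected to land close to $K$, not to match it exactly.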
The third term relies on a small trick, $x^T A x = tr(A x x^T)$, which holds because a scalar equals its own trace and the trace is invariant under cyclic permutation:
$$
\begin{aligned}
&\frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) \, dx_1 \cdots dx_K \\
&= \frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} tr[ \Sigma_2^{-1} (x - \mu_2) (x - \mu_2)^T ] \, dx_1 \cdots dx_K \\
&= \frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_2) (x - \mu_2)^T \, dx_1 \cdots dx_K \Bigg] \\
&= \frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x x^T - \mu_2 x^T - x \mu_2^T + \mu_2 \mu_2^T ) \, dx_1 \cdots dx_K \Bigg]
\end{aligned}
$$
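After integration, the first term is the second moment $\Sigma_1 + \mu_1 \mu_1^T$, the second and third terms reduce to the mean $\mu_1$ multiplied by $\mu_2$, and the fourth term is a constant: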
$$
\begin{aligned}
&\frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x x^T - \mu_2 x^T - x \mu_2^T + \mu_2 \mu_2^T ) \, dx_1 \cdots dx_K \Bigg] \\
&= \frac{1}{2} tr [ \Sigma_2^{-1} (\Sigma_1 + \mu_1 \mu_1^T - \mu_2 \mu_1^T - \mu_1 \mu_2^T + \mu_2 \mu_2^T) ] \\
&= \frac{1}{2} \big[ tr ( \Sigma_2^{-1} \Sigma_1 ) + tr( \Sigma_2^{-1} (\mu_1 - \mu_2) (\mu_1 - \mu_2)^T ) \big] \\
&= \frac{1}{2} \big[ tr ( \Sigma_2^{-1} \Sigma_1 ) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) \big]
\end{aligned}
$$
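Both ingredients of the third term can be checked numerically: the trace identity on a single vector, and the expectation of the quadratic form by Monte Carlo sampling. This is a rough sketch with hypothetical parameter values and an arbitrary sample size. Note that it compares the bracketed quantity, that is, twice the third term, so the leading $\frac{1}{2}$ is left out on both sides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (hypothetical values).
mu1 = np.array([0.0, 0.0])
Sigma1 = np.array([[1.0, 0.3], [0.3, 0.5]])
mu2 = np.array([1.0, -1.0])
Sigma2 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma2_inv = np.linalg.inv(Sigma2)

# Trace trick on one arbitrary vector: v^T A v = tr(A v v^T).
v = rng.normal(size=2)
quad_form = v @ Sigma2_inv @ v
trace_form = np.trace(Sigma2_inv @ np.outer(v, v))

# Monte Carlo estimate of E[(x - mu2)^T Sigma2^{-1} (x - mu2)] for x ~ N(mu1, Sigma1).
x = rng.multivariate_normal(mu1, Sigma1, size=200_000)
d = x - mu2
mc_estimate = np.mean(np.einsum("ni,ij,nj->n", d, Sigma2_inv, d))

# Closed form from the derivation, without the leading factor 1/2.
diff = mu1 - mu2
closed_form = np.trace(Sigma2_inv @ Sigma1) + diff @ Sigma2_inv @ diff

print(quad_form, trace_form)
print(mc_estimate, closed_form)
```

The first pair agrees up to floating-point rounding, while the second pair agrees only up to Monte Carlo error.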
Collecting the three terms, the KL divergence between two multivariate Gaussians is:
$$
KL(\mathcal{N}(x | \mu_1, \Sigma_1) \,\|\, \mathcal{N}(x | \mu_2, \Sigma_2)) = \frac{1}{2} \Bigg[ \log \frac{|\Sigma_2|}{|\Sigma_1|} - K + tr ( \Sigma_2^{-1} \Sigma_1 ) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) \Bigg]
$$
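The multivariate result can be implemented in a few lines of NumPy. The sketch below is one possible implementation, not a reference one: the function name `kl_mvn` is made up here, and using `slogdet` and `solve` instead of explicit determinants and inverses is a numerical-stability choice that goes beyond the derivation. It assumes both covariance matrices are symmetric positive definite and does not check this.

```python
import numpy as np

def kl_mvn(mu1, Sigma1, mu2, Sigma2):
    """KL(N(mu1, Sigma1) || N(mu2, Sigma2)) from the closed-form result."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    Sigma1 = np.asarray(Sigma1, dtype=float)
    Sigma2 = np.asarray(Sigma2, dtype=float)
    K = mu1.shape[0]

    # log|Sigma2| - log|Sigma1|, computed through log-determinants.
    _, logdet1 = np.linalg.slogdet(Sigma1)
    _, logdet2 = np.linalg.slogdet(Sigma2)

    diff = mu1 - mu2
    # tr(Sigma2^{-1} Sigma1) without forming the inverse explicitly.
    trace_term = np.trace(np.linalg.solve(Sigma2, Sigma1))
    # (mu1 - mu2)^T Sigma2^{-1} (mu1 - mu2)
    maha_term = diff @ np.linalg.solve(Sigma2, diff)

    return 0.5 * (logdet2 - logdet1 - K + trace_term + maha_term)

# K = 1 reduces to the univariate formula with sigma1 = 1, sigma2 = 2.
print(kl_mvn([0.0], [[1.0]], [1.0], [[4.0]]))

# Diagonal covariances: the result is the sum of the per-dimension univariate values.
print(kl_mvn([0.0, 1.0], np.diag([1.0, 4.0]), [1.0, 0.0], np.diag([4.0, 1.0])))
```

With $K = 1$ the expression collapses to the univariate formula, since $\frac{1}{2} \log \frac{\sigma_2^2}{\sigma_1^2} = \log \frac{\sigma_2}{\sigma_1}$, which is a quick way to check an implementation.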