The two Gaussian distributions are:
$$
p(x)=\mathcal{N}(x;\mu,\Sigma)
=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}
$$
$$
q(x)=\mathcal{N}(x;m,L)
=\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right\}
$$
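As a quick sanity check on the density formula, here is a minimal NumPy sketch that evaluates it directly. The function name `gaussian_pdf` and the example parameters are assumptions made for illustration only; they do not come from the original text.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) at the point x, following the formula above."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.shape[0]
    diff = x - mean
    # Solve cov @ y = diff rather than forming the explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

# Hypothetical example: 2-D standard normal evaluated at its mean.
mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
density_at_mean = gaussian_pdf(mu, mu, Sigma)  # equals 1 / (2 * pi)
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a design choice for numerical stability; both compute the same quadratic form.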
Properties of the matrix trace (tr):
$$
\operatorname{tr}(\alpha A+\beta B)=\alpha\operatorname{tr}(A)+\beta\operatorname{tr}(B) \tag{①}
$$

$$
\operatorname{tr}(A)=\operatorname{tr}(A^T) \tag{②}
$$

$$
\operatorname{tr}(AB)=\operatorname{tr}(BA) \tag{③}
$$

$$
\operatorname{tr}(ABC)=\operatorname{tr}(BCA)=\operatorname{tr}(CAB) \quad \text{(follows from ③)} \tag{④}
$$
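The four trace properties can be checked numerically on random matrices. The sketch below assumes NumPy; the matrix size, the seed, and the scalars `alpha` and `beta` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
alpha, beta = 2.0, -0.5

# Property 1: linearity of the trace.
linear_ok = np.isclose(np.trace(alpha * A + beta * B),
                       alpha * np.trace(A) + beta * np.trace(B))
# Property 2: invariance under transposition.
transpose_ok = np.isclose(np.trace(A), np.trace(A.T))
# Property 3: tr(AB) = tr(BA).
swap_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
# Property 4: invariance under cyclic permutations of a product.
cyclic_ok = (np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
             and np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))
```

Note that property ④ covers cyclic permutations only; in general $\operatorname{tr}(ABC)\neq\operatorname{tr}(ACB)$.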
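An important identity: for a column vector $\lambda$, the quadratic form $\lambda^TA\lambda$ is a scalar, so it equals its own trace, and property ③ then lets the factors be cycled: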
$$
\lambda^TA\lambda=\operatorname{tr}(\lambda^TA\lambda)=\operatorname{tr}(A\lambda\lambda^T) \tag{⑤}
$$
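Identity ⑤ is easy to confirm numerically. The following is a small sketch assuming NumPy, with a randomly drawn matrix `A` and vector `lam` (both hypothetical test inputs).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
lam = rng.standard_normal(4)

# Left-hand side: the scalar quadratic form lam^T A lam.
quad_form = lam @ A @ lam
# Right-hand side: trace of A times the outer product lam lam^T.
trace_form = np.trace(A @ np.outer(lam, lam))
```

The two values agree up to floating-point rounding, which is what allows a quadratic form to be moved inside a trace in the derivation of the KL divergence.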
Properties of the expectation $E$ and the covariance $\Sigma$ of a multivariate distribution:
$$
E(xx^T)=\Sigma+\mu\mu^T \tag{⑥}
$$
Proof:
$$
\begin{aligned}
\Sigma&=E\big[(x-\mu)(x-\mu)^T\big]\\
&=E\big(xx^T-x\mu^T-\mu x^T+\mu\mu^T\big)\\
&=E\big(xx^T\big)-E(x)\mu^T-\mu E(x)^T+\mu\mu^T\\
&=E\big(xx^T\big)-\mu\mu^T-\mu\mu^T+\mu\mu^T\\
&=E\big(xx^T\big)-\mu\mu^T
\end{aligned}
$$

Rearranging gives $E(xx^T)=\Sigma+\mu\mu^T$.
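Property ⑥ can also be checked by simulation. The sketch below assumes NumPy and uses made-up values for $\mu$ and $\Sigma$; it compares a Monte Carlo estimate of $E(xx^T)$ against $\Sigma+\mu\mu^T$, so agreement holds only up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])                  # hypothetical mean
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])  # hypothetical covariance

samples = rng.multivariate_normal(mu, Sigma, size=200_000)
# Monte Carlo estimate of E[x x^T]: average of the outer products.
second_moment_mc = samples.T @ samples / samples.shape[0]
second_moment_exact = Sigma + np.outer(mu, mu)
max_abs_err = np.max(np.abs(second_moment_mc - second_moment_exact))
```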
$$
E\big(x^TAx\big)=\operatorname{tr}(A\Sigma)+\mu^TA\mu \tag{⑦}
$$
Proof:
$$
\begin{aligned}
E\big(x^TAx\big)&=E\big[\operatorname{tr}(x^TAx)\big]\\
&=E\big[\operatorname{tr}(Axx^T)\big]\\
&=\operatorname{tr}\big[E(Axx^T)\big]\\
&=\operatorname{tr}\big[AE(xx^T)\big]\\
&=\operatorname{tr}\big[A(\Sigma+\mu\mu^T)\big]\\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(A\mu\mu^T)\\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(\mu^TA\mu)\\
&=\operatorname{tr}(A\Sigma)+\mu^TA\mu
\end{aligned}
$$
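A simulation gives a rough check of property ⑦ as well. This sketch assumes NumPy; `mu`, `Sigma`, and `A` are invented example values, and the Monte Carlo average matches the closed form only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])                  # hypothetical mean
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])  # hypothetical covariance
A = np.array([[1.0, 0.2], [0.2, 3.0]])      # hypothetical matrix in the quadratic form

samples = rng.multivariate_normal(mu, Sigma, size=200_000)
# x^T A x for every sample, then the sample mean.
quad_mc = np.mean(np.einsum("ni,ij,nj->n", samples, A, samples))
# Closed form from property 7: tr(A Sigma) + mu^T A mu.
quad_exact = np.trace(A @ Sigma) + mu @ A @ mu
```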
Definition of the KL divergence:
$$
KL(p\,||\,q)=E_p\left[\log\frac{p(x)}{q(x)}\right]
$$
$$
\begin{aligned}
\frac{p(x)}{q(x)}
&=\frac{\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}}{\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right\}}\\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)-\left[-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right]\right\}\\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}
\end{aligned}
$$
$$
\begin{aligned}
\log\frac{p(x)}{q(x)}
&=\log\left(\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}\right)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]
\end{aligned}
$$
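To see that the simplified log-ratio agrees with subtracting the two log-densities directly, here is a minimal NumPy sketch. The helper names `log_ratio` and `log_pdf`, as well as all parameter values, are assumptions introduced only for this check.

```python
import numpy as np

def log_pdf(x, mean, cov):
    """Log-density of N(mean, cov) at x."""
    n = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def log_ratio(x, mu, Sigma, m, L):
    """log p(x)/q(x) for p = N(mu, Sigma), q = N(m, L), using the simplified form."""
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_L = np.linalg.slogdet(L)
    dq, dp = x - m, x - mu
    quad_q = dq @ np.linalg.solve(L, dq)
    quad_p = dp @ np.linalg.solve(Sigma, dp)
    return 0.5 * (logdet_L - logdet_Sigma) + 0.5 * (quad_q - quad_p)

# Hypothetical parameters and evaluation point.
mu, Sigma = np.array([0.0, 1.0]), np.array([[1.0, 0.2], [0.2, 2.0]])
m, L = np.array([1.0, -1.0]), np.array([[1.5, -0.3], [-0.3, 0.8]])
x = np.array([0.3, 0.7])

simplified = log_ratio(x, mu, Sigma, m, L)
direct = log_pdf(x, mu, Sigma) - log_pdf(x, m, L)
```

The $(2\pi)^{n/2}$ factors cancel in the ratio, which is why they appear in `log_pdf` but not in `log_ratio`.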
$$
\begin{aligned}
E_p\left[\log\frac{p(x)}{q(x)}\right]
&=E_p\left(\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right)\\
&=\frac{1}{2}E_p\left(\log\frac{|L|}{|\Sigma|}\right)+\frac{1}{2}E_p\left((x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}E_p\left(\operatorname{tr}\left[L^{-1}(x-m)(x-m)^T\right]-\operatorname{tr}\left[\Sigma^{-1}(x-\mu)(x-\mu)^T\right]\right) &&\text{(property ⑤)}\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(E_p\left[L^{-1}(x-m)(x-m)^T\right]\right)-\frac{1}{2}\operatorname{tr}\left(E_p\left[\Sigma^{-1}(x-\mu)(x-\mu)^T\right]\right) &&\text{(property ①)}\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(E_p\left[L^{-1}(xx^T-mx^T-xm^T+mm^T)\right]\right)-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}E_p\left[(x-\mu)(x-\mu)^T\right]\right)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(L^{-1}E_p\left[xx^T-mx^T-xm^T+mm^T\right]\right)-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}\Sigma\right)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\left(L^{-1}\big[\underbrace{\Sigma+\mu\mu^T}_{\text{property ⑥}}-m\mu^T-\mu m^T+mm^T\big]\right)-\frac{n}{2}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+\operatorname{tr}\left(L^{-1}\left[\mu\mu^T-m\mu^T-\mu m^T+mm^T\right]\right)\right\}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+\operatorname{tr}\left(L^{-1}\mu\mu^T-L^{-1}m\mu^T-L^{-1}\mu m^T+L^{-1}mm^T\right)\right\}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+\mu^TL^{-1}\mu-2\mu^TL^{-1}m+m^TL^{-1}m\right\}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\left(L^{-1}\Sigma\right)+(\mu-m)^TL^{-1}(\mu-m)\right\}
\end{aligned}
$$

Here $E_p(x)=\mu$ replaces every remaining $x$ once the expectation is taken, $\operatorname{tr}(\Sigma^{-1}\Sigma)=\operatorname{tr}(I_n)=n$, and the two cross terms combine into $-2\mu^TL^{-1}m$ because $L^{-1}$ is symmetric.
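The closed form for two Gaussians, $KL(p\,||\,q)=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+(\mu-m)^TL^{-1}(\mu-m)\right\}$, can be turned into a short function. Because $E_p(x)=\mu$, the quadratic term involves the two means and no longer contains $x$. The sketch below assumes NumPy; the name `kl_gaussians` and the example inputs are hypothetical.

```python
import numpy as np

def kl_gaussians(mu, Sigma, m, L):
    """KL(p || q) for p = N(mu, Sigma) and q = N(m, L)."""
    mu, m = np.asarray(mu, dtype=float), np.asarray(m, dtype=float)
    Sigma, L = np.asarray(Sigma, dtype=float), np.asarray(L, dtype=float)
    n = mu.shape[0]
    # slogdet returns (sign, log|det|); log-determinants avoid overflow.
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_L = np.linalg.slogdet(L)
    diff = mu - m
    trace_term = np.trace(np.linalg.solve(L, Sigma))  # tr(L^{-1} Sigma)
    maha_term = diff @ np.linalg.solve(L, diff)       # (mu-m)^T L^{-1} (mu-m)
    return 0.5 * (logdet_L - logdet_Sigma - n + trace_term + maha_term)

# Identical distributions: the divergence is zero.
kl_same = kl_gaussians([0.0, 1.0], np.eye(2), [0.0, 1.0], np.eye(2))
# 1-D unit-variance Gaussians with means 0 and 1: only the mean term remains, giving 0.5.
kl_shift = kl_gaussians([0.0], [[1.0]], [1.0], [[1.0]])
```

Swapping the arguments generally changes the value, since the KL divergence is not symmetric.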