Problem
Show that the entropy of the multivariate Gaussian distribution $\mathcal{N}(x\mid\mu,\Sigma)$ is
$$H[x]=\frac{1}{2}\log|\Sigma|+\frac{D}{2}\left(1+\log(2\pi)\right)$$
where $D$ is the dimensionality of $x$. Here $\log$ is the natural logarithm, so the entropy is measured in nats.
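As a quick sanity check on the formula, here is a minimal sketch in plain Python (standard library only) that evaluates the closed form. The helper names `cholesky` and `gaussian_entropy` are our own hypothetical choices. The sketch assumes $\Sigma$ is symmetric positive definite, uses natural logarithms, and computes $\log|\Sigma|$ as twice the sum of the logs of the Cholesky diagonal.

```python
import math

def cholesky(S):
    """Return lower-triangular L with S = L L^T (S must be symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def gaussian_entropy(Sigma):
    """Closed-form entropy (in nats) of N(mu, Sigma); mu does not appear in it."""
    D = len(Sigma)
    L = cholesky(Sigma)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(D))  # log|Sigma| = 2 * sum_i log L_ii
    return 0.5 * log_det + 0.5 * D * (1.0 + math.log(2.0 * math.pi))

print(gaussian_entropy([[2.0, 0.3], [0.3, 1.0]]))
```

For a diagonal $\Sigma$ the result is the sum of the one-dimensional entropies $\frac{1}{2}\log(2\pi e\sigma_i^2)$. Also, $\mu$ never enters, which matches the formula.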
Solution
There are two ways to solve this problem:
- Use the definition of entropy, $H[x]=\mathbb{E}[-\log p(x)]$ [1]
- Expand the Gaussian density directly inside the entropy integral, then simplify [2]
Method 1: using the definition of entropy
First, write the entropy of the multivariate Gaussian using the definition:
$$H[x]=\mathbb{E}\left[-\log\mathcal{N}(x\mid\mu,\Sigma)\right]$$
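The definition $H[x]=\mathbb{E}[-\log p(x)]$ suggests a direct numerical check: draw samples from $\mathcal{N}(\mu,\Sigma)$ and average $-\log p$. The sketch below is a plain-Python Monte Carlo estimate. It assumes a symmetric positive definite $\Sigma$ and uses an arbitrary fixed seed. The names `cholesky`, `log_gaussian_pdf`, `sample` and `entropy_monte_carlo` are hypothetical helpers, not part of any library.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]
    return L

def log_gaussian_pdf(x, mu, L):
    """log N(x | mu, Sigma), given the Cholesky factor L of Sigma."""
    D = len(mu)
    # Forward substitution L y = x - mu, so that y^T y = (x - mu)^T Sigma^{-1} (x - mu)
    y = []
    for i in range(D):
        y.append((x[i] - mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(D))
    return -0.5 * D * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad

def sample(mu, L, rng):
    """Draw x = mu + L z with z ~ N(0, I), so that x ~ N(mu, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def entropy_monte_carlo(mu, Sigma, n=100_000, seed=0):
    """Estimate H[x] = E[-log p(x)] by averaging -log p over samples drawn from p."""
    rng = random.Random(seed)
    L = cholesky(Sigma)
    return sum(-log_gaussian_pdf(sample(mu, L, rng), mu, L) for _ in range(n)) / n

mu = [1.0, -2.0]
Sigma = [[2.0, 0.3], [0.3, 1.0]]
closed_form = 0.5 * math.log(1.91) + (1.0 + math.log(2.0 * math.pi))  # |Sigma| = 1.91, D = 2
estimate = entropy_monte_carlo(mu, Sigma)
print(estimate, closed_form)
```

With this sample size, the estimate should agree with the closed form to within a few hundredths.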
$$H[x]=\mathbb{E}\left[-\log\left(\frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}\right)\right]$$
$$H[x]=-\mathbb{E}\left[-\log\left((2\pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}\right)-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right]$$
$$H[x]=\frac{D}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma|+\frac{1}{2}\mathbb{E}\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right]$$
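At this point the expression is almost explicit: it only remains to compute the last term, $\mathbb{E}[(x-\mu)^T\Sigma^{-1}(x-\mu)]$. For this we use the matrix trace trick [3]. A scalar equals its own trace (viewed as a $1\times1$ matrix), and the trace is invariant under cyclic permutations, $\mathrm{tr}(AB)=\mathrm{tr}(BA)$.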
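Both facts the trace trick relies on can be spot-checked numerically: a scalar equals its own trace, and $\mathrm{tr}(AB)=\mathrm{tr}(BA)$ (here with $A=a^T$ and $B=Ma$). This plain-Python sketch compares $a^T M a$ with $\mathrm{tr}(M a a^T)$ for a random vector $a$ and a random matrix $M$. The helper names and the seed are hypothetical choices for illustration.

```python
import random

def quad_form(a, M):
    """The scalar a^T M a."""
    n = len(a)
    return sum(a[i] * M[i][j] * a[j] for i in range(n) for j in range(n))

def trace_cyclic(a, M):
    """tr(M a a^T): the same scalar, viewed as a 1x1 trace with a^T cycled to the end."""
    n = len(a)
    outer = [[a[i] * a[j] for j in range(n)] for i in range(n)]  # a a^T, an n x n matrix
    # Diagonal entries of M @ outer: (M @ outer)[i][i] = sum_k M[i][k] * outer[k][i]
    return sum(M[i][k] * outer[k][i] for i in range(n) for k in range(n))

rng = random.Random(1)
a = [rng.uniform(-1.0, 1.0) for _ in range(3)]
M = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
print(quad_form(a, M), trace_cyclic(a, M))
```

Note that cycling produces the outer product $a a^T$, a $3\times3$ matrix, rather than the inner product $a^T a$.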
$$
\begin{aligned}
\mathbb{E}[(x-\mu)^T\Sigma^{-1}(x-\mu)] &= \mathbb{E}[\mathrm{tr}((x-\mu)^T\Sigma^{-1}(x-\mu))] \\
&= \mathbb{E}[\mathrm{tr}(\Sigma^{-1}(x-\mu)(x-\mu)^T)] \\
&= \mathrm{tr}(\mathbb{E}[\Sigma^{-1}(x-\mu)(x-\mu)^T]) \\
&= \mathrm{tr}(\Sigma^{-1}\mathbb{E}[(x-\mu)(x-\mu)^T]) \\
&= \mathrm{tr}(\Sigma^{-1}\Sigma)=\mathrm{tr}(I)=D
\end{aligned}
$$
The expectation moves inside the trace because both operations are linear. Also, $\mathbb{E}[(x-\mu)(x-\mu)^T]=\Sigma$ is the definition of the covariance matrix. The cyclic permutation yields the $D\times D$ outer product $(x-\mu)(x-\mu)^T$, not the scalar $(x-\mu)^T(x-\mu)$.
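The key step above is $\mathbb{E}[(x-\mu)(x-\mu)^T]=\Sigma$, which then gives $\mathbb{E}[(x-\mu)^T\Sigma^{-1}(x-\mu)]=D$. A minimal Monte Carlo sketch for $D=2$ can check both. It assumes a $2\times2$ symmetric positive definite $\Sigma$ and, to stay short, uses explicit $2\times2$ formulas for the Cholesky factor and the inverse. The function names and the seed are hypothetical.

```python
import math
import random

def sample_2d(mu, Sigma, rng):
    """Draw x ~ N(mu, Sigma) in 2D via x = mu + L z, with L the Cholesky factor of Sigma."""
    l00 = math.sqrt(Sigma[0][0])
    l10 = Sigma[1][0] / l00
    l11 = math.sqrt(Sigma[1][1] - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mu[0] + l00 * z0, mu[1] + l10 * z0 + l11 * z1]

def inverse_2x2(S):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def expected_quad_form(mu, Sigma, n=100_000, seed=0):
    """Monte Carlo estimates of E[(x-mu)^T Sigma^{-1} (x-mu)] and E[(x-mu)(x-mu)^T]."""
    rng = random.Random(seed)
    P = inverse_2x2(Sigma)
    total = 0.0
    C = [[0.0, 0.0], [0.0, 0.0]]  # running sum of outer products (x-mu)(x-mu)^T
    for _ in range(n):
        x = sample_2d(mu, Sigma, rng)
        d = [x[0] - mu[0], x[1] - mu[1]]
        total += sum(d[i] * P[i][j] * d[j] for i in range(2) for j in range(2))
        for i in range(2):
            for j in range(2):
                C[i][j] += d[i] * d[j]
    return total / n, [[c / n for c in row] for row in C]

mu = [1.0, -2.0]
Sigma = [[2.0, 0.3], [0.3, 1.0]]
mean_quad, cov_hat = expected_quad_form(mu, Sigma)
print(mean_quad)  # should be close to D = 2
print(cov_hat)    # should be close to Sigma
```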
Substituting this result into the previous expression, we obtain
$$H[x]=\frac{D}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma|+\frac{D}{2}=\frac{1}{2}\log|\Sigma|+\frac{D}{2}\left(1+\log(2\pi)\right),$$
which is exactly the stated result.
Method 2: expanding the Gaussian density
As in Method 1, we first write the entropy of the multivariate Gaussian, this time as an integral:
$$H[x]=-\int\frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}\log\left(\frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}\right)dx$$
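For $D=1$ this integral can be evaluated by ordinary numerical quadrature and compared with $\frac{1}{2}\log\sigma^2+\frac{1}{2}(1+\log(2\pi))$. The sketch below uses the midpoint rule. Truncating the integral to $\mu\pm12\sigma$ and the grid size are our assumptions, chosen so that both the neglected tails and the discretization error are negligible. The helper names are hypothetical.

```python
import math

def gaussian_pdf_1d(x, mu, var):
    """Density of N(mu, var) in one dimension."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def entropy_by_quadrature(mu, var, half_width=12.0, n=20_000):
    """Approximate H = -integral of p(x) log p(x) dx by the midpoint rule on mu +/- half_width * sd."""
    sd = math.sqrt(var)
    a = mu - half_width * sd
    h = 2.0 * half_width * sd / n
    total = 0.0
    for k in range(n):
        p = gaussian_pdf_1d(a + (k + 0.5) * h, mu, var)
        total -= p * math.log(p) * h
    return total

mu, var = 0.5, 3.0
closed_form = 0.5 * math.log(var) + 0.5 * (1.0 + math.log(2.0 * math.pi))
estimate = entropy_by_quadrature(mu, var)
print(estimate, closed_form)
```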
For brevity, let $p(x)$ denote the multivariate Gaussian density. This gives:
$$
\begin{aligned}
H[x] &= -\int p(x)\log\left(\frac{1}{(2\pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}\right)dx \\
&= \int p(x)\left(\frac{D}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma|\right)dx+\frac{1}{2}\int p(x)(x-\mu)^T\Sigma^{-1}(x-\mu)\,dx \\
&= \frac{D}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma|+\frac{1}{2}\int p(x)(x-\mu)^T\Sigma^{-1}(x-\mu)\,dx
\end{aligned}
$$
The last step uses $\int p(x)\,dx=1$.
Now only the last term remains. Its integrand contains the scalar $(x-\mu)^T\Sigma^{-1}(x-\mu)$, so we can again apply the trace trick [3]:
$$
\begin{aligned}
\int p(x)(x-\mu)^T\Sigma^{-1}(x-\mu)\,dx &= \int p(x)\,\mathrm{tr}((x-\mu)^T\Sigma^{-1}(x-\mu))\,dx \\
&= \int p(x)\,\mathrm{tr}(\Sigma^{-1}(x-\mu)(x-\mu)^T)\,dx \\
&= \int \mathrm{tr}(\Sigma^{-1}p(x)(x-\mu)(x-\mu)^T)\,dx \\
&= \mathrm{tr}\left(\Sigma^{-1}\int p(x)(x-\mu)(x-\mu)^T\,dx\right) \\
&= \mathrm{tr}(\Sigma^{-1}\Sigma)=\mathrm{tr}(I)=D
\end{aligned}
$$
This holds because $\int p(x)(x-\mu)(x-\mu)^T\,dx=\Sigma$ is the covariance of $x$.
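The integral form of this step can also be checked directly. The sketch below applies the midpoint rule on a 2D grid to approximate $\int p(x)\,dx$, $\int p(x)(x-\mu)(x-\mu)^T dx$ and $\int p(x)(x-\mu)^T\Sigma^{-1}(x-\mu)\,dx$. These should come out as $1$, $\Sigma$ and $D=2$. The grid extent ($\pm10$ standard deviations per axis), the resolution and the function name are assumptions for illustration. All three integrands depend on $x$ only through $d=x-\mu$, so the grid is laid out over $d$.

```python
import math

def integrate_moments_2d(Sigma, half_width=10.0, n=400):
    """Midpoint-rule estimates of the integrals of p(x), p(x)(x-mu)(x-mu)^T and
    p(x)(x-mu)^T Sigma^{-1} (x-mu) for p = N(mu, Sigma) in 2D (only d = x - mu matters)."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    P = [[Sigma[1][1] / det, -Sigma[0][1] / det],
         [-Sigma[1][0] / det, Sigma[0][0] / det]]  # Sigma^{-1}
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))  # 1 / ((2 pi)^{D/2} |Sigma|^{1/2}) with D = 2
    s0, s1 = math.sqrt(Sigma[0][0]), math.sqrt(Sigma[1][1])
    h0, h1 = 2.0 * half_width * s0 / n, 2.0 * half_width * s1 / n
    mass, quad = 0.0, 0.0
    C = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        d0 = -half_width * s0 + (i + 0.5) * h0
        for j in range(n):
            d1 = -half_width * s1 + (j + 0.5) * h1
            q = P[0][0] * d0 * d0 + 2.0 * P[0][1] * d0 * d1 + P[1][1] * d1 * d1  # d^T Sigma^{-1} d
            w = norm * math.exp(-0.5 * q) * h0 * h1  # p(x) times the cell area
            mass += w
            quad += w * q
            C[0][0] += w * d0 * d0
            C[0][1] += w * d0 * d1
            C[1][1] += w * d1 * d1
    C[1][0] = C[0][1]  # the second-moment matrix is symmetric
    return mass, C, quad

Sigma = [[2.0, 0.3], [0.3, 1.0]]
mass, cov, quad = integrate_moments_2d(Sigma)
print(mass)  # should be close to 1
print(cov)   # should be close to Sigma
print(quad)  # should be close to D = 2
```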
Substituting this back into the last expression, we get
$$H[x]=\frac{D}{2}\log(2\pi)+\frac{1}{2}\log|\Sigma|+\frac{D}{2}=\frac{1}{2}\log|\Sigma|+\frac{D}{2}\left(1+\log(2\pi)\right)$$
References
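[1] Deriving the entropy of the multivariate Gaussian distribution
[2] Deriving the maximum entropy from the definition
[3] The matrix trace trick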