The entropy of a univariate Gaussian distribution is straightforward to derive:
$$
\begin{aligned}
H[\mathcal{N}(\mu, \sigma^2)]
&= -\int_x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log \left[ \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right] dx \\
&= -\int_x \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \Bigg[ -\frac{1}{2}\log (2\pi \sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \Bigg] dx \\
&= \frac{1}{2}\log (2\pi \sigma^2) + \frac{1}{2\sigma^2} \int_x (x-\mu)^2 \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx \\
&= \frac{1}{2}\log (2\pi \sigma^2) + \frac{\sigma^2}{2\sigma^2} \\
&= \frac{1}{2}\log (2\pi e \sigma^2)
\end{aligned}
$$
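As a sanity check on the univariate closed form, the following minimal sketch compares $\frac{1}{2}\log(2\pi e\sigma^2)$ with a midpoint-rule approximation of $-\int p(x)\log p(x)\,dx$. The function names, the integration range of 12 standard deviations, and the values of `mu` and `sigma` are illustrative assumptions, not part of the derivation. Entropy is in nats (natural logarithm).

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def entropy_closed_form(sigma):
    """Closed form H = 1/2 * log(2 * pi * e * sigma^2), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def entropy_numeric(mu, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of -integral p(x) log p(x) dx.

    The integral is truncated to mu +/- half_width * sigma (an assumption
    of this sketch); the tails beyond that contribute negligibly.
    """
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = gaussian_pdf(x, mu, sigma)
        if p > 0.0:  # guard against log(0) if the density underflows
            total -= p * math.log(p) * dx
    return total

mu, sigma = 1.5, 0.7  # arbitrary example values
h_exact = entropy_closed_form(sigma)
h_approx = entropy_numeric(mu, sigma)
print(h_exact, h_approx)
```

The two printed values should agree closely, and changing `mu` should leave both unchanged, since the entropy depends only on $\sigma$.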
The entropy of a multivariate Gaussian distribution:
$$
\begin{aligned}
H[\mathcal{N}(x \mid \mu, \Sigma)]
&= -\int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \log \left[ \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \right] dx_1 \cdots dx_K \\
&= \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \Bigg[ \log \left( (2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}} \right) + \frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \Bigg] dx_1 \cdots dx_K
\end{aligned}
$$
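The second term in the brackets can be evaluated using the trace identity $a^T B a = \mathrm{tr}(B a a^T)$, together with $E[x] = \mu$ and $E[x x^T] = \Sigma + \mu \mu^T$: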
$$
\begin{aligned}
&\int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \, \frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \, dx_1 \cdots dx_K \\
&= \frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \, \mathrm{tr}\left[ \Sigma^{-1} (x - \mu) (x - \mu)^T \right] dx_1 \cdots dx_K \\
&= \frac{1}{2} \mathrm{tr} \Bigg\{ \Sigma^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)} \left( x x^T - \mu x^T - x \mu^T + \mu \mu^T \right) dx_1 \cdots dx_K \Bigg\} \\
&= \frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} \left( \Sigma + \mu \mu^T - \mu \mu^T - \mu \mu^T + \mu \mu^T \right) \right) \\
&= \frac{1}{2} \mathrm{tr}(I) = \frac{K}{2}
\end{aligned}
$$
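The value $\frac{K}{2}$ for the quadratic-form term can also be checked by simulation. The sketch below is a Monte Carlo estimate for $K = 2$, so the target is 1. It uses only the Python standard library. The mean `MU`, covariance `SIGMA`, sample count, and all helper names are arbitrary choices made for this illustration.

```python
import math
import random

# Illustrative 2x2 example; these values are assumptions of the sketch.
MU = (1.0, -2.0)
SIGMA = ((2.0, 0.6), (0.6, 1.0))

def cholesky_2x2(s):
    """Lower-triangular factor L with L L^T = s, for a 2x2 SPD matrix."""
    l00 = math.sqrt(s[0][0])
    l10 = s[1][0] / l00
    l11 = math.sqrt(s[1][1] - l10 * l10)
    return l00, l10, l11

def inverse_2x2(s):
    """Explicit inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return ((s[1][1] / det, -s[0][1] / det),
            (-s[1][0] / det, s[0][0] / det))

def mean_half_quadratic_form(n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[1/2 (x - mu)^T Sigma^{-1} (x - mu)]."""
    rng = random.Random(seed)
    l00, l10, l11 = cholesky_2x2(SIGMA)
    inv = inverse_2x2(SIGMA)
    total = 0.0
    for _ in range(n_samples):
        # Draw x ~ N(MU, SIGMA) as x = MU + L z with z ~ N(0, I).
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x0 = MU[0] + l00 * z0
        x1 = MU[1] + l10 * z0 + l11 * z1
        d0, d1 = x0 - MU[0], x1 - MU[1]
        q = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
             + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
        total += 0.5 * q
    return total / n_samples

estimate = mean_half_quadratic_form()
print(estimate)  # should be close to K/2 = 1 for K = 2
```

The estimate fluctuates with the seed and sample count, so it only approximates $\frac{K}{2}$. The trace argument above is what shows the result is exact.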
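Because the density integrates to 1, the first term is simply $\log \left( (2\pi)^{\frac{K}{2}} |\Sigma|^{\frac{1}{2}} \right) = \frac{K}{2} \log 2\pi + \frac{1}{2} \log |\Sigma|$. Adding the second term, $\frac{K}{2}$, gives the final result: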
$$
H[\mathcal{N}(x \mid \mu, \Sigma)] = \frac{K}{2} (\log 2\pi + 1) + \frac{1}{2} \log |\Sigma|
$$
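The final formula is simple to evaluate in code. The sketch below is one possible way to do it in pure Python, computing $\log|\Sigma|$ from a Cholesky factorisation, which avoids overflow or underflow in the determinant itself. The function names `cholesky_logdet` and `mvn_entropy` are hypothetical and the example matrices are arbitrary. In practice a linear-algebra library would normally handle the log-determinant.

```python
import math

def cholesky_logdet(cov):
    """Return log|cov| for a symmetric positive-definite matrix (list of rows).

    Uses cov = L L^T, so log|cov| = 2 * sum(log L_ii).
    """
    k = len(cov)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance must be positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(k))

def mvn_entropy(cov):
    """Entropy in nats: K/2 * (log(2*pi) + 1) + 1/2 * log|cov|."""
    k = len(cov)
    return 0.5 * k * (math.log(2.0 * math.pi) + 1.0) + 0.5 * cholesky_logdet(cov)

# With K = 1 the formula reduces to the univariate result 1/2 log(2 pi e sigma^2).
sigma2 = 0.49
h_1d = mvn_entropy([[sigma2]])
h_1d_expected = 0.5 * math.log(2.0 * math.pi * math.e * sigma2)

# With a diagonal covariance the entropy is the sum of the univariate entropies.
h_diag = mvn_entropy([[2.0, 0.0], [0.0, 3.0]])
h_diag_expected = mvn_entropy([[2.0]]) + mvn_entropy([[3.0]])

print(h_1d, h_1d_expected)
print(h_diag, h_diag_expected)
```

Both comparisons follow directly from the formula: the mean $\mu$ never appears, and for independent components $\log|\Sigma|$ splits into a sum of $\log \sigma_i^2$ terms.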