1. Form
In general, P denotes the true distribution of the data x, and Q denotes the theoretical (model) distribution.
Discrete case:
$$KL(P||Q) = \sum P(x) \log \frac{P(x)}{Q(x)}$$
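The discrete sum can be evaluated directly. The following is a minimal sketch, assuming P and Q are given as lists of probabilities over the same finite support; the function name `kl_divergence` and the example distributions are illustrative and not part of the original text. It uses the natural logarithm, so the result is in nats.

```python
import math

def kl_divergence(p, q):
    """Discrete KL(P||Q) in nats for two probability lists on the same support."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # usual convention: 0 * log(0 / q) = 0
        if q_x == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)
    return total

# Hypothetical example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # roughly 0.51
print(kl_divergence(q, p))  # roughly 0.37
```

The two printed values differ, which illustrates that KL divergence is not symmetric in P and Q.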
Continuous case:
$$KL(P||Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, \mathrm{d}x$$
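For continuous densities the integral can be approximated numerically. The sketch below assumes both P and Q are univariate Gaussians (a choice made here purely for illustration) and compares a midpoint-rule approximation with the well-known closed form for two Gaussians. The helper names, the integration range, and the step count are all assumptions.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def kl_numeric(mu_p, sigma_p, mu_q, sigma_q, lo=-20.0, hi=20.0, n=20_000):
    """Midpoint-rule approximation of the integral of P(x) log(P(x)/Q(x)) dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        log_p = normal_logpdf(x, mu_p, sigma_p)
        log_q = normal_logpdf(x, mu_q, sigma_q)
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total

def kl_gaussian_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P||Q) for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Hypothetical example: P = N(0, 1), Q = N(1, 4).
print(kl_numeric(0.0, 1.0, 1.0, 2.0))
print(kl_gaussian_closed_form(0.0, 1.0, 1.0, 2.0))
```

Working with log-densities avoids dividing by a density that has underflowed to zero in the tails.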
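2. KL divergence is always non-negative
The proof uses Jensen's inequality. Write the KL divergence as an expectation taken over x drawn from P: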
$$KL(P||Q) = \sum P(x) \log \frac{P(x)}{Q(x)} = \mathbb{E} \left( \log \frac{P(x)}{Q(x)} \right) = \mathbb{E} \left( - \log \frac{Q(x)}{P(x)} \right)$$
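The logarithm is concave, so $-\log$ is convex. Jensen's inequality for a convex function $f$ states that $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$; applying it with $f = -\log$ and $X = Q(x)/P(x)$ gives: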
$$\mathbb{E} \left[ - \log \frac{Q(x)}{P(x)} \right] \geq -\log \left[ \sum P(x) \frac{Q(x)}{P(x)} \right] = - \log \left[ \sum Q(x) \right] = 0$$
Therefore, $KL(P||Q) \geq 0$.
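The inequality can also be checked empirically. The sketch below assumes randomly generated discrete distributions over five outcomes (the helper names, the seed, and the sample count are illustrative) and confirms that no sampled pair yields a negative value, while KL(P||P) is exactly zero. This is a numerical sanity check, not a substitute for the proof.

```python
import math
import random

def kl_divergence(p, q):
    """Discrete KL(P||Q) in nats; assumes q_x > 0 wherever p_x > 0."""
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0.0)

def random_distribution(k, rng):
    """Draw k strictly positive weights and normalize them to sum to 1."""
    weights = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the run is reproducible
values = []
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    values.append(kl_divergence(p, q))

print(min(values) >= 0.0)   # no sampled pair gives a negative divergence
print(kl_divergence(p, p))  # a distribution compared with itself gives 0.0
```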