KL Divergence Formula
\begin{aligned}
D(p \| q) &= \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \\
&= E\left[\log \frac{p(X)}{q(X)}\right]
\end{aligned}
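In code, the discrete KL divergence can be computed directly from this sum. A minimal NumPy sketch (the kl_divergence helper and the two example distributions are only illustrations, not a library API):

import numpy as np

def kl_divergence(p, q):
    # D(p || q) = sum_x p(x) * log(p(x) / q(x))
    # assumes p and q are probability vectors over the same support,
    # with q(x) > 0 wherever p(x) > 0
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.5, 0.25])
print(kl_divergence(p, q), kl_divergence(q, p))  # KL is not symmetric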
Mutual Information Formula
\begin{aligned}
I(X, Y) &= H(X) - H(X \mid Y) \\
&= \sum_{x, y} p(x, y) \log \frac{p(y \mid x)}{p(y)}
\end{aligned}
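As a numerical illustration of the summation form, the sketch below computes I(X, Y) for a small hypothetical 2x2 joint distribution p(x, y) (the table values are made up for the example):

import numpy as np

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])          # hypothetical joint distribution p(x, y)
p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)
p_y_given_x = p_xy / p_x               # conditional p(y | x)

# I(X, Y) = sum_{x,y} p(x, y) * log( p(y | x) / p(y) )
mi = np.sum(p_xy * np.log(p_y_given_x / p_y))
print(mi)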
KL Divergence and Mutual Information
\begin{aligned}
I(X, Y) &= H(X) - H(X \mid Y) \\
&= H(Y) - H(Y \mid X) \\
&= D(p(x, y) \| p(x) p(y)) \\
&= E\left[\log \frac{p(x, y)}{p(x) p(y)}\right]
\end{aligned}
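Reusing the same toy joint distribution, one can check numerically that the mutual information is exactly the KL divergence between the joint p(x, y) and the product of marginals p(x)p(y):

import numpy as np

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

# E[log p(x, y) / (p(x) p(y))]
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

# D(p(x, y) || p(x) p(y)), treating the joint and the product as flat vectors
kl = np.sum(p_xy.ravel() * np.log(p_xy.ravel() / (p_x * p_y).ravel()))
print(mi, kl)  # the two values agree up to floating-point error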
Jensen-Shannon Divergence Formula
\begin{aligned}
\operatorname{JSD}(P \| Q) &= \frac{1}{2} D(P \| M) + \frac{1}{2} D(Q \| M) \\
\text{where } M &= \frac{1}{2}(P + Q)
\end{aligned}
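The JSD can be built directly on top of a KL routine. A minimal sketch, assuming both inputs are discrete probability vectors over the same support:

import numpy as np

def kl(p, q):
    # discrete KL divergence; assumes q(x) > 0 wherever p(x) > 0
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    # JSD(P || Q) = 1/2 D(P || M) + 1/2 D(Q || M), with M = (P + Q) / 2
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.5, 0.25])
print(jsd(p, q), jsd(q, p))  # symmetric, and bounded by log 2 in nats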
TensorFlow KL Loss Function
import numpy as np
import tensorflow as tf

# random "true" targets (0/1) and predictions
y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))

# Keras returns one KL value per sample (sum over the last axis)
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)

# Keras clips both inputs to [1e-7, 1] before applying the formula,
# so the manual computation below reproduces the loss exactly
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(
    loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
print(y_true * np.log(y_true / y_pred))
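The same loss is also available as a class, which averages the per-sample KL values and can be passed to model.compile; a short sketch reusing y_true and y_pred from above:

# class form, usable as model.compile(loss=tf.keras.losses.KLDivergence())
kl_loss = tf.keras.losses.KLDivergence()
print(kl_loss(y_true, y_pred).numpy())  # scalar: mean of the per-sample KL values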