KL Distance
- Full name: Kullback-Leibler divergence
- Also known as: relative entropy
- Mathematical essence: a measure of the relative difference between two probability distributions over the same event space
- Definition:
$$D(p\,||\,q)=\sum_{x \in X} p(x)\log\frac{p(x)}{q(x)}$$
where $p(x)$ and $q(x)$ are two probability distributions.
The definition adopts the conventions $0\log(0/q)=0$ and $p\log(p/0)=\infty$.
- Equivalent form:
$$D(p\,||\,q)=E_p\left[\log\frac{p(X)}{q(X)}\right]$$
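The expectation form suggests a Monte Carlo estimator: draw samples $X \sim p$ and average $\log\frac{p(X)}{q(X)}$. A minimal Python sketch; the distributions `p` and `q` below are made-up examples:

```python
import math
import random

random.seed(0)

# Two made-up distributions over three outcomes
outcomes = [0, 1, 2]
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

# Exact value from the sum form of the definition
exact = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Monte Carlo estimate of E_p[log(p(X)/q(X))]: draw X ~ p and average
n = 200_000
samples = random.choices(outcomes, weights=p, k=n)
estimate = sum(math.log(p[x] / q[x]) for x in samples) / n

print(exact)     # 0.25 * ln 2 ≈ 0.1733
print(estimate)  # close to the exact value
```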
- Notes:
  - The larger the difference between the two distributions, the larger the KL distance;
  - When the two distributions are identical, the KL distance is 0.
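A small sketch of the definition and its two conventions in Python (the function name `kl_divergence` and the example distributions are illustrative):

```python
import math

def kl_divergence(p, q):
    """Discrete KL distance D(p||q), with the conventions
    0*log(0/q) = 0 and p*log(p/0) = infinity."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # convention: 0*log(0/q) = 0
        if qi == 0:
            return math.inf   # convention: p*log(p/0) = infinity
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]
print(kl_divergence(p, p))                # identical distributions -> 0.0
print(kl_divergence(p, q))                # positive when p != q
print(kl_divergence(p, [0.5, 0.5, 0.0])) # hits p*log(p/0) -> inf
```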
- Corollaries:
  - Mutual information measures how far a joint distribution is from independence:
$$\begin{aligned} I(X;Y) &= H(X)-H(X|Y) \\ &= -\sum_{x \in X}p(x)\log p(x)+\sum_{x \in X}\sum_{y \in Y}p(x,y)\log p(x|y) \\ &= \sum_{x \in X}\sum_{y \in Y}p(x,y)\log\frac{p(x|y)}{p(x)} \\ &= \sum_{x \in X}\sum_{y \in Y}p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \\ &= D[\,p(x,y)\,||\,p(x)p(y)\,] \end{aligned}$$
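The first and last lines of this derivation can be checked numerically: compute $I(X;Y)$ both as $H(X)-H(X|Y)$ and as $D[\,p(x,y)\,||\,p(x)p(y)\,]$. A sketch with a made-up $2\times 2$ joint distribution:

```python
import math

# A made-up 2x2 joint distribution p(x,y); marginals computed from it
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: sum(v for (a, y), v in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (x, b), v in joint.items() if b == y) for y in (0, 1)}

# First line of the derivation: I(X;Y) = H(X) - H(X|Y),
# using p(x|y) = p(x,y)/p(y)
H_X = -sum(p * math.log(p) for p in px.values())
H_X_given_Y = -sum(v * math.log(v / py[y]) for (x, y), v in joint.items())
mi_entropy = H_X - H_X_given_Y

# Last line: I(X;Y) = D[p(x,y) || p(x)p(y)]
mi_kl = sum(v * math.log(v / (px[x] * py[y])) for (x, y), v in joint.items())

print(abs(mi_entropy - mi_kl) < 1e-12)  # the two routes agree
```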
  - Conditional relative entropy:
$$D[\,p(y|x)\,||\,q(y|x)\,]=\sum_{x}p(x)\sum_{y}p(y|x)\log\frac{p(y|x)}{q(y|x)}$$
  - Chain rule for relative entropy:
$$D[\,p(x,y)\,||\,q(x,y)\,]=D[\,p(x)\,||\,q(x)\,]+D[\,p(y|x)\,||\,q(y|x)\,]$$
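The chain rule can likewise be verified numerically; the joint distributions `p` and `q` below are made-up examples:

```python
import math

def kl(pairs):
    """KL distance from (p_i, q_i) pairs, skipping p_i = 0 terms."""
    return sum(p * math.log(p / q) for p, q in pairs if p > 0)

# Made-up joint distributions p(x,y) and q(x,y) over {0,1} x {0,1}
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
px = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}  # marginal p(x)
qx = {x: q[(x, 0)] + q[(x, 1)] for x in (0, 1)}  # marginal q(x)

# Left side: D[p(x,y) || q(x,y)]
lhs = kl((p[k], q[k]) for k in p)

# Right side: D[p(x)||q(x)] + D[p(y|x)||q(y|x)],
# with the conditional term weighted by p(x) as in the definition above
d_marg = kl((px[x], qx[x]) for x in px)
d_cond = sum(px[x] * kl((p[(x, y)] / px[x], q[(x, y)] / qx[x])
                        for y in (0, 1))
             for x in (0, 1))

print(abs(lhs - (d_marg + d_cond)) < 1e-12)  # chain rule holds
```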