Pinsker’s inequality
$$D\left(P_1 \| P_2\right) \geq \frac{1}{2 \ln 2}\left\|P_1-P_2\right\|_1^2$$
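As a quick sanity check of the statement, here is a small worked example. The two distributions are chosen purely for illustration and are not from the original text; logarithms are taken base 2, which matches the $\frac{1}{2\ln 2}$ constant. Take $P_1=(\tfrac12,\tfrac12)$ and $P_2=(\tfrac14,\tfrac34)$ on a two-point alphabet:

$$\begin{aligned} D\left(P_1 \| P_2\right) &= \tfrac12 \log \frac{1/2}{1/4}+\tfrac12 \log \frac{1/2}{3/4} = \tfrac12 + \tfrac12 \log \tfrac23 \approx 0.2075, \\ \left\|P_1-P_2\right\|_1 &= \left|\tfrac12-\tfrac14\right|+\left|\tfrac12-\tfrac34\right| = \tfrac12, \\ \frac{1}{2 \ln 2}\left\|P_1-P_2\right\|_1^2 &= \frac{1}{2 \ln 2}\cdot\tfrac14 \approx 0.1803 . \end{aligned}$$

So in this instance the divergence (about $0.2075$ bits) does exceed the bound (about $0.1803$), as the inequality requires.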
Proof
Binary case
We first prove the binary case. Consider two binary distributions with parameters $p$ and $q$, where $p \geq q$. We need to show that
$$p \log \frac{p}{q}+(1-p) \log \frac{1-p}{1-q} \geq \frac{4}{2 \ln 2}(p-q)^2$$
Here $\log$ is the base-2 logarithm. Let $g(p,q)$ denote the difference between the two sides of the inequality:
$$g(p,q) = p \log \frac{p}{q}+(1-p) \log \frac{1-p}{1-q} - \frac{4}{2 \ln 2}(p-q)^2$$
Differentiating with respect to $q$ gives
$$\begin{aligned} \frac{d g(p, q)}{d q} & =-\frac{p}{q \ln 2}+\frac{1-p}{(1-q) \ln 2}-\frac{4}{2 \ln 2} \cdot 2(q-p) \\ & =\frac{q-p}{q(1-q) \ln 2}-\frac{4}{\ln 2}(q-p) \\ & \leq 0 \end{aligned}$$
The last step holds because $q \leq p$ gives $q-p \leq 0$, while $q(1-q) \leq 1/4$ gives $\frac{1}{q(1-q)} \geq 4$, so that $\frac{q-p}{\ln 2}\left(\frac{1}{q(1-q)}-4\right) \leq 0$.
When $q=p$ we have $g(p,q) = 0$. Since $g$ is non-increasing in $q$, it follows that $g(p,q) \geq 0$ whenever $q \leq p$, which proves the inequality.
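The binary inequality can also be checked numerically. The following is a minimal sketch rather than part of the proof: it evaluates $g(p,q)$ on a grid with $q \leq p$ and reports the smallest value found. The helper names `binary_kl_bits` and `g`, and the grid resolution, are assumptions made for illustration.

```python
import math


def binary_kl_bits(p, q):
    """D(Bern(p) || Bern(q)) in bits, using the convention 0 * log 0 = 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log2(a / b)
    return total


def g(p, q):
    """Difference between the two sides of the binary inequality."""
    return binary_kl_bits(p, q) - 4.0 / (2.0 * math.log(2.0)) * (p - q) ** 2


# Hypothetical grid over the open interval (0, 1); n is an arbitrary choice.
n = 200
grid = [i / n for i in range(1, n)]

# Smallest value of g over all grid pairs with q <= p.
min_gap = min(g(p, q) for p in grid for q in grid if q <= p)
print("smallest g(p, q) on the grid:", min_gap)
```

A grid check only samples finitely many points, so it illustrates the claim rather than establishing it; the derivative argument above is what covers every $q \leq p$.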
General case
For any two distributions $P_1, P_2$, let
$$A=\left\{x: P_1(x)>P_2(x)\right\}$$
Define a new binary random variable $Y=\phi(X)$, the indicator of the set $A$, and let $\hat P_1, \hat P_2$ be the distributions of $Y$ when $X$ is distributed according to $P_1$ and $P_2$ respectively. These are quantized versions of $P_1, P_2$.
Applying the data-processing inequality to relative entropy gives:
$$\begin{aligned} D\left(P_1 \| P_2\right) & \geq D\left(\hat{P}_1 \| \hat{P}_2\right) \\ & \geq \frac{4}{2 \ln 2}\left(P_1(A)-P_2(A)\right)^2 \\ & =\frac{1}{2 \ln 2}\left\|P_1-P_2\right\|_1^2, \end{aligned}$$
where the second inequality is the binary case with $p=P_1(A)$ and $q=P_2(A)$ (note that $p \geq q$ by the definition of $A$), and the final equality uses the identity $\left\|P_1-P_2\right\|_1=2\left(P_1(A)-P_2(A)\right)$ derived in the section on variational distance.
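The chain of inequalities above can be exercised on randomly drawn distributions. This is a sketch under stated assumptions, not a proof: the helper names (`kl_bits`, `quantize`, `check_chain`, `random_distribution`), the alphabet size 6, the seed, and the tolerance are all illustrative choices that do not come from the original text.

```python
import math
import random


def kl_bits(p1, p2):
    """Relative entropy D(p1 || p2) in bits, with 0 * log 0 = 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p1, p2) if a > 0.0)


def l1_distance(p1, p2):
    """Variational (L1) distance between two distributions."""
    return sum(abs(a - b) for a, b in zip(p1, p2))


def quantize(p1, p2):
    """Distributions of the indicator of A = {x : P1(x) > P2(x)}."""
    in_a = [a > b for a, b in zip(p1, p2)]
    p = sum(a for a, flag in zip(p1, in_a) if flag)
    q = sum(b for b, flag in zip(p2, in_a) if flag)
    return [p, 1.0 - p], [q, 1.0 - q]


def random_distribution(rng, size):
    """Strictly positive random distribution (avoids division by zero)."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def check_chain(p1, p2, tol=1e-12):
    """True if D(P1||P2) >= D(quantized) >= ||P1 - P2||_1^2 / (2 ln 2)."""
    d_full = kl_bits(p1, p2)
    h1, h2 = quantize(p1, p2)
    d_quant = kl_bits(h1, h2)
    bound = l1_distance(p1, p2) ** 2 / (2.0 * math.log(2.0))
    return d_full + tol >= d_quant and d_quant + tol >= bound


rng = random.Random(0)  # fixed seed so the run is reproducible
all_hold = all(
    check_chain(random_distribution(rng, 6), random_distribution(rng, 6))
    for _ in range(1000)
)
print("chain held on every sampled pair:", all_hold)
```

The weights are kept strictly positive so that every ratio inside the logarithm is well defined; distributions with zero entries in $P_2$ would need the usual conventions for infinite divergence.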
Variational distance
The variational distance between any two distributions is defined as:
$$\left\|P_1-P_2\right\|_1=\sum_{a \in \mathcal{X}}\left|P_1(a)-P_2(a)\right|$$
Let
$$A=\left\{x: P_1(x)>P_2(x)\right\}$$
Then
$$\begin{aligned} \left\|P_1-P_2\right\|_1 & =\sum_{x \in \mathcal{X}}\left|P_1(x)-P_2(x)\right| \\ & =\sum_{x \in A}\left(P_1(x)-P_2(x)\right)+\sum_{x \in A^c}\left(P_2(x)-P_1(x)\right) \\ & =P_1(A)-P_2(A)+P_2\left(A^c\right)-P_1\left(A^c\right) \\ & =P_1(A)-P_2(A)+1-P_2(A)-1+P_1(A) \\ & =2\left(P_1(A)-P_2(A)\right) . \end{aligned}$$
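The identity just derived is easy to confirm on a concrete pair of distributions. The sketch below uses an illustrative three-point example that is not taken from the original text; the function names `l1_distance` and `mass_gap_on_a` are likewise hypothetical.

```python
def l1_distance(p1, p2):
    """Variational distance: sum over x of |P1(x) - P2(x)|."""
    return sum(abs(a - b) for a, b in zip(p1, p2))


def mass_gap_on_a(p1, p2):
    """P1(A) - P2(A), where A = {x : P1(x) > P2(x)}."""
    return sum(a - b for a, b in zip(p1, p2) if a > b)


# Illustrative distributions on a three-point alphabet (assumed values).
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]

lhs = l1_distance(p1, p2)          # |0.3| + |0| + |-0.3|
rhs = 2.0 * mass_gap_on_a(p1, p2)  # A contains only the first symbol
print(lhs, rhs)
```

Here $A$ contains only the first symbol, so $P_1(A)-P_2(A)=0.3$ and both sides come out to approximately $0.6$, up to floating-point rounding.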