Proof of the Convexity of Relative Entropy
The relative entropy of two distributions $p$ and $q$ on an alphabet $X$ is defined as:
$$D(p\|q) = \sum_{x\in X} p(x)\log_2\frac{p(x)}{q(x)}$$
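As a minimal sketch of the definition above, the following Python function computes $D(p\|q)$ in bits for two discrete distributions given as equal-length lists. The function name `relative_entropy` is hypothetical, and it assumes the standard conventions $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=+\infty$ for $p>0$, which the definition above leaves implicit.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in bits for two discrete distributions of equal length.

    Conventions (assumed): 0*log(0/q) = 0, and p*log(p/0) = +inf when p > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue              # 0 * log(0/q) contributes nothing
        if qx == 0:
            return math.inf       # p(x) > 0 but q(x) = 0: divergence is infinite
        total += px * math.log2(px / qx)
    return total

print(relative_entropy([0.5, 0.5], [0.25, 0.75]))  # approximately 0.2075, i.e. 1 - 0.5*log2(3)
```

Using `math.log2` keeps the result in bits, matching the base-2 logarithm in the formula.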
To prove that relative entropy is convex (jointly in the pair $(p,q)$), we must show that for every $\lambda\in[0,1]$ and any two pairs of distributions $(p_1,q_1)$ and $(p_2,q_2)$ on $X$, the following inequality holds:

$$D\big(\lambda p_1+(1-\lambda)p_2 \,\big\|\, \lambda q_1+(1-\lambda)q_2\big) \le \lambda D(p_1\|q_1)+(1-\lambda)D(p_2\|q_2)$$
We will apply the log-sum inequality to the left-hand side. First, expanding it with the definition gives

$$D\big(\lambda p_1+(1-\lambda)p_2 \,\big\|\, \lambda q_1+(1-\lambda)q_2\big) = \sum_{x\in X}\big(\lambda p_1(x)+(1-\lambda)p_2(x)\big)\log_2\frac{\lambda p_1(x)+(1-\lambda)p_2(x)}{\lambda q_1(x)+(1-\lambda)q_2(x)}$$
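For each fixed $x$, regard $\lambda p_1(x)+(1-\lambda)p_2(x)$ as the sum of the two nonnegative numbers $a_1=\lambda p_1(x)$ and $a_2=(1-\lambda)p_2(x)$, and the denominator $\lambda q_1(x)+(1-\lambda)q_2(x)$ as the sum of $b_1=\lambda q_1(x)$ and $b_2=(1-\lambda)q_2(x)$. The log-sum inequality states that for nonnegative numbers $a_1,\dots,a_n$ and $b_1,\dots,b_n$,

$$\sum_{i=1}^{n} a_i\log_2\frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big)\log_2\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}.$$

Applying it with $n=2$ gives, for every $x\in X$,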
$$\big(\lambda p_1(x)+(1-\lambda)p_2(x)\big)\log_2\frac{\lambda p_1(x)+(1-\lambda)p_2(x)}{\lambda q_1(x)+(1-\lambda)q_2(x)} \le \lambda p_1(x)\log_2\frac{\lambda p_1(x)}{\lambda q_1(x)}+(1-\lambda)p_2(x)\log_2\frac{(1-\lambda)p_2(x)}{(1-\lambda)q_2(x)}$$
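The per-$x$ inequality above can be spot-checked numerically. The sketch below draws random positive values for $p_1(x), p_2(x), q_1(x), q_2(x)$ and $\lambda\in(0,1)$, evaluates both sides, and counts violations beyond a small floating-point tolerance. The helper name `pointwise_terms` and the sampling ranges are illustrative assumptions; the values need not sum to 1 here because the inequality concerns a single term.

```python
import math
import random

def pointwise_terms(p1x, p2x, q1x, q2x, lam):
    """Left and right sides of the per-x log-sum bound (positive inputs, 0 < lam < 1)."""
    mp = lam * p1x + (1 - lam) * p2x        # a1 + a2
    mq = lam * q1x + (1 - lam) * q2x        # b1 + b2
    left = mp * math.log2(mp / mq)
    # lam and (1 - lam) cancel inside each logarithm: (lam*p1)/(lam*q1) = p1/q1
    right = lam * p1x * math.log2(p1x / q1x) + (1 - lam) * p2x * math.log2(p2x / q2x)
    return left, right

random.seed(1)
violations = 0
for _ in range(10_000):
    vals = [random.uniform(0.01, 1.0) for _ in range(4)]
    lam = random.uniform(0.01, 0.99)
    left, right = pointwise_terms(*vals, lam)
    if left > right + 1e-12:                # allow for rounding error
        violations += 1
print(violations)  # should print 0, since the bound is a theorem
```

A random search like this cannot prove the inequality; it only guards against sign or parenthesization mistakes when transcribing it.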
Summing both sides over $x\in X$ gives

$$\sum_{x\in X}\big(\lambda p_1(x)+(1-\lambda)p_2(x)\big)\log_2\frac{\lambda p_1(x)+(1-\lambda)p_2(x)}{\lambda q_1(x)+(1-\lambda)q_2(x)} \le \sum_{x\in X}\lambda p_1(x)\log_2\frac{\lambda p_1(x)}{\lambda q_1(x)}+\sum_{x\in X}(1-\lambda)p_2(x)\log_2\frac{(1-\lambda)p_2(x)}{(1-\lambda)q_2(x)}$$
The left-hand side is the divergence of the mixtures by definition. On the right, the factors $\lambda$ and $1-\lambda$ cancel inside the logarithms (for $0<\lambda<1$; the endpoints $\lambda=0$ and $\lambda=1$ hold trivially with equality), so the two sums are exactly $\lambda D(p_1\|q_1)$ and $(1-\lambda)D(p_2\|q_2)$. That is,

$$D\big(\lambda p_1+(1-\lambda)p_2 \,\big\|\, \lambda q_1+(1-\lambda)q_2\big) \le \lambda D(p_1\|q_1)+(1-\lambda)D(p_2\|q_2),$$

which is the required inequality, so relative entropy is convex in the pair $(p,q)$.
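As a sanity check on the final inequality (not a substitute for the proof), the sketch below samples random strictly positive distributions $p_1,p_2,q_1,q_2$ and a random $\lambda\in[0,1)$, then records the largest value of left side minus right side. The helpers `relative_entropy`, `random_distribution`, and `mix` are hypothetical illustrations; this `relative_entropy` assumes strictly positive entries, which holds for the sampled distributions.

```python
import math
import random

def relative_entropy(p, q):
    """D(p||q) in bits; assumes all entries are strictly positive."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q))

def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.uniform(0.01, 1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

def mix(a, b, lam):
    """Pointwise mixture lam*a + (1 - lam)*b."""
    return [lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)]

rng = random.Random(2)
max_excess = -math.inf
for _ in range(5_000):
    n = rng.randint(2, 6)
    p1, p2, q1, q2 = (random_distribution(n, rng) for _ in range(4))
    lam = rng.random()
    lhs = relative_entropy(mix(p1, p2, lam), mix(q1, q2, lam))
    rhs = lam * relative_entropy(p1, q1) + (1 - lam) * relative_entropy(p2, q2)
    max_excess = max(max_excess, lhs - rhs)
print(max_excess)  # should not exceed 0 beyond floating-point rounding
```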