Convexity of I(X;Y) for a fixed p(x)
With $p(x)$ fixed, consider two joint distributions $p_1(x,y)$ and $p_2(x,y)$ that satisfy

$$p_1(x,y) = p(x)\,p_1(y \mid x), \qquad p_2(x,y) = p(x)\,p_2(y \mid x).$$
The marginal distributions of these two joint distributions are then $p(x),\ p_1(y)$ and $p(x),\ p_2(y)$, respectively.
Now consider the conditional distribution

$$p_{\lambda}(y \mid x) = \lambda\, p_1(y \mid x) + (1-\lambda)\, p_2(y \mid x), \qquad 0 \le \lambda \le 1.$$
The corresponding joint distribution is

$$p_{\lambda}(x,y) = p(x)\,p_{\lambda}(y \mid x) = \lambda\, p_1(x,y) + (1-\lambda)\, p_2(x,y), \qquad 0 \le \lambda \le 1.$$
Likewise, the marginal distribution of $Y$ is

$$p_{\lambda}(y) = \lambda\, p_1(y) + (1-\lambda)\, p_2(y), \qquad 0 \le \lambda \le 1.$$
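The mixing relations above can be checked numerically. The following is a minimal sketch in pure Python, assuming small finite alphabets and randomly generated distributions; the helper names (`random_dist`, `joint`, `marginal_y`, `mix`) are hypothetical and chosen only for this illustration.

```python
import random

def random_dist(n, rng):
    """Return a random probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

def joint(px, channel):
    """p(x, y) = p(x) p(y|x), where channel[x][y] holds p(y|x)."""
    return [[px[x] * channel[x][y] for y in range(len(channel[x]))]
            for x in range(len(px))]

def marginal_y(pxy):
    """p(y) = sum over x of p(x, y)."""
    return [sum(row[y] for row in pxy) for y in range(len(pxy[0]))]

def mix(a, b, lam):
    """Entrywise lam * a + (1 - lam) * b for two matrices."""
    return [[lam * u + (1 - lam) * v for u, v in zip(ra, rb)]
            for ra, rb in zip(a, b)]

rng = random.Random(0)
nx, ny, lam = 3, 4, 0.3          # assumed alphabet sizes and mixing weight
px = random_dist(nx, rng)        # the fixed input distribution p(x)
ch1 = [random_dist(ny, rng) for _ in range(nx)]   # p_1(y|x)
ch2 = [random_dist(ny, rng) for _ in range(nx)]   # p_2(y|x)
ch_lam = mix(ch1, ch2, lam)      # p_lambda(y|x)

# Joint distribution: built from p_lambda(y|x) vs. mixture of the two joints.
joint_lam = joint(px, ch_lam)
joint_mix = mix(joint(px, ch1), joint(px, ch2), lam)
joint_gap = max(abs(u - v)
                for ra, rb in zip(joint_lam, joint_mix)
                for u, v in zip(ra, rb))

# Marginal of Y: computed from the mixed joint vs. mixture of the two marginals.
py_lam = marginal_y(joint_lam)
py1 = marginal_y(joint(px, ch1))
py2 = marginal_y(joint(px, ch2))
py_mix = [lam * u + (1 - lam) * v for u, v in zip(py1, py2)]
marg_gap = max(abs(u - v) for u, v in zip(py_lam, py_mix))

print("max joint difference:", joint_gap)
print("max marginal difference:", marg_gap)
```

Both differences should be zero up to floating-point rounding, because with $p(x)$ fixed the joint and the marginal of $Y$ are linear in $p(y \mid x)$.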
If we define $q_{\lambda}$ as the product of the marginals of $p_{\lambda}(x,y)$,

$$q_{\lambda}(x,y) = p(x)\,p_{\lambda}(y),$$

then, writing $q_i(x,y) = p(x)\,p_i(y)$ for $i = 1, 2$, we have

$$q_{\lambda}(x,y) = \lambda\, q_1(x,y) + (1-\lambda)\, q_2(x,y), \qquad 0 \le \lambda \le 1.$$

The mutual information under $p_{\lambda}$ is

$$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{\lambda}(x,y) \log_2 \frac{p_{\lambda}(y \mid x)}{p_{\lambda}(y)},$$

while the relative entropy between $p_{\lambda}$ and $q_{\lambda}$ is

$$D\big(p_{\lambda}(x,y)\,\|\,q_{\lambda}(x,y)\big) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{\lambda}(x,y) \log_2 \frac{p_{\lambda}(x,y)}{q_{\lambda}(x,y)} = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{\lambda}(x,y) \log_2 \frac{p_{\lambda}(y \mid x)}{p_{\lambda}(y)} = I(X;Y).$$
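Relative entropy $D(p\,\|\,q)$ is convex in the pair $(p, q)$, so $D(p_{\lambda}\,\|\,q_{\lambda}) \le \lambda\, D(p_1\,\|\,q_1) + (1-\lambda)\, D(p_2\,\|\,q_2)$. Hence, for a fixed $p(x)$, $I(X;Y)$ is a convex function of the conditional distribution $p(y \mid x)$.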
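The convexity claim can be spot-checked numerically. The sketch below assumes finite alphabets and random test channels; `mutual_information` and `convexity_gap` are hypothetical helper names introduced here, not part of the original text. It is a sanity check on sampled cases, not a proof.

```python
import math
import random

def random_dist(n, rng):
    """Return a random probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

def mutual_information(px, channel):
    """I(X;Y) in bits, with channel[x][y] = p(y|x) and input distribution px."""
    ny = len(channel[0])
    # Marginal of Y: p(y) = sum_x p(x) p(y|x)
    py = [sum(px[x] * channel[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x, p_x in enumerate(px):
        for y in range(ny):
            p_xy = p_x * channel[x][y]
            if p_xy > 0:  # terms with p(x, y) = 0 contribute nothing
                total += p_xy * math.log2(channel[x][y] / py[y])
    return total

def convexity_gap(px, ch1, ch2, lam):
    """lam*I_1 + (1-lam)*I_2 - I_lam; nonnegative when I is convex in p(y|x)."""
    ch_lam = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
              for r1, r2 in zip(ch1, ch2)]
    return (lam * mutual_information(px, ch1)
            + (1 - lam) * mutual_information(px, ch2)
            - mutual_information(px, ch_lam))

rng = random.Random(0)
nx, ny = 3, 4                    # assumed alphabet sizes
px = random_dist(nx, rng)        # the fixed input distribution p(x)

gaps = []
for _ in range(200):
    ch1 = [random_dist(ny, rng) for _ in range(nx)]
    ch2 = [random_dist(ny, rng) for _ in range(nx)]
    gaps.append(convexity_gap(px, ch1, ch2, rng.random()))

min_gap = min(gaps)
print("smallest convexity gap over 200 trials:", min_gap)
```

A nonnegative smallest gap (up to rounding) is what the inequality $I_{\lambda} \le \lambda I_1 + (1-\lambda) I_2$ predicts when $p(x)$ is held fixed and only $p(y \mid x)$ is mixed.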