References:
- T. Cover and J. Thomas, Elements of Information Theory, 2nd Edition
- Slides of EE4560, TUD
Introduction
- We know how to encode a source $X$. A rate $R \ge H(X)$ is sufficient.
- If there are two sources $(X, Y)$, a rate $R \ge H(X, Y)$ is sufficient.
- But what if the $X$ and $Y$ sources must be described separately for some user who wishes to reconstruct both $X$ and $Y$?
- Clearly, separately encoding $X$ and $Y$ at a rate $R = R_x + R_y \ge H(X) + H(Y)$ is sufficient.
- However, in a surprising and fundamental paper, Slepian and Wolf showed that a total rate $R = H(X, Y)$ is sufficient even for separate encoding of correlated sources.
- Intuitively, since $H(X, Y) = H(X) + H(Y \mid X)$, we can first encode source $X$ at a rate $R_1 \ge H(X)$, after which we encode source $Y$, given $X$, at a rate $R_2 \ge H(Y \mid X)$.
More specifically,
- Using $nH(X)$ bits, we can encode $X^n$ efficiently, so that the decoder can reconstruct $X^n$ with arbitrarily low probability of error.
- Associated with every $x^n$ is a typical “fan” of $y^n$ sequences that are jointly typical with the given $x^n$, $2^{nH(Y \mid X)}$ in total.
- The encoder can send the index of $y^n$ within this typical fan, for which it needs $nH(Y \mid X)$ bits.
- The decoder, also knowing $x^n$, can then construct the typical fan and hence reconstruct $y^n$.
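As a quick numerical check of this rate accounting, here is a minimal sketch that computes the quantities above for a made-up $2 \times 2$ joint pmf (the numbers are purely illustrative, not taken from the references):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # 0 * log 0 = 0 by convention
    return float(-np.sum(p * np.log2(p)))

# Hypothetical correlated pair X, Y over {0, 1}: rows index x, columns index y.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])

H_X = entropy(p_xy.sum(axis=1))       # marginal entropy H(X)
H_Y = entropy(p_xy.sum(axis=0))       # marginal entropy H(Y)
H_XY = entropy(p_xy)                  # joint entropy H(X, Y)
H_Y_given_X = H_XY - H_X              # chain rule: H(Y|X) = H(X,Y) - H(X)

print(f"H(X) + H(Y)   = {H_X + H_Y:.3f} bits/symbol (naive separate encoding)")
print(f"H(X,Y)        = {H_XY:.3f} bits/symbol (Slepian-Wolf sum rate)")
print(f"H(X) + H(Y|X) = {H_X + H_Y_given_X:.3f} bits/symbol (equal to H(X,Y))")
```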
[Figure: the two-stage encoding process described above]
- But what if the $Y$ encoder does not know which sequence $x^n$ is encoded?
Slepian-Wolf Coding
Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be a sequence of jointly distributed random variables drawn i.i.d. $\sim p(x, y)$.
Definition 1 (Distributed source code):
A $\left(\left(2^{nR_1}, 2^{nR_2}\right), n\right)$ distributed source code for the joint sources $(X, Y)$ consists of two encoder maps
$$
\begin{aligned}
f_1 &: \mathcal{X}^n \rightarrow \left\{1, 2, \ldots, 2^{nR_1}\right\} \\
f_2 &: \mathcal{Y}^n \rightarrow \left\{1, 2, \ldots, 2^{nR_2}\right\}
\end{aligned}
$$
and a decoder map
$$
g: \left\{1, 2, \ldots, 2^{nR_1}\right\} \times \left\{1, 2, \ldots, 2^{nR_2}\right\} \rightarrow \mathcal{X}^n \times \mathcal{Y}^n
$$
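In code, a distributed source code in the sense of Definition 1 is just a pair of index-valued functions together with a decoder. A minimal sketch of the types (all names are illustrative, not from the references):

```python
import math
from typing import Callable, Tuple

Seq = Tuple[int, ...]                            # an element of X^n or Y^n
Encoder = Callable[[Seq], int]                   # f1: X^n -> {1, ..., 2^{n R1}}; f2 likewise
Decoder = Callable[[int, int], Tuple[Seq, Seq]]  # g: (i, j) -> (x^n, y^n)

def rate(num_indices: int, n: int) -> float:
    """The rate R implied by an index set of size 2^{n R}."""
    return math.log2(num_indices) / n
```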
Definition 2 (Probability of error):
The probability of error for a distributed source code is defined as
$$
P_e^{(n)} = \Pr\left(g\left(f_1(X^n), f_2(Y^n)\right) \neq \left(X^n, Y^n\right)\right)
$$
Definition 3 (Achievable):
A rate pair $(R_1, R_2)$ is said to be achievable for a distributed source if there exists a sequence of $\left(\left(2^{nR_1}, 2^{nR_2}\right), n\right)$ distributed source codes with probability of error $P_e^{(n)} \rightarrow 0$. The achievable rate region is the closure of the set of achievable rate pairs.
Theorem 1 (Slepian-Wolf):
For a distributed source coding problem for the source $(X, Y)$ drawn i.i.d. $\sim p(x, y)$, the achievable rate region is given by
$$
\begin{aligned}
R_1 &\geq H(X \mid Y) \\
R_2 &\geq H(Y \mid X) \\
R_1 + R_2 &\geq H(X, Y)
\end{aligned}
$$
[Example, Slides 17-18]
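As a small programmatic restatement of Theorem 1, the helper below (illustrative names, entropies in bits) checks whether a rate pair lies in the Slepian-Wolf region of a given joint pmf:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def in_slepian_wolf_region(R1, R2, p_xy):
    """True iff R1 >= H(X|Y), R2 >= H(Y|X), and R1 + R2 >= H(X,Y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    H_XY = entropy(p_xy)
    H_X_given_Y = H_XY - entropy(p_xy.sum(axis=0))  # H(X|Y) = H(X,Y) - H(Y)
    H_Y_given_X = H_XY - entropy(p_xy.sum(axis=1))  # H(Y|X) = H(X,Y) - H(X)
    return R1 >= H_X_given_Y and R2 >= H_Y_given_X and R1 + R2 >= H_XY

# Corner point (R1, R2) = (H(X), H(Y|X)) for the toy pmf from the introduction:
p_xy = [[0.45, 0.05], [0.05, 0.45]]
print(in_slepian_wolf_region(1.0, 0.5, p_xy))   # True
```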
Random Binning
Random binning is an encoding and decoding scheme that makes $(R_1, R_2)$ achievable with $R_1 + R_2 = H(X, Y)$, even if the $Y$ encoder does not know which sequence $x^n$ is encoded.
Encoding and Decoding Scheme
Encoding:
- For each sequence $x^n$, draw an index at random from $\left\{1, 2, \ldots, 2^{nR}\right\}$.
- Sequences $x^n$ having the same index are said to form a bin.
Decoding:
- Given a bin index, we look for a typical $x^n$ sequence in the bin.
- If there is one and only one typical $x^n$ in the bin, we declare it to be the estimate $\hat{x}^n$; otherwise, an error is declared.
An error occurs, given a bin index, if
- the true sequence $x^n$ is not typical, or
- there is more than one typical sequence in the bin.
We first prove that under this scheme, if $R \ge H(X)$, the probability of error is arbitrarily small and the code achieves the same result as the code introduced by Shannon (typical-set coding):
$$
\begin{aligned}
\Pr\left(g\left(f\left(X^n\right)\right) \neq X^n\right) &= \Pr\left(\bar{A}_\epsilon^{(n)}\right) + \sum_{x^n \in A_\epsilon^{(n)}} p\left(x^n\right) \Pr\left(\exists\, \tilde{x}^n \neq x^n : f\left(\tilde{x}^n\right) = f\left(x^n\right)\right) \\
&\leq \epsilon + \sum_{x^n \in A_\epsilon^{(n)}} p\left(x^n\right) \sum_{\tilde{x}^n \in A_\epsilon^{(n)}} 2^{-nR} \\
&\leq \epsilon + 2^{n(H(X)+\epsilon)}\, 2^{-nR} \\
&\leq \epsilon'
\end{aligned}
$$
if $R > H(X) + \epsilon$ and $n$ is sufficiently large.
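This argument can be checked empirically. The sketch below is a toy Monte Carlo version of the binning scheme for an i.i.d. Bernoulli($p$) source; all parameter values are illustrative choices. It counts the two error events listed earlier separately: an atypical source sequence (the $\Pr(\bar{A}_\epsilon^{(n)})$ term, still large at this small $n$ but vanishing as $n$ grows) and a bin collision between typical sequences (the $2^{n(H(X)+\epsilon)} 2^{-nR}$ term, already small because $R > H(X) + \epsilon$):

```python
import math
import random
from collections import defaultdict

p, n, eps, trials = 0.1, 20, 0.3, 5000      # illustrative parameters
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H(X), about 0.469 bits
num_bins = 2 ** math.ceil(n * (H + eps))    # index set of size 2^{nR}, R > H(X) + eps

def is_typical(x):
    """Weak eps-typicality: |-(1/n) log2 p(x^n) - H(X)| <= eps."""
    k = bin(x).count("1")                   # number of ones in the length-n sequence
    neg_log_p = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    return abs(neg_log_p / n - H) <= eps

random.seed(0)
# Random binning: every sequence in {0,1}^n gets a uniform random bin index.
f = [random.randrange(num_bins) for _ in range(2 ** n)]

# The decoder's lookup table: the typical sequences in each bin. Only the
# decoder ever needs the typical set, as the remarks below point out.
typical_in_bin = defaultdict(list)
for x in range(2 ** n):
    if is_typical(x):
        typical_in_bin[f[x]].append(x)

atypical = collisions = 0
for _ in range(trials):
    x = sum((random.random() < p) << i for i in range(n))  # draw X^n ~ Bern(p)^n
    if not is_typical(x):
        atypical += 1          # Pr(A-bar) term: shrinks only as n grows
    elif len(typical_in_bin[f[x]]) != 1:
        collisions += 1        # another typical sequence landed in the same bin

print(f"atypical-source errors: {atypical / trials:.3f}")
print(f"bin-collision errors:   {collisions / trials:.3f}")
```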
Remarks:
- The binning scheme does not require an explicit characterization of the typical set at the encoder; it is needed only at the decoder.
- It is this property that enables this code to continue to work in the case of a distributed source.
Outline of Proof: Achievability
- Random code generation: Assign every $x^n \in \mathcal{X}^n$ to one of $2^{nR_1}$ bins independently, according to a uniform distribution on $\left\{1, 2, \ldots, 2^{nR_1}\right\}$. Similarly, randomly assign every $y^n \in \mathcal{Y}^n$ to one of $2^{nR_2}$ bins. Reveal the assignments $f_1$ and $f_2$ to both sender and receiver.
- Encoding: Encoder 1 sends the index of the bin to which $x^n$ belongs; encoder 2 sends the index of the bin to which $y^n$ belongs.
- Decoding: Given the received index pair $(i, j)$, declare $\left(\hat{x}^n, \hat{y}^n\right) = \left(x^n, y^n\right)$ if there is one and only one pair of sequences $\left(x^n, y^n\right)$ such that $f_1\left(x^n\right) = i$, $f_2\left(y^n\right) = j$, and $\left(x^n, y^n\right) \in A_\epsilon^{(n)}$. Otherwise, declare an error.
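Written as code, this decoding rule is a brute-force search over jointly typical pairs. The sketch below is only illustrative: `f1` and `f2` are the bin-assignment tables, `jointly_typical` is an assumed membership test for $A_\epsilon^{(n)}$, and the search is exponential in $n$:

```python
from itertools import product

def sw_decode(i, j, f1, f2, jointly_typical, x_alphabet, y_alphabet, n):
    """Return (x^n, y^n) if exactly one jointly typical pair has bins (i, j)."""
    matches = [(x, y)
               for x in product(x_alphabet, repeat=n) if f1[x] == i
               for y in product(y_alphabet, repeat=n) if f2[y] == j
               if jointly_typical(x, y)]
    if len(matches) == 1:
        return matches[0]            # the unique candidate: declare (x^n, y^n)
    raise ValueError("decoding error: zero or multiple jointly typical pairs")
```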
Let $\left(X_i, Y_i\right) \sim p(x, y)$. Define the events
$$
\begin{aligned}
E_0 &= \left\{\left(x^n, y^n\right) \notin A_\epsilon^{(n)}\right\} \\
E_1 &= \left\{\exists\, \tilde{x}^n \neq x^n : f_1\left(\tilde{x}^n\right) = f_1\left(x^n\right) \text{ and } \left(\tilde{x}^n, y^n\right) \in A_\epsilon^{(n)}\right\} \\
E_2 &= \left\{\exists\, \tilde{y}^n \neq y^n : f_2\left(\tilde{y}^n\right) = f_2\left(y^n\right) \text{ and } \left(x^n, \tilde{y}^n\right) \in A_\epsilon^{(n)}\right\} \\
E_3 &= \left\{\exists\, \left(\tilde{x}^n, \tilde{y}^n\right) : \tilde{x}^n \neq x^n,\ \tilde{y}^n \neq y^n,\ f_1\left(\tilde{x}^n\right) = f_1\left(x^n\right),\ f_2\left(\tilde{y}^n\right) = f_2\left(y^n\right) \text{ and } \left(\tilde{x}^n, \tilde{y}^n\right) \in A_\epsilon^{(n)}\right\}
\end{aligned}
$$

Then the probability of error can be bounded with the union bound:

$$
P_e^{(n)} = \Pr\left(E_0 \cup E_1 \cup E_2 \cup E_3\right) \leq \Pr\left(E_0\right) + \Pr\left(E_1\right) + \Pr\left(E_2\right) + \Pr\left(E_3\right)
$$
$$
\begin{aligned}
\Pr\left(E_1\right) &= \sum_{\left(x^n, y^n\right) \in A_\epsilon^{(n)}} p\left(x^n, y^n\right) \Pr\left(\exists\, \tilde{x}^n \neq x^n : f_1\left(\tilde{x}^n\right) = f_1\left(x^n\right) \text{ and } \left(\tilde{x}^n, y^n\right) \in A_\epsilon^{(n)}\right) \\
&\leq \sum_{\left(x^n, y^n\right) \in A_\epsilon^{(n)}} p\left(x^n, y^n\right) \sum_{\tilde{x}^n : \left(\tilde{x}^n, y^n\right) \in A_\epsilon^{(n)}} 2^{-nR_1} \\
&\leq 2^{n(H(X \mid Y)+\epsilon)}\, 2^{-nR_1} \\
&\leq \epsilon'
\end{aligned}
$$
if $R_1 > H(X \mid Y) + \epsilon$ and $n$ is sufficiently large.
Similarly, we find that for sufficiently large $n$, $\Pr\left(E_2\right) < \epsilon'$ if $R_2 > H(Y \mid X)$, and $\Pr\left(E_3\right) < \epsilon'$ if $R_1 + R_2 > H(X, Y)$. Since $\Pr\left(E_0\right) < \epsilon$, we conclude that the probability of error $P_e^{(n)} \rightarrow 0$ as $n \rightarrow \infty$.
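For instance, the $E_3$ bound (not spelled out above) follows the same counting argument as for $E_1$, now with both bin assignments independent and the inner sum running over jointly typical pairs:

$$
\Pr\left(E_3\right) \leq \sum_{\left(x^n, y^n\right) \in A_\epsilon^{(n)}} p\left(x^n, y^n\right) \sum_{\left(\tilde{x}^n, \tilde{y}^n\right) \in A_\epsilon^{(n)}} 2^{-nR_1}\, 2^{-nR_2} \leq 2^{n(H(X,Y)+\epsilon)}\, 2^{-n(R_1 + R_2)},
$$

which is less than $\epsilon'$ for sufficiently large $n$ whenever $R_1 + R_2 > H(X, Y) + \epsilon$.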
Interpretation of Slepian-Wolf Coding
Consider the corner point of the rate region in Slepian-Wolf encoding, where $R_1 = H(X)$ and $R_2 = H(Y \mid X)$.
- Instead of trying to determine the typical fan, the $Y$ encoder assigns to all $y^n$ sequences (of which roughly $2^{nH(Y)}$ are typical) an index drawn at random from $\left\{1, 2, \ldots, 2^{nR_2}\right\}$.
- If the number of indices is high enough, then with high probability every element in the typical fan associated with $x^n$ will have a unique index.
- For $R_2 > H(Y \mid X)$, the number of indices is exponentially larger than the number of elements in the fan.
- The decoder, also knowing $x^n$, can construct the typical fan, and the received $Y$ index will uniquely determine the $y^n$ sequence within the $x^n$ fan.
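Quantitatively (a standard union-bound step, not spelled out in the notes above), the expected number of other fan elements sharing the index of the true $y^n$ is at most

$$
2^{nH(Y \mid X)}\, 2^{-nR_2} = 2^{-n\left(R_2 - H(Y \mid X)\right)} \rightarrow 0 \quad \text{for } R_2 > H(Y \mid X),
$$

so with high probability the received index is unique within the fan.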
Interpretation:
How does the random binning scheme bypass the problem that the $Y$ encoder does not know which sequence $x^n$ is encoded?
Under the random binning scheme, the decoder is able to decode $x^n$, since the assignment $f_1$ is revealed to both sender and receiver. Knowing $x^n$ narrows the number of candidate $y^n$ sequences down from $2^{nH(Y)}$ to $2^{nH(Y \mid X)}$, so the $Y$ encoder only needs to discriminate among $2^{nH(Y \mid X)}$ sequences.
We can conclude that, in effect, the decoder knowing which sequence $x^n$ is encoded is equivalent to the $Y$ encoder knowing which sequence $x^n$ is encoded.