Basics of error-correcting codes
Generations of wireless communications

Generation | time | technology | max speed |
---|---|---|---|
1G | early 1980s | analog, FM | 12 kbps |
2G | 1991 | digital, TDMA | 50 kbps–1 Mbps |
3G (incl. UMTS, CDMA2000, 3GPP) | 1998 | CDMA | 20 Mbps |
4G (LTE) | 2008 | OFDM | 1 Gbps |
5G | 2020 | OFDM | up to 10 Gbps |
Communication system

A channel is characterized by its transition probabilities:

$$\Pr\{Y=y \mid X=x\} \quad \text{for any } x\in \mathcal X,\; y \in \mathcal Y$$

e.g., $\mathcal X=\{0,1\}$, $\mathcal Y=\mathbb R$.
In the case of continuous $y$, we use the conditional pdf instead.
Channel models

- BSC: binary symmetric channel.
  Capacity (bits per channel use): $1-h_2(p)$, where $h_2(p)=-p\log_2 p-(1-p)\log_2(1-p)$ is the binary entropy function.
- BEC: binary erasure channel.
  Capacity: $1-\varepsilon$.
- AWGN: additive white Gaussian noise channel.
  $x\in \{-1,+1\}$ (binary), or continuous $x\in\mathbb R$; $x$ is subject to the power constraint $E[X^2]=P$, where $P$ is the transmitter power.
  Capacity: $\frac 1 2 \log_2(1+\mathrm{SNR})$ with $\mathrm{SNR}=\frac P {\sigma^2}$; with bandwidth $W$, the capacity is $W \log_2(1+\mathrm{SNR})$ bits/s.
- Rayleigh fading channel.
  $x\in \mathcal X$, $y \in \mathbb R$, $y=ax+n$; $a$ is a random variable with a Rayleigh distribution with scale parameter $\tau^2$, and $n\sim \mathcal N(0,\sigma^2)$.
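To make these channel models concrete, here is a minimal Python sketch (NumPy assumed; the function names are mine, not from any standard library) that simulates one use of each channel:

```python
import numpy as np

rng = np.random.default_rng(0)

def bsc(x, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    flips = (rng.random(x.shape) < p).astype(int)
    return np.bitwise_xor(x, flips)

def bec(x, eps):
    """Binary erasure channel: erase each bit (marked -1 here) with probability eps."""
    y = x.astype(float)
    y[rng.random(x.shape) < eps] = -1.0  # -1 is our erasure marker
    return y

def awgn(x, sigma2):
    """AWGN channel with BPSK input x in {-1,+1} and noise variance sigma2."""
    return x + rng.normal(0.0, np.sqrt(sigma2), x.shape)

x = rng.integers(0, 2, 10)       # random bits
print(bsc(x, 0.1))
print(bec(x, 0.3))
print(awgn(2 * x - 1, 0.5))      # BPSK-modulated bits through AWGN
```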
code design
Code: a structured subset of an ambient set; the collection of all codewords.
Encoder: a mapping between the set of messages and the set of codewords.
Decoder: given an element $y \in A$ ($y$ is the received symbol, or a sequence of such $y$'s), find the "most likely" codeword/message.
m: message
c: codeword
C: code
Minimize the probability of error $\Pr\{\hat m \ne m\}$ through the structure of the code.
A natural structure with algebraic properties to exploit is a linear subspace of an ambient vector space.
A linear code $C$ of dimension $k$ is a $k$-dimensional subspace of the ambient space $F^n$; here $F$ is the binary field $\{0,1\}$.
Each element $c\in C$ is represented as a vector of length $n$: $c=(c_1,c_2,c_3,\ldots,c_n)$, $c_i\in F$; $c$ is a binary sequence and $n$ is the length of the code.
$(n,k)$ code: $n$ is the block length, $k$ is the dimension of the code, $k \le n$.
Example:
Let $C$ be an $(n,n-1)$ linear code defined as the single parity-check code.
Rate: bits per channel/symbol use.
The rate of a code $C$ of length $n$ over an alphabet of size $q$:
$$\mathrm{rate}(C)=\frac {\log_q|C|} n\,\Big|_{|C|=q^k}=\frac k n$$
where $q^k$ is the size of a code of dimension $k$, and $q$ is the alphabet size.
Hamming distance
$d_H(x,y)$ = the number of positions (bits) in which $x$ and $y$ differ; $x$: transmitted, $y$: received.
properties:
- $d_H(x,y)\ge0$
- $d_H(x,y)=0 \iff x=y$
- $d_H(x,y)=d_H(y,x)$
- triangle inequality: $d_H(x,z)\le d_H(x,y)+d_H(y,z)$
Hamming weight of a vector $\vec{x}$: the number of non-zero entries of $\vec{x}$,
$$w_H(x)=d_H(x,0)$$
where $0$ is the all-zero codeword.
Minimum distance of a code
$$d_{min}(C)=\min_{x,x'\in C,\ x\ne x'} d_H(x,x')$$
For a linear code $C$:
$$d_{min}(C)=\min_{c\in C,\ c\ne 0} w_H(c)$$
Thus, to find the minimum distance of a linear code, we just need to find the minimum distance from a non-zero codeword to the all-zero codeword, i.e., the minimum weight of a non-zero codeword.
Theorem (worst-case guarantee): let $d=d_{min}(C)$; then $C$ can correct up to $\lfloor\frac {d-1} 2\rfloor$ errors.
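A tiny Python check of this distance/weight equivalence on the $(3,2)$ single parity-check code (a sketch; the codeword list is written out by hand):

```python
import itertools

def hamming(x, y):
    """d_H(x, y): number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

# (3,2) single parity-check code: all length-3 words of even weight
C = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

d_min = min(hamming(x, y) for x, y in itertools.combinations(C, 2))
w_min = min(sum(c) for c in C if any(c))   # minimum non-zero Hamming weight
print(d_min, w_min)                        # -> 2 2: they agree, since C is linear
```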
Approaches to code design
Construct codes with maximum distance, given a certain rate (or length and size).
Algebraic codes; Turbo codes (used in 3G/4G); LDPC and polar codes (used in 5G).
linear code approach
Consider a basis for an $(n,k)$ linear code $C$ over the field $F$, denoted $c_1, c_2,\ldots,c_k$:
$$C=\{ \lambda_1c_1+\lambda_2c_2+\ldots+\lambda_kc_k \mid \lambda_i\in F \}$$
Let
$$G=\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix}_{k\times n}$$
be a generator matrix for the code $C$.
$$c=(\lambda_1, \lambda_2,\ldots, \lambda_k)\, G$$
$$C=\{vG \mid v\in F^k\},\quad v: \text{message (row) vector}$$
the generator matrix is not unique
Encoding mapping: $v\rightarrow vG$, where $v$ is a message of length $k$ ($k$ bits) and $vG$ is the encoded codeword.
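As a quick illustration (a Python/NumPy sketch; the matrices here are my own toy example), encoding is a vector-matrix product over GF(2):

```python
import numpy as np

def encode(v, G):
    """Encode the message row vector v (length k) into the codeword vG over GF(2)."""
    return np.mod(v @ G, 2)

# generator matrix of the (4,3) single parity-check code
G = np.array([[1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 1, 1]])
print(encode(np.array([1, 0, 1]), G))   # -> [1 0 1 0]
```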
example:
single parity-check code:
$$(x_1,x_2,\ldots,x_{n-1})\rightarrow \Big(x_1,x_2,\ldots,x_{n-1},\sum_{i=1}^{n-1}x_i\Big)$$
(In the figure, the bits to the left of the dashed line, i.e., the message bits, can be arbitrary.)
systematic encoder: every encoded codeword contains the original message as follows:
message = $(u_1,u_2,\ldots,u_k)$, codeword = $(u_1,u_2,\ldots,u_k, x_{k+1},\ldots,x_n)$
so
$$G=\begin{bmatrix}I_{k\times k} \,\big|\, A_{k \times (n-k)} \end{bmatrix}_{k\times n}$$
no matter what the matrix $A$ is.
Theorem: every linear code has a systematic encoder, up to a permutation of the code bits; this lets us design the generator matrix in the form above.
For a code $C$ with generator matrix $G_{k\times n}$, let $H_{(n-k)\times n}$ denote the kernel of $G_{k\times n}$:
$$GH^T=0_{k\times (n-k)}$$
All rows of $G$ are orthogonal to all rows of $H$.
$H$: the parity-check matrix for $C$.
Note: over the binary field, non-zero vectors can be self-orthogonal; any binary vector of even Hamming weight is self-orthogonal.
Example:
For the single parity-check code $C$ with $G_{(n-1)\times n}$, we have $H=[1,1,\ldots,1]_{1\times n}$. In general, for a systematic
$$G=\begin{bmatrix} I_{k\times k}\,\big|\, A_{k\times (n-k)} \end{bmatrix}_{k\times n}$$
we have
$$H=\begin{bmatrix} -A^T \,\big|\, I_{(n-k)\times (n-k)}\end{bmatrix}_{(n-k)\times n}$$
Example:
Let C be a binary linear (6, 3) code with the generator matrix
$$G=\begin{bmatrix} 1&0&1&1&0&1\\ 0&1&0&1&1&0\\ 0&0&1&0&0&1 \end{bmatrix}$$
a. Find a systematic generator matrix for C.
systematic form:
$$G_{sys}=\begin{bmatrix} 1&0&0&1&0&0\\ 0&1&0&1&1&0\\ 0&0&1&0&0&1 \end{bmatrix}$$
b. Find a parity-check matrix for C.
$$G_{sys}=\begin{bmatrix} I_{k\times k}\,\big|\, A_{k\times (n-k)} \end{bmatrix}_{k\times n},\qquad H=\begin{bmatrix} -A^T \,\big|\, I_{(n-k)\times (n-k)}\end{bmatrix}_{(n-k)\times n}$$
$$H=\begin{bmatrix} 1&1&0&1&0&0\\ 0&1&0&0&1&0\\ 0&0&1&0&0&1 \end{bmatrix}$$
c. What is the minimum distance of C?
The minimum distance is at least two, since there is no zero column in $H$. And we do have a codeword of weight 2 (the third row of $G$), so $d_{min}(C)=2$.
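A short Python check (NumPy; a sketch, not part of the original notes) verifying $G_{sys}H^T=0$ and $d_{min}(C)=2$ for this example:

```python
import itertools
import numpy as np

G_sys = np.array([[1, 0, 0, 1, 0, 0],
                  [0, 1, 0, 1, 1, 0],
                  [0, 0, 1, 0, 0, 1]])
A = G_sys[:, 3:]                               # G_sys = [I | A]
H = np.hstack([A.T, np.eye(3, dtype=int)])     # H = [-A^T | I]; over GF(2), -A^T = A^T

print(np.mod(G_sys @ H.T, 2))                  # 3x3 all-zero matrix: G H^T = 0

weights = [int(np.mod(np.array(m) @ G_sys, 2).sum())
           for m in itertools.product([0, 1], repeat=3) if any(m)]
print(min(weights))                            # -> 2, i.e. d_min(C) = 2
```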
Lemma: let $C$ be a linear $(n,k)$ code with parity-check matrix $H$; then $c\in C \iff Hc^T=0$.
Graphical model representation of decoding, message passing algorithms
Linear code $C$ with parity-check matrix $H_{(n-k)\times n}$; $n$: block length, $k$: number of information bits.
$$c\in C \iff Hc^T=0$$
Each row of $H$ is a parity-check equation.
For any $y\in F^n$, the syndrome of $y$ with respect to the code $C$ with parity-check matrix $H$ is defined as $Hy^T$: a column vector of size $(n-k)\times 1$.
Number of possible syndromes: $2^{n-k}$.
$$H(a_i+c_j)^T=Ha^T_i+Hc^T_j,\quad \text{where } Hc^T_j=0$$
Let $S_1, S_2,\ldots,S_{2^{n-k}}$ denote all possible syndromes, and let $a_i$ be the minimum-weight vector with $Ha^T_i=S_i$.
Coset leader | standard array | syndromes |
---|---|---|
$a_1$ | $a_1+c_1\;\;\ldots\;\;a_1+c_{2^k}$ | $S_1$ |
$a_2$ | $a_2+c_1\;\;\ldots\;\;a_2+c_{2^k}$ | $S_2$ |
$\ldots$ | $\ldots$ | |
$a_{2^{n-k}}$ | $a_{2^{n-k}}+c_1\;\;\ldots\;\;a_{2^{n-k}}+c_{2^k}$ | $S_{2^{n-k}}$ |

$c_1,c_2,\ldots,c_{2^k}$ denote all the codewords, so the standard array contains every possible received vector $y$.
Syndrome decoding (bit-flip errors only)
$y$: received vector (binary)
- compute the syndrome of $y$: $Hy^T$
- locate $S_i=Hy^T$ in the standard array, with coset leader $a_i$
- output the codeword $c=y-a_i$; $a_i$ is the error pattern

The syndrome decoder is a minimum-distance decoder: it maps $y$ to the closest codeword. Let $d_{min}(C)=d$; then all binary vectors of weight up to $\lfloor\frac {d-1}2\rfloor$ will be among the coset leaders.
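A minimal Python sketch of syndrome decoding (NumPy; practical only for tiny codes, since the standard array has $2^{n-k}$ rows), using the $H$ from the example above:

```python
import itertools
import numpy as np

def syndrome_table(H):
    """Map each syndrome (as a tuple) to a minimum-weight coset leader."""
    m, n = H.shape
    table = {}
    for w in range(n + 1):                   # error patterns by increasing weight
        for idx in itertools.combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(idx)] = 1
            table.setdefault(tuple(np.mod(H @ e, 2)), e)
    return table

def syndrome_decode(y, H, table):
    s = tuple(np.mod(H @ y, 2))              # syndrome of the received vector
    return np.mod(y - table[s], 2)           # subtract the coset-leader error pattern

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],
              [0, 0, 1, 0, 0, 1]])
table = syndrome_table(H)
y = np.array([1, 0, 1, 0, 0, 0])             # received vector
print(syndrome_decode(y, H, table))          # closest codeword to y
```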
Maximum likelihood decoder
Consider a BSC($p$), $p<0.5$.
$$\Pr\{\text{receiving } y \mid c \text{ is transmitted}\}$$
Let $w=w_H(y-c)$, the number of positions in which $y$ and $c$ differ. Then
$$\Pr\{y \mid c\}=p^w(1-p)^{n-w}=\Big(\frac p {1-p}\Big)^w(1-p)^n$$
ML decoder $\Leftrightarrow$ maximize the probability $\Pr\{y \mid c\}$ over $c$ $\Leftrightarrow$ minimize $w$ $\Leftrightarrow$ minimum-distance decoder $\Leftrightarrow$ syndrome decoder.
For the BSC, these decoders are all the same.
Note that ML decoding has exponential (in n) complexity
Also, syndrome decoding needs to search within an array of exponential size $\Rightarrow$ exponential complexity.
LDPC code: a low-density parity-check code is a binary, linear block code for which the parity-check matrix is sparse (both rows and columns are sparse in terms of the number of 1's).
A regular LDPC code has an equal number of 1's in each row ($w_r$) and an equal number of 1's in each column ($w_c$).
Note that $w_c\cdot n=w_r\cdot m$ for $H_{m\times n}$.
With $m \ge n-k$ for an $(n,k)$ code, this code is referred to as a $(w_c, w_r)$ regular LDPC code.
Example:
A (2,4) regular LDPC code: $n=10$, $m=5$, $k=6$, $w_c=2$, $w_r=4$.
$$H=\begin{bmatrix} 1&1&1&1&0&0&0&0&0&0 \\ 1&0&0&0&1&1&1&0&0&0 \\ 0&1&0&0&1&0&0&1&1&0 \\ 0&0&1&0&0&1&0&1&0&1\\ 0&0&0&1&0&0&1&0&1&1 \end{bmatrix}_{5\times 10}$$
$$\mathrm{rank}(H)=n-k=4,\qquad k=n-\mathrm{rank}(H)=6$$
Gallager’s early work, Gallager’s decoder
There exists a sequence of (regular) LDPC codes with increasing length, positive rate $k/n>0$, and positive relative distance $d_{min}/n>0$.
Gallager's decoder (hard-decision bit-flipping decoder):
- fix a threshold $S$ (to be optimized)
- compute the syndrome bits $S_j$'s:
$$Hy^T=\begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_m \end{bmatrix}$$
where $y$ is the received vector
- if all $S_j$'s are 0, then stop
- otherwise, for each bit $i$, $i=1,2,\ldots,n$: let $g_i$ be the number of non-zero syndromes that involve the $i$-th bit, and set $A=\{i : g_i>S\}$
- flip bit $i$ for all $i\in A$ and go back to step 2 (see the sketch below)
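A compact Python sketch of this bit-flipping decoder (NumPy; the threshold handling and names are mine, for illustration), tried on the (2,4) regular code above with a single bit-flip:

```python
import numpy as np

def gallager_bit_flip(H, y, S=1, max_iter=50):
    """Hard-decision bit flipping: flip every bit involved in more than S
    unsatisfied parity checks, then recompute the syndromes."""
    y = y.copy()
    for _ in range(max_iter):
        syndromes = np.mod(H @ y, 2)
        if not syndromes.any():
            break                            # all parity checks satisfied
        g = H.T @ syndromes                  # g[i] = # unsatisfied checks on bit i
        flip = g > S
        if not flip.any():
            break                            # no bit exceeds the threshold
        y[flip] ^= 1
    return y

H = np.array([[1,1,1,1,0,0,0,0,0,0],
              [1,0,0,0,1,1,1,0,0,0],
              [0,1,0,0,1,0,0,1,1,0],
              [0,0,1,0,0,1,0,1,0,1],
              [0,0,0,1,0,0,1,0,1,1]])
y = np.zeros(10, dtype=int)
y[3] = 1                                     # all-zero codeword with one flip
print(gallager_bit_flip(H, y, S=1))          # -> all zeros again
```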
Belief propagation Algorithm
Belief propagation (BP) is a type of message-passing algorithm. It uses a Tanner graph representation of the code (a bipartite graph):
one part has a variable node for each coded bit; the other part has a check node for each parity check.
There is an edge connecting $f_j$ to $x_i$ if the $(j,i)$ entry of the matrix $H$ is one.
For instance, over the AWGN channel, $y_i=(2{x_i}-1)+n_i$, $n_i\sim\mathcal N(0,\sigma^2)$.
If $f_1$ connects to $x_1, x_2, x_3$, then $x_1+x_2+x_3=0$; in general, each check enforces
$$x_i+x_{i'}+x_{i''}+\ldots=0$$
BP algorithm is an iterative decoding algorithm
In each iteration
- Each variable node sends a message to each check node
- each check node sends a message to each variable node
- each variable node updates its 'belief' about $x_i$
Goal of decoding: compute
$$P(x_i=0 \mid y_1,y_2,\ldots,y_n \text{ and all parity bits being } 0)$$
Also called: bit-MAP decoder.
$$q_{ij}(x)=P(x_i=x \mid y_i,\ \text{all the extrinsic information passed to } x_i \text{ from the check nodes other than } f_j)$$
$$r_{ji}(x)=P(\text{parity bit } f_j \text{ is satisfied} \mid x_i=x,\ \text{other bits } x_{i'} \text{ connected to } f_j \text{ (other than } x_i\text{) are distributed according to } q_{i',j})$$
How to compute $q, r$:
initialization:
$$q_{i,j}(x)=P(X_i=x \mid Y_i=y_i),\quad x\in \{0,1\}$$
The ratio $\frac {P(X_i=0 \mid Y_i=y_i)}{P(X_i=1 \mid Y_i=y_i)}$ is the likelihood ratio used for making decisions. In practice, we work with the log-likelihood ratio (LLR): if the LLR is positive, the ratio is $>1$ and we decide $x_i=0$.
Notations:
- $P_i=P(X_i=1 \mid Y_i=y_i)$; $L(X_i)$ is the corresponding LLR
- $R_j$: indices of the 1's in row $j$ of $H$
- $C_i$: indices of the 1's in column $i$ of $H$
- $R_{j \backslash i}$: $R_j$ excluding $i$ (for example, if row 1 is [0 1 1 0 1], then $R_{1 \backslash 2}=\{3,5\}$)
Lemma: let $a_1,a_2,\ldots,a_L$ be independent binary random variables with $P(a_i=1)=P_i$. Then (with the sum taken mod 2):
$$P\Big(\sum_{i=1}^L a_i=0\Big)=\frac 1 2+\frac 1 2 \prod^L_{i=1}(1-2P_i),\qquad P\Big(\sum_{i=1}^L a_i=1\Big)=\frac 1 2-\frac 1 2 \prod^L_{i=1}(1-2P_i)$$
Message passing:
$$r_{j,i}(0)=\frac 1 2 +\frac 1 2 \prod_{i' \in R_{j \backslash i}}(1-2q_{i',j}(1)),\qquad r_{j,i}(1)=1-r_{j,i}(0)$$
$$\frac{q_{i,j}(0)}{q_{i,j}(1)}=\frac {1-P_i}{P_i} \prod_{j'\in C_{i\backslash j}}\frac {r_{j',i}(0)}{r_{j',i}(1)}$$
$$L(q_{i,j})=\log\frac {q_{i,j}(0)}{q_{i,j}(1)},\qquad L(r_{j,i})=\log\frac {r_{j,i}(0)}{r_{j,i}(1)}$$
$$\Rightarrow \begin{cases} L(q_{i,j})=L(X_i)+\sum_{j'\in C_{i\backslash j}} L(r_{j',i})\\ L(r_{j,i})=2\tanh^{-1}\Big(\prod _{i' \in R_{j \backslash i}} \tanh\big(\tfrac 1 2 L(q_{i',j})\big)\Big)\\ \text{update the belief of the } X_i\text{'s:}\;\; L_{new}(X_i)=L(X_i)+\sum_{j\in C_i} L(r_{j,i}) \end{cases}$$
These steps can also be written as
$$\alpha_{i,j}=\mathrm{sign}(L(q_{i,j})),\qquad \beta_{i,j}=|L(q_{i,j})|$$
$$\phi(x)=\log\frac {e^x+1}{e^x-1},\qquad \phi \text{ is self-inverse: } \phi^{-1}=\phi$$
$$L(r_{j,i})=\prod_{i' \in R_{j\backslash i}} \alpha_{i',j}\ \cdot\ \phi^{-1}\Big(\sum_{i'\in R_{j\backslash i}}\phi(\beta_{i',j})\Big)$$
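As a sanity check on these two equivalent forms, a small Python sketch (NumPy) of one check-node update $L(r_{j,i})$, computed both with the tanh rule and with the self-inverse $\phi$; the two agree up to floating point:

```python
import numpy as np

def phi(x):
    return np.log((np.exp(x) + 1.0) / (np.exp(x) - 1.0))

def check_update_tanh(Lq):
    """L(r_{j,i}) from the incoming L(q_{i',j}), i' in R_j\\i (tanh rule)."""
    return 2.0 * np.arctanh(np.prod(np.tanh(Lq / 2.0)))

def check_update_phi(Lq):
    """Same update in sign/magnitude form, using the self-inverse phi."""
    return np.prod(np.sign(Lq)) * phi(np.sum(phi(np.abs(Lq))))

Lq = np.array([1.3, -0.7, 2.1])     # example incoming LLRs
print(check_update_tanh(Lq))        # ~ -0.303
print(check_update_phi(Lq))         # same value
```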
Min-Sum approximation: approximate the magnitude $\phi^{-1}\big(\sum_{i'\in R_{j\backslash i}}\phi(\beta_{i',j})\big)$ by $\min_{i'\in R_{j\backslash i}} \beta_{i',j}$; the exact magnitude is never larger than this minimum, so Min-Sum overestimates it.
Offset Min-Sum approximation:
$$L(r_{j,i})=\prod _{i'\in R_{j\backslash i}} \alpha_{i',j}\ \cdot\ \Big(\min_{i'\in R_{j\backslash i}} \beta_{i',j}-\alpha\Big)$$
$\alpha$: a constant to be optimized per application.
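And a corresponding sketch of the Min-Sum and offset Min-Sum check-node updates (Python; the offset $\alpha$ is the per-application constant mentioned above):

```python
import numpy as np

def check_update_min_sum(Lq, offset=0.0):
    """Min-Sum check-node update; offset=0 is plain Min-Sum,
    offset>0 is offset Min-Sum (magnitude floored at 0)."""
    sign = np.prod(np.sign(Lq))
    return sign * max(np.min(np.abs(Lq)) - offset, 0.0)

Lq = np.array([1.3, -0.7, 2.1])
print(check_update_min_sum(Lq))              # -0.7 (exact value is ~ -0.303)
print(check_update_min_sum(Lq, offset=0.3))  # -0.4
```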
Example:
BP decoding with the Min-Sum approximation, for a (2,3) regular LDPC code:
$$H=\begin{bmatrix} 1 &1& 1& 0& 0 &0 \\ 1 &0&0&1&1&0\\ 0 &1&0&1&0&1\\ 0 &0&1&0&1&1 \end{bmatrix}_{4\times 6}$$
$$n=6,\; m=4,\; k=3,\; w_c=2,\; w_r=3,\qquad \mathrm{rank}(H)=3\implies k=6-3=3$$
Tanner graph representation
A '1' in the $H$ matrix means there is a connection between the corresponding check node and variable node.
Suppose we have
$$L(X_i)=-1,\,2,\,3,\,-4,\,4,\,1 \quad\text{for } i=1,2,3,4,5,6$$
What is the updated belief of each $X_i$ after one iteration of BP with the Min-Sum approximation?
Write down the $q_{i,j}$ matching the connections: initially $q_{i,j}=L(X_i)$ (in the figure, in black; ignoring the arrow direction, $q_{i,j}$ is the message from a 'circle' (variable node) to a 'square' (check node)).
Find the $r_{j,i}$ (in the figure, in red).
Compute the updated beliefs $L^{(new)}(X_i)$.
Complexity: BP can be used (in principle) to decode any linear code given $H$. For an LDPC code of 'constant' degree (with respect to $n$), the complexity of each iteration is $O(n)$; for a general code it is $O(n^2)$.
The exact LLR calculation (max-product) is rather complex and is often approximated, but the approximation works well when there are only 'a few' terms (in the sum of $\phi$'s).
If the length of the shortest cycle is $2l$, the LLR equations hold for up to $l$ iterations.
Another issue with BP for a general code is that the Tanner representation is dense $\Rightarrow$ it will have too many short cycles. Short cycles adversely affect the performance of BP, since the independence of the $r_{j,i}$'s (for a fixed $i$) or the $q_{i,j}$'s (for a fixed $j$) would be violated.
- When to stop?
  In practice, after a fixed number of iterations (usually in the range between 5 and 20).
- Early stopping:
  check $H$ to see if the parity-check equations are satisfied after making hard decisions on the $X_i$'s (according to the updated beliefs).
- Drawback: this check is expensive.
CRC: cyclic redundancy check
An $m\times n$ parity-check matrix;
$$\mathrm{LLR}=\log\frac{P(X_i=0\mid Y_j\text{'s})}{P(X_i=1\mid Y_j\text{'s})}$$
If $\mathrm{LLR}>0$: $\hat{X_i}=0$; if $\mathrm{LLR}<0$: $\hat{X_i}=1$.
This step is called making the hard decision.
Form the hard-decision vector $\hat x=(\hat{X_1}, \hat{X_2},\ldots,\hat{X_n})$. If $H\hat x^T$ equals 0, the decoder is done and outputs the $\hat{X_i}$'s; if it does not equal 0, the decoder still needs to continue.
This check is itself complex: comparable to the complexity of one iteration of BP decoding.
So we want an alternative (cheaper) stopping rule, called CRC: cyclic redundancy check.
A few extra bits of redundancy (8,12,16,24) using a cyclic code - an algebraic code.
If the CRC is 8 bits:
$$H'=\begin{bmatrix} 1\;0\;0\;1\,\ldots\\ 0\;1\;0\;0\;1\,\ldots\\ \ldots \end{bmatrix}_{8\times n}$$
Each row of the matrix $H'$ is like a cyclic shift; the CRC length is $l$ (here $l=8$).
The overall $H$ of the code + CRC is
$$\begin{bmatrix}H\\H'\end{bmatrix}_{(m+l)\times n}$$
For an $(n,k)$ linear code + $l$ bits of CRC, the number of information bits is $k-l$:
$$\begin{bmatrix} H_{(n-k)\times n}\\H'_{l\times n} \end{bmatrix}_{(n-k+l)\times n}$$
We can check the CRC equations at the end of each iteration: if the CRC passes, stop the decoder; if the CRC does not pass, run the next iteration.
At the end of decoding, we also check the CRC to see if we have reached a 'valid' codeword $c$.
valid codeword:
$$Hc^T=0\ (\text{valid code}),\qquad H'c^T=0\ (\text{valid CRC})$$
Probability of CRC failure (an undetected error): $\frac 1 {2^l}$.
Transport block: each code block has its own CRC, and the entire transport block has another CRC.
LDPC code over BEC
when decoding over BEC, LLRs do not matter as each coded bit is either known or erased.
$$q_{ij}= \begin{cases} X_i &\text{if } X_i \text{ is known} \\ e &\text{if } X_i \text{ is erased} \end{cases}$$
$$r_{ji}=\sum_{i'\in R_{j\backslash i}} X_{i'}= \begin{cases} \text{known} &\text{if all } X_{i'}\ (i'\in R_{j\backslash i}) \text{ are known} \\ e &\text{otherwise} \end{cases}$$
Update belief: if $X_i$ is erased, $X_i$ becomes known if at least one of the $r_{ji}$'s, $j\in C_i$, is known (set $X_i$ to that known $r_{j,i}$); otherwise it remains erased.
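A small Python sketch of this erasure message passing (equivalently, 'peeling': repeatedly solve any check with exactly one erased neighbor; the erasure marker $-1$ and the names are my choices):

```python
import numpy as np

def bec_decode(H, y):
    """y entries: 0, 1, or -1 for an erasure. Repeatedly solve any check
    whose neighbors are all known except one."""
    y = y.copy()
    progress = True
    while progress:
        progress = False
        for row in H:
            idx = np.flatnonzero(row)            # bits in this check
            erased = [i for i in idx if y[i] < 0]
            if len(erased) == 1:                 # exactly one unknown bit
                known = [i for i in idx if y[i] >= 0]
                y[erased[0]] = int(np.sum(y[known])) % 2
                progress = True
    return y                                     # remaining -1's form a stopping set

H = np.array([[1,1,1,0,0,0],
              [1,0,0,1,1,0],
              [0,1,0,1,0,1],
              [0,0,1,0,1,1]])
y = np.array([0, -1, 0, -1, 0, 0])               # two erasures
print(bec_decode(H, y))                          # both recovered -> all zeros
```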
Example:
A stopping set is a set of erased variables that cannot be corrected regardless of the other variables (even if all the others are known).
How can this happen?
Let $G$ denote the set of neighbors of the stopping set $V$; then every check node in $G$ is connected to at least two variable nodes in $V$.
The minimum stopping set $V_{min}$ is the stopping set containing the fewest variable nodes.
The code can then correct up to $|V_{min}|-1$ erasures.
$$\Rightarrow\;\;d_{min}\ge|V_{min}|$$
(a code of minimum distance $d_{min}$ can correct up to $d_{min}-1$ erasures)
Density evolution $\rightarrow$ over BEC($p$)
Consider a $(w_c, w_r)$ regular LDPC code. Let $\varepsilon_l$ denote the probability that a variable node remains erased after the $l$-th iteration (assuming independence, valid for $l\le L/2$, where $L$ is the length of the shortest cycle, also referred to as the 'girth' of the Tanner graph).
$$\varepsilon_0=p,\qquad \varepsilon_l=p\cdot\big(1-(1-\varepsilon_{l-1})^{w_r-1}\big)^{w_c-1}\quad\text{for } l\ge1$$
- $p$: the probability that bit $X_i$ is originally erased by the channel
- $(1-\varepsilon_{l-1})^{w_r-1}$: the probability that all $X_{i'}$, $i'\in R_{j\backslash i}$, are not erased
- $1-(1-\varepsilon_{l-1})^{w_r-1}$: the probability that $r_{ji}$ is erased
Given the degree distribution $(w_c, w_r)$, the threshold $\varepsilon^*$ is the maximum $p$ for which $\varepsilon_l\rightarrow0$ as $l\rightarrow \infty$.
(As $n$ grows large, the girth grows large, so the independence assumption holds for more iterations.)
Example:
For the (3,6) regular LDPC code, as $n\rightarrow \infty$, we have $\varepsilon^*=0.4294$. The corresponding capacity is $1-0.4294=0.5706$, while the rate is $R=\frac 1 2<0.5706$: the threshold is below the Shannon limit.
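The threshold can be found numerically from the recursion above; a minimal Python sketch (bisection over $p$) that reproduces $\varepsilon^*\approx0.4294$ for the (3,6) code:

```python
def converges(p, wc, wr, iters=5000, tol=1e-9):
    """Density-evolution recursion for BEC(p); True if eps_l -> 0."""
    eps = p
    for _ in range(iters):
        eps = p * (1.0 - (1.0 - eps) ** (wr - 1)) ** (wc - 1)
        if eps < tol:
            return True
    return False

def threshold(wc, wr):
    lo, hi = 0.0, 1.0
    for _ in range(40):              # bisection on the channel erasure probability
        mid = (lo + hi) / 2
        if converges(mid, wc, wr):
            lo = mid
        else:
            hi = mid
    return lo

print(round(threshold(3, 6), 4))     # -> approximately 0.4294
```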
For $(d,2d)$ regular LDPC codes, $\varepsilon^* \rightarrow 0.5$ as $d \rightarrow \infty$ (we can approach capacity).
Assuming a random ensemble of regular $(d,2d)$ LDPC codes, the girth grows large as $n$ grows large, with probability 1.
Note that:
- Density evolution only describes the asymptotic performance of the random ensemble of codes.
- Goals of a good $H$ design: $\begin{cases} \text{high girth} \\ \text{high (or almost full) rank}\\ \text{large minimum stopping set} \end{cases}$
Channel coding techniques in 5G systems
Structure of LDPC in 5G
Protograph LDPC codes: lifting operation using a base matrix
For a lifting operation of size $z$, each check node is replaced by $z$ check nodes (and each variable node by $z$ variable nodes). Then each edge in the Tanner graph is replaced by a shifted permutation matrix.
This preserves the degree distribution of the Tanner graph (regardless of $z$).
Lifting size: $z$.
Each entry in the base matrix is a number from $\{-1, 0, 1,\ldots,z-1\}$:
- $-1\Rightarrow$ no edges: the $z \times z$ all-zero matrix
- $0 \Rightarrow$ the identity matrix
- $i$, for $i=1,2,\ldots,z-1$ $\Rightarrow$ the permutation matrix obtained by cyclically shifting the identity by $i$
Example:
There are two types of base graphs/matrices in the 5G LDPC code: $B_1$ of size $46 \times 68$, and $B_2$ of size $42 \times 52$.
Lifting sizes go up to 384, so the maximum block length supported by 5G LDPC is $384 \times 68 = 26112$.
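A sketch (Python/NumPy) of the lifting operation on a small, made-up base matrix (not one of the 5G base graphs): each entry becomes a $z\times z$ all-zero block ($-1$) or a cyclically shifted identity:

```python
import numpy as np

def lift(base, z):
    """Expand a protograph base matrix into H: entry -1 -> z x z zero block,
    entry s >= 0 -> identity cyclically shifted by s."""
    I = np.eye(z, dtype=int)
    return np.block([[np.zeros((z, z), dtype=int) if s < 0 else np.roll(I, s, axis=1)
                      for s in row] for row in base])

base = [[0, 1, -1],
        [2, -1, 0]]          # a toy base matrix, for illustration only
H = lift(base, z=3)
print(H.shape)               # (6, 9)
print(H)
```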
Polar codes (channel-dependent)
Channel polarization theory: let $W$ denote the channel BEC($p$);
$W(y|x)$ denotes the probability of receiving $y$ given $x$:
$$W(0|0)=1-p,\quad W(e|0)=p,\quad W(1|0)=0$$
$$W(1|1)=1-p,\quad W(e|1)=p,\quad W(0|1)=0$$
now consider two channels, the channel that u 1 u_1 u1 observes and the channel that u 2 u_2 u2 observes assuming u 1 u_1 u1 is known
$w^-$ is also a BEC, with erasure probability $1-(1-p)^2=2p-p^2$, because $u_1$ is known/decoded if and only if both $y_1\,(=u_1+u_2)$ and $y_2\,(=u_2)$ are non-erasures:
$$u_1=y_1+y_2=u_1+u_2+u_2=u_1$$
$w^+$ is also a BEC, with erasure probability $p^2$, because $u_2$ is decoded if either $y_1$ or $y_2$ is a non-erasure (if both $y_1$ and $y_2$ are erased, $u_2$ is unknown).
Note:
- the sum-capacity is preserved (the capacity of BEC($p$) is $1-p$):
$$c(w^-)+c(w^+)=1-2p+p^2+1-p^2=2(1-p)=2c(w)$$
- also, $2p-p^2>p^2$ for $0<p<1$, so $w^+$ is better than $w^-$
Channel splitting operation
Then:
- $w^{++}$: input $u_4$, output $(y_1, y_2, y_3, y_4, u_1, u_2, u_3)$
- $w^{+-}$: input $u_3$, output $(y_1, y_2, y_3, y_4, u_1, u_2)$
- $w^{-+}$: input $u_2$, output $(y_1, y_2, y_3, y_4, u_1)$
- $w^{--}$: input $u_1$, output $(y_1, y_2, y_3, y_4)$
This can continue recursively: for $n=2^m$, this is called the polarization transform of length $n$, denoted $p^{(n)}$ (the recursion steps from $n$ to $2n$).
$w^{++\ldots+-\ldots}\leftrightarrow w^{(i)}$ by mapping $i-1$ into a binary representation of length $m$ ($m=\log_2 n$) and replacing '1' by '+' and '0' by '−'.
The sum-capacity is preserved:
$$\sum_{i=1}^{n} c(w^{(i)})=n\cdot c(w)$$
(for symmetric channels)
The proof is by the chain rule of mutual information, assuming the input bits $u_i$ are uniform i.i.d. (independent and identically distributed).
polarization tree
Channel polarization: as $n$ grows large, the bit-channels become either completely noiseless (capacity goes to one) or completely noisy (capacity goes to zero), except for a vanishing fraction of bit-channels.
Furthermore, the fraction of noiseless channels $\rightarrow c(w)$.
Example:
BEC(0.5), n=4
Let $z^{(i)}$ denote the erasure probability of $w^{(i)}$.
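A short Python sketch computing the $z^{(i)}$ for a BEC by applying $z\mapsto 2z-z^2$ (the '−' branch) and $z\mapsto z^2$ (the '+' branch) recursively:

```python
def bec_bit_channels(p, m):
    """Erasure probabilities z^(i) of the n = 2^m bit-channels of BEC(p),
    in the order w^(1), ..., w^(n)."""
    z = [p]
    for _ in range(m):
        z = [f(x) for x in z for f in (lambda x: 2*x - x*x,   # '-' branch
                                       lambda x: x*x)]        # '+' branch
    return z

print(bec_bit_channels(0.5, 2))   # n=4 -> [0.9375, 0.5625, 0.4375, 0.0625]
```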
A proof of polarization for the BEC
There are $n=2^m$ bit-channels; recall that the sum-capacity is preserved, $\sum_{i=1}^n c(w^{(i)})=n\cdot c(w)$. Define
$$T_n=\frac 1 n\sum_{i=1}^n z^{(i)}\big(1-z^{(i)}\big)$$
where $z^{(i)}$ is the erasure probability of $w^{(i)}$, the $i$-th bit-channel. It is sufficient to prove that
$$\lim_{n\rightarrow\infty} T_n=0$$
since $z(1-z)$ is small only when $z$ is near 0 or 1, i.e., when the bit-channel is nearly perfect or nearly useless.
For one polarization step:
$$z^2(1-z^2)+(2z-z^2)(1-2z+z^2)=2z(1-z)\big(1-z(1-z)\big)$$
Define $\alpha_i=z^{(i)}\big(1-z^{(i)}\big)$.
$$T_{2n}=\frac 1 {2n}\sum_i 2\alpha_i(1-\alpha_i)=\frac 1 n \sum_i \big(\alpha_i-\alpha^2_i\big)= \frac 1 n\sum_i \alpha_i-\frac 1 n\sum_i \alpha_i^2 \,\le\, T_n-T_n^2$$
using the lemma (Cauchy–Schwarz):
$$\frac {\sum_i \alpha_i^2} n\ge\Big(\frac {\sum_i \alpha_i} n\Big)^2$$
Note that the sequence $\{T_n\}_{n\ge1}$ is positive and strictly decreasing $\rightarrow \lim_{n\rightarrow\infty} T_n$ exists.
Let $T_\infty=\lim_{n\rightarrow\infty} T_n$; taking limits in $T_{2n}\le T_n-T_n^2$ gives $T_\infty\le T_\infty-T^2_\infty\Rightarrow T_\infty=0$.
Now let
$$\beta_n=\frac {|\{i:\varepsilon\le z^{(i)}\le1-\varepsilon\}|}n$$
for some $\varepsilon>0$ (the fraction of un-polarized bit-channels). Note that
$$T_n\ge\beta_n\cdot\varepsilon(1-\varepsilon)$$
so for any fixed $\varepsilon$, $\beta_n\rightarrow0$ since $T_n\rightarrow0$.
Polarization transform
$$x_1=u_1+u_2,\quad x_2=u_2 \qquad\rightarrow\qquad [x_1\;\;x_2]=[u_1\;\;u_2]\begin{bmatrix}1&0\\1&1 \end{bmatrix}=[u_1\;\;u_2]\,G_2$$
$$G_{2n}=\begin{bmatrix} G_n&0_{n\times n}\\G_n&G_n \end{bmatrix}=G_2\otimes G_n,\qquad G_n=\underbrace{G_2\otimes G_2\otimes \ldots\otimes G_2 }_{\text{$m$ times, } m=\log_2 n} = G_2^{\otimes m} \;\;\text{(Kronecker power)}$$
Kronecker product of $A_{m\times n}$ and $B_{p \times q}$:
$$A\otimes B=\begin{bmatrix} a_{11}B&\ldots&a_{1n}B\\ \vdots& &\vdots\\a_{m1}B&\ldots&a_{mn}B \end{bmatrix}_{mp\times nq}$$
$$G_4=\begin{bmatrix} 1&0&0&0\\ 1&1&0&0\\1&0&1&0\\1&1&1&1\end{bmatrix}_{4\times 4}$$
(Replacing '0' with '$-1$' yields a Hadamard matrix.)
$$\Rightarrow G_n^{-1}=G_n,\;\text{ i.e., } G_n\times G_n=I_{n\times n}\ \text{over GF(2)}:\ G_n\ \text{is self-inverse}$$
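A quick NumPy sketch constructing $G_n$ as a Kronecker power and checking that it is self-inverse over GF(2):

```python
import numpy as np
from functools import reduce

def polar_G(m):
    """G_n = G_2 Kronecker-powered m times, n = 2^m."""
    G2 = np.array([[1, 0],
                   [1, 1]])
    return reduce(np.kron, [G2] * m)

G8 = polar_G(3)
print(np.array_equal(np.mod(G8 @ G8, 2), np.eye(8, dtype=int)))  # True: self-inverse
```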
Encoding complexity
$u_{1\times n}\times G_n$ can be done with $O(n\log n)$ complexity:
```
function x = G_multiplier(u)
% computes x = u * G_n over GF(2), where n = length(u) is a power of 2
n = length(u);
if n == 1
    x = u;
    return
end
x1 = G_multiplier(u(1:n/2));      % left half
x2 = G_multiplier(u(n/2+1:n));    % right half
x = [mod(x1 + x2, 2), x2];        % combine: [x1+x2, x2]
end
```
(The computations of x1 and x2 can be done in parallel; x1 + x2 is entry-wise addition.)
Output of the function: $x_{1\times n}=u_{1\times n}G_n$.
$f(n)$ = number of operations to compute $u_{1\times n}G_n$:
$$\begin{cases} f(n)=2f\big(\frac n 2\big)+\frac n 2\Rightarrow f(n)=\frac {n\log_2 n} 2 \\ f(1)=0 \end{cases}$$
Latency (time needed, assuming parallelization): let $g(n)$ denote the latency of computing $uG$ with the function G_multiplier:
$$g(n)=g\big(\frac n 2\big)+1\Rightarrow g(n)=\log_2 n\quad\text{(fast enough)}$$
polar code construction
Given length $n$, dimension $k$, and channel $w$:
pick the indices of the $k$ 'best' bit-channels $w^{(i)}$ in the polarization transform of length $n$.
The generator matrix for the $(n,k)$ polar code associated with $w$: from the matrix $G_{n\times n}$, select the rows indexed by the 'good' bit-channels.
Polar encoder
example:
$n=8$, $k=4$, for BEC(0.5).
$k=4$: pick the 4 best bit-channel indices: 4, 6, 7, 8.
$$u_{1\times 8}=[0\;\;0\;\;0\;\;m_1\;\;0\;\;m_2\;\;m_3\;\;m_4]$$
The message bits are $m_1, m_2, m_3, m_4$.
$\Rightarrow$ compute $u_{1\times 8}G_8$ to get the encoded codeword.
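Putting the pieces together, a Python sketch of this encoder (frozen positions set to 0, message bits in positions 4, 6, 7, 8, then multiply by $G_8$):

```python
import numpy as np
from functools import reduce

G8 = reduce(np.kron, [np.array([[1, 0], [1, 1]])] * 3)   # G_8 = Kronecker cube of G_2
good = [3, 5, 6, 7]                 # bit-channel indices 4, 6, 7, 8 (0-based)

def polar_encode(msg):
    u = np.zeros(8, dtype=int)      # frozen positions stay 0
    u[good] = msg
    return np.mod(u @ G8, 2)        # codeword x = u G_8

print(polar_encode(np.array([1, 0, 1, 1])))
```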
Decoding polar codes
Successive cancellation decoder
Let $A$ denote the set of indices of 'good' bit-channels selected for the code construction, $A\subseteq\{1,2,\ldots,n\}$. For $i=1,2,\ldots,n$, let $\hat u_i$ be the decoded version of $u_i$:
$$\hat u_i=\begin{cases} 0 &\text{if } i\notin A \\ \text{ML decision of } u_i \text{ given } y_1,y_2,\ldots,y_n \text{ and } \hat u_1,\hat u_2,\ldots,\hat u_{i-1} &\text{if } i\in A \end{cases}$$
Let $Pe(u_i)=Pe(w^{(i)})$ denote the probability of error in decoding $u_i$, assuming $\hat u_1^{i-1}=u_1^{i-1}$ (where $u_1^{i-1}$ denotes $u_1,u_2,\ldots,u_{i-1}$).
Lemma: Pe(the polar code associated with $A$ and decoded with SC) $\le \sum_{i\in A}Pe(u_i)$, where $Pe(u_i)$ is the probability of error of the individual bit-channel.
Proof: by the union bound on the error events $\hat u_i\ne u_i$ for the first (smallest) such $i$.
Going back to the construction of polar codes, there are two criteria:
- For a fixed rate: sort the bit-channels and pick the best $k=nR$, where $R$ is the given rate. That is, given block length $n=2^m$ and rate $R$, the dimension is $k=nR$; take the polarization transform of length $n$, split it into $n$ bit-channels $w^{(1)}, w^{(2)},\ldots,w^{(n)}$, sort them (according to capacity or probability of erasure), pick the best $k$ of them, and let $A=$ the set of indices of the selected/good ones.
- For a given bound on $Pe$: Pe(polar code associated with $A$ under SC) $\le \sum_{i\in A}Pe(u_i)$, where $Pe(u_i)$ is for bit-channel $w^{(i)}$. Sort the bit-channels from best to worst, $u_{\pi(1)}, u_{\pi(2)},\ldots,u_{\pi(n)}$ (the sorting permutation is according to $Pe(u_i)$; $u_n$ is always the best, $\pi(1)=n$, and $u_1$ is always the worst, $\pi(n)=1$). Then accumulate as many $u_{\pi(i)}$'s as possible, starting from $\pi(1)$, until the sum $\sum_{i=1}^k Pe(u_{\pi(i)})$ reaches the bound on $Pe$ (see the sketch below).
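A sketch of this selection rule in Python (using the BEC bit-channel erasure probabilities as the $Pe(u_i)$'s, as these notes do; function names are mine):

```python
def bec_bit_channels(p, m):
    """Erasure probabilities z^(i) of the n = 2^m bit-channels of BEC(p)."""
    z = [p]
    for _ in range(m):
        z = [f(x) for x in z for f in (lambda x: 2*x - x*x, lambda x: x*x)]
    return z

def select_good_channels(z, pe_bound):
    """Add bit-channels from best (smallest Pe) to worst while the union
    bound on Pe stays below pe_bound."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    A, total = [], 0.0
    for i in order:
        if total + z[i] >= pe_bound:
            break
        A.append(i + 1)             # 1-based bit-channel index
        total += z[i]
    return sorted(A), total

A, pe = select_good_channels(bec_bit_channels(0.5, 3), 1/3)
print(A, pe)                        # -> [6, 7, 8] 0.31640625  (= 81/256 < 1/3)
```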
Example:
For $n=8$, $k=4$, BEC($\frac 1 2$), with the requirement $Pe<\frac 1 3$:
$$\sum Pe=\frac 1 {256}+\frac {31} {256}+\frac {49} {256}=\frac {81} {256}<\frac {1} {3}\quad\text{(good)}$$
but adding $Pe(u_{\pi(4)})$ would make $\sum Pe$ greater than $\frac 1 3$; so $k=3$, and the set of good bit-channels is $A=\{8,7,6\}$.