Channel Capacity 3: Gaussian Channel

Reference:

Elements of Information Theory, 2nd Edition

Slides of EE4560, TUD

Differential Entropy

We now introduce the concept of differential entropy, which is the entropy of a continuous random variable.

Definition 1 (Differential Entropy):

The differential entropy $h(X)$ of a continuous random variable $X$ with density $f(x)$ is defined as
$$h(X)=-\int_S f(x)\log f(x)\,dx \tag{1}$$
where $S$ is the support set of $X$ (where $f(x)>0$).

  • $h(X)$ is sometimes denoted as $h(f)$, just as $H(X)$ is sometimes denoted as $H(p)$.
  • $\log$ here denotes $\log_2$. Do not forget to change the base.

Examples:

  • Uniform distribution on $[0,a]$:
    $$h(X)=-\int_{0}^{a}\frac{1}{a} \log \frac{1}{a}\,dx=\log a\tag{2}$$

    • Larger $a$ $\to$ larger uncertainty $\to$ larger $h(X)$.
    • For $0<a<1$, the differential entropy $\log a$ is negative! This is different from $H(X)$, which is always $\ge 0$.
    • However, $2^{h(X)}=2^{\log a}=a$ is always positive.
  • Normal distribution $X\sim\mathcal N(\mu,\sigma^2)$ (a numerical check of Eqs. (2) and (3) follows this list):
    $$\begin{aligned} h(f)&=-\int f(x)\log f(x)\,dx=-\int \frac{f(x)}{\ln 2}\left[-\frac{(x-\mu)^2}{2\sigma^2}-\ln \sqrt{2\pi \sigma^2} \right]dx\\ &=\frac{1}{\ln 2}\left[\frac{E(X-\mu)^2}{2\sigma^2}+\frac{1}{2}\ln (2\pi \sigma^2)\right]=\frac{1}{2}\log (2\pi e \sigma^2) \end{aligned}\tag{3}$$
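As a quick sanity check (my own sketch, not from the book or slides), the closed forms in Eqs. (2) and (3) can be compared with a direct numerical evaluation of Eq. (1). The helper `differential_entropy` and the parameter values are illustrative choices.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def differential_entropy(pdf, a, b):
    """Numerically evaluate h(X) = -integral of f(x) * log2 f(x) dx over [a, b]."""
    integrand = lambda x: -pdf(x) * np.log2(pdf(x)) if pdf(x) > 0 else 0.0
    value, _ = integrate.quad(integrand, a, b)
    return value

# Uniform(0, a): h(X) = log2(a), negative for a < 1, cf. Eq. (2)
a = 0.5
print(differential_entropy(lambda x: 1.0 / a, 0.0, a), np.log2(a))   # both are about -1.0 bit

# Normal(mu, sigma^2): h(X) = 0.5*log2(2*pi*e*sigma^2), cf. Eq. (3)
mu, sigma = 1.0, 2.0
print(differential_entropy(norm(mu, sigma).pdf, mu - 10 * sigma, mu + 10 * sigma),
      0.5 * np.log2(2 * np.pi * np.e * sigma**2))
```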

Definition 2 (Joint Differential Entropy):

The joint differential entropy $h(X_1,X_2,\cdots,X_n)$ of a set $X_1,X_2,\cdots,X_n$ of random variables with density $f(x_1,x_2,\cdots,x_n)$ is defined as
$$h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)\,dx^n \tag{4}$$
N.B. $x^n$ here is a shorthand for $(x_1,x_2,\cdots,x_n)$.

Definition 3 (Conditional Differential Entropy):

If $X, Y$ have a joint density function $f(x, y)$, we can define the conditional differential entropy $h(X|Y)$ as
$$h(X|Y)=-\int f(x, y) \log f(x|y)\, dx\, dy=h(X, Y)-h(Y)\tag{5}$$

Definition 4 (Mutual Information):

The mutual information $I(X;Y)$ between two random variables $X$ and $Y$ with joint density $f(x,y)$ is defined as
$$\begin{aligned} I(X;Y)&=\iint f(x,y)\log \frac{f(x,y)}{f(x)f(y)}\,dx\, dy\\ &=h(X)-h(X|Y)=h(Y)-h(Y|X)\\ &=h(X)+h(Y)-h(X,Y) \end{aligned}\tag{6}$$
N.B. $I(X;Y)\ge 0$, with equality if and only if $X$ and $Y$ are independent.
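To make Eq. (6) concrete, here is a small sketch (my own illustration, not from the reference) that evaluates the identities for a jointly Gaussian pair, using the standard closed form $h=\frac{1}{2}\log_2\big((2\pi e)^n|K|\big)$ for a Gaussian vector with covariance matrix $K$ (the $n=1$ case is exactly Eq. (3)); the correlation value $\rho=0.8$ is an arbitrary example.

```python
import numpy as np

def h_gaussian(K):
    """Differential entropy (bits) of a Gaussian vector with covariance K:
    h = 0.5 * log2((2*pi*e)^n * det(K))."""
    K = np.atleast_2d(K)
    n = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))

rho = 0.8                                # example correlation coefficient
K = np.array([[1.0, rho],
              [rho, 1.0]])               # covariance of the jointly Gaussian pair (X, Y)

hX, hY, hXY = h_gaussian(K[0, 0]), h_gaussian(K[1, 1]), h_gaussian(K)

I1 = hX + hY - hXY                       # h(X) + h(Y) - h(X, Y)
I2 = hX - (hXY - hY)                     # h(X) - h(X|Y), using Eq. (5)
print(I1, I2, -0.5 * np.log2(1 - rho**2))    # all three agree (about 0.737 bits)
```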

Gaussian Channels

The most important continuous-alphabet channel is the Gaussian channel. This is a time-discrete channel with output $Y_i$ at time $i$, where $Y_i$ is the sum of the input $X_i$ and the noise $Z_i$. The noise $Z_i$ is drawn i.i.d. from a Gaussian distribution with variance $N$. Thus,
$$Y_i=X_i+Z_i,\quad Z_i \sim \mathcal N(0,N) \tag{7}$$
The noise $Z_i$ is assumed to be independent of the signal $X_i$.

[Figure: the Gaussian channel, $Y_i = X_i + Z_i$]

The most common limitation on the input is an energy or power constraint. We assume an average power constraint: for any codeword $(x_1,x_2,\ldots,x_n)$ transmitted over the channel, we require that
$$\frac{1}{n}\sum_{i=1}^n x_i^2\le P \tag{8}$$
[Example: binary input with Gaussian noise, Slides 6-7]


Gaussian Channel Capacity

Definition 5 (Information Capacity):

The information capacity of the Gaussian channel with power constraint $P$ is
$$C=\max_{f(x):\, EX^{2} \leq P} I(X;Y)=\frac{1}{2} \log \left(1+\frac{P}{N}\right) \tag{9}$$
where the maximum is achieved when $X\sim \mathcal N(0,P)$.

Proof: Expanding $I(X;Y)$, we have
$$\begin{aligned} I(X;Y) &=h(Y)-h(Y|X) \\ &=h(Y)-h(X+Z|X) \\ &=h(Y)-h(Z|X) \\ &=h(Y)-h(Z) \end{aligned}$$
since $Z$ is independent of $X$. From Eq. (3), $h(Z)=\frac{1}{2} \log 2 \pi e N$. Also,
$$EY^{2}=E(X+Z)^{2}=EX^{2}+2\,EX\,EZ+EZ^{2}=P+N$$
since $X$ and $Z$ are independent and $EZ=0$. Given $EY^{2}=P+N$, the entropy of $Y$ is bounded by $\frac{1}{2} \log 2 \pi e(P+N)$ by Theorem 8.6.5 (the normal distribution maximizes the entropy for a given variance) [book, p. 254]. Applying this result to bound the mutual information, we obtain
$$\begin{aligned} I(X;Y) &=h(Y)-h(Z) \\ & \leq \frac{1}{2} \log 2 \pi e(P+N)-\frac{1}{2} \log 2 \pi e N \\ &=\frac{1}{2} \log \left(1+\frac{P}{N}\right) \end{aligned}$$
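As a sanity check on Eq. (9) (a minimal sketch of my own, not from the book), the snippet below simulates the channel with the capacity-achieving input $X\sim\mathcal N(0,P)$, estimates $EY^2$ from samples, and compares $h(Y)-h(Z)$ with the closed form $\frac{1}{2}\log_2(1+P/N)$; the values of $P$ and $N$ are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)

P, N = 4.0, 1.0              # power constraint and noise variance (example values)
n = 1_000_000                # number of channel uses to simulate

X = rng.normal(0.0, np.sqrt(P), n)   # capacity-achieving input X ~ N(0, P)
Z = rng.normal(0.0, np.sqrt(N), n)   # noise Z ~ N(0, N), independent of X
Y = X + Z                            # channel output, Eq. (7)

# Y is Gaussian, so h(Y) = 0.5*log2(2*pi*e*Var(Y)); the sample variance of Y is close to P + N
h_Y = 0.5 * np.log2(2 * np.pi * np.e * Y.var())
h_Z = 0.5 * np.log2(2 * np.pi * np.e * N)

print(h_Y - h_Z)                     # I(X;Y) = h(Y) - h(Z), about 1.16 bits
print(0.5 * np.log2(1 + P / N))      # closed-form capacity, Eq. (9)
```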
Next, it will be shown that this capacity is also the supremum of the rates achievable for the channel, i.e., the operational capacity.


Definition 6 (Code):

An $(M, n)$ code for the Gaussian channel with power constraint $P$ consists of the following:

  1. An index set $\{1,2, \ldots, M\}$.
  2. An encoding function $x:\{1,2, \ldots, M\} \rightarrow \mathcal{X}^{n}$, yielding codewords $x^{n}(1), x^{n}(2), \ldots, x^{n}(M)$, satisfying the power constraint $P$; that is, for every codeword
     $$\sum_{i=1}^{n} x_{i}^{2}(w) \leq n P, \quad w=1,2, \ldots, M$$
  3. A decoding function
     $$g: \mathcal{Y}^{n} \rightarrow\{1,2, \ldots, M\}$$

N.B. The rate is $R=\frac{\log M}{n}$, as defined for the discrete channel.

Definition 7 (Achievable):

A rate $R$ is said to be achievable for a Gaussian channel with a power constraint $P$ if there exists

  • a sequence of $(2^{nR}, n)$ codes
  • with codewords satisfying the power constraint
  • such that the maximal probability of error $\lambda^{(n)}$ tends to zero.

The capacity of the channel is the supremum of the achievable rates.

Theorem 1 (The capacity of a Gaussian channel):

The capacity of a Gaussian channel with power constraint $P$ and noise variance $N$ is
$$C=\frac{1}{2} \log \left(1+\frac{P}{N}\right) \quad \text{bits per transmission} \tag{10}$$
[Proof: book, pp. 266-268]

A plausibility argument:

[Figure: sphere-packing plausibility argument for the Gaussian channel capacity]


Band-Limited Channel

A common model for communication over a radio network or a telephone line is a band-limited channel with white noise. This is a continuous-time channel. The output of such a channel can be described as the convolution
$$Y(t)=(X(t)+Z(t))*h(t)\tag{11}$$
where

  • $Y(t)$ is the output signal waveform,
  • $X(t)$ is the input signal waveform,
  • $Z(t)$ is the white Gaussian noise waveform,
  • $h(t)$ is the impulse response of an ideal bandpass filter (which cuts out all frequencies greater than $W$).

Theorem 2 (The sampling theorem):

A function $f(t)$ that is band-limited to $W$ is completely determined by samples of the function spaced $\frac{1}{2W}$ seconds apart.

[Proof: book 271]
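A brief numerical illustration of Theorem 2 (my own sketch; the signal and bandwidth below are arbitrary choices): a band-limited signal is recovered from samples spaced $1/(2W)$ seconds apart via the sinc interpolation formula $f(t)=\sum_n f\!\left(\frac{n}{2W}\right)\,\mathrm{sinc}(2Wt-n)$.

```python
import numpy as np

W = 4.0                      # bandwidth in Hz (illustrative)
Ts = 1.0 / (2 * W)           # sample spacing 1/(2W) from the sampling theorem

def f(t):
    """Example signal band-limited to W: a sum of tones at 1 Hz and 3 Hz (both < W)."""
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

n = np.arange(-2000, 2001)   # a long sample window keeps the truncation error small
samples = f(n * Ts)

def reconstruct(t):
    """Sinc interpolation: f(t) = sum_n f(n*Ts) * sinc(2*W*t - n)."""
    return np.sum(samples * np.sinc(2 * W * t - n))

t_test = 0.123
print(f(t_test), reconstruct(t_test))   # the two values agree closely
```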

Now we can formulate the problem of communication over a band-limited channel:

  • Bandwidth $W$
  • Number of samples per second: $2W$
  • Signal power $P$
  • Noise power $N=N_0W$, where $N_0$ is the noise power spectral density

If the channel is used over the time interval $[0,T]$, then

  • the energy per sample is $\frac{PT}{2WT}=\frac{P}{2W}$
  • the noise variance per sample is $\frac{N_0 WT}{2WT}=\frac{N_0}{2}$

Using Theorem 1 (Eq. (10)), we can obtain the capacity per sample:
$$C=\frac{1}{2} \log \left(1+\frac{P/(2W)}{N_{0}/2}\right)=\frac{1}{2} \log \left(1+\frac{P}{N_{0} W}\right) \quad \text{bits per sample} \tag{12}$$
Since there are $2W$ samples per second, the capacity per second is
$$C=W\log \left( 1+\frac{P}{N_0W} \right) \quad \text{bits per second} \tag{13}$$
N.B. If $W\to \infty$, using $\ln (1+x)\sim x$ as $x\to 0$, then $C\to\frac{P \log e}{N_0}\ \mathrm{bps}$.
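A quick numerical look at Eq. (13) and the $W\to\infty$ limit (an illustrative sketch; the power and noise-density values are made up): capacity grows with bandwidth but saturates at $P\log_2 e/N_0$.

```python
import numpy as np

P, N0 = 1.0, 1e-3            # signal power (W) and noise PSD (W/Hz), example values

def capacity(W):
    """Band-limited AWGN capacity, Eq. (13): C = W * log2(1 + P/(N0*W)) bits per second."""
    return W * np.log2(1 + P / (N0 * W))

for W in [100.0, 1e3, 1e4, 1e5, 1e6]:
    print(W, capacity(W))

print(P * np.log2(np.e) / N0)     # infinite-bandwidth limit, about 1442.7 bits per second
```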

Definition 8 (Bandwidth Efficiency):

Bandwidth efficiency $\eta$ is defined as the rate $R$ (in $\mathrm{bit/s}$) divided by the bandwidth $W$ (in $\mathrm{Hz}$):
$$\eta=\frac{R}{W}~\mathrm{bit/s/Hz} \tag{14}$$
From the channel capacity formula it follows that
$$R<C=W \log \left(1+\frac{P}{W N_{0}}\right)=W \log \left(1+\frac{R E_{b}}{W N_{0}}\right)$$
where $E_{b}$ is the energy per bit (so $P=RE_b$). Dividing by $W$ gives
$$\eta<\log \left(1+\eta \frac{E_{b}}{N_{0}}\right), \text{ i.e., } \frac{E_{b}}{N_{0}}>\frac{2^{\eta}-1}{\eta} \tag{15}$$
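A short numerical check of the bound in Eq. (15) (my own sketch): the minimum required $E_b/N_0$ grows with the spectral efficiency $\eta$, and as $\eta\to 0$ it approaches the Shannon limit $\ln 2 \approx -1.59\ \mathrm{dB}$.

```python
import numpy as np

def ebn0_min_db(eta):
    """Minimum Eb/N0 in dB for bandwidth efficiency eta, from Eq. (15)."""
    return 10 * np.log10((2**eta - 1) / eta)

for eta in [0.01, 0.5, 1.0, 2.0, 4.0, 8.0]:
    print(eta, ebn0_min_db(eta))

print(10 * np.log10(np.log(2)))   # eta -> 0 limit: ln 2, about -1.59 dB
```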

[Figure: minimum $E_b/N_0$ versus bandwidth efficiency $\eta$]


Parallel Gaussian Channels

[Figure: $k$ parallel Gaussian channels]

Problem to be solved:
$$\begin{aligned} &\text{minimize} && -\sum_{j=1}^{k}C_j =-\sum_{j=1}^{k} \frac{1}{2}\log \left(1+\frac{P_j}{N_j} \right)\\ &\text{subject to} && \sum_{j=1}^{k} P_j \le P \end{aligned}\tag{16}$$
Using Lagrange multipliers gives the function
$$L(P_1,\cdots,P_k,\lambda)=-\sum_{j=1}^{k} \frac{1}{2}\log \left(1+\frac{P_j}{N_j} \right)+\lambda\left(\sum_{j=1}^{k} P_j -P\right)$$
KKT conditions:
$$\sum_{j=1}^{k} P_j \le P,\quad \lambda\ge 0$$
$$\nabla _{P_j}L=0 \Longrightarrow P_j=\frac{1}{2\lambda}-N_j$$
$$\lambda\left(\sum_{j=1}^{k} P_j - P\right)=0$$
Together with the condition that the $P_j$ are nonnegative, this gives the solution
$$P_j=\max \left\{0,\frac{1}{2\lambda}-N_j\right\}\triangleq(\nu-N_j)^+ \tag{17}$$
where $\nu$ is chosen such that $\sum_j (\nu -N_j)^+=P$.

This solution is illustrated graphically in Figure 9.4. The vertical levels indicate the noise levels in the various channels. As the signal power is increased from zero, we allot the power to the channels with the lowest noise. When the available power is increased still further, some of the power is put into noisier channels.

[Figure 9.4: water-filling for parallel Gaussian channels]

The process by which the power is distributed among the various bins is identical to the way in which water distributes itself in a vessel, hence this process is sometimes referred to as water-filling.
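A minimal water-filling sketch (my own illustration, not the slides' example): the water level $\nu$ is found by bisection so that $\sum_j(\nu-N_j)^+=P$, implementing Eq. (17); the noise levels and total power are arbitrary example values.

```python
import numpy as np

def water_filling(noise, P, tol=1e-9):
    """Return (powers, nu) with powers_j = (nu - N_j)^+ and sum(powers) = P (Eq. 17).

    The water level nu is found by bisection: sum((nu - N_j)^+) is nondecreasing
    in nu, and the solution lies in the interval [min(N), min(N) + P].
    """
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.min() + P
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise, 0.0).sum() > P:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    return np.maximum(nu - noise, 0.0), nu

noise = [1.0, 2.0, 4.0]      # noise levels N_j of three parallel channels
P_total = 3.0
powers, nu = water_filling(noise, P_total)
print(powers, nu)            # [2, 1, 0], nu = 3: the noisiest channel gets no power
print(sum(0.5 * np.log2(1 + p / n) for p, n in zip(powers, noise)))   # total capacity in bits
```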

[Example: Slides 23-25]


Gaussian Channels with Feedback

Feedback allows the input of the channel to depend on past values of the output:

[Figure: the Gaussian channel with feedback]

  • Capacity without feedback:
    $$\max _{\operatorname{tr}\left(K_{X}\right) \leq n P} \frac{1}{2 n} \log \frac{\left|K_{X}+K_{Z}\right|}{\left|K_{Z}\right|}\tag{18}$$
  • Capacity with feedback:
    $$\max _{\operatorname{tr}\left(K_{X}\right) \leq n P} \frac{1}{2 n} \log \frac{\left|K_{X+Z}\right|}{\left|K_{Z}\right|}\tag{19}$$

where each $K_{(\cdot)}$ is an $n \times n$ covariance matrix. A small numerical illustration of the two expressions follows.
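To make Eqs. (18) and (19) concrete, here is an illustrative sketch (the covariance matrices are invented for the example, not taken from the book). It only evaluates the two objectives for one fixed choice of $K_X$; with feedback the input may correlate with past noise, so $K_{X+Z}=K_X+K_{XZ}+K_{XZ}^{T}+K_Z$, where $K_{XZ}$ is the input-noise cross-covariance.

```python
import numpy as np

def objective(K_sum, K_Z):
    """(1/(2n)) * log2(|K_sum| / |K_Z|), the quantity maximized in Eqs. (18)/(19)."""
    n = K_Z.shape[0]
    return (1.0 / (2 * n)) * np.log2(np.linalg.det(K_sum) / np.linalg.det(K_Z))

K_Z = np.array([[1.0, 0.5],
                [0.5, 1.0]])    # colored (correlated) noise over n = 2 channel uses
K_X = np.array([[2.0, 0.0],
                [0.0, 2.0]])    # input covariance, tr(K_X) = n*P with P = 2
K_XZ = np.array([[0.0, 0.0],
                 [-0.6, 0.0]])  # feedback: X_2 depends on the past noise Z_1, Cov(X_2, Z_1) = -0.6

# Without feedback (Eq. 18): X independent of Z, so K_{X+Z} = K_X + K_Z
no_fb = objective(K_X + K_Z, K_Z)

# With feedback (Eq. 19): K_{X+Z} = K_X + K_XZ + K_XZ^T + K_Z
with_fb = objective(K_X + K_XZ + K_XZ.T + K_Z, K_Z)

print(no_fb, with_fb)   # with colored noise, the feedback expression is slightly larger here
```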

Remarks:

  • Memoryless channels: feedback does not increase capacity!

  • Channels with memory: feedback does increase capacity!

  • Feedback does not improve capacity by more than $\frac{1}{2}$ bit:
    $$C_{\text{with FB}} \leq C_{\text{without FB}}+\frac{1}{2} \tag{20}$$

  • Feedback does not improve capacity by more than a factor of two:
    $$C_{\text{with FB}} \leq 2\, C_{\text{without FB}} \tag{21}$$

  • Conclusion: feedback may help, but not much!
