Rate Distortion Theory

Reference:

Elements of Information Theory, 2nd Edition

Slides of EE4560, TUD


For a stationary discrete source, the minimum number of bits needed to represent the source signal with arbitrarily small probability of error is given by the entropy rate $H_\infty(X)$.

In many situations, however, it is not necessary to perfectly represent the source signal.

For instance, the description of an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect.

How well can we do? → Define the “goodness” of a representation of a source → Define a distortion measure

Given a source distribution and distortion measure,

  • What is the minimum expected distortion achievable at a particular bit rate? $D(R)$
  • What is the minimum rate description required to achieve a particular distortion? $R(D)$

Quantization

Let $\hat X(X)$ denote the representation of the random variable $X$. Using $R$ bits to represent $X$, the function $\hat X$ can take on $2^R$ values.

Problem: Find the optimal set of values for $\hat X$ and the regions that are associated with each value of $\hat X$.

An $L$-level quantizer is characterized by a set of $L+1$ decision levels or decision thresholds $x_0 < x_1 < \cdots < x_L$ and a set $\hat{\mathcal X} = \{\hat x_k,\, k = 1, \cdots, L\}$ such that $\hat x = \hat x_k$ if and only if $x_{k-1} \le x < x_k$, where $x_0 = -\infty$ and $x_L = \infty$.

The numbers $\hat x_k$ are called the reconstruction values or reproduction levels, and the intervals $\mathcal C_k = [x_{k-1}, x_k)$ are usually referred to as the decision intervals or quantization cells.

The map $\hat X : \mathcal X \mapsto \hat{\mathcal X}$, given by
$$\hat X(x) = \hat x_k \quad \text{for } x \in \mathcal C_k,\; k = 1, \cdots, L,$$
is a staircase function by definition.
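For concreteness, here is a minimal sketch of such a staircase map. The thresholds and reproduction levels in the usage example are arbitrary illustrative numbers, not optimized for any source:

```python
import bisect

def make_quantizer(thresholds, reps):
    """Build the staircase map X-hat: x -> x_hat_k for x in C_k = [x_{k-1}, x_k).

    `thresholds` holds the finite decision levels x_1 < ... < x_{L-1};
    x_0 = -inf and x_L = +inf are implicit.  `reps` holds the L
    reproduction levels x_hat_1, ..., x_hat_L.
    """
    assert len(reps) == len(thresholds) + 1

    def quantize(x):
        # bisect_right returns the index k of the cell containing x,
        # consistent with the half-open convention [x_{k-1}, x_k).
        return reps[bisect.bisect_right(thresholds, x)]

    return quantize
```

For example, `make_quantizer([-1.0, 0.0, 1.0], [-1.5, -0.5, 0.5, 1.5])` maps every input to the representative of its cell, e.g. $0.3 \mapsto 0.5$ and $-2 \mapsto -1.5$.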


In order to find an optimal quantizer, that is, to find optimal decision and reproduction levels, we need a rule for quantitatively assigning a distortion value to every possible approximation of the source samples.

Definition 1 (distortion measure):

A distortion function or distortion measure is a mapping
$$d : \mathcal{X} \times \hat{\mathcal{X}} \mapsto \mathbb{R}^{+}$$
from the set of source alphabet-reproduction alphabet pairs into the set of nonnegative real numbers. The distortion $d(x, \hat{x})$ is a measure of the cost of representing the symbol $x$ by the symbol $\hat{x}$.

Examples:

  • Hamming distortion (probability-of-error distortion measure)
    $$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{if } x \neq \hat{x} \end{cases}$$
    $$E\,d(X, \hat{X}) = \Pr(X = \hat{X}) \cdot 0 + \Pr(X \neq \hat{X}) \cdot 1 = \Pr(X \neq \hat{X})$$

  • Squared-error distortion
    $$d(x, \hat{x}) = (x - \hat{x})^2$$
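Both measures are one-liners, and the expectation can be estimated as an empirical average over sample pairs. The function names below are my own illustrative choices, not from the reference text:

```python
def hamming_distortion(x, x_hat):
    """d(x, x_hat) = 0 if x == x_hat, else 1."""
    return 0 if x == x_hat else 1

def squared_error(x, x_hat):
    """d(x, x_hat) = (x - x_hat)**2."""
    return (x - x_hat) ** 2

def expected_distortion(pairs, d):
    """Empirical average of d over (x, x_hat) pairs; for the Hamming
    measure this estimates Pr(X != X_hat)."""
    return sum(d(x, x_hat) for x, x_hat in pairs) / len(pairs)
```

For instance, over the pairs `[(0, 0), (1, 0), (1, 1), (0, 1)]` the empirical Hamming distortion is $0.5$, matching the fraction of mismatches.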

Assume a squared-error distortion measure. What are the optimal reproduction levels and optimal quantization cells?

That is, we wish to find the function $\hat X(X)$ such that $\hat X$ takes on at most $L = 2^R$ values and minimizes $E(X - \hat X)^2$:
$$E(X-\hat{X})^{2}=\sum_{k=1}^{L} \int_{\mathcal{C}_{k}}\left(x-\hat{x}_{k}\right)^{2} p(x)\, dx \tag{1}$$

  • If the quantization cells $\mathcal C_k$ are known:

The optimal reproduction levels are found by
$$\left.\frac{\partial E(X-\hat{X})^{2}}{\partial \hat{x}_{k}}\right|_{\hat{x}_{k}=\hat{x}_{k}^{*}} = -2 \int_{x \in \mathcal{C}_{k}}\left(x-\hat{x}_{k}^{*}\right) p(x)\, dx = 0,$$
so that
$$\hat{x}_{k}^{*}=\frac{\int_{x \in \mathcal{C}_{k}} x\, p(x)\, dx}{\int_{x \in \mathcal{C}_{k}} p(x)\, dx}.$$
Since
$$\int_{x \in \mathcal{C}_{k}} p(x)\, dx = \Pr\left(x \in \mathcal{C}_{k}\right),$$
we have, using Bayes' rule, that
$$\frac{p(x)}{\Pr\left(x \in \mathcal{C}_{k}\right)} = \frac{p\left(x \mid x \in \mathcal{C}_{k}\right)}{\Pr\left(x \in \mathcal{C}_{k} \mid x\right)},$$
so that
$$\hat{x}_{k}^{*}=\int_{x \in \mathcal{C}_{k}} x\, \frac{p(x)}{\Pr\left(x \in \mathcal{C}_{k}\right)}\, dx = \int_{x \in \mathcal{C}_{k}} x\, \frac{p\left(x \mid x \in \mathcal{C}_{k}\right)}{1}\, dx = E\left(X \mid x \in \mathcal{C}_{k}\right). \tag{2}$$
This is the conditional mean, or centroid, of quantization cell $\mathcal{C}_{k}$.
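The centroid condition (2) can be checked numerically. The sketch below is my own midpoint-rule integration, not from the slides; it approximates $E(X \mid X \in [0, \infty))$ for a standard Gaussian, whose closed form is the half-normal mean $\sqrt{2/\pi} \approx 0.798$:

```python
import math

def centroid(pdf, a, b, steps=100_000):
    """Midpoint-rule approximation of E[X | X in [a, b)], i.e. the
    ratio of the integrals of x*p(x) and p(x) over the cell."""
    h = (b - a) / steps
    num = den = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        p = pdf(x)
        num += x * p
        den += p
    return num / den  # the common step width h cancels in the ratio

def gauss(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
```

Truncating the half-line $[0,\infty)$ at $8$ loses only a negligible tail mass, so `centroid(gauss, 0.0, 8.0)` agrees with $\sqrt{2/\pi}$ to several decimal places.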

  • If the reproduction levels $\hat x_k$ are known:

Given a set $\{\hat{x}_{i}\}$ of reconstruction points, the distortion is minimized by mapping a source random variable to the representation $\hat{x}_{i}$ that is closest to it. The partition of $\mathcal{X}$ into regions defined by this mapping is called a Voronoi partition.

  • The Voronoi regions are determined by the optimal reproduction points, whereas the optimal reproduction points are obtained given the Voronoi regions. How to solve this problem?

Iterative descent algorithm (Lloyd, 1957):

  1. start with an initial collection of reproduction points
  2. optimize the partitions for these levels by using a minimum distortion mapping (nearest neighbour quantization)
  3. optimize the set of reproduction levels for the given partition (replace the old values by the centroids of the partition cells)

The alternation is continued until convergence to a local, if not global, optimum.
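The three steps above can be sketched for a scalar source on empirical samples as follows. The quantile initialization and the fixed iteration count are my own choices, not prescribed by the algorithm; a production version would instead stop when the distortion stops decreasing:

```python
import numpy as np

def lloyd_quantizer(samples, levels, iters=50):
    """Lloyd's iterative descent on empirical data: alternate the
    nearest-neighbour partition (step 2) with the centroid condition
    (step 3) until the quantizer settles."""
    # Step 1: initial reproduction points from evenly spaced sample quantiles.
    reps = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Step 2: minimum-distortion (nearest-neighbour) mapping.
        cells = np.argmin(np.abs(samples[:, None] - reps[None, :]), axis=1)
        # Step 3: replace each level by the centroid of its cell.
        for k in range(levels):
            members = samples[cells == k]
            if members.size:  # keep the old level if a cell is empty
                reps[k] = members.mean()
    distortion = np.mean((samples - reps[cells]) ** 2)
    return reps, distortion
```

For a standard Gaussian and $L = 4$ levels this converges toward the well-known Lloyd-Max points, approximately $\pm 0.453$ and $\pm 1.51$, with mean squared error close to $0.118$.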


Instead of quantizing a single random variable, let us assume that we are given a set of $n$ i.i.d. random variables $X_1, \ldots, X_n$ drawn from a Gaussian distribution, which we want to represent by $nR$ bits.

  • we will represent the entire sequence by a single index taking $2^{nR}$ values

  • this treatment of entire sequences at once achieves a lower distortion at a given rate than independent quantization of the individual samples

    Apparently, rectangular grid points (arising from independent descriptions) do not fill up the space efficiently.


Definition 2 (dimensionless normalized second moment of inertia):

Let $\nu$ denote the volume of a quantization cell. The dimensionless normalized second moment of inertia $G(\mathcal{C}_{k})$ of a quantization cell is defined by
$$G\left(\mathcal{C}_{k}\right)=\frac{1}{n\, \nu^{1+2 / n}} \int_{\mathcal{C}_{k}}\left\|x-\hat{x}_{k}\right\|^{2} dx.$$
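As a quick numerical sanity check of this definition (my own sketch, not part of the reference text): for a one-dimensional cell $[-\Delta/2, \Delta/2]$ with reproduction point $0$ we have $n = 1$ and $\nu = \Delta$, and the integral evaluates to $G = 1/12$ regardless of $\Delta$:

```python
def second_moment_1d(delta, steps=100_000):
    """G(C) for the 1-D cell C = [-delta/2, delta/2] with x_hat = 0.

    Here n = 1 and the cell volume is nu = delta, so
    G = (1 / delta**3) * integral of x**2 over C (midpoint rule).
    """
    h = delta / steps
    integral = sum(((i + 0.5) * h - delta / 2.0) ** 2 * h
                   for i in range(steps))
    return integral / delta ** 3
```

For comparison, the hexagonal cell in two dimensions achieves $G = 5/(36\sqrt{3}) \approx 0.0802 < 1/12 \approx 0.0833$, which is one way to see why joint descriptions can beat independent (rectangular-grid) ones.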
