Data Compression

Reference:

Elements of Information Theory, 2nd Edition

Slides of EE4560, TUD


Problem description:

Let $X_1, X_2, \cdots, X_n$ be independent, identically distributed random variables drawn from the probability mass function $p(x)$. We wish to find short descriptions for such sequences of random variables.

How to solve?

Assign short descriptions to the most frequent outcomes of the data source and, necessarily, longer descriptions to the less frequent outcomes.

Most frequent outcomes? $\to$ Typical sequences $\to$ AEP $\to$ Data compression

Consequences of the AEP: Data Compression

We divide all sequences in $\mathcal X^n$ into two sets: the typical set $A_\epsilon^{(n)}$ and its complement, the non-typical set $\overline{A_\epsilon^{(n)}}$.

  • Order all elements in $A_\epsilon^{(n)}$ and $\overline{A_\epsilon^{(n)}}$ and represent each element by an index
  • Since $\left|A_\epsilon^{(n)}\right| \leq 2^{n(H(X)+\epsilon)}$, indexing the sequences in $A_\epsilon^{(n)}$ requires no more than $n(H(X)+\epsilon)+1$ bits; the extra bit is needed in case $n(H(X)+\epsilon)$ is not an integer
  • Since $\left|\overline{A_\epsilon^{(n)}}\right| \leq |\mathcal X|^n$, we can index each sequence in $\overline{A_\epsilon^{(n)}}$ using no more than $n\log|\mathcal X|+1$ bits, where $|\mathcal X|$ is the cardinality (number of elements) of the source alphabet
  • To distinguish between $A_\epsilon^{(n)}$ and $\overline{A_\epsilon^{(n)}}$, we need one additional bit; the sketch below adds up these contributions
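
As a small numerical illustration (not taken from the slides), the following sketch, assuming a Bernoulli(0.2) source and arbitrarily chosen $n$ and $\epsilon$, adds up the description lengths of the two-set scheme above: $n(H(X)+\epsilon)+2$ bits for a typical sequence and $n\log|\mathcal X|+2$ bits for a non-typical one.

```python
import math

# Hypothetical example: Bernoulli(p) source over a binary alphabet, |X| = 2.
p = 0.2        # P(X = 1); chosen only for illustration
n = 1000       # block length
eps = 0.05     # epsilon in the definition of the typical set

# Entropy of the source in bits per symbol.
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

alphabet_size = 2

# Typical sequence: n(H + eps) + 1 index bits, plus 1 flag bit.
bits_typical = n * (H + eps) + 1 + 1

# Non-typical sequence: n log|X| + 1 index bits, plus 1 flag bit.
bits_nontypical = n * math.log2(alphabet_size) + 1 + 1

print(f"H(X) = {H:.3f} bits/symbol")
print(f"typical sequence:     <= {bits_typical:.1f} bits (~{bits_typical / n:.3f} bits/symbol)")
print(f"non-typical sequence: <= {bits_nontypical:.1f} bits (~{bits_nontypical / n:.3f} bits/symbol)")
```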


Note the following features of the coding scheme:

  • The typical sequences have short descriptions of length $\approx nH(X)$

  • We used a brute-force method to enumerate the elements in $\overline{A_\epsilon^{(n)}}$, without taking into account the fact that the number of elements in $\overline{A_\epsilon^{(n)}}$ is less than the number of elements in $\mathcal X^n$

  • The code is one-to-one and easily decodable; the initial bit acts as a flag bit to indicate the length of the codeword that follows

We use the notation $x^n$ to denote a sequence $x_1, x_2, \ldots, x_n$. Let $l(x^n)$ be the length of the codeword corresponding to $x^n$. If $n$ is sufficiently large so that $\Pr\left\{A_\epsilon^{(n)}\right\} \geq 1-\delta$, the expected length of the codeword is
$$
\begin{aligned}
E\left(l\left(X^{n}\right)\right) &= \sum_{x^{n}} p\left(x^{n}\right) l\left(x^{n}\right)\\
&= \sum_{x^{n}\in A_\epsilon^{(n)}} p\left(x^{n}\right) l\left(x^{n}\right)+\sum_{x^{n}\in \overline{A_\epsilon^{(n)}}} p\left(x^{n}\right) l\left(x^{n}\right)\\
&\le \sum_{x^{n}\in A_\epsilon^{(n)}} p\left(x^{n}\right) \left(n(H(X)+\epsilon)+2\right)+\sum_{x^{n}\in \overline{A_\epsilon^{(n)}}} p\left(x^{n}\right) \left(n\log |\mathcal X|+2\right)\\
&= \Pr\left\{A_{\epsilon}^{(n)}\right\}\left(n(H(X)+\epsilon)+2\right)+\Pr\left\{\overline{A_{\epsilon}^{(n)}}\right\}\left(n\log |\mathcal X|+2\right)\\
&\le \left(n(H(X)+\epsilon)+2\right)+\delta\left(n\log |\mathcal X|+2\right)\\
&= n\left[H(X)+\epsilon+\frac{2}{n}+\delta\left(\log |\mathcal X|+\frac{2}{n}\right)\right]\\
&= n\left[H(X)+\epsilon'\right]
\end{aligned}
$$
where $\epsilon'$ can be made arbitrarily small by an appropriate choice of $n$ (and of $\epsilon$ and $\delta$). Hence we have proved the following theorem.

Theorem 1:

Let $X^n$ be i.i.d. $\sim p(x)$. Let $\epsilon > 0$. Then there exists a code that maps sequences $x^n$ of length $n$ into binary strings such that the mapping is one-to-one (and therefore invertible) and
$$
E\left[\frac{1}{n} l\left(X^{n}\right)\right] \leq H(X)+\epsilon \tag{1}
$$
for $n$ sufficiently large.

Thus, we can represent sequences $X^n$ using $nH(X)$ bits on average.

How can the probability of error be made arbitrarily small? And what if the code alphabet is not binary?

Theorem 2 (Source coding theorem):

Given a discrete memoryless i.i.d. source $\{X_n, n\in \mathbb Z\} \sim p(x^n)$, we can encode source messages of length $n$ into codewords of length $l$ from a code alphabet of size $r$ with arbitrarily small probability of error $P_e \le \delta$ if and only if
$$
r^l \ge 2^{n(H(X)+\epsilon)} \tag{2}
$$
Proof:

The number of elements in $A_\epsilon^{(n)}$ satisfies $\left|A_\epsilon^{(n)}\right| \le 2^{n(H(X)+\epsilon)} \le r^l$, so that the number of codewords is larger than the number of typical source words, and $P_e \le \Pr\left(\overline{A_\epsilon^{(n)}}\right) \le \delta$ (by the AEP).
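
Rearranged, condition (2) says $l \ge n(H(X)+\epsilon)/\log_2 r$. A one-line check of this requirement, with the values of $n$, $H(X)$, $\epsilon$, and $r$ assumed purely for illustration:

```python
import math

n, H, eps, r = 1000, 0.72, 0.05, 4   # assumed values, for illustration only
l_min = math.ceil(n * (H + eps) / math.log2(r))   # smallest l with r**l >= 2**(n*(H+eps))
print(f"codewords must have length l >= {l_min} symbols over an alphabet of size {r}")
```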

Remark:

  • Code construction based on typical sets requires long source sequences ($n\to \infty$).
  • For short sequences, this coding recipe leads to an inefficient representation of the information.
  • Can we do better than this for small $n$? (see the Shannon Code and Huffman Code sections below)

Source Codes

Information produced by a discrete information source is represented using the alphabet $\mathcal X = \{x_1, \cdots, x_k\}$.

Definition 1 (Source code):

A source code $C$ for a random variable $X$ is a mapping $C : \mathcal X \mapsto \mathcal C$, the set of finite-length strings of symbols from an $r$-ary alphabet. $C(x)$ denotes the codeword corresponding to $x$, and its length is denoted by $l(x)$.

Definition 2 (Extension):

The extension $C^{*}$ of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings over the code alphabet, defined by
$$
C\left(x_{1} x_{2} \cdots x_{n}\right)=C\left(x_{1}\right) C\left(x_{2}\right) \cdots C\left(x_{n}\right)
$$
where $C\left(x_{1}\right) C\left(x_{2}\right) \cdots C\left(x_{n}\right)$ indicates concatenation of the corresponding codewords.

E.g., if $C\left(x_{1}\right)=00$ and $C\left(x_{2}\right)=11$, then $C\left(x_{1} x_{2}\right)=0011$.



Definition 3 (Non-singular):

A non-singular code is a code that uniquely maps each of the source symbols $x\in \mathcal X$ into a codeword $C(x)$. That is,
$$
x_i \ne x_j \Longrightarrow C(x_i) \ne C(x_j)
$$
Definition 4 (Uniquely decodable):

A code is uniquely decodable if and only if its $n$-extension is non-singular for all $n$. That is,
$$
\{x_1,\cdots,x_n\}_1 \ne \{x_1,\cdots,x_n\}_2 \Longrightarrow [C(x_1),\cdots,C(x_n)]_1 \ne [C(x_1),\cdots,C(x_n)]_2
$$
Definition 5 (Prefix/Instantaneous code):

A code is called a prefix or instantaneous code if no codeword is a prefix of any other codeword. It can be decoded without reference to future codewords since the end of a codeword is immediately recognizable.

Remark: A prefix code can be represented by an $r$-ary tree ($r$ is the size of the code alphabet), where each codeword corresponds to a leaf of the pruned tree.


The branches of the tree represent the symbols of the codeword. For example, the $r$ branches arising from the root node represent the $r$ possible values of the first symbol of the codeword. Then each codeword is represented by a leaf on the tree. The path from the root traces out the symbols of the codeword. The prefix condition on the codewords implies that no codeword is an ancestor of any other codeword on the tree. Hence, each codeword eliminates its descendants as possible codewords.


Examples:

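A small sketch with hypothetical codes for a four-symbol alphabet (not the codes from the slides), checking the non-singular condition of Definition 3 and the prefix condition of Definition 5 directly:

```python
# Hypothetical example codes for symbols x1..x4 (not the ones from the slides).
codes = {
    "singular":      {"x1": "0", "x2": "0",   "x3": "1",   "x4": "1"},
    "non-singular":  {"x1": "0", "x2": "010", "x3": "01",  "x4": "10"},
    "instantaneous": {"x1": "0", "x2": "10",  "x3": "110", "x4": "111"},
}

def is_non_singular(code):
    """Definition 3: distinct source symbols get distinct codewords."""
    words = list(code.values())
    return len(set(words)) == len(words)

def is_prefix_free(code):
    """Definition 5: no codeword is a prefix of any other codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

for name, code in codes.items():
    print(f"{name:13s}  non-singular: {is_non_singular(code)}  prefix-free: {is_prefix_free(code)}")
```

The second code is non-singular but not instantaneous, since 0 is a prefix of 01 and 010; the third is a prefix code.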

N.B. Morse code is non-singular, uniquely decodable, but not instantaneous.

Kraft Inequality

We wish to construct instantaneous codes of minimum expected length to describe a given source. It is clear that we cannot assign short codewords to all source symbols and still be prefix-free. The set of codeword lengths possible for instantaneous codes is limited by the following inequality.

Theorem 3 (Kraft inequality):

For any instantaneous code over an alphabet of size $r$, the codeword lengths $l(x_1),\cdots,l(x_k)$ must satisfy the inequality
$$
\sum_{i=1}^k r^{-l(x_i)} \le 1 \tag{3}
$$
Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.

Proof:

Let $l_{\max}$ be the length of the longest codeword. A codeword at level $l\left(x_{i}\right)$ has $r^{l_{\max}-l\left(x_{i}\right)}$ descendants at level $l_{\max}$. In order to be a prefix code, each of these descendant sets must be disjoint, and the total number of nodes in these sets is at most $r^{l_{\max}}$. Hence, summing over all the codewords, we obtain
$$
\sum_{i=1}^{k} r^{l_{\max}-l\left(x_{i}\right)} \leq r^{l_{\max}}
$$
and thus
$$
\sum_{i=1}^{k} r^{-l\left(x_{i}\right)} \leq 1
$$
N.B. For any countably infinite set of codewords that form a prefix code, the codeword lengths also satisfy the Kraft inequality, i.e.,
$$
\sum_{i=1}^\infty r^{-l(x_i)} \le 1
$$
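
A minimal sketch of both directions of Theorem 3, under assumed codeword lengths: it computes the Kraft sum exactly and, when the inequality holds, realizes the converse by assigning consecutive $r$-ary intervals to the lengths in increasing order (one standard construction, not necessarily the one in the slides).

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft inequality (3), computed exactly."""
    return sum(Fraction(1, r ** l) for l in lengths)

def prefix_code_from_lengths(lengths, r=2):
    """Build a prefix code with the given lengths (assumes the Kraft sum is <= 1).

    Codewords are assigned in order of increasing length: the codeword of
    length l is the l-digit r-ary expansion of the running Kraft sum, so the
    codeword intervals are consecutive and disjoint.
    """
    assert kraft_sum(lengths, r) <= 1, "lengths violate the Kraft inequality"
    code, acc = [], Fraction(0)
    for l in sorted(lengths):
        digits, frac = [], acc
        for _ in range(l):                  # first l r-ary digits of acc
            frac *= r
            digits.append(str(int(frac)))
            frac -= int(frac)
        code.append("".join(digits))
        acc += Fraction(1, r ** l)          # move past this codeword's interval
    return code

lengths = [1, 2, 3, 3]                      # assumed lengths; Kraft sum equals 1
print(kraft_sum(lengths))                   # 1
print(prefix_code_from_lengths(lengths))    # ['0', '10', '110', '111']
```

For the lengths $(1,2,3,3)$ and $r=2$ the Kraft sum equals 1, and the construction returns the prefix code $0, 10, 110, 111$.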

Optimal Codes

What is the minimum expected length of the prefix code? How do we find the prefix code with minimum expected length?

Bounds on the optimal code length

Consider a constrained optimization problem:
$$
\min_{l(x_i)} \sum_{i=1}^k p(x_i)\,l(x_i) \quad \text{subject to } \sum_{i=1}^k r^{-l(x_i)} \le 1 \tag{4}
$$
Lagrange multiplier technique:
$$
\min_{l\left(x_{i}\right)} J\left(l\left(x_{i}\right), \lambda\right)=\min_{l\left(x_{i}\right)}\left(\sum_{i=1}^{k} p\left(x_{i}\right) l\left(x_{i}\right)+\lambda\left(\sum_{i=1}^{k} r^{-l\left(x_{i}\right)}-1\right)\right)
$$
Hence
$$
\begin{aligned}
&\frac{\partial J}{\partial l\left(x_{i}\right)}=p\left(x_{i}\right)-\lambda r^{-l\left(x_{i}\right)} \ln r = 0 \quad \Rightarrow \quad r^{-l^{*}\left(x_{i}\right)}=\frac{p\left(x_{i}\right)}{\lambda \ln r} \\
&\sum_{i=1}^{k} r^{-l^{*}\left(x_{i}\right)} \leq 1 \;\Rightarrow\; \lambda \ln r \geq 1 \;\Rightarrow\; l^{*}\left(x_{i}\right) \geq -\log_{r} p\left(x_{i}\right)
\end{aligned}
$$
The average codelength, $E\,l(X)$, then becomes
$$
\sum_{i=1}^{k} p\left(x_{i}\right) l^{*}\left(x_{i}\right) \geq -\sum_{i=1}^{k} p\left(x_{i}\right) \log_{r} p\left(x_{i}\right)=H(X)
$$
As a consequence, we have that for any instantaneous code
$$
\operatorname{E} l(X) \geq H(X) \tag{5}
$$
with equality iff $r^{-l^{*}\left(x_{i}\right)}=p\left(x_{i}\right)$.

In the case that $-\log p\left(x_{i}\right)$ is not an integer, we should choose a set of codeword lengths "close" to the optimal set. Shannon suggested rounding up to the nearest integer:
$$
-\log p\left(x_{i}\right) \leq l\left(x_{i}\right) < -\log p\left(x_{i}\right)+1
$$
This choice satisfies Kraft's inequality, and we conclude that the expected length of the optimal code for a given source distribution satisfies
$$
H(X) \leq E\, l(X) \leq H(X)+1 \tag{6}
$$

  • There is an overhead of at most 1 bit per symbol, due to the fact that $-\log p\left(x_{i}\right)$ is not always an integer
  • The overhead can be reduced by combining symbols into sequences

Encoding of sequences of length $n$:
$$
H\left(X_{1}, \ldots, X_{n}\right) \leq E\, l\left(X_{1}, \ldots, X_{n}\right)<H\left(X_{1}, \ldots, X_{n}\right)+1
$$
Define $L_n$ to be the expected codeword length per input symbol, that is,
$$
L_n=\frac{1}{n}E\,l(X_1,\cdots,X_n) \tag{7}
$$
Assuming symbols are drawn i.i.d. according to $p\left(x^{n}\right)$, we have that $H\left(X_{1}, \ldots, X_{n}\right)=n H(X)$ and we conclude that
$$
H(X) \leq L_n<H(X)+\frac{1}{n} \tag{8}
$$

Equation (8) parallels the bound in (1): in both cases, roughly $H(X)$ bits per source symbol suffice once long blocks are encoded.
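
A small sketch of this effect, assuming a Bernoulli(0.9) source: it applies the rounded-up lengths $\lceil -\log_2 p \rceil$ from the previous subsection to blocks of $n$ symbols and prints the per-symbol rate $L_n$, which falls toward $H(X)$ as $n$ grows.

```python
import math
from itertools import product

p1 = 0.9                                  # assumed P(X = 1)
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

def per_symbol_rate(n):
    """L_n when each block of n symbols gets a codeword of length ceil(-log2 p(block))."""
    total = 0.0
    for block in product([0, 1], repeat=n):
        p = math.prod(p1 if b == 1 else 1 - p1 for b in block)
        total += p * math.ceil(-math.log2(p))
    return total / n

print(f"H(X) = {H:.4f} bits/symbol")
for n in (1, 2, 4, 8):
    print(f"n = {n}: L_n = {per_symbol_rate(n):.4f}")
```

These block lengths satisfy the Kraft inequality, so a prefix code with them exists, and the resulting $L_n$ obeys the bound (8).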

For a sequence of symbols that is not necessarily i.i.d., we have the bound
$$
\frac{H\left(X_{1}, \ldots, X_{n}\right)}{n} \leq L_n<\frac{H\left(X_{1}, \ldots, X_{n}\right)}{n}+\frac{1}{n} \tag{9}
$$
For stationary processes, we have the entropy rate
$$
H_\infty (X)=\lim_{n\to \infty} \frac{H(X_1,\cdots,X_n)}{n}
$$
Therefore, $L_n \rightarrow H_{\infty}(X)$ as $n \rightarrow \infty$, which provides another justification for the definition of the entropy rate: it is the expected number of bits per symbol required to describe the process.




Do there exist uniquely decodable, non-instantaneous codes that achieve shorter expected codelengths?

We have the following result (by McMillan):

The codeword lengths of any uniquely decodable code must satisfy the Kraft inequality.

This rather surprising result implies that the class of uniquely decodable codes does not offer any further choices for the set of codeword lengths than the class of prefix codes!

Shannon Code

In Shannon coding, the symbols are arranged in order from most probable to least probable and assigned codewords by taking the first $l_i=\lceil -\log p(x_i) \rceil$ bits from the binary expansion of the cumulative probability $F(x_i)=\sum\limits_{j=1}^{i-1}p(x_j)$.
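
A minimal sketch of this construction, with the input distribution assumed purely for illustration: sort symbols by decreasing probability, accumulate $F(x_i)$, and emit the first $\lceil -\log_2 p(x_i)\rceil$ bits of its binary expansion.

```python
import math

def shannon_code(pmf):
    """Shannon code: first ceil(-log2 p) bits of the cumulative probability."""
    # Sort symbols from most probable to least probable.
    symbols = sorted(pmf, key=pmf.get, reverse=True)
    code, F = {}, 0.0
    for x in symbols:
        l = math.ceil(-math.log2(pmf[x]))
        # Take the first l bits of the binary expansion of F.
        bits, frac = [], F
        for _ in range(l):
            frac *= 2
            bits.append("1" if frac >= 1 else "0")
            frac -= int(frac)
        code[x] = "".join(bits)
        F += pmf[x]                     # cumulative probability of symbols so far
    return code

pmf = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}   # assumed distribution
print(shannon_code(pmf))
```

For the distribution in the sketch this yields the prefix code 00, 01, 100, 101, 110.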


The Shannon code is asymptotically optimal as $n\to \infty$. However, for finite $n$ the Shannon code may be much worse than the optimal code for some particular symbol.

For example, let $p(x_1)=0.99$ and $p(x_2)=0.01$. Obviously, an optimal code is $C(x_1)=0$ and $C(x_2)=1$. The Shannon code, though asymptotically optimal, assigns a codeword of length $\lceil \log 100 \rceil=7$ to $x_2$. Note that in this case $H(X)=0.08$ and $1< E\,l(X)=1.06 < H(X)+1$.

Huffman Code

Theorem 4 (Huffman coding):

Huffman coding is optimal, i.e., if $C^*$ is a Huffman code and $C'$ is any other uniquely decodable code, then $E\,l(C^*)\le E\,l(C')$.

Remarks:

  • The lengths are ordered inversely with the probabilities
  • The two longest codewords have the same length
  • Two of the longest codewords differ only in the last bit and correspond to the two least likely symbols

How to construct Huffman codes:

  • For the binary case, the Huffman code arranges the messages in order of decreasing probability and joins the two least probable source symbols together, resulting in a new message alphabet with one less symbol.
  • The new messages are reordered, after which the two least probable symbols are again joined together.
  • Repeat until only two symbols remain, which are joined into a single node (the root).
  • At every joining step, assign 0 and 1 to the two probabilities being joined.
  • To read off a codeword, start at a symbol and follow the joins up to the root, collecting the assigned 0s and 1s along the route; the codeword is this bit string in reversed order (see the sketch below).
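
A minimal heap-based sketch of this procedure (a standard formulation, not necessarily the slides'), with the input distribution assumed for illustration: the two least probable entries are repeatedly joined and 0 and 1 are prepended on the two joined branches, so prepending handles the "reverse" step automatically.

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Binary Huffman code for a distribution given as {symbol: probability}."""
    tiebreak = count()                       # breaks ties without comparing dicts
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword}).
    heap = [(p, next(tiebreak), {x: ""}) for x, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable
        p1, _, group1 = heapq.heappop(heap)  # second least probable
        merged = {}
        for x, word in group0.items():       # branch labelled 0
            merged[x] = "0" + word
        for x, word in group1.items():       # branch labelled 1
            merged[x] = "1" + word
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

pmf = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}   # assumed distribution
code = huffman_code(pmf)
avg_length = sum(pmf[x] * len(code[x]) for x in pmf)
print(code)
print(f"expected length = {avg_length:.2f} bits/symbol")
```

For the assumed distribution the expected length comes out to 2.20 bits per symbol, against an entropy of about 2.12 bits.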


N.B. The result is not unique: ties in the joining step may be broken differently, yielding a different code tree (and different codewords) with the same expected length.

Observations:

  • Huffman coding is not ideal, since it is a bottom-up approach that requires calculating the probabilities of all source sequences and constructing the corresponding complete code tree.
  • It cannot easily be extended to longer block lengths without redoing all the calculations.

Arithmetic Coding

The Huffman coding procedure described above is optimal for encoding a random variable with a known distribution symbol by symbol. However, because the codeword lengths of a Huffman code are restricted to be integers, there can be a loss of up to 1 bit per symbol in coding efficiency. We could alleviate this loss by using blocks of input symbols; however, the complexity of this approach increases exponentially with the block length. We now describe a method of encoding without this inefficiency. In arithmetic coding, instead of using a separate sequence of bits for each symbol, we represent the source sequence by a subinterval of the unit interval.

N.B. The Huffman code is still optimal in the sense that, if the whole sequence is encoded as one block rather than symbol by symbol, no code can achieve a shorter expected length.
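
A minimal sketch of the interval idea only, with the distribution and input string assumed for illustration (the bit-output and decoding procedures are in the slides cited below): each successive symbol narrows the current interval in proportion to its probability, and roughly $\lceil -\log_2(\text{interval width}) \rceil + 1$ bits are enough to name a point inside the final interval.

```python
import math

def arithmetic_interval(sequence, pmf):
    """Narrow [low, high) once per symbol, in proportion to the symbol probabilities."""
    symbols = sorted(pmf)                    # fixed symbol order defines the subintervals
    low, high = 0.0, 1.0
    for s in sequence:
        width = high - low
        offset = 0.0
        for x in symbols:                    # locate the subinterval of s
            if x == s:
                low, high = low + offset * width, low + (offset + pmf[x]) * width
                break
            offset += pmf[x]
    return low, high

pmf = {"a": 0.7, "b": 0.2, "c": 0.1}         # assumed distribution
low, high = arithmetic_interval("aabac", pmf)
bits = math.ceil(-math.log2(high - low)) + 1 # enough bits to name a point inside
print(f"interval = [{low:.6f}, {high:.6f}), about {bits} bits for the whole sequence")
```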

[Encoding and Decoding details: Slides 40-52]
