Asymptotic Equipartition Property

拉普拉斯的汪

Published on 2021-02-22 01:57:26


Category: Information Theory

Article link: https://blog.csdn.net/qq_39599295/article/details/113927010


Reference:

Elements of Information Theory, 2nd Edition

Slides of EE4560, TUD

Content

AEP

In information theory, the analog of the law of large numbers is the asymptotic equipartition property (AEP). It is a direct consequence of the weak law of large numbers.

For independent identically distributed (i.i.d.) random variables $X_1,\cdots, X_n$ , the weak law of large numbers states that
$\frac{1}{n}\sum_{i=1}^nX_i\to EX\quad \text{in probability}$
The AEP states that
$-\frac{1}{n}\log p(X_1,\cdots,X_n)\to -E\log p(X)=H(X)\quad \text{in probability}$
To show this rigorously, we are going to introduce some definitions and theorems.

Definition 1:

Given a sequence of random variables $X_1,X_2,\cdots$ , we say that the sequence converges to a random variable $X$ :

In probability if for every $\epsilon >0$ , $\Pr\{|X_n-X|>\epsilon\}\to 0$
In mean square if $E(X_n-X)^2\to 0$
With probability 1 (also called almost surely) if $\Pr\{\lim_{n\to \infty}X_n=X\}=1$

Theorem 1 (Weak law of large numbers):

Let $X_1,X_2,\cdots$ be a sequence of i.i.d. random variables with finite mean $EX$ . Then for any $\epsilon>0$ and $\delta>0$ , there exists an $n_0$ such that for any $n>n_0$
$\Pr\left( \left|\frac{1}{n}\sum_{i=1}^n X_i-EX \right|<\epsilon \right)\ge 1-\delta\tag{1}$
When the variance of $X$ is finite, this follows directly from Chebyshev's inequality.

Theorem 2 (AEP):

If $X_1,X_2,\cdots$ are i.i.d. $\sim p(x)$ , i.e., $p(x^n)=\prod_{i=1}^n p(x_i)$ for $x^n=(x_1,\cdots,x_n)$ , then for any $\epsilon>0$ and $\delta>0$ , there exists an $n_0$ such that for any $n>n_0$
$\Pr\left( \left|-\frac{1}{n}\log p(X_1,\cdots,X_n)-H(X) \right|<\epsilon \right)\ge 1-\delta\tag{2}$
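Proof: Functions of independent random variables are also independent, so since the $X_i$ are i.i.d., the random variables $\log p(X_i)$ are i.i.d. as well. Hence, by the weak law of large numbers,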
$\begin{aligned} -\frac{1}{n}\log p(X_1,\cdots,X_n)&=-\frac{1}{n}\log \prod_{i=1}^n p(X_i)=-\frac{1}{n}\sum_{i=1}^n\log p(X_i)\\ &\to -E\log p(X)\quad \text{in probability}\\ &=H(X) \end{aligned}$

As a consequence, the probability $p(x_1,\cdots,x_n)$ of almost all sequences will be close to $2^{-nH(X)}$ when $n$ is large.
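The convergence in Theorem 2 can be checked numerically. The following is a minimal sketch, not part of the references: it assumes a Bernoulli source with $\Pr(X=1)=2/3$ as an illustrative choice, takes logarithms to base 2, and uses the hypothetical helper names `entropy_bits` and `sample_entropy_bits`.

```python
import math
import random


def entropy_bits(pmf):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def sample_entropy_bits(pmf, n, rng):
    """Draw x_1..x_n i.i.d. from pmf and return -(1/n) log2 p(x_1,...,x_n)."""
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    draws = rng.choices(symbols, weights=weights, k=n)
    # For an i.i.d. source, log p(x_1,...,x_n) is the sum of log p(x_i).
    return -sum(math.log2(pmf[x]) for x in draws) / n


pmf = {0: 1 / 3, 1: 2 / 3}  # illustrative source (an assumption of this sketch)
rng = random.Random(0)      # fixed seed so that runs are repeatable
H = entropy_bits(pmf)
for n in (10, 100, 1000, 100000):
    h_n = sample_entropy_bits(pmf, n, rng)
    print(f"n={n:6d}  sample entropy={h_n:.4f}  H(X)={H:.4f}")
```

As $n$ grows, the printed sample entropy should settle near $H(X)$, which is what "convergence in probability" predicts for a single long draw.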

Typical Set

We can divide the set of all sequences into two sets: the typical set, where the sample entropy is close to the true entropy (explained below), and the non-typical set, which contains all other sequences.

Definition 2 (Typical set):

The typical set, denoted by $A_\epsilon^{(n)}$ , is defined by
$A_\epsilon^{(n)}=\left\{(x_1,\cdots,x_n):\left|-\frac{1}{n}\log p(x_1,\cdots,x_n)-H(X)\right|<\epsilon \right\} \tag{3}$
It is the set of sequences $(x_1,\cdots, x_n)\in \mathcal X^n$ having the property
$2^{-n(H(X)-\epsilon)}>p(x_1,\cdots,x_n)>2^{-n(H(X)+\epsilon)}\tag{4}$
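Eq. $(4)$ is just Eq. $(3)$ rewritten. A short derivation sketch, assuming logarithms to base 2 (as the $2^{-nH(X)}$ notation suggests):
$\begin{aligned} \left|-\frac{1}{n}\log p(x_1,\cdots,x_n)-H(X)\right|<\epsilon &\iff H(X)-\epsilon<-\frac{1}{n}\log p(x_1,\cdots,x_n)<H(X)+\epsilon\\ &\iff -n(H(X)+\epsilon)<\log p(x_1,\cdots,x_n)<-n(H(X)-\epsilon)\\ &\iff 2^{-n(H(X)+\epsilon)}<p(x_1,\cdots,x_n)<2^{-n(H(X)-\epsilon)} \end{aligned}$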
As a consequence of the AEP, we can show that the set $A_\epsilon^{(n)}$ has the following properties:

Theorem 3 (Properties of typical sets):

(1) If $\left(x_{1}, x_{2}, \ldots, x_{n}\right) \in A_{\epsilon}^{(n)},$ then $H(X)-\epsilon \leq-\frac{1}{n} \log p\left(x_{1}, x_{2}, \ldots, x_{n}\right) \leq H(X)+\epsilon$ .
(2) $\operatorname{Pr}\left\{A_{\epsilon}^{(n)}\right\}>1-\epsilon$ for $n$ sufficiently large.
(3) $\left|A_{\epsilon}^{(n)}\right| \leq 2^{n(H(X)+\epsilon)},$ where $|A|$ denotes the number of elements in the set $A$ .
(4) $\left|A_{\epsilon}^{(n)}\right| \geq(1-\epsilon) 2^{n(H(X)-\epsilon)}$ for $n$ sufficiently large.

Thus, the typical set has probability nearly $1$ , all elements of the typical set are nearly equiprobable with probability close to $2^{-nH}$ , and the number of elements in the typical set is nearly $2^{nH}$ .


Proof:

The proof of property $(1)$ is immediate from the definition of $A_\epsilon^{(n)}$ .
$\operatorname{Pr}\left\{A_{\epsilon}^{(n)}\right\}$ denotes the probability of the event $(X_1,X_2,\cdots,X_n)\in A_\epsilon^{(n)}$ . Property $(2)$ follows directly from Theorem 2, since the probability of this event tends to $1$ as $n\to \infty$ . Thus, for any $\delta>0$ , there exists an $n_0$ such that for all $n\ge n_0$ , we have

$\begin{aligned} \operatorname{Pr}\left\{A_{\epsilon}^{(n)}\right\}&=\operatorname{Pr}\left\{(X_1,X_2,\cdots,X_n)\in A_\epsilon^{(n)}\right\}\\&=\Pr\left\{\left| -\frac{1}{n}\log p(X_1,\cdots,X_n)-H(X) \right|<\epsilon\right\}\\ &> 1-\delta \end{aligned}$
Setting $\delta=\epsilon$ gives property $(2)$ .

To prove property $(3)$ , we can use Eq. $(4)$ and write
$\begin{aligned} 1&=\sum _{\mathbf x\in \mathcal X^n}p(\mathbf x)\ge\sum _{\mathbf x\in A_\epsilon^{(n)}}p(\mathbf x)\ge \sum _{\mathbf x\in A_\epsilon^{(n)}}2^{-n(H(X)+\epsilon)}\\&=2^{-n(H(X)+\epsilon)}\left|A_\epsilon^{(n)}\right| \end{aligned}$
Hence $\left|A_{\epsilon}^{(n)}\right| \leq 2^{n(H(X)+\epsilon)}$ .
The property $(4)$ can be derived from property $(2)$ and Eq. $(4)$ . For sufficiently large $n$ , $\Pr\{A_{\epsilon}^{(n)}\}>1-\epsilon$ , so that
$\begin{aligned} 1-\epsilon&<\Pr\{A_{\epsilon}^{(n)}\}=\operatorname{Pr}\left\{(X_1,X_2,\cdots,X_n)\in A_\epsilon^{(n)}\right\}\\ &=\sum_{\mathbf x\in A_{\epsilon}^{(n)}}p(\mathbf x)\le \sum_{\mathbf x\in A_{\epsilon}^{(n)}}2^{-n(H(X)-\epsilon)}=2^{-n(H(X)-\epsilon)}\left|A_\epsilon^{(n)}\right| \end{aligned}$
Hence $\left|A_{\epsilon}^{(n)}\right| \geq(1-\epsilon) 2^{n(H(X)-\epsilon)}$ .
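For a small alphabet and block length, the typical set can be enumerated by brute force and compared against the bounds just proved. The following is a minimal sketch under stated assumptions: an i.i.d. Bernoulli source with $\Pr(X=1)=2/3$, $n=12$ and $\epsilon=0.2$ are illustrative choices, logarithms are base 2, and `typical_set` is a hypothetical helper name.

```python
import math
from itertools import product


def typical_set(p1, n, eps):
    """Enumerate A_eps^(n) for an i.i.d. Bernoulli(p1) source.

    Returns (members, H), where members is a list of (sequence, probability)
    pairs and H is the source entropy in bits.
    """
    H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
    members = []
    for x in product((0, 1), repeat=n):
        k = sum(x)  # number of ones in the sequence
        logp = k * math.log2(p1) + (n - k) * math.log2(1 - p1)
        if abs(-logp / n - H) < eps:  # the membership test of Eq. (3)
            members.append((x, 2.0 ** logp))
    return members, H


p1, n, eps = 2 / 3, 12, 0.2
members, H = typical_set(p1, n, eps)
size = len(members)
prob = sum(p for _, p in members)

print("typical set size      :", size, "out of", 2 ** n, "sequences")
print("upper bound, prop. (3):", 2 ** (n * (H + eps)))
print("Pr{typical set}       :", prob)
print("Pr * 2^(n(H - eps))   :", prob * 2 ** (n * (H - eps)))
```

The last line uses the inequality from the proof of property $(4)$ with the actual value of $\Pr\{A_\epsilon^{(n)}\}$ in place of $1-\epsilon$, which holds for every $n$; the form stated in property $(4)$ itself is only guaranteed for $n$ sufficiently large.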

Roughly speaking, a typical sequence is one in which each alphabet symbol occurs in a proportion close to its true probability:
$N(x_i)/n\approx p(x_i)$
where $N(x_i)$ is the number of times the symbol $x_i$ occurs in the sequence.
Examples: slides 17-22.

Discussion: We have that $\Pr (A_\epsilon^{(n)})\to 1$ as $n\to \infty$ . Does this imply that sequences in $\overline{A_\epsilon^{(n)}}$ have lower probability as compared to the ones in $A_\epsilon^{(n)}$ ?

No. $\Pr (A_\epsilon^{(n)})\to 1$ only means that the random sequence $X_1,X_2,\cdots, X_n$ falls in the typical set with high probability when $n$ is large. An individual typical sequence does not necessarily have the highest probability. For example:

Consider a stochastic process consisting of Bernoulli random variables having probabilities $\Pr (X=0)=1/3$ and $\Pr (X=1)=2/3$ .

The most likely sequence (length 6): 1 1 1 1 1 1

A typical sequence (length 6): 1 0 1 1 1 0
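The two sequences above can be compared directly. The following is a minimal sketch, assuming the Bernoulli source of this example with i.i.d. draws and logarithms to base 2; `seq_stats` is a hypothetical helper name.

```python
import math


def seq_stats(x, p1):
    """Return (probability, sample entropy in bits) of the binary sequence x
    under an i.i.d. Bernoulli source with Pr(X=1) = p1."""
    k = sum(x)  # number of ones
    n = len(x)
    prob = p1 ** k * (1 - p1) ** (n - k)
    return prob, -math.log2(prob) / n


p1 = 2 / 3
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

most_likely = (1, 1, 1, 1, 1, 1)
typical = (1, 0, 1, 1, 1, 0)

for name, x in (("most likely", most_likely), ("typical", typical)):
    prob, h = seq_stats(x, p1)
    print(f"{name:12s} p={prob:.4f}  sample entropy={h:.4f}  H(X)={H:.4f}")
```

The all-ones sequence has the larger probability, $(2/3)^6$ versus $(2/3)^4(1/3)^2$, but its sample entropy is far from $H(X)$, so it lies outside $A_\epsilon^{(n)}$ for small $\epsilon$. The typical set carries most of the probability because it contains many sequences, not because each one is individually the most likely.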

High-probability Sets

From the definition of $A_\epsilon ^{(n)}$ , it is clear that $A_\epsilon^{(n)}$ is a fairly small set that contains most of the probability. But from the definition, it is not clear whether it is the smallest high-probability set. We first give a definition of the smallest high-probability set:

Definition 3 (The smallest high-probability set):

For each $n=1,2,\cdots$ , let $Q_q^{(n)}\subset \mathcal X^n$ be the smallest set with
$\Pr\{Q_q^{(n)}\}\ge 1-q \tag{5}$
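We can show that, for sufficiently small $q$ and $\delta$ and sufficiently large $n$ , the sets $A_\epsilon^{(n)}$ and $Q_q^{(n)}$ have a significant intersection. Using $\Pr\{A_\epsilon^{(n)}\}>1-\delta$ and $\Pr\{A_\epsilon^{(n)} \cup Q_q^{(n)}\}\le 1$ :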
$\begin{aligned} \Pr\{A_\epsilon^{(n)} \cap Q_q^{(n)} \} &=\Pr\{A_\epsilon^{(n)}\}+\Pr\{Q_q^{(n)}\}-\Pr\{A_\epsilon^{(n)} \cup Q_q^{(n)}\}\\ &>1-\delta+1-q-1\\ &=1-\delta-q \end{aligned}$
We now show that $|A_\epsilon^{(n)}|$ and $|Q_q^{(n)}|$ are about the same size.

Theorem 4:

For any $0 < q < 1$ and $n$ sufficiently large, we have
$\frac{1}{n}\log |Q_q^{(n)}|>H(X)-\epsilon ' \tag{6}$
where $\epsilon '>0$ can be made arbitrarily small.

Proof:
$\begin{aligned} 1-\delta-q&<\Pr\{A_\epsilon^{(n)} \cap Q_q^{(n)} \} =\sum_{x^n\in A_\epsilon^{(n)} \cap Q_q^{(n)}}p(x^n)\\ &<\sum_{x^n\in A_\epsilon^{(n)} \cap Q_q^{(n)}}2^{-n(H(X)-\epsilon)}=|A_\epsilon^{(n)} \cap Q_q^{(n)}|2^{-n(H(X)-\epsilon)}\\ &\le |Q_q^{(n)}|2^{-n(H(X)-\epsilon)} \end{aligned}$

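Hence, $|Q_q^{(n)}|>(1-\delta-q)2^{n(H(X)-\epsilon)}$ . Choosing $\delta<1-q$ so that $1-\delta-q>0$ and taking logarithms gives
$\frac{1}{n}\log|Q_q^{(n)}|>H(X)-\epsilon+\frac{1}{n}\log(1-\delta-q)=H(X)-\epsilon'$
with $\epsilon'=\epsilon-\frac{1}{n}\log(1-\delta-q)$ , which can be made arbitrarily small by taking $\epsilon$ small and $n$ large.

Thus, $Q_q^{(n)}$ must have at least $2^{nH}$ elements, to first order in the exponent, while $A_\epsilon^{(n)}$ has about $2^{n(H\pm \epsilon)}$ elements. Therefore, $A_\epsilon^{(n)}$ is about the same size as the smallest high-probability set.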
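For small $n$ the smallest high-probability set can be built explicitly by taking sequences in order of decreasing probability until the total reaches $1-q$. The following is a minimal sketch under stated assumptions: an i.i.d. Bernoulli source with $\Pr(X=1)=2/3$, $n=12$ and $q=0.1$ are illustrative choices, logarithms are base 2, and `smallest_high_prob_set` is a hypothetical helper name.

```python
import math
from itertools import product


def smallest_high_prob_set(p1, n, q):
    """Size of the smallest set of length-n binary sequences whose total
    probability is at least 1 - q, for an i.i.d. Bernoulli(p1) source.

    Returns (count, total, probs): the set size, its probability, and the
    list of all sequence probabilities sorted in decreasing order.
    """
    probs = sorted(
        (p1 ** sum(x) * (1 - p1) ** (n - sum(x)) for x in product((0, 1), repeat=n)),
        reverse=True,
    )
    total, count = 0.0, 0
    for p in probs:
        if total >= 1 - q:
            break
        # Adding the most probable remaining sequence keeps the set minimal.
        total += p
        count += 1
    return count, total, probs


p1, n, q = 2 / 3, 12, 0.1
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
count, total, _ = smallest_high_prob_set(p1, n, q)

print("size of Q      :", count, "out of", 2 ** n, "sequences")
print("Pr{Q}          :", total)
print("(1/n) log2 |Q| :", math.log2(count) / n)
print("H(X)           :", H)
```

At such a short block length the gap between $\frac{1}{n}\log|Q_q^{(n)}|$ and $H(X)$ need not be small; Theorem 4 is an asymptotic statement, and the sketch is only meant to make the construction of $Q_q^{(n)}$ concrete.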
