1. Some theorems
Markov inequality: for a r.v. $\mathsf{x}\ge 0$,
$$\mathbb{P}(\mathsf{x}\ge\mu)\le \frac{\mathbb{E}[\mathsf{x}]}{\mu}$$
Proof: omitted.

Weak law of large numbers (WLLN): $\vec{y}=[y_1,y_2,...,y_N]^T$, $y_i \sim p$ i.i.d. With $L_p(\vec{y})=\frac{1}{N}\log p_{\mathbf{y}}(\vec{y})$,
$$\lim_{N\to\infty}\mathbb{P}(|L_p(\vec{y})+H(p)|>\varepsilon)=0, \quad \forall \varepsilon>0$$
Proof: omitted.
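Both facts are easy to check numerically. Below is a minimal sketch (Python, standard library only; the distributions and sample sizes are arbitrary choices): Markov's inequality is checked on Exponential(1) samples, and the WLLN statement on an i.i.d. Bernoulli(0.3) sequence, where $L_p(\vec{y})=\frac{1}{N}\log_2 p_{\mathbf{y}}(\vec{y})$ concentrates around $-H(p)$.

```python
import math
import random

random.seed(0)

# Markov inequality: for x ~ Exponential(1), E[x] = 1, so the tail
# P(x >= mu) must be bounded by E[x]/mu = 1/mu.
samples = [random.expovariate(1.0) for _ in range(100_000)]
for mu in (2.0, 4.0, 8.0):
    tail = sum(s >= mu for s in samples) / len(samples)
    print(f"mu={mu}: P(x>=mu) ~ {tail:.4f} <= E[x]/mu = {1/mu:.4f}")

# WLLN: L_p(y) = (1/N) log2 p_y(y) concentrates around -H(p)
# for an i.i.d. Bernoulli(p) sequence.
p = 0.3
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
N = 10_000
y = [random.random() < p for _ in range(N)]
L = sum(math.log2(p if b else 1 - p) for b in y) / N
print(f"L_p(y) = {L:.4f}, -H(p) = {-H:.4f}")
```

The Markov bound is loose here ($e^{-\mu}$ vs. $1/\mu$), which is exactly why the large-deviation refinements later in these notes are interesting.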
2. Typical set
- Definition: $\mathcal{T}_\varepsilon(p;N)=\{\vec{y}\in\mathcal{Y}^N:|L_p(\vec{y})+H(p)|<\varepsilon\}$
- Properties
  - WLLN $\Longrightarrow \mathbb{P}\left(\vec{y}\in\mathcal{T}_\varepsilon(p;N)\right)\simeq 1$ for $N$ large
  - $L_p(\vec{y})\simeq -H(p) \Longrightarrow p_{\mathbf{y}}(\vec{y})\simeq 2^{-NH(p)}$
  - $\Longrightarrow |\mathcal{T}_\varepsilon(p;N)|\simeq 2^{NH(p)}$
  - When $p$ is not the uniform distribution, $\frac{|\mathcal{T}_\varepsilon(p;N)|}{|\mathcal{Y}^N|}\to 0$: the typical sequences make up a vanishing fraction of all possible sequences, yet their total probability tends to 1.
- Theorem: Asymptotic Equipartition Property (AEP)
  $$\lim_{N\to\infty}P(\mathcal{T}_\varepsilon(p;N))=1$$
  $$2^{-N(H(p)+\epsilon)} \leq p_{\mathbf{y}}(\boldsymbol{y}) \leq 2^{-N(H(p)-\epsilon)}, \quad \forall \boldsymbol{y} \in \mathcal{T}_{\epsilon}(p ; N)$$
  and, for sufficiently large $N$,
  $$(1-\epsilon)\, 2^{N(H(p)-\epsilon)} \leq\left|\mathcal{T}_{\epsilon}(p ; N)\right| \leq 2^{N(H(p)+\epsilon)}$$
Proof (of the upper bound on $|\mathcal{T}_{\epsilon}(p;N)|$):
$$\begin{aligned}\left|\mathcal{T}_{\epsilon}(p ; N)\right| &=\sum_{\boldsymbol{y} \in \mathcal{T}_{\epsilon}(p ; N)} 1 \\ &=2^{N(H(p)+\epsilon)} \sum_{\boldsymbol{y} \in \mathcal{T}_{\epsilon}(p ; N)} 2^{-N(H(p)+\epsilon)} \\ & \leq 2^{N(H(p)+\epsilon)} \sum_{\boldsymbol{y} \in \mathcal{T}_{\epsilon}(p ; N)} p_{\mathbf{y}}(\boldsymbol{y}) \\ &=2^{N(H(p)+\epsilon)} P\left\{\mathcal{T}_{\epsilon}(p ; N)\right\} \\ & \leq 2^{N(H(p)+\epsilon)} \end{aligned}$$
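The AEP can be checked exactly for a Bernoulli source, since $p_{\mathbf{y}}(\boldsymbol{y})$ depends only on the number of ones $k$, so typicality is a condition on $k$ alone. A sketch (Python, standard library only; $p$, $N$, $\epsilon$ are illustrative choices):

```python
import math

# Exact AEP check for an i.i.d. Bernoulli(p) source of length N.
p, N, eps = 0.3, 200, 0.1
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

prob_typ = 0.0   # P(T_eps(p;N))
size_typ = 0     # |T_eps(p;N)|
for k in range(N + 1):
    # -L_p(y) = -(1/N) log2 p_y(y) for any sequence with k ones
    minus_L = -(k * math.log2(p) + (N - k) * math.log2(1 - p)) / N
    if abs(minus_L - H) < eps:                  # sequence is eps-typical
        prob_typ += math.comb(N, k) * p**k * (1 - p)**(N - k)
        size_typ += math.comb(N, k)

print(f"P(T_eps) = {prob_typ:.4f}")             # close to 1
print(f"log2|T_eps|/N = {math.log2(size_typ)/N:.4f}, H(p) = {H:.4f}")
# both AEP cardinality bounds hold at this N
assert (1 - eps) * 2 ** (N * (H - eps)) <= size_typ <= 2 ** (N * (H + eps))
```

Increasing $N$ pushes $P(\mathcal{T}_\epsilon)$ toward 1 and $\frac{1}{N}\log_2|\mathcal{T}_\epsilon|$ toward $H(p)$, matching the theorem.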
3. Divergence $\varepsilon$-typical set
- WLLN: $\vec{y}=[y_1,y_2,...,y_N]^T$, $y_i \sim p$ i.i.d.
  $$L_{p | q}(\boldsymbol{y})=\frac{1}{N} \log \frac{p_{\mathbf{y}}(\boldsymbol{y})}{q_{\mathbf{y}}(\boldsymbol{y})}=\frac{1}{N} \sum_{n=1}^{N} \log \frac{p\left(y_{n}\right)}{q\left(y_{n}\right)}$$
  $$\lim_{N \rightarrow \infty} \mathbb{P}\left(\left|L_{p | q}(\boldsymbol{y})-D(p \| q)\right|>\epsilon\right)=0$$
  Remarks: the previous section only considered the mean under the true distribution; here a second distribution $q$ enters as well.
- Definition: $\vec{\boldsymbol{y}}=[y_1,y_2,...,y_N]^T$, $y_i \sim p$ i.i.d.
  $$\mathcal{T}_{\epsilon}(p | q ; N)=\left\{\boldsymbol{y} \in \mathcal{Y}^{N}:\left|L_{p | q}(\boldsymbol{y})-D(p \| q)\right| \leq \epsilon\right\}$$
- Properties
  - WLLN $\Longrightarrow q_{\mathbf{y}}(\boldsymbol{y}) \approx p_{\mathbf{y}}(\boldsymbol{y})\, 2^{-N D(p \| q)}$
  - $Q\left\{\mathcal{T}_{\epsilon}(p | q ; N)\right\} \approx 2^{-N D(p \| q)} \to 0$
  - Remarks: a sequence typical for $p$ is generally atypical for $q$; for large $N$, the typical sets of different distributions are essentially disjoint ("orthogonal").
- Theorem
  $$(1-\epsilon)\, 2^{-N(D(p \| q)+\epsilon)} \leq Q\left\{\mathcal{T}_{\epsilon}(p | q ; N)\right\} \leq 2^{-N(D(p \| q)-\epsilon)}$$
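This theorem can also be checked exactly for Bernoulli $p$ vs. $q$, again exploiting that $L_{p|q}(\boldsymbol{y})$ depends only on the number of ones (a sketch with illustrative parameter choices; standard library only):

```python
import math

# Exact check: Q{T_eps(p|q;N)} decays at rate D(p||q) when y ~ q.
p, q, N, eps = 0.3, 0.5, 200, 0.05
D = p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

Qprob = 0.0
for k in range(N + 1):
    # L_{p|q}(y) for any sequence with k ones (in bits)
    L = (k * math.log2(p / q) + (N - k) * math.log2((1 - p) / (1 - q))) / N
    if abs(L - D) <= eps:                       # y in T_eps(p|q;N)
        Qprob += math.comb(N, k) * q**k * (1 - q)**(N - k)

rate = -math.log2(Qprob) / N
print(f"D(p||q) = {D:.4f}, -log2 Q{{T}}/N = {rate:.4f}")
```

The printed rate falls inside $[D(p\|q)-\epsilon,\ D(p\|q)+\epsilon]$, exactly as the two-sided bound predicts.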
4. Large deviation of sample averages
Theorem (Cramér's Theorem): $\vec{\boldsymbol{y}}=[y_1,y_2,...,y_N]^T$, $y_i \sim q$ i.i.d. with mean $\mu<\infty$, and let $\gamma>\mu$. Then
$$\lim _{N \rightarrow \infty}-\frac{1}{N} \log \mathbb{P}\left(\frac{1}{N} \sum_{n=1}^{N} y_{n} \geq \gamma\right)=E_{C}(\gamma)$$
where $E_C(\gamma)$ is referred to as the Chernoff exponent,
$$E_{C}(\gamma) \triangleq D(p(\cdot ; x) \| q), \qquad p(y ; x)=q(y)\, e^{x y-\alpha(x)}$$
with $\alpha(x)=\log \mathbb{E}_q\left[e^{x \mathsf{y}}\right]$ the log-moment-generating function, and with $x>0$ chosen such that
$$\mathbb{E}_{p(\cdot;x)}[y]=\gamma$$
Proof:
- Apply the Markov inequality to $e^{x \sum_n y_n}$ (Chernoff bound), then optimize over $x$, i.e. take $x=x_*$ in the last step:
  $$\begin{aligned} \mathbb{P}\left(\frac{1}{N} \sum_{n=1}^{N} y_{n} \geq \gamma\right) &=\mathbb{P}\left(e^{x \sum_{n=1}^{N} y_{n}} \geq e^{N x \gamma}\right) \\ & \leq e^{-N x \gamma} \mathbb{E}\left[e^{x \sum_{n=1}^{N} y_{n}}\right] \\ &=e^{-N x \gamma}\left(\mathbb{E}\left[e^{x \mathsf{y}}\right]\right)^{N} \\ & \leq e^{-N\left(x_{*} \gamma-\alpha\left(x_{*}\right)\right)} \end{aligned}$$
- $\varphi(x)=x\gamma-\alpha(x)$ is concave (since $\alpha$ is convex), and its maximum is attained at the $x_*$ satisfying $\mathbb{E}_{p\left(\cdot ; x_{*}\right)}[y]=\dot{\alpha}\left(x_{*}\right)=\gamma$
- One can show $x_{*} \gamma-\alpha\left(x_{*}\right)=x_{*} \dot{\alpha}\left(x_{*}\right)-\alpha\left(x_{*}\right)=D\left(p\left(\cdot ; x_{*}\right) \| q\right)$
- Hence $\mathbb{P}\left(\frac{1}{N} \sum_{n=1}^{N} y_{n} \geq \gamma\right) \leq e^{-N E_{C}(\gamma)}$
- The proof of the matching lower bound is omitted for now.
Two facts used there, for $p(y;x)=q(y)\exp(xy-\alpha(x))$:
- $D(p(\cdot;x)\,\|\,q)$ is monotonically increasing in $x$
- $\mathbb{E}_{p(\cdot;x)}[y]$ is monotonically increasing in $x$
Remarks:
- In exponential-rate terms, the theorem says $\mathbb{P}\left(\frac{1}{N} \sum_{n=1}^{N} y_{n} \geq \gamma\right) \cong 2^{-N E_{C}(\gamma)}$
- Geometrically, $q$ is projected (in the divergence sense) onto the convex set of distributions $\{p : \mathbb{E}_p[\mathsf{y}] \geq \gamma\}$; the projection lands exactly on the boundary, the linear family $\{p : \mathbb{E}_p[\mathsf{y}] = \gamma\}$, and the projection of $q$ onto that linear family is exactly the exponential-family expression for $p(\cdot;x_*)$ above.
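The tilting construction can be made concrete for a Bernoulli($q$) source: solve $\dot{\alpha}(x_*)=\gamma$ by bisection and compare $E_C(\gamma)=x_*\gamma-\alpha(x_*)$ with the binary divergence $D(\gamma\|q)$, which is the known closed form in this case since the tilted distribution is Bernoulli($\gamma$). A sketch in natural log (nats); $q$ and $\gamma$ are illustrative:

```python
import math

# Chernoff exponent for i.i.d. Bernoulli(q) and threshold gamma > q.
q, gamma = 0.5, 0.7

def alpha(x):          # log-MGF of Bernoulli(q): alpha(x) = log E_q[e^{xy}]
    return math.log(1 - q + q * math.exp(x))

def tilted_mean(x):    # E_{p(.;x)}[y] = alpha'(x), increasing in x
    return q * math.exp(x) / (1 - q + q * math.exp(x))

# bisection for x* with alpha'(x*) = gamma
lo, hi = 0.0, 20.0
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if tilted_mean(mid) < gamma else (lo, mid)
x_star = (lo + hi) / 2

E_C = x_star * gamma - alpha(x_star)
# closed form: the tilted distribution is Bernoulli(gamma), so
# E_C(gamma) = D(Bernoulli(gamma) || Bernoulli(q)) in nats
D = gamma * math.log(gamma / q) + (1 - gamma) * math.log((1 - gamma) / (1 - q))
print(f"x* = {x_star:.4f}, E_C = {E_C:.6f}, D(gamma||q) = {D:.6f}")
```

Bisection works here because, as noted above, $\mathbb{E}_{p(\cdot;x)}[y]$ is monotonically increasing in $x$.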
5. Types and type classes
- Definition: $\vec{\boldsymbol{y}}=[y_1,y_2,...,y_N]^T$ (no assumption is made about which distribution generated the sequence)
  - The type of $\mathbf{y}$ (essentially its empirical distribution) is defined as
    $$\hat{p}(b ; \mathbf{y})=\frac{1}{N} \sum_{n=1}^{N} \mathbb{1}_{b}\left(y_{n}\right)=\frac{N_{b}(\mathbf{y})}{N}$$
  - $\mathcal{P}_{N}^{y}$ denotes the set of all possible types of length-$N$ sequences
  - Type class: $\mathcal{T}_{N}^{y}(p)=\left\{\mathbf{y} \in \mathcal{Y}^{N}: \hat{p}(\cdot ; \mathbf{y}) \equiv p(\cdot)\right\}$, for $p\in\mathcal{P}_{N}^{y}$
- Exponential Rate Notation: $f(N) \doteq 2^{N \alpha}$ means
  $$\lim _{N \rightarrow \infty} \frac{\log f(N)}{N}=\alpha$$
  Remarks: $\alpha$ is the first-order exponential rate in $N$; sub-exponential factors (logarithmic, polynomial, ...) are absorbed by $\doteq$.
- Properties
  - $\left|\mathcal{P}_{N}^{y}\right| \leq(N+1)^{|\mathcal{Y}|}$
  - $q^{N}(\mathbf{y})=2^{-N(D(\hat{p}(\cdot ; \mathbf{y}) \| q)+H(\hat{p}(\cdot ; \mathbf{y})))}$
  - $p^{N}(\mathbf{y})=2^{-N H(p)}$ for $\mathbf{y} \in \mathcal{T}_{N}^{y}(p)$
  - $c N^{-|\mathcal{Y}|}\, 2^{N H(p)} \leq\left|\mathcal{T}_{N}^{y}(p)\right| \leq 2^{N H(p)}$
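These counting properties can be verified by brute force on a small alphabet (the alphabet and length below are arbitrary choices; standard library only):

```python
import math
from itertools import product
from collections import Counter

# Enumerate all |Y|^N sequences and bucket them by type.
Y, N = (0, 1, 2), 6
classes = Counter()                 # type (count vector) -> |type class|
for y in product(Y, repeat=N):
    counts = tuple(y.count(a) for a in Y)
    classes[counts] += 1

# the number of types is only polynomial in N
print(f"|P_N| = {len(classes)} <= (N+1)^|Y| = {(N + 1) ** len(Y)}")
assert len(classes) <= (N + 1) ** len(Y)

# |T_N(p)| <= 2^{N H(p)} for every type p
for counts, size in classes.items():
    H = -sum(c / N * math.log2(c / N) for c in counts if c > 0)
    assert size <= 2 ** (N * H) + 1e-6
```

Here $|\mathcal{P}_N^y|=28$ while $|\mathcal{Y}|^N=729$: exponentially many sequences are partitioned into only polynomially many type classes, which is the engine behind the method of types.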
- Theorem
  $$c N^{-|\mathcal{Y}|}\, 2^{-N D(p \| q)} \leq Q\left\{\mathcal{T}_{N}^{y}(p)\right\} \leq 2^{-N D(p \| q)}$$
  $$Q\left\{\mathcal{T}_{N}^{y}(p)\right\} \doteq 2^{-N D(p \| q)}$$
6. Large Deviation Analysis via Types
- Definition: for a set $\mathcal{S}$ of distributions, $\mathcal{R}=\left\{\mathbf{y} \in \mathcal{Y}^{N}: \hat{p}(\cdot ; \mathbf{y}) \in \mathcal{S} \cap \mathcal{P}_{N}^{y}\right\}$
Sanov's Theorem:
$$Q\left\{\mathcal{S} \cap \mathcal{P}_{N}^{y}\right\} \leq(N+1)^{|\mathcal{Y}|}\, 2^{-N D\left(p_{*} \| q\right)}$$
$$Q\left\{\mathcal{S} \cap \mathcal{P}_{N}^{y}\right\} \,\dot{\leq}\, 2^{-N D\left(p_{*} \| q\right)}$$
$$p_{*}=\underset{p \in \mathcal{S}}{\arg \min }\ D(p \| q)$$
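Sanov's theorem can be checked exactly in the Bernoulli case by taking $\mathcal{S}=\{p:\mathbb{E}_p[\mathsf{y}]\ge\gamma\}$, so that $p_*=\mathrm{Bernoulli}(\gamma)$ and the exponent is $D(\gamma\|q)$. The binomial tail is computed in the log domain to avoid floating-point underflow at large $N$ (parameters are illustrative; standard library only):

```python
import math

# Sanov check for Bernoulli(q): the event "empirical mean >= gamma"
# is exactly { p_hat in S } with S = { p : E_p[y] >= gamma }, and the
# divergence-minimizing p* in S is Bernoulli(gamma).
q, gamma = 0.4, 0.6
D_star = gamma * math.log2(gamma / q) + (1 - gamma) * math.log2((1 - gamma) / (1 - q))

def log2_tail(N, q, k0):
    """log2 P(Bin(N, q) >= k0), via log-domain summation."""
    logs = [math.log2(math.comb(N, k)) + k * math.log2(q)
            + (N - k) * math.log2(1 - q) for k in range(k0, N + 1)]
    m = max(logs)
    return m + math.log2(sum(2.0 ** (l - m) for l in logs))

for N in (100, 400, 1600):
    rate = -log2_tail(N, q, math.ceil(N * gamma)) / N
    print(f"N={N}: -log2 Q/N = {rate:.4f}  (D(p*||q) = {D_star:.4f})")
```

The empirical rate approaches $D(p_*\|q)$ as $N$ grows; the gap at finite $N$ is exactly the polynomial $(N+1)^{|\mathcal{Y}|}$ slack in the theorem.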
7. Asymptotics of hypothesis testing
- LRT: $L(\boldsymbol{y})=\frac{1}{N} \log \frac{p_{1}^{N}(\boldsymbol{y})}{p_{0}^{N}(\boldsymbol{y})}=\frac{1}{N} \sum_{n=1}^{N} \log \frac{p_{1}\left(y_{n}\right)}{p_{0}\left(y_{n}\right)} \gtrless \gamma$, i.e. a threshold test on the mean of $t_n=\log \frac{p_1(y_n)}{p_0(y_n)}$, whose distributions under $H_0$ and $H_1$ are denoted $p_0^{\prime}$ and $p_1^{\prime}$
- $P_{F}=\mathbb{P}_{0}\left\{\frac{1}{N} \sum_{n=1}^{N} t_{n} \geq \gamma\right\} \approx 2^{-N D\left(p^{*} \| p_{0}^{\prime}\right)}$
- $P_{M}=1-P_{D} \approx 2^{-N D\left(p^{*} \| p_{1}^{\prime}\right)}$
- $D\left(p^{*} \| p_{0}^{\prime}\right)-D\left(p^{*} \| p_{1}^{\prime}\right)=\int p^{*}(t) \log \frac{p_{1}^{\prime}(t)}{p_{0}^{\prime}(t)} \mathrm{d} t=\int p^{*}(t)\, t\, \mathrm{d} t=\mathbb{E}_{p^{*}}[\mathsf{t}]=\gamma$
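For Bernoulli hypotheses the LRT reduces to thresholding the empirical mean at some $\tau\in(p_0,p_1)$, the dominating empirical distribution is $\mathrm{Bernoulli}(\tau)$, and the two exponents become binary divergences $D(\tau\|p_0)$ and $D(\tau\|p_1)$. A numerical sketch under these assumptions (parameters illustrative; standard library only):

```python
import math

# Error exponents of the LRT for Bernoulli(p0) vs Bernoulli(p1),
# expressed via a threshold tau on the empirical mean:
#   P_F =. 2^{-N D(tau||p0)},   P_M =. 2^{-N D(tau||p1)}.
p0, p1, tau = 0.2, 0.6, 0.4

def Db(a, b):  # binary KL divergence in bits
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

E_F, E_M = Db(tau, p0), Db(tau, p1)

N = 800
k0 = math.ceil(N * tau)
# exact binomial tails on either side of the threshold
PF = sum(math.comb(N, k) * p0**k * (1 - p0)**(N - k) for k in range(k0, N + 1))
PM = sum(math.comb(N, k) * p1**k * (1 - p1)**(N - k) for k in range(0, k0))
print(f"-log2(PF)/N = {-math.log2(PF)/N:.4f}  vs  D(tau||p0) = {E_F:.4f}")
print(f"-log2(PM)/N = {-math.log2(PM)/N:.4f}  vs  D(tau||p1) = {E_M:.4f}")
```

Moving $\tau$ toward $p_1$ trades a larger false-alarm exponent for a smaller miss exponent, which is the tradeoff the boundary condition $\mathbb{E}_{p^*}[\mathsf{t}]=\gamma$ parametrizes.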
8. Asymptotics of parameter estimation
Strong Law of Large Numbers (SLLN):
$$\mathbb{P}\left(\lim _{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^{N} w_{n}=\mu\right)=1$$
Central Limit Theorem (CLT):
$$\lim _{N \rightarrow \infty} \mathbb{P}\left(\frac{1}{\sqrt{N}} \sum_{n=1}^{N}\left(\frac{w_{n}-\mu}{\sigma}\right) \leq b\right)=\Phi(b)$$
The following three modes of convergence are listed in decreasing strength:
- convergence with probability 1 (SLLN): $\mathsf{x}_{N} \stackrel{w.p.1}{\longrightarrow} a$
- convergence in probability (WLLN): $\mathsf{x}_{N} \stackrel{p}{\longrightarrow} a$
- convergence in distribution: $\mathsf{x}_{N} \stackrel{d}{\longrightarrow} p$
Asymptotics of ML Estimation

Theorem:
$$\hat{x}_{N}=\arg \max _{x} L_{N}(x ; \mathbf{y})$$
Under certain mild conditions,
$$\hat{x}_{N} \stackrel{w.p.1}{\longrightarrow} x_{0}, \qquad \sqrt{N}\left(\hat{x}_{N}-x_{0}\right) \stackrel{d}{\longrightarrow} \mathcal{N}\left(0, J_{\mathsf{y}}\left(x_{0}\right)^{-1}\right)$$
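For a Bernoulli($x_0$) model the ML estimate is the sample mean and the Fisher information is $J_{\mathsf{y}}(x_0)=\frac{1}{x_0(1-x_0)}$, so the asymptotic-normality claim can be checked by simulation (a sketch; $x_0$, $N$ and the number of trials are arbitrary choices):

```python
import math
import random

random.seed(1)

# ML estimation for Bernoulli(x0): the ML estimate is the sample mean,
# and J(x0) = 1 / (x0 (1 - x0)). The normalized error sqrt(N)(x_hat - x0)
# should have variance close to J(x0)^{-1} = x0 (1 - x0).
x0, N, trials = 0.3, 1000, 2000
errs = []
for _ in range(trials):
    y = [random.random() < x0 for _ in range(N)]
    x_hat = sum(y) / N                   # ML estimate = sample mean
    errs.append(math.sqrt(N) * (x_hat - x0))

var = sum(e * e for e in errs) / trials
print(f"empirical var of sqrt(N)(x_hat - x0) = {var:.4f}")
print(f"J(x0)^-1 = {x0 * (1 - x0):.4f}")
```

The empirical variance matches $J_{\mathsf{y}}(x_0)^{-1}$ up to Monte Carlo noise, i.e. the ML estimator is asymptotically efficient in this model.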
For the other parts of this series, see:
统计推断(一) Hypothesis Test
统计推断(二) Estimation Problem
统计推断(三) Exponential Family
统计推断(四) Information Geometry
统计推断(五) EM algorithm
统计推断(六) Modeling
统计推断(七) Typical Sequence
统计推断(八) Model Selection
统计推断(九) Graphical models
统计推断(十) Elimination algorithm
统计推断(十一) Sum-product algorithm