Monte Carlo笔记-2：MCMC，附简易python代码

最新推荐文章于 2020-12-07 22:56:52 发布

你狗

最新推荐文章于 2020-12-07 22:56:52 发布

阅读量1.1k

点赞数 1

文章标签：人工智能 python

原文链接：http://www.cnblogs.com/simplex/p/6777957.html

版权

Monte Carlo笔记-2：MCMC，附简易python代码

Markov chain

基本概念

当随机变量序列\(\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\mathbf{x}_{n}\}\)的联合分布满足：

\(p(\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\mathbf{x}_{n})=p(\mathbf{x}_{1})\prod_{t=2}^{n}p(\mathbf{x}_{t}|\mathbf{x}_{t-1})\)

时，称其为Markov model或者Markov chain。特点是\(\mathbf{x}_{t}\)包含了所有过去的信息，下一随机变量
\(\mathbf{x}_{t+1}\)的分布完全由\(\mathbf{x}_{t}\)决定，\(p(\mathbf{x}_{t}|\mathbf{x}_{1:t-1})=p(\mathbf{x}_{t}|\mathbf{x}_{t-1})\)。

\(\mathbf{x}_{t}\)的边缘分布为：

\(\mathbf{x}_{t}=\int_{\mathbf{x}_{1:t-1}}p(\mathbf{x}_{1})\prod_{t=2}^{n}p(\mathbf{x}_{t}|\mathbf{x}_{t-1})d\mathbf{x}_{1}d\mathbf{x}_{2}...d\mathbf{x}_{t-1} \)

当\(\mathbf{x}_{t}\)的值域为有限集合\(\{1,2,\cdots,K\}\)时，这种离散的马尔可夫链可以用\(K\times K\)跃迁矩阵\(A\)表示，其中\(A_{ij}=p(\mathbf{x}_{t}=j|\mathbf{x}_{t-1}=i)\)表示从状态\(i\)跃迁到状态\(j\)的概率。

另外，用\(A(n)\)表示\(n\)步跃迁，即\(A_{ij(n)}=p(\mathbf{x}_{t+n}=j|\mathbf{x}_{t}=i)\)。

对于离散情况，利用跃迁矩阵\(A\)，可以将\(\mathbf{x}_{t}\)的边缘分布表示：

\(\mathbf{x}_{t}=\int_{\mathbf{x}_{1:t-1}}p(\mathbf{x}_{1})\prod_{t=2}^{n}p(\mathbf{x}_{t}|\mathbf{x}_{t-1})d\mathbf{x}_{1}d\mathbf{x}_{2}...d\mathbf{x}_{t-1}=p(\mathbf{X}_{1})A\times A...\times A=p(\mathbf{X}_{1})A^{n-1}\)

上式利用了关系\(p(\mathbf{x}_{t})=\int p(\mathbf{x}_{t}|\mathbf{x}_{t-1})d\mathbf{x}_{t-1}=p(\mathbf{x}_{t-1})A\)

Markov chain的性质

MCMC主要利用的是满足某些条件的Markov chain具有stationary distribution的性质。这一部分可以参考文献 [4] 的17.2.3节（简单易懂）以及文献[3]的第6章（更详细）。

MCMC方法用到的一个定理（文献[4], Theorem 17.2.3）：

If a Markov chain with transition matrix \(A\) is /regular/ and satisfies /detailed balance/ wrt distribution \(\pi\), then \(\pi\) is a stationary distribution of the chain.

MCMC

MCMC相比 Reject Sampling and Importance Sampling的主要优点是MCMC在高维的情况下的效果更好。MCMC的基本思路是，已知待采样的分布\(\pi(\mathbf{x})\)，通过构造跃迁核\(A(\mathbf{x},\mathbf{x}')\)，得到一个马尔可夫链\(M\)。\(M\)具有规则性(regular)并且对\(\pi(\mathbf{x})\)满足detailed balance条件。这样，\(M\)具有稳态分布\(\pi(\mathbf{x})\)。对\(M\)依次进行采样，得到采样点序列\(\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\mathbf{x}_{n},\cdots\}\)。当采样点数\(n\)足够大时，可以认为\(p（\mathbf{x}_{m>n}）\)已经进入了稳态分布，从而\(\mathbf{x}_{m>n}\)就是从稳态分布\(\pi(\mathbf{x})\)得到的采样。注意这样得到的采样点是关联的（通常非常相关），这一点和reject sampling不同。

上述“对\(M\)进行采样”就是对分布\(A(\mathbf{x}=\mathbf{x}_{0},\mathbf{x}')\)进行采样，其中\(\mathbf{x}_{0}\)是当前已经得到的采样点。

在离散情况下，\(A(\mathbf{x},\mathbf{x}')\)退化成一个矩阵，分布\(A(\mathbf{x}=\mathbf{x}_{0}=j,x')\)就是矩阵\(A\)的第\(j\)行定义的概率分布\(p(\mathbf{x}_{t+1}|\mathbf{x}_{t}=j)\)

在具体实现的时候，关于\(A\)的构造和采样可能很难实现。我们可以采用类似于Reject Sampling的方法：选择一个很简单的分布（proposal distribution）\(T=q(\mathbf{x}_{t+1}|\mathbf{x}_{t})\)进行采样，然后再对采样点进行判断，以概率\(r\)接受该采样。这样一来，\(A=Tr\)由两部分组成，可以设计\(r\)的形式（类似Reject Sampling，\(r\)一般由\(\pi\)和\(T\)同时决定），使\(A\)满足上文所述的两个条件。典型的此类算法有Metropolis算法和Metropolis-Hastings算法。

Metropolis algorithm的步骤如下：

1选择proposal distribution \(T=q(\mathbf{x}_{t+1}|\mathbf{x}_{t})\)。\(T\)的常见选择可以参考random-walk Metropolis、Metropolized independent sampler等等
令\(t=0\)，任选一个初始采样点\(\mathbf{x}_{0}\)，比如从均匀分布中采样得到一个样本
对分布\(T\)进行采样，得到候选采样点\(\bar{\mathbf{x}}\)
计算\(r=min\{1,\frac{p(\bar{\mathbf{x}})}{p(\mathbf{x}_{t})}\}\)
以概率\(r\)接受候选采样点\(\bar{\mathbf{x}}\)。如果接受，则\(\mathbf{x}_{t+1}=\bar{\mathbf{x}}\);如果拒绝，则\(\mathbf{x}_{t+1}=\bar{\mathbf{x}}\)
重复3-5直到得到所需采样点数

最低0.47元/天解锁文章

你狗

关注

1
点赞
踩
6

收藏

觉得还不错? 一键收藏
0
评论
Monte Carlo笔记-2：MCMC，附简易python代码

Monte Carlo笔记-2：MCMC，附简易python代码 Markov chain基本概念当随机变量序列\(\{\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\mathbf{x}_{n}\}\)的联合分布满足：\(p(\mathbf{x}_{1},\mathbf{x}_{2},\cdots,\math...
复制链接

扫一扫