贝叶斯集锦:R和JAGS的交互

转载自:http://site.douban.com/182577/widget/notes/10567181/note/295466672/

Markov chain Monte Carlo (MCMC)方法最早的实现是Linux下的BUGS,主要是用于Bayesian models涉及的统计计算(1989年),后来移植到Windows下发展成为WinBUGS,并终止了在Linux下的研发。它并不是开源的,于是芬兰的Helsinki大学搞了一个开源的OpenBUGS,法国人Martyn Plummer研发了个开源的JAGS。

JAGS,全称是Just another Gibbs sampler,是基于BUGS语言开发的利用MCMC来进行贝叶斯建模的软件包。它没有提供建模所用的GUI以及MCMC抽样的后处理,这些要在其它的程序软件上来处理,比如说利用R包(rjags)来调用JAGS并后处理MCMC的输出。JAGS相对于WinBUGS/OpenBUGS的主要优点在于平台的独立性,可以应用于各种操作系统,而WinBUGS/OpenBUGS只能应用于windows系统;JAGS也可以在64-bit平台上以64-bit应用来进行编译。

JAGS和R的交互非常好,下面我们使用rjags包来实现R对JAGS的调用。 运行一个JAGS模型是指在参数的后验分布中生成抽样,需要这样5个步骤:

定义模型
初始化
编译
适应
监测
后续的MCMC收敛诊断、模型评价等等工作是要由R来完成的。 当然,在使用rjags之前,要保证JAGS已经安装在你的电脑上。 (JAGS下载:http://sourceforge.net/projects/mcmc-jags/files/

下面采用对一个简单线性回归方程的仿真,来介绍用rjags包调用JAGS进行贝叶斯建模的过程。

对一个线性回归方程:
yi=α+βxi+ϵi

其中α,β是回归系数,ϵ是误差项,且ϵ∼N(0,σ2)。在线性回归中解释变量看做常量,而响应变量y视为随机变量。且y∼N(α+βx,σ2)。

- 用JAGS定义模型:

cat("model{for( i in 1:N ){y[i] ~ dnorm(mu[i],tau)
mu[i] <- alpha + beta*x[i]}
alpha ~ dnorm(0, 1.0E-6)
beta ~ dnorm(0, 1.0E-6)
sigma ~ dunif(0,100)
tau <- 1/pow(sigma,2)}",file="F:/R/reg.jag")


JAGS的语法和R很相似。但对正态分布的密度dnorm来说,R的散布参数取成标准差,JAGS的散布参数取为方差的倒数,称为精度(precision),在上面的程序里我们把这个参数记为tau。建立模型之后,把它存入名为example1.jag的文件中。
三个参数$\alpha$,$\beta$,$\sigma$按无信息先验确定分布。$\alpha$,$\beta$的先验分布取成具有很大方差的正态分布,而$\sigma$取成一个较大区间中的均匀分布。

- 初始化

library(coda)
library(rjags)
reg.dat <- list( x=x, y=y, N=1 )
reg.ini <- list( list( alpha=0.05, beta=1.0, sigma=0.9 ),
list( alpha=0.04, beta=1.1, sigma=1.0 ),
list( alpha=0.06, beta=0.9, sigma=1.1 ) )
reg.par <- c("alpha","beta","sigma" )


数据reg.dat是一个列表,用来存放迭代中产生的x,y和N值。我们打算使用3条MCMC链来进行抽样(见下一段程序中的n.chains = 3),因此对每条链赋一个初始值,得到一个列表的列表reg.ini(如果赋成相同的初始值,那就是一个列表)。reg.par 里存放的是模型参数。


- 适应和编译

reg.mod <- jags.model( file = "F:/R/reg.jag",
data = reg.dat,
n.chains = 3,
inits = reg.ini,
 n.adapt = 2000 )

###Compiling model graph
###Resolving undeclared variables
###Allocating nodes
###Graph Size: 3113

###Initializing model



- 监测和图形诊断

通过调用jags.model函数,写在m2.jag中的程序就被JAGS编译了,对未定义的参数和变量进行检查,做好进行后验抽样的准备。有时候模型中参数很多,需要指定进行抽样的参数,这个过程叫做检测(monitoring)。

给定一个jags.model对象,可以通过调用coda.samples函数来对抽样个数进行收集存为coda对象。然后利用plot函数绘出由coda包所提供的图形诊断。

reg.res <- coda.samples( reg.mod,
var = reg.par,
n.iter = 10000,
thin = 1)
plot(reg.res)

 


图形显示的是2000次MCMC迭代的后验抽样。

- burn-in和后验概括

因为MCMC抽样过程中,马尔科夫链是随n增大而收敛。在使用MCMC抽样的结果时,要有合适一个burn-in的过程,使得burn-in之后的链的样本是收敛的。
取burn-in值为1000,对之后的样本采用summary函数来得到后验抽样的统计量。


burn.in <- 5000
a<-window(reg.res, start = burn.in)
summary(window(reg.res, start = burn.in))

###Iterations = 5000:12000
###Thinning interval = 1 
###Number of chains = 3 
###Sample size per chain = 7001 

###1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean SD Naive SE Time-series SE
alpha -1.712 57.96 0.3999 0.3986
beta 3.824 1001.52 6.9107 6.9511
sigma 50.203 28.85 0.1991 0.4586

###2. Quantiles for each variable:

          2.5% 25% 50% 75% 97.5%
alpha -130.35 -27.75 -1.308 24.41 124.24
beta -1955.49 -683.66 12.810 682.41 1962.03
sigma 2.47 25.25 50.210 75.24 97.62

Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition provides an accessible approach for conducting Bayesian data analysis, as material is explained clearly with concrete examples. Included are step-by-step instructions on how to carry out Bayesian data analyses in the popular and free software R and WinBugs, as well as new programs in JAGS and Stan. The new programs are designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data sets. The book is divided into three parts and begins with the basics: models, probability, Bayes’ rule, and the R programming language. The discussion then moves to the fundamentals applied to inferring a binomial probability, before concluding with chapters on the generalized linear model. Topics include metric-predicted variable on one or two groups; metric-predicted variable with one metric predictor; metric-predicted variable with multiple metric predictors; metric-predicted variable with one nominal predictor; and metric-predicted variable with multiple nominal predictors. The exercises found in the text have explicit purposes and guidelines for accomplishment. This book is intended for first-year graduate students or advanced undergraduates in statistics, data analysis, psychology, cognitive science, social sciences, clinical sciences, and consumer sciences in business. Accessible, including the basics of essential concepts of probability and random sampling Examples with R programming language and JAGS software Comprehensive coverage of all scenarios addressed by non-Bayesian textbooks: t-tests, analysis of variance (ANOVA) and comparisons in ANOVA, multiple regression, and chi-square (contingency table analysis) Coverage of experiment planning R and JAGS computer programming code on website Exercises have explicit purposes and guidelines for accomplishment Provides step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBugs
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值