ESL-chapter8-bagging

果然好吃

Published 2014-07-27 10:01:03

Article link: https://blog.csdn.net/u010198460/article/details/38167881

Bootstrap sampling can help improve a model's accuracy. This post explains how bagging uses the bootstrap to do so.

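Consider a regression problem with training set Z={(x1,y1),(x2,y2),...,(xN,yN)}. Draw B bootstrap samples from Z (each sampled with replacement), fit a model to each sample (a decision tree, for example), and average the predictions of the B fitted models. That is the idea behind bagging. The example from Section 8.7.1 of the book illustrates it.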
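
Written as a formula, the averaging step described above looks roughly as follows. The notation is an assumption borrowed from the book's convention: \hat{f}^{*b}(x) denotes the prediction at x of the model fit to the b-th bootstrap sample.

```latex
\hat{f}_{\mathrm{bag}}(x) \;=\; \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)
```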

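Generate the following sample: N=30 observations, two classes, and five features. The features are standard Gaussian with pairwise correlation 0.95, and Y is generated according to Pr(Y=1|x1<=0.5)=0.2, Pr(Y=1|x1>0.5)=0.8. The R code is: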

library(MASS)  # mvrnorm
library(tree)  # tree

set.seed(1)  # for reproducibility
m <- matrix(0.95, nrow = 5, ncol = 5)  # pairwise correlation 0.95
diag(m) <- 1
n.train <- 30
x <- mvrnorm(n.train, mu = rep(0, 5), Sigma = m, empirical = TRUE)  # correlated Gaussian features
# draw each label from Pr(Y=1|x1<=0.5)=0.2, Pr(Y=1|x1>0.5)=0.8
y <- rbinom(n.train, 1, ifelse(x[, 1] <= 0.5, 0.2, 0.8))
y <- factor(y, levels = c(0, 1))

Then generate 2000 test samples in the same way.

n.test <- 2000
x.test <- mvrnorm(n.test, mu = rep(0, 5), Sigma = m, empirical = TRUE)
# same label model as the training set
y.test <- rbinom(n.test, 1, ifelse(x.test[, 1] <= 0.5, 0.2, 0.8))
y.test <- factor(y.test, levels = c(0, 1))
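
As a reference point for the error rates reported later, the best achievable (Bayes) error under this label model can be estimated by simulation. The snippet below is a minimal sketch in base R; it only assumes that x1 is standard Gaussian, and the names `x1.sim`, `y.sim` and `bayes.err` are introduced here for illustration.

```r
# Estimate the Bayes error of the label model
# Pr(Y=1|x1<=0.5)=0.2, Pr(Y=1|x1>0.5)=0.8 by Monte Carlo.
set.seed(42)
n.sim <- 100000
x1.sim <- rnorm(n.sim)                                  # x1 ~ N(0, 1)
y.sim <- rbinom(n.sim, 1, ifelse(x1.sim <= 0.5, 0.2, 0.8))
bayes.pred <- as.integer(x1.sim > 0.5)                  # predict the more likely class
bayes.err <- mean(bayes.pred != y.sim)
print(bayes.err)
```

On either side of the threshold the more likely class is wrong with probability 0.2, so the estimate should come out close to 0.2. No classifier, bagged or not, can be expected to beat that on the test set.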

Next, we reproduce Figure 8.9 from the book.

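par(mfrow = c(3, 4))  # 12 panels: the original tree plus 11 bootstrap trees
tree.original <- tree(y ~ ., data.frame(x, y))
plot(tree.original)
text(tree.original, pretty = 0, all = TRUE)
for (i in 1:11)
{
  # fit a tree to one bootstrap sample (rows drawn with replacement)
  tree.temp <- tree(y ~ ., data.frame(x, y)[sample(1:30, 30, replace = TRUE), ])
  if (nrow(tree.temp$frame) > 1) {  # a single-node tree cannot be plotted
    plot(tree.temp)
    text(tree.temp)
  } else {
    plot.new()  # leave the panel empty
  }
}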

The first tree is fit to the original data; the other eleven are fit to bootstrap samples. The figure shows that every tree looks different.

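Next, we grow 200 trees one after another and, after each new tree, predict the test set with all the trees grown so far, then plot the resulting test error. Following the book, we aggregate the predictions in two ways: a consensus vote and an average of the predicted probabilities. The code and plot follow: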

B <- 200
y.prob <- as.numeric(as.character(y))            # 0/1 response for the regression trees
y.test.num <- as.numeric(as.character(y.test))   # 0/1 test labels
vote.sum <- rep(0, length(y.test.num))           # running sum of 0/1 class votes
prob.sum <- rep(0, length(y.test.num))           # running sum of predicted probabilities
tree.pred.error.rate <- numeric(B)               # collects the results
tree.prob.pred.error.rate <- numeric(B)
for (i in 1:B)
{
  idx <- sample(1:30, 30, replace = TRUE)  # one bootstrap sample, used by both methods

  # consensus vote
  tree.temp <- tree(y ~ ., data.frame(x, y)[idx, ])  # fit the model
  tree.pred.temp <- predict(tree.temp, data.frame(x.test), type = "class")  # predict
  vote.sum <- vote.sum + as.numeric(as.character(tree.pred.temp))
  tree.pred.value <- ifelse(vote.sum / i > 0.5, 1, 0)  # majority class over the first i trees
  # error rate = misclassified / total
  tree.pred.error.rate[i] <- mean(tree.pred.value != y.test.num)

  # probability
  tree.temp.prob <- tree(y.prob ~ ., data.frame(x, y.prob)[idx, ])
  prob.sum <- prob.sum + predict(tree.temp.prob, data.frame(x.test))
  tree.pred.value.prob <- ifelse(prob.sum / i > 0.5, 1, 0)  # threshold the averaged probability
  tree.prob.pred.error.rate[i] <- mean(tree.pred.value.prob != y.test.num)
}
plot(tree.pred.error.rate, ylim = c(0.15, 0.55), col = "green", type = "l",
     xlab = "Number of bootstrap samples", ylab = "Test error")
points(tree.pred.error.rate, col = "green")
lines(tree.prob.pred.error.rate, col = "orange")
points(tree.prob.pred.error.rate, col = "orange")
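
The two aggregation rules in the loop above need not agree on every test point. The toy example below is a small sketch of that; the three probabilities are made-up values for a single hypothetical test point, not output from the trees above.

```r
# Hypothetical P(Y=1) predicted by three trees at one test point
probs <- c(0.45, 0.45, 0.90)

# Consensus vote: each tree votes for its own class, then take the majority
votes <- as.integer(probs > 0.5)             # 0 0 1
vote.class <- as.integer(mean(votes) > 0.5)  # majority says class 0

# Probability averaging: average first, threshold afterwards
prob.class <- as.integer(mean(probs) > 0.5)  # mean is 0.6, so class 1

print(c(vote = vote.class, prob = prob.class))
```

Voting discards how confident each tree is, while averaging keeps it, which is one reason the two curves in the plot can differ.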

Our results resemble those in the book. They show that bagging does reduce the prediction error rate.

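The book then analyzes why bagging works and where it falls short, though I did not understand that part very well. Here is a brief description of one limitation of bagging. Suppose Y equals 1 for every value of the features, and consider a classifier that assigns Y at random with fixed probabilities, say Y=1 with probability 0.4 and Y=0 with probability 0.6. This classifier has an error rate of 0.6, but bagging it gives an error rate of 1. More precisely, the bagged error rate tends to 1 as the number of classifiers goes to infinity.

Suppose bagging uses three such classifiers and takes a majority vote. The probability that all three predict 1 is 0.4^3=0.064, and the probability that exactly two predict 1 is C(3,2)*0.4^2*0.6=0.288. Together the accuracy is 0.352, lower than 0.4. In the same way, as more classifiers are added the accuracy keeps falling, until it reaches 0 and the error rate reaches 1.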
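
The arithmetic in this example can be checked with the binomial distribution. The sketch below assumes, as the example does, that the B classifiers predict independently and that the bagged prediction is a majority vote over an odd number B of them; `vote.accuracy` is a helper name introduced here.

```r
# Accuracy of a majority vote over B independent classifiers,
# each predicting the correct class (Y=1) with probability p.
vote.accuracy <- function(B, p = 0.4) {
  # P(more than B/2 of the B votes are correct)
  1 - pbinom(floor(B / 2), size = B, prob = p)
}

B.values <- c(1, 3, 5, 11, 51, 201)  # odd values, so there are no ties
acc <- sapply(B.values, vote.accuracy)
print(data.frame(B = B.values, accuracy = round(acc, 4)))
```

With B=1 this gives 0.4 and with B=3 it gives 0.352, matching the numbers above, and the accuracy keeps shrinking toward 0 as B grows.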