Building a regression tree with the rpart package and pruning it with the prune function

jiabiao1602

Article link: https://blog.csdn.net/jiabiao1602/article/details/42126563

The body fat data set (bodyfat) is in the TH.data package.

library(TH.data)
library(rpart)
data("bodyfat", package = "TH.data")
help("bodyfat",package="TH.data")
## starting httpd help server ... done
# head(bodyfat)
Use the rpart package to "grow" a regression tree. The response variable and the covariates are defined by a model formula, in the same way as for lm(). We start by growing a large initial tree.

bodyfat_rpart<-rpart(DEXfat~age+waistcirc+hipcirc+elbowbreadth+kneebreadth,data=bodyfat,
# use the control argument so that a node needs at least 10 observations before a binary split is attempted:
control=rpart.control(minsplit=10))
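
To see what the minsplit setting does, here is a small sketch on the built-in mtcars data. The data set, the formula, and the helper n_leaves() are my own stand-ins for illustration and are not part of the bodyfat analysis.

```r
library(rpart)

# mtcars ships with R; it is only a stand-in for bodyfat (assumption)
fit_default <- rpart(mpg ~ wt + hp + disp, data = mtcars)  # minsplit defaults to 20
fit_small   <- rpart(mpg ~ wt + hp + disp, data = mtcars,
                     control = rpart.control(minsplit = 10))

# hypothetical helper: count the terminal nodes (leaves) of a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# the control settings are stored in the fitted object
fit_default$control$minsplit
fit_small$control$minsplit
# minbucket is derived from minsplit as round(minsplit / 3) when not given
fit_small$control$minbucket

# compare tree sizes under the two settings
c(default = n_leaves(fit_default), minsplit10 = n_leaves(fit_small))
```

A smaller minsplit lets rpart try splits in smaller nodes, so the initial tree can usually grow larger.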
Plot the tree with partykit.

Observations that satisfy the condition shown at a node go to the left branch, and those that do not go to the right.

library(partykit)
## Loading required package: grid
plot(as.party(bodyfat_rpart),
tp_args=list(id=FALSE))

The cptable element of the rpart object can tell us whether the tree should be "pruned".

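Look at the xerror values. In the run shown here, the smallest cross-validated error (0.2799) is in row 7, the tree with 6 splits. The xerror and xstd columns come from random cross-validation folds, so they change from run to run: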

print(bodyfat_rpart$cptable)
##        CP nsplit rel error xerror    xstd
## 1 0.66290      0    1.0000 1.0360 0.17147
## 2 0.09376      1    0.3371 0.4870 0.09825
## 3 0.07704      2    0.2433 0.4651 0.08414
## 4 0.04508      3    0.1663 0.4090 0.06790
## 5 0.01845      4    0.1212 0.3622 0.06585
## 6 0.01819      5    0.1028 0.3049 0.06312
## 7 0.01000      6    0.0846 0.2799 0.06086
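
The xerror column is estimated by cross-validation with randomly chosen folds, so it is worth fixing the random seed when the table has to be reproducible. A minimal sketch, again on mtcars as a stand-in; the wrapper grow() and the seed values are hypothetical:

```r
library(rpart)

# hypothetical wrapper: fix the seed, then grow the tree
grow <- function(seed) {
  set.seed(seed)  # fixes the random cross-validation folds
  rpart(mpg ~ wt + hp + disp, data = mtcars,
        control = rpart.control(minsplit = 10))
}

cp_a <- grow(1)$cptable
cp_b <- grow(1)$cptable  # same seed, same folds
cp_c <- grow(2)$cptable  # different seed, xerror and xstd may differ

# same seed gives the same table
identical(cp_a, cp_b)

# CP, nsplit and rel error are computed from the fitted tree, not from the folds
fixed_cols <- c("CP", "nsplit", "rel error")
all.equal(cp_a[, fixed_cols], cp_c[, fixed_cols])
```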
We store the row index of the smallest xerror in opt:

opt<-which.min(bodyfat_rpart$cptable[,"xerror"])
Here we prune back the large initial tree, using the CP value from that row:

cp<-bodyfat_rpart$cptable[opt,"CP"]
bodyfat_prune<-prune(bodyfat_rpart,cp=cp)
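
The same grow-then-prune steps can be tried on built-in data to check that pruning never enlarges the tree. This is only a sketch: mtcars, the seed, the small cp used to overgrow the tree, and the helper n_leaves() are all my assumptions.

```r
library(rpart)

set.seed(42)  # make the cross-validated xerror reproducible
# deliberately overgrow the tree with a small cp and a small minsplit
big <- rpart(mpg ~ wt + hp + disp, data = mtcars,
             control = rpart.control(minsplit = 5, cp = 0.001))

# pick the row with the smallest cross-validated error and prune at its CP
opt    <- which.min(big$cptable[, "xerror"])
cp     <- big$cptable[opt, "CP"]
pruned <- prune(big, cp = cp)

# hypothetical helper: count the terminal nodes (leaves) of a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
c(before = n_leaves(big), after = n_leaves(pruned))
```

If opt happens to be the last row of the table, the pruned tree is the same as the initial one.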
And then we plot the resulting pruned tree:

plot(as.party(bodyfat_prune),
tp_args=list(id=FALSE))

Based on this model, one can predict the (unknown) body fat content from covariate values. Here we do just that for the data we already have:

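# predict body fat for the training data with the pruned tree
DEXfat_pred<-predict(bodyfat_prune,newdata=bodyfat)
# use the same range on both axes so the diagonal marks perfect prediction
xlim<-range(bodyfat$DEXfat)
plot(DEXfat_pred~DEXfat,data=bodyfat,
xlab="Observed",ylab="Predicted",
ylim=xlim,xlim=xlim)
abline(a=0,b=1)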
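
The points in such a plot line up in horizontal bands because a regression tree predicts one value per leaf, the mean response of the training observations in that leaf. A short sketch on mtcars (a stand-in data set, my assumption) checks this and adds two in-sample summaries:

```r
library(rpart)

fit  <- rpart(mpg ~ wt + hp + disp, data = mtcars,
              control = rpart.control(minsplit = 10))
pred <- predict(fit, newdata = mtcars)

# fit$where gives the leaf each training observation falls in;
# ave() returns the mean of mpg within each leaf, one value per observation
leaf_mean <- ave(mtcars$mpg, fit$where)
all.equal(unname(pred), leaf_mean)

# in-sample root mean squared error and correlation with the observed values
rmse <- sqrt(mean((mtcars$mpg - pred)^2))
c(rmse = rmse, cor = cor(mtcars$mpg, pred))
```

These numbers are computed on the training data, so they are optimistic about how well the tree predicts new observations.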

Another approach to recursive partitioning

Another approach is implemented in the 'party' package.

At each node of those trees, we test for independence between each of the covariates and the response, and a split is made when the p-value is small.

Advantage: we do not have to prune back large initial trees, because the stopping criterion is statistically motivated.

The result is called a "Conditional Inference Tree".

We fit one to the body fat data:

library(party)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: sandwich
## Loading required package: strucchange
## Loading required package: modeltools
## Loading required package: stats4
##
## Attaching package: 'party'
##
## The following objects are masked from 'package:partykit':
##
## ctree, ctree_control, edge_simple, mob, mob_control,
## node_barplot, node_bivplot, node_boxplot, node_inner,
## node_surv, node_terminal
bodyfat_ctree<-ctree(DEXfat~age+waistcirc+hipcirc+elbowbreadth+kneebreadth,
data=bodyfat)

plot(bodyfat_ctree)
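
The "test, then split" idea described above can be illustrated with base R alone. This is a rough sketch and not what ctree actually computes: cor.test() stands in for the permutation-based tests used by party, and the mtcars data, the 0.05 level, and the Bonferroni adjustment are my assumptions.

```r
# covariates and response from the built-in mtcars data (stand-in, assumption)
covariates <- c("wt", "hp", "disp")

# test each covariate for independence from the response;
# cor.test() is only a stand-in for the tests used inside ctree
p_values <- sapply(covariates, function(v) {
  cor.test(mtcars[[v]], mtcars$mpg, method = "spearman", exact = FALSE)$p.value
})

alpha <- 0.05                                      # nominal level (assumed value)
p_adj <- p.adjust(p_values, method = "bonferroni") # adjust for testing several covariates

if (min(p_adj) < alpha) {
  split_var <- names(which.min(p_adj))  # split on the most strongly associated covariate
} else {
  split_var <- NA_character_            # no evidence of dependence: stop splitting here
}
split_var
```

Because the decision to stop is made by the test at each node, no separate pruning step is needed afterwards.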