The body fat data set ships with the TH.data package.
library(TH.data)
library(rpart)
data("bodyfat", package = "TH.data")
help("bodyfat",package="TH.data")
## starting httpd help server ... done
# head(bodyfat)
Use the rpart package to "grow" a regression tree. The response variable and the covariates are defined by a model formula, in the same way as for lm(). We start by growing a large initial tree.
bodyfat_rpart <- rpart(DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth,
                       data = bodyfat,
                       # control argument: a node needs at least 10 observations before a split is attempted
                       control = rpart.control(minsplit = 10))
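To see what the minsplit setting does, the following sketch fits the same formula once with rpart's default (minsplit = 20) and once with minsplit = 10, then counts the terminal nodes of each tree. The helper n_leaves is a name introduced here for illustration only; it is not part of rpart.

```r
library(rpart)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth

# Default control: minsplit = 20
fit_default <- rpart(f, data = bodyfat)
# Smaller minsplit allows splits in smaller nodes
fit_small <- rpart(f, data = bodyfat, control = rpart.control(minsplit = 10))

# Hypothetical helper: count terminal nodes via the fitted tree's frame
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

leaf_counts <- c(default = n_leaves(fit_default), minsplit10 = n_leaves(fit_small))
print(leaf_counts)
```

A smaller minsplit usually yields a larger initial tree, which is what we want before pruning.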
Plot the tree with partykit.
Observations that satisfy the condition shown at a node go to the left; those that do not go to the right.
library(partykit)
## Loading required package: grid
plot(as.party(bodyfat_rpart),
tp_args=list(id=FALSE))
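The cptable element of the rpart object tells us whether the tree should be "pruned".
Look at the xerror column: in the table printed here, the smallest cross-validated error (0.2799) is in row 7, the tree with 6 splits. The xerror values come from random cross-validation, so they can differ from run to run: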
print(bodyfat_rpart$cptable)
##        CP nsplit rel error xerror    xstd
## 1 0.66290      0    1.0000 1.0360 0.17147
## 2 0.09376      1    0.3371 0.4870 0.09825
## 3 0.07704      2    0.2433 0.4651 0.08414
## 4 0.04508      3    0.1663 0.4090 0.06790
## 5 0.01845      4    0.1212 0.3622 0.06585
## 6 0.01819      5    0.1028 0.3049 0.06312
## 7 0.01000      6    0.0846 0.2799 0.06086
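Since xerror is estimated by cross-validation with randomly assigned folds, fixing the random seed before fitting makes the table reproducible. A minimal sketch, assuming an arbitrary seed value of 1 (any fixed value would do); plotcp() from rpart then draws xerror against tree size:

```r
library(rpart)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth
ctrl <- rpart.control(minsplit = 10)

set.seed(1)  # arbitrary seed, chosen only for illustration
fit_a <- rpart(f, data = bodyfat, control = ctrl)

set.seed(1)  # same seed, so the cross-validation folds are the same
fit_b <- rpart(f, data = bodyfat, control = ctrl)

# The two cptables agree, including the xerror and xstd columns
same_table <- isTRUE(all.equal(fit_a$cptable, fit_b$cptable))
print(same_table)

# Cross-validated error versus complexity parameter / tree size
plotcp(fit_a)
```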
We store the row index of the minimum xerror in opt:
opt<-which.min(bodyfat_rpart$cptable[,"xerror"])
Here we prune back the large initial tree:
cp<-bodyfat_rpart$cptable[opt,"CP"]
bodyfat_prune<-prune(bodyfat_rpart,cp=cp)
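A common alternative to picking the minimum xerror, not used in the text above, is the "one standard error" rule: choose the smallest tree whose xerror is within one xstd of the minimum. The sketch below assumes an arbitrary seed and introduces the names threshold and opt_1se for illustration.

```r
library(rpart)
data("bodyfat", package = "TH.data")

set.seed(1)  # arbitrary seed so the cross-validation is repeatable
fit <- rpart(DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth,
             data = bodyfat, control = rpart.control(minsplit = 10))

cpt <- fit$cptable
opt <- which.min(cpt[, "xerror"])

# One-standard-error rule: smallest tree with xerror <= min(xerror) + its xstd
threshold <- cpt[opt, "xerror"] + cpt[opt, "xstd"]
opt_1se <- min(which(cpt[, "xerror"] <= threshold))

pruned_1se <- prune(fit, cp = cpt[opt_1se, "CP"])
print(cpt[opt_1se, ])
```

This rule tends to select a tree that is no larger than the minimum-xerror tree, at the cost of a slightly higher estimated error.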
Then we plot the resulting pruned tree:
plot(as.party(bodyfat_prune),
tp_args=list(id=FALSE))
Based on this model, one can predict the (unknown) body fat content from covariate values. Here we do just that for the data we already have:
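# Predictions from the pruned tree for the original observations
DEXfat_pred <- predict(bodyfat_prune, newdata = bodyfat)
# Use the same range on both axes so the diagonal is meaningful
xlim <- range(bodyfat$DEXfat)
plot(DEXfat_pred ~ DEXfat, data = bodyfat,
     xlab = "Observed", ylab = "Predicted",
     ylim = xlim, xlim = xlim)
# Reference line: predicted equals observed
abline(a = 0, b = 1)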
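The plot can be complemented by numeric summaries. The sketch below computes the root mean squared error and R-squared on the same data used for fitting, so both are optimistic. For brevity it refits the unpruned tree; substitute the pruned tree as needed. The names rmse, r2 and n_leaves are introduced here for illustration.

```r
library(rpart)
data("bodyfat", package = "TH.data")

fit <- rpart(DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth,
             data = bodyfat, control = rpart.control(minsplit = 10))
pred <- predict(fit, newdata = bodyfat)

# In-sample error measures (optimistic: same data as used for fitting)
resid <- bodyfat$DEXfat - pred
rmse <- sqrt(mean(resid^2))
r2 <- 1 - sum(resid^2) / sum((bodyfat$DEXfat - mean(bodyfat$DEXfat))^2)
print(c(rmse = rmse, r2 = r2))

# A regression tree predicts one constant per terminal node,
# which is why the points in the plot form horizontal bands
n_leaves <- sum(fit$frame$var == "<leaf>")
print(c(distinct_predictions = length(unique(pred)), leaves = n_leaves))
```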
Another approach to recursive partitioning
This approach is implemented in the 'party' package.
At each node of these trees, we test for independence between any of the covariates and the response, and a split is made only when the p-value is small.
Advantage: we do not have to prune back a large initial tree, because we use a statistically motivated stopping criterion.
The result is called a "Conditional Inference Tree".
We fit one to the body fat data:
library(party)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: sandwich
## Loading required package: strucchange
## Loading required package: modeltools
## Loading required package: stats4
##
## Attaching package: 'party'
##
## The following objects are masked from 'package:partykit':
##
## ctree, ctree_control, edge_simple, mob, mob_control,
## node_barplot, node_bivplot, node_boxplot, node_inner,
## node_surv, node_terminal
bodyfat_ctree<-ctree(DEXfat~age+waistcirc+hipcirc+elbowbreadth+kneebreadth,
data=bodyfat)
plot(bodyfat_ctree)
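The stopping criterion of a conditional inference tree can be adjusted through ctree_control(). In the party package, mincriterion is the value of 1 - p-value that must be exceeded for a split to be made (0.95 by default). The sketch below, which loads only party (hence the argument name controls), compares the default with a stricter setting of 0.99; the value 0.99 is an arbitrary choice for illustration.

```r
library(party)
data("bodyfat", package = "TH.data")

f <- DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth

# Default: split only if 1 - p-value exceeds 0.95
ct_default <- ctree(f, data = bodyfat)
# Stricter criterion (arbitrary illustrative value)
ct_strict <- ctree(f, data = bodyfat,
                   controls = ctree_control(mincriterion = 0.99))

# where() returns the terminal node id of each training observation
n_default <- length(unique(where(ct_default)))
n_strict <- length(unique(where(ct_strict)))
print(c(default = n_default, strict = n_strict))
```

A stricter criterion typically stops splitting earlier, so the second tree should have no more terminal nodes than the first.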