rpart: the R package for decision trees and regression trees

Introduction

The rpart() function in the rpart package fits both decision (classification) trees and regression trees.
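Below is a minimal sketch of the two kinds of tree side by side. It assumes only the rpart package and R's built-in mtcars data. mtcars, the chosen predictors and the object names are illustrative choices and do not come from the original post. The kyphosis data ships with rpart and is also used in the example further down.

```r
library(rpart)

# Regression tree: numeric response (mpg) from the built-in mtcars data set.
reg_fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# Classification tree: factor response from the kyphosis data shipped with rpart.
cls_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 method = "class")

# The fitted object records which splitting method was used.
print(reg_fit$method)
print(cls_fit$method)

# Regression trees predict numbers; classification trees predict class labels.
head(predict(reg_fit))
head(predict(cls_fit, type = "class"))
```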

Usage of rpart()

rpart(formula, data, weights, subset, na.action = na.rpart, method,
      model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
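Under the default na.action = na.rpart in this signature, rows with a missing response are dropped and rows with missing predictors are kept. The small check below runs on a copy of kyphosis. The NA positions are arbitrary and only serve the illustration.

```r
library(rpart)

kyph <- kyphosis               # 81 rows
kyph$Age[1:5] <- NA            # missing predictor values: rows are kept
kyph$Kyphosis[6] <- NA         # missing response value: row is dropped

fit_na <- rpart(Kyphosis ~ Age + Number + Start, data = kyph, method = "class")

# The root node's n is the number of observations actually used in the fit.
n_used <- fit_na$frame$n[1]
print(n_used)
```

Rows with a missing predictor are sent down the tree using surrogate splits. The usesurrogate and maxsurrogate options in rpart.control govern this behaviour.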

Arguments

  • formula
    a formula, with a response but no interaction terms. If this is a data frame, that is taken as the model frame (see model.frame).

  • data
    an optional data frame in which to interpret the variables named in the formula.

  • weights
    optional case weights.

  • subset
    optional expression saying that only a subset of the rows of the data should be used in the fit.

  • na.action
    the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.

  • method
    one of "anova", "poisson", "class" or "exp". If method is missing then the routine tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed, if y has 2 columns then method = "poisson" is assumed, if y is a factor then method = "class" is assumed, otherwise method = "anova" is assumed. It is wisest to specify the method directly, especially as more criteria may be added to the function in future.
    Alternatively, method can be a list of functions named init, split and eval. Examples are given in the file 'tests/usersplits.R' in the sources, and in the vignette "User Written Split Functions".

  • model
    if logical: keep a copy of the model frame in the result? If the input value for model is a model frame (likely from an earlier call to the rpart function), then this frame is used rather than constructing new data.

  • x
    keep a copy of the x matrix in the result.

  • y
    keep a copy of the dependent variable in the result. If missing and model is supplied this defaults to FALSE.

  • parms
    optional parameters for the splitting function.
    Anova splitting has no parameters.
    Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates. The default value is 1.
    Exponential splitting has the same parameter as Poisson.
    For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be gini or information. The default priors are proportional to the data counts, the losses default to 1, and the split defaults to gini. For example: parms = list(prior = c(0.65, 0.35), split = "information")

  • control
    a list of options that control details of the rpart algorithm. See rpart.control.

  • cost
    a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.

  • ...
    arguments to rpart.control may also be specified in the call to rpart. They are checked against the list of valid arguments.
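Extra arguments are forwarded to rpart.control, so cp = 0.05 can be passed directly to rpart(). The sketch below compares that direct form with the control = rpart.control(cp = 0.05) form that fit3 uses in the example, and with the default cp of 0.01. The xval = 0 setting is an addition made for reproducibility. It turns off the random cross-validation, which has no effect on the fitted tree. The helper n_leaves() is hypothetical and written only for this illustration.

```r
library(rpart)

form <- Kyphosis ~ Age + Number + Start

# Same settings passed two ways: directly, and through rpart.control().
fit_direct  <- rpart(form, data = kyphosis, cp = 0.05, xval = 0)
fit_control <- rpart(form, data = kyphosis,
                     control = rpart.control(cp = 0.05, xval = 0))

# Default complexity parameter (cp = 0.01) for comparison.
fit_default <- rpart(form, data = kyphosis, xval = 0)

# Hypothetical helper: count terminal nodes, i.e. frame rows marked "<leaf>".
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

print(isTRUE(all.equal(fit_direct$frame, fit_control$frame)))
print(c(cp_0.05 = n_leaves(fit_direct), cp_default = n_leaves(fit_default)))
```

A larger cp requires every split to improve the fit by a bigger amount. The resulting tree therefore never has more leaves than the one grown with the smaller default.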

Example


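library(rpart)   # the kyphosis data ships with rpart

# Default fit: Kyphosis is a factor, so method = "class" is inferred
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# Explicit classification method with custom priors and the information index
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = 'class',
              parms = list(prior = c(.65, .35), split = "information"))
# Larger complexity parameter than the default 0.01, so fewer splits
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              control = rpart.control(cp = 0.05))
# Draw each tree; use.n = TRUE adds the class counts at each node
plot(fit)
text(fit, use.n = TRUE)
plot(fit2)
text(fit2, use.n = TRUE)
plot(fit3)
text(fit3, use.n = TRUE)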
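fit2 passes a prior through parms. The loss component described under parms can be passed the same way. In this sketch the loss values are arbitrary. The layout (rows are the true class, columns the predicted class, both in the factor's level order absent, present) is assumed to follow the rpart vignette's convention, so check ?rpart before relying on it.

```r
library(rpart)

# Hypothetical loss matrix (assumed rows = true class, columns = predicted):
# calling a true "present" case "absent" costs 4, the opposite error costs 1.
# The diagonal must be zero.
loss <- matrix(c(0, 1,
                 4, 0), nrow = 2, byrow = TRUE)

fit_loss  <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class", parms = list(loss = loss))
fit_plain <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class")

# Compare how often each tree predicts "present" on the training data.
count_present <- function(fit) sum(predict(fit, type = "class") == "present")
print(c(plain = count_present(fit_plain), with_loss = count_present(fit_loss)))
```

A higher cost on missed "present" cases should push the tree toward predicting "present" more often. The actual counts depend on the data.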


rattle: a package for nicer display of rpart objects

The fancyRpartPlot() function in the rattle package draws rpart objects in a more polished style.

Usage of fancyRpartPlot()

fancyRpartPlot(model, main="", sub, caption, palettes, type=2, ...)

Arguments

  • model
    an rpart object.

  • main
    title for the plot.

  • sub
    sub title for the plot. The default is a Rattle string with date, time and username.

  • caption
    caption for bottom right of plot.

  • palettes
    a list of sequential palettes names. As supported by RColorBrewer::brewer.pal the available names are Blues BuGn BuPu GnBu Greens Greys Oranges OrRd PuBu PuBuGn PuRd Purples RdPu Reds YlGn YlGnBu YlOrBr YlOrRd.

  • type
    the type of plot to generate (default 2).

  • ...
    additional arguments passed on to prp.
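The next sketch draws the same tree with two values of type and a custom palettes choice. fancyRpartPlot() depends on rattle, rpart.plot and RColorBrewer, so the code checks that all three are installed and otherwise falls back to base rpart plotting. Writing to a temporary PDF is an assumption that keeps the sketch runnable without a screen.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Write the plots to a temporary PDF so the sketch also runs non-interactively.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
have_pkgs <- all(vapply(c("rattle", "rpart.plot", "RColorBrewer"),
                        requireNamespace, logical(1), quietly = TRUE))
if (have_pkgs) {
  # Same tree, two layouts; type is forwarded to rpart.plot::prp().
  for (plot_type in c(1, 2)) {
    rattle::fancyRpartPlot(fit, main = paste("type =", plot_type), sub = "",
                           palettes = c("Blues", "Oranges"), type = plot_type)
  }
} else {
  # Fallback when the packages are missing: base rpart plotting.
  plot(fit)
  text(fit, use.n = TRUE)
}
invisible(dev.off())
```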

Example

## Set up the data for modelling.
library(rattle)
library(rpart)
set.seed(42)
ds     <- weather
target <- "RainTomorrow"
risk   <- "RISK_MM"
ignore <- c("Date", "Location", risk)
vars   <- setdiff(names(ds), ignore)
nobs   <- nrow(ds)
form   <- formula(paste(target, "~ ."))
train  <- sample(nobs, 0.7*nobs)
test   <- setdiff(seq_len(nobs), train)
actual <- ds[test, target]
risks  <- ds[test, risk]

# Fit the model.

fit <- rpart(form, data=ds[train, vars])

## Plot the model.

fancyRpartPlot(fit)

## Choose different colours.

fancyRpartPlot(fit, main = 'test', sub = 'test1', caption = 'Let me think',
               palettes = c("Greys", "Oranges"), type = 1)
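The setup code computes actual, the held-out target values, but never uses it. Here is a sketch of that missing evaluation step, rewritten on kyphosis so it runs without rattle. It mirrors the 70/30 split above and adds floor() so the sample size is a whole number.

```r
library(rpart)

set.seed(42)
ds     <- kyphosis
nobs   <- nrow(ds)
train  <- sample(nobs, floor(0.7 * nobs))
test   <- setdiff(seq_len(nobs), train)
actual <- ds[test, "Kyphosis"]

fit <- rpart(Kyphosis ~ Age + Number + Start, data = ds[train, ],
             method = "class")

# Predict class labels for the held-out rows.
pred <- predict(fit, newdata = ds[test, ], type = "class")

# Confusion matrix and overall accuracy on the test set.
conf_mat <- table(actual = actual, predicted = pred)
accuracy <- mean(pred == actual)
print(conf_mat)
print(accuracy)
```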


