Sampling Methods in R

Some notes for future reference.
caret manual:

caret:
http://topepo.github.io/caret/index.html

The sampling package

strata: stratified sampling, where you specify how many units to draw from each class (stratum)

Load the Sonar data:

library(mlbench)

data(Sonar) # use the Sonar data frame that ships with mlbench as the example
dim(Sonar)
View(Sonar)

Sampling with the strata function

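library(sampling)

# stratified random sample: about 70% of each class
# size takes one value per stratum, in the order the strata appear in the data
# (in Sonar: 97 R rows first, then 111 M rows)
subset <- strata(data = Sonar, stratanames = "Class", size = c(68, 77), method = "srswor")

# split dataset
# strata returns a data frame with four columns: "Class", "ID_unit", "Prob" and "Stratum"; "ID_unit" is the row index in the original data frame
train <- Sonar[subset$ID_unit, ]
test <- Sonar[-subset$ID_unit, ]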
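
To make the shape of the strata output more concrete, here is a minimal base-R sketch of a stratified draw without replacement. It is not the sampling package's implementation: the toy data frame, the `sizes` vector and the `picked` data frame are all made up for illustration, and the `Prob` column assumes that the inclusion probability under srswor is simply n_h / N_h.

```r
# Toy stand-in for Sonar: 97 "R" rows followed by 111 "M" rows (assumed layout).
set.seed(1)
toy <- data.frame(x = rnorm(208),
                  Class = factor(rep(c("R", "M"), times = c(97, 111))))

sizes <- c(R = 68, M = 77)   # rows to draw per stratum (hypothetical choice)

# For each stratum, draw sizes[[cls]] row indices without replacement.
picked <- do.call(rbind, lapply(names(sizes), function(cls) {
  rows <- which(toy$Class == cls)
  data.frame(Class   = cls,
             ID_unit = sort(sample(rows, sizes[[cls]])),
             Prob    = sizes[[cls]] / length(rows),  # assumed: n_h / N_h
             Stratum = match(cls, names(sizes)))
}))

# Same split pattern as above: selected rows for training, the rest for testing.
train <- toy[picked$ID_unit, ]
test  <- toy[-picked$ID_unit, ]
table(train$Class)
```

Because every index in `picked$ID_unit` is unique, negative indexing yields exactly the complement, so `train` and `test` never share a row.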

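Key arguments of strata:

Argument      Meaning
data          the original data, as a data frame (data.frame) or a matrix
stratanames   the stratification variable(s); above, the data are stratified by the Class column
size          the number of units to draw from each stratum
method        the sampling method: simple random sampling without replacement (srswor), simple random sampling with replacement (srswr), Poisson sampling (poisson), or systematic sampling (systematic); if method is missing, srswor is used by default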
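
The four sampling methods can be sketched in base R on a single stratum. This is only an illustration under simplifying assumptions (equal inclusion probabilities, N divisible by n) and not how the sampling package implements them; all variable names are hypothetical.

```r
set.seed(42)
N <- 20   # units in one stratum, labelled 1..N
n <- 5    # target sample size

# srswor: simple random sampling without replacement, each unit at most once.
s_srswor <- sample(N, n, replace = FALSE)

# srswr: simple random sampling with replacement, units may repeat.
s_srswr <- sample(N, n, replace = TRUE)

# poisson: each unit is included independently with probability n / N,
# so the realised sample size is random.
s_poisson <- which(runif(N) < n / N)

# systematic: random start, then every (N / n)-th unit.
step <- N / n
s_systematic <- floor(runif(1, 0, step)) + 1 + step * (0:(n - 1))
```

Only srswor and the systematic draw guarantee exactly n distinct units; srswr may return duplicates, and Poisson sampling may return more or fewer than n units.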

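The caret package

caret may require a fairly recent version of R, and R may prompt you to load the recipes package.

createDataPartition: stratified sampling based on a class variable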

# install.packages("caret")
library(caret)
library(recipes)

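train_index <- createDataPartition(Sonar$Class, # the class variable
               times = 1, p = 0.7, # times: number of partitions to create (1 here); p: proportion of rows to draw
               list = FALSE) # list = FALSE: return a matrix instead of a list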
               
train <- Sonar[train_index,] 
test <- Sonar[-train_index,]

The result holds row indices into the original data frame.
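
Why stratify at all? The base-R sketch below compares a plain random 70% split with a per-class 70% split on a label vector that has the same class sizes as Sonar (111 M, 97 R). It only mimics the idea of a stratified partition; caret's internals may differ in detail, and the names `idx_random` and `idx_strat` are made up for the example.

```r
set.seed(123)
y <- factor(rep(c("R", "M"), times = c(97, 111)))  # class sizes as in Sonar
p <- 0.7

# Plain random split: ignores the class labels.
idx_random <- sample(length(y), size = round(p * length(y)))

# Stratified split: draw a fraction p inside each class, then combine.
idx_strat <- sort(unlist(lapply(split(seq_along(y), y), function(rows) {
  sample(rows, size = ceiling(p * length(rows)))
}), use.names = FALSE))

prop.table(table(y))              # class shares in the full data
prop.table(table(y[idx_strat]))   # stays close to the full-data shares
prop.table(table(y[idx_random]))  # may drift, depending on the draw
```

With a stratified split the class shares in the training set are fixed by construction, which matters most when one class is rare.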

head(train_index)
     Resample1
[1,]         1
[2,]         2
[3,]         4
[4,]         5
[5,]         6
[6,]        11

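The head(train) and head(test) printouts below were evidently captured from a different data frame (with bank-marketing style columns), not from Sonar, which has columns V1 to V60 plus Class. The point they illustrate still holds: the row names show which original rows ended up in each split.

head(train)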
  age           job marital education default balance housing loan  contact day month campaign pdays previous poutcome y
1  30    unemployed married   primary      no    1787      no   no cellular  19   oct        1    -1        0  unknown 0
2  33      services married secondary      no    4789     yes  yes cellular  11   may        1   339        4  failure 0
3  35    management  single  tertiary      no    1350     yes   no cellular  16   apr        1   330        1  failure 0
4  30    management married  tertiary      no    1476     yes  yes  unknown   3   jun        4    -1        0  unknown 0
7  36 self-employed married  tertiary      no     307     yes   no cellular  14   may        1   330        2    other 0
8  39    technician married secondary      no     147     yes   no cellular   6   may        2    -1        0  unknown 0

head(test)
   age           job marital education default balance housing loan  contact day month campaign pdays previous poutcome y
5   59   blue-collar married secondary      no       0     yes   no  unknown   5   may        1    -1        0  unknown 0
6   35    management  single  tertiary      no     747      no   no cellular  23   feb        2   176        3  failure 0
13  36    technician married  tertiary      no    1109      no   no cellular  13   aug        2    -1        0  unknown 0
18  37        admin.  single  tertiary      no    2317     yes   no cellular  20   apr        1   152        2  failure 0
20  31      services married secondary      no     132      no   no cellular   7   jul        1   152        1    other 0
29  56 self-employed married secondary      no     784      no  yes cellular  30   jul        2    -1        0  unknown 0

createResample: draw one or more bootstrap samples

subsets <- createResample(Sonar$Class,times = 5,list = FALSE)
head(subsets,10)

Result:

head(subsets,10)
      Resample1 Resample2 Resample3 Resample4 Resample5
 [1,]         2         1         6         2         1
 [2,]         3         2         6         3         2
 [3,]         3         6         8         4         2
 [4,]         4         8         8         4         4
 [5,]         4         8         8         7         5
 [6,]         5         9         9         9         5
 [7,]         9        10        10        10         5
 [8,]         9        11        11        13         7
 [9,]         9        11        12        15         8
[10,]        10        11        13        16         9
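
The repeated indices in each column are what a bootstrap sample looks like: rows are drawn with replacement, so some appear several times and others not at all. A minimal base-R sketch of one such draw, using hypothetical variable names and n = 208 as in Sonar:

```r
set.seed(7)
n <- 208   # number of rows, as in Sonar

boot_idx <- sort(sample(n, size = n, replace = TRUE))  # one bootstrap sample
oob_idx  <- setdiff(seq_len(n), boot_idx)              # rows never drawn ("out of bag")

head(boot_idx, 10)        # repeated indices are expected
length(unique(boot_idx))  # typically around 63% of n
length(oob_idx)
```

The out-of-bag rows are a natural held-out set for evaluating a model fitted on the bootstrap sample.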

createFolds: split the data into k folds

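folds <- createFolds(Sonar$Class, k = 10, list = TRUE,
          returnTrain = FALSE) # returnTrain = FALSE: each element holds the held-out rows of one fold (about 1/k of the data); TRUE would return the training rows instead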
length(folds)

Result:

length(folds)
[1] 10

The argument k sets the number of folds; the returned list has length k.
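
A list of held-out folds is typically consumed in a cross-validation loop. The sketch below builds the folds in base R on a toy data frame, so it does not stratify by class the way createFolds does; the data, the linear model and the RMSE metric are all assumptions chosen only to keep the example self-contained.

```r
set.seed(99)
toy <- data.frame(x = rnorm(100), y = rnorm(100))
k <- 5

# Assign each row to one of k folds at random (no stratification here).
folds <- split(sample(nrow(toy)), rep(seq_len(k), length.out = nrow(toy)))

# Each fold serves once as the held-out set; the rest is used for fitting.
rmse <- sapply(folds, function(held_out) {
  fit  <- lm(y ~ x, data = toy[-held_out, ])
  pred <- predict(fit, newdata = toy[held_out, ])
  sqrt(mean((toy$y[held_out] - pred)^2))
})
mean(rmse)  # cross-validated RMSE
```

The same loop works unchanged with a list returned by createFolds with returnTrain = FALSE, since each element is again a vector of held-out row indices.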

createMultiFolds

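The arguments k and times can both be set; the returned list has length k × times. Each element holds the training rows for one fold of one repeat.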

mfolds <- createMultiFolds(Sonar$Class, k = 10, times = 5)
length(mfolds)

Result:

> length(mfolds)
[1] 50
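
The k × times structure can be sketched in base R as repeated k-fold splitting, where each repeat reshuffles the fold labels. This is an unstratified illustration, not caret's implementation, and the element names are chosen just for this sketch.

```r
set.seed(2024)
n <- 208; k <- 10; times <- 5

mfolds <- list()
for (r in seq_len(times)) {
  # Fresh random fold labels for every repeat.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  for (f in seq_len(k)) {
    # Store the training rows: everything outside fold f.
    mfolds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id != f)
  }
}
length(mfolds)
```

Each element keeps roughly (k - 1) / k of the rows, so the held-out rows for a given element are those missing from it.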