Some Sampling Methods for Data Splitting and Cross-Validation in R

Latest recommended article published on 2024-05-10 17:39:35

baibingbingbing


Article link: https://blog.csdn.net/baibingbingbing/article/details/81323484


> # Proportional (stratified) data sampling
> library(caret)
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
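> # When splitting, take the predicted class label Y into account; Y must be a factor.
> # The class proportions of Y in each part then match those of the original data.
> # With list=TRUE and times=1 the single element is named "Resample1".
> idx<-createDataPartition(iris$Species,times=1,p=0.8,list=TRUE)$Resample1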
> train<-iris[idx,]
> test<-iris[-idx,]
> prop.table(table(train$Species))

    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 
> prop.table(table(test$Species))

    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 
> prop.table(table(iris$Species))

    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 
> 
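To make the idea behind the proportional split concrete, here is a minimal base-R sketch of the same technique: sample a fixed fraction of rows within each class and pool the results. It is not caret's implementation; the names `rows_by_class`, `train_idx`, `train_set`, and `test_set`, and the seed value, are assumptions made for this example.

```r
# Base-R sketch of a class-proportional 80/20 split (not caret's own code).
set.seed(42)                      # arbitrary seed, chosen for this example
p <- 0.8                          # training fraction, as in the article

# Row numbers grouped by class: a list with one integer vector per species.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw the same fraction of rows from every class, then pool them.
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(p * length(rows)))
}), use.names = FALSE)

train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

table(train_set$Species)          # rows per class in the training part
table(test_set$Species)           # rows per class in the held-out part
```

Because every class contributes the same fraction of its rows, the class proportions in both parts stay equal to those of the full data set, which is what the `prop.table()` calls above show.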
> # Split the data set with createFolds() and createMultiFolds()
> library(rpart)
> index<-createFolds(iris$Species,k=5,list=FALSE,returnTrain=FALSE)
> str(index)
 int [1:150] 1 4 2 4 2 5 1 4 3 1 ...
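The call above uses `list=FALSE`, which returns one fold number per row. As a hedged illustration of the other return shapes, the sketch below calls `createFolds()` with `list = TRUE` and also calls `createMultiFolds()`, which the comment mentions but the transcript does not show. The variable names `test_folds`, `train_folds`, and `multi`, and the choice `times = 3`, are assumptions for this example.

```r
library(caret)
set.seed(137)                     # seed before sampling so the folds are reproducible

# list = TRUE returns one vector of held-out row numbers per fold.
test_folds <- createFolds(iris$Species, k = 5, list = TRUE, returnTrain = FALSE)
sapply(test_folds, length)        # size of each held-out fold

# returnTrain = TRUE returns the complementary training rows instead.
train_folds <- createFolds(iris$Species, k = 5, list = TRUE, returnTrain = TRUE)
sapply(train_folds, length)       # size of each training set

# createMultiFolds() repeats the k-fold split `times` times and returns
# training row numbers, one list element per fold per repeat.
multi <- createMultiFolds(iris$Species, k = 5, times = 3)
length(multi)                     # k * times elements
```

Note that the seed is set before the folds are drawn here; seeding after `createFolds()` has already run does not make the fold assignment reproducible.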
> e0=rep(0,5);e1=e0
> set.seed(137)
> for (i in 1:5) {
+   train<-iris[index!=i,]   # all rows outside fold i form the training set
+
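The loop in the source is cut off after its first line, so the following is not a reconstruction of it. It is a minimal sketch, under stated assumptions, of how a fold index from `createFolds(..., list=FALSE)` can drive 5-fold cross-validation of an `rpart` classification tree. The model formula `Species ~ .`, the error measure (misclassification rate), and the names `cv_error`, `train_set`, `test_set`, `fit`, and `pred` are all assumptions made for this example.

```r
library(caret)
library(rpart)

set.seed(137)                     # seed first, so the fold assignment is reproducible
k <- 5
index <- createFolds(iris$Species, k = k, list = FALSE)  # fold number per row

cv_error <- numeric(k)            # hypothetical name: held-out error per fold
for (i in seq_len(k)) {
  train_set <- iris[index != i, ] # rows outside fold i
  test_set  <- iris[index == i, ] # rows in fold i, held out

  # Fit a classification tree on the training rows only.
  fit  <- rpart(Species ~ ., data = train_set, method = "class")
  pred <- predict(fit, newdata = test_set, type = "class")

  # Share of held-out rows whose predicted class is wrong.
  cv_error[i] <- mean(pred != test_set$Species)
}

cv_error                          # misclassification rate on each fold
mean(cv_error)                    # cross-validated error estimate
```

Each row is held out exactly once, so the mean of the per-fold errors estimates how the tree performs on data it was not trained on.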
