Python and sklearn: Splitting Data for Cross-Validation


稚枭天卓



Article link: https://blog.csdn.net/u013630349/article/details/47133283


This article performs K-fold validation with the StratifiedKFold method from Python's sklearn package.

For the idea behind the method, see: http://scikit-learn.org/stable/modules/cross_validation.html

StratifiedKFold is a variation of k-fold which returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set.

In other words, StratifiedKFold splits the data so that the samples of every class in the dataset are distributed evenly across the folds.
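
To illustrate the stratification property, here is a minimal sketch that compares plain KFold with StratifiedKFold on a small imbalanced label vector. The toy data (8 samples of class 0 followed by 4 of class 1) and the helper `class_counts_per_test_fold` are made up for illustration and are not from the original post. The sketch assumes the `sklearn.model_selection` API found in current scikit-learn releases.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy data: 12 samples, 8 of class 0 followed by 4 of class 1.
X = np.zeros((12, 1))
y = np.array([0] * 8 + [1] * 4)


def class_counts_per_test_fold(splitter, X, y):
    """Return, for every test fold, the number of samples of each class."""
    counts = []
    for _, test_idx in splitter.split(X, y):
        counts.append(np.bincount(y[test_idx], minlength=2).tolist())
    return counts


plain = class_counts_per_test_fold(KFold(n_splits=4), X, y)
stratified = class_counts_per_test_fold(StratifiedKFold(n_splits=4), X, y)

print("KFold:          ", plain)
print("StratifiedKFold:", stratified)
```

With this data every StratifiedKFold test fold holds 2 samples of class 0 and 1 of class 1, the same 2:1 ratio as the full set, whereas the unshuffled KFold produces folds that contain only one class.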

For other splitting methods, see: http://scikit-learn.org/stable/modules/cross_validation.html

Enough talk; straight to the code.

[Source code]

import numpy
import h5py
# The original post imported StratifiedKFold from sklearn.cross_validation.
# That module was removed in scikit-learn 0.20; sklearn.model_selection replaces it.
from sklearn.model_selection import StratifiedKFold

## Generate a random matrix and save it (run once, then comment out again)
#arr = numpy.random.random([200,400])
#labvec = []
#for i in numpy.arange(0,200):
#    j = i%10
#    arr[i,j*20:j*20+20] = arr[i,j*20:j*20+20]+10
#    labvec.append(j)
#arr = arr.T
#file = h5py.File('arr.mat','w')
#file.create_dataset('arr', data = arr)
#file.close()
#file = h5py.File('labvec.mat','w')
#file.create_dataset('labvec', data = labvec)
#file.close()

# Open the files in read mode
myfile = h5py.File('arr.mat','r')
arr = myfile['arr'][:]
myfile.close()
arr = arr.T  # undo the transpose applied before saving: rows are samples again
myfile = h5py.File('labvec.mat','r')
labvec = myfile['labvec'][:]
myfile.close()

# 4 stratified folds; split() yields index arrays into arr and labvec
skf = StratifiedKFold(n_splits=4)
train_set = []
test_set = []
for train, test in skf.split(arr, labvec):
    train_set.append(train)
    test_set.append(test)
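
The loop above only collects index arrays. As a minimal sketch of how those indices might be used, the example below rebuilds data of the same shape as the commented-out generation step, but in memory, so no HDF5 files are needed. It then slices the data into training and test subsets for each fold. It assumes the `sklearn.model_selection` API (scikit-learn 0.18 and later). The names `folds`, `X_train`, `X_test`, `y_train` and `y_test` are illustrative and do not appear in the original post.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Rebuild data shaped like the post's: 200 samples x 400 features, 10 classes.
rng = np.random.default_rng(0)
arr = rng.random((200, 400))
labvec = np.arange(200) % 10
for i, j in enumerate(labvec):
    arr[i, j * 20:j * 20 + 20] += 10  # shift a class-specific block of features

skf = StratifiedKFold(n_splits=4)
folds = []
for train_idx, test_idx in skf.split(arr, labvec):
    # Index arrays select rows (samples) and the matching labels.
    X_train, X_test = arr[train_idx], arr[test_idx]
    y_train, y_test = labvec[train_idx], labvec[test_idx]
    folds.append((X_train, y_train, X_test, y_test))

for k, (X_train, y_train, X_test, y_test) in enumerate(folds):
    # Report fold number, subset shapes, and per-class counts in the test fold.
    print(k, X_train.shape, X_test.shape, np.bincount(y_test))
```

Each of the 10 classes has 20 samples, so with 4 folds every test subset contains 5 samples per class (50 rows) and every training subset contains the remaining 150 rows.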

For details, see: http://scikit-learn.org/stable/modules/cross_validation.html
