The K-fold validation in this post uses the StratifiedKFold method from the sklearn package in Python.
For the idea behind the method, see: http://scikit-learn.org/stable/modules/cross_validation.html
"StratifiedKFold is a variation of k-fold which returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set."
In other words: the data is split so that every fold keeps the same class composition as the full data set.
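To make the idea concrete, here is a minimal sketch, assuming a current scikit-learn where StratifiedKFold lives in sklearn.model_selection. The toy labels (8 samples of class 0, 4 of class 1) are made up for illustration and are not the data used in this post. It counts how many samples of each class land in each test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (made up for illustration): 8 samples of class 0, 4 of class 1
y = np.array([0] * 8 + [1] * 4)
# Placeholder features; only the labels drive the stratification
X = np.zeros((len(y), 1))

skf = StratifiedKFold(n_splits=4)
test_class_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count the samples of each class in this test fold
    counts = np.bincount(y[test_idx], minlength=2)
    test_class_counts.append(counts.tolist())
    print(counts)
```

Each test fold should keep the 2:1 ratio of the full label vector, which is exactly what "stratified" means here.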
For other splitting methods, see: http://scikit-learn.org/stable/modules/cross_validation.html
Enough talk, here is the code.
[Source code]
import numpy
import h5py
from sklearn.model_selection import StratifiedKFold

## Generate a random matrix plus a label vector and save both (run once)
#arr = numpy.random.random([200,400])
#labvec = []
#for i in numpy.arange(0,200):
#    j = i%10
#    arr[i,j*20:j*20+20] = arr[i,j*20:j*20+20]+10
#    labvec.append(j)
#arr = arr.T
#file = h5py.File('arr.mat','w')
#file.create_dataset('arr', data = arr)
#file.close()
#file = h5py.File('labvec.mat','w')
#file.create_dataset('labvec', data = labvec)
#file.close()

# Open the files read-only and load the data
myfile = h5py.File('arr.mat','r')
arr = myfile['arr'][:]
myfile.close()
arr = arr.T  # back to shape (n_samples, n_features)
myfile = h5py.File('labvec.mat','r')
labvec = myfile['labvec'][:]
myfile.close()

# 4-fold stratified split. sklearn.cross_validation was removed in
# scikit-learn 0.20, so use sklearn.model_selection and split() instead.
skf = StratifiedKFold(n_splits=4)
train_set = []
test_set = []
for train, test in skf.split(arr, labvec):
    train_set.append(train)  # indices of the training samples in this fold
    test_set.append(test)    # indices of the test samples in this fold
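The index arrays collected above are only positions, so the next step is to slice the data with them. The following is a rough sketch of that step, assuming a current scikit-learn (sklearn.model_selection) and rebuilding the same kind of synthetic 200 x 400 matrix in memory instead of reading the .mat files. The names X_train, X_test, fold_sizes and per_class_counts are my own and only serve the illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic data in the spirit of the generator above: 200 samples,
# 400 features, 10 classes with 20 samples each (built in memory here)
rng = np.random.default_rng(0)
arr = rng.random((200, 400))
labvec = np.arange(200) % 10
for i, j in enumerate(labvec):
    # Shift a class-specific block of 20 columns so the classes differ
    arr[i, j * 20:(j + 1) * 20] += 10

skf = StratifiedKFold(n_splits=4)
fold_sizes = []
per_class_counts = []
for train, test in skf.split(arr, labvec):
    # Use the index arrays to slice out the actual samples and labels
    X_train, X_test = arr[train], arr[test]
    y_train, y_test = labvec[train], labvec[test]
    fold_sizes.append((X_train.shape[0], X_test.shape[0]))
    per_class_counts.append(np.bincount(y_test, minlength=10))

print(fold_sizes)
print(per_class_counts[0])
```

With 20 samples per class and 4 folds, every test fold should hold 50 samples, 5 from each class, and the remaining 150 samples form the training part.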
For details, see: http://scikit-learn.org/stable/modules/cross_validation.html