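- Split the full training set S into k disjoint subsets. If S contains m training examples, each subset holds m/k of them; call the subsets {s1,s2,...,sk}.
- In each round, take one of the subsets as the test set and use the other k-1 as the training set.
- Train the learner on the k-1 training subsets and evaluate it on the held-out test subset. After all k rounds, the average of the k classification rates is taken as the estimate of the true classification rate of the model or hypothesis function.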
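The three steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements it: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and the majority-class learner is a stand-in for a real model.

```python
import numpy as np

def k_fold_indices(m, k):
    """Split the indices 0..m-1 into k disjoint, near-equal subsets."""
    return np.array_split(np.arange(m), k)

def cross_validate(X, y, k, fit, predict):
    """Average test accuracy over k rounds, holding out one subset per round."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # The other k-1 subsets together form the training set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        y_pred = predict(model, X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(scores))

# Toy learner (an assumption for illustration): always predict the majority class
def fit_majority(X_train, y_train):
    values, counts = np.unique(y_train, return_counts=True)
    return values[np.argmax(counts)]

def predict_majority(model, X_test):
    return np.full(len(X_test), model)

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
score = cross_validate(X, y, k=5, fit=fit_majority, predict=predict_majority)
print(score)
```

Because the folds here are contiguous blocks of an ordered label array, the last two test folds contain only class 1, which is the situation stratified sampling is meant to avoid.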
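StratifiedKFold is used in the same way as KFold, but it performs stratified sampling: the class proportions in each training set and test set match those of the original dataset as closely as possible.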
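A quick way to see the stratification is to count the classes in every split. The sketch below assumes a made-up dataset with a 2:1 class ratio (8 samples of class 0, 4 of class 1); the feature values are placeholders, since only `y` drives the stratification.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: class 0 is twice as frequent as class 1
y = np.array([0] * 8 + [1] * 4)
X = np.zeros((len(y), 1))  # placeholder features

skf = StratifiedKFold(n_splits=4)
per_fold = []
for train_idx, test_idx in skf.split(X, y):
    train_counts = np.bincount(y[train_idx], minlength=2).tolist()
    test_counts = np.bincount(y[test_idx], minlength=2).tolist()
    per_fold.append((train_counts, test_counts))
    print('train class counts: %s | test class counts: %s'
          % (train_counts, test_counts))
```

Each printed pair should show the same 2:1 ratio as the full label array.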
Parameters
- n_splits : int, default=5 (the default was 3 before scikit-learn 0.22)
Number of folds. Must be at least 2.
- shuffle : boolean, optional
Whether to shuffle each stratification of the data before splitting into batches.
- random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number generator;
if RandomState instance, random_state is the random number generator;
if None, the random number generator is the RandomState instance used
by `np.random`. Only used when `shuffle` is True.
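To illustrate how `shuffle` and `random_state` interact, here is a small sketch. The helper `shuffled_test_folds` is a hypothetical name, and the data is an arbitrary toy array; the point is only that a fixed integer seed makes the shuffled split reproducible.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(32).reshape(8, 4)
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

def shuffled_test_folds(seed):
    """Return the test indices of each fold for a shuffled, seeded split."""
    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in skf.split(X, y)]

first = shuffled_test_folds(0)
second = shuffled_test_folds(0)
print(first)
print(first == second)  # same seed, same folds
```

Note that `random_state` has no effect when `shuffle=False`, and recent scikit-learn releases raise a `ValueError` if it is set in that case, so it is safest to pass it only together with `shuffle=True`.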
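import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Eight samples with four features each
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])
# Binary labels, four samples per class
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# random_state is omitted: it has no effect when shuffle=False,
# and recent scikit-learn versions raise a ValueError if it is set
sfolder = StratifiedKFold(n_splits=4, shuffle=False)
folder = KFold(n_splits=4, shuffle=False)

# StratifiedKFold: the split takes the labels in y into account
for train, test in sfolder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print(" ")

# KFold: the split ignores y and cuts the indices into consecutive blocks
for train, test in folder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print(" ")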
Results: StratifiedKFold vs. KFold