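Without shuffling, KFold simply cuts the samples into folds in their original order, whereas StratifiedKFold uses stratified sampling so that each fold keeps roughly the same class proportions as y.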
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, StratifiedKFold

# 9 samples with 4 features each; y has 4 samples of class 1 and 5 of class 0
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74],
    [71, 74, 73, 74],
])
y = np.array([1, 1, 0, 0, 1, 1, 0, 0, 0])
folder = KFold(n_splits=4, shuffle=False)
sfolder = StratifiedKFold(n_splits=4, shuffle=False)
# For each fold: print the train/test indices, then the class counts of the train set and of the test set
for train, test in folder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print(" ")
    print(pd.Series(y[train]).value_counts())
    print(pd.Series(y[test]).value_counts())
print('-------------------------------------')
for train, test in sfolder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
    print(" ")
    print(pd.Series(y[train]).value_counts())
    print(pd.Series(y[test]).value_counts())
Output:
Train: [3 4 5 6 7 8] | test: [0 1 2]
0 4
1 2
dtype: int64
1 2
0 1
dtype: int64
Train: [0 1 2 5 6 7 8] | test: [3 4]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [0 1 2 3 4 7 8] | test: [5 6]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [0 1 2 3 4 5 6] | test: [7 8]
1 4
0 3
dtype: int64
0 2
dtype: int64
-------------------------------------
Train: [1 4 5 6 7 8] | test: [0 2 3]
1 3
0 3
dtype: int64
0 2
1 1
dtype: int64
Train: [0 2 3 4 5 7 8] | test: [1 6]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [0 1 2 3 5 6 8] | test: [4 7]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [0 1 2 3 4 6 7] | test: [5 8]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
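With shuffle=True, KFold randomly shuffles the samples before splitting them into folds, while StratifiedKFold also shuffles but still respects stratified sampling.
folder = KFold(n_splits=4, shuffle=True, random_state=1)
sfolder = StratifiedKFold(n_splits=4, shuffle=True, random_state=1)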
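Note: random_state fixes the seed of the shuffle so that experiments can be reproduced. With the default random_state=None, the shuffle draws from NumPy's global random state, so the folds generally change from run to run (and from one call of split to the next). Pass an integer random_state to get the same folds every time. Re-running the two loops above with these splitters prints: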
Train: [0 1 3 4 5 7] | test: [2 6 8]
1 4
0 2
dtype: int64
0 3
dtype: int64
Train: [0 2 3 4 5 6 8] | test: [1 7]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [1 2 3 5 6 7 8] | test: [0 4]
0 5
1 2
dtype: int64
1 2
dtype: int64
Train: [0 1 2 4 6 7 8] | test: [3 5]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
-------------------------------------
Train: [0 1 3 5 6 7] | test: [2 4 8]
1 3
0 3
dtype: int64
0 2
1 1
dtype: int64
Train: [0 1 2 4 6 7 8] | test: [3 5]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [0 2 3 4 5 6 8] | test: [1 7]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64
Train: [1 2 3 4 5 7 8] | test: [0 6]
0 4
1 3
dtype: int64
1 1
0 1
dtype: int64