(1) K-fold cross-validation
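import numpy as np
from sklearn.model_selection import KFold  ## K-fold cross-validation
X = np.arange(36).reshape(18, 2)
kfold = KFold(n_splits=9)  ## kfold is an instance of the KFold class
for a, b in kfold.split(X):  ## .split(X) returns a generator; each iteration yields two arrays:
    ## 1. the indices of the training set; 2. the indices of the validation set.
    print('Train_index: ', a, 'Validation_index:', b)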
Output:
Train_index: [ 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17] Validation_index: [0 1]
Train_index: [ 0 1 4 5 6 7 8 9 10 11 12 13 14 15 16 17] Validation_index: [2 3]
Train_index: [ 0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 17] Validation_index: [4 5]
Train_index: [ 0 1 2 3 4 5 8 9 10 11 12 13 14 15 16 17] Validation_index: [6 7]
Train_index: [ 0 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17] Validation_index: [8 9]
Train_index: [ 0 1 2 3 4 5 6 7 8 9 12 13 14 15 16 17] Validation_index: [10 11]
Train_index: [ 0 1 2 3 4 5 6 7 8 9 10 11 14 15 16 17] Validation_index: [12 13]
Train_index: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 17] Validation_index: [14 15]
Train_index: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] Validation_index: [16 17]
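The output above suggests two properties of K-fold splitting: every sample lands in the validation set exactly once, and the training and validation indices of a fold never overlap. A minimal sketch that checks both on the same 18-sample array follows; the names val_counts and fold_sizes are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(36).reshape(18, 2)
kfold = KFold(n_splits=9)

# val_counts[i] counts how many times sample i is used for validation.
val_counts = np.zeros(len(X), dtype=int)
fold_sizes = []
for train_idx, val_idx in kfold.split(X):
    val_counts[val_idx] += 1
    fold_sizes.append((len(train_idx), len(val_idx)))
    # Training and validation indices of one fold must be disjoint.
    assert np.intersect1d(train_idx, val_idx).size == 0

print(fold_sizes)   # (train size, validation size) for each of the 9 folds
print(val_counts)   # number of validation appearances per sample
```

By default KFold does not shuffle, which is why the validation folds are consecutive index blocks. Passing shuffle=True (optionally with a fixed random_state) assigns samples to folds at random instead.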
(2) Random permutation cross-validator (ShuffleSplit)
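import numpy as np
from sklearn.model_selection import ShuffleSplit
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2])
rs = ShuffleSplit(n_splits=3, train_size=0.5, test_size=0.25, random_state=None)
## Creates a "random permutation cross-validator"; n_splits is the number of re-shuffling and splitting iterations
print(rs.get_n_splits(X))  ## returns the number of splitting iterations
for train_index, test_index in rs.split(X):
    ## .split(X) returns a generator; each iteration yields two arrays:
    ## 1. the indices of the training set; 2. the indices of the validation (test) set.
    print("TRAIN:", train_index, "TEST:", test_index)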
Output (with random_state=None the indices differ from run to run):
3
TRAIN: [0 3] TEST: [1]
TRAIN: [3 2] TEST: [0]
TRAIN: [2 0] TEST: [3]
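Two details of the run above are worth making explicit. First, with train_size=0.5 and test_size=0.25 on 4 samples, each split uses 2 training samples and 1 test sample, so one sample is left out of every split. Second, fixing random_state to an integer makes the splits reproducible. The sketch below illustrates both; the seed value 0 and the names first, second and unused are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
# An integer random_state (0 is an arbitrary choice) fixes the shuffling.
rs = ShuffleSplit(n_splits=3, train_size=0.5, test_size=0.25, random_state=0)

# Calling split() twice on the same splitter yields identical index sets.
first = [(tr.tolist(), te.tolist()) for tr, te in rs.split(X)]
second = [(tr.tolist(), te.tolist()) for tr, te in rs.split(X)]
print(first == second)

for tr, te in first:
    # Samples that are in neither the training nor the test set of this split.
    unused = sorted(set(range(len(X))) - set(tr) - set(te))
    print("TRAIN:", tr, "TEST:", te, "UNUSED:", unused)
```

Unlike KFold, ShuffleSplit draws each split independently, so the same sample may appear in the test set of several splits or in none.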
(3) Train one model per train/validation split and return each model's score on its validation set
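from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, X, y, cv=cv)
## model is an unfitted estimator; cv can be the kfold or rs object defined above (pass it by keyword).
## cv_scores is an array with one score per split: for each split, a clone of model
## is fitted on the training indices and scored on the corresponding validation indices.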
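The call above leaves model, X, y and cv undefined, so here is a minimal runnable sketch. The iris dataset and LogisticRegression are assumptions chosen only to make the example self-contained; they do not come from the notes above, and any estimator with fit and score methods could be used instead.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_score

# Assumed example data and estimator (not part of the original notes).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # unfitted estimator

# iris is ordered by class, so shuffle before cutting it into folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
rs = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)

# One score per split; the default scoring for a classifier is accuracy.
kfold_scores = cross_val_score(model, X, y, cv=kfold)
rs_scores = cross_val_score(model, X, y, cv=rs)

print("KFold scores:", kfold_scores, "mean:", kfold_scores.mean())
print("ShuffleSplit scores:", rs_scores, "mean:", rs_scores.mean())
```

The length of each returned array equals the n_splits of the splitter passed as cv. The mean of the scores is commonly reported as the cross-validated estimate of model performance.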