[sklearn] Cross-Validation

Principle


1. sklearn.model_selection.KFold

1.1 KFold().split(X): iterating over the splits

from sklearn.model_selection import KFold

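X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # each value equals its index
'''
KFold ignores the distribution of the sample labels (y).
shuffle: shuffle the data once before splitting it into folds
random_state: only used when shuffle=True; once set, repeated runs produce the same folds
'''
kf = KFold(n_splits=5, shuffle=False)
for train, test in kf.split(X):  # y is optional for KFold and is not defined here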
    print(train, test)
'''
[2 3 4 5 6 7 8 9] [0 1]
[0 1 4 5 6 7 8 9] [2 3]
[0 1 2 3 6 7 8 9] [4 5]
[0 1 2 3 4 5 8 9] [6 7]
[0 1 2 3 4 5 6 7] [8 9]
'''
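kf = KFold(n_splits=5, shuffle=True)  # no random_state, so the folds change from run to run
for train, test in kf.split(X):  # y is optional for KFold and is not defined here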
    print(train, test)
'''
[0 1 2 4 5 6 7 9] [3 8]
[1 2 3 4 5 7 8 9] [0 6]
[0 1 3 4 6 7 8 9] [2 5]
[0 1 2 3 5 6 8 9] [4 7]
[0 2 3 4 5 6 7 8] [1 9]
'''

1.2 cross_validate(cv=KFold()): passing KFold as the cv argument
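A minimal sketch of this usage, assuming the iris dataset and a LogisticRegression classifier as stand-ins (neither appears in the original post). The KFold object is passed to cross_validate through cv instead of being iterated by hand.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Assumed example data and model; any estimator and dataset would do.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# iris is sorted by class, so shuffle before splitting.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(model, X, y, cv=kf, scoring="accuracy")

print(sorted(results))               # ['fit_time', 'score_time', 'test_score']
print(results["test_score"])         # one accuracy value per fold
print(results["test_score"].mean())  # average over the 5 folds
```

cross_validate fits a fresh clone of the estimator on each training split and scores it on the matching test split, so the loop from section 1.1 is not needed.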

2. sklearn.model_selection.StratifiedKFold

from sklearn.model_selection import StratifiedKFold

X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Unlike KFold, StratifiedKFold uses y to keep the class ratio roughly the same in every fold
skf = StratifiedKFold(n_splits=5, shuffle=False)
for train, test in skf.split(X, y):
    print(train, test)
'''
[1 2 3 5 6 7 8 9] [0 4]
[0 2 3 4 6 7 8 9] [1 5]
[0 1 3 4 5 7 8 9] [2 6]
[0 1 2 4 5 6 8 9] [3 7]
[0 1 2 3 4 5 6 7] [8 9]
'''
skf = StratifiedKFold(n_splits=5, shuffle=True)  # no random_state, so the folds change from run to run
for train, test in skf.split(X, y):
    print(train, test)
'''
[0 1 2 4 5 6 7 8] [3 9]
[0 1 3 4 6 7 8 9] [2 5]
[1 2 3 4 5 6 8 9] [0 7]
[0 2 3 4 5 6 7 9] [1 8]
[0 1 2 3 5 7 8 9] [4 6]
'''
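To make the difference from KFold concrete, the sketch below counts the class-1 samples in each test fold for both splitters. The 20-sample toy data and the helper positives_per_test_fold are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Assumed toy data: 10 samples of class 0 followed by 10 samples of class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)


def positives_per_test_fold(splitter):
    """Number of class-1 samples in each test fold (hypothetical helper)."""
    return [int(y[test].sum()) for _, test in splitter.split(X, y)]


kfold_counts = positives_per_test_fold(KFold(n_splits=5))
strat_counts = positives_per_test_fold(StratifiedKFold(n_splits=5))

print(kfold_counts)  # [0, 0, 2, 4, 4]  class balance differs per fold
print(strat_counts)  # [2, 2, 2, 2, 2]  every fold keeps the 1:1 ratio
```

With labels sorted by class, unshuffled KFold produces test folds containing a single class, whereas StratifiedKFold spreads both classes evenly.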

3. sklearn.model_selection.GroupKFold /(np.random.shuffle)

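  • Takes a single parameter, n_splits (newer scikit-learn releases may expose additional options); shuffling can be done with np.random.shuffle.
  • Purpose: ensures that samples from the same group never appear in both the training set and the test set.
    That is, all samples of a group end up either in the training set or in the test set.
  • Why it matters: if samples from one group are used for both training and testing, the model can learn that group's characteristics and score well on the test set, yet perform poorly on a group it has never seen.

import numpy as np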
from sklearn.model_selection import GroupKFold

X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 2, 3, 3, 4, 4, 5, 5]

gkf = GroupKFold(n_splits=5)
for train, test in gkf.split(X, y, groups=groups):
    print(train, test)
    # The data is split by group first; the training indices are then shuffled in place
    np.random.shuffle(train)
'''
[3 4 5 6 7 8 9] [0 1 2]
[0 1 2 3 4 5 6 7] [8 9]
[0 1 2 3 4 5 8 9] [6 7]
[0 1 2 3 6 7 8 9] [4 5]
[0 1 2 4 5 6 7 8 9] [3]
'''
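The group guarantee described in the bullets above can be verified directly. This is a sketch on the same toy data, converted to NumPy arrays so that index arrays can be used; the names rng and test_groups_per_fold are illustrative.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
groups = np.array([1, 1, 1, 2, 3, 3, 4, 4, 5, 5])

rng = np.random.default_rng(0)  # assumed seed, only to make the shuffle repeatable
test_groups_per_fold = []

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=groups):
    train_groups = set(groups[train_idx].tolist())
    test_groups = set(groups[test_idx].tolist())

    # No group may appear on both sides of the split.
    assert train_groups.isdisjoint(test_groups)

    # Shuffling changes only the order of the training indices, not their membership.
    rng.shuffle(train_idx)
    assert set(groups[train_idx].tolist()) == train_groups

    test_groups_per_fold.append(sorted(test_groups))

print(test_groups_per_fold)
```

Because shuffling happens after the split, it cannot leak a group from the test side into the training side.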

4. sklearn.model_selection.StratifiedGroupKFold

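from sklearn.model_selection import StratifiedGroupKFold

# Assumes X, y and groups are NumPy arrays (plain lists cannot be indexed with index arrays)
# and that LR and svr are estimators constructed elsewhere; neither is defined in this post.
sgkf = StratifiedGroupKFold(n_splits=10, shuffle=True)
for train_idx, test_idx in sgkf.split(X, y, groups):
    LR.fit(X[train_idx], y[train_idx])
    svr.fit(X[train_idx], y[train_idx])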

5. sklearn.model_selection.cross_validate

See also: [sklearn] RF: Cross-Validation, Out-of-Bag Data, Parameter Learning Curves, Grid Search
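As a rough illustration of cross_validate with several metrics and a group-aware splitter, the sketch below uses synthetic data from make_classification and a hypothetical grouping of 12 groups with 10 samples each. The dataset, model, and metric choices are assumptions, not taken from the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_validate

# Assumed synthetic binary classification data.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
groups = np.repeat(np.arange(12), 10)  # hypothetical: 12 groups of 10 samples

results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    groups=groups,                # forwarded to the splitter's split()
    cv=GroupKFold(n_splits=4),
    scoring=["accuracy", "f1"],   # several metrics at once
    return_train_score=True,      # also report scores on the training folds
)

for key in sorted(results):
    print(key, results[key])
```

With a list of metrics, the result keys follow the pattern test_<metric> and train_<metric>, alongside fit_time and score_time.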
