K-Fold Cross-Validation

weixin_40248634

Original article: https://blog.csdn.net/Softdiamonds/article/details/80062638


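Split the initial sample (the dataset X, Y) into K parts. One part is held out as the data for validating the model (the test set), and the other K-1 parts are used for training (the train set). The procedure is repeated K times so that each part serves as the validation data exactly once. The K results are then averaged, or combined in some other way, to produce a single estimate.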

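The advantage of this method is that every sample is used for both training and validation, and each sample is used for validation exactly once. 10-fold cross-validation is the most commonly used variant. (Remember that the data held out for validation is different in every round.)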
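
The procedure described above can be sketched as a loop: fit on K-1 folds, score on the held-out fold, then average the K scores. This is only a minimal sketch. The synthetic dataset from make_classification and the LogisticRegression model are assumptions chosen for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data, assumed purely for illustration
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_index, test_index in kf.split(X):
    # Train a fresh model on the K-1 training folds
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_index], y[train_index])
    # Accuracy on the single held-out fold
    scores.append(model.score(X[test_index], y[test_index]))

# Combine the K results into a single estimate by averaging
mean_score = float(np.mean(scores))
print('fold scores:', np.round(scores, 3))
print('mean accuracy: %.3f' % mean_score)
```

scikit-learn's cross_val_score wraps essentially the same loop; the explicit version is shown here only to make each step visible.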

Example

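from sklearn.model_selection import KFold
import numpy as np

# 12 samples with 2 features each, plus random labels drawn from {1, 2}
X = np.arange(24).reshape(12, 2)
y = np.random.choice([1, 2], 12, p=[0.4, 0.6])

# 5 folds without shuffling: each test fold is a consecutive block of indices
kf = KFold(n_splits=5, shuffle=False)
for train_index, test_index in kf.split(X):
    print('train_index:%s , test_index: %s ' % (train_index, test_index))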

train_index:[ 3  4  5  6  7  8  9 10 11] , test_index: [0 1 2]
train_index:[ 0  1  2  6  7  8  9 10 11] , test_index: [3 4 5]
train_index:[ 0  1  2  3  4  5  8  9 10 11] , test_index: [6 7]
train_index:[ 0  1  2  3  4  5  6  7 10 11] , test_index: [8 9]
train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]
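
The uneven test-fold sizes in the output above (3, 3, 2, 2, 2) follow the rule documented for KFold: the first n_samples % n_splits folds get n_samples // n_splits + 1 samples, and the remaining folds get n_samples // n_splits. The sketch below reproduces those sizes and compares them with what KFold actually yields. The helper name fold_sizes is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

def fold_sizes(n_samples, n_splits):
    """Hypothetical helper: test-fold sizes for a K-fold split."""
    sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    sizes[: n_samples % n_splits] += 1  # the first folds absorb the remainder
    return sizes.tolist()

X = np.arange(24).reshape(12, 2)
kf = KFold(n_splits=5, shuffle=False)
actual = [len(test_index) for _, test_index in kf.split(X)]

print(fold_sizes(12, 5))  # [3, 3, 2, 2, 2]
print(actual)
```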




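Parameters:

n_splits: the number of folds to split the data into (as equal in size as possible)

shuffle: whether to shuffle the data before splitting it into folds

① If False, no shuffling is done and the folds are consecutive blocks, so the split is the same on every run (as reproducible as shuffling with random_state fixed to an integer)

② If True, the data is shuffled first, so the samples in each fold are drawn at random and the split differs from run to run unless random_state is set

random_state: the random seed; it only has an effect when shuffle=True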
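
To see how shuffle and random_state interact, the sketch below compares three configurations on the same 12-sample array. The helper name collect_test_folds is hypothetical, and the seed value 0 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)

def collect_test_folds(kf):
    """Hypothetical helper: gather the test indices of every fold as lists."""
    return [test_index.tolist() for _, test_index in kf.split(X)]

# shuffle=False: consecutive blocks, identical on every run
no_shuffle = collect_test_folds(KFold(n_splits=5, shuffle=False))

# shuffle=True with a fixed seed: shuffled, but reproducible
seeded_a = collect_test_folds(KFold(n_splits=5, shuffle=True, random_state=0))
seeded_b = collect_test_folds(KFold(n_splits=5, shuffle=True, random_state=0))

# shuffle=True without a seed: the folds may differ between runs
unseeded = collect_test_folds(KFold(n_splits=5, shuffle=True))

print(no_shuffle)
print(seeded_a == seeded_b)
print(unseeded)
```

Whatever the configuration, every sample index still appears in exactly one test fold; only the assignment of indices to folds changes.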

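Methods:

① get_n_splits(X=None, y=None, groups=None): returns the number of splitting iterations, i.e. the value of the n_splits parameter

② split(X, y=None, groups=None): splits the dataset into training and test sets and returns a generator that yields (train indices, test indices) pairs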
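
A short sketch of both methods in use, assuming the same 12-sample array as in the example earlier in this post. The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)
kf = KFold(n_splits=5)

# For KFold, get_n_splits needs no data; it simply reports n_splits
n = kf.get_n_splits()
print(n)  # 5

# split() is lazy: index arrays are produced only as the generator is consumed
splits = kf.split(X)
first_train, first_test = next(splits)
print(first_train, first_test)

# The yielded values are row positions, so they can index X directly
X_train, X_test = X[first_train], X[first_test]
print(X_train.shape, X_test.shape)  # (9, 2) (3, 2)
```

Because split yields indices rather than data, the same index arrays can be applied to X and y alike.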

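Using an example that cannot be split evenly, set different parameter values and observe the results

① Set shuffle=False and run twice: the two results are identical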