Benefits of testing
- With separate training and testing datasets, we can see how our learning model performs on data it has never seen.
- Serves as a check on overfitting.
Train/test split in sklearn
train_test_split is a function commonly used for cross-validation.
It randomly splits the samples into a training set and a test set according to a given proportion.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
Parameters:
X: the sample features to be split
y: the corresponding sample labels
test_size: the proportion of samples held out for testing (if an integer, the absolute number of test samples)
random_state: the seed for the random number generator; fixing it guarantees the same split when an experiment needs to be repeated
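As a quick sketch of the call above (the toy arrays here are illustrative, not from the original notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)                 # 10 labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# test_size=0.33 puts ceil(10 * 0.33) = 4 samples in the test set
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)
```

Because random_state is fixed, rerunning the split yields the exact same partition.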
Where to use training and testing data
Training: fit() is called on the training set only.
Testing: predict()/score() are evaluated on the held-out test set.
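A minimal sketch of this workflow: fit() sees only the training split, score() only the test split (iris and DecisionTreeClassifier are illustrative choices, not from the original notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)             # training: only the training data
accuracy = clf.score(X_test, y_test)  # testing: evaluation on unseen data
print(accuracy)
```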
K折交叉验证
Another method is K-fold cross-validation, where you split the dataset into K equal-sized bins. One bin serves as the test set, and the remaining K-1 bins serve as the training set.
You then run K iterations, using a different test bin in each iteration, which yields K test results.
Finally, you average the results.
This gives a more reliable estimate of accuracy, but at the cost of longer training time than a single train/test split, since the model is trained K times.
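The steps above can be sketched with sklearn's KFold (iris and a decision tree are stand-in choices, not from the original notes):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)  # K = 5 bins

scores = []
for train_idx, test_idx in kf.split(X):
    # K-1 bins train the model, the remaining bin tests it
    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(np.mean(scores))  # average of the K test results
```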
k-fold cross-validation in sklearn
The call marked in red, cv = KFold( len(authors), 2 ), splits the data in order.
It needs to be randomized: cv = KFold( len(authors), 2, shuffle=True ). (With the current sklearn.model_selection API, this is cv = KFold(n_splits=2, shuffle=True).)
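A small sketch of why shuffle=True matters when the samples are ordered, e.g. sorted by class (the toy labels are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold

y = np.array([0] * 5 + [1] * 5)  # labels sorted by class

# Without shuffling, KFold takes contiguous slices, so each test fold
# here ends up containing only a single class.
for _, test_idx in KFold(n_splits=2).split(y):
    print(y[test_idx])

# With shuffle=True, the fold indices are drawn from across the dataset.
for _, test_idx in KFold(n_splits=2, shuffle=True, random_state=42).split(y):
    print(y[test_idx])
```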
Cross-validation for parameter tuning
GridSearchCV systematically works through multiple parameter combinations and uses cross-validation to determine which combination performs best. Its benefit is that only a few extra lines of code are needed to sweep many combinations.
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC()
clf = GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
Each combination of kernel and C is used to train an SVM, and cross-validation is used to assess its performance.
svr = svm.SVC(): the "classifier" in this case is not just an algorithm but an algorithm plus a set of parameter values.
clf = GridSearchCV(svr, parameters): we pass in the algorithm (svr) and the dictionary of parameters to try; it generates a grid of parameter combinations to attempt.
clf.fit(iris.data, iris.target): the fit function now tries all the parameter combinations and returns a fitted classifier, automatically tuned to the optimal parameter combination.
The winning parameter values are available via clf.best_params_.
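A sketch of retrieving the tuned result (iris as example data; best_params_, best_score_, and best_estimator_ are standard GridSearchCV attributes):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters)
clf.fit(iris.data, iris.target)

print(clf.best_params_)      # the winning kernel/C combination
print(clf.best_score_)       # its mean cross-validated accuracy
tuned = clf.best_estimator_  # an SVC refit on the full data with those parameters
print(tuned.score(iris.data, iris.target))
```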