scikit-learn: 3.1. Cross-validation: evaluating estimator performance

Latest recommended article published 2023-07-15 07:12:24

Author: mmc2015



Category: scikit-learn. Tags: scikit-learn, cross-validation, model evaluation

Article link: https://blog.csdn.net/mmc2015/article/details/47099275


Reference: http://scikit-learn.org/stable/modules/cross_validation.html

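Overfitting is common: a model that merely memorizes the training labels would score perfectly on them yet fail on unseen data. To measure a model's performance honestly, part of the available data is therefore held out as a test set. A simple example: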

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20
>>> from sklearn import datasets
>>> from sklearn import svm
>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))

>>> X_train, X_test, y_train, y_test = train_test_split(
...     iris.data, iris.target, test_size=0.4, random_state=0)  # hold out 40% of the data for testing
>>> X_train.shape, y_train.shape
((90, 4), (90,))
>>> X_test.shape, y_test.shape
((60, 4), (60,))
>>> clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.96...

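There is another problem. Hyperparameters such as C=1 must be set by hand. If they are tweaked until the estimator performs best on the test set, knowledge about the test set leaks into the model and the model overfits the test set. The test score then no longer reports generalization performance. This motivates splitting the data three ways, into a training set, a validation set, and a test set: training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.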
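The three-way procedure can be sketched with two calls to `train_test_split`. This is a minimal illustration, not the article's code. The 60/20/20 proportions, the candidate C values and `random_state=0` are assumptions chosen for the example. C is chosen on the validation set only, and the test set is touched exactly once at the end.

```python
# Sketch: train / validation / test split used to choose the SVM hyperparameter C.
# Assumptions: 60/20/20 proportions, the candidate C values and random_state=0
# are illustrative choices, not taken from the article.
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# First hold out 20% of all samples as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Split the remaining 80% into training (75% of it) and validation (25% of it),
# i.e. 60% / 20% of the full data set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

# Choose C by accuracy on the validation set; the test set is not used here.
candidate_Cs = [0.01, 0.1, 1, 10, 100]
best_C, best_val = None, -1.0
for C in candidate_Cs:
    model = svm.SVC(kernel='linear', C=C).fit(X_train, y_train)
    val_score = model.score(X_val, y_val)
    if val_score > best_val:  # strict '>' keeps the first C on ties
        best_C, best_val = C, val_score

# Refit with the chosen C and evaluate on the test set exactly once.
final_clf = svm.SVC(kernel='linear', C=best_C).fit(X_train, y_train)
test_score = final_clf.score(X_test, y_test)
print(len(X_train), len(X_val), len(X_test), best_C, test_score)
```

With 150 iris samples this yields 90 training, 30 validation and 30 test samples. Note that only 90 samples are left for fitting, which is the data-scarcity issue discussed next.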

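The three-way split has its own problem. When data is scarce, carving out a separate validation set further reduces the number of samples available for training. The result can also depend on the particular random choice of the (train, validation) pair. Cross-validation (CV for short) addresses this. A test set is still held out for final evaluation, but a separate validation set is no longer needed. In the basic approach, k-fold CV, the training set is split into k smaller sets (folds), and the following is done for each of the k folds:

- A model is trained using $k-1$ of the folds as training data;
- the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).

The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This is computationally more expensive than a single split. However, it does not waste as much data as fixing an arbitrary validation set, which is a major advantage when the number of samples is small.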
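The k-fold loop described above can be written out explicitly with `sklearn.model_selection.KFold`. This is a sketch for illustration. The choices k=5, `shuffle=True` and `random_state=0` are assumptions. Shuffling is used here because the iris samples are ordered by class.

```python
# Sketch of k-fold cross-validation as an explicit loop.
# Assumptions: k=5, shuffle=True and random_state=0 are illustrative choices.
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import KFold

X, y = datasets.load_iris(return_X_y=True)
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in kf.split(X):
    # Train on k-1 folds ...
    clf = svm.SVC(kernel='linear', C=1).fit(X[train_idx], y[train_idx])
    # ... and evaluate on the held-out fold (accuracy for a classifier).
    fold_scores.append(clf.score(X[val_idx], y[val_idx]))

# The reported CV performance is the mean of the k per-fold scores.
cv_score = np.mean(fold_scores)
print(fold_scores, cv_score)
```

Every sample appears in exactly one validation fold. Each sample is therefore used for evaluation once and for training k-1 times.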

1. Computing cross-validated metrics

The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset:

>>> from sklearn.model_selection import cross_val_score  # formerly sklearn.cross_validation
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(
...     clf, iris.data, iris.target, cv=5)
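To see what `cross_val_score` computes, the sketch below reproduces it with an explicit loop. When `cv` is an integer and the estimator is a classifier, scikit-learn splits with `StratifiedKFold` (no shuffling). When no `scoring` is passed, it scores with the estimator's own `score` method, which is accuracy for `SVC`. The variable name `manual` is purely illustrative.

```python
# Sketch: reproduce cross_val_score(clf, X, y, cv=5) for a classifier by hand.
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel='linear', C=1)

# Library call: integer cv + classifier -> StratifiedKFold(n_splits=5).
scores = cross_val_score(clf, X, y, cv=5)

# Equivalent explicit loop over the same stratified folds.
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    est = clone(clf).fit(X[train_idx], y[train_idx])  # fresh, unfitted copy per fold
    manual.append(est.score(X[test_idx], y[test_idx]))
manual = np.array(manual)

# The usual summary is the mean score, often reported with its standard deviation.
print(scores, scores.mean(), scores.std())
```

Stratification keeps the class proportions of each fold close to those of the full data set. This matters for iris, whose samples are stored sorted by class.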
