交叉验证网格搜索笔记

最新推荐文章于 2024-04-25 20:07:55 发布

533_

最新推荐文章于 2024-04-25 20:07:55 发布

阅读量391

点赞数

分类专栏：机器学习文章标签：交叉验证网格搜索

本文链接：https://blog.csdn.net/qq_14993591/article/details/100823236

版权

机器学习专栏收录该内容

8 篇文章 0 订阅

订阅专栏

交叉验证

https://sklearn.apachecn.org/docs/0.21.3/30.html

cross_val_score

>>> from sklearn.model_selection import cross_val_score
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores                                              
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])

>>> from sklearn import metrics
>>> scores = cross_val_score(
...     clf, iris.data, iris.target, cv=5, scoring='f1_macro')
>>> scores                                              
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])

>> from sklearn.pipeline import make_pipeline
>> clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
>> cross_val_score(clf, iris.data, iris.target, cv=5)
  ...                                                 
  array([ 0.97...,  0.93...,  0.95...])

K 折

KFold 将所有的样例划分为 k个组，称为折叠 (fold) （如果k=n，这等价于 Leave One Out（留一） 策略），都具有相同的大小（如果可能）。预测函数学习时使用 k-1个折叠中的数据，最后一个剩下的折叠会用于测试。

在 4 个样例的数据集上使用 2-fold 交叉验证的例子:

>>> import numpy as np
>>> from sklearn.model_selection import KFold

>>> X = ["a", "b", "c", "d"]
>>> kf = KFold(n_splits=2)
>>> for train, test in kf.split(X):
...     print("%s  %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]

在这里插入图片描述

分层 k 折

StratifiedKFold 是 k-fold 的变种，会返回 stratified（分层）的折叠：每个小集合中，各个类别的样例比例大致和完整数据集中相同。

在有 10 个样例的，有两个略不均衡类别的数据集上进行分层 3-fold 交叉验证的例子:

>>> from sklearn.model_selection import StratifiedKFold

>>> X = np.ones(10)
>>> y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
>>> skf = StratifiedKFold(n_splits=3)
>>> for train, test in skf.split(X, y):
...     print("%s  %s" % (train, test))
[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]

在这里插入图片描述

调整超参数

https://sklearn.apachecn.org/docs/0.21.3/31.html

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]

from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC


ftwo_scorer = make_scorer(fbeta_score, beta=2)

grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
corer(fbeta_score, beta=2)

grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)