3.3. Model evaluation: quantifying the quality of predictions — scikit-learn 0.19.2 documentation
- Metrics supported for model evaluation in sklearn.model_selection (usable as `scoring` strings):
- Classification: accuracy / average_precision / f1 / f1_micro / f1_macro / f1_weighted / f1_samples / neg_log_loss / precision / recall / roc_auc
- Clustering: adjusted_mutual_info_score / adjusted_rand_score / completeness_score / fowlkes_mallows_score / homogeneity_score / mutual_info_score / normalized_mutual_info_score / v_measure_score
- Regression: explained_variance / neg_mean_absolute_error / neg_mean_squared_error / neg_mean_squared_log_error / neg_median_absolute_error / r2
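Any metric name from the lists above can be passed as the `scoring` string of `cross_val_score`. A minimal sketch on the built-in iris data (the choice of `LogisticRegression` here is only for illustration):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = LogisticRegression(max_iter=1000)

# Pass any supported metric name as the `scoring` string;
# one score is returned per CV fold.
scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring='accuracy')
print(scores.mean())
```

Swapping `scoring='accuracy'` for any other string in the lists above changes only the metric, not the call pattern; the `neg_*` metrics are negated so that larger is always better.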
1. Cross-validation
sklearn.cross_validation ⇒ newer versions of sklearn have moved it to sklearn.model_selection
cv: cross-validation
KFold and StratifiedKFold
```python
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
```
StratifiedKFold is an improved version of KFold; the difference is visible from their interfaces. In the old sklearn.cross_validation module, KFold took the sample count `n` (usually `n == len(y)`, i.e. it splits by length only, ignoring the class distribution), while StratifiedKFold took the labels `y` directly. In sklearn.model_selection both take `n_splits`, and the data is passed to `split()` instead:

```python
class sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)
# split(X): splits by index only, ignoring the class distribution

class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None)
# split(X, y): uses the labels y to preserve the class ratio in each fold
```
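The practical difference shows up on a tiny toy set where the classes are grouped by position (the arrays `X` and `y` below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data: 4 samples of class 0 followed by 4 samples of class 1.
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# KFold splits purely by position, ignoring the class distribution:
# one test fold ends up all 0s, the other all 1s.
for train_idx, test_idx in KFold(n_splits=2).split(X):
    print('KFold test labels:     ', y[test_idx])

# StratifiedKFold keeps the class ratio in every fold:
# each test fold mixes both classes.
for train_idx, test_idx in StratifiedKFold(n_splits=2).split(X, y):
    print('Stratified test labels:', y[test_idx])
```

With unshuffled KFold a whole class can be missing from the training folds, which is exactly the failure mode StratifiedKFold is designed to avoid.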
cross_val_score
- Parameters:
cv=10, scoring='neg_mean_squared_error'
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha)
    # neg_mean_squared_error is negated so that higher is better;
    # negate it back and take the square root to get the per-fold RMSE.
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10,
                                          scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
```
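Since `X_train` and `y_train` are not defined in the snippet above, here is a self-contained sketch on synthetic data (generated with `make_regression`, an assumption for illustration) that also picks the alpha with the smallest mean cross-validated RMSE:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for X_train / y_train.
X_train, y_train = make_regression(n_samples=100, n_features=10,
                                   noise=5.0, random_state=0)

alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha)
    # Convert the negated MSE back into a per-fold RMSE, then average.
    rmse = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10,
                                    scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(rmse))

# The best alpha is the one that minimizes the mean CV RMSE.
best_alpha = alphas[np.argmin(test_scores)]
print(best_alpha)
```

Plotting `test_scores` against `alphas` (log scale on the x-axis) gives the usual U-shaped validation curve from which the minimum is read off.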
2. Hyperparameter search
Grid search: GridSearchCV
Example: svm + GridSearchCV:
```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
svc = svm.SVC()
# Build the parameter grid as a dict; the final grid has 2*2 = 4 combinations.
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
for k, v in clf.cv_results_.items():
    print(k, v)
```
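Besides the full `cv_results_` table, the fitted search object exposes the winning configuration directly; a sketch reusing the same iris setup (the exact winning values depend on the data and sklearn version, so treat them as illustrative):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters)
clf.fit(iris.data, iris.target)

# After fitting, the best combination and its CV score are exposed:
print(clf.best_params_)     # the winning {'C': ..., 'kernel': ...} combination
print(clf.best_score_)      # mean cross-validated score of best_params_
print(clf.best_estimator_)  # an SVC refit on the full data with best_params_
```

By default GridSearchCV refits the best estimator on the whole dataset, so `clf.predict(...)` afterwards uses `best_estimator_` directly.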