3.3. Model evaluation: quantifying the quality of predictions — scikit-learn 0.19.2 documentation
- Metrics supported for model evaluation in sklearn.model_selection (usable as `scoring` strings):
- Classification: accuracy / average_precision / f1 / f1_micro / f1_macro / f1_weighted / f1_samples / neg_log_loss / precision / recall / roc_auc
- Clustering: adjusted_mutual_info_score / adjusted_rand_score / completeness_score / fowlkes_mallows_score / homogeneity_score / mutual_info_score / normalized_mutual_info_score / v_measure_score
- Regression: explained_variance / neg_mean_absolute_error / neg_mean_squared_error / neg_mean_squared_log_error / neg_median_absolute_error / r2
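Any metric name from the lists above can be passed as the `scoring` string of `cross_val_score`. A minimal sketch on the built-in iris data (the choice of `LogisticRegression` here is only for illustration):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = LogisticRegression(max_iter=1000)

# Pass any supported metric name as the `scoring` string;
# one score is returned per CV fold.
scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring='accuracy')
print(scores.mean())
```

Swapping `scoring='accuracy'` for any other string in the lists above changes only the metric, not the call pattern; the `neg_*` metrics are negated so that larger is always better.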
1. Cross-validation
sklearn.cross_validation ⇒ newer versions of sklearn have moved it to sklearn.model_selection
cv: cross-validation
KFold and StratifiedKFold
```python
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
```
StratifiedKFold is an improved version of KFold; the difference is visible from their interfaces. In the old sklearn.cross_validation module, KFold took the sample count `n` (usually `n == len(y)`, i.e. it splits by length only, ignoring the class distribution), while StratifiedKFold took the labels `y` directly. In sklearn.model_selection both take `n_splits`, and the data is passed to `split()` instead:

```python
class sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)
# split(X): splits by index only, ignoring the class distribution

class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None)
# split(X, y): uses the labels y to preserve the class ratio in each fold
```
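The practical difference shows up on a tiny toy set where the classes are grouped by position (the arrays `X` and `y` below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy data: 4 samples of class 0 followed by 4 samples of class 1.
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# KFold splits purely by position, ignoring the class distribution:
# one test fold ends up all 0s, the other all 1s.
for train_idx, test_idx in KFold(n_splits=2).split(X):
    print('KFold test labels:     ', y[test_idx])

# StratifiedKFold keeps the class ratio in every fold:
# each test fold mixes both classes.
for train_idx, test_idx in StratifiedKFold(n_splits=2).split(X, y):
    print('Stratified test labels:', y[test_idx])
```

With unshuffled KFold a whole class can be missing from the training folds, which is exactly the failure mode StratifiedKFold is designed to avoid.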
cross_val_score
- Parameters:
cv=10, scoring='neg_mean_squared_error'
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha)
    # neg_mean_squared_error is negated so that higher is better;
    # negate it back and take the square root to get the per-fold RMSE.
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10,
                                          scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))
```
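Since `X_train` and `y_train` are not defined in the snippet above, here is a self-contained sketch on synthetic data (generated with `make_regression`, an assumption for illustration) that also picks the alpha with the smallest mean cross-validated RMSE:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for X_train / y_train.
X_train, y_train = make_regression(n_samples=100, n_features=10,
                                   noise=5.0, random_state=0)

alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha)
    # Convert the negated MSE back into a per-fold RMSE, then average.
    rmse = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10,
                                    scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(rmse))

# The best alpha is the one that minimizes the mean CV RMSE.
best_alpha = alphas[np.argmin(test_scores)]
print(best_alpha)
```

Plotting `test_scores` against `alphas` (log scale on the x-axis) gives the usual U-shaped validation curve from which the minimum is read off.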
2. Hyperparameter search
Grid search: GridSearchCV
Example: svm + GridSearchCV:
```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
svc = svm.SVC()
# Build the parameter grid as a dict; the final grid has 2*2 = 4 combinations.
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
for k, v in clf.cv_results_.items():
    print(k, v)
```
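Besides the full `cv_results_` table, the fitted search object exposes the winning configuration directly; a sketch reusing the same iris setup (the exact winning values depend on the data and sklearn version, so treat them as illustrative):

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters)
clf.fit(iris.data, iris.target)

# After fitting, the best combination and its CV score are exposed:
print(clf.best_params_)     # the winning {'C': ..., 'kernel': ...} combination
print(clf.best_score_)      # mean cross-validated score of best_params_
print(clf.best_estimator_)  # an SVC refit on the full data with best_params_
```

By default GridSearchCV refits the best estimator on the whole dataset, so `clf.predict(...)` afterwards uses `best_estimator_` directly.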