What is a Python decision evaluation model_Introduction to Machine Learning with Python, Chapter 5: Model Evaluation and Improvement (Part 2)...


weixin_39682944


Article link: https://blog.csdn.net/weixin_39682944/article/details/113970596


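This article covers model evaluation metrics in Python, including the confusion matrix, accuracy, precision, recall, and the F1 score. Through examples, it shows how to use these metrics to compare different models, such as a decision tree and logistic regression, and how to trade off precision against recall by adjusting the decision threshold. It also discusses the ROC curve and AUC as evaluation tools.

Summary generated by CSDN's AI tool.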

III. Evaluation Metrics and Scoring

2. Metrics for Binary Classification

(3) Confusion Matrices

from sklearn.metrics import confusion_matrix
confusion = confusion_matrix(y_test, pred_logreg)
print("Confusion matrix:\n{}".format(confusion))

Confusion matrix:
[[401   2]
 [  8  39]]

mglearn.plots.plot_confusion_matrix_illustration()

For binary classification, the confusion matrix is laid out as follows:

mglearn.plots.plot_binary_confusion_matrix()
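The layout can be made concrete with a small pure-Python sketch. It assumes scikit-learn's convention (rows are true classes, columns are predicted classes, class 0 first), so for 0/1 labels the matrix is [[TN, FP], [FN, TP]]. The helper `binary_confusion` and the toy labels are hypothetical, written only for illustration.

```python
def binary_confusion(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows = true, columns = predicted)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row picks the true class, column the predicted class
    return matrix

# hypothetical toy labels
y_true = [0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 0, 1]
(tn, fp), (fn, tp) = binary_confusion(y_true, y_pred)
print("TN={} FP={} FN={} TP={}".format(tn, fp, fn, tp))  # TN=3 FP=1 FN=1 TP=2
```

Read the same way, the logistic regression result [[401 2] [8 39]] means 401 true negatives, 2 false positives, 8 false negatives, and 39 true positives.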

Use the confusion matrix to compare the models fitted earlier (two dummy models, a decision tree, and logistic regression):

print("Most frequent class:")
print(confusion_matrix(y_test, pred_most_frequent))
print("\nDummy model:")
print(confusion_matrix(y_test, pred_dummy))
print("\nDecision tree:")
print(confusion_matrix(y_test, pred_tree))
print("\nLogistic Regression")
print(confusion_matrix(y_test, pred_logreg))

Most frequent class:
[[403   0]
 [ 47   0]]

Dummy model:
[[369  34]
 [ 43   4]]

Decision tree:
[[390  13]
 [ 24  23]]

Logistic Regression
[[401   2]
 [  8  39]]

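This comparison makes it clear that only the decision tree and logistic regression produce reasonable results, and that logistic regression beats the decision tree on every count: more true positives and true negatives, and fewer false positives and false negatives.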

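Several formulas, with TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

These are the formulas for accuracy, precision, recall, and the F-score (this variant is the F1 score).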
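As a quick sanity check, the formulas can be applied by hand to the logistic regression confusion matrix [[401 2] [8 39]]. This is a plain-Python sketch; the helper `binary_metrics` is hypothetical and assumes TP + FP > 0 (it would divide by zero for the "most frequent" model, which never predicts the positive class).

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (assumes tp + fp > 0)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# counts from the logistic regression confusion matrix [[401 2] [8 39]]
acc, prec, rec, f1 = binary_metrics(tn=401, fp=2, fn=8, tp=39)
print("accuracy={:.3f} precision={:.2f} recall={:.2f} f1={:.2f}".format(acc, prec, rec, f1))
# accuracy=0.978 precision=0.95 recall=0.83 f1=0.89
```

The values 0.95, 0.83, and 0.89 agree with the "nine" row of the logistic regression classification report and with its F1 score of 0.89.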

Comparing the F1 scores of these models:

from sklearn.metrics import f1_score
print("f1 score most frequent: {:.2f}".format(
    f1_score(y_test, pred_most_frequent)))
print("f1 score dummy: {:.2f}".format(f1_score(y_test, pred_dummy)))
print("f1 score tree: {:.2f}".format(f1_score(y_test, pred_tree)))
print("f1 score logistic regression: {:.2f}".format(
    f1_score(y_test, pred_logreg)))

f1 score most frequent: 0.00
f1 score dummy: 0.09
f1 score tree: 0.55
f1 score logistic regression: 0.89

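To get a comprehensive summary of precision, recall, and F1 score, use the classification_report function, which computes all three at once and prints them in a readable format.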

from sklearn.metrics import classification_report
print(classification_report(y_test, pred_most_frequent,
                            target_names=["not nine", "nine"]))

             precision    recall  f1-score   support

    not nine       0.90      1.00      0.94       403
        nine       0.00      0.00      0.00        47

 avg / total       0.80      0.90      0.85       450

print(classification_report(y_test, pred_dummy,
                            target_names=["not nine", "nine"]))

             precision    recall  f1-score   support

    not nine       0.90      0.92      0.91       403
        nine       0.11      0.09      0.09        47

 avg / total       0.81      0.83      0.82       450

print(classification_report(y_test, pred_logreg,
                            target_names=["not nine", "nine"]))

             precision    recall  f1-score   support

    not nine       0.98      1.00      0.99       403
        nine       0.95      0.83      0.89        47

 avg / total       0.98      0.98      0.98       450
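In the scikit-learn version used here, the `avg / total` row is the support-weighted average of the per-class values (newer versions label the equivalent row `weighted avg`). The plain-Python sketch below reproduces the logistic regression report from its confusion matrix [[401 2] [8 39]]; the dictionary layout is an illustrative assumption, not part of any library.

```python
# Counts read off the logistic regression confusion matrix [[401 2] [8 39]].
# For "not nine" the roles of the classes are swapped: the 8 nines predicted as
# "not nine" are its false positives, the 2 misclassified non-nines its false negatives.
per_class = {
    "not nine": {"tp": 401, "fp": 8, "fn": 2},
    "nine": {"tp": 39, "fp": 2, "fn": 8},
}

rows = {}
for name, c in per_class.items():
    precision = c["tp"] / (c["tp"] + c["fp"])
    recall = c["tp"] / (c["tp"] + c["fn"])
    f1 = 2 * precision * recall / (precision + recall)
    support = c["tp"] + c["fn"]  # number of samples whose true label is this class
    rows[name] = (precision, recall, f1, support)

total_support = sum(r[3] for r in rows.values())
# avg / total: each metric averaged with the class support as its weight
weighted = [sum(r[i] * r[3] for r in rows.values()) / total_support for i in range(3)]

for name, (p, r, f, s) in rows.items():
    print("{:>11} {:.2f} {:.2f} {:.2f} {}".format(name, p, r, f, s))
print("avg / total {:.2f} {:.2f} {:.2f} {}".format(*weighted, total_support))
```

Because "not nine" has 403 of the 450 samples, the weighted average is dominated by that class, which is why the `avg / total` row looks good even for the "most frequent" model.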

(4) Taking Uncertainty into Account

Below is an imbalanced binary classification task:

from mglearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X, y = make_blobs(n_samples=(400, 50), centers=2, cluster_std=[7.0, 2],
                  random_state=22)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
svc = SVC(gamma=.05).fit(X_train, y_train)
print(classification_report(y_test, svc.predict(X_test)))

             precision    recall  f1-score   support

          0       0.97      0.89      0.93       104
          1       0.35      0.67      0.46         9

avg / total       0.92      0.88      0.89       113

y_pred_lower_threshold = svc.decision_function(X_test) > -.8
print(classification_report(y_test, y_pred_lower_threshold))

             precision    recall  f1-score   support

          0       1.00      0.82      0.90       104
          1       0.32      1.00      0.49         9

avg / total       0.95      0.83      0.87       113

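The key concept here is the trade-off between precision and recall, which can be shifted by choosing the decision threshold: lowering the threshold from 0 to -0.8 raises the recall of class 1 from 0.67 to 1.00, while its precision drops from 0.35 to 0.32. I will look up further material on this when I need it.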
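The threshold effect can be reproduced without scikit-learn. This sketch uses hypothetical decision-function scores for ten samples and, like `svc.decision_function(X_test) > -.8`, predicts class 1 when the score is strictly above the threshold; `precision_recall_at` is an illustrative helper, not a library function.

```python
def precision_recall_at(scores, y_true, threshold):
    """Precision and recall when predicting class 1 for score > threshold."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# hypothetical decision_function outputs for 10 samples, 3 of them positive
scores = [-2.5, -1.9, -1.2, -0.6, -0.5, -0.3, 0.2, 0.4, 0.9, 1.5]
y_true = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
for threshold in [0, -0.8]:
    p, r = precision_recall_at(scores, y_true, threshold)
    print("threshold={:+.1f} precision={:.2f} recall={:.2f}".format(threshold, p, r))
```

Lowering the threshold pulls in the positive sample scored at -0.6 (recall rises from 0.67 to 1.00) but also two more negatives (precision falls from 0.50 to 0.43), the same pattern as in the SVC reports.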

(5) Precision-Recall Curves

from sklearn.metrics import precision_recall_curve
precision, recall, thresholds = precision_recall_curve(
    y_test, svc.decision_function(X_test))

import numpy as np
import matplotlib.pyplot as plt
# create a similar dataset as before, but with more samples
# to get a smoother curve
X, y = make_blobs(n_samples=(4000, 500), centers=2, cluster_std=[7.0, 2],
                  random_state=22)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
svc = SVC(gamma=.05).fit(X_train, y_train)
precision, recall, thresholds = precision_recall_curve(
    y_test, svc.decision_function(X_test))
# find threshold closest to zero
close_zero = np.argmin(np.abs(thresholds))
plt.plot(precision[close_zero], recall[close_zero], 'o', markersize=10,
         label="threshold zero", fillstyle="none", c='k', mew=2)
plt.plot(precision, recall, label="precision recall curve")
plt.xlabel("Precision")
plt.ylabel("Recall")
plt.legend(loc="best")

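We can see that a precision of about 0.75 corresponds to a recall of about 0.4. The black circle marks the point with threshold 0, the default threshold of decision_function. This point is the trade-off chosen when calling the predict method.

The closer a curve gets to the upper-right corner, the better the classifier: a point in the upper right means both high precision and high recall at the same threshold.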
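To see where the curve's points come from, here is a simplified pure-Python sweep, assuming a sample is predicted positive when its score is greater than or equal to the threshold and using every distinct score as a threshold. It is meant to mirror the idea behind `precision_recall_curve`, not to match its output array for array (the library function also adds an end point and handles some edge cases differently). The data and the helper `pr_curve_points` are hypothetical.

```python
def pr_curve_points(y_true, scores):
    """(threshold, precision, recall) for every distinct score, highest threshold first."""
    n_pos = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((threshold, tp / (tp + fp), tp / n_pos))
    return points

# hypothetical labels and classifier scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for threshold, precision, recall in pr_curve_points(y_true, scores):
    print("threshold={:.2f} precision={:.2f} recall={:.2f}".format(threshold, precision, recall))
```

Moving from the highest threshold to the lowest, recall can only grow, while precision may go up or down; plotting these pairs traces the precision-recall curve.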

Below we compare the precision-recall curves of the SVM and a random forest:

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=0, max_features=2)
rf.fit(X_train, y_train)
# RandomForestClassifier has predict_proba, but not decision_function
precision_rf, recall_rf, thresholds_rf = precision_recall_curve(
    y_test, rf.predict_proba(X_test)[:, 1])
plt.plot(precision, recall, label="svc")
plt.plot(precision[close_zero], recall[close_zero], 'o', markersize=10,
         label="threshold zero svc", fillstyle="none", c='k', mew=2)
plt.plot(precision_rf, recall_rf, label="rf")
close_default_rf = np.argmin(np.abs(thresholds_rf - 0.5))
plt.plot(precision_rf[close_default_rf], recall_rf[close_default_rf], '^', c='k',
         markersize=10, label="threshold 0.5 rf", fillstyle="none", mew=2)
plt.xlabel("Precision")
plt.ylabel("Recall")
plt.legend(loc="best")

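This comparison shows that the random forest performs better at the extremes (when very high recall or very high precision is required), while the SVM performs better in the middle (precision around 0.7).

Comparing the F1 score and average precision: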

print("f1_score of random forest: {:.3f}".format(
    f1_score(y_test, rf.predict(X_test))))
print("f1_score of svc: {:.3f}".format(f1_score(y_test, svc.predict(X_test))))

f1_score of random forest: 0.610
f1_score of svc: 0.656

from sklearn.metrics import average_precision_score
ap_rf = average_precision_score(y_test, rf.predict_proba(X_test)[:, 1])
ap_svc = average_precision_score(y_test, svc.decision_function(X_test))
print("Average precision of random forest: {:.3f}".format(ap_rf))
print("Average precision of svc: {:.3f}".format(ap_svc))

Average precision of random forest: 0.660
Average precision of svc: 0.666
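scikit-learn documents average precision as a weighted mean of the precisions reached at each threshold, weighted by the increase in recall from the previous threshold: AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ. The sketch below applies that sum to hand-computed (precision, recall) pairs for a hypothetical four-sample example; the helper name `average_precision` is an illustrative assumption.

```python
def average_precision(precisions, recalls):
    """AP = sum_n (R_n - R_{n-1}) * P_n, with points ordered from highest to lowest threshold."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # only thresholds that add recall contribute
        prev_recall = r
    return ap

# (precision, recall) at thresholds 0.8, 0.4, 0.35, 0.1 for the hypothetical
# data y_true = [0, 0, 1, 1], scores = [0.1, 0.4, 0.35, 0.8]
precisions = [1.0, 0.5, 2 / 3, 0.5]
recalls = [0.5, 0.5, 1.0, 1.0]
print("AP = {:.3f}".format(average_precision(precisions, recalls)))  # AP = 0.833
```

Unlike the F1 score, which looks at a single threshold, this summary considers every threshold, which is why the random forest and SVC are much closer on average precision (0.660 vs. 0.666) than on F1.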

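(6) Receiver Operating Characteristics (ROC) and AUC

The receiver operating characteristics curve is usually called the ROC curve for short. Like the precision-recall curve, the ROC curve considers all possible thresholds of a given classifier, but instead of reporting precision and recall, it shows the false positive rate (FPR) against the true positive rate (TPR). The true positive rate is simply another name for recall, while the false positive rate is the fraction of false positives among all negative samples:

FPR = FP / (FP + TN)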
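Both rates can be read directly off a binary confusion matrix. This minimal sketch reuses the logistic regression counts [[401 2] [8 39]] from earlier; `tpr_fpr` is a hypothetical helper.

```python
def tpr_fpr(tn, fp, fn, tp):
    tpr = tp / (tp + fn)  # true positive rate, identical to recall
    fpr = fp / (fp + tn)  # false positives among all actual negatives
    return tpr, fpr

tpr, fpr = tpr_fpr(tn=401, fp=2, fn=8, tp=39)
print("TPR={:.3f} FPR={:.3f}".format(tpr, fpr))  # TPR=0.830 FPR=0.005
```

Note that the FPR denominator is the number of actual negatives (403 here), so with imbalanced data a handful of false positives barely moves the FPR.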

The ROC curve can be computed with the roc_curve function:

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, svc.decision_function(X_test))
plt.plot(fpr, tpr, label="ROC Curve")
plt.xlabel("FPR")
plt.ylabel("TPR (recall)")
# find threshold closest to zero
close_zero = np.argmin(np.abs(thresholds))
plt.plot(fpr[close_zero], tpr[close_zero], 'o', markersize=10,
         label="threshold zero", fillstyle="none", c='k', mew=2)
plt.legend(loc=4)

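For the ROC curve, the ideal curve is close to the top left: you want a classifier with high recall while keeping the false positive rate low. The curve shows that, compared with the default threshold of 0, we can achieve considerably higher recall (about 0.9) with only a slight increase in FPR. The point closest to the top left may be a better operating point than the default one. Again, note that the threshold should be chosen on a separate validation set, not on the test set.

Below is a comparison of the ROC curves of the random forest and the SVM: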
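One simple way to pick "the point closest to the top left" is to minimize the Euclidean distance from (FPR, TPR) to (0, 1). This is only one heuristic; another common choice is to maximize Youden's J statistic, TPR − FPR. The ROC points below are hypothetical and stand in for values computed on a validation set; `closest_to_top_left` is an illustrative helper.

```python
import math

def closest_to_top_left(fpr, tpr, thresholds):
    """Threshold whose ROC point is nearest (Euclidean distance) to FPR=0, TPR=1."""
    distances = [math.hypot(f, 1 - t) for f, t in zip(fpr, tpr)]
    best = min(range(len(distances)), key=distances.__getitem__)
    return thresholds[best]

# hypothetical ROC points from a *validation* set, not the test set
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.5, 0.9, 0.95, 1.0]
thresholds = [2.0, 0.5, -0.2, -3.0]
print("chosen threshold:", closest_to_top_left(fpr, tpr, thresholds))  # chosen threshold: 0.5
```

The chosen threshold would then be applied as `decision_function(X) > threshold` when making final predictions, and the test set is used only once, for the final estimate.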

fpr_rf, tpr_rf, thresholds_rf = roc_curve(y_test, rf.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, label="ROC Curve SVC")
plt.plot(fpr_rf, tpr_rf, label="ROC Curve RF")
plt.xlabel("FPR")
plt.ylabel("TPR (recall)")
plt.plot(fpr[close_zero], tpr[close_zero], 'o', markersize=10,
         label="threshold zero SVC", fillstyle="none", c='k', mew=2)
close_default_rf = np.argmin(np.abs(thresholds_rf - 0.5))
# use tpr_rf (not the SVC's tpr) so the marker lies on the random forest curve
plt.plot(fpr_rf[close_default_rf], tpr_rf[close_default_rf], '^', markersize=10,
         label="threshold 0.5 RF", fillstyle="none", c='k', mew=2)
plt.legend(loc=4)

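As with the precision-recall curve, we often want to summarize the ROC curve with a single number: the area under the curve (commonly called AUC, where the curve in question is the ROC curve). We can compute the area under the ROC curve with the roc_auc_score function: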

from sklearn.metrics import roc_auc_score
rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
svc_auc = roc_auc_score(y_test, svc.decision_function(X_test))
print("AUC for Random Forest: {:.3f}".format(rf_auc))
print("AUC for SVC: {:.3f}".format(svc_auc))

AUC for Random Forest: 0.937
AUC for SVC: 0.916
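To make "area under the ROC curve" concrete, this sketch builds the ROC points of a hypothetical four-sample example by hand (predicting positive when score >= threshold, starting from the origin) and integrates them with the trapezoidal rule. The helpers `roc_points` and `trapezoid_auc` are illustrative assumptions, not scikit-learn functions.

```python
def roc_points(y_true, scores):
    """ROC points (FPR, TPR), starting at (0, 0), predicting 1 when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# hypothetical labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print("AUC = {:.2f}".format(trapezoid_auc(roc_points(y_true, scores))))  # AUC = 0.75
```

A perfect ranking would produce the points (0, 0), (0, 1), (1, 1) and an area of 1.0; a classifier whose scores carry no information gives the diagonal and an area of 0.5.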

An example comparing the ROC curves of SVMs with different values of gamma:

from sklearn.datasets import load_digits
digits = load_digits()
y = digits.target == 9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, y, random_state=0)
plt.figure()
for gamma in [1, 0.05, 0.01]:
    svc = SVC(gamma=gamma).fit(X_train, y_train)
    accuracy = svc.score(X_test, y_test)
    auc = roc_auc_score(y_test, svc.decision_function(X_test))
    fpr, tpr, _ = roc_curve(y_test, svc.decision_function(X_test))
    print("gamma = {:.2f} accuracy = {:.2f} AUC = {:.2f}".format(
        gamma, accuracy, auc))
    plt.plot(fpr, tpr, label="gamma={:.3f}".format(gamma))
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.xlim(-0.01, 1)
plt.ylim(0, 1.02)
plt.legend(loc="best")

gamma = 1.00 accuracy = 0.90 AUC = 0.50
gamma = 0.05 accuracy = 0.90 AUC = 1.00
gamma = 0.01 accuracy = 0.90 AUC = 1.00
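Why can all three models share 0.90 accuracy yet differ so much in AUC? Accuracy only checks which side of threshold 0 each score falls on, while AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties counting one half). The toy scorers below are hypothetical and only illustrate that effect; they are not the actual SVM outputs.

```python
def rank_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at_zero(y_true, scores):
    """Accuracy when predicting the positive class for score > 0."""
    return sum((s > 0) == (t == 1) for t, s in zip(y_true, scores)) / len(y_true)

y_true = [0] * 9 + [1]          # 10% positives, like "nine" vs. "not nine"
constant = [-1.0] * 10          # every sample gets the same score
ranking = [-0.9] * 9 + [-0.1]   # positive ranked highest, but still below 0
for name, scores in [("constant", constant), ("ranking", ranking)]:
    print("{}: accuracy={:.2f} AUC={:.2f}".format(
        name, accuracy_at_zero(y_true, scores), rank_auc(y_true, scores)))
# constant: accuracy=0.90 AUC=0.50
# ranking: accuracy=0.90 AUC=1.00
```

Both scorers predict "negative" for everything at threshold 0, so accuracy cannot tell them apart, but only the second one would let a lower threshold separate the classes perfectly.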

3. Metrics for Multiclass Classification

Confusion matrix for the 10-class digit classification task:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)
lr = LogisticRegression().fit(X_train, y_train)
pred = lr.predict(X_test)
print("Accuracy: {:.3f}".format(accuracy_score(y_test, pred)))
print("Confusion matrix:\n{}".format(confusion_matrix(y_test, pred)))

Accuracy: 0.953
Confusion matrix:
[[37  0  0  0  0  0  0  0  0  0]
 [ 0 39  0  0  0  0  2  0  2  0]
 [ 0  0 41  3  0  0  0  0  0  0]
 [ 0  0  1 43  0  0  0  0  0  1]
 [ 0  0  0  0 38  0  0  0  0  0]
 [ 0  1  0  0  0 47  0  0  0  0]
 [ 0  0  0  0  0  0 52  0  0  0]
 [ 0  1  0  1  1  0  0 45  0  0]
 [ 0  3  1  0  0  0  0  0 43  1]
 [ 0  0  0  1  0  1  0  0  1 44]]

scores_image = mglearn.tools.heatmap(
    confusion_matrix(y_test, pred), xlabel='Predicted label',
    ylabel='True label', xticklabels=digits.target_names,
    yticklabels=digits.target_names, cmap=plt.cm.gray_r, fmt="%d")
plt.title("Confusion matrix")
plt.gca().invert_yaxis()

print(classification_report(y_test, pred))

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        37
          1       0.89      0.91      0.90        43
          2       0.95      0.93      0.94        44
          3       0.90      0.96      0.92        45
          4       0.97      1.00      0.99        38
          5       0.98      0.98      0.98        48
          6       0.96      1.00      0.98        52
          7       1.00      0.94      0.97        48
          8       0.93      0.90      0.91        48
          9       0.96      0.94      0.95        47

avg / total       0.95      0.95      0.95       450

print("Micro average f1 score: {:.3f}".format(
    f1_score(y_test, pred, average="micro")))
print("Macro average f1 score: {:.3f}".format(
    f1_score(y_test, pred, average="macro")))

Micro average f1 score: 0.953
Macro average f1 score: 0.954
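The two averages differ in when the averaging happens. Macro averaging computes an F1 score per class and takes their unweighted mean; micro averaging first pools TP, FP, and FN over all classes and then computes a single F1, which for single-label multiclass data equals accuracy. The plain-Python sketch below recomputes both numbers from the confusion matrix printed above, using F1 = 2·TP / ((TP + FP) + (TP + FN)).

```python
# Confusion matrix printed above (rows = true digit, columns = predicted digit)
cm = [
    [37, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 39, 0, 0, 0, 0, 2, 0, 2, 0],
    [0, 0, 41, 3, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 43, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 38, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 47, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 52, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 45, 0, 0],
    [0, 3, 1, 0, 0, 0, 0, 0, 43, 1],
    [0, 0, 0, 1, 0, 1, 0, 0, 1, 44],
]
n_classes = len(cm)
tp = [cm[k][k] for k in range(n_classes)]
predicted = [sum(row[k] for row in cm) for k in range(n_classes)]  # column sums = TP + FP
actual = [sum(cm[k]) for k in range(n_classes)]                   # row sums = TP + FN

# Macro: F1 per class, then the unweighted mean over classes
per_class_f1 = [2 * tp[k] / (predicted[k] + actual[k]) for k in range(n_classes)]
macro_f1 = sum(per_class_f1) / n_classes

# Micro: pool TP, FP and FN over all classes, then compute one F1
total_tp = sum(tp)
total_fp = sum(predicted) - total_tp
total_fn = sum(actual) - total_tp
micro_f1 = 2 * total_tp / (2 * total_tp + total_fp + total_fn)

print("Micro average f1 score: {:.3f}".format(micro_f1))  # 0.953
print("Macro average f1 score: {:.3f}".format(macro_f1))  # 0.954
```

Because every class here has a similar number of samples and similar per-class scores, the two averages almost coincide; they diverge when small classes are predicted much worse than large ones.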

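4. Regression Metrics

In most cases, R^2, which the score method of scikit-learn regressors uses by default, is the more intuitive metric for evaluating regression models.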
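R^2 compares the model's squared error with that of always predicting the mean: R^2 = 1 − SS_res / SS_tot. A plain-Python sketch with hypothetical values; `r2` is an illustrative helper.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2(y_true, [1.0, 2.0, 3.0, 5.0]))  # 0.8
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0: always predicting the mean
```

A perfect model scores 1.0, a model no better than the mean scores 0.0, and a model worse than the mean gets a negative R^2, which makes the number easy to interpret regardless of the target's units.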

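5. Using Evaluation Metrics in Model Selection

scikit-learn offers a very simple way to do this: the scoring parameter, which can be used with both GridSearchCV and cross_val_score. You only need to provide a string naming the evaluation metric you want to use.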

from sklearn.model_selection import cross_val_score
# default scoring for classification is accuracy
print("Default scoring: {}".format(
    cross_val_score(SVC(), digits.data, digits.target == 9)))
# providing scoring="accuracy" doesn't change the results
explicit_accuracy = cross_val_score(SVC(), digits.data, digits.target == 9,
                                    scoring="accuracy")
print("Explicit accuracy scoring: {}".format(explicit_accuracy))
roc_auc = cross_val_score(SVC(), digits.data, digits.target == 9,
                          scoring="roc_auc")
print("AUC scoring: {}".format(roc_auc))

Default scoring: [0.9 0.9 0.9]
Explicit accuracy scoring: [0.9 0.9 0.9]
AUC scoring: [0.994 0.99 0.996]

from sklearn.model_selection import GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target == 9, random_state=0)
# we provide a somewhat bad grid to illustrate the point:
param_grid = {'gamma': [0.0001, 0.01, 0.1, 1, 10]}
# using the default scoring of accuracy:
grid = GridSearchCV(SVC(), param_grid=param_grid)
grid.fit(X_train, y_train)
print("Grid-Search with accuracy")
print("Best parameters:", grid.best_params_)
print("Best cross-validation score (accuracy): {:.3f}".format(grid.best_score_))
print("Test set AUC: {:.3f}".format(
    roc_auc_score(y_test, grid.decision_function(X_test))))
print("Test set accuracy: {:.3f}".format(grid.score(X_test, y_test)))
# using AUC scoring instead:
grid = GridSearchCV(SVC(), param_grid=param_grid, scoring="roc_auc")
grid.fit(X_train, y_train)
print("\nGrid-Search with AUC")
print("Best parameters:", grid.best_params_)
print("Best cross-validation score (AUC): {:.3f}".format(grid.best_score_))
print("Test set AUC: {:.3f}".format(
    roc_auc_score(y_test, grid.decision_function(X_test))))
# note: with scoring="roc_auc", grid.score uses that scorer, so this line reports
# test-set AUC; grid.best_estimator_.score(X_test, y_test) would give accuracy
print("Test set accuracy: {:.3f}".format(grid.score(X_test, y_test)))

Grid-Search with accuracy
Best parameters: {'gamma': 0.0001}
Best cross-validation score (accuracy): 0.970
Test set AUC: 0.992
Test set accuracy: 0.973

Grid-Search with AUC
Best parameters: {'gamma': 0.01}
Best cross-validation score (AUC): 0.997
Test set AUC: 1.000
Test set accuracy: 1.000

# sklearn.metrics.scorer.SCORERS works in the older scikit-learn used here;
# in scikit-learn 1.0 and later, sklearn.metrics.get_scorer_names() lists the names instead
from sklearn.metrics.scorer import SCORERS
print("Available scorers:\n{}".format(sorted(SCORERS.keys())))

Available scorers:
['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'log_loss', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
