T3 Model Evaluation

Guoliang Li

Article link: https://blog.csdn.net/qq_35740095/article/details/85890897

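Note: this post supplements the code from the earlier T1.1 post on model building (predicting whether a loan user will become overdue). Run it after the code from that post.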

This post records a score table of accuracy, precision, recall, F1 score, and AUC for seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM), together with their ROC curves.

Model evaluation metrics

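Accuracy
accuracy = number of correctly classified samples / total number of samples

1. Classification accuracy tells you nothing about the underlying distribution of the response values, nor about the types of errors the classifier makes.
2. Accuracy is a poor fit for skewed classes, that is, datasets in which most samples belong to one class and only a small fraction belong to the other. Take the Titanic survival problem: predicting that everyone died already gives a fairly high accuracy, and some algorithms may not even beat that baseline. Conversely, predicting that everyone survived gives a very low accuracy. In such cases, further refinement of the model may improve accuracy only slightly.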
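The skewed-class problem can be shown with a small sketch. The labels below are made-up toy data (not the loan dataset), and `accuracy` is a hypothetical helper that only mirrors the definition above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy, made-up labels: 75 negatives (0) and 25 positives (1).
y_true = [0] * 75 + [1] * 25

# A "classifier" that always predicts the majority class.
always_negative = [0] * len(y_true)
# A "classifier" that always predicts the minority class.
always_positive = [1] * len(y_true)

acc_neg = accuracy(y_true, always_negative)
acc_pos = accuracy(y_true, always_positive)
print(acc_neg, acc_pos)
```

The always-negative predictor reaches 75% accuracy without finding a single positive sample, which is why accuracy is usually read together with precision and recall.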

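Precision
precision = true_positives / (true_positives + false_positives)
The share of samples correctly assigned to the class among all samples assigned to it (correctly or incorrectly).
Recall
recall = true_positives / (true_positives + false_negatives)
The share of samples correctly assigned to the class among all samples that truly belong to it (those correctly assigned plus those that were missed). In other words, how many of the truly relevant items were retrieved.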
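As a rough illustration of the two formulas, the sketch below counts the confusion-matrix cells by hand. The helper names and the ten toy labels are assumptions made for this example; returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    # Of everything predicted positive, how much really is positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Of everything that really is positive, how much was found.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy, made-up data: 4 real positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
p = precision(tp, fp)
r = recall(tp, fn)
print(tp, fp, fn, tn, p, r)
```

Here two of the three positive predictions are right (precision 2/3), while only two of the four real positives are found (recall 1/2).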
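F1 Score
The F1 score (also called F-score or F-measure) is a metric that balances precision and recall. F-measure usually refers to the harmonic mean of precision and recall:
F1 = 2 x (precision x recall) / (precision + recall)

It combines precision and recall into a single score. Because it is a harmonic mean, it is pulled toward the smaller of the two values; its best value is 1 and its worst value is 0.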
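The formula can be checked against the XGBoost row of the results table in this post (precision 0.630542, recall 0.356546, F1 0.455516). The function name `harmonic_f1` is a hypothetical helper for this sketch, and treating the 0/0 case as 0.0 is an assumption made here.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the XGBoost row of the results table.
xgb_precision = 0.630542
xgb_recall = 0.356546

xgb_f1 = harmonic_f1(xgb_precision, xgb_recall)
arithmetic_mean = (xgb_precision + xgb_recall) / 2
print(round(xgb_f1, 6), round(arithmetic_mean, 6))
```

The harmonic mean comes out below the plain arithmetic mean of the two values, because it penalizes the gap between a moderate precision and a low recall.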

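AUC & ROC
[Figure: illustration of the ROC curve and AUC]
For details, see the post on model evaluation metrics (模型评价指标).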
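For intuition: the ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and AUC is the area under that curve. AUC also equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise view; `pairwise_auc` and the toy scores are assumptions for illustration, and the quadratic loop is not how a library would compute it.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy, made-up labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_toy = pairwise_auc(y_true, scores)
# A model that gives every sample the same score cannot rank anything.
auc_constant = pairwise_auc(y_true, [0.5, 0.5, 0.5, 0.5])
print(auc_toy, auc_constant)
```

Three of the four positive/negative pairs are ordered correctly, so the toy AUC is 0.75. Constant scores give 0.5, the value that indicates ranking no better than chance.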

Code

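import pandas as pd
from sklearn import metrics

# The fitted models (lr, svc, dt, rf, GBDT, XGBoost, LightGBM) and the test split
# (x_test, y_test) come from the code in the previous post.
# Note: if svc is an sklearn SVC, predict_proba requires probability=True at construction.
models = {'lr': lr,
          'svc': svc,
          'dt': dt,
          'RF': rf,
          'GBDT': GBDT,
          'XGBoost': XGBoost,
          'LightGBM': LightGBM}

df_result = pd.DataFrame(columns=('model', 'accuracy', 'precision', 'Recall', 'F1 score', 'AUC', 'AUC1'))
row = 0

for name, model in models.items():
    # Hard class predictions for the threshold-based metrics
    y_test_pred = model.predict(x_test)

    acc = metrics.accuracy_score(y_test, y_test_pred)
    p = metrics.precision_score(y_test, y_test_pred)
    r = metrics.recall_score(y_test, y_test_pred)
    f1 = metrics.f1_score(y_test, y_test_pred)

    # Predicted probabilities; column 1 is the positive class
    y_test_proba = model.predict_proba(x_test)
    fpr, tpr, thresholds = metrics.roc_curve(y_test, y_test_proba[:, 1])
    auc = metrics.auc(fpr, tpr)  # AUC, method 1: area under the ROC curve
    auc1 = metrics.roc_auc_score(y_test, y_test_proba[:, 1])  # AUC, method 2: directly from the scores

    df_result.loc[row] = [name, acc, p, r, f1, auc, auc1]
    row += 1
print(df_result)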

      model  accuracy  precision    Recall  F1 score       AUC      AUC1
0  LightGBM  0.770147   0.570136  0.350975  0.434483  0.757402  0.757402
1   XGBoost  0.785564   0.630542  0.356546  0.455516  0.771363  0.771363
2       svc  0.748423   0.000000  0.000000  0.000000  0.500000  0.500000
3        RF  0.766643   0.576471  0.272981  0.370510  0.708059  0.708059
4      GBDT  0.780659   0.611650  0.350975  0.446018  0.763828  0.763828
5        lr  0.748423   0.000000  0.000000  0.000000  0.567460  0.567460
6        dt  0.684653   0.382429  0.412256  0.396783  0.594237  0.594237

ROC curves

import matplotlib.pyplot as plt

def plot_roc_curve(fpr, tpr, label=None):
    """Draw a single ROC curve on the current axes."""
    plt.plot(fpr, tpr, label=label)

plt.figure(figsize=(8, 6))
for name, clf in models.items():
    # Score each test sample with the predicted probability of the positive class
    proba = clf.predict_proba(x_test)[:, 1]
    fpr, tpr, thresholds = metrics.roc_curve(y_test, proba)
    plot_roc_curve(fpr, tpr, label=name)

# plt.plot([0, 1], [0, 1], 'k--')  # diagonal (chance level)
plt.axis([0, 1, 0, 1])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

[Figure: ROC curves of the seven models on the test set]

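Under the current setup, XGBoost performs best: it has the highest accuracy, precision, F1 score, and AUC in the table above, although the decision tree has a higher recall.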
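The comparison can be reproduced from the printed results table. The sketch below copies the scores from that table into a plain dictionary and picks the top model per metric; the variable names are assumptions for this example.

```python
# Scores copied from the results table printed in this post (test set).
results = {
    "LightGBM": {"accuracy": 0.770147, "precision": 0.570136, "recall": 0.350975, "f1": 0.434483, "auc": 0.757402},
    "XGBoost":  {"accuracy": 0.785564, "precision": 0.630542, "recall": 0.356546, "f1": 0.455516, "auc": 0.771363},
    "svc":      {"accuracy": 0.748423, "precision": 0.000000, "recall": 0.000000, "f1": 0.000000, "auc": 0.500000},
    "RF":       {"accuracy": 0.766643, "precision": 0.576471, "recall": 0.272981, "f1": 0.370510, "auc": 0.708059},
    "GBDT":     {"accuracy": 0.780659, "precision": 0.611650, "recall": 0.350975, "f1": 0.446018, "auc": 0.763828},
    "lr":       {"accuracy": 0.748423, "precision": 0.000000, "recall": 0.000000, "f1": 0.000000, "auc": 0.567460},
    "dt":       {"accuracy": 0.684653, "precision": 0.382429, "recall": 0.412256, "f1": 0.396783, "auc": 0.594237},
}

metric_names = ["accuracy", "precision", "recall", "f1", "auc"]

# For each metric, find the model with the highest score.
best = {m: max(results, key=lambda name: results[name][m]) for m in metric_names}

for m in metric_names:
    print(f"{m:<10} {best[m]:<10} {results[best[m]][m]:.6f}")
```

XGBoost leads on every metric except recall, where the decision tree scores highest.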

References
Evaluation methods in sklearn.metrics (accuracy_score, recall_score, roc_curve, roc_auc_score, confusion_matrix)
Machine learning project workflow and model evaluation and validation
 https://shimo.im/docs/jse5ZZhdvEQR4siC
