AI Fundamentals: An Introduction to ROC-AUC and PR-AUC 2021-10-01


偶入编程深似海

Category: AI Fundamentals. Tags: artificial intelligence, python

Article link: https://blog.csdn.net/qq_21438267/article/details/120580941


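This article introduces ROC-AUC and PR-AUC and how they are used as basic evaluation metrics in machine learning. The ROC curve looks at how well a model separates both positive and negative samples, while the PR curve focuses on precision and recall for the positive class. ROC-AUC suits roughly balanced classes; PR-AUC suits imbalanced ones. Python code examples show how to compute and plot both curves.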


AI Fundamentals: Series Contents

K-Nearest Neighbors
Decision Trees
Neural Networks and Their Components
ROC-AUC and PR-AUC: Differences and Connections

I. Basic Terminology

	Actual positive	Actual negative
Predicted positive	True positive (TP)	False positive (FP)
Predicted negative	False negative (FN)	True negative (TN)

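1. True positive rate (TPR): TPR = $\frac{TP}{TP+FN}$, the fraction of all actual positives that are predicted positive.
2. False positive rate (FPR): FPR = $\frac{FP}{FP+TN}$, the fraction of all actual negatives that are wrongly predicted positive.
3. Precision: P = $\frac{TP}{TP+FP}$, the fraction of all samples predicted positive that are actually positive.
4. Recall: R = $\frac{TP}{TP+FN}$, the fraction of all actual positives that are predicted positive. Recall is therefore identical to the true positive rate.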
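To make the four definitions concrete, here is a minimal sketch that computes them from confusion-matrix counts. The function name `rates` and the counts are hypothetical, chosen only for illustration.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, FPR, precision, recall) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # share of actual positives predicted positive
    fpr = fp / (fp + tn)        # share of actual negatives predicted positive
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tpr                # recall is the same quantity as TPR
    return tpr, fpr, precision, recall


# Hypothetical counts: 50 actual positives, 50 actual negatives.
tpr, fpr, precision, recall = rates(tp=40, fp=10, fn=10, tn=40)
print(tpr, fpr, precision, recall)  # 0.8 0.2 0.8 0.8
```

Note that TPR and FPR each divide by a column total of the table above (all actual positives, all actual negatives), while precision divides by a row total (all predicted positives).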

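II. ROC-AUC and PR-AUC

2.1 Definition and Computation

ROC stands for receiver operating characteristic; the ROC curve plots TPR (vertical axis) against FPR (horizontal axis).
AUC (Area Under the Curve) here refers to the area under the ROC curve.
To build the curve, sweep a decision threshold over [0, 1], compute the TPR and FPR at each threshold, and connect the resulting points.
A model with no discriminative ability has TPR equal to FPR at every threshold: whatever fraction of the positives it predicts as positive, it predicts the same fraction of the negatives as positive. Its ROC curve is the diagonal, shown as the blue dashed line in the figure below.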

Computing ROC-AUC

[Figure: ROC curve, with the no-skill baseline drawn as a blue dashed line]
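The threshold sweep described above can be sketched in plain Python. This is an illustrative sketch, not how scikit-learn implements it: the helper names `roc_points` and `trapezoid_area` and the toy labels and scores are assumptions made for the example. Every distinct score is used as a threshold, and the area is taken with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Sweep thresholds from high to low and collect (FPR, TPR) points."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): 3 positives and 3 negatives.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

pts = roc_points(y_true, scores)
print(pts)
print("ROC-AUC:", trapezoid_area(pts))
print("No-skill diagonal:", trapezoid_area([(0.0, 0.0), (1.0, 1.0)]))
```

On this toy data the area is 8/9: of the 9 possible (positive, negative) pairs, the positive gets the higher score in 8. The diagonal of a no-skill model has area 0.5.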

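Computing PR-AUC

The PR curve is built the same way as above, except that precision and recall are computed at each threshold instead of TPR and FPR.
[Figure: precision-recall curve]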
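The same sweep, adapted to precision and recall, might look like the sketch below. The helper names `pr_points` and `trapezoid_area` and the toy data are hypothetical. Prepending the point (recall 0, precision 1) is a convention, assumed here so that the curve starts on the vertical axis; other conventions give slightly different areas.

```python
def pr_points(y_true, scores):
    """Sweep thresholds from high to low and collect (recall, precision) points."""
    n_pos = sum(y_true)
    points = [(0.0, 1.0)]  # assumed starting point: recall 0, precision 1
    for thr in sorted(set(scores), reverse=True):
        # Labels of the samples predicted positive at this threshold.
        predicted_pos = [y for y, s in zip(y_true, scores) if s >= thr]
        tp = sum(predicted_pos)
        points.append((tp / n_pos, tp / len(predicted_pos)))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): 3 positives and 3 negatives.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

pr = pr_points(y_true, scores)
print(pr)
print("PR-AUC:", trapezoid_area(pr))
# The no-skill baseline is a horizontal line at the positive-class fraction.
print("No-skill precision:", sum(y_true) / len(y_true))
```

Unlike the ROC curve, the PR curve is not monotonic: precision can rise or fall as the threshold is lowered, while recall can only grow.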

III. Use Cases and Computing in Python

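When negatives vastly outnumber positives, TN dominates the denominator of FPR, so FPR stays small even when the model produces many false positives, and the ROC curve can look overly optimistic. Precision, in contrast, drops as soon as false positives pile up. So when the classes are reasonably balanced, ROC-AUC is a fine choice, and when they are extremely imbalanced, PR-AUC is usually the better one.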
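A small arithmetic sketch shows why imbalance matters. The counts are hypothetical: both scenarios have the same TPR and FPR, so they land on the same point of the ROC curve, yet their precision differs by an order of magnitude.

```python
def summarize(tp, fn, fp, tn):
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision


# Hypothetical counts. Both cases have TPR = 0.8 and FPR = 0.1.
balanced = summarize(tp=80, fn=20, fp=10, tn=90)        # 100 positives, 100 negatives
imbalanced = summarize(tp=80, fn=20, fp=1000, tn=9000)  # 100 positives, 10,000 negatives

for name, (tpr, fpr, precision) in [("balanced", balanced),
                                    ("imbalanced", imbalanced)]:
    print(f"{name}: TPR={tpr:.2f} FPR={fpr:.2f} precision={precision:.3f}")
```

In the imbalanced case only about 7% of the predicted positives are correct, which the (FPR, TPR) pair alone does not reveal.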

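So why not use PR-AUC alone?
What is ROC-AUC actually good for when evaluating a classifier?
Many of the articles I read give the same explanation: TPR and FPR measure how well the model classifies positive and negative samples respectively, whereas precision and recall are both about the positive class and ignore true negatives. So use ROC-AUC when you want the model to perform well on both positives and negatives, and prefer PR-AUC when you only care about how well it picks out positives. As a beginner I have not fully understood this argument yet, so I am leaving it as an open question for now.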

3.1 ROC-AUC Code

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot

# generate a 2-class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
# generate a no-skill prediction (majority class): the same score for every sample
ns_probs = [0 for _ in range(len(testy))]
# fit a model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
# predict probabilities
lr_probs = model.predict_proba(testX)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(testy, ns_probs)
lr_auc = roc_auc_score(testy, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('Logistic: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(testy, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(testy, lr_probs)
# plot the roc curve for the model
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

3.2 PR-AUC Code

# precision-recall curve and f1
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from sklearn.metrics import auc
from matplotlib import pyplot

# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
# fit a model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
# predict probabilities
lr_probs = model.predict_proba(testX)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# predict class values
yhat = model.predict(testX)
lr_precision, lr_recall, _ = precision_recall_curve(testy, lr_probs)
lr_f1, lr_auc = f1_score(testy, yhat), auc(lr_recall, lr_precision)
# summarize scores
print('Logistic: f1=%.3f auc=%.3f' % (lr_f1, lr_auc))
# plot the precision-recall curves
no_skill = len(testy[testy==1]) / len(testy)
pyplot.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
pyplot.plot(lr_recall, lr_precision, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('Recall')
pyplot.ylabel('Precision')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()