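For how the ROC curve and AUC are computed, see the article cited below.
The ROC curve is insensitive to class imbalance, because TPR and FPR are each computed within a single class and so do not change with the class ratio. The weakness of AUC is that it reflects only the model's ranking ability and says nothing about goodness of fit, i.e. whether the predicted probabilities are accurate. (Source: "AUC的缺陷" [The shortcomings of AUC], 知乎 (zhihu.com))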
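The ranking-only nature of AUC can be checked with a small sketch. The labels and probabilities below are made up for illustration, and log loss is used here as one possible measure of goodness of fit (an assumption, not something the cited article prescribes). A strictly increasing transform of the scores leaves AUC unchanged while the log loss gets clearly worse.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1])
p_good = np.array([0.1, 0.3, 0.7, 0.9, 0.65, 0.6])

# Strictly increasing transform: the ordering of the samples is preserved,
# but every probability is squeezed into the narrow range [0.5, 0.51].
p_squashed = 0.5 + 0.01 * p_good

auc_good = roc_auc_score(y_true, p_good)
auc_squashed = roc_auc_score(y_true, p_squashed)

loss_good = log_loss(y_true, p_good)
loss_squashed = log_loss(y_true, p_squashed)

print("AUC:     ", auc_good, auc_squashed)    # the two values are equal
print("log loss:", loss_good, loss_squashed)  # the second value is larger
```

Two models with the same AUC can therefore differ a lot in how well calibrated their probabilities are.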
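1. sklearn.metrics.roc_curve(y_true, y_score, *, pos_label=None, sample_weight=None, drop_intermediate=True)
Binary classification only.
y_true and y_score have shape (n_samples,). The function returns fpr, tpr and thresholds, which all have the same length; that length is the number of thresholds kept, not n_samples (in the example below, 4 samples yield 5 points).
Example: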
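>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([0. , 0. , 0.5, 0.5, 1. ])
>>> tpr
array([0. , 0.5, 0.5, 1. , 1. ])
>>> thresholds
array([1.8 , 0.8 , 0.4 , 0.35, 0.1 ])
The first threshold is an artificial entry at which no sample is predicted positive. In scikit-learn 1.1.1 it is max(y_score) + 1, hence 1.8; newer releases report inf instead.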
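To make the meaning of the returned arrays concrete, here is a minimal sketch that recomputes each ROC point by hand for the same data. It assumes a sample is predicted positive when score >= threshold, and it uses drop_intermediate=False so that every distinct score is kept as a threshold. The names manual_fpr and manual_tpr are introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# drop_intermediate=False keeps one point per distinct score value.
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2, drop_intermediate=False)

pos = (y == 2)  # positive class mask
neg = ~pos      # negative class mask

# Recompute every point by hand. The first threshold is the artificial
# "nothing predicted positive" entry, so it is skipped.
manual_fpr, manual_tpr = [], []
for t in thresholds[1:]:
    predicted_pos = scores >= t
    manual_tpr.append((predicted_pos & pos).sum() / pos.sum())
    manual_fpr.append((predicted_pos & neg).sum() / neg.sum())

print(np.allclose(manual_fpr, fpr[1:]), np.allclose(manual_tpr, tpr[1:]))
print(len(scores), len(fpr))  # number of samples vs. number of ROC points
```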
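2. sklearn.metrics.precision_recall_curve(y_true, probas_pred, *, pos_label=None, sample_weight=None)
Used the same way as sklearn.metrics.roc_curve (binary classification only), but it returns precision, recall and thresholds. precision and recall have one more element than thresholds, because a final point with precision 1 and recall 0 is appended.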
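A minimal usage sketch, reusing the scores from the roc_curve example with 0/1 labels. The scores are passed positionally on purpose: the keyword name of the second argument is probas_pred in the 1.1.1 signature above, and newer releases appear to name it y_score instead.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Positional call, so the example does not depend on the keyword name
# of the second argument.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# precision and recall carry one extra trailing point (precision=1, recall=0)
# that has no corresponding threshold.
print(len(precision), len(recall), len(thresholds))
print(precision[-1], recall[-1])
```

The exact number of points returned can vary between scikit-learn versions, so the sketch only relies on the length relationship and the final point.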
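3. sklearn.metrics.auc(x, y)
This is a general function: given points on a curve, it computes the Area Under the Curve (AUC) using the trapezoidal rule.
Parameters: x and y are arrays of shape (n,) holding the x and y coordinates of the curve points; x must be monotonically increasing or decreasing.
Example: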
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> pred = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
0.75
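The trapezoidal rule itself is short enough to write out. The sketch below recomputes the area for the same ROC points with plain NumPy and also compares it with roc_auc_score, which should agree for binary labels. manual_area is a name introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = roc_curve(y, pred, pos_label=2)

# Trapezoidal rule: for each pair of consecutive points,
# area = width * average of the two heights.
manual_area = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)

area = auc(fpr, tpr)

# roc_auc_score goes straight from labels and scores to the same number.
direct = roc_auc_score((y == 2).astype(int), pred)

print(manual_area, area, direct)
```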
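4. sklearn.metrics.roc_auc_score(y_true, y_score, *, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None)
Works for binary, multiclass (single-label) and multilabel classification, although some parameter values are restricted. For example, multiclass targets require multi_class to be set to 'ovr' or 'ovo', since the default 'raise' raises an error.
y_true and y_score have shape (n_samples,) or (n_samples, n_classes).
average parameter: only takes effect for multiclass and multilabel targets. If None, the scores for each class are returned. Otherwise, it determines the type of averaging performed on the data. 'macro' computes the AUC of each class and takes the unweighted mean. 'micro' treats every element of the (n_samples, n_classes) y_true and y_score arrays as one 0/1 ground-truth label with its corresponding score, and computes a single global AUC (calculate metrics globally by considering each element of the label indicator matrix as a label).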
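The difference between 'macro' and 'micro' can be reproduced by hand. The sketch below uses a made-up multilabel indicator matrix and made-up scores (6 samples, 3 labels), and the names macro_by_hand and micro_by_hand are introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical multilabel data: 6 samples, 3 labels (indicator matrix).
y_true = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
])
y_score = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.6, 0.7, 0.2],
    [0.2, 0.1, 0.5],
    [0.4, 0.3, 0.9],
    [0.5, 0.6, 0.3],
])

per_label = roc_auc_score(y_true, y_score, average=None)  # one AUC per column
macro = roc_auc_score(y_true, y_score, average="macro")
micro = roc_auc_score(y_true, y_score, average="micro")

# macro: unweighted mean of the per-column AUCs.
macro_by_hand = np.mean(
    [roc_auc_score(y_true[:, k], y_score[:, k]) for k in range(y_true.shape[1])]
)
# micro: flatten both matrices and compute one global binary AUC.
micro_by_hand = roc_auc_score(y_true.ravel(), y_score.ravel())

print(per_label, macro, micro)
```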
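5. Tutorial: ROC curves for binary and multiclass classification in Python
"python实现二分类和多分类的ROC曲线教程" [Tutorial on implementing ROC curves for binary and multiclass classification in Python] - 腾讯云开发者社区 (Tencent Cloud Developer Community) (tencent.com)
References: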
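For the multiclass case, one common approach is one-vs-rest: binarize the labels and draw one ROC curve per class. The sketch below is not taken from the linked tutorial. It uses synthetic labels and synthetic softmax scores as a stand-in for a real classifier's predict_proba output, and leaves the plotting call as a comment.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
n_samples, classes = 300, [0, 1, 2]
y = rng.integers(0, 3, size=n_samples)

# Hypothetical classifier output: noisy logits nudged towards the true class,
# followed by a softmax so that every row sums to 1.
logits = rng.normal(size=(n_samples, 3))
logits[np.arange(n_samples), y] += 1.5
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# One 0/1 column per class, shape (n_samples, 3).
y_onehot = label_binarize(y, classes=classes)

curves = {}
for k in classes:
    fpr, tpr, _ = roc_curve(y_onehot[:, k], proba[:, k])
    curves[k] = (fpr, tpr, auc(fpr, tpr))
    # With matplotlib available, each curve could be drawn with e.g.
    # plt.plot(fpr, tpr, label=f"class {k} (AUC = {curves[k][2]:.2f})")

# The macro one-vs-rest score is the mean of the per-class AUCs.
macro_ovr = roc_auc_score(y, proba, multi_class="ovr", average="macro")
print({k: round(v[2], 3) for k, v in curves.items()}, round(macro_ovr, 3))
```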
sklearn.metrics.roc_curve — scikit-learn 1.1.1 documentation
sklearn.metrics.auc — scikit-learn 1.1.1 documentation
sklearn.metrics.roc_auc_score — scikit-learn 1.1.1 documentation