[Python] Finding the Optimal ROC Curve Threshold with the Youden Index

燕策西

Last modified: 2022-12-02 10:03:29

Category: Binary Classification. Tags: python

First published: 2020-07-24 17:59:15

Article link: https://blog.csdn.net/weixin_43543177/article/details/107565947


[Background]

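The ROC curve is a standard tool for evaluating binary classifiers; it plots the true positive rate (TPR) against the false positive rate (FPR). (Recommended background reading: 一文让你彻底理解准确率，精准率，召回率，真正率，假正率，ROC/AUC, a post covering accuracy, precision, recall, TPR, FPR, and ROC/AUC.) Each (FPR, TPR) pair is computed at one of a series of thresholds, and the goal of this post is to find the optimal threshold: the one that best trades off FPR against TPR. The ROC curve tells you how good the model is, that is, whether it separates the two classes well enough; picking the best threshold on that curve then gives the best classification result the model can deliver for a practical application.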
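To make "TPR and FPR are computed at a series of thresholds" concrete, here is a minimal pure-Python sketch. The helper name `roc_points` and the toy data are made up for illustration, and the sketch assumes a sample is predicted positive when its score is greater than or equal to the cutoff. Library routines such as `roc_curve` do the same job more efficiently and may add an extra starting point.

```python
def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) for every distinct score used as a cutoff.

    Hypothetical helper for illustration only.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    # Sweep cutoffs from the highest score down; a sample is predicted
    # positive when its score is >= the cutoff.
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= th and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= th and y == 0)
        points.append((th, fp / n_neg, tp / n_pos))
    return points


# Toy data: two negatives and two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

for th, fpr, tpr in roc_points(labels, scores):
    print(f"threshold={th:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the cutoff can only add predicted positives, so both FPR and TPR grow (or stay the same) as the threshold drops. That is why the choice of threshold is a trade-off.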

[Method]

`Youden Index`

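Reference: 全面了解ROC曲线 (A Comprehensive Guide to the ROC Curve)
[Figure: finding the optimal threshold]
As the figure shows, the idea is to find the threshold at the point on the curve where the vertical coordinate $\mathrm{Sensitivity}$ exceeds the horizontal coordinate $1 - \mathrm{Specificity}$ by the largest margin. In this post that is written as:
$index = \arg\max(TPR - FPR),$
which gives the optimal threshold and its coordinates on the ROC curve:
$th_{optimal}=thresholds[index]$

$point_{optimal}=(FPR[index], TPR[index])$
Simple, right?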
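
For reference, the quantity being maximized is usually called Youden's J statistic. Writing it in terms of sensitivity and specificity (a standard identity, not something specific to this post) shows why it reduces to the difference between the two ROC coordinates:

$J = \mathrm{Sensitivity} + \mathrm{Specificity} - 1 = TPR - (1 - \mathrm{Specificity}) = TPR - FPR$

Geometrically, $J$ is the vertical distance from a point on the ROC curve to the chance diagonal: it is 0 for any point on the diagonal and 1 at the perfect corner $(FPR, TPR) = (0, 1)$.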

`Python Code`

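import numpy as np


def Find_Optimal_Cutoff(TPR, FPR, threshold):
    """
    Pick the threshold that maximizes the Youden index (TPR - FPR).

    threshold usually comes from roc_curve in sklearn.metrics; see its documentation for details.
    :param TPR: array, shape = [n_thresholds]
    :param FPR: array, shape = [n_thresholds]
    :param threshold: array, shape = [n_thresholds]
    :return: optimal_threshold, [FPR, TPR] at that threshold
    """
    y = TPR - FPR
    Youden_index = np.argmax(y)  # Position of the maximum; only the first occurrence is returned.
    optimal_threshold = threshold[Youden_index]
    point = [FPR[Youden_index], TPR[Youden_index]]
    return optimal_threshold, point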
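
As a quick check of the function, the sketch below runs it on a four-sample toy data set (invented for illustration; it is not the author's data) together with `roc_curve` from scikit-learn. The function body is repeated in condensed form only so that the block runs on its own. The example also shows what "only the first occurrence is returned" means in practice: two thresholds tie for the largest TPR - FPR here, and `np.argmax` keeps the first one.

```python
import numpy as np
from sklearn import metrics


def Find_Optimal_Cutoff(TPR, FPR, threshold):
    # Same logic as the function above, repeated so this block is self-contained.
    index = np.argmax(TPR - FPR)
    return threshold[index], [FPR[index], TPR[index]]


# Toy data for illustration only.
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
optimal_th, optimal_point = Find_Optimal_Cutoff(tpr, fpr, thresholds)

# List every threshold that reaches the same maximum, to expose ties.
youden = tpr - fpr
ties = thresholds[np.isclose(youden, youden.max())]

print("chosen threshold:", optimal_th)
print("ROC point (FPR, TPR):", optimal_point)
print("all thresholds tied for the maximum:", ties)
```

One detail to keep in mind: the first entry of `thresholds` returned by `roc_curve` is an artificial value (depending on the scikit-learn version, either `np.inf` or `max(y_score) + 1`) that corresponds to predicting no positives, where TPR - FPR is 0. It is only selected when no real threshold does better than chance.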

The code for computing and plotting the ROC curve is included as well:

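import matplotlib.pyplot as plt
from sklearn import metrics


def ROC(label, y_prob):
    """
    Receiver Operating Characteristic (ROC)
    :param label: (n, ) ground-truth binary labels
    :param y_prob: (n, ) scores, where a higher score means more likely positive
    :return: fpr, tpr, roc_auc, optimal_th, optimal_point
    """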
    fpr, tpr, thresholds = metrics.roc_curve(label, y_prob)
    roc_auc = metrics.auc(fpr, tpr)
    optimal_th, optimal_point = Find_Optimal_Cutoff(TPR=tpr, FPR=fpr, threshold=thresholds)
    return fpr, tpr, roc_auc, optimal_th, optimal_point

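# labels and img_distance are the author's own ground-truth labels and
# per-sample scores; they are not defined in this post.
fpr, tpr, roc_auc, optimal_th, optimal_point = ROC(labels, img_distance)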

plt.figure(1)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")
plt.plot(optimal_point[0], optimal_point[1], marker='o', color='r')
plt.text(optimal_point[0], optimal_point[1], f'Threshold:{optimal_th:.2f}')
plt.title("ROC-AUC")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()

[Results]

The red point on the ROC curve marks the optimal threshold. That's a wrap!
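
Once a threshold has been chosen, it still has to be applied to the scores to get hard class labels. The sketch below shows one way to do that with NumPy. The data and the cutoff of 0.35 are assumptions for illustration, and the `>=` comparison assumes the usual convention that a score at or above the threshold counts as positive.

```python
import numpy as np

# Toy data and an assumed cutoff, for illustration only.
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
threshold = 0.35

# Scores at or above the threshold are predicted positive.
pred = (scores >= threshold).astype(int)

# Confusion-matrix counts at this operating point.
tp = int(np.sum((pred == 1) & (labels == 1)))
fp = int(np.sum((pred == 1) & (labels == 0)))
tn = int(np.sum((pred == 0) & (labels == 0)))
fn = int(np.sum((pred == 0) & (labels == 1)))

tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
print("predictions:", pred.tolist())
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"TPR={tpr:.2f} FPR={fpr:.2f} Youden J={tpr - fpr:.2f}")
```

Recomputing TPR and FPR from the hard predictions is a cheap sanity check: the values should match the coordinates of the point marked on the ROC curve for that threshold.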
