Understanding ROC and PR Curves, with a Simple Implementation

BlackdogC

Tags: machine learning, artificial intelligence

Article link: https://blog.csdn.net/m0_63567560/article/details/133994763

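Principles

1. Meaning and calculation of the metrics

Confusion matrix: a matrix that tabulates the model's predicted classes against the true classes. For binary classification its four cells are the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).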
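To make the four cells concrete, here is a minimal sketch that counts them from true and predicted labels. The helper name `confusion_counts` and the example labels are hypothetical and chosen only for illustration; the class encoded as 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positive, predicted positive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negative, predicted positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positive, predicted negative
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negative, predicted negative
    return tp, fp, fn, tn

# Hypothetical example data, not taken from the article
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```

The four counts always sum to the number of samples, which is a quick sanity check on any confusion matrix.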

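ROC and PR curves are built from four important metrics: precision, recall, true positive rate (TPR), and false positive rate (FPR).

Precision:

$P=\frac{TP}{TP+FP}$

Recall:

$R=\frac{TP}{TP+FN}$

True positive rate (TPR), which is the same quantity as recall:

$TPR=\frac{TP}{TP+FN}$

False positive rate (FPR):

$FPR=\frac{FP}{TN+FP}$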
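The four formulas above can be sketched directly in code. The function name `basic_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for convenience rather than something the formulas prescribe.

```python
def basic_metrics(tp, fp, fn, tn):
    """Compute precision, recall, TPR and FPR from confusion-matrix counts."""
    # Assumption: a metric with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    tpr = recall  # TPR and recall share the same formula
    fpr = fp / (tn + fp) if (tn + fp) > 0 else 0.0
    return precision, recall, tpr, fpr

# Illustrative counts: 3 true positives, 1 false positive,
# 2 false negatives, 2 true negatives
precision, recall, tpr, fpr = basic_metrics(tp=3, fp=1, fn=2, tn=2)
print(precision, recall, tpr, fpr)
```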

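2. Understanding PR and ROC curves

I. PR curve

Recall: of all truly positive samples, the fraction that is correctly predicted as positive
Precision: of all samples predicted as positive, the fraction that is actually positive

The curve is drawn by varying the decision threshold, computing P and R at each threshold, and connecting the resulting points into the PR curve.

If one curve completely encloses another, the enclosing model performs better. If the two curves cross, compare the areas under them.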
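The threshold sweep and the area comparison described above can be sketched as follows. The helper names `pr_points` and `pr_area` are hypothetical. The sketch assumes there are no tied scores and at least one positive sample, and it approximates the area with the trapezoidal rule, which is only one of several conventions for PR curves.

```python
import numpy as np

def pr_points(scores, labels):
    """Return (precision, recall) arrays, using each sample's score as a threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]      # highest confidence first
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels == 1)    # true positives at each cut
    fp = np.cumsum(sorted_labels == 0)    # false positives at each cut
    precision = tp / (tp + fp)
    recall = tp / tp[-1]                  # tp[-1] is the total number of positives
    return precision, recall

def pr_area(precision, recall):
    """Approximate the area under the PR curve with the trapezoidal rule."""
    return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2))

# Scores and labels from the article's example
scores = [0.9, 0.78, 0.6, 0.46, 0.4, 0.37, 0.2, 0.16]
labels = [1, 1, 0, 1, 0, 0, 1, 1]
precision, recall = pr_points(scores, labels)
area = pr_area(precision, recall)
print(np.round(precision, 3), recall, round(area, 3))
```

To compare two models whose curves cross, one could call `pr_area` on each model's points and prefer the larger value.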

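II. ROC curve

TPR (True Positive Rate): of all positive samples, the fraction predicted as positive (correct positive predictions)

FPR (False Positive Rate): of all negative samples, the fraction predicted as positive (incorrect positive predictions)

The curve is generated in the same way as the PR curve: vary the threshold and plot TPR against FPR at each threshold.

Here too, the larger the area under the curve, the better the model.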
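The area under the ROC curve (AUC) can be estimated from the same threshold sweep. The sketch below uses a hypothetical helper `roc_auc`, prepends the point (0, 0) so the curve starts at the origin, and integrates with the trapezoidal rule. It assumes no tied scores and that both classes are present.

```python
import numpy as np

def roc_auc(scores, labels):
    """Estimate the area under the ROC curve with the trapezoidal rule."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]                # highest confidence first
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels == 1)              # true positives at each cut
    fp = np.cumsum(sorted_labels == 0)              # false positives at each cut
    tpr = np.concatenate(([0.0], tp / tp[-1]))      # start the curve at (0, 0)
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Scores and labels from the article's example
scores = [0.9, 0.78, 0.6, 0.46, 0.4, 0.37, 0.2, 0.16]
labels = [1, 1, 0, 1, 0, 0, 1, 1]
auc = roc_auc(scores, labels)
print(round(auc, 4))
```

The result can be read as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, so a value near 0.5 indicates a ranking close to random.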

Plotting both curves

Given confidence scores and labels

confidence_scores = np.array([0.9, 0.78, 0.6, 0.46, 0.4, 0.37, 0.2, 0.16])
data_labels = np.array([1,1,0,1,0,0 ,1,1])

Code

import numpy as np
import matplotlib.pyplot as plt

confidence_scores = np.array([0.9, 0.78, 0.6, 0.46, 0.4, 0.37, 0.2, 0.16])
data_labels = np.array([1,1,0,1,0,0 ,1,1])
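
P = np.sum(data_labels == 1)  # number of positive samples
N = np.sum(data_labels == 0)  # number of negative samples

# Sort samples by confidence, highest first
sorted_indices = np.argsort(confidence_scores)[::-1]
sorted_scores = confidence_scores[sorted_indices]
sorted_labels = data_labels[sorted_indices]

thresholds = []
tprs = []
fprs = []
ps = []
rs = []

# Before any sample is predicted positive, every positive is a false
# negative and every negative is a true negative
TP = 0
FP = 0
FN = P
TN = N

# Use each sample's score as the threshold in turn, from high to low;
# every sample with score >= threshold is predicted positive
for i in range(len(sorted_scores)):
    threshold = sorted_scores[i]
    thresholds.append(threshold)

    if sorted_labels[i] == 1:
        TP += 1
        FN -= 1
    else:
        FP += 1
        TN -= 1

    TPR = TP / P
    FPR = FP / N
    # Separate names so the positive count P is not overwritten
    precision = TP / (TP + FP)
    recall = TP / P
    tprs.append(TPR)
    fprs.append(FPR)
    ps.append(precision)
    rs.append(recall)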

plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.plot(fprs, tprs, 'b', label='ROC curve')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC curve')
plt.legend()

plt.subplot(1,2,2)
plt.plot(rs, ps, 'g', label='PR curve')
plt.xlabel('R')
plt.ylabel('P')
plt.title('PR curve')
plt.legend()

plt.show()