P-R Curve, ROC Curve, AUC, and Cost Curve (CC)


**P-R Curve**
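Explanation: P is the precision, the fraction of predicted positives that are true positives.
R is the recall, the fraction of actual positives that are true positives.
A confusion matrix is built from the classification results (figure: classification confusion matrix).
From it, precision P = TP/(TP+FP) and recall R = TP/(TP+FN), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives.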
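As a quick illustration of the two ratios, here is a minimal sketch. The function name `precision_recall`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions made for this example, not part of the original post.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    Assumed convention: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
print("precision =", p, "recall =", r)
```

With these counts, precision is 8/10 and recall is 8/12.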

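Following the description of the P-R curve in the Watermelon Book (Zhou Zhihua's *Machine Learning*): sort the samples by the learner's predicted probability of being positive, so that samples near the front are more likely to be positive. Then move down this ordering one sample at a time, predicting every sample up to the current one as positive, and compute the recall and precision at each step. Plotting these pairs gives the P-R curve.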

Code:

Method 1: simulate the procedure above by hand

import numpy as np
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt

# 0 = bad melon, 1 = good melon
Data=np.round(np.random.uniform(0,1,1000)).tolist()
Probability=np.random.uniform(0,1,1000).tolist()
西瓜=list(zip(Data, Probability))
西瓜=sorted(西瓜,key=lambda x:x[1],reverse=True)
P=[]
R=[]
TPR=[]
FPR=[]
FNR=[]
for 好瓜 in range(1,1000):
    TP = 0
    FP=0
    TN=0
    FN = 0
    for idx in range(0,好瓜):
        if 西瓜[idx][0] == 1:
            TP=TP+1
        else:
            FP=FP+1
    for idx in range(好瓜,1000):
        if 西瓜[idx][0]==0:
            TN = TN + 1
        else:
            FN=FN+1
    P.append(TP/(TP+FP) if TP+FP!=0 else TP/(TP+FP+1))
    R.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    TPR.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    FPR.append(FP/(TN+FP) if TN+FP!=0 else FP/(TN+FP+1))
    FNR.append(1-TPR[-1])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("P-R curve")
plt.plot(R,P)
f1=np.polyfit(R,P,4) # fit the P-R curve with a degree-4 polynomial
predict_P=np.polyval(f1,R)
plt.plot(R,predict_P)
plt.show()
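The nested loops above recount TP and FP from scratch at every cutoff. A single pass with a running count gives the same kind of points in linear time after sorting. The sketch below is one possible way to do that. The function name `pr_points` and the five-sample data are made up for illustration, ties between scores are not treated specially, and the cutoffs run over k = 1..n.

```python
def pr_points(labels, scores):
    """Return (recalls, precisions) for cutoffs k = 1..n.

    At cutoff k, the k highest-scoring samples are predicted positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y == 1)
    recalls, precisions = [], []
    tp = 0  # running count of true positives among the top k
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
        precisions.append(tp / k)  # TP + FP equals k at cutoff k
        recalls.append(tp / total_pos if total_pos else 0.0)
    return recalls, precisions


# Hypothetical toy data: 1 = good melon, 0 = bad melon.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
recalls, precisions = pr_points(labels, scores)
for r, p in zip(recalls, precisions):
    print(round(r, 3), round(p, 3))
```

The returned lists can be passed to `plt.plot(recalls, precisions)` in the same way as `R` and `P` above.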

Method 2: obtain P and R directly with scikit-learn

import numpy as np
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, auc

Data=np.round(np.random.uniform(0,1,1000)).tolist()
Probability=np.random.uniform(0,1,1000).tolist()
P,R, thresholds = precision_recall_curve(Data, Probability)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("P-R curve")
plt.plot(R,P)
f1=np.polyfit(R,P,4)
predict_P=np.polyval(f1,R)
plt.plot(R,predict_P)
plt.show()

**ROC Curve and AUC**
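Explanation: ROC stands for receiver operating characteristic. Like the P-R curve, it is drawn by sweeping a cutoff over the ranked samples; its horizontal and vertical axes are the false positive rate (FPR) and the true positive rate (TPR). AUC is the area under the ROC curve.
The two rates are defined as TPR = TP/(TP+FN) and FPR = FP/(TN+FP).
The plotting method is similar to that of the P-R curve and is omitted here. AUC can be obtained directly with scikit-learn, or by summing trapezoid areas over the m ROC points (x_i, y_i), ordered by x:
$$AUC=\frac{1}{2}\sum_{i=1}^{m-1}(x_{i+1}-x_{i})\cdot(y_{i+1}+y_{i})$$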
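To make the trapezoid sum concrete, the following sketch builds ROC points for a tiny hand-made data set and applies the formula term by term. The helper names `roc_points` and `trapezoid_auc` and the data are assumptions for this example. The sketch also assumes that both classes are present and that there are no tied scores.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists, starting from the point (0, 0)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    n_neg = len(ranked) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def trapezoid_auc(xs, ys):
    """Half the sum of (x[i+1] - x[i]) * (y[i+1] + y[i]) over consecutive points."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i])
               for i in range(len(xs) - 1)) / 2


# Hypothetical toy data: 3 positives and 2 negatives.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
auc_value = trapezoid_auc(fpr, tpr)
print("AUC =", auc_value)
```

For this data, 4 of the 6 positive-negative pairs are ranked correctly, so the area comes out to 2/3.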

Code:

Method 1: simulate the procedure above by hand

import numpy as np
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt

# 0 = bad melon, 1 = good melon

Data=np.round(np.random.uniform(0,1,1000)).tolist()
Probability=np.random.uniform(0,1,1000).tolist()
西瓜=list(zip(Data, Probability))
西瓜=sorted(西瓜,key=lambda x:x[1],reverse=True)
P=[]
R=[]
TPR=[]
FPR=[]
FNR=[]
for 好瓜 in range(1,1000):
    TP = 0
    FP=0
    TN=0
    FN = 0
    for idx in range(0,好瓜):
        if 西瓜[idx][0] == 1:
            TP=TP+1
        else:
            FP=FP+1
    for idx in range(好瓜,1000):
        if 西瓜[idx][0]==0:
            TN = TN + 1
        else:
            FN=FN+1
    P.append(TP/(TP+FP) if TP+FP!=0 else TP/(TP+FP+1))
    R.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    TPR.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    FPR.append(FP/(TN+FP) if TN+FP!=0 else FP/(TN+FP+1))
    FNR.append(1-TPR[-1])
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.plot(FPR,TPR)
f2=np.polyfit(FPR,TPR,4)
predict_TPR=np.polyval(f2,FPR)
plt.plot(FPR,predict_TPR)
plt.show()
AUC2=0
for idx in range(1,len(FPR)):
    AUC2=AUC2+(FPR[idx]-FPR[idx-1])*(TPR[idx]+TPR[idx-1])/2
print("AUC from the trapezoid formula =",AUC2)

Method 2: obtain TPR and FPR directly with scikit-learn

import numpy as np
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, auc

Data=np.round(np.random.uniform(0,1,1000)).tolist()
Probability=np.random.uniform(0,1,1000).tolist()
FPR,TPR, thresholds  =  roc_curve(Data, Probability)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.plot(FPR,TPR)
f2=np.polyfit(FPR,TPR,4)
predict_TPR=np.polyval(f2,FPR)
plt.plot(FPR,predict_TPR)
plt.show()
AUC = auc(FPR,TPR)
print("AUC computed by scikit-learn =",AUC)

**Cost Curve (CC)**
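Explanation: cost(i,j) denotes the cost of predicting a class-i sample as class j. In the cost curve plot, the horizontal axis is the positive-class probability cost and the vertical axis is the normalized cost, both in the range [0,1].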

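As with the confusion matrix built from the classification results, the costs are arranged in a matrix (figure: binary-classification cost matrix).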

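The two axes are defined as follows. The positive-class probability cost is P(+)cost = (p·cost(0,1))/(p·cost(0,1)+(1-p)·cost(1,0)), and the normalized cost is cost(norm) = (FNR·p·cost(0,1)+FPR·(1-p)·cost(1,0))/(p·cost(0,1)+(1-p)·cost(1,0)), where p is the probability that a sample is positive and FNR = 1 - TPR is the false negative rate. In the book's convention class 0 is the positive class, so cost(0,1) is the cost of a false negative and cost(1,0) is the cost of a false positive.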
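The two definitions can be checked numerically with a small sketch. The function names and the example values (p = 0.5, cost(0,1) = 2, cost(1,0) = 1, FNR = 0.2, FPR = 0.1) are assumptions chosen for illustration and do not come from the original post.

```python
def positive_probability_cost(p, cost01, cost10):
    """P(+)cost = p*cost01 / (p*cost01 + (1-p)*cost10)."""
    weighted_pos = p * cost01
    return weighted_pos / (weighted_pos + (1 - p) * cost10)


def normalized_cost(fnr, fpr, p, cost01, cost10):
    """cost_norm = FNR * P(+)cost + FPR * (1 - P(+)cost)."""
    pc = positive_probability_cost(p, cost01, cost10)
    return fnr * pc + fpr * (1 - pc)


# Hypothetical setting: balanced classes, a false negative costs twice a false positive.
pc = positive_probability_cost(p=0.5, cost01=2.0, cost10=1.0)
cn = normalized_cost(fnr=0.2, fpr=0.1, p=0.5, cost01=2.0, cost10=1.0)
print("P(+)cost =", pc, "cost_norm =", cn)
```

When the two costs are equal, P(+)cost reduces to p itself, which is a convenient way to sanity-check an implementation.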

The cost curve can be drawn by simulating the procedure above. The first method takes, for each of the n ROC points, the line segment through $(0,FPR_{i})$ and $(1,FNR_{i})$; the cost curve is the lower envelope of these n segments, i.e. the minimum y value at each x. The second method plots the curve from the definitions. The horizontal axis is the positive-class probability cost, with values in [0,1]:
$$P(+)cost=\frac{p\cdot cost_{01}}{p\cdot cost_{01}+(1-p)\cdot cost_{10}}$$
The vertical axis is the normalized cost, with values in [0,1]:
$$cost_{norm}=FNR\cdot P(+)cost+FPR\cdot(1-P(+)cost)=\frac{FNR\cdot p\cdot cost_{01}+FPR\cdot(1-p)\cdot cost_{10}}{p\cdot cost_{01}+(1-p)\cdot cost_{10}}$$
The area enclosed by the cost curve is computed in the same way as the AUC above, as a sum of trapezoid areas under the curve:
$$\text{expected total cost}=\frac{1}{2}\sum_{i=1}^{m-1}(x_{i+1}-x_{i})\cdot(y_{i+1}+y_{i})$$
The final result looks like this (figure: cost curve and expected total cost).

Code:

import numpy as np
from matplotlib.font_manager import FontProperties
import matplotlib.pyplot as plt

# 0 = bad melon, 1 = good melon
Data=np.round(np.random.uniform(0,1,1000)).tolist()
Probability=np.random.uniform(0,1,1000).tolist()
西瓜=list(zip(Data, Probability))
西瓜=sorted(西瓜,key=lambda x:x[1],reverse=True)
P=[]
R=[]
TPR=[]
FPR=[]
FNR=[]
for 好瓜 in range(1,1000):
    TP = 0
    FP=0
    TN=0
    FN = 0
    for idx in range(0,好瓜):
        if 西瓜[idx][0] == 1:
            TP=TP+1
        else:
            FP=FP+1
    for idx in range(好瓜,1000):
        if 西瓜[idx][0]==0:
            TN = TN + 1
        else:
            FN=FN+1
    P.append(TP/(TP+FP) if TP+FP!=0 else TP/(TP+FP+1))
    R.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    TPR.append(TP/(TP+FN) if TP+FN!=0 else TP/(TP+FN+1))
    FPR.append(FP/(TN+FP) if TN+FP!=0 else FP/(TN+FP+1))
    FNR.append(1-TPR[-1])
plt.xlabel('Positive-class probability cost')
plt.ylabel('Normalized cost')
plt.title('Cost curve and expected total cost')
for idx in range(len(FNR)):
    k=(FNR[idx]-FPR[idx])/(1-0)
    b=FPR[idx]
    x=np.arange(0,1.1,0.1)
    y=k*x+b
    plt.plot(x,y,color='blue')
pCost=[]
costNorm=[]
for x in np.linspace(0,1,101): # include the endpoint x=1 so the area covers all of [0,1]
    pCost.append(x)
    ymin=1
    for idx in range(len(FNR)):
        ymin=min(ymin,FNR[idx]*x+FPR[idx]*(1-x))
    costNorm.append(ymin)
plt.plot(pCost,costNorm,color='red')
plt.show()
期望总体代价=0
for idx in range(1,len(pCost)):
    期望总体代价=期望总体代价+(pCost[idx]-pCost[idx-1])*(costNorm[idx]+costNorm[idx-1])/2
print('Expected total cost =',期望总体代价)

Supplement

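The procedure above uses random scores in place of a classifier, so the curves differ somewhat from those in the book. To check that the implementation is correct, construct the case where every prediction is right, i.e. label a sample a bad melon when its probability falls in [0, 0.5) and a good melon otherwise, and check that the resulting curves are right-angled line segments.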
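One way to run this check without plotting is sketched below in plain Python. The seed, the variable names, and the 101-point grid are assumptions for this example. For labels that agree with the scores, the ROC curve should pass through the corner (0, 1), the AUC should be 1, and the lower envelope of the cost lines should be 0 everywhere.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
scores = [random.random() for _ in range(1000)]
labels = [1 if s >= 0.5 else 0 for s in scores]  # labels agree with the scores

# Sweep the cutoff over the ranked samples and collect ROC points.
ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
n_pos = sum(labels)
n_neg = len(labels) - n_pos
tp = fp = 0
fpr, tpr = [0.0], [0.0]
for _, y in ranked:
    tp += y
    fp += 1 - y
    tpr.append(tp / n_pos)
    fpr.append(fp / n_neg)

# Trapezoid sum for the area under the ROC curve.
auc_value = sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i])
                for i in range(len(fpr) - 1)) / 2

# Lower envelope of the lines through (0, FPR_i) and (1, FNR_i), with FNR = 1 - TPR.
grid = [i / 100 for i in range(101)]
envelope = [min((1 - t) * x + f * (1 - x) for f, t in zip(fpr, tpr)) for x in grid]

print("AUC =", auc_value)
print("max of cost envelope =", max(envelope))
```

If either value deviates noticeably from 1 and 0 respectively, the sorting direction or the TP/FP bookkeeping is the first place to look.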

