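Notes:
"P" stands for precision and "R" for recall.
A PR curve plots a model's precision against its recall, with precision P on the vertical axis and recall R on the horizontal axis.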
Actual \ Predicted | Positive | Negative |
Positive | TP (true positive) | FN (false negative) |
Negative | FP (false positive) | TN (true negative) |
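The table above is the confusion matrix for a test run; it summarizes every prediction made on a dataset.
From it, precision P = TP / (TP + FP) and recall R = TP / (TP + FN).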
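To make the two formulas concrete, here is a minimal worked example. The helper name `precision_recall` and the counts passed to it are hypothetical, chosen only for illustration; returning 0.0 when a denominator is zero is likewise an assumption, not something the formulas themselves prescribe.

```python
def precision_recall(tp, fp, fn):
    """Return (P, R) computed from confusion-matrix counts."""
    # P = TP / (TP + FP); fall back to 0.0 if nothing was predicted positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # R = TP / (TP + FN); fall back to 0.0 if there are no actual positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)
```

With these counts, 8 of the 10 predicted positives are correct, and 8 of the 12 actual positives are found.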
Implementation:
Imports:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
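Generate test data:
# Generate two sets of random numbers between 0 and 1
# Demo data 1
rand_1 = list(np.random.random(20))
# Build the records: predicted probability plus true label
# (the first 10 samples are positive, the remaining 10 negative)
test_1 = []
for i in range(20):
    label = "P" if i < 10 else "N"
    test_1.append({"value": rand_1[i], "label": label})
# Sort the probabilities in descending order; they are used as thresholds later
rand_1.sort(reverse=True)
# Demo data 2
rand_2 = list(np.random.random(20))
# Build the records: predicted probability plus true label
test_2 = []
for i in range(20):
    label = "P" if i < 10 else "N"
    test_2.append({"value": rand_2[i], "label": label})
# Sort the probabilities in descending order; they are used as thresholds later
rand_2.sort(reverse=True)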
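Compute the PR values:
# values: the model's predicted positive-class probabilities for all samples, each used as a threshold
# datas: the model's predictions paired with each sample's true label
def get_pr(values=(), datas=()):
    pr = []
    for value in values:
        counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
        for data in datas:
            # A sample is predicted positive when its probability reaches the threshold
            predict_label = "P" if data["value"] >= value else "N"
            if predict_label == "P" and data["label"] == "P":
                counts["TP"] += 1
            elif predict_label == "P" and data["label"] == "N":
                counts["FP"] += 1
            elif predict_label == "N" and data["label"] == "N":
                counts["TN"] += 1
            elif predict_label == "N" and data["label"] == "P":
                counts["FN"] += 1
        # Precision (0.0 if no sample is predicted positive)
        predicted_pos = counts["TP"] + counts["FP"]
        p = round(counts["TP"] / predicted_pos, 2) if predicted_pos else 0.0
        # Recall (0.0 if the data contains no positive samples)
        actual_pos = counts["TP"] + counts["FN"]
        r = round(counts["TP"] / actual_pos, 2) if actual_pos else 0.0
        pr.append({"p": p, "r": r})
    return pr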
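The threshold sweep above rescans every sample for each threshold. As a cross-check, here is a sketch of a different technique that sorts the samples once and accumulates the counts while walking down the ranking. The function name `pr_points` and the four-sample input are hypothetical, and the sketch assumes no two samples share the same score (ties would need to be grouped before a point is emitted).

```python
def pr_points(scores, labels):
    """Walk the ranking from highest to lowest score and return (P, R) pairs."""
    total_pos = sum(1 for label in labels if label == "P")
    # Rank samples by score, highest first
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = []
    for _, label in ranked:
        # Lowering the threshold to this score turns one more sample positive
        if label == "P":
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((round(precision, 2), round(recall, 2)))
    return points


# Hypothetical scores and labels, small enough to verify by hand
points = pr_points([0.9, 0.8, 0.6, 0.3], ["P", "N", "P", "N"])
print(points)  # [(1.0, 0.5), (0.5, 0.5), (0.67, 1.0), (0.5, 1.0)]
```

Each tuple corresponds to one threshold, in the same descending order the loop over thresholds uses.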
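Combine the data for the PR plot:
pr_1 = get_pr(rand_1, test_1)
pr_2 = get_pr(rand_2, test_2)
# Build the data to display
data_show = []
for pr in pr_1:
    data_show.append({'p': pr['p'], 'r': pr['r'], 'model': 'model_1'})
for pr in pr_2:
    data_show.append({'p': pr['p'], 'r': pr['r'], 'model': 'model_2'})
# Add the diagonal P = R; a curve's break-even point (BEP) is where it crosses this line
for i in range(20):
    value = (1.0/20)*i
    data_show.append({'p': value, 'r': value, 'model': 'BEP'})
data_show = pd.DataFrame(data_show)
# Draw the chart (seaborn 0.12+ deprecates ci=None in favor of errorbar=None)
sns.relplot(x="r", y="p", ci=None, hue='model', kind="line", data=data_show)
# Needed when running as a script rather than in a notebook
plt.show()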
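The 'BEP' series in the plot is the line P = R, so a model's break-even point can be read off where its curve crosses that line. A rough numeric counterpart is sketched below: it picks the sampled point whose precision and recall are closest. The helper name `approx_bep` and the sample points are hypothetical, and the result is only an approximation limited to the thresholds actually sampled.

```python
def approx_bep(pr_points):
    """Return the sampled PR point where precision and recall are closest."""
    return min(pr_points, key=lambda pt: abs(pt["p"] - pt["r"]))


# Hypothetical PR points in the same {"p": ..., "r": ...} shape used for plotting
sample = [
    {"p": 1.0, "r": 0.25},
    {"p": 0.67, "r": 0.5},
    {"p": 0.75, "r": 0.75},
    {"p": 0.57, "r": 1.0},
]
bep = approx_bep(sample)
print(bep)  # {'p': 0.75, 'r': 0.75}
```

When comparing two models this way, the one with the higher break-even value is usually considered the better of the two at that balance point.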