Drawing PR Curves in Python: Using ROC and PR Curves for Classification

基本概念

ROC: receiver operating characteristic curve

PRC: precision-recall curve

ROC curves and Precision-Recall curves are diagnostic tools that help interpret probabilistic predictions for classification (mainly binary) predictive modeling problems.

ROC Curves summarize the trade-off between the true positive rate and false positive rate for a predictive model using different probability thresholds.

Precision-Recall curves summarize the trade-off between the true positive rate and the positive predictive value for a predictive model using different probability thresholds.

ROC curves: appropriate when the observations are balanced between the classes.

Precision-recall curves: appropriate for imbalanced datasets.

Predicting probability

In a classification problem, we may decide to predict the class values directly. Alternately, it can be more flexible to predict the probabilities for each class instead. Why? It can provide the capability to choose and even calibrate the threshold for how to interpret the predicted probabilities.
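As a minimal sketch of this idea (assuming scikit-learn and a synthetic dataset; the names X_test, y_test, probs, and preds below are illustrative, not from the original post), a classifier's predict_proba output can be thresholded however we like:

```python
# Minimal sketch: predicting class probabilities instead of hard labels.
# Assumes scikit-learn; the dataset here is synthetic, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)

# Probability of the positive class for each test sample.
probs = model.predict_proba(X_test)[:, 1]

# A hard prediction is just the probability compared against a chosen threshold.
threshold = 0.5
preds = (probs >= threshold).astype(int)
```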

There are two types of errors when making a prediction for a binary (two-class) classification problem:

FP: predict an event when there was no event

FN: predict no event when there was an event

A common way to compare models that predict probabilities for two classes is to use a ROC curve.

ROC Curve

sensitivity: true positive rate = TP / (TP + FN)

false positive rate = FP / (FP + TN) = 1 - specificity

specificity = TN / (TN + FP)

accuracy = (TP + TN) / (TP + TN + FP + FN)

In binary classification, especially when we are interested in the minority class, accuracy is not that useful. For example, if 90% of the samples are negative, a model that always predicts the negative class already achieves 90% accuracy.
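A small sketch of how these rates fall out of a confusion matrix, reusing the hypothetical y_test and preds arrays from the previous snippet:

```python
# Sketch: deriving the rates above from a confusion matrix.
# Assumes y_test and preds from the previous snippet.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)  # = 1 - specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)
```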

precision = TP / (TP + FP)

TP / retrieved set = P(Y = 1 | Y^ = 1)

recall = TP / (TP + FN)

TP / golden set = P(Y^ = 1 | Y = 1)

Precision and recall trade off against each other.

If we want to cover more samples, it is easier to make mistakes -> high recall -> low precision.

If we have a conservative model that only predicts positive when it is confident -> low recall -> high precision.
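The corresponding sketch for precision and recall, again assuming the counts and arrays defined in the earlier snippets:

```python
# Sketch: precision and recall from the same confusion-matrix counts,
# plus the equivalent scikit-learn helpers for comparison.
from sklearn.metrics import precision_score, recall_score

precision = tp / (tp + fp)   # P(Y = 1 | Y^ = 1)
recall = tp / (tp + fn)      # P(Y^ = 1 | Y = 1)

# Same values via scikit-learn (assuming y_test and preds from earlier).
assert abs(precision - precision_score(y_test, preds)) < 1e-12
assert abs(recall - recall_score(y_test, preds)) < 1e-12
```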

AUC: area under the curve. Can be used as a summary of the model skill. The probabilistic interpretation of AUC: if we randomly pick one positive and one negative sample, AUC is the probability that the positive sample receives a higher score than the negative one.
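A short illustration of this probabilistic interpretation (reusing the hypothetical probs and y_test from earlier): the AUC reported by scikit-learn should match the fraction of positive/negative pairs that the model ranks correctly.

```python
# Sketch of AUC's probabilistic meaning: the fraction of (positive, negative)
# pairs in which the positive sample gets the higher score (ties count half).
# Assumes probs and y_test from the earlier snippet.
import numpy as np
from sklearn.metrics import roc_auc_score

pos_scores = probs[y_test == 1]
neg_scores = probs[y_test == 0]

# Score differences over all positive/negative pairs.
pairs = pos_scores[:, None] - neg_scores[None, :]
pairwise_auc = np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0)

print(pairwise_auc, roc_auc_score(y_test, probs))  # these two should match
```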

ROC: x-axis: false positive rate; y-axis: true positive rate (a.k.a. false alarm rate vs. hit rate).

Smaller values on the x-axis of the plot indicate lower false positives and higher true negatives.

Larger values on the y-axis of the plot indicate higher true positives and lower false negatives.

When we predict a binary outcome, it is either a correct prediction (true positive) or not (false positive). There is a tension between these options, the same with a true negative and false negative.

A skilful model will assign a higher probability to a randomly chosen real positive occurrence than a negative occurrence on average. This is what we mean when we say that the model has skill. Generally, skilful models are represented by curves that bow up to the top left of the plot.

A no-skill classifier is one that cannot discriminate between the classes and would predict a random class or a constant class in all cases. A model with no skill is represented at the point (0.5, 0.5). A model with no skill at each threshold is represented by a diagonal line from the bottom left of the plot to the top right and has an AUC of 0.5.

A model with perfect skill is represented at a point (0,1). A model with perfect skill is represented by a line that travels from the bottom left of the plot to the top left and then across the top to the top right.

An operator may plot the ROC curve for the final model and choose a threshold that gives a desirable balance between the false positives and false negatives.
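A minimal plotting sketch with roc_curve and matplotlib, again assuming the probs and y_test arrays from the earlier snippet:

```python
# Sketch: plotting the ROC curve and the no-skill diagonal.
# Assumes probs and y_test from the earlier snippet.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)

plt.plot([0, 1], [0, 1], linestyle='--', label='No skill')
plt.plot(fpr, tpr, marker='.', label=f'Model (AUC = {auc:.3f})')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()
```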

F1 score:

F1 = 2 * precision * recall / (precision + recall)

It balances recall and precision in a single number.

recall -> risk -> sensitivity -> true positive rate: we want this to be 1.

precision -> cost -> specificity -> false positive rate: we want this to be 0.
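A matching sketch for the precision-recall curve and F1 score, under the same assumptions as the previous snippets; note that the no-skill baseline of a PR curve is the positive-class prevalence rather than a diagonal line.

```python
# Sketch: precision-recall curve and F1 score for the same predictions.
# Assumes probs, preds, and y_test from the earlier snippets.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, f1_score

precision_vals, recall_vals, thresholds = precision_recall_curve(y_test, probs)
f1 = f1_score(y_test, preds)

# No-skill baseline: the fraction of positive samples in the data.
no_skill = (y_test == 1).mean()
plt.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No skill')
plt.plot(recall_vals, precision_vals, marker='.', label=f'Model (F1 at 0.5 = {f1:.3f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.show()
```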
