Drawing PR Curves in Python: Using ROC and PR Curves for Classification

基本概念

ROC: receiver operating characteristic curve

PRC: precision-recall curve

ROC curves and Precision-Recall curves are diagnostic tools that help interpret probabilistic predictions for classification (mainly binary) predictive modeling problems.

ROC Curves summarize the trade-off between the true positive rate and false positive rate for a predictive model using different probability thresholds.

Precision-Recall curves summarize the trade-off between the true positive rate and the positive predictive value for a predictive model using different probability thresholds.

ROC curves: appropriate when the observations are balanced between the classes.

Precision-recall curves: appropriate for imbalanced datasets.

Predicting probability

In a classification problem, we may decide to predict the class values directly. Alternately, it can be more flexible to predict the probabilities for each class instead. Why? It can provide the capability to choose and even calibrate the threshold for how to interpret the predicted probabilities.
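As a minimal sketch of this idea (assuming scikit-learn and a synthetic dataset; the names X_test, y_test, probs, and preds below are illustrative, not from the original post), a classifier's predict_proba output can be thresholded however we like:

```python
# Minimal sketch: predicting class probabilities instead of hard labels.
# Assumes scikit-learn; the dataset here is synthetic, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)

# Probability of the positive class for each test sample.
probs = model.predict_proba(X_test)[:, 1]

# A hard prediction is just the probability compared against a chosen threshold.
threshold = 0.5
preds = (probs >= threshold).astype(int)
```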

There are two types of errors when making a prediction for a binary (two-class) classification problem:

FP: predict an event when there was no event

FN: predict no event when there was an event

A common way to compare models that predict probabilities for two classes is to use a ROC curve.

ROC Curve

sensitivity: true positive rate = TP / (TP + FN)

false positive rate = FP / (FP + TN) = 1 - specificity

specificity = TN / (TN + FP)

accuracy = (TP + TN) / (TP + TN + FP + FN)

In binary classification, especially when we are interested in the minority class, accuracy is not that useful. For example, if 90% of the samples are negative, a model that always predicts the negative class already achieves 90% accuracy.
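A small sketch of how these rates fall out of a confusion matrix, reusing the hypothetical y_test and preds arrays from the previous snippet:

```python
# Sketch: deriving the rates above from a confusion matrix.
# Assumes y_test and preds from the previous snippet.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)  # = 1 - specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)
```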

precision = TP / (TP + FP)

TP / retrieved set = P(Y = 1 | Y^ = 1)

recall = TP / (TP + FN)

TP / golden set = P(Y^ = 1 | Y = 1)

Precision and recall trade off against each other.

If we want to cover more samples, it is easier to make mistakes -> high recall -> low precision.

If we have a conservative model that only predicts positive when it is confident -> low recall -> high precision.
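The corresponding sketch for precision and recall, again assuming the counts and arrays defined in the earlier snippets:

```python
# Sketch: precision and recall from the same confusion-matrix counts,
# plus the equivalent scikit-learn helpers for comparison.
from sklearn.metrics import precision_score, recall_score

precision = tp / (tp + fp)   # P(Y = 1 | Y^ = 1)
recall = tp / (tp + fn)      # P(Y^ = 1 | Y = 1)

# Same values via scikit-learn (assuming y_test and preds from earlier).
assert abs(precision - precision_score(y_test, preds)) < 1e-12
assert abs(recall - recall_score(y_test, preds)) < 1e-12
```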

AUC: area under the curve. Can be used as a summary of the model skill. The probabilistic interpretation of AUC: if we randomly pick one positive and one negative sample, AUC is the probability that the positive sample receives a higher score than the negative one.
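A short illustration of this probabilistic interpretation (reusing the hypothetical probs and y_test from earlier): the AUC reported by scikit-learn should match the fraction of positive/negative pairs that the model ranks correctly.

```python
# Sketch of AUC's probabilistic meaning: the fraction of (positive, negative)
# pairs in which the positive sample gets the higher score (ties count half).
# Assumes probs and y_test from the earlier snippet.
import numpy as np
from sklearn.metrics import roc_auc_score

pos_scores = probs[y_test == 1]
neg_scores = probs[y_test == 0]

# Score differences over all positive/negative pairs.
pairs = pos_scores[:, None] - neg_scores[None, :]
pairwise_auc = np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0)

print(pairwise_auc, roc_auc_score(y_test, probs))  # these two should match
```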

ROC: x-axis: false positive rate; y-axis: true positive rate (a.k.a. false alarm rate vs. hit rate).

Smaller values on the x-axis of the plot indicate lower false positives and higher true negatives.

Larger values on the y-axis of the plot indicate higher true positives and lower false negatives.

When we predict a binary outcome, it is either a correct prediction (true positive) or not (false positive). There is a tension between these options, the same with a true negative and false negative.

A skilful model will assign a higher probability to a randomly chosen real positive occurrence than a negative occurrence on average. This is what we mean when we say that the model has skill. Generally, skilful models are represented by curves that bow up to the top left of the plot.

A no-skill classifier is one that cannot discriminate between the classes and would predict a random class or a constant class in all cases. A model with no skill is represented at the point (0.5, 0.5). A model with no skill at each threshold is represented by a diagonal line from the bottom left of the plot to the top right and has an AUC of 0.5.

A model with perfect skill is represented at a point (0,1). A model with perfect skill is represented by a line that travels from the bottom left of the plot to the top left and then across the top to the top right.

An operator may plot the ROC curve for the final model and choose a threshold that gives a desirable balance between the false positives and false negatives.
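A minimal plotting sketch with roc_curve and matplotlib, again assuming the probs and y_test arrays from the earlier snippet:

```python
# Sketch: plotting the ROC curve and the no-skill diagonal.
# Assumes probs and y_test from the earlier snippet.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)

plt.plot([0, 1], [0, 1], linestyle='--', label='No skill')
plt.plot(fpr, tpr, marker='.', label=f'Model (AUC = {auc:.3f})')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()
```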

F1 score:

F1 = 2 * precision * recall / (precision + recall)

It balances recall and precision in a single number.

recall -> risk -> sensitivity -> true positive rate: we want this to be 1.

precision -> cost -> specificity -> false positive rate: we want this to be 0.
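A matching sketch for the precision-recall curve and F1 score, under the same assumptions as the previous snippets; note that the no-skill baseline of a PR curve is the positive-class prevalence rather than a diagonal line.

```python
# Sketch: precision-recall curve and F1 score for the same predictions.
# Assumes probs, preds, and y_test from the earlier snippets.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, f1_score

precision_vals, recall_vals, thresholds = precision_recall_curve(y_test, probs)
f1 = f1_score(y_test, preds)

# No-skill baseline: the fraction of positive samples in the data.
no_skill = (y_test == 1).mean()
plt.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No skill')
plt.plot(recall_vals, precision_vals, marker='.', label=f'Model (F1 at 0.5 = {f1:.3f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.show()
```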
