DS Wannabe 5-AM Project: Measuring accuracy of classification model DS 30day int prep day24 (Season 1 finale)

Latest recommended article published on 2024-07-25 18:53:56

wendyponcho


Article link: https://blog.csdn.net/wendyponcho/article/details/136223703


How to fix the accuracy problem? Defining different types of errors and how to measure them

False positives and false negatives: Which one is worse?

False positive: a healthy person who is incorrectly diagnosed as sick
False negative: a sick person who is incorrectly diagnosed as healthy
True positive: a sick person who is diagnosed as sick
True negative: a healthy person who is diagnosed as healthy
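
The four outcomes above can be sketched as a small function. This is a minimal illustration, assuming labels are encoded as booleans with "sick" as the positive class; the function name `outcome` is hypothetical and does not come from the original text.

```python
def outcome(actually_sick: bool, diagnosed_sick: bool) -> str:
    """Name the outcome for one patient, treating 'sick' as the positive class."""
    if diagnosed_sick:
        # The model said "positive": right if the patient is sick, wrong otherwise.
        return "true positive" if actually_sick else "false positive"
    # The model said "negative": wrong if the patient is sick, right otherwise.
    return "false negative" if actually_sick else "true negative"


print(outcome(actually_sick=False, diagnosed_sick=True))   # false positive
print(outcome(actually_sick=True, diagnosed_sick=False))   # false negative
```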

ANALYZING FALSE POSITIVES AND NEGATIVES IN THE CORONAVIRUS MODEL

In the coronavirus model, a false negative is much worse than a false positive.

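Let's stop and think for a moment. In the coronavirus model, which error sounds worse: a false positive or a false negative? In other words, which is worse: incorrectly diagnosing a healthy patient as sick, or incorrectly diagnosing a sick patient as healthy? Suppose that when we diagnose a patient as healthy, we send them home to rest without treatment, and when we diagnose a patient as sick, we send them for more tests. Misdiagnosing a healthy person may be only a minor nuisance, because it means a healthy person has to take extra tests. Misdiagnosing a sick person, however, means that a sick person goes untreated; their condition may worsen, and they may infect many others. Therefore, in the coronavirus model, a false negative is much worse than a false positive.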

ANALYZING FALSE POSITIVES AND NEGATIVES IN THE SPAM EMAIL MODEL

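Now let's do the same analysis for the spam model. In this case, suppose that if our spam classifier classifies an email as spam, the email is automatically deleted, and if it classifies the email as legitimate, the email is sent to our inbox. Which error sounds worse: a false positive or a false negative? In other words, which is worse: incorrectly classifying a legitimate email as spam and deleting it, or incorrectly classifying a spam email as legitimate and sending it to the inbox? I think we can agree that deleting a good email is much worse than sending a spam email to the inbox. The occasional spam email in our inbox can be annoying, but a deleted legitimate email can be a complete disaster! Imagine how sad we would be if our grandmother sent us a very kind email telling us she baked cookies, and our filter deleted it! Therefore, in the spam model, a false positive is much worse than a false negative.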

Storing the correctly and incorrectly classified points in a table: The confusion matrix

Table 7.2 The confusion matrix of our coronavirus model helps us dig into our model and tell the two types of errors apart. This model makes 10 false negative errors (a sick person diagnosed healthy) and zero false positive errors (a healthy person diagnosed sick). False negatives are the worst type of error in this case, and the model makes too many of them, so this model is not very good.
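
As a rough sketch of how such a table could be tallied, the snippet below counts the four cells from lists of actual and predicted labels. The function name `confusion_counts` is hypothetical, and the number of healthy patients (90) is an assumption made purely for illustration, since the true negative count is not given in this text. A model with 10 false negatives and zero false positives on 10 sick patients is one that diagnoses everyone as healthy.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for boolean labels (True = sick)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p:
            counts["TP" if a else "FP"] += 1
        else:
            counts["FN" if a else "TN"] += 1
    # Counter returns 0 for cells that never occurred.
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# 10 sick patients plus an ASSUMED 90 healthy ones (hypothetical count).
actual = [True] * 10 + [False] * 90
# A model that diagnoses every patient as healthy.
predicted = [False] * 100

print(confusion_counts(actual, predicted))
# {'TP': 0, 'FP': 0, 'FN': 10, 'TN': 90}
```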

The recall of the second model is the number of true positives (eight sick people correctly diagnosed) divided by the total number of positives (10 sick people), which is 8/10 = 0.8, or 80%. In terms of recall, the second model is much better than the first. Let's summarize these calculations for clarity as follows:

Coronavirus Model 1:

True positives (sick patients diagnosed sick and sent for more tests) = 0

False negatives (sick patients diagnosed healthy and sent home) = 10

Recall = 0/10 = 0%

Coronavirus Model 2:

True positives (sick patients diagnosed sick and sent for more tests) = 8

False negatives (sick patients diagnosed healthy and sent home) = 2

Recall = 8/10 = 80%
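
The two calculations above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `recall` and its guard against an empty positive class are choices made here, not part of the original text.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives (sick patients) that the model caught."""
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positives / total_positives


print(recall(true_positives=0, false_negatives=10))  # Model 1: 0.0
print(recall(true_positives=8, false_negatives=2))   # Model 2: 0.8
```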

Models like the coronavirus model, in which false negatives are much more expensive than false positives, are high-recall models.

Precision: Among the examples we classified as positive, how many did we correctly classify?
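
Read as a formula, this question amounts to true positives divided by everything the model labeled positive (true positives plus false positives). A minimal sketch follows; the function name `precision` is hypothetical, and the counts in the usage line are made-up numbers for illustration only, not results from the original text.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of examples labeled positive that really are positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # A model that never predicts "positive" has no defined precision.
        raise ValueError("precision is undefined when nothing is classified as positive")
    return true_positives / predicted_positives


# HYPOTHETICAL counts: 30 correct positive predictions, 10 incorrect ones.
print(precision(true_positives=30, false_positives=10))  # 0.75
```

Note that a model with zero true positives and zero false positives, such as one that labels every example negative, classifies nothing as positive, so its precision is undefined under this definition.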
