Chapter 1. Classification -- 07. ROC Curves (Translation)

Well let’s talk about ROC curves.

I say the word ROC probably at least once every day.

So ROC curves started during World War 2 for analyzing radar signals.

And the question that they answer is: for a particular false positive rate,

what is the corresponding true positive rate?

The false positive rate is the number of negatives that are classified by the algorithm as positive, divided by the total number of negatives.

And the true positive rate is the number of positives that were classified by the machine learning algorithm as positive, divided by the total number of positives.

And so, for each false positive rate, we want to have a corresponding true positive rate.
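To make those two definitions concrete, here is a minimal sketch in Python, assuming the four confusion-matrix counts are already in hand (the function and argument names are just illustrative):

    # True positive rate: positives classified as positive,
    # out of the total number of positives (tp + fn).
    def true_positive_rate(tp, fn):
        return tp / (tp + fn)

    # False positive rate: negatives classified as positive,
    # out of the total number of negatives (fp + tn).
    def false_positive_rate(fp, tn):
        return fp / (fp + tn)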

So let’s say that our classifier has a decision boundary which is right here.

And the positives are over on this side and the negatives are over here.

Then the true positive rate uses the number of positives classified as positive,

so one, two, three, four, five, six, seven, and then divided by the total number of positives,

which is the seven on this side plus eight, nine, ten, eleven – so seven out of eleven.

And the false positive rate uses the number of negatives classified as positive –

so one, two, three, divided by the total number of negatives – four, five, six, seven, eight, nine, ten, eleven, twelve – so three out of twelve.
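Plugging the counts from this picture into the sketch above is a quick sanity check:

    # Seven of the eleven positives and three of the twelve negatives
    # land on the positive side of the boundary.
    tpr = true_positive_rate(tp=7, fn=4)    # 7 / 11, about 0.64
    fpr = false_positive_rate(fp=3, tn=9)   # 3 / 12 = 0.25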

So what if I chose the decision boundary to be a little bit different?

So what if I had chosen f(x) = 3 to be my decision boundary?

Then my boundary might be over here, like that.

Then what happens to the true positive rate and the false positive rate?

Well, now we only have three out of our eleven positives classified correctly,

and we only have two of the twelve negatives that are classified as positives.

And let’s put the decision boundary over here. And again,

you can get values for the true positive rate and false positive rate. Ok, this game is kind of fun.

So now, what if you sweep this thing down from top to bottom?

Just take the thing and just go like that and just sweep it all the way down,

and every time you move this line, you record the true positive rate and the false positive rate.

And then you plot all of the true positive rates and false positive rates on a scatter plot;

and that is an ROC curve. So as you swing that line across, as you sweep the thing down,

the true positive rate grows because we have more positives that are classified as positives,

but also the number of false positives grow too,

because now we have more negatives being classified as positives.

So the curve ends up looking something like that.
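Here is a minimal sketch of that sweep, assuming the classifier assigns a score f(x) to each example, with higher scores meaning more positive, so that sweeping the line down corresponds to lowering a threshold on the score (the names are illustrative):

    import numpy as np

    def roc_points(scores, labels):
        # Record (fpr, tpr) at every threshold as the boundary sweeps down.
        scores = np.asarray(scores)
        labels = np.asarray(labels)
        num_pos = np.sum(labels == 1)
        num_neg = np.sum(labels == 0)
        points = [(0.0, 0.0)]
        # Sweeping the threshold from the highest score down to the lowest
        # is the same as sweeping the decision boundary from top to bottom.
        for threshold in sorted(set(scores), reverse=True):
            predicted_positive = scores >= threshold
            tp = np.sum(predicted_positive & (labels == 1))
            fp = np.sum(predicted_positive & (labels == 0))
            points.append((fp / num_neg, tp / num_pos))
        return points

Plotting those pairs with the false positive rate on the horizontal axis and the true positive rate on the vertical axis gives the ROC curve.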

Now, the quality measure people use is the area under the curve – the AUC,

or the area under the ROC curve – the AUROC –

so that is the quality measure that we often use for predictive models.
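That area can be computed directly from the points of the sweep, for example with the trapezoidal rule; the sketch below reuses the roc_points function from above, and in practice a library call such as scikit-learn's roc_auc_score does the same job:

    def area_under_curve(points):
        # Trapezoidal rule over the (fpr, tpr) points, ordered by fpr.
        pts = sorted(points)
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            area += (x1 - x0) * (y0 + y1) / 2.0
        return area

    # Toy example: a perfect ranking of two positives above two negatives gives 1.0.
    auc = area_under_curve(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))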

So that's why I say it every day. We're always evaluating the quality of our methods.

Now, that diagonal line is random guessing.

So if you’re not better than that line, you might want to turn your classifier upside down,

so that it predicts the opposite of what it’s currently doing,

because it’s worse than random guessing;

which means it could be better if you flip it.
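In terms of scores, flipping the classifier is just negating the score, which reflects the ROC curve across the diagonal; a hypothetical check, reusing the sketches above with the same scores and labels:

    # If the area under the curve is below 0.5, predicting the opposite is better.
    if area_under_curve(roc_points(scores, labels)) < 0.5:
        scores = -np.asarray(scores)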

So hopefully you get a curve like I showed you on the previous slide.

Most of the time, you don’t get perfect; perfect would be just straight up and then over.

Most of the time you won't get that. Most prediction problems are not that easy. So you'll get

something actually kind of similar to what I showed you on the previous slide.

Alright, so just to summarize,

we talked about a lot of evaluation measures for machine learning models.

You can summarize what the model does by looking at the confusion matrix,

looking at the accuracy or the misclassification error,

or you can look at precision, recall, and the F1 score if you're doing information retrieval,

and then you can look at ROC curves and the area under the ROC.

Ok, so these are all of the measures that we enumerated in the data science course, and more.
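For reference, all of those measures are available as one-liners in a library such as scikit-learn; a minimal sketch on a toy example, where y_true holds the true labels, y_pred the thresholded predictions, and y_score the classifier scores:

    from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                                 recall_score, f1_score, roc_curve, roc_auc_score)

    y_true = [1, 1, 0, 0]           # true labels (toy example)
    y_score = [0.9, 0.4, 0.6, 0.1]  # classifier scores
    y_pred = [1, 0, 1, 0]           # thresholded predictions

    cm = confusion_matrix(y_true, y_pred)              # the confusion matrix
    accuracy = accuracy_score(y_true, y_pred)          # 1 - misclassification error
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
    auroc = roc_auc_score(y_true, y_score)             # area under the ROC curve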
