AI-014: Study notes on Professor Andrew Ng's machine learning course, part 49

These are my study notes on Andrew Ng's machine learning course series. Lecture videos:

https://study.163.com/course/introduction.htm?courseId=1004570029#/courseDetail?tab=1

49. Machine learning system design: prioritizing what to work on: spam classification example

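Take building a spam filter as an example. First, build a classifier:

Choose frequently occurring words as features: x_j = 1 if word j appears in the email and 0 otherwise.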
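A minimal sketch of this word-feature representation, assuming a simple lowercase tokenizer and a tiny made-up training set (in practice the vocabulary would be the n most frequent words in a real training set). The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(emails, n):
    """Return the n most frequently occurring words in the training emails."""
    counts = Counter()
    for text in emails:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def email_features(text, vocabulary):
    """x_j = 1 if vocabulary word j appears in the email, else 0."""
    words = set(tokenize(text))
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical toy training emails.
train = [
    "Buy now! Huge discount on watches",
    "Discount deal: buy one, get one free",
    "Meeting notes for Monday",
]
vocab = build_vocabulary(train, n=5)
x = email_features("Big discount, buy today", vocab)
print(vocab)
print(x)
```

Words outside the vocabulary ("big", "today") are simply ignored, so every email maps to a vector of the same length.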

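Ways to reduce the classifier's error rate, for example:

  • Collect lots of data
  • Develop sophisticated features from email routing information (e.g. sender, subject), such as an empty subject line or a sender address like @saler.com
  • Develop sophisticated features from the message body, such as words like "discount" or "promotion"
  • Detect misspellings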
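The routing, content and misspelling ideas above could be sketched as a few hand-crafted features appended to the word features. This is only an illustration: the `Email` fields, the suspicious-domain set, the promotional word list and the letters-mixed-with-digits misspelling check are all hypothetical choices, not something the course prescribes.

```python
import re
from dataclasses import dataclass

# Hypothetical lists, for illustration only.
SUSPICIOUS_DOMAINS = {"saler.com"}
PROMO_WORDS = {"discount", "sale", "promotion", "free"}


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def has_obfuscated_word(text):
    """Crude misspelling check: a word mixing letters and digits, e.g. 'm0rtgage'."""
    return any(
        re.search(r"[a-z]", word) and re.search(r"[0-9]", word)
        for word in re.findall(r"[a-z0-9]+", text.lower())
    )


def handcrafted_features(email):
    domain = email.sender.rsplit("@", 1)[-1].lower()
    body_words = re.findall(r"[a-z]+", email.body.lower())
    return [
        1 if not email.subject.strip() else 0,           # empty subject line
        1 if domain in SUSPICIOUS_DOMAINS else 0,        # sender domain on a blocklist
        sum(1 for w in body_words if w in PROMO_WORDS),  # count of promotional words
        1 if has_obfuscated_word(email.body) else 0,     # misspelling like 'm0rtgage'
    ]


spam = Email("deals@saler.com", "", "Huge discount on m0rtgage rates, free quote")
ham = Email("alice@example.org", "Lunch?", "Are you free for lunch on Friday?")
print(handcrafted_features(spam))
print(handcrafted_features(ham))
```

The misspelling check is deliberately crude: it would also flag harmless tokens such as "10am", which is exactly the kind of trade-off error analysis helps evaluate.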

50. Machine Learning system design: Error analysis

Methodology:

 

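Error analysis:

Look at how the misclassified examples are distributed across categories; the categories that account for the largest share are where improving the algorithm pays off most. Try new approaches (more data, more features, ...) and check what the main sources of error are.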
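A minimal sketch of the tallying step, assuming each misclassified cross-validation email has already been labeled by hand with a category; the category names and counts below are made up.

```python
from collections import Counter

# Hypothetical manual labels, one per misclassified cross-validation email.
error_categories = [
    "phishing", "pharma", "phishing", "replica", "phishing",
    "pharma", "phishing", "other", "phishing", "pharma",
]

counts = Counter(error_categories)
total = len(error_categories)
for category, count in counts.most_common():
    print(f"{category:10s} {count:3d}  ({count / total:.0%})")
```

With these made-up labels, "phishing" accounts for half of the errors, so features aimed at that category would be the first thing to try.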

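Ideally, the algorithm should report a single quantitative evaluation, such as the error rate, so that you can decide which change is better by comparing the error rates obtained with different features or methods (for example, with and without stemming):

If the version with stemming has a lower error rate (measured on the cross-validation set), adopt stemming.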
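To make the comparison concrete, here is a toy sketch that scores two variants of the same keyword-based spam rule on a small, made-up cross-validation set and keeps the one with the lower error. The suffix-stripping `crude_stem` is a hypothetical stand-in for a real stemmer (such as the Porter stemmer), and the keyword rule stands in for a trained classifier.

```python
import re

SPAM_KEYWORDS = {"discount", "offer", "winner"}  # hypothetical keyword list


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def predict(text, use_stemming):
    words = re.findall(r"[a-z]+", text.lower())
    if use_stemming:
        words = [crude_stem(w) for w in words]
    return 1 if any(w in SPAM_KEYWORDS for w in words) else 0


def error_rate(examples, use_stemming):
    wrong = sum(predict(text, use_stemming) != y for text, y in examples)
    return wrong / len(examples)


# Hypothetical cross-validation set: (email text, label), 1 = spam.
cv_set = [
    ("Huge discounts today", 1),
    ("Exclusive offers inside", 1),
    ("You are a winner", 1),
    ("Notes from the meeting", 0),
    ("Lunch on Friday?", 0),
]

errors = {flag: error_rate(cv_set, flag) for flag in (False, True)}
use_stemming = errors[True] < errors[False]
print(errors, "-> use stemming" if use_stemming else "-> no stemming")
```

The point is the workflow, not the stemmer: one number per variant makes the decision mechanical.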

51. Machine learning system design: Error metric for skewed classes

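Skewed classes: one class is far rarer than the other (for example, only 0.5% of patients in a data set have cancer).

Accuracy: the fraction of all examples that are classified correctly.

Precision: of all examples predicted as y = 1, the fraction that actually have y = 1, i.e. TP / (TP + FP), where TP and FP count true and false positives.

Recall: of all examples that actually have y = 1, the fraction predicted as y = 1, i.e. TP / (TP + FN), where FN counts false negatives.

The higher both precision and recall, the better.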

If a classifier achieves both high precision and high recall, we can be confident that it is doing well, even when the classes are very skewed.

So for skewed classes, precision and recall give much more direct insight into how the learning algorithm is doing, and they are often a much better way to evaluate it than classification error or classification accuracy.
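A small sketch of why accuracy misleads on skewed classes: in a made-up set where only 5 of 1,000 examples are positive, a "classifier" that always predicts y = 0 scores 99.5% accuracy yet has recall 0. Precision is reported as 0 when nothing is predicted positive, which is one common convention (the ratio is otherwise undefined).

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Convention: report 0 when the denominator is 0 (ratio undefined).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1] * 5 + [0] * 995          # only 0.5% positives
always_zero = [0] * 1000              # predicts y = 0 for every example
# Hypothetical real classifier: catches 4 of the 5 positives, 2 false alarms.
classifier = [1, 1, 1, 1, 0] + [1, 1] + [0] * 993

print(metrics(y_true, always_zero))
print(metrics(y_true, classifier))
```

The two accuracies (99.5% vs 99.7%) barely differ, while recall (0 vs 0.8) exposes the useless baseline immediately.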

51. Machine learning system design: Trading off precision and recall

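Threshold: predict y = 1 when h_θ(x) ≥ threshold and y = 0 otherwise; the default threshold is 0.5.

Raise the threshold: few examples are flagged, but those that are flagged can be trusted -> high precision, low recall. Spam filtering is a case for this: you don't want legitimate mail to be filtered out.

Lower the threshold: many examples are flagged, but many of the flagged ones are false positives -> low precision, high recall. Cancer prediction is a case for this: better to err on the side of suspicion :)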
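The trade-off can be seen by sweeping the threshold over a classifier's outputs h(x). The scores and labels below are made up; on this data raising the threshold flags fewer examples, and precision rises while recall falls (in general precision tends to rise rather than being guaranteed to).

```python
def precision_recall(scores, labels, threshold):
    """Predict y = 1 when score >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier outputs h(x) and true labels (1 = positive).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

for threshold in (0.3, 0.5, 0.9):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```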

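Use the F1 score, F1 = 2PR / (P + R), to judge whether a given precision/recall pair is good enough. Unlike the plain average (P + R) / 2, it is high only when both P and R are reasonably high.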
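A sketch of using F1 to pick the threshold automatically: try a range of thresholds and keep the one with the highest F1 on the cross-validation set. The scores, labels and candidate thresholds are made up. The last line shows why the plain average is a poor summary: with P = 0.01 and R = 1.0 (roughly what "predict y = 1 for everything" gives on skewed data) the average is about 0.5, but F1 is about 0.02.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); defined as 0 when P + R = 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(scores, labels, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical cross-validation outputs h(x) and true labels.
cv_scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
cv_labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best = max(candidates, key=lambda t: f1_score(*precision_recall(cv_scores, cv_labels, t)))
print("best threshold:", best)

# Predicting y = 1 for everything on skewed data: the average looks fine, F1 does not.
print((0.01 + 1.0) / 2, f1_score(0.01, 1.0))
```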

52. Machine learning system design: data for machine learning

When the features carry enough information to predict y accurately, increasing the size of the training set improves the algorithm's performance.

In this case, a large training set gives good results, and the choice of algorithm matters much less.

 

Key tests:

First, can a human expert look at the features x and confidently predict the value of y? (This checks that x carries enough information about y.)

Second, can we actually get a large training set and train a learning algorithm with many parameters on it? (Many parameters keep bias low; a large training set keeps variance low.)

If you can do both, you can often get a very good learning algorithm.
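A toy illustration of the large-data rationale, under made-up assumptions: the single feature x fully determines y (y = 1 exactly when x > 0.37, so the first test passes), and the model is 1-nearest-neighbour, a low-bias model whose capacity grows with the training set. It is only a sketch, not a reproduction of any experiment from the course; the boundary value and the evenly spaced data are hypothetical.

```python
BOUNDARY = 0.37  # hypothetical: y = 1 exactly when x > BOUNDARY


def label(x):
    return 1 if x > BOUNDARY else 0


def evenly_spaced(n):
    """n points spread evenly over [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def nearest_neighbour_predict(train, x):
    """1-NN: copy the label of the closest training point."""
    _, y = min(train, key=lambda pair: abs(pair[0] - x))
    return y


def cv_error(train_size, cv_points):
    train = [(x, label(x)) for x in evenly_spaced(train_size)]
    wrong = sum(nearest_neighbour_predict(train, x) != label(x) for x in cv_points)
    return wrong / len(cv_points)


cv_points = evenly_spaced(1000)
for m in (2, 10, 100):
    print(f"training set size {m:4d}: CV error {cv_error(m, cv_points):.3f}")
```

Here the error comes only from the gap between the true boundary and the midpoint of the nearest training points, so more data shrinks it without changing the algorithm.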
