AI-014: Study notes on Professor Andrew Ng's machine learning course, part 49

铭记北宸

Article link: https://blog.csdn.net/hanjingjava/article/details/83449003

These are my study notes on Andrew Ng's machine learning course series. Course videos:

https://study.163.com/course/introduction.htm?courseId=1004570029#/courseDetail?tab=1

49. Machine learning system design: prioritizing what to work on: spam classification example

Taking a spam filter as the example, first build a classifier:

Choose high-frequency words as the features.
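
As a minimal sketch of this idea, assuming a tiny hypothetical vocabulary and a simple regex tokenizer (neither comes from the course), an email can be turned into a 0/1 feature vector in which each entry records whether a vocabulary word appears:

```python
import re

# Hypothetical vocabulary; in practice this would be the most frequent
# words found in the training set.
VOCABULARY = ["buy", "deal", "discount", "andrew", "now"]


def email_to_features(text, vocabulary=VOCABULARY):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the email."""
    # Lowercase the text and keep runs of letters as words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in words else 0 for word in vocabulary]


features = email_to_features("Deal of the week! Buy now")
print(features)
```

A real system would use a vocabulary of thousands of words, but the shape of the feature vector stays the same.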

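Ways to reduce the classifier's error rate, for example:

Collect a lot of data
Use sophisticated features extracted from the email routing information (such as the sender and subject), for example an empty subject or @saler.com
Use sophisticated features extracted from the email body, for example words such as "discount" and "sale"
Detect misspelled words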

50. Machine Learning system design: Error analysis

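Methodology:

Error analysis:

Look at how the misclassified examples are distributed across categories. The categories that account for a large share of the errors are where improving the algorithm pays off most. Try new approaches (more data, more features, ...) and then check what the main causes of the remaining error are.

The algorithm should ideally return a quantitative evaluation result, such as an error rate, so that the error rates obtained with different features or methods (for example, with or without stemming) can guide the decision:

If the error rate is lower with stemming, adopt the version that uses stemming.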
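
The comparison can be sketched as follows. The labels and predictions are made-up values standing in for a cross-validation set and for two variants of the classifier; only the decision rule (pick the variant with the lower error rate) reflects the notes.

```python
def error_rate(predictions, labels):
    """Fraction of examples where the prediction differs from the label."""
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / len(labels)


# Hypothetical cross-validation labels and the predictions of two variants.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
without_stemming = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
with_stemming = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]

results = {
    "without stemming": error_rate(without_stemming, labels),
    "with stemming": error_rate(with_stemming, labels),
}

# Keep whichever variant has the lower cross-validation error.
best_variant = min(results, key=results.get)
print(results, best_variant)
```

Having one number per variant makes each such decision quick, which is the point of the single quantitative metric.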

51. Machine learning system design: Error metric for skewed classes

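Skewed classes: one class is far rarer than the other.

Accuracy: the fraction of all examples that are classified correctly.

Precision: of the examples predicted positive, the fraction that are actually positive, TP / (TP + FP).

Recall: of the examples that are actually positive, the fraction that are predicted positive, TP / (TP + FN).

The higher the precision and recall, the better.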

If a classifier is getting high precision and high recall, then we can be confident that the algorithm is doing well, even if the classes are very skewed.

So for the problem of skewed classes, precision and recall give us more direct insight into how the learning algorithm is doing. When the classes are very skewed, this is often a much better way to evaluate a learning algorithm than looking at classification error or classification accuracy.
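
A small sketch of why accuracy misleads here, using made-up data in which only 0.5% of the examples are positive. A "classifier" that always predicts 0 scores very high accuracy while its recall is zero. Treating precision as 0 when nothing is predicted positive is a convention chosen for this sketch, since the ratio is otherwise undefined.

```python
def accuracy(predictions, labels):
    """Fraction of examples classified correctly."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def precision_recall(predictions, labels):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    # Assumption for this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical skewed data set: 5 positives out of 1000 examples.
labels = [1] * 5 + [0] * 995
always_negative = [0] * 1000

acc = accuracy(always_negative, labels)
precision, recall = precision_recall(always_negative, labels)
print(acc, precision, recall)
```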

51. Machine learning system design: Trading off precision and recall

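Threshold: the cutoff on the predicted probability above which an example is classified as positive.

Few examples are flagged, but anything that is flagged is almost certainly positive (a high threshold) -> high precision, low recall. Example: spam filtering, where you do not want to lose legitimate mail.

Many examples are flagged, but a lot of them are false alarms (a low threshold) -> low precision, high recall. Example: cancer prediction, where it pays to stay suspicious :)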
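
The trade-off can be seen by sweeping the threshold over a fixed set of predicted probabilities. The probabilities and labels below are invented for illustration, and the helper names are hypothetical.

```python
def predict(probabilities, threshold):
    """Classify as positive (1) when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(predictions, labels):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    # Assumption for this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier outputs and true labels.
probabilities = [0.95, 0.85, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

by_threshold = {
    t: precision_recall(predict(probabilities, t), labels)
    for t in (0.3, 0.9)
}
for t, (precision, recall) in by_threshold.items():
    print(f"threshold={t}: precision={precision:.2f}, recall={recall:.2f}")
```

With these numbers, the 0.9 threshold flags a single example and gets it right, while the 0.3 threshold catches every positive at the cost of some false alarms.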

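Use the F score (F1 score) to judge with a single number whether a given combination of precision and recall is good: F1 = 2PR / (P + R), where P is precision and R is recall.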
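
A short sketch of the F score, assuming the usual F1 definition 2PR / (P + R). The three precision/recall pairs are illustrative values. They show why a plain average is a poor summary: it rewards a classifier with near-zero precision, whereas F1 does not.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); defined as 0.0 here when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative (precision, recall) pairs for three candidate classifiers.
candidates = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}

f1 = {name: f1_score(p, r) for name, (p, r) in candidates.items()}
average = {name: (p + r) / 2 for name, (p, r) in candidates.items()}

best_by_f1 = max(f1, key=f1.get)
best_by_average = max(average, key=average.get)
print(f1, best_by_f1, best_by_average)
```

Because F1 is low whenever either precision or recall is low, it can also be used to choose the threshold: evaluate several thresholds on the cross-validation set and keep the one with the highest F1.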

52. Machine learning system design: data for machine learning

Under certain conditions, a larger training set improves the algorithm's performance.

In that case, a large training set gives good results, and which algorithm is used matters much less.

Key tests:

First, can a human expert look at the features x and confidently predict the value of y?

Second, can we actually get a large training set and train a learning algorithm with a lot of parameters on it?

If you can do both, you can often get a very good algorithm.
