Stanford ML - Lecture 7 - Machine learning system design

1. Prioritizing what to work on: Spam classification example

  1. Collect lots of data
  2. Develop sophisticated features based on email routing information
  3. Develop sophisticated features for the message body
  4. Develop a sophisticated algorithm to detect misspellings

2. Error analysis

  • recommended approach
    • Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
    • Plot learning curves to decide if more data, more features, etc. are likely to help.
    • Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you can spot any systematic trend in the types of examples it gets wrong.

3. Error metrics for skewed classes

  • Precision/Recall



4. Trading off precision and recall

  • how to compare precision/recall numbers
    • F_1 score (F score)
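
A minimal sketch of the F_1 score, assuming the usual definition as the harmonic mean of precision P and recall R, F_1 = 2PR / (P + R). The function name `f1_score` and the choice to return 0.0 when both inputs are zero are illustrative assumptions, not something the lecture notes specify.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the F_1 score)."""
    if precision + recall == 0:
        # Assumption: define F_1 as 0 when both precision and recall are 0,
        # which avoids a division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A balanced classifier scores higher than one that trades away
# almost all precision for perfect recall.
balanced = f1_score(0.5, 0.4)
lopsided = f1_score(0.02, 1.0)
print(round(balanced, 3), round(lopsided, 3))
```

Unlike the plain average (P + R) / 2, the harmonic mean stays low whenever either number is low, so a classifier that predicts y=1 for everything does not rank well.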

5. Data for machine learning

  • It's not who has the best algorithm that wins. It's who has the most data.

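What are skewed classes? In a classification problem with only two outcomes, y=0 and y=1, where one class has a great many examples and the other has very few, the classes are said to be skewed.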
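
A small sketch of why skewed classes make plain accuracy misleading. The data here is synthetic and purely illustrative (10 positive examples out of 1000 is an assumed ratio): a "classifier" that always predicts y=0 still reaches 99% accuracy while finding none of the positive examples.

```python
# Synthetic, illustrative labels: 10 positives (y=1) and 990 negatives (y=0).
labels = [1] * 10 + [0] * 990

# A trivial classifier that ignores its input and always predicts y=0.
predictions = [0] * len(labels)

# Fraction of examples where the prediction matches the label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Number of actual positives that the classifier managed to find.
positives_found = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)

print(accuracy, positives_found)
```

This is the motivation for looking at precision and recall instead of accuracy alone.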

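Consider a binary classification problem, in which each instance is labeled positive or negative. Four outcomes are possible. If an instance is positive and is predicted positive, it is a true positive (TP); if an instance is negative but is predicted positive, it is a false positive (FP). Correspondingly, if an instance is negative and is predicted negative, it is a true negative (TN); a positive instance predicted negative is a false negative (FN).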

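  • TP: the number of correct positive predictions;
  • FN: misses, the number of actual matches that were not found;
  • FP: false alarms, the number of reported matches that are incorrect;
  • TN: the number of non-matches that were correctly rejected.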
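
The four counts above can be tallied directly from labels and predictions, and precision and recall follow from them using the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below is illustrative only; the function names, the toy data, and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels, where 1 means positive."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive correctly predicted positive
        elif actual == 0 and predicted == 1:
            fp += 1  # negative wrongly predicted positive (false alarm)
        elif actual == 0 and predicted == 0:
            tn += 1  # negative correctly predicted negative
        else:
            fn += 1  # positive wrongly predicted negative (miss)
    return tp, fp, tn, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, tn, fn, precision, recall)
```

Precision answers "of everything predicted positive, how much really was positive?", while recall answers "of everything that really was positive, how much was found?".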

From: http://blog.csdn.net/abcjennifer/article/details/7834256
