Stanford ML - Lecture 7 - Machine learning system design


Quebradawill


Article link: https://blog.csdn.net/qiudw/article/details/8684788


1. Prioritizing what to work on: Spam classification example

Candidate ways to spend your time to lower the spam classifier's error:
- Collect lots of data.
- Develop sophisticated features based on email routing information.
- Develop sophisticated features for the message body.
- Develop a sophisticated algorithm to detect misspellings.

2. Error analysis

Recommended approach:
- Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
- Plot learning curves to decide whether more data, more features, etc. are likely to help.
- Error analysis: manually examine the examples (in the cross-validation set) that your algorithm misclassified. See whether you can spot any systematic trend in the types of examples it gets wrong.
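A minimal sketch of the error-analysis step, assuming you have already hand-labeled each misclassified cross-validation email with a category. The category names, the `misclassified` list, and the helper `tally_errors` are hypothetical placeholders, not data or code from the lecture.

```python
from collections import Counter

# Hypothetical hand-assigned categories for misclassified cross-validation emails.
misclassified = [
    {"id": 3, "category": "phishing"},
    {"id": 8, "category": "pharma"},
    {"id": 15, "category": "phishing"},
    {"id": 21, "category": "replica"},
    {"id": 42, "category": "phishing"},
]

def tally_errors(examples):
    """Count misclassified examples per category, most frequent first."""
    return Counter(ex["category"] for ex in examples).most_common()

for category, count in tally_errors(misclassified):
    print(f"{category}: {count}")
```

The largest bucket is a hint about where new features are most likely to pay off; here, hypothetically, cues specific to phishing emails.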

3. Error metrics for skewed classes

Precision/Recall

$\textrm{Precision} = \frac{\textrm{no. of true positives}}{\textrm{no. of predicted positives}} = \frac{\textrm{true positives}}{\textrm{true positives} + \textrm{false positives}}$

$\textrm{Recall} = \frac{\textrm{no. of true positives}}{\textrm{no. of actual positives}} = \frac{\textrm{true positives}}{\textrm{true positives} + \textrm{false negatives}}$
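The two formulas can be computed directly from true and predicted labels. This is a sketch assuming binary labels with $y = 1$ as the positive (rare) class; the helper name `precision_recall` and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, since the formulas above leave that case undefined.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, treating y = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(precision_recall(y_true, y_pred))  # → (0.6, 0.75)
```

Here TP = 3, FP = 2 and FN = 1, so precision is 3/5 and recall is 3/4.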

4. Trading off precision and recall

How to compare precision/recall numbers:
- $F_1$ score (F score), where $P$ is precision and $R$ is recall:

$F_1 = 2 \frac{PR}{P+R}$
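The sketch below first compares the simple average $(P+R)/2$ with $F_1$ on three illustrative precision/recall pairs, then picks a classification threshold by maximizing $F_1$ on a cross-validation set. It assumes the classifier outputs a score $h_\theta(x)$ and predicts $y = 1$ when the score is at least the threshold; raising the threshold generally trades recall for precision. The pairs, scores, labels, and helper names (`f1`, `best_threshold`) are hypothetical.

```python
def f1(p, r):
    """F1 = 2PR / (P + R); defined here as 0 when P + R == 0 (an assumption)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

# Illustrative (precision, recall) pairs for three hypothetical algorithms.
algorithms = {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}
for name, (p, r) in algorithms.items():
    print(f"{name}: average={(p + r) / 2:.3f}  F1={f1(p, r):.3f}")

def best_threshold(scores, y_true, thresholds):
    """Return (threshold, F1) with the highest F1 on a cross-validation set."""
    best = (None, -1.0)
    for t in thresholds:
        # Predict y = 1 when the score h(x) is at least the threshold.
        y_pred = [1 if s >= t else 0 for s in scores]
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for y, p in pairs if y == 1 and p == 1)
        fp = sum(1 for y, p in pairs if y == 0 and p == 1)
        fn = sum(1 for y, p in pairs if y == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f1(precision, recall)
        if score > best[1]:
            best = (t, score)
    return best

# Hypothetical classifier scores h(x) and true labels on a small CV set.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
print(best_threshold(scores, y_true, [0.3, 0.5, 0.7, 0.9]))  # → (0.5, 0.75)
```

On these pairs the simple average ranks C highest (0.51) even though C's precision is only 0.02, which is what a classifier that always predicts $y = 1$ looks like. $F_1$ ranks A highest (about 0.444) because it is low whenever either $P$ or $R$ is low.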

5. Data for machine learning

It's not who has the best algorithm that wins. It's who has the most data.

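What are skewed classes? In a classification problem with only two outcomes, y = 0 and y = 1, where one class has a very large number of examples and the other has very few, the classes are called skewed classes.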

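Consider a binary classification problem, i.e., classifying each instance as positive or negative. In a binary problem, four outcomes are possible. If an instance is positive and is predicted positive, it is a true positive; if an instance is negative but is predicted positive, it is a false positive. Correspondingly, if an instance is negative and is predicted negative, it is a true negative, and if a positive instance is predicted negative, it is a false negative.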

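- TP: the number of correct positive predictions;
- FN: misses, the number of actual matches that were not found;
- FP: false alarms, predicted matches that are incorrect;
- TN: the number of non-matches correctly rejected.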
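To see why these four counts matter for skewed classes, the sketch below tallies TP, FP, TN, and FN for a hypothetical test set in which only 5 of 1,000 examples are positive, and scores a trivial classifier that always predicts y = 0. The data and the helper name `confusion_counts` are synthetic, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return TP, FP, TN, FN counts for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        elif t == 0 and p == 0:
            counts["TN"] += 1
        else:  # assumes labels are 0/1, so this is t == 1 and p == 0
            counts["FN"] += 1
    return counts

# Synthetic skewed set: 5 positives among 1,000 examples (0.5%).
y_true = [1] * 5 + [0] * 995
y_pred = [0] * len(y_true)  # trivial classifier: always predict y = 0

c = confusion_counts(y_true, y_pred)
accuracy = (c["TP"] + c["TN"]) / len(y_true)
recall = c["TP"] / (c["TP"] + c["FN"])
print(c, accuracy, recall)
```

Accuracy comes out at 99.5%, yet TP = 0 and recall is 0: the classifier never finds a single positive, which accuracy hides and recall exposes.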

From: http://blog.csdn.net/abcjennifer/article/details/7834256
