I've gone through these exercises several times and still never got them all right, so I'm writing down my understanding of this chapter's quiz questions to help them sink in.
1.
You are working on a spam classification system using regularized logistic regression. "Spam" is a positive class (y = 1) and "not spam" is the negative class (y = 0). You have trained your classifier and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is:
|                    | Actual Class: 1 | Actual Class: 0 |
|--------------------|-----------------|-----------------|
| Predicted Class: 1 | 85              | 890             |
| Predicted Class: 0 | 15              | 10              |
For reference:
- Accuracy = (true positives + true negatives) / (total examples)
- Precision = (true positives) / (true positives + false positives)
- Recall = (true positives) / (true positives + false negatives)
- F1 score = (2 * precision * recall) / (precision + recall)
What is the classifier's recall (as a value from 0 to 1)?
Here we need to compute the classifier's recall.
By the formula: recall = True Positives / (True Positives + False Negatives) = 85 / (85 + 15) = 0.85.
For comparison, precision = True Positives / (True Positives + False Positives) = 85 / (85 + 890) ≈ 0.09.
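The arithmetic above can be checked in a few lines of Python. This is just a sketch; the variable names are mine, and the four counts are read off the confusion matrix from question 1:

```python
# Counts from the confusion matrix in question 1.
tp, fp = 85, 890   # predicted 1: actual class 1, actual class 0
fn, tn = 15, 10    # predicted 0: actual class 1, actual class 0

total = tp + fp + fn + tn          # m = 1000 examples
accuracy = (tp + tn) / total       # (TP + TN) / total
precision = tp / (tp + fp)         # TP / (TP + FP)
recall = tp / (tp + fn)            # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note how the very low precision drags the F1 score down even though recall is high: a classifier that flags almost everything as spam catches most spam but is nearly useless.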
2.
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.
Which are the two?
- We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
- We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).