I've gone through these exercises several times and still never got them all right, so I'm writing down my understanding of this chapter's quiz questions to help them sink in.
1.
You are working on a spam classification system using regularized logistic regression. "Spam" is a positive class (y = 1) and "not spam" is the negative class (y = 0). You have trained your classifier and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is:
|                    | Actual Class: 1 | Actual Class: 0 |
|--------------------|-----------------|-----------------|
| Predicted Class: 1 | 85              | 890             |
| Predicted Class: 0 | 15              | 10              |
For reference:
- Accuracy = (true positives + true negatives) / (total examples)
- Precision = (true positives) / (true positives + false positives)
- Recall = (true positives) / (true positives + false negatives)
- F1 score = (2 * precision * recall) / (precision + recall)
What is the classifier's recall (as a value from 0 to 1)?
Here we need to compute the classifier's recall.
By the formula: recall = True Positives / (True Positives + False Negatives) = 85 / (85 + 15) = 0.85.
For comparison, precision = True Positives / (True Positives + False Positives) = 85 / (85 + 890) ≈ 0.09.
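The arithmetic above can be checked in a few lines of Python. This is just a sketch; the variable names are mine, and the four counts are read off the confusion matrix from question 1:

```python
# Counts from the confusion matrix in question 1.
tp, fp = 85, 890   # predicted 1: actual class 1, actual class 0
fn, tn = 15, 10    # predicted 0: actual class 1, actual class 0

total = tp + fp + fn + tn          # m = 1000 examples
accuracy = (tp + tn) / total       # (TP + TN) / total
precision = tp / (tp + fp)         # TP / (TP + FP)
recall = tp / (tp + fn)            # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note how the very low precision drags the F1 score down even though recall is high: a classifier that flags almost everything as spam catches most spam but is nearly useless.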
2.
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.
Which are the two?
- We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
- We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).