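Summary: This post is the transcript of Lecture 127, "Anomaly Detection vs. Supervised Learning", from Chapter 16, "Anomaly Detection", of Andrew Ng's *Machine Learning* course. I took these notes while watching the video and edited them to be more concise and easier to read, both for my own future reference and to share with others. If you spot any errors, corrections are very welcome, and thank you sincerely in advance! I hope these notes help with your studies.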
————————————————
If we have labeled data, so that we know which examples are anomalous and which are non-anomalous, when should we use supervised learning (logistic regression, a neural network, ...) to learn directly from the labeled data to predict whether y=1 or y=0? And when should we use an anomaly detection algorithm? The following are some guidelines.
- Choose anomaly detection when:
  - You have a very small number of positive examples (y=1; 0 to 20, maybe up to 50, is pretty typical).
    - It can be difficult for an algorithm to learn from such a small set of positive examples what the anomalies look like.
    - We save the positive examples just for the cross-validation and test sets.
    - We use the large number of negative (non-anomalous, normal) examples to fit the parameters of the Gaussian model.
  - There are many different types of anomalies, and future anomalies may look nothing like the ones you have seen so far.
    - It is more promising to model just the negative examples with some kind of Gaussian model than to try to model the positive examples, because tomorrow's anomalies may be nothing like the ones you have seen so far.
- Choose supervised learning when:
  - You have a reasonably large number of both positive and negative examples.
    - There are enough positive examples for an algorithm to get a sense of what the positive examples look like.
    - Future positive examples are likely to be similar to the ones in the training set.
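The anomaly detection workflow in the guidelines above (fit the Gaussian model on normal examples only, and keep the few positive examples for the cross-validation and test sets) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the course's reference code: it assumes NumPy, a synthetic 2-D dataset, a density that treats each feature as an independent Gaussian, and choosing the threshold ε by maximizing F1 on the cross-validation set. The function names `fit_gaussian`, `density`, `f1` and `select_epsilon` are hypothetical.

```python
import numpy as np


def fit_gaussian(X):
    """Per-feature mean and variance, estimated from normal (y=0) examples only."""
    return X.mean(axis=0), X.var(axis=0)  # var divides by m (maximum likelihood)


def density(X, mu, var):
    """p(x) modelled as a product of independent 1-D Gaussians, one per feature."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-((X - mu) ** 2) / (2.0 * var)), axis=1)


def f1(pred, y):
    """F1 score of binary predictions against labels (1 = anomaly)."""
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_epsilon(p_cv, y_cv, steps=1000):
    """Sweep candidate thresholds; keep the epsilon with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, -1.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), steps):
        score = f1((p_cv < eps).astype(int), y_cv)  # flag anomaly when p(x) < eps
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Hypothetical synthetic data: many normal examples, only 20 anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 2))
anomalous = rng.normal(6.0, 1.0, size=(20, 2))

# Training set: normal examples only. The anomalies are split between CV and test.
X_train = normal[:600]
X_cv = np.vstack([normal[600:800], anomalous[:10]])
y_cv = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])
X_test = np.vstack([normal[800:], anomalous[10:]])
y_test = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

mu, var = fit_gaussian(X_train)
eps, cv_f1 = select_epsilon(density(X_cv, mu, var), y_cv)
test_f1 = f1((density(X_test, mu, var) < eps).astype(int), y_test)
print(f"epsilon={eps:.2e}  CV F1={cv_f1:.3f}  test F1={test_f1:.3f}")
```

No anomalous example is used to fit μ and σ²; the anomalies only influence the choice of ε and the final evaluation, which is why a handful of positive examples is enough for this approach.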
Which approach suits an application is not fixed. Fraud detection and manufacturing are typical anomaly detection applications, but either can move toward supervised learning:
- If you are a very major online retailer and a lot of people have actually tried to commit fraud on your website, you may have many examples with y=1. In that case, fraud detection could actually shift over to supervised learning.
- Similarly, for some manufacturing processes, if you manufacture very large volumes and have seen a lot of bad (defective) examples, manufacturing could shift to supervised learning as well.
<end>