Anomaly detection - Anomaly detection vs. supervised learning

This article discusses when to choose an anomaly detection algorithm and when to choose a supervised learning method, given that labeled data is available. Anomaly detection is recommended when positive examples are very scarce or when future anomalies may differ significantly from past examples. Supervised learning, such as logistic regression or a neural network, should be used when there are large numbers of both positive and negative examples and future anomalies are likely to resemble those in the training set. The article lists applications of anomaly detection and supervised learning in scenarios such as fraud detection and manufacturing.

Abstract: This article is the transcript of lecture 127, "Anomaly detection vs. supervised learning", in Chapter 16 "Anomaly Detection" of Andrew Ng's Machine Learning course. I recorded it while watching the videos and revised it to make it more concise and easier to read, for later reference. I am sharing it here; if there are any errors, corrections are welcome and sincerely appreciated. I also hope it helps others with their studies.

————————————————

If we have labeled data, so we know which examples are anomalous and which are non-anomalous, when should we use supervised learning (logistic regression, a neural network, ...) to learn directly from the labeled data to predict whether y=1 or y=0, and when should we use an anomaly detection algorithm? The following are the guidelines.

  • Choose anomaly detection when:
    • You have a very small number of positive examples (y=1; 0-20, maybe up to 50, is pretty typical).
      • It can be difficult for an algorithm to learn from such a small set of positive examples what the anomalies look like.
      • We save the positive examples just for the cross-validation and test sets.
      • We use the large number of negative (non-anomalous, normal) examples to fit the Gaussian parameters of the model p(x) = p(x_{1}; \mu_{1}, \sigma_{1}^{2}) \cdots p(x_{n}; \mu_{n}, \sigma_{n}^{2}) (see the sketch after this list).
    • There are many different types of anomalies, and future anomalies may look nothing like the ones you've seen so far.
      • It is more promising to just model the negative examples with a Gaussian model p(x) rather than to model the positive examples, because tomorrow's anomaly may be nothing like the ones you've seen so far.
  • Choose supervised learning when:
    • You have a reasonably large number of both positive and negative examples.
      • There are enough positive examples for an algorithm to get a sense of what positive examples look like.
    • Future positive examples are likely to be similar to the ones in the training set.
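A minimal NumPy sketch of this workflow, assuming the data is already split as described: X_train holds only normal (y=0) examples for fitting the per-feature Gaussians, while the scarce positive examples are reserved for the cross-validation set (X_cv, y_cv) used to pick the threshold epsilon. The array names and the synthetic data are illustrative, not from the lecture.

```python
import numpy as np

def fit_gaussian(X_train):
    """Estimate per-feature mean and variance from normal (y=0) examples only."""
    return X_train.mean(axis=0), X_train.var(axis=0)

def p(X, mu, sigma2):
    """p(x) = product over features of univariate Gaussians p(x_j; mu_j, sigma_j^2)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    dens = coef * np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return dens.prod(axis=1)

def select_epsilon(p_cv, y_cv):
    """Pick the threshold epsilon that maximizes F1 on the cross-validation set,
    where the scarce positive (y=1) examples have been reserved."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = (p_cv < eps).astype(int)   # flag as anomaly when p(x) < epsilon
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Illustrative data: 500 normal examples for training, 50 normal + 10 anomalous for CV.
rng = np.random.default_rng(0)
X_train = rng.normal([5.0, 10.0], [1.0, 2.0], (500, 2))            # normal examples only
X_cv = np.vstack([rng.normal([5.0, 10.0], [1.0, 2.0], (50, 2)),
                  rng.normal([0.0, 25.0], [1.0, 2.0], (10, 2))])    # last 10 are anomalies
y_cv = np.array([0] * 50 + [1] * 10)

mu, sigma2 = fit_gaussian(X_train)
eps, f1 = select_epsilon(p(X_cv, mu, sigma2), y_cv)
print(f"epsilon = {eps:.3e}, F1 on CV = {f1:.2f}")
```

Because the Gaussian model is fit on normal examples only, it does not need the positive examples to look like anything in particular, which is why it copes with anomalies that differ from those seen so far.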

The following are typical applications of anomaly detection and supervised learning:

  • If you are a very major online retailer and you have actually had a lot of people try to commit fraud on your website, so you have a lot of examples with y=1, then fraud detection could shift over to the supervised learning column (see the sketch below).
  • For some manufacturing processes, if you are manufacturing very large volumes and have seen a lot of bad examples, manufacturing could shift to the supervised learning column as well.
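For contrast, a minimal sketch of the supervised-learning side using scikit-learn's LogisticRegression, assuming plentiful labeled examples of both classes (y=1 fraud, y=0 normal); the synthetic data and array names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Illustrative labeled data with many examples of both classes.
rng = np.random.default_rng(1)
X_normal = rng.normal([5.0, 10.0], [1.0, 2.0], (2000, 2))   # y=0: normal transactions
X_fraud = rng.normal([2.0, 18.0], [1.0, 2.0], (800, 2))     # y=1: plenty of fraud examples
X = np.vstack([X_normal, X_fraud])
y = np.array([0] * 2000 + [1] * 800)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

# With enough positive examples, a plain classifier can learn what fraud looks like.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```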

<end>
