Week9_1Anomaly Detection

Question 1

For which of the following problems would anomaly detection be a suitable algorithm?

  • Given an image of a face, determine whether or not it is the face of a particular famous individual.
  • Given a dataset of credit card transactions, identify unusual transactions to flag them as possibly fraudulent.
  • Given data from credit card transactions, classify each transaction according to type of purchase
    (for example: food, transportation, clothing).
  • From a large set of primary care patient records, identify individuals who might have unusual health conditions.

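*     Answer: 2 4 *
*   Explanation: anomaly detection models the data with a Gaussian density and flags points that have low probability. Under the three-sigma rule, about 99.7% of the probability lies within $3\sigma$ of the mean, so anything beyond $3\sigma$ is treated as abnormal. *
*   Option 1: Decide from a photo whether a face belongs to a particular famous person. To answer this you need labeled examples of that person, and every other face is simply "not that person", so the target is a binary label (0 or 1) rather than a Gaussian-distributed quantity. This is a supervised classification problem. Incorrect. *
*   Option 2: Given card transaction records, find the abnormal transactions. Most transactions are normal and are treated as roughly normally distributed, so the rare unusual ones fall in the low-probability tails. Correct. *
*   Option 3: Given card transaction records, sort the transactions into food, transportation and clothing. Anomaly detection cannot do multi-class classification. Incorrect. *
*   Option 4: Given patient records, pick out the people who might have unusual conditions. The probability of a usual condition plus the probability of an unusual one is 100%, and which is which is not known in advance: the threshold is set after the statistics are computed, for example treating values within $3\sigma$ as normal and values outside $3\sigma$ as abnormal. Correct. *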
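The three-sigma figures quoted above can be checked numerically. The sketch below is only an illustration: the helper names and the transaction amounts are hypothetical and not part of the quiz. It computes the Gaussian coverage within $k\sigma$ from the error function and applies a simple three-sigma test to made-up data.

```python
import math


def coverage_within(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def fit_gaussian(values):
    """Maximum-likelihood mean and variance (dividing by m, not m - 1)."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return mu, var


def is_anomaly(x, mu, var, k=3.0):
    """Flag x when it lies more than k standard deviations from the mean."""
    return abs(x - mu) > k * math.sqrt(var)


# Coverage for 1, 2 and 3 standard deviations, rounded to 4 decimals.
coverages = {k: round(coverage_within(k), 4) for k in (1, 2, 3)}

# Hypothetical transaction amounts (assumed for illustration only).
amounts = [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 23.0, 17.0]
mu, var = fit_gaussian(amounts)

print(coverages)
print(is_anomaly(500.0, mu, var), is_anomaly(21.0, mu, var))
```

A fixed three-sigma cutoff is just one way to pick a threshold. The course formulation flags a point when $p(x)$ drops below a tunable $\varepsilon$, which plays the same role.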


Question 2

Suppose you have trained an anomaly detection system that flags anomalies when $p(x)$ is less than $\varepsilon$, and you find on the cross-validation set that it has too many false positives (flagging too many things as anomalies). What should you do?

  • Increase $\varepsilon$
  • Decrease $\varepsilon$

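*     Answer: 2 *
[Figure 9-1]
* In the figure above, the shaded region is treated as normal and the unshaded region as anomalous. Too many points are being flagged as anomalous, so the shaded area has to grow. That means moving $\varepsilon$ to the left, in other words decreasing $\varepsilon$. *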
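To see the effect of the threshold in numbers, here is a small sketch. It assumes a standard normal model and a handful of made-up cross-validation points (both are assumptions for illustration), and counts how many points are flagged for a larger and a smaller $\varepsilon$.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def count_flagged(xs, mu, var, epsilon):
    """Number of points whose density falls below epsilon (flagged as anomalies)."""
    return sum(1 for x in xs if gaussian_pdf(x, mu, var) < epsilon)


# Hypothetical cross-validation points and model parameters.
cv_points = [-3.5, -2.0, -1.0, -0.5, 0.0, 0.4, 1.1, 1.9, 2.6, 4.0]
mu, var = 0.0, 1.0

flagged_large_eps = count_flagged(cv_points, mu, var, epsilon=0.1)
flagged_small_eps = count_flagged(cv_points, mu, var, epsilon=0.01)

print(flagged_large_eps, flagged_small_eps)
```

With these numbers the larger threshold flags five points and the smaller one flags two, which is the direction needed when there are too many false positives.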


Question 3

Suppose you are developing an anomaly detection system to catch manufacturing defects in airplane engines. Your model uses
$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$.
You have two features $x_1$ = vibration intensity, and $x_2$ = heat generated.
Both $x_1$ and $x_2$ take on values between 0 and 1 (and are strictly greater than 0),
and for most “normal” engines you expect that $x_1 \approx x_2$. One of the suspected anomalies is that a flawed
engine may vibrate very intensely even without generating much heat (large $x_1$, small $x_2$),
even though the particular values of $x_1$ and $x_2$ may not fall outside their typical ranges of values.
What additional feature $x_3$ should you create to capture these types of anomalies:

  • $x_3 = x_1^2 \times x_2$
  • $x_3 = \frac{x_1}{x_2}$
  • $x_3 = x_1 + x_2$
  • $x_3 = x_1 \times x_2$

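*     Answer: 2 *
**    A Gaussian model is used to decide whether an airplane engine is normal or anomalous. Two features are chosen: vibration intensity
and heat generated. Both take values between 0 and 1 and are never 0. For a normal engine the two are positively correlated ($x_1 \approx x_2$). A flawed engine vibrates strongly while generating little heat, even though each value on its own stays inside its normal range.
**
*    Key point: for a normal engine, vibration and heat are both small or both large, so $x_1 / x_2$ stays close to 1. For the anomaly, vibration is large and heat is small, so the ratio becomes very large. Hence option 2. *
*    Key point: in the anomalous case neither feature leaves its normal range, so the sum and the products do not expose the anomaly. *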
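The argument can be tried on numbers. The sketch below uses made-up engine readings (an assumption, not quiz data) where normal engines have $x_1 \approx x_2$, plus one suspect engine with large $x_1$ and small $x_2$. For each candidate feature it checks whether the suspect value falls outside the range seen on the normal engines.

```python
def candidate_features(x1, x2):
    """The four candidate x3 features from the question, keyed by a readable name."""
    return {
        "x1^2 * x2": x1 ** 2 * x2,
        "x1 / x2": x1 / x2,
        "x1 + x2": x1 + x2,
        "x1 * x2": x1 * x2,
    }


# Hypothetical normal engines (x1 close to x2) and one suspect engine.
normal_engines = [(0.2, 0.22), (0.5, 0.48), (0.8, 0.79), (0.35, 0.33), (0.6, 0.63)]
suspect_engine = (0.8, 0.2)

normal_values = [candidate_features(x1, x2) for x1, x2 in normal_engines]
suspect_values = candidate_features(*suspect_engine)

# For each feature: is the suspect outside the min..max range of the normal engines?
outside_normal_range = {}
for name, value in suspect_values.items():
    low = min(v[name] for v in normal_values)
    high = max(v[name] for v in normal_values)
    outside_normal_range[name] = not (low <= value <= high)

print(outside_normal_range)
```

On this toy data only the ratio separates the suspect engine from the normal ones. A min/max range check is a crude stand-in for fitting a Gaussian to $x_3$, but it shows the same effect.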


Question 4

Which of the following are true? Check all that apply.

  • If you do not have any labeled data (or if all your data has label $y = 0$), then it is still possible to learn $p(x)$, but it may be harder to evaluate the system or choose a good value of $\varepsilon$.
  • If you have a large labeled training set with many positive examples and many negative examples, the anomaly detection algorithm will likely perform just as well as a supervised learning algorithm such as an SVM.
  • If you are developing an anomaly detection system, there is no way to make use of labeled data to improve your system.
  • When choosing features for an anomaly detection system, it is a good idea to look for features that take on unusually large or small values for (mainly the) anomalous examples.
*     Answer: 1 4 *
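One common way labeled data helps, consistent with option 1 being true and option 3 being false, is choosing $\varepsilon$ on a labeled cross-validation set. The sketch below assumes the densities $p(x)$ have already been computed. The values, labels and candidate thresholds are hypothetical, and F1 is used as the selection metric as one reasonable choice.

```python
def f1_score(p_values, labels, epsilon):
    """F1 of the rule 'flag as anomaly (1) when p(x) < epsilon' against true labels."""
    preds = [1 if p < epsilon else 0 for p in p_values]
    tp = sum(1 for pred, y in zip(preds, labels) if pred == 1 and y == 1)
    fp = sum(1 for pred, y in zip(preds, labels) if pred == 1 and y == 0)
    fn = sum(1 for pred, y in zip(preds, labels) if pred == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def select_epsilon(p_values, labels, candidates):
    """Return the candidate threshold with the highest F1 on the labeled set."""
    return max(candidates, key=lambda eps: f1_score(p_values, labels, eps))


# Hypothetical cross-validation densities and labels (1 = anomaly).
p_cv = [0.30, 0.25, 0.002, 0.20, 0.04, 0.001, 0.15, 0.35]
y_cv = [0, 0, 1, 0, 0, 1, 0, 0]
candidates = [0.0005, 0.005, 0.05, 0.5]

best_epsilon = select_epsilon(p_cv, y_cv, candidates)
print(best_epsilon)
```

Without any labels this selection step is not available, which is why evaluating the system or picking $\varepsilon$ becomes harder in that case.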

Question 5

You have a 1-D dataset $\{x^{(1)}, \ldots, x^{(m)}\}$ and you want to detect outliers in the dataset. You first plot the dataset and it looks like this:
[Figure 9-2]
Suppose you fit the gaussian distribution parameters $\mu_1$ and $\sigma_1^2$ to this dataset. Which of the following values for $\mu_1$ and $\sigma_1^2$ might you get?

  • $\mu_1 = -3, \sigma_1^2 = 4$
  • $\mu_1 = -6, \sigma_1^2 = 4$
  • $\mu_1 = -3, \sigma_1^2 = 2$
  • $\mu_1 = -6, \sigma_1^2 = 2$

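*     Answer: 1 *
* The figure suggests $\mu_1 = -3$, because the points are densest around -3. The plot is not drawn very precisely, but the density there is clearly higher than around -6, so $\mu_1 = -3$. *
* Now $\sigma_1$: a Gaussian puts about 0.683 of its probability within $1\sigma$ of the mean, 0.954 within $2\sigma$ and 0.997 within $3\sigma$. The points in the figure lie in $[-9, 2]$. *
* With $\sigma_1 = 2$ (that is, $\sigma_1^2 = 4$), the $3\sigma$ interval is $[-9, 3]$, which just contains all the points. *
* With $\sigma_1 = \sqrt{2} \approx 1.414$ (that is, $\sigma_1^2 = 2$), the $3\sigma$ interval is about $[-7.2, 1.2]$, which misses several points on both sides and so cannot reach 0.997 coverage. *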
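The interval arithmetic in this explanation is easy to reproduce. The sketch below takes the data range $[-9, 2]$ read off the plot in the explanation above, computes $\mu_1 \pm 3\sigma_1$ for each of the four options, and checks which interval covers the whole range. The variable names are made up for illustration.

```python
import math

# The four answer options as (mu, variance) pairs.
options = {
    1: (-3.0, 4.0),
    2: (-6.0, 4.0),
    3: (-3.0, 2.0),
    4: (-6.0, 2.0),
}

# Range of the plotted points, as read off the figure in the explanation.
data_low, data_high = -9.0, 2.0


def three_sigma_interval(mu, var):
    """Interval [mu - 3*sigma, mu + 3*sigma] for a Gaussian with the given variance."""
    sigma = math.sqrt(var)
    return mu - 3.0 * sigma, mu + 3.0 * sigma


intervals = {k: three_sigma_interval(mu, var) for k, (mu, var) in options.items()}
covers_all_points = {
    k: low <= data_low and high >= data_high for k, (low, high) in intervals.items()
}

for k, (low, high) in intervals.items():
    print(k, round(low, 2), round(high, 2), covers_all_points[k])
```

Only option 1 gives a three-sigma interval that contains the whole range, which matches the answer. This is a rough plausibility check and not a maximum-likelihood fit to the actual points.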
