Anomaly detection - Choosing what features to use

Abstract: This article is the transcript of Lecture 128, "Choosing what features to use", in Chapter 16, "Anomaly Detection", of Andrew Ng's Machine Learning course. I took these notes while watching the videos and edited them to be more concise and readable, for my own future reference, and I am now sharing them with everyone. If there are any mistakes, corrections are welcome and sincerely appreciated. I also hope this is helpful for your studies.

————————————————

It turns out that when applying anomaly detection, one of the things that has a huge effect on how well it does is which features you use.

  • Choose features that might take on unusually large or small values in the event of an anomaly

Take the example of monitoring the computers in a data center. We might have thousands or tens of thousands of machines, and we want to know whether some of them are doing something strange. The following are features we might choose:

x_{1} = memory use of computer

x_{2} = number of disk accesses/sec

x_{3} = CPU load

x_{4} = network traffic

Suppose we have a bunch of web servers. If one of the servers is serving a lot of users, it will have very high CPU load and very high network traffic. One failure case could be that a server's code gets stuck in an infinite loop, so the CPU load grows but the network traffic doesn't. To detect that type of anomaly, I might define another feature x_{5}=\frac{\text{CPU load}}{\text{network traffic}} and/or x_{6}=\frac{(\text{CPU load})^{2}}{\text{network traffic}}. Both of them could help capture anomalies where one of your machines has a very high CPU load but doesn't have a commensurately large network traffic.
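As a quick illustration, here is a minimal Octave sketch of building these ratio features; the variable names and the random placeholder data are my own, not from the course:

```matlab
% Hypothetical sketch: derive the ratio features x5 and x6 from raw
% monitoring metrics, assuming each metric is an m x 1 column vector.
m    = 1000;                 % number of machines (made-up size)
mem  = rand(m, 1);           % x1: memory use (placeholder data)
disk = rand(m, 1);           % x2: disk accesses/sec
cpu  = rand(m, 1);           % x3: CPU load
net  = rand(m, 1) + 0.01;    % x4: network traffic (kept > 0)

x5 = cpu ./ net;             % CPU load / network traffic
x6 = (cpu .^ 2) ./ net;      % (CPU load)^2 / network traffic

X = [mem, disk, cpu, net, x5, x6];   % feature matrix for the algorithm
```

An infinite-loop failure drives cpu up while net stays flat, so x5 and x6 blow up for that machine even though neither raw metric alone looks extreme.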

  • Play with transformations of the data to make it more Gaussian

In anomaly detection, we model each feature x_{i} with p(x_{i}; \mu_{i}, \sigma_{i}^{2}). We often need to plot the histogram of a feature to make sure it looks vaguely Gaussian before feeding it to our anomaly detection algorithm.
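As a concrete sketch of that per-feature Gaussian model (assuming X is an m x n matrix with one training example per row; this is my reconstruction, not code from the lecture):

```matlab
% Fit p(x_i; mu_i, sigma_i^2) for each feature of an m x n matrix X.
mu     = mean(X);      % 1 x n vector of per-feature means
sigma2 = var(X, 1);    % 1 x n vector of per-feature variances (1/m)

% Per-example density p(x): product over features of the univariate
% Gaussian densities (uses Octave's automatic broadcasting).
p = prod(exp(-(X - mu).^2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2), 2);
```

This product of univariate Gaussians is only a sensible density estimate when each feature is at least roughly bell-shaped, which is why the histogram check matters.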

For example, if the histogram of feature x_{1} is very asymmetric, with a peak way off to one side, what we often need to do is play with different transformations of the data to make it look more Gaussian. We might take a log transformation, log(x_{1}), and its histogram may look much more like the classic bell-shaped curve that we can fit with some mean \mu_{1} and variance \sigma_{1}^{2}. Beyond a plain log transform, there are other things we can do. Say we have a different feature x_{2}: maybe we can replace it with log(x_{2}+1), or more generally with log(x_{2}+c), where the constant c is something we can play with to make the result look as Gaussian as possible. For a different feature x_{3}, maybe we can replace it with \sqrt{x_{3}}=x_{3}^{\frac{1}{2}}, where the exponent \frac{1}{2} is another example of a parameter we can play with. Or another feature x_{4} can be replaced with x_{4}^{\frac{1}{3}}.

In the video, one example is demoed live on a feature x with 1,000 values. By default, hist(x, 10) plots the histogram with 10 bins, shown as figure-1; figure-2 is the histogram with 50 bins for a finer grid. Neither looks Gaussian. Taking the square root of the data, hist(x.^0.5, 50), gives the histogram in figure-3. We can play with different exponents, 0.2 and 0.1, and the corresponding histograms are figure-4 and figure-5. With 0.05, it looks pretty Gaussian, as in figure-6. Then I can define xNew = x.^0.05 and feed that into my anomaly detection algorithm. Of course there are other transforms you can use, like hist(log(x), 50), which also looks very Gaussian.
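Since the course's dataset isn't available here, this Octave sketch reproduces the same workflow on synthetic data (a lognormal sample is my stand-in for the skewed feature):

```matlab
% Reconstruct the demo on synthetic data with a long right tail.
x = exp(randn(1000, 1));      % lognormal stand-in for the skewed feature

figure; hist(x, 10);          % 10 bins: heavily skewed, not Gaussian
figure; hist(x, 50);          % 50 bins: finer grid, still skewed
figure; hist(x .^ 0.5, 50);   % square root: better, still asymmetric
figure; hist(x .^ 0.05, 50);  % small exponent: close to a bell curve
figure; hist(log(x), 50);     % log transform also looks Gaussian here

xNew = x .^ 0.05;             % transformed feature for the algorithm
```

The point is the workflow, not the particular exponent: keep adjusting the transform and eyeballing the histogram until it looks roughly bell-shaped.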

One note: the anomaly detection algorithm will usually work OK even if you don't perform such transformations, but using these transformations to make the data more Gaussian can make it work better.

  • Come up with features via an error-analysis procedure

We would train a complete algorithm, run it on the cross-validation set, and look at the examples it gets wrong. Then we see whether we can come up with extra features to help the algorithm do better on the examples it got wrong in the cross-validation set. For example, suppose the blue curve in the lecture's figure is the Gaussian fitted to my feature x_{1}, and we have an anomalous example, shown as a green cross, buried in the middle of a bunch of normal examples. We are hoping that p(x) will be large for normal examples and small for anomalous ones. But a common problem is that p(x) is comparable, maybe large, for both the normal and the anomalous examples, so the algorithm fails to flag the example as anomalous. Then we would look at that particular training example (say, an aircraft engine), figure out what went wrong with it, and see if that inspires us to come up with a new feature x_{2} that distinguishes this bad example from the rest of my red examples.
If I manage to do so and re-plot my data, all my training set examples are the red crosses, and hopefully the feature x_{2} of the anomalous example takes on the very unusual value x_{2}=3.5. If I model my data now, my anomaly detection algorithm gives high probability to the data in the central region and lower probability to the green-cross anomalous example.
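A hedged Octave sketch of this loop, continuing from the mu and sigma2 fitted earlier (Xval, yval, and the threshold value are hypothetical names and numbers of my own):

```matlab
% Score the cross-validation set with the fitted model. Xval is m x n,
% yval is m x 1 with 1 marking a true anomaly; mu and sigma2 as above.
pval = prod(exp(-(Xval - mu).^2 ./ (2 * sigma2)) ...
            ./ sqrt(2 * pi * sigma2), 2);

epsilon = 0.02;                              % threshold picked on CV set
missed  = find(yval == 1 & pval >= epsilon); % anomalies scored as normal

% Inspect the missed examples by hand; whatever distinguishes them may
% suggest a new feature, like the x2 that exposes the green cross above.
disp(Xval(missed, :));
```

The manual inspection step is the point: the code only surfaces which anomalies the current features cannot separate; inventing the new feature is up to you.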

<end>
