Andrew Ng · Machine Learning || chap15 Anomaly detection — brief notes

15 Anomaly detection

15-1 Problem motivation

Anomaly detection example

Aircraft engine features:

$x_1$ = heat generated

$x_2$ = vibration intensity

$\cdots$

Dataset: $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$

New engine: $x_{test}$

Density estimation

Dataset: $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$
Is $x_{test}$ anomalous?


Example

Fraud detection:
$x^{(i)}$ = features of user $i$'s activities
Model $p(x)$ from data
Identify unusual users by checking which have $p(x) < \epsilon$

Manufacturing

Monitoring computers in a data center

$x^{(i)}$ = features of machine $i$
$x_1$ = memory use, $x_2$ = number of disk accesses/sec
$x_3$ = CPU load, $x_4$ = CPU load/network traffic

15-2 Gaussian distribution

Gaussian (Normal) distribution

Say $x \in \mathbb{R}$. If $x$ follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$:

$x \sim N(\mu, \sigma^2)$

$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

The larger $\sigma$ is, the wider the curve.

Parameter estimation

Dataset: $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}$

$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$
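A minimal NumPy sketch of these estimates (the toy dataset below is made up for illustration):

```python
import numpy as np

def estimate_gaussian(x):
    """Maximum-likelihood estimates of mu and sigma^2 (note: 1/m, not 1/(m-1))."""
    mu = np.mean(x)
    sigma2 = np.mean((x - mu) ** 2)
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """p(x; mu, sigma^2) for the univariate Gaussian."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# toy dataset
data = np.array([2.0, 2.1, 1.9, 2.2, 1.8])
mu, sigma2 = estimate_gaussian(data)
print(mu, sigma2)                      # 2.0 0.02
print(gaussian_pdf(2.0, mu, sigma2))   # density peaks at x = mu
```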

15-3 Algorithm

Density estimation

Training set: $x^{(1)}, \cdots, x^{(m)}$

Each example is $x \in \mathbb{R}^n$

Anomaly detection algorithm

  1. Choose features $x_i$ that you think might be indicative of anomalous examples.

  2. Fit parameters

    $\mu_1, \cdots, \mu_n, \sigma_1^2, \cdots, \sigma_n^2$

    $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$

    $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$

  3. Given new example $x$, compute $p(x)$:

$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$

Anomaly if $p(x) < \epsilon$
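The three steps above can be sketched in NumPy as follows; the synthetic training data and the value of $\epsilon$ are assumptions for illustration only:

```python
import numpy as np

def fit(X):
    """Step 2: fit mu_j, sigma_j^2 per feature. X has shape (m, n)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)          # np.var divides by m, matching the notes
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 3: p(x) = prod_j p(x_j; mu_j, sigma_j^2) for each row of X."""
    probs = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return probs.prod(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit(X_train)

epsilon = 1e-4                      # threshold; in practice chosen on a CV set
x_normal = np.array([[5.1, 9.8]])
x_anomaly = np.array([[12.0, 0.0]])
print(p(x_normal, mu, sigma2) >= epsilon)   # [ True] -> normal
print(p(x_anomaly, mu, sigma2) < epsilon)   # [ True] -> flagged as anomaly
```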

Anomaly detection example
(figure omitted)

15-4 Developing and evaluating an anomaly detection system

The importance of real-number evaluation

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating it with a single real number.

Assume we have some labeled data of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).

Training set: $x^{(1)}, x^{(2)}, \cdots, x^{(m)}$ (assume these are normal/not anomalous examples)

Cross validation set: $(x_{cv}^{(1)}, y_{cv}^{(1)}), \cdots, (x_{cv}^{(m_{cv})}, y_{cv}^{(m_{cv})})$

Test set: $(x_{test}^{(1)}, y_{test}^{(1)}), \cdots, (x_{test}^{(m_{test})}, y_{test}^{(m_{test})})$

Aircraft engines motivation example

10000 good (normal) engines

20 flawed engines (anomalous)

Training set : 6000 good engines

CV: 2000 good engines (y=0), 10 anomalous (y=1)

Test: 2000 good engines (y=0), 10 anomalous (y=1)

An alternative (not recommended): use the same 4000 good + 10 anomalous engines for both the CV and test sets.

Algorithm evaluation

Fit model $p(x)$ on training set $\{x^{(1)}, \cdots, x^{(m)}\}$

On a cross validation/test example $x$, predict
$y = \begin{cases} 1 & \text{if } p(x) < \epsilon \text{ (anomaly)} \\ 0 & \text{if } p(x) \ge \epsilon \text{ (normal)} \end{cases}$
Possible evaluation metrics:

  • True positive, false positive, false negative, true negative
  • Precision/Recall
  • $F_1$-score

Can also use the cross validation set to choose the parameter $\epsilon$
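A sketch of choosing $\epsilon$ by maximizing $F_1$ on the CV set; the CV probabilities and labels below are made up for illustration:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true positives / false positives / false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def select_epsilon(p_cv, y_cv):
    """Try candidate thresholds between min and max of p(x) on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# hypothetical CV set: anomalies (y=1) were assigned low p(x) by the model
p_cv = np.array([0.40, 0.35, 0.50, 0.38, 0.001, 0.002])
y_cv = np.array([0,    0,    0,    0,    1,     1])
eps, f1 = select_epsilon(p_cv, y_cv)
print(eps < 0.35, f1)   # True 1.0 -- a threshold that separates the groups
```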

15-5 Anomaly detection vs. supervised learning

(figures omitted) Use anomaly detection when there is a very small number of positive (y=1) examples, a large number of negative (y=0) examples, and many different types of anomalies, so future anomalies may look nothing like those seen so far (e.g. fraud detection, manufacturing, monitoring machines in a data center). Use supervised learning when there are large numbers of both positive and negative examples and future positives are likely to resemble those in the training set (e.g. spam classification, weather prediction, cancer classification).

15-6 Choosing what features to use

Non-Gaussian features

Transform skewed features (e.g. $\log(x)$, $\log(x + c)$, $\sqrt{x}$, $x^{1/3}$) to make the data look a bit more Gaussian.
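The effect of such a transform can be sketched as below; the exponential sample and the skewness helper are illustrative assumptions, not from the lecture:

```python
import numpy as np

def skewness(v):
    """Sample skewness: E[(v - mean)^3] / std^3; 0 for a symmetric distribution."""
    d = v - v.mean()
    return np.mean(d ** 3) / (np.std(v) ** 3)

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=5000)   # heavily right-skewed feature

# one candidate transform: log(x + c); c = 1 keeps the argument finite near zero
x_log = np.log(x + 1)

print(skewness(x))      # roughly 2 for an exponential sample
print(skewness(x_log))  # noticeably smaller in magnitude
```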

Error analysis for anomaly detection

Want $p(x)$ large for normal examples $x$ and $p(x)$ small for anomalous examples.
Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.

**Monitoring computers in a data center**

Choose features that might take on unusually large or small values in the event of an anomaly:
$x_1$ = memory use of computer
$x_2$ = number of disk accesses/sec
$x_3$ = CPU load
$x_4$ = network traffic

15-7 Multivariate Gaussian distribution

Motivating example: Monitoring machines in a data center

Multivariate Gaussian (Normal) distribution

$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \cdots,$ etc. separately.
Model $p(x)$ all in one go.
Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)

(figure omitted)

Multivariate Gaussian (Normal) examples

(contour plots omitted)

15-8 Anomaly detection using the multivariate Gaussian distribution

Multivariate Gaussian (Normal)distribution

Parameters: $\mu$, $\Sigma$

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

Parameter fitting:

Given training set $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$

$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$

$\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$

Anomaly detection with the multivariate Gaussian

  1. Fit model $p(x)$ by setting

    $\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$

    $\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)\left(x^{(i)} - \mu\right)^T$

  2. Given a new example $x$, compute

    $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

    Flag an anomaly if $p(x) < \epsilon$
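The two steps above can be sketched with NumPy on synthetic correlated data; the data, the test point, and $\epsilon$ are assumptions for illustration:

```python
import numpy as np

def fit_multivariate(X):
    """Step 1: mu = mean, Sigma = (1/m) * sum (x - mu)(x - mu)^T."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]
    return mu, Sigma

def p_multivariate(X, mu, Sigma):
    """Step 2: evaluate the multivariate Gaussian density for each row of X."""
    n = mu.size
    Xc = X - mu
    inv = np.linalg.inv(Sigma)
    det = np.linalg.det(Sigma)
    quad = np.einsum('ij,jk,ik->i', Xc, inv, Xc)   # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(det))

rng = np.random.default_rng(0)
# correlated features: the case where this model beats the original one
X_train = rng.multivariate_normal([0, 0], [[2.0, 1.8], [1.8, 2.0]], size=2000)
mu, Sigma = fit_multivariate(X_train)

epsilon = 1e-3
x = np.array([[2.0, -2.0]])   # each coordinate is plausible alone, the pair is not
print(p_multivariate(x, mu, Sigma) < epsilon)   # [ True] -> flagged as anomaly
```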

Relationship to original model

Original model: $p(x) = p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$

Corresponds to multivariate Gaussian

$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

where $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \cdots, \sigma_n^2)$, i.e. all off-diagonal entries are zero.
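A quick numerical check (on synthetic data) that the product of univariate Gaussians equals the multivariate Gaussian with a diagonal $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
x = np.array([0.3, -1.2, 0.7])

mu = X.mean(axis=0)
sigma2 = X.var(axis=0)

# original model: product of univariate Gaussians
p_orig = np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

# multivariate Gaussian with Sigma = diag(sigma_1^2, ..., sigma_n^2)
Sigma = np.diag(sigma2)
n = x.size
d = x - mu
p_multi = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

print(np.isclose(p_orig, p_multi))   # True: the two models agree
```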
