Build a model for the probability of x, p(x).
Pick a small threshold ϵ and flag x as anomalous when p(x) < ϵ.
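Gaussian (Normal) Distribution
$x \sim \mathcal{N}(\mu, \sigma^2)$
$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$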
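As a quick illustration, the Gaussian density p(x; μ, σ²) can be evaluated with the standard closed form. This is a minimal sketch; the function name `gaussian_pdf` is only illustrative, and `sigma2` is assumed to be the variance (not the standard deviation).

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, where sigma2 is the variance.

    Implements exp(-(x - mu)^2 / (2 * sigma2)) / sqrt(2 * pi * sigma2).
    """
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
```

The density peaks at x = μ and is symmetric around it; a larger σ² gives a wider, flatter curve.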
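Parameter estimation
If $x^{(i)} \sim \mathcal{N}(\mu, \sigma^2)$, then
$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$
$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$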
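The two estimates above translate almost directly into code. A minimal sketch, assuming the samples of one feature arrive as a plain list of numbers; the name `estimate_gaussian` is only illustrative.

```python
def estimate_gaussian(xs):
    """Return (mu, sigma2) for a list of samples, using the 1/m normalization."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2
```

For example, `estimate_gaussian([2, 4, 4, 4, 5, 5, 7, 9])` gives a mean of 5 and a variance of 4. Note the divisor is m rather than m − 1, matching the formulas above.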
Density Estimation
$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$
Anomaly Detection Algorithm
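A minimal sketch of the steps, under the assumptions already stated in these notes: estimate μ_j and σ_j² for each feature from the training set, compute p(x) as the product of the per-feature Gaussian densities, and flag x when p(x) < ϵ. The function names (`fit`, `density`, `is_anomaly`) and the list-of-rows data layout are illustrative choices, not part of the original notes.

```python
import math

def fit(X):
    """Estimate per-feature means and variances from X, a list of equal-length rows."""
    m, n = len(X), len(X[0])
    mus = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2s = [sum((row[j] - mus[j]) ** 2 for row in X) / m for j in range(n)]
    return mus, sigma2s

def density(x, mus, sigma2s):
    """p(x) as the product over features of the univariate Gaussian densities."""
    prob = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        prob *= math.exp(-((xj - mu) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return prob

def is_anomaly(x, mus, sigma2s, epsilon):
    """Flag x as anomalous when its density falls below the threshold epsilon."""
    return density(x, mus, sigma2s) < epsilon
```

With many features the product can underflow to zero; summing log-densities and comparing against log ϵ is a common workaround.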
Evaluation
It doesn’t matter if a few anomalous examples actually slip into the training set.
The alternative one is not recommended.
We can choose ϵ, the features, and so on by examining the F1 score.
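One way to pick ϵ by F1 score can be sketched as follows. It assumes a labeled validation set where `y_val[i]` is 1 for an anomaly and 0 otherwise, and `p_val[i]` is the model's density for that example; the names `f1_score` and `select_epsilon` and the candidate list are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall); 0.0 if no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def select_epsilon(p_val, y_val, candidates):
    """Return the (epsilon, F1) pair with the highest F1 among the candidates."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        y_pred = [1 if p < eps else 0 for p in p_val]  # predict anomaly when p < eps
        score = f1_score(y_val, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1
```

F1 is used rather than accuracy because anomalies are rare: a model that never flags anything would still score high accuracy.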
Non-Gaussian features
Try transforming x (for example with a log or a power) until its distribution looks more Gaussian.
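A rough sketch of two common transformations, a log and a power, together with a skewness measure to check whether the result looks more symmetric. The offset `c`, the exponent, and the function names are assumptions to be tuned per feature; plotting a histogram is the usual visual check.

```python
import math

def skewness(xs):
    """Sample skewness; values near 0 suggest a roughly symmetric distribution."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    return sum((x - mu) ** 3 for x in xs) / m / var ** 1.5

def log_transform(xs, c=1.0):
    """Replace x with log(x + c); requires x + c > 0 for every sample."""
    return [math.log(x + c) for x in xs]

def power_transform(xs, exponent=0.5):
    """Replace x with x ** exponent; requires non-negative x for fractional exponents."""
    return [x ** exponent for x in xs]
```

For a right-skewed feature such as `[1, 2, 3, 4, 5, 10, 50, 200]`, both transforms reduce the skewness, with the log reducing it more.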
Choose Feature
Create Features
Multivariate Gaussian Distribution
Problem:
To solve it:
Then we got:
So here’s the algorithm for anomaly detection with the multivariate Gaussian:
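A minimal standard-library sketch of this approach: fit the mean vector μ and covariance matrix Σ on the training set, evaluate the multivariate Gaussian density for a new x, and flag it when p(x) < ϵ. The function names are illustrative, and the Cholesky factorization is just one way to obtain the determinant and the quadratic form; it assumes Σ is positive definite, which in practice needs more training examples than features and no redundant features.

```python
import math

def fit_multivariate(X):
    """Return the mean vector mu and covariance matrix Sigma (1/m normalization)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    Sigma = [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in X) / m
              for k in range(n)] for j in range(n)]
    return mu, Sigma

def cholesky(A):
    """Lower-triangular L with A = L L^T; A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def multivariate_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x."""
    n = len(mu)
    L = cholesky(Sigma)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L y = (x - mu), so y.y = (x - mu)^T Sigma^-1 (x - mu).
    y = []
    for i in range(n):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log |Sigma|
    return math.exp(-0.5 * (quad + log_det + n * math.log(2.0 * math.pi)))

def is_anomaly(x, mu, Sigma, epsilon):
    """Flag x as anomalous when its density falls below epsilon."""
    return multivariate_pdf(x, mu, Sigma) < epsilon
```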
The difference between the two models
This new model, which uses a multivariate Gaussian distribution, corresponds exactly to the old model when the covariance matrix Σ has only zeros off the diagonal. In pictures, that corresponds to Gaussian distributions whose contours are axis-aligned. So the old model cannot capture correlations between different features.
So in that sense the original model is actually a special case of this multivariate Gaussian model.
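The special-case relationship can be checked numerically in two dimensions. This sketch uses the closed-form inverse and determinant of a 2×2 covariance matrix; the function names and the specific numbers are illustrative only.

```python
import math

def univariate_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, where sigma2 is the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def bivariate_pdf(x, mu, Sigma):
    """Density of a 2-D Gaussian with covariance Sigma = [[a, b], [c, d]]."""
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^-1 (x - mu), using Sigma^-1 = [[d, -b], [-c, a]] / det
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

x, mu = [1.0, -0.5], [0.0, 0.0]
# Original model: product of independent per-feature densities.
product = univariate_pdf(x[0], mu[0], 2.0) * univariate_pdf(x[1], mu[1], 0.5)
# Multivariate model with the same variances on the diagonal and zeros elsewhere.
diagonal = bivariate_pdf(x, mu, [[2.0, 0.0], [0.0, 0.5]])
# Multivariate model with a nonzero off-diagonal (correlated features).
correlated = bivariate_pdf(x, mu, [[2.0, 0.8], [0.8, 0.5]])
```

Here `product` and `diagonal` agree up to floating-point error, while `correlated` gives a different density for the same point, which is the extra flexibility the multivariate model adds.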