Adversarially Learned Anomaly Detection
Motivation (problems addressed)
1. Developing effective anomaly detection methods for complex, high-dimensional data remains a challenge.
2. Methods that must solve an optimization problem for every test example (e.g., AnoGAN) are impractical on large datasets or for real-time applications.
Advantage of ALAD: effective, but also efficient at test time.
Framework
Loss & Anomaly Score
Loss
V\left(D_{xz}, D_{xx}, D_{zz}, E, G\right)=V\left(D_{xz}, E, G\right)+V\left(D_{xx}, E, G\right)+V\left(D_{zz}, E, G\right)
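For reference, each term is an adversarial value function over one of the three discriminators. The expanded forms below are my reconstruction from the paper's setup (ALICE/BiGAN-style objectives) and should be checked against the original:

```latex
V(D_{xz}, E, G) = \mathbb{E}_{x \sim p_X}\!\left[\log D_{xz}(x, E(x))\right]
                + \mathbb{E}_{z \sim p_Z}\!\left[\log\left(1 - D_{xz}(G(z), z)\right)\right]

V(D_{xx}, E, G) = \mathbb{E}_{x \sim p_X}\!\left[\log D_{xx}(x, x)\right]
                + \mathbb{E}_{x \sim p_X}\!\left[\log\left(1 - D_{xx}(x, G(E(x)))\right)\right]

V(D_{zz}, E, G) = \mathbb{E}_{z \sim p_Z}\!\left[\log D_{zz}(z, z)\right]
                + \mathbb{E}_{z \sim p_Z}\!\left[\log\left(1 - D_{zz}(z, E(G(z)))\right)\right]
```

The D_xx and D_zz terms act as cycle-consistency regularizers: they push G(E(x)) toward x and E(G(z)) toward z, which is what makes the reconstruction-based anomaly score meaningful.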
Anomaly Score
A(x)=\left\|f_{x x}(x, x)-f_{x x}(x, G(E(x)))\right\|_{1}
A(x) measures, through the discriminator's features f_xx, how well a sample is encoded and then reconstructed by the generator. The larger the value, the more anomalous the sample.
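The score can be sketched in a few lines. This is a toy numpy sketch: the linear maps standing in for the trained encoder E, generator G, and the intermediate D_xx feature layer f_xx are hypothetical placeholders, not the paper's architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained networks:
# E: data space (dim 4) -> latent space (dim 2), G: latent -> data,
# f_xx: intermediate-layer features of the D_xx discriminator on a pair.
W_e = rng.normal(size=(4, 2))   # encoder weights
W_g = rng.normal(size=(2, 4))   # generator weights
W_f = rng.normal(size=(8, 8))   # feature layer on the concatenated pair

def E(x):            # encoder
    return x @ W_e

def G(z):            # generator / decoder
    return z @ W_g

def f_xx(x, y):      # D_xx intermediate features on the pair (x, y)
    return np.tanh(np.concatenate([x, y], axis=-1) @ W_f)

def anomaly_score(x):
    """A(x) = || f_xx(x, x) - f_xx(x, G(E(x))) ||_1  (feature-matching L1)."""
    recon = G(E(x))                  # reconstruct through latent space
    return np.abs(f_xx(x, x) - f_xx(x, recon)).sum(axis=-1)

x = rng.normal(size=(3, 4))          # three test samples
scores = anomaly_score(x)            # one nonnegative score per sample
```

Scoring only needs one forward pass per sample (E, G, f_xx), which is why ALAD avoids AnoGAN's per-example optimization at test time.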
Experiments
Datasets:
- KDDCup99
- Arrhythmia
Setup:
KDDCup99: 20% anomalies
Arrhythmia: 15% anomalies
Use 80% of the official dataset for training and keep the remaining 20% as the test set.
A further 25% of the training set is held out as a validation set, and anomalous samples are discarded from both the training and validation sets (setting up a novelty detection task).
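The split procedure above can be sketched as follows. The data here is synthetic and the ~20% anomaly rate mimics KDDCup99; this is an illustration of the protocol, not the paper's preprocessing code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labelled data: X features, y labels (1 = anomaly, 0 = normal).
X = rng.normal(size=(1000, 10))
y = (rng.random(1000) < 0.2).astype(int)   # ~20% anomalies

# 80% train / 20% test split of the full dataset.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]

# Hold out 25% of the training portion as a validation set.
vcut = int(0.75 * len(train_idx))
tr_idx, val_idx = train_idx[:vcut], train_idx[vcut:]

# Novelty-detection setup: drop anomalies from train and validation only.
X_train = X[tr_idx][y[tr_idx] == 0]
X_val = X[val_idx][y[val_idx] == 0]
X_test, y_test = X[test_idx], y[test_idx]   # test set keeps its anomalies
```

Training on normal data only means the model learns the normal manifold, and the test set (with anomalies retained) measures how well the score separates the two classes.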
Evaluation metrics:
Precision, Recall, F1 score
Baselines:
- One-Class Support Vector Machines (OC-SVM): "Support vector method for novelty detection", 1999
- Isolation Forests (IF): "Isolation forest", 2008
- Deep Structured Energy Based Models (DSEBM): "Deep structured energy based models for anomaly detection", 2016
- Deep Autoencoding Gaussian Mixture Model (DAGMM): "Deep autoencoding gaussian mixture model for unsupervised anomaly detection", 2018
- AnoGAN: "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery", 2017
Experimental results
Summary
The authors propose ALAD, a GAN-based anomaly detection method that learns an encoder from data space to latent space during training, making it more efficient at test time than previously published GAN-based methods. In addition, they employ extra discriminators to improve the encoder, along with spectral normalization, which has been found to stabilize GAN training.