(ch11~13) Deep Learning for Anomaly Detection: A Survey

ac同学

Original link: https://www.researchgate.net/publication/330357393_Deep_Learning_for_Anomaly_Detection_A_Survey

See the previous posts:

ch8:https://blog.csdn.net/qq_40305043/article/details/106310729

ch9:https://blog.csdn.net/qq_40305043/article/details/106317103

ch10:https://blog.csdn.net/qq_40305043/article/details/106345152

11 Deep neural network architectures for locating anomalies
11.1 Deep Neural Networks (DNN)
11.2 Spatio Temporal Networks (STN)
11.3 Sum-Product Networks (SPN)
11.4 Word2vec Models
11.5 Generative Models
11.6 Convolutional Neural Networks
11.7 Sequence Models
11.8 Autoencoders
12 Relative Strengths and Weaknesses: Deep Anomaly Detection Methods
13 Conclusion

11 Deep neural network architectures for locating anomalies

11.1 Deep Neural Networks (DNN)

The "deep" in "deep neural networks" refers to the number of layers through which the features of data are extracted (Schmidhuber [2015], Bengio et al. [2009]). Deep architectures overcome several limitations of traditional machine learning approaches: limited scalability, poor generalization to new variations within data (LeCun et al. [2015]), and the need for manual feature engineering. Deep Belief Networks (DBNs) are a class of deep neural networks comprising multiple layers of graphical models known as Restricted Boltzmann Machines (RBMs). The hypothesis in using DBNs for anomaly detection is that the RBMs act as a directed encoder-decoder network trained with the backpropagation algorithm (Werbos [1990]). DBNs fail to capture the characteristic variations of anomalous samples, which results in a high reconstruction error for those samples. DBNs are shown to scale efficiently to big data and to improve interpretability (Wulsin et al. [2010]).

11.2 Spatio Temporal Networks (STN)

Researchers have long explored techniques for learning both spatial and temporal relation features (Zhang et al. [2018f]). Deep learning architectures perform well at learning spatial features (using CNNs) and temporal features (using LSTMs) individually. Spatio Temporal Networks (STNs) are deep neural architectures that combine CNNs and LSTMs to extract spatiotemporal features. The temporal features (modeling correlations between nearby time points via LSTMs) and the spatial features (modeling local spatial correlation via local CNNs) are shown to be effective in detecting outliers (Lee et al. [2018], SZEKER [2014], Nie et al. [2018], Dereszynski and Dietterich [2011]).

11.3 Sum-Product Networks (SPN)

Sum-Product Networks (SPNs) are directed acyclic graphs with variables as leaves; the internal nodes are sums and products, and the edges into sum nodes carry weights. SPNs can be viewed as a combination of mixture models that supports fast, exact probabilistic inference over many layers (Poon and Domingos [2011], Peharz et al. [2018]). The main advantage of SPNs is that, unlike graphical models, they remain tractable for high-treewidth models without requiring approximate inference. Furthermore, SPNs are shown to capture uncertainty over their inputs in a convincing manner, yielding robust anomaly detection (Peharz et al. [2018]). SPNs have shown impressive results on numerous datasets, while much remains to be explored in relation to outlier detection.
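To make the sum and product structure concrete, here is a minimal sketch of a tiny SPN over two binary variables. It is not taken from the survey: the weights (0.7, 0.3), the Bernoulli leaf parameters, and the function names are all illustrative assumptions. The root is a weighted sum over two product nodes, each product node multiplies one leaf per variable, and the negative log-probability serves as a simple anomaly score.

```python
import math

def bernoulli(p_one, x):
    """Leaf node: probability of a binary value x given P(x = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one

def spn_probability(x1, x2):
    """Root sum node over two product nodes (all parameters are made up)."""
    product_a = bernoulli(0.9, x1) * bernoulli(0.8, x2)  # product node A
    product_b = bernoulli(0.1, x1) * bernoulli(0.2, x2)  # product node B
    return 0.7 * product_a + 0.3 * product_b             # weighted sum node

def anomaly_score(x1, x2):
    """Negative log-likelihood: rarer configurations score higher."""
    return -math.log(spn_probability(x1, x2))

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
scores = {s: anomaly_score(*s) for s in states}
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, round(scores[most_anomalous], 3))
```

Evaluating the network is a single bottom-up pass, which is what makes exact inference cheap here; a real SPN would learn both the structure and the parameters from data.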

11.4 Word2vec Models

Word2vec is a group of deep neural network models used to produce word embeddings (Mikolov et al. [2013]). These models can capture sequential relationships within a data instance, such as a sentence or time-sequence data. Using word embedding features as inputs is shown to improve performance in several deep learning architectures (Rezaeinia et al. [2017], Naili et al. [2017], Altszyler et al. [2016]). Anomaly detection models that leverage word2vec embeddings are shown to significantly improve performance (Schnabel et al. [2015], Bertero et al. [2017], Bakarov et al. [2018], Bamler and Mandt [2017]).

11.5 Generative Models

Generative models aim to learn the exact data distribution in order to generate new data points with some variations. The two most common and efficient generative approaches are Variational Autoencoders (VAE) (Kingma and Welling [2013]) and Generative Adversarial Networks (GAN) (Goodfellow et al. [2014a,b]). A variant of the GAN architecture known as Adversarial Autoencoders (AAE) (Makhzani et al. [2015]), which uses adversarial training to impose an arbitrary prior on the latent code learned within the hidden layers of an autoencoder, is also shown to learn the input distribution effectively. Leveraging this ability to learn input distributions, several proposed Generative Adversarial Networks-based Anomaly Detection (GAN-AD) frameworks (Li et al. [2018], Deecke et al. [2018], Schlegl et al. [2017], Ravanbakhsh et al. [2017b], Eide [2018]) are shown to be effective in identifying anomalies in high-dimensional and complex datasets. However, traditional methods such as K-nearest neighbors (KNN) are shown to perform better than deep generative models in scenarios with fewer anomalies (Skvara et al. [2018]).
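For reference, the KNN baseline mentioned above can be sketched in a few lines. This is one common variant (the score is the distance to the k-th nearest neighbor), not necessarily the exact setup evaluated by Skvara et al.; the toy points and the choice k = 2 are assumptions for illustration.

```python
import math

def knn_scores(points, k):
    """Anomaly score per point: distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Five points in a tight cluster plus one far-away point (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = knn_scores(points, k=2)
outlier_index = max(range(len(points)), key=lambda i: scores[i])
print(outlier_index, round(scores[outlier_index], 2))
```

The method needs no training and has a single hyperparameter, which is part of why it remains a strong baseline on small datasets.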

11.6 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are the popular choice of neural network for analyzing visual imagery (Krizhevsky et al. [2012]). The ability of CNNs to extract complex hidden features from high-dimensional data with complex structure has enabled their use as feature extractors in outlier detection for both sequential and image datasets (Gorokhov et al. [2017], Kim [2014]). The evaluation of CNN-based frameworks for anomaly detection is still an active area of research (Kwon et al. [2018]).

11.7 Sequence Models

Recurrent Neural Networks (RNNs) (Williams [1989]) are shown to capture features of time-sequence data. Their limitation is that they fail to capture context as the number of time steps increases. Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber [1997]) were introduced to resolve this problem; they are a particular type of RNN built around a memory cell that can store information about previous time steps. Gated Recurrent Units (GRUs) (Cho et al. [2014]) are similar to LSTMs but use a set of gates to control the flow of information instead of separate memory cells. Anomaly detection in sequential data has attracted significant interest in the literature because of its applications in a wide range of engineering problems, illustrated in Section 9.9. LSTM-based algorithms for anomaly detection have been investigated and are reported to produce significant performance gains over conventional methods (Ergen et al. [2017]).
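The gating idea can be illustrated with a single GRU step on scalar inputs. This is a sketch only: the weights in `params` are made-up values rather than trained ones, and the convention used for blending the old state with the candidate (`(1 - z) * h_prev + z * h_cand`) is one of two equivalent conventions found in the literature and in libraries.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One scalar GRU step; p holds hypothetical weights and biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate decides how much history is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the previous state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand

params = {"wz": 0.5, "uz": 0.4, "bz": 0.0,
          "wr": 0.3, "ur": 0.2, "br": 0.0,
          "wh": 1.0, "uh": 0.8, "bh": 0.0}

h = 0.0
for x in [0.1, 0.2, 0.3]:
    h = gru_step(x, h, params)
print(round(h, 4))
```

Note that the hidden state itself carries the memory: with the update gate close to zero the previous state passes through almost unchanged, which is how a GRU retains context without a separate memory cell.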

11.8 Autoencoders

Autoencoders with a single layer and a linear activation function are nearly equivalent to Principal Component Analysis (PCA) (Pearson [1901]). While PCA is restricted to linear dimensionality reduction, autoencoders enable both linear and nonlinear transformations (Liou et al. [2008, 2014]). One of the popular applications of autoencoders is anomaly detection. Autoencoders are also referred to as Replicator Neural Networks (RNN) (Hawkins et al. [2002], Williams et al. [2002]). Autoencoders represent data within multiple hidden layers by reconstructing the input data, effectively learning an identity function. When trained solely on normal data instances (which are the majority in anomaly detection tasks), autoencoders fail to reconstruct anomalous data samples and therefore produce a large reconstruction error for them. The data samples that produce high residual errors are considered outliers. Several variants of autoencoder architectures, illustrated in Figure 13, produce promising results in anomaly detection. The choice of autoencoder architecture depends on the nature of the data: convolutional networks are preferred for image datasets, while Long Short-Term Memory (LSTM) based models tend to produce good results for sequential data. Efforts to combine convolution and LSTM layers, where the encoder is a convolutional neural network (CNN) and the decoder is a multilayer LSTM network that reconstructs input images, are shown to be effective in detecting anomalies within data. The use of combined models such as Gated Recurrent Unit autoencoders (GRU-AE), Convolutional Neural Network autoencoders (CNN-AE), and Long Short-Term Memory autoencoders (LSTM-AE) eliminates the need for preparing hand-crafted features and facilitates the use of raw data with minimal preprocessing in anomaly detection tasks. Although autoencoders are simple and effective architectures for outlier detection, their performance degrades when the training data is noisy (Zhou and Paffenroth [2017]).
[Figure 13: autoencoder architecture variants]
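The reconstruction-error idea can be sketched without a deep learning framework by using the PCA connection noted above. The code below is a stand-in, not a trained autoencoder: it finds the top principal direction with power iteration, treats projection onto it as the "encoder" and the mapping back as the "decoder", and scores a sample by its residual. The toy training data (roughly on the line y = 2x) and all function names are assumptions.

```python
import math

def fit_linear_codec(rows, iters=100):
    """Return the data mean and the top principal direction (power iteration)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[i] for r in rows) / n for i in range(d)]
    centered = [[r[i] - mu[i] for i in range(d)] for r in rows]
    # Unnormalized covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return mu, v

def reconstruction_error(x, mu, v):
    """Encode to a 1-D code, decode, and return the residual norm."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    code = sum(ci * vi for ci, vi in zip(c, v))   # "encoder"
    recon = [code * vi for vi in v]               # "decoder"
    return math.sqrt(sum((ci - ri) ** 2 for ci, ri in zip(c, recon)))

# Normal training data lying close to the line y = 2x (toy values).
normal = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1), (5.0, 9.9)]
mu, v = fit_linear_codec(normal)
err_on_line = reconstruction_error((6.0, 12.0), mu, v)   # follows the pattern
err_off_line = reconstruction_error((6.0, 2.0), mu, v)   # breaks the pattern
print(round(err_on_line, 3), round(err_off_line, 3))
```

A sample is flagged when its error exceeds a threshold chosen on held-out normal data; a nonlinear autoencoder follows the same scoring recipe with learned encoder and decoder networks.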

12 Relative Strengths and Weaknesses: Deep Anomaly Detection Methods

Each of the deep anomaly detection (DAD) techniques discussed in the previous sections has its own strengths and weaknesses. It is critical to understand which anomaly detection technique is best suited to a given problem context. Because DAD is an active research area, it is not feasible to provide such an understanding for every anomaly detection problem. Hence, in this section, we analyze the relative strengths and weaknesses of different categories of techniques for a few simple problem settings. Classification-based supervised DAD techniques, illustrated in Section 10.1, are the better choice in scenarios with an equal amount of labels for normal and anomalous instances. The computational complexity of a supervised DAD technique is a key aspect, especially when the technique is applied to a real domain. While classification-based supervised or semi-supervised techniques have expensive training times, testing is usually fast since it uses a pre-trained model. Unsupervised DAD techniques, presented in Section 10.5, are widely used since label acquisition is a costly and time-consuming process. Most unsupervised deep anomaly detection techniques require priors to be assumed on the anomaly distribution; hence the models are less robust in handling noisy data. Hybrid models, illustrated in Section 10.3, extract robust features within the hidden layers of a deep neural network and feed them to the best-performing classical anomaly detection algorithms. The hybrid approach is suboptimal because the anomaly detector is unable to influence representation learning in the hidden layers. The One-class Neural Networks (OC-NN) described in Section 10.4 combine the ability of deep networks to extract a progressively richer representation of data with a one-class objective, such as a hyperplane (Chalapathy et al. [2018a]) or a hypersphere (Ruff et al. [2018a]), to separate all the normal data points from the anomalous data points. Further research and exploration are necessary to better understand the benefits of this newly proposed architecture.
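The hypersphere variant of the one-class objective can be illustrated on fixed feature vectors. This is a deliberate simplification: the methods cited above learn the representation jointly with the objective, whereas the sketch below assumes the features are already given, takes the center as their mean, and uses the largest training distance as the radius. The data and function names are hypothetical.

```python
import math

def fit_hypersphere(normal_features):
    """Center = mean of normal features; radius = largest training distance."""
    n = len(normal_features)
    dim = len(normal_features[0])
    center = [sum(f[d] for f in normal_features) / n for d in range(dim)]
    radius = max(math.dist(f, center) for f in normal_features)
    return center, radius

def is_anomalous(feature, center, radius):
    """A point is flagged when it falls outside the hypersphere."""
    return math.dist(feature, center) > radius

# Feature vectors of normal samples (assumed to come from some encoder).
normal_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = fit_hypersphere(normal_features)
print(is_anomalous((0.5, 0.6), center, radius))  # inside the sphere
print(is_anomalous((3.0, 3.0), center, radius))  # far outside
```

In practice the radius is usually set from a quantile of the training distances rather than the maximum, so that a few noisy training points do not inflate the boundary.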

13 Conclusion

In this survey paper, we have discussed various research methods in deep learning-based anomaly detection along with their applications across various domains. The article discusses the challenges in deep anomaly detection and presents several existing solutions to these challenges. For each category of deep anomaly detection techniques, we present the assumptions regarding the notion of normal and anomalous data along with the category's strengths and weaknesses. The goal of this survey was to investigate and identify the various deep learning models for anomaly detection and to evaluate their suitability for a given dataset. When choosing a deep learning model for a particular domain or dataset, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. Deep learning-based anomaly detection is still an active research area, and possible future work would be to extend and update this survey as more sophisticated techniques are proposed.