Time Series Anomaly Detection in the Era of Deep Learning

by Sarah Alnegheimish

In the previous post, we looked at time series data and anomalies. (If you haven’t done so already, you can read the article here.) In part 2, we will discuss time series reconstruction using generative adversarial networks (GAN)¹ and how reconstructing time series can be used for anomaly detection².

Time Series Anomaly Detection using Generative Adversarial Networks

Before we introduce our approach for anomaly detection (AD), let’s discuss one of today’s most interesting and popular models for deep learning: generative adversarial networks (GAN). The idea behind a GAN is that a generator (G), usually a neural network, attempts to construct fake images from random noise in order to fool a discriminator (D), which is also a neural network. D’s job is to distinguish “fake” examples from “real” ones. The two networks compete, each trying to be the best at its job. How powerful is this approach? Well, the figure below depicts some fake images generated by a GAN.

In this project, we leverage the same approach for time series. We adopt a GAN structure to learn the patterns of signals from an observed set of data and train the generator “G”. We then use “G” to reconstruct time series data, and calculate the error by finding the discrepancies between the real and reconstructed signal. We then use this error to identify anomalies.
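To make the "reconstruct, then measure the error" idea concrete, here is a minimal sketch in plain NumPy. It is not TadGAN's actual scoring method: the "reconstruction" is simply a clean sine wave standing in for a generator's output, and the helper names `reconstruction_errors` and `flag_anomalies`, as well as the mean-plus-k-standard-deviations threshold, are assumptions chosen for illustration.

```python
import numpy as np


def reconstruction_errors(real, reconstructed):
    """Point-wise absolute difference between a signal and its reconstruction."""
    real = np.asarray(real, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.abs(real - reconstructed)


def flag_anomalies(errors, k=3.0):
    """Flag points whose error exceeds mean + k * std (an illustrative rule only)."""
    threshold = errors.mean() + k * errors.std()
    return errors > threshold


# Toy data: a sine wave with one injected spike.
# The clean wave plays the role of the generator's reconstruction.
t = np.linspace(0, 4 * np.pi, 200)
clean = np.sin(t)
observed = clean.copy()
observed[120] += 5.0

errors = reconstruction_errors(observed, clean)
flags = flag_anomalies(errors)
print(np.flatnonzero(flags))  # indices of the flagged points
```

In this toy setup the reconstruction matches the signal everywhere except at the injected spike, so only that point stands out. With a real generator the reconstruction is imperfect everywhere, which is why the choice of error measure and threshold matters.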

Enough talking — let’s look at some data.

Tutorial

In this tutorial, we will use a Python library called Orion to perform anomaly detection. After following the installation instructions available on GitHub, we can get started and run the notebook. Alternatively, you can launch Binder to access the notebook directly.

Load Data

In this tutorial, we continue examining the NYC taxi data maintained by Numenta. Their repository, available here, is full of AD approaches and labeled data, organized as a series of timestamps and corresponding values. Each timestamp corresponds to the time of observation in Unix Time Format.

To load the data, simply pass the signal name into the load_signal function. (If you are loading your own data, pass the file path.)

In  [1]: from orion.data import load_signal, load_anomalies


In  [2]: signal = 'nyc_taxi'


# load signal
In  [3]: df = load_signal(signal)


# load ground truth anomalies
In  [4]: known_anomalies = load_anomalies(signal)


In  [5]: df.head(5)
Out [5]:
+-------------+-----------+
|  timestamp  |   value   |
+-------------+-----------+
| 1404165600  | 10844.0   |
| 1404167400  | 8127.0    |
| 1404169200  | 6210.0    |
| 1404171000  | 4656.0    |
| 1404172800  | 3820.0    |
+-------------+-----------+
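Unix timestamps are hard to read at a glance. As a small aside, the sketch below converts the first three rows shown above into readable datetimes using pandas and checks the sampling interval. The name `df_example` is made up for this illustration, and treating the timestamps as UTC is an assumption.

```python
import pandas as pd

# First three rows of the table above, re-typed by hand for illustration.
df_example = pd.DataFrame({
    "timestamp": [1404165600, 1404167400, 1404169200],
    "value": [10844.0, 8127.0, 6210.0],
})

# Interpret the integers as seconds since the Unix epoch (assumed UTC).
df_example["datetime"] = pd.to_datetime(df_example["timestamp"], unit="s", utc=True)

# Spacing between consecutive observations, in seconds.
step = df_example["timestamp"].diff().dropna().unique()

print(df_example[["datetime", "value"]])
print(step)
```

Consecutive timestamps in these rows are 1800 seconds apart, so each value covers a 30-minute window.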

Though tables are powerful data structures, it’s hard to visualize time series through numerical values alone. So, let’s go ahead and plot the data using plot(df, known_anomalies).

As we saw in the previous post, this data spans almost 7 months between 2014 and 2015. It contains five anomalies: NYC Marathon, Thanksgiving, Christmas, New Year’s Eve, and a major snow storm.

The central question of this post is: Can GANs be used to detect these anomalies? To answer this question, we have developed a time series anomaly detection pipeline using TadGAN, which is readily available in Orion. To use the model, pass the pipeline JSON name or path to the Orion API.

In  [1]: from orion import Orion


In  [2]: orion = Orion(
    ...:     pipeline='tadgan.json'
    ...: )


# fit the pipeline on the data then detect anomalies
In  [3]: anomalies = orion.fit_detect(df)


In  [4]: anomalies.head(5)
Out [4]:	
+-------------+-------------+------------+
|    start    |     end     |  severity  |
+-------------+-------------+------------+
| 1404442800  | 1404734400  | 0.521908   |
| 1408852800  | 1409050800  | 0.168267   |
| 1409378400  | 1409727600  | 0.319860   |
| 1411275600  | 1411488000  | 0.151349   |
| 1414823400  | 1415064600  | 0.158646   |
+-------------+-------------+------------+
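One simple way to relate detected intervals like these to ground-truth intervals is to check whether they overlap in time. The sketch below is an illustration only and not part of Orion: the helper `overlaps` is hypothetical, the two detected rows are copied from the output above, and the single "known" interval is invented for the example. It also assumes the ground-truth table has `start` and `end` columns.

```python
import pandas as pd


def overlaps(detected, known):
    """For each detected interval, report whether it overlaps any known interval."""
    result = []
    for d in detected.itertuples(index=False):
        # Two closed intervals overlap when each one starts before the other ends.
        hit = ((known["start"] <= d.end) & (known["end"] >= d.start)).any()
        result.append(bool(hit))
    return result


# First two detected intervals from the output above.
detected = pd.DataFrame({
    "start": [1404442800, 1408852800],
    "end": [1404734400, 1409050800],
    "severity": [0.521908, 0.168267],
})

# Hypothetical ground-truth interval, made up for illustration only.
known = pd.DataFrame({"start": [1404500000], "end": [1404600000]})

matches = overlaps(detected, known)
print(matches)
```

Here the first detected interval contains the invented ground-truth interval and the second does not touch it, so the result is `[True, False]`.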

The Orion API is a simple interface
