Time Series Anomaly Detection with the anomalize Library

Time series data have a wide range of applications, from tracking key performance indicators (KPIs) and understanding business trends to advanced modeling and forecasting. Anomaly detection is one such application. Business use cases such as credit card fraud detection require powerful techniques that can take time series data as input and identify anomalies in real time.

Anomaly detection is a well-researched domain with many tools and techniques available. Quite a few R packages exist for anomaly detection, such as tsoutliers and AnomalyDetection. However, I recently became convinced that anomalize is the most intuitive and easy-to-use library, for both novice and advanced data scientists.

So the purpose of today’s article is to demonstrate how to use the anomalize anomaly detection library in three easy steps.

Let’s dive right in.

Step 1: Install libraries and prepare data

As with any machine learning technique, preparing the data is probably the most important step you can take towards anomaly detection. On the positive side, you will likely use only one column at a time. So instead of handling hundreds of features as in other machine learning techniques, you can focus on the single column that is being modeled.

Make sure that you go through the usual ritual of data cleaning and preparation, such as taking care of missing values. One essential step is to make sure that the dataset ends up as a tibble or tbl_time object.
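
As a rough illustration of that preparation step, the sketch below starts from a small made-up data frame (the raw object, its column names and its values are assumptions for illustration, not the stock data used later), fills one missing value by carrying the previous observation forward, and converts the result to a tbl_time object. Forward filling is only one of several possible ways to treat missing values.

```r
library(dplyr)       # as_tibble(), arrange() and the %>% pipe
library(tidyr)       # fill()
library(tibbletime)  # as_tbl_time()

# Hypothetical raw data: a plain data.frame with one missing value
raw <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 10),
  value = c(10, 11, NA, 12, 13, 12, 14, 15, 14, 16)
)

prepared <- raw %>%
  as_tibble() %>%                         # data.frame -> tibble
  arrange(date) %>%                       # make sure rows are in time order
  fill(value, .direction = "down") %>%    # carry the last observation forward
  as_tbl_time(index = date)               # tibble -> tbl_time indexed by date

class(prepared)
```

The single column to be modeled here would be value; everything else in the pipeline only needs the date index.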

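Let’s first install (if necessary) and load the libraries we are going to need:

# install the libraries once, if they are not installed yet
# install.packages(c("anomalize", "tidyverse", "tibbletime", "tidyquant"))

# load libraries
library(anomalize)   # anomaly detection
library(tidyverse)   # data manipulation and the %>% pipe
library(tibbletime)  # time-aware tibbles (tbl_time)
library(tidyquant)   # stock price download via tq_get()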

For this demo we are in luck: no data processing is required. We are going to fetch stock price data using the tidyquant library.

# fetch data
data <- tq_get("AAPL",
               from = "2019-09-01",
               to   = "2020-02-28",
               get  = "stock.prices")

# take a peek
head(data)

Step 2: Implement anomalize

First, let’s implement anomalize with the data that we just fetched and then talk about what’s going on.

# anomalize
anomalized <- data %>%
  time_decompose(close) %>%
  anomalize(remainder) %>%
  time_recompose()

A few things are going on here: the library takes the input data and applies three separate functions to it.

First, the time_decompose() function decomposes the “close” column of the time series data into “observed”, “season”, “trend” and “remainder” components.

Second, the anomalize() function performs anomaly detection on the “remainder” column and adds three columns: “remainder_l1” and “remainder_l2” (the lower and upper limits of the normal range for the remainder) and “anomaly”. The last column is what we are after: it is “Yes” if the observation is an anomaly and “No” for a normal data point.

[Figure: Outputs of the anomalize implementation]

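The final function, time_recompose(), puts everything back together: it combines the “season” and “trend” columns created earlier with the remainder limits to produce “recomposed_l1” and “recomposed_l2”, the lower and upper limits of the normal range around the observed values.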
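
To see what each of the three functions contributes, the sketch below runs them one at a time and prints the column names after each step. It uses a made-up daily series instead of the AAPL download so that it runs offline; the demo object, its weekly pattern and the spike injected on day 60 are assumptions for illustration only.

```r
library(anomalize)
library(dplyr)   # tibble() and the %>% pipe

# Hypothetical daily series: upward trend + weekly cycle + noise
set.seed(42)
n <- 180
demo <- tibble(
  date  = seq(as.Date("2019-09-01"), by = "day", length.out = n),
  close = 100 + 0.1 * seq_len(n) + 2 * sin(2 * pi * seq_len(n) / 7) + rnorm(n)
)
demo$close[60] <- demo$close[60] + 30   # inject one obvious spike

# Step by step instead of one pipeline
decomposed <- demo %>% time_decompose(close, message = FALSE)
flagged    <- decomposed %>% anomalize(remainder)
recomposed <- flagged %>% time_recompose()

names(decomposed)   # date, observed, season, trend, remainder
names(flagged)      # adds remainder_l1, remainder_l2, anomaly
names(recomposed)   # adds recomposed_l1, recomposed_l2

# The injected spike should be among the flagged rows
recomposed$anomaly[60]
```

Running the steps separately like this is mainly useful for inspection; the single pipeline shown earlier produces the same final table.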

Step 3: Plot data with anomalies

For all intents and purposes, our anomaly detection was completed in the previous step. But we still need to visualize the data and the anomalies. Let’s do that and visually check the outliers.

# plot data with anomalies
anomalized %>%
  plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
  labs(title = "AAPL Anomalies")

The figure is pretty intuitive. Each dot is an observed data point in the dataset and the red circles are anomalies as identified by the model. The shaded band shows the lower and upper limits of the normal range, recomposed from the remainder limits.
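
Besides reading the plot, it can help to list the flagged observations as a table. The sketch below is one possible way to do that, again on a made-up daily series (the demo object and the spike injected on day 60 are assumptions for illustration); with the real data the same filter would be applied to the anomalized object from step 2.

```r
library(anomalize)
library(dplyr)   # tibble(), as_tibble(), filter(), select() and the %>% pipe

# Hypothetical daily series with one injected spike
set.seed(42)
n <- 180
demo <- tibble(
  date  = seq(as.Date("2019-09-01"), by = "day", length.out = n),
  close = 100 + 0.1 * seq_len(n) + 2 * sin(2 * pi * seq_len(n) / 7) + rnorm(n)
)
demo$close[60] <- demo$close[60] + 30

anomalized_demo <- demo %>%
  time_decompose(close, message = FALSE) %>%
  anomalize(remainder) %>%
  time_recompose()

# Keep only the rows flagged as anomalies, with the limits they violated
flagged_rows <- anomalized_demo %>%
  as_tibble() %>%
  filter(anomaly == "Yes") %>%
  select(date, observed, recomposed_l1, recomposed_l2)

flagged_rows
```

Each row of flagged_rows corresponds to one red circle in the plot, and the two recomposed columns are the edges of the shaded band on that date.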

What’s next?

If you have followed along this far, you have successfully implemented a sophisticated anomaly detection technique in three simple steps. That was easy because we used the default parameters and didn’t change anything. As we saw in the figure above, this out-of-the-box model performed pretty well in detecting outliers. However, you might come across complex time series data that require better model performance, which you can get by tuning the parameters in step 2. You can read the model documentation and the quick start guide to get a sense of the parameters, what they do, and how and when to change them.
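
As a starting point for that tuning, the sketch below sets a few of the documented arguments explicitly: method, frequency and trend in time_decompose(), and method, alpha and max_anoms in anomalize(). The specific values are arbitrary choices for illustration, not recommendations, and the demo series is made up so that the example runs offline.

```r
library(anomalize)
library(dplyr)   # tibble() and the %>% pipe

# Hypothetical daily series with one injected spike
set.seed(42)
n <- 180
demo <- tibble(
  date  = seq(as.Date("2019-09-01"), by = "day", length.out = n),
  close = 100 + 0.1 * seq_len(n) + 2 * sin(2 * pi * seq_len(n) / 7) + rnorm(n)
)
demo$close[60] <- demo$close[60] + 30

# Explicit decomposition settings instead of the "auto" defaults
decomposed <- demo %>%
  time_decompose(close, method = "stl",
                 frequency = "1 week", trend = "1 month",
                 message = FALSE)

# Smaller alpha widens the normal range, so fewer points are flagged;
# max_anoms caps the share of points that may be flagged at all.
# method = "gesd" is the other available detection method.
strict <- decomposed %>%
  anomalize(remainder, method = "iqr", alpha = 0.025, max_anoms = 0.1) %>%
  time_recompose()

loose <- decomposed %>%
  anomalize(remainder, method = "iqr", alpha = 0.1, max_anoms = 0.2) %>%
  time_recompose()

n_strict <- sum(strict$anomaly == "Yes")
n_loose  <- sum(loose$anomaly == "Yes")
c(strict = n_strict, loose = n_loose)
```

Comparing the two counts, and plotting both results with plot_anomalies(), gives a quick feel for how sensitive the detection is to alpha before committing to a setting.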

If you liked this article you can follow me on Twitter or LinkedIn.

Translated from: https://towardsdatascience.com/time-series-anomaly-detection-with-anomalize-library-67472003c003
