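Introduction
One of the most popular anomaly detection projects on GitHub is Twitter's AnomalyDetection: https://github.com/twitter/AnomalyDetection
Twitter's introduction to it: https://blog.twitter.com/engineering/en_us/a/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series.html
However, it is only available as an R package.
Several people have since reimplemented it in Python: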
- https://github.com/roland-hochmuth/BreakoutDetection
- https://github.com/indeedeng/anomaly-detection
- https://pypi.org/project/pyculiarity/
I tried them, and the friendliest to use was pyculiarity: https://pypi.org/project/pyculiarity/
Installation
pip install pyculiarity
During installation I hit this error:
ImportError: cannot import name 'TimeSeries'
Upgrading statsmodels fixed it:
pip install statsmodels --upgrade
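To confirm that the upgrade actually took effect, it can help to print the installed versions. The snippet below is a minimal sketch using only the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is my own and is not part of pyculiarity or statsmodels.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this post depends on.
for name in ("pyculiarity", "statsmodels", "pandas"):
    print(name, installed_version(name) or "not installed")
```

If statsmodels still shows an old version here, pip probably installed into a different Python environment than the one running the script.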
Usage
from pyculiarity import detect_ts
import matplotlib.pyplot as plt
import pandas as pd

# raw_data.csv is available at https://github.com/zrnsm/pyculiarity/blob/master/tests/raw_data.csv
df = pd.read_csv('raw_data.csv', usecols=['timestamp', 'count'])

# Pass a copy to detect_ts, in case it modifies its input, so that df
# keeps the original values for plotting.
results = detect_ts(df.copy(), max_anoms=0.007, direction='both')

# Keep the parsed timestamps in local variables; assigning df.time = ...
# would only set an attribute on the DataFrame, not create a column.
times = pd.to_datetime(df['timestamp'])
anoms = results['anoms']
anom_times = pd.to_datetime(anoms['timestamp'])
print(anom_times)

# Plot the full series, with the detected anomalies as red dots.
plt.plot(times, df['count'])
plt.plot(anom_times, anoms['anoms'], 'ro')
plt.grid(True)
plt.show()
Result: the red points are the detected anomalies.
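To get a feel for what the max_anoms and direction arguments control, here is a toy sketch. It is not the S-H-ESD algorithm that AnomalyDetection and pyculiarity use: it has no seasonal decomposition and no statistical test, and simply flags points by a robust z-score based on the median and MAD. The function name robust_outliers and the threshold parameter are my own assumptions for illustration. What it does mirror is the meaning of the two arguments: max_anoms caps the flagged points at a fraction of the data, and direction selects positive spikes ('pos'), negative dips ('neg'), or both.

```python
from statistics import median


def robust_outliers(values, max_anoms=0.007, direction="both", threshold=3.0):
    """Toy stand-in for illustration only (not S-H-ESD).

    Flags points whose robust z-score exceeds `threshold`, keeps at most
    int(max_anoms * len(values)) of the most extreme ones, and returns
    their indices in ascending order.
    """
    n = len(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation

    candidates = []
    for i, v in enumerate(values):
        z = (v - med) / scale
        if direction == "pos":
            score = z        # only unusually high values
        elif direction == "neg":
            score = -z       # only unusually low values
        else:
            score = abs(z)   # both directions
        if score > threshold:
            candidates.append((score, i))

    cap = int(max_anoms * n)  # max_anoms is a fraction of the data
    top = sorted(candidates, reverse=True)[:cap]
    return sorted(i for _, i in top)


# Synthetic series: values cycle through 9, 10, 11 with one spike and one dip.
series = [float([9, 10, 11][i % 3]) for i in range(100)]
series[40] = 100.0
series[70] = -100.0

print(robust_outliers(series, max_anoms=0.05, direction="both"))
print(robust_outliers(series, max_anoms=0.05, direction="pos"))
print(robust_outliers(series, max_anoms=0.01, direction="both"))
```

With max_anoms=0.007 as in the script above, at most 0.7% of the rows can be reported as anomalies, so raising that value is the first thing to try if obvious spikes are being missed.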