python时间序列峰值检测_在实时时间序列数据的峰值信号检测

最新推荐文章于 2020-12-10 10:56:20 发布

weixin_39870238

最新推荐文章于 2020-12-10 10:56:20 发布

阅读量555

点赞数

文章标签： python时间序列峰值检测

Update: The best performing algorithm so far is this one.

This question seeks to explore the available robust methods or algorithms for detecting sudden peaks in real-time timeseries data.

I do not seek fast and obvious answers. I would like every answer to provide a different approach to the problem, complemented with the advantages and disadvantages of the proposed method.

Consider the following dataset

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...

1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...

1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...

1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

(Matlab format but it's not about the language but about the algorithm)

You can clearly see that there are three large peaks and some small peaks.

Specification

This dataset is a specific example of the class of timeseries datasets that the question is about. This class of datasets has two general features:

There is basic noise with a general mean

There are large 'peaks' or 'higher data points' that significantly deviate from the noise.

Let's also assume the following:

the width of the peaks cannot be determined beforehand

the height of the peaks clearly deviates from the other values

the used algorithm must calculate realtime (so change with each new datapoint)

Clearly, for such a situation, a boundary value needs to be constructed which triggers signals. However, in reality, this boundary value cannot be static and must be determined realtime based on an algorithm.

My Question: How can I calculate such boundary values realtime? What are well-known and applicable algorithms for such situations? (not this data set specifically)

Additionally, I would like to know the following:

Is it possible to detect peaks without delay given only what is known before that time?

What needs to be known about a peak to detect it (with some given delay)?

Well-founded/ creative alternatives and useful insights are all highly appreciated. (answers in any language are fine: it's about the algorithm)

解决方案

New (very) robust algorithm that uses the standard deviation

I have constructed a very well performing algorithm that signals when the data points are a specified number of standard deviations away from the moving mean. However, when a signal is detected, subsequent data points that are also a signal (so significantly away from the moving mean), will not corrupt the signal threshold. That is, the algorithm creates a 'new mean' and 'new st.dev.' in which the data points that are signals are not used. Therefore, the threshold remains uncorrupted and is able to correctly identify future signals too, without loss of performance. This works extremely well!

In order to display the power of this robust algorithm, I have prepared a demo in which the user can specify its own data. This little demo displays both how the algorithm works and why it is so useful.

The full working Matlab code for this demo:

function [] = RobustDetectionDemo()

%% SPECIFICATIONS

LAG = 10; % lag for the moving mean and moving st. dev.

DIFF = 3.5; % number of st. dev. from the mean to signal

INFLUENCE = 0.0; % when signal: how much is mean/st.dev. influenced?

% or e.g. 0.05/0.1 for influencing

DIRECTION = 'both'; % signal when 'up'/'down'/'both' from the mean

figure(1);

subplot(2,2,1);

title('Draw 30 data points');

ylim([0 5]); xlim([0 50]);

[x,y] = ginputExtra_realtime(30, true, LAG, DIFF, INFLUENCE, DIRECTION);

end

function [x y] = ginputExtra_realtime(n,booText, LAG, DIFF, INFLUENCE, DIRECTION)

if booText == true

bText = booText;

else

bText = false;

end

H = gca;

set(H, 'YLimMode', 'manual'); set(H, 'XLimMode', 'manual');

set(H, 'YLim', get(H,'YLim')); set(H, 'XLim', get(H,'XLim'));

numPoints = n; xg = []; yg = [];

for i=1:numPoints

[xi yi] = ginput(1);

xg = [xg xi]; yg = [yg yi];

if i == 1

hold on;

plot(H, xg(i),yg(i),'ro');

if bText text(xg(i),yg(i),num2str(i),'FontSize',12); end

else

plot(xg([i-1:i]),yg([i-1:i]),'r');

if bText text(xg(i),yg(i),num2str(i),'FontSize',12); end

if length(xg) > LAG

robustMA(xg, yg, LAG, DIFF, INFLUENCE, DIRECTION);

end

hold off;

x = xg; y = yg;

end

function [] = robustMA( x, y, lag, diff, influence, direction)

% robustMA :: Signal detection algorithm ::

% Author: Jean-Paul van Brakel

% ************************************************************ %

% TO BE USED FOR: *determining significant and sudden changes*

% ************************************************************ %

% x = x-axis data

% y = y-axis data

% lag = lag of moving mean and moving st.dev.

% diff = number of st.dev. away from the mean in order to give a signal

% influence = number between 0 and 1 that indicates influence of signals

% direction = 'up'/'down'/'both' which means the following:

% - 'up' : only signal for deviations ABOVE the mean

% - 'down': only signal for deviations BELOW the mean

% - 'both': signal for deviations ABOVE and BELOW the mean

p = y;

outputmean = tsmovavg(y,'s',lag,2);

outputstdev = movingstd(y,lag,'backward');

newMean = zeros(1, length(outputmean));

newStdev = zeros(1, length(outputmean));

signals = ones(1, length(outputmean));

newMean(lag-1) = outputmean(lag);

newStdev(lag-1) = outputstdev(lag);

for i = lag:length(outputmean)

if strcmp(direction, 'up')

if (p(i) > newMean(i-1)+diff*newStdev(i-1))

newMean(i) = (newMean(i-1) + influence*p(i))/(1+influence);

newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);

signals(i) = 2;

else

newMean(i) = (newMean(i-1)+p(i))/2;

newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2;

signals(i) = 1;

end

elseif strcmp(direction, 'down')

if (p(i) < newMean(i-1)-diff*newStdev(i-1))

newMean(i) = (newMean(i-1) + influence*p(i))/(1+influence);

newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);

signals(i) = 2;

else

newMean(i) = (newMean(i-1)+p(i))/2;

newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2;

signals(i) = 1;

end

elseif strcmp(direction, 'both')

if (p(i) > newMean(i-1)+diff*newStdev(i-1) || ...

p(i) < newMean(i-1)-diff*newStdev(i-1))

newMean(i) = (newMean(i-1) + influence*p(i))/(1+influence);

newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);

signals(i) = 2;

else

newMean(i) = (newMean(i-1)+p(i))/2;

newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2;

signals(i) = 1;

end

figure(1);

subplot(2,2,2);

hold on;

title('Algorithm output');

area(x, newMean+diff*newStdev, 'FaceColor', [0.9 0.9 0.9], 'EdgeColor', 'none');

area(x, newMean, 'FaceColor', [1 1 1], 'EdgeColor', 'none');

area(x, newMean, 'FaceColor', [0.9 0.9 0.9], 'EdgeColor', 'none');

area(x, newMean-diff*newStdev, 'FaceColor', [1 1 1], 'EdgeColor', 'none');

plot(x, p, ':r', 'LineWidth', 1, 'Color', 'black');

plot(x, newMean, 'LineWidth', 2, 'Color', 'red');

plot(x, newMean+newStdev, 'LineWidth', 2, 'Color', 'green');

plot(x, newMean-newStdev, 'LineWidth', 2, 'Color', 'green');

xlim([0 50]); ylim([0 5])

hold off;

subplot(2,2,4);

hold on;

title('Signal output');

stairs(x, signals, 'LineWidth', 2, 'Color', 'blue');

ylim([0 3]); xlim([0 50]);

hold off;

end

function s = movingstd(x,k,windowmode)

% movingstd: efficient windowed standard deviation of a time series

% usage: s = movingstd(x,k,windowmode)

% Movingstd uses filter to compute the standard deviation, using

% the trick of std = sqrt((sum(x.^2) - n*xbar.^2)/(n-1)).

% Beware that this formula can suffer from numerical problems for

% data which is large in magnitude.

% check for a windowmode

if (nargin<3) || isempty(windowmode)

% supply the default:

windowmode = 'central';

elseif ~ischar(windowmode)

error 'If supplied, windowmode must be a character flag.'

end

% check for a valid shortening.

valid = {'central' 'forward' 'backward'};

windowmode = lower(windowmode);

ind = strmatch(windowmode,valid);

if isempty(ind)

error 'Windowmode must be a character flag: ''c'', ''b'', or ''f''.'

else

windowmode = valid{ind};

end

% length of the time series

n = length(x);

% check for valid k

if (nargin<2) || isempty(k) || (rem(k,1)~=0)

error 'k was not provided or not an integer.'

end

switch windowmode

case 'central'

if k<1

error 'k must be at least 1 for windowmode = ''central''.'

end

if n

error 'k is too large for this short of a series and this windowmode.'

end

otherwise

if k<2

error 'k must be at least 2 for windowmode = ''forward'' or ''backward''.'

end

if (n

error 'k is too large for this short of a series.'

end

% Improve the numerical analysis by subtracting off the series mean

% this has no effect on the standard deviation.

x = x - mean(x);

% we will need the squared elements

x2 = x.^2;

% split into the three windowmode cases for simplicity

A = 1;

switch windowmode

case 'central'

B = ones(1,2*k+1);

s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/(2*k+1)))/(2*k));

s(k:(n-k)) = s((2*k):end);

case 'forward'

B = ones(1,k);

s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/k))/(k-1));

s(1:(n-k+1)) = s(k:end);

case 'backward'

B = ones(1,k);

s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/k))/(k-1));

end

% special case the ends as appropriate

switch windowmode

case 'central'

% repairs are needed at both ends

for i = 1:k

s(i) = std(x(1:(k+i)));

s(n-k+i) = std(x((n-2*k+i):n));

end

case 'forward'

% the last k elements must be repaired

for i = (k-1):-1:1

s(n-i+1) = std(x((n-i+1):n));

end

case 'backward'

% the first k elements must be repaired

for i = 1:(k-1)

s(i) = std(x(1:i));

end

The necessary parameters are:

LAG: lag for the moving mean and moving st. dev.

DIFF: number of st. dev. away from the mean to generate a signal

INFLUENCE: when there is a signal, how much is mean/st.dev. influenced? (number between 0-1)

DIRECTION: signal when deviation is 'up'/'down'/'both' away from the mean?

As you can see, I used the settings LAG=10; DIFF=3.5; INFLUENCE=0; for this demo. Feel free to fiddle around with these parameters and study the differences in performance of the algorithm.

Original explanation

I have constructed a new (very well performing) algorithm in which the number of standard deviations away from the mean is used as threshold.

The basic idea behind it is the following:

if (price(t) deviates more than mean(t-1) + k * std(t-1) )

{

// construct signal

new Mean : mean(t) = (mean(t-1)+influence*price(t))/(1+influence);

new Std : std(t) = (std(t-1)+influence*sqrt((price(t)-newMean(t-1))^2))/(1+influence)

} else

{

new Mean : mean(t) = (mean(t-1) + price(t)) / 2

new Std : std(t) = std(t-1) + sqrt((price(t) - mean(t-1))^2) /2

}

This small algorithm performs surprisingly well!

Especially because it doesn't use the real mean and standard deviation but construct a new one such that signals do not influence/ corrupt the signal threshold.

Advantages

Performs surprisingly well (no matter the length of time)

Very robust: adjusts for noise but still remains objective towards high peaks

Only needs 2 input parameters

Allows for different moving average structures (Simple, Exponential, Weighted etc.)

Disadvantages

Need to estimate the 2 input parameters, need to specify the influence parameter

For small general variation in the noise, the algorithm is not very useful

Examples for the sample data:

lag = 30, diff = 3, influence = 0

lag = 30, diff = 3.5, influence = 0

lag = 30, diff = 5, influence = 0

Code

Code for replicating in Matlab:

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...

1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...

1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...

1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

% SPECS

lag = 30; % lag of the algorithm

diff = 3.5; % number of standard deviations from the mean

influence = 0; % influence parameter (convergence to signals)

% SPECS

outputmean = tsmovavg(p,'s',lag,2);

outputstdev = movingstd(p,lag,'backward');

newMean = zeros(1, length(outputmean));

newStdev = zeros(1, length(outputmean));

signals = zeros(1, length(outputmean));

newMean(lag-1) = outputmean(lag);

newStdev(lag-1) = outputstdev(lag);

for i=lag:length(outputmean)

if (p(i) > newMean(i-1)+diff*newStdev(i-1))

newMean(i) = (newMean(i-1) + influence*p(i))/(1+influence);

newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);

signals(i) = 0.3;

else

newMean(i) = (newMean(i-1)+p(i))/2;

newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2;

signals(i) = 0;

end

figure;

hold all;

plot(p, ':r', 'LineWidth', 1, 'Color', 'black');

plot(signals, 'LineWidth', 2, 'Color', 'blue');

plot(newMean, 'LineWidth', 2, 'Color', 'red');

plot(newMean+newStdev, 'LineWidth', 2, 'Color', 'green');

Extra function movingstd: code

weixin_39870238

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python时间序列峰值检测_在实时时间序列数据的峰值信号检测

Update: The best performing algorithm so far is this one.This question seeks to explore the available robust methods or algorithms for detecting sudden peaks in real-time timeseries data.I do not seek...
复制链接

扫一扫