python corrcoef: Why do NumPy correlate and corrcoef return different values, and how do you "normalize" a correlate in "full" mode?

weixin_39830225

I'm trying to do some time series analysis in Python using NumPy.

I have two medium-sized series, with 20k values each, and I want to check the sliding correlation between them.

corrcoef gives me a matrix of auto-correlation/correlation coefficients as output. That is not useful by itself in my case, as one of the series contains a lag.

The correlate function (in mode="full") returns a list of 40k elements that DOES look like the kind of result I'm aiming for (the peak value is as far from the center of the list as the lag would indicate), but the values are all weird: up to 500, when I was expecting something from -1 to 1.

I can't just divide it all by the max value; I know the max correlation isn't 1.

How can I normalize the "cross-correlation" (correlation in "full" mode) so that the returned values are the correlation at each lag step instead of those very large, strange values?

Solution

You are looking for normalized cross-correlation. This option isn't available yet in NumPy, but a patch is waiting for review that does just what you want. I would think it shouldn't be too hard to apply. Most of the patch is just docstring changes. The only lines of code that it adds are

    if normalize:
        a = (a - mean(a)) / (std(a) * len(a))
        v = (v - mean(v)) / std(v)

where a and v are the input NumPy arrays whose cross-correlation you are computing. It shouldn't be hard to either add these lines to your own copy of NumPy or just make a copy of the correlate function and add the lines there. Personally, I would do the latter if I chose to go this route.

Another, quite possibly better, alternative is to just normalize the input vectors before you send them to correlate. It's up to you which way you would like to do it.
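Normalizing the inputs first could look roughly like the sketch below. It applies the same two lines as the patch before calling np.correlate. The function name normalized_cross_correlation, the synthetic data, and the lag of 25 are made up for illustration and are not part of NumPy or the patch. The sketch assumes both series have the same length and a nonzero standard deviation.

```python
import numpy as np

def normalized_cross_correlation(a, v):
    """Hypothetical helper: z-score both inputs, then correlate in "full" mode."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Same normalization as the two lines quoted from the patch.
    a = (a - a.mean()) / (a.std() * len(a))
    v = (v - v.mean()) / v.std()
    return np.correlate(a, v, mode="full")

# Synthetic example: y is x delayed by true_lag samples.
rng = np.random.default_rng(0)
n, true_lag = 2000, 25
s = rng.standard_normal(n + true_lag)
x = s[true_lag:]
y = s[:n]

ncc = normalized_cross_correlation(y, x)
# In "full" mode, output index i corresponds to lag i - (len(v) - 1).
lags = np.arange(-(len(x) - 1), len(y))
best_lag = lags[np.argmax(ncc)]
zero_lag_value = ncc[len(x) - 1]
print(best_lag, ncc.max())
```

With equal-length inputs, the value at lag 0 matches np.corrcoef(y, x)[0, 1] and every value stays within [-1, 1]. Note that each lag is still divided by the full length rather than by the number of overlapping samples, so values shrink toward zero at large lags.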

By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation, except for dividing by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the population standard deviation vs. the sample standard deviation, and in my opinion it really won't make much of a difference.
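To see how small the discrepancy is, here is a quick check at lag 0, assuming equal-length inputs. The helper zero_lag_correlation and the test data are invented for this illustration. np.std defaults to the population form (ddof=0), which pairs with dividing by len(a). The sample form (ddof=1) pairs with len(a)-1. Mixing the two only rescales the result by (n-1)/n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
a = rng.standard_normal(n)
v = 0.6 * a + 0.8 * rng.standard_normal(n)

def zero_lag_correlation(a, v, ddof, divisor):
    """Hypothetical helper: normalized correlation at lag 0 only."""
    a_n = (a - a.mean()) / (a.std(ddof=ddof) * divisor)
    v_n = (v - v.mean()) / v.std(ddof=ddof)
    # mode="valid" on equal-length inputs returns the single zero-lag value.
    return np.correlate(a_n, v_n, mode="valid")[0]

pearson = np.corrcoef(a, v)[0, 1]
population = zero_lag_correlation(a, v, ddof=0, divisor=n)   # std(a) with len(a)
sample = zero_lag_correlation(a, v, ddof=1, divisor=n - 1)   # sample std with len(a)-1
mixed = zero_lag_correlation(a, v, ddof=1, divisor=n)        # sample std with len(a)

print(pearson, population, sample, mixed)
```

Both consistent pairings reproduce the Pearson coefficient from np.corrcoef. The mixed pairing is off by a factor of (n-1)/n, which is 0.999 for n = 1000.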
