python correlate: Explaining (and comparing) the output of numpy.correlate

Latest recommended article published on 2024-05-28 14:42:07

天接云涛

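This blog post explores how to use numpy.correlate in Python to determine whether two signals are strongly correlated, and compares it with MATLAB's xcorr function. It uses examples to explain what the output of numpy.correlate means, in particular the cross-correlation values at different lags. The author finds that the output of numpy.correlate cannot be used directly to compare how well different signals correlate, because it gives the degree of match at each lag rather than a single global similarity measure.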

I have looked at this question but it hasn't really given me any answers.

Essentially, how can I determine whether a strong correlation exists using np.correlate? I expect the same output as I get from MATLAB's xcorr with the coeff option, which I can understand (1 is a strong correlation at lag l and 0 is no correlation at lag l). However, np.correlate produces values greater than 1, even when the input vectors have been normalised to lie between 0 and 1.

Example input

import numpy as np

x = np.random.rand(10)

y = np.random.rand(10)

np.correlate(x, y, 'full')

This gives the following output:

array([ 0.15711279, 0.24562736, 0.48078652, 0.69477838, 1.07376669,

1.28020871, 1.39717118, 1.78545567, 1.85084435, 1.89776181,

1.92940874, 2.05102884, 1.35671247, 1.54329503, 0.8892999 ,

0.67574802, 0.90464743, 0.20475408, 0.33001517])

How can I tell what is a strong correlation and what is a weak one if I don't know what the maximum possible correlation value is?

Another example:

In [10]: x = [0,1,2,1,0,0]

In [11]: y = [0,0,1,2,1,0]

In [12]: np.correlate(x, y, 'full')

Out[12]: array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])

Edit: This was a badly asked question, but the marked answer does answer what was asked. It is worth noting what I found while digging around in this area: you cannot compare outputs from cross-correlation. In other words, it would not be valid to use the outputs from cross-correlation to say signal x is better correlated to signal y than signal z. Cross-correlation does not provide this kind of information.

Solution

numpy.correlate is under-documented. I think that we can make sense of it, though. Let's start with your sample case:

>>> import numpy as np

>>> x = [0,1,2,1,0,0]

>>> y = [0,0,1,2,1,0]

>>> np.correlate(x, y, 'full')

array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])

Those numbers are the cross-correlations for each of the possible lags. To make that more clear, let's put the lag numbers above the correlations:

>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)

array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],

[ 0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0]])

Here, we can see that the cross-correlation reaches its peak at a lag of -1. If you look at x and y above, that makes sense: if one shifts y to the left by one place, it matches x exactly.
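To see where each number comes from, here is a minimal pure-Python sketch of the sliding dot product that, for real-valued inputs, should reproduce the 'full' output shown above. The helper name cross_correlate_full is hypothetical (it is not part of NumPy), and the sketch assumes the definition c[k] = sum over n of a[n + k] * v[n], with products that fall outside either sequence treated as zero.

```python
def cross_correlate_full(a, v):
    """Hypothetical helper: c[k] = sum_n a[n + k] * v[n] for every lag k.

    Assumes real-valued sequences; terms where n + k falls outside `a`
    contribute nothing, which is equivalent to zero padding.
    """
    # Lags run from -(len(v) - 1) up to len(a) - 1, one per output value.
    lags = list(range(-(len(v) - 1), len(a)))
    values = []
    for k in lags:
        total = 0
        for n in range(len(v)):
            if 0 <= n + k < len(a):
                total += a[n + k] * v[n]
        values.append(total)
    return lags, values


x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]

lags, values = cross_correlate_full(x, y)
peak_lag = lags[values.index(max(values))]

print(values)    # [0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0]
print(peak_lag)  # -1
```

Pairing each value with its lag this way avoids hard-coding np.arange(-5, 6), which is only correct when both inputs have length 6.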

To verify this, let's try again, this time shifting y further:

>>> y = [0, 0, 0, 0, 1, 2]

>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)

array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],

[ 0, 2, 5, 4, 1, 0, 0, 0, 0, 0, 0]])

Now, the correlation peaks at a lag of -3, meaning that the best match between x and y occurs when y is shifted to the left by 3 places.
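As for the scale of the values: the raw output is an unnormalised sum of products, so its size depends on the length and amplitude of the inputs. One way to get values bounded by 1 in magnitude is to divide by sqrt(sum(x**2) * sum(y**2)), which, as far as I understand it, is the scaling that MATLAB's coeff option applies. The sketch below is an illustration under that assumption, not a drop-in replacement for xcorr. The function name normalized_correlate is hypothetical, the inputs are assumed to be real-valued with non-zero energy, and no mean is subtracted, so the result is not a Pearson correlation coefficient.

```python
import numpy as np


def normalized_correlate(a, v):
    """Hypothetical helper: full cross-correlation scaled so |value| <= 1.

    Assumes real-valued inputs that are not all zero (otherwise the
    scale factor is 0 and the division is undefined).
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    raw = np.correlate(a, v, 'full')
    # By the Cauchy-Schwarz inequality, |raw[k]| can never exceed this value.
    scale = np.sqrt(np.dot(a, a) * np.dot(v, v))
    return raw / scale


x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]

coeff = normalized_correlate(x, y)
lags = np.arange(-(len(y) - 1), len(x))

print(lags[np.argmax(coeff)])  # lag with the best match: -1
print(coeff.max())             # 1.0, since y is an exact shifted copy of x
```

With this scaling, a value of 1 at some lag means one input is an exact positive multiple of the other after shifting by that lag; smaller values indicate a weaker match at that lag.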