Python multidimensional array correlation: computing the correlation coefficient between two multidimensional arrays

Latest recommended article published on 2023-08-09 11:13:27

weixin_39980929


I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).

What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.

I'm expecting my output to be an array with the shape N X M.

N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
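In symbols (the notation a_{nt}, b_{mt} is introduced here for illustration and is not from the original post): if a_{nt} are the entries of the N × T array and b_{mt} those of the M × T array, the desired N × M output has entries

```latex
r_{nm} = \frac{\sum_{t=1}^{T} (a_{nt} - \bar{a}_n)(b_{mt} - \bar{b}_m)}
              {\sqrt{\sum_{t=1}^{T} (a_{nt} - \bar{a}_n)^2}\,\sqrt{\sum_{t=1}^{T} (b_{mt} - \bar{b}_m)^2}},
\qquad
\bar{a}_n = \frac{1}{T}\sum_{t=1}^{T} a_{nt}, \quad
\bar{b}_m = \frac{1}{T}\sum_{t=1}^{T} b_{mt}
```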

Here are some things to note:

The numpy function correlate requires input arrays to be one-dimensional.

The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.

The scipy.stats function pearsonr requires input arrays to be one-dimensional.
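As a sketch of the two baselines implied by these notes (numpy only; the function names corr_loop and corr_via_corrcoef are hypothetical): the double loop the question wants to avoid, and np.corrcoef. In current NumPy, np.corrcoef(A, B) only needs the two arrays to have the same number of columns T, not the same shape. It stacks the rows and returns the full (N+M) × (N+M) matrix, so the N × M answer is its top-right block, at the cost of also computing the A-vs-A and B-vs-B blocks.

```python
import numpy as np

def corr_loop(A, B):
    # Baseline the question wants to avoid: one call per (n, m) pair of rows.
    # scipy.stats.pearsonr(A[n], B[m])[0] would give the same value here.
    N, M = A.shape[0], B.shape[0]
    out = np.empty((N, M))
    for n in range(N):
        for m in range(M):
            out[n, m] = np.corrcoef(A[n], B[m])[0, 1]
    return out

def corr_via_corrcoef(A, B):
    # np.corrcoef stacks the rows of A and B (same T required) and returns
    # the full (N+M) x (N+M) matrix; the top-right N x M block is the answer.
    N = A.shape[0]
    return np.corrcoef(A, B)[:N, N:]

rng = np.random.default_rng(0)
A = rng.random((4, 10))   # N x T
B = rng.random((3, 10))   # M x T
print(corr_via_corrcoef(A, B).shape)                          # (4, 3)
print(np.allclose(corr_loop(A, B), corr_via_corrcoef(A, B)))  # True
```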

Solution

Correlation (default 'valid' case) between two 2D arrays:

You can simply use matrix multiplication with np.dot:

out = np.dot(arr_one,arr_two.T)

The "valid"-mode correlation of each pairwise row combination (row1, row2) of the two input arrays corresponds to the matrix-multiplication result at position (row1, row2).
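A small check of that claim (array sizes, the seed, and the chosen pair n, m are arbitrary; arr_one and arr_two follow the snippet above): for two rows of equal length T, np.correlate in its default 'valid' mode returns a single value, the sum of elementwise products, which is exactly the (row1, row2) entry of the matrix product. Note that this is the raw, un-normalized correlation, not the Pearson coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
arr_one = rng.random((3, 5))   # N x T
arr_two = rng.random((4, 5))   # M x T

# All pairwise 'valid' correlations in one matrix product.
out = np.dot(arr_one, arr_two.T)   # shape (N, M) = (3, 4)

# For equal-length 1-D inputs, 'valid' mode yields a single value:
# the sum of elementwise products, i.e. the dot product of the two rows.
n, m = 2, 1
single = np.correlate(arr_one[n], arr_two[m], mode='valid')
print(single.shape)                      # (1,)
print(np.isclose(single[0], out[n, m]))  # True
```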

Row-wise Correlation Coefficient calculation for two 2D arrays:

import numpy as np

def corr2_coeff(A, B):
    # Rowwise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
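A quick usage sketch: the function is repeated so the snippet runs on its own, and the sizes and seed are arbitrary. It checks the N × M output shape, compares the result against the off-diagonal block of np.corrcoef, and shows one caveat of the formula: a constant (zero-variance) row produces 0/0, i.e. nan, for its whole row of coefficients.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same computation as above: centre each row, then normalise.
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(2)
A = rng.random((6, 50))   # N x T
B = rng.random((4, 50))   # M x T

C = corr2_coeff(A, B)
print(C.shape)   # (6, 4)

# Cross-check against the N x M block of NumPy's full correlation matrix.
print(np.allclose(C, np.corrcoef(A, B)[:6, 6:]))   # True

# Caveat: a constant row has zero variance, so its coefficients are 0/0.
A[0] = 1.0
with np.errstate(invalid='ignore'):
    print(np.isnan(corr2_coeff(A, B)[0]).all())   # True
```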

Benchmarking

This section compares the runtime of the proposed approach against generate_correlation_map and the loop-based pearsonr approach listed in the other answer (taken from the function test_generate_correlation_map(), without the value-correctness verification code at its end). Note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed below.
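For readers without IPython, a rough plain-Python equivalent of the timing setup might look like the following. The wrapper corr2_coeff_checked and its error message are hypothetical, written only to mirror the up-front column check described above; timeit.repeat with a best-of-3 mimics the older %timeit output format. No timings are shown because they depend on the machine.

```python
import timeit
import numpy as np

def corr2_coeff(A, B):
    # Row-wise Pearson correlation, as in the answer above.
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

def corr2_coeff_checked(A, B):
    # Hypothetical wrapper: the equal-column-count check done before timing.
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns (T)")
    return corr2_coeff(A, B)

if __name__ == "__main__":
    A = np.random.rand(1000, 100)
    B = np.random.rand(1000, 100)
    loops = 10
    # Best of 3 repeats, each running `loops` calls; report per-call time.
    times = timeit.repeat(lambda: corr2_coeff_checked(A, B), repeat=3, number=loops)
    print(f"{loops} loops, best of 3: {min(times) / loops * 1e3:.1f} ms per loop")
```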

Case #1:

In [106]: A = np.random.rand(1000,100)

In [107]: B = np.random.rand(1000,100)

In [108]: %timeit corr2_coeff(A,B)

100 loops, best of 3: 15 ms per loop

In [109]: %timeit generate_correlation_map(A, B)

100 loops, best of 3: 19.6 ms per loop

Case #2:

In [110]: A = np.random.rand(5000,100)

In [111]: B = np.random.rand(5000,100)

In [112]: %timeit corr2_coeff(A,B)

1 loops, best of 3: 368 ms per loop