[Python] When you need to compute the correlation coefficient of 1-D vectors 1,000,000+ times, what is the fastest way? (2)

Latest recommended article published on 2023-03-14 22:15:39

AnyaBee

Article link: https://blog.csdn.net/weixin_40006612/article/details/103402340

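1. Batch computation with np.corrcoef

In the previous post we already tried this method: 1,000,000 computations took 75.88 seconds, which is not terribly slow, but certainly not fast either.

But I just realized that there is actually no need for a double loop over i and j across the test data and the template data.

Instead, take one (100,) vector from the template data at a time and correlate it against all of the test data in a single call, which yields a (1000,) vector of correlation coefficients. Repeat this in a loop over i for 1000 iterations and stack the results into a matrix.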

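import time
import numpy as np

a = np.random.random((100, 1000))  # test data: 1000 column vectors of length 100
b = np.random.random((100, 1000))  # template data: 1000 column vectors of length 100
corrmat = np.zeros((b.shape[1], a.shape[1]))
tic1 = time.time()
for i in range(b.shape[1]):
    # Pass the template vector first so that row 0 of the result holds its
    # correlations with every column of a (entries 1 onward).
    corrmat[i] = np.corrcoef(b[:, i], a.T)[0, 1:]

toc1 = time.time()

print('Method 1 time: %.2f' % (toc1 - tic1))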

Output

Method 1 time: 11.52

The run takes only 11.52 seconds, nearly 6 times faster than the earlier stats.pearsonr() approach. np.corrcoef(), I misjudged you!
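As a sanity check on the batching idea, the sketch below compares the one-template-at-a-time result with plain pairwise np.corrcoef calls on a small problem. The array sizes, the fixed seed, and the names `batched`, `pairwise`, and `agree` are illustrative assumptions, not part of the benchmark above.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for illustration only
a = rng.random((100, 20))        # 20 test vectors of length 100 (as columns)
b = rng.random((100, 5))         # 5 template vectors of length 100 (as columns)

# Batched: one np.corrcoef call per template column.
batched = np.zeros((b.shape[1], a.shape[1]))
for i in range(b.shape[1]):
    # Row 0 of the stacked result belongs to b[:, i]; entries 1 onward are
    # its correlations with each column of a.
    batched[i] = np.corrcoef(b[:, i], a.T)[0, 1:]

# Reference: one np.corrcoef call per (template, test) pair.
pairwise = np.zeros_like(batched)
for i in range(b.shape[1]):
    for j in range(a.shape[1]):
        pairwise[i, j] = np.corrcoef(b[:, i], a[:, j])[0, 1]

agree = np.allclose(batched, pairwise)
print(batched.shape, agree)
```

Both loops compute the same coefficients; the batched version simply makes far fewer Python-level calls.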

A small aside: does stats.pearsonr() have a batch mode as well? Let's look at the documentation.

np.corrcoef() documentation

'''
Parameters
----------
x : array_like
    A 1-D or 2-D array containing multiple variables and observations.
    Each row of `x` represents a variable, and each column a single
    observation of all those variables. Also see `rowvar` below.
y : array_like, optional
    An additional set of variables and observations. `y` has the same
    shape as `x`.
'''

stats.pearsonr() documentation

'''
Parameters
    ----------
    x : (N,) array_like
        Input
    y : (N,) array_like
        Input
'''

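The difference is clear: the first argument of np.corrcoef() may be 2-D, whereas stats.pearsonr(), as documented above, requires both inputs to be 1-D arrays. So with this signature stats.pearsonr() has no batch mode at all.

2. An upgraded np.corrcoef

So what if both inputs are matrices? Would that be even faster?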

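a = np.random.random((100, 1000))
b = np.random.random((100, 1000))
n_b = b.shape[1]  # number of template vectors (b.size would be 100000, not 1000)
tic2 = time.time()
# Rows 0..n_b-1 of the stacked result belong to b, the remaining rows to a;
# the top-right block holds the b-versus-a correlations.
corrmat = np.corrcoef(b.T, a.T)[:n_b, n_b:]
toc2 = time.time()

print('Method 2 time: %.8f' % (toc2 - tic2))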

Output

Method 2 time: 0.05435991

Meow meow meow?
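To see why slicing works here: when np.corrcoef receives two 2-D inputs it stacks their rows and returns one square matrix over all variables, so the correlations between the two sets sit in the off-diagonal block. The sketch below checks this on a small problem; the sizes, seed, and the names `full`, `block`, and `row_by_row` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, for illustration only
a = rng.random((100, 20))        # 20 test vectors (columns)
b = rng.random((100, 5))         # 5 template vectors (columns)
n_b = b.shape[1]

# One call over all 5 + 20 variables: rows of b.T first, then rows of a.T.
full = np.corrcoef(b.T, a.T)     # shape (25, 25)

# Top-right block: templates (rows) versus test vectors (columns).
block = full[:n_b, n_b:]         # shape (5, 20)

# Reference: the one-template-at-a-time computation.
row_by_row = np.array(
    [np.corrcoef(b[:, i], a.T)[0, 1:] for i in range(n_b)]
)

same = np.allclose(block, row_by_row)
print(full.shape, block.shape, same)
```

Note that the single call also computes the b-versus-b and a-versus-a blocks, which are then discarded, so part of the work is still redundant.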

3. The magic trick

Source of the magic trick

GitHub repository of the magic trick

Code of the magic trick

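# AlmightyCorrcoefEinsumOptimized is not part of NumPy; it must be defined
# or imported from the code linked above before running this snippet.
a = np.random.random((100, 1000))
b = np.random.random((100, 1000))
tic3 = time.time()
corrmat = AlmightyCorrcoefEinsumOptimized(b, a)
toc3 = time.time()

print('Method 3 time: %.8f' % (toc3 - tic3))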

Output

Method 3 time: 0.02541614

┑(￣Д ￣)┍
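The linked function is not reproduced here. As a rough illustration of how an einsum-based column-wise correlation might be written, the sketch below centers each column, takes all pairwise dot products, and divides by the product of the column norms. The function name `columnwise_corr_einsum`, the argument order, and the array sizes are hypothetical, and this is not the code from the linked source.

```python
import numpy as np

def columnwise_corr_einsum(x, y):
    """Pearson correlation between every column of x and every column of y.

    x has shape (n, p), y has shape (n, q); the result has shape (p, q).
    Hypothetical helper for illustration only.
    """
    dx = x - x.mean(axis=0)                  # center each column of x
    dy = y - y.mean(axis=0)                  # center each column of y
    cov = np.einsum("np,nq->pq", dx, dy)     # all pairwise dot products
    ssx = np.einsum("np,np->p", dx, dx)      # sum of squares per column of x
    ssy = np.einsum("nq,nq->q", dy, dy)      # sum of squares per column of y
    return cov / np.sqrt(np.outer(ssx, ssy))

rng = np.random.default_rng(2)   # fixed seed, for illustration only
a = rng.random((100, 20))
b = rng.random((100, 5))

result = columnwise_corr_einsum(b, a)                       # shape (5, 20)
reference = np.corrcoef(b.T, a.T)[:b.shape[1], b.shape[1]:]
matches = np.allclose(result, reference)
print(result.shape, matches)
```

Unlike the stacked np.corrcoef call, this formulation computes only the cross block between the two sets of columns.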

4. Conclusion

The moral of this story: engineering-style trial and error is usually not the most fundamental way to solve a problem.
