Outputting a feature correlation matrix in Python: efficient pairwise correlation of two feature matrices

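It appears that the correlation routine in question follows the definition of the Pearson correlation coefficient formula, applied to column-wise pairs from A and B.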
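The formula itself is not reproduced in this text. As a reference sketch, the computational form that matches the four parts p1 to p4 computed in the code further down can be written as follows, where the index names i (row), j (column of A) and k (column of B) are notation assumed here for illustration:

```latex
r_{kj} =
\frac{N \sum_i a_{ij} b_{ik} \;-\; \sum_i a_{ij} \sum_i b_{ik}}
     {\sqrt{\left[ N \sum_i a_{ij}^2 - \Bigl(\sum_i a_{ij}\Bigr)^2 \right]
            \left[ N \sum_i b_{ik}^2 - \Bigl(\sum_i b_{ik}\Bigr)^2 \right]}}
```

Here N is the number of rows shared by A and B. The two terms in the numerator correspond to p1 and p2, and the two bracketed factors in the denominator correspond to p4 and p3.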

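Based on this formula, the computation is easy to vectorize, because the pairwise computations between the columns of A and B are independent of one another. Here is one vectorized solution:

import numpy as np

# Get number of rows in either A or B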

N = B.shape[0]

# Store column-wise sums of A and B, as they are used in a few places
sA = A.sum(0)
sB = B.sum(0)

# There are four parts in the formula; compute them one by one
p1 = N*np.einsum('ij,ik->kj',A,B)
p2 = sA*sB[:,None]
p3 = N*((B**2).sum(0)) - (sB**2)
p4 = N*((A**2).sum(0)) - (sA**2)

# Finally compute the Pearson correlation coefficient as a 2D array
pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))

# Get the element corresponding to absolute argmax along the columns
out = pcorr[np.nanargmax(np.abs(pcorr),axis=0),np.arange(pcorr.shape[1])]
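The einsum and broadcasting steps carry most of the vectorization, so a small check of their shapes may help. The sketch below uses made-up array sizes and a hypothetical name `p1_core`; it only illustrates that the einsum call yields one row per column of B and one column per column of A, and that `sA*sB[:,None]` is an outer product laid out the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))   # 5 rows, 3 columns (sizes chosen only for illustration)
B = rng.random((5, 2))   # 5 rows, 2 columns

# Sum over the shared row index i of A[i, j] * B[i, k], stored at position [k, j].
p1_core = np.einsum('ij,ik->kj', A, B)
print(p1_core.shape)                      # one row per B column, one column per A column
print(np.allclose(p1_core, B.T @ A))      # compares against the plain matrix product

# Product of column sums for every (B column, A column) pair via broadcasting.
sA, sB = A.sum(0), B.sum(0)
p2 = sA * sB[:, None]
print(p2.shape)
print(np.allclose(p2, np.outer(sB, sA)))  # compares against an explicit outer product
```

Because both arrays share the (columns of B, columns of A) layout, they can be subtracted elementwise without any further reshaping.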

Sample run:

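1) Inputs: two 2D arrays, A with 4 columns and B with 6 columns, as reflected in the shapes of the outputs below.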

2) Original loopy code run:

In [14]: high_corr_out = np.zeros(A.shape[1])
    ...: for A_col in range(A.shape[1]):
    ...:     high_corr = 0
    ...:     for B_col in range(B.shape[1]):
    ...:         corr,_ = pearsonr(A[:,A_col], B[:,B_col])
    ...:         high_corr = max_absolute(high_corr, corr)
    ...:     high_corr_out[A_col] = high_corr
    ...:

In [15]: high_corr_out
Out[15]: array([ 0.8067843 ,  0.95678152,  0.74016181, -0.85127779])

3) Proposed code run:

In [16]: N = B.shape[0]
    ...: sA = A.sum(0)
    ...: sB = B.sum(0)
    ...: p1 = N*np.einsum('ij,ik->kj',A,B)
    ...: p2 = sA*sB[:,None]
    ...: p3 = N*((B**2).sum(0)) - (sB**2)
    ...: p4 = N*((A**2).sum(0)) - (sA**2)
    ...: pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))
    ...: out = pcorr[np.nanargmax(np.abs(pcorr),axis=0),np.arange(pcorr.shape[1])]
    ...:

In [17]: pcorr # Pearson Correlation Coefficient array
Out[17]:
array([[ 0.41895565, -0.5910935 , -0.40465987,  0.5818286 ],
       [ 0.66609445, -0.41950457,  0.02450215,  0.64028344],
       [-0.64953314,  0.65669916,  0.30836196, -0.85127779],
       [-0.41917583,  0.59043266,  0.40364532, -0.58144102],
       [ 0.8067843 ,  0.07947386,  0.74016181,  0.53165395],
       [-0.1613146 ,  0.95678152,  0.62107101, -0.4215393 ]])

In [18]: out # elements corresponding to absolute argmax along columns
Out[18]: array([ 0.8067843 ,  0.95678152,  0.74016181, -0.85127779])
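To make the final selection step concrete, the sketch below re-enters the pcorr values printed in Out[17] and repeats the indexing. The names `rows` and `cols` are introduced here only for readability. For each column, `np.nanargmax` on the absolute values gives the row of the strongest correlation, and pairing those row indices with the column indices picks out the signed value.

```python
import numpy as np

# Values copied from Out[17]: rows are columns of B, columns are columns of A.
pcorr = np.array([
    [ 0.41895565, -0.5910935 , -0.40465987,  0.5818286 ],
    [ 0.66609445, -0.41950457,  0.02450215,  0.64028344],
    [-0.64953314,  0.65669916,  0.30836196, -0.85127779],
    [-0.41917583,  0.59043266,  0.40364532, -0.58144102],
    [ 0.8067843 ,  0.07947386,  0.74016181,  0.53165395],
    [-0.1613146 ,  0.95678152,  0.62107101, -0.4215393 ],
])

# Row index of the largest magnitude in each column (NaNs would be ignored).
rows = np.nanargmax(np.abs(pcorr), axis=0)
cols = np.arange(pcorr.shape[1])

# Pairwise (row, column) indexing keeps the sign of the selected entry.
out = pcorr[rows, cols]
print(rows)  # [4 5 4 2]
print(out)
```

The last column shows why the absolute value is applied only when choosing the index: the selected entry, -0.85127779, keeps its negative sign.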

Runtime test:

In [36]: A = np.random.rand(4000,40)

In [37]: B = np.random.rand(4000,144)

In [38]: np.allclose(org_app(A,B),proposed_app(A,B))
Out[38]: True

In [39]: %timeit org_app(A,B) # Original approach
1 loops, best of 3: 1.35 s per loop

In [40]: %timeit proposed_app(A,B) # Proposed vectorized approach
10 loops, best of 3: 39.1 ms per loop
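The functions `org_app` and `proposed_app` are not defined in the text. A minimal sketch of what they might look like follows, assuming they simply wrap the loop and the vectorized code shown earlier. Two parts are assumptions rather than the original code: `max_absolute` is a hypothetical helper that returns whichever argument has the larger magnitude, and `np.corrcoef` stands in for `pearsonr` so that the sketch needs only NumPy. The array sizes are smaller than in the test above, so the timings will not match the figures reported there.

```python
import numpy as np
from timeit import timeit

def max_absolute(a, b):
    """Hypothetical helper: return whichever value has the larger magnitude."""
    return a if abs(a) >= abs(b) else b

def org_app(A, B):
    """Loop version; np.corrcoef is an assumed stand-in for pearsonr."""
    out = np.zeros(A.shape[1])
    for j in range(A.shape[1]):
        high_corr = 0
        for k in range(B.shape[1]):
            corr = np.corrcoef(A[:, j], B[:, k])[0, 1]
            high_corr = max_absolute(high_corr, corr)
        out[j] = high_corr
    return out

def proposed_app(A, B):
    """Vectorized version, following the four-part formula from the post."""
    N = B.shape[0]
    sA = A.sum(0)
    sB = B.sum(0)
    p1 = N * np.einsum('ij,ik->kj', A, B)
    p2 = sA * sB[:, None]
    p3 = N * (B**2).sum(0) - sB**2
    p4 = N * (A**2).sum(0) - sA**2
    pcorr = (p1 - p2) / np.sqrt(p4 * p3[:, None])
    return pcorr[np.nanargmax(np.abs(pcorr), axis=0), np.arange(pcorr.shape[1])]

rng = np.random.default_rng(0)
A = rng.random((400, 8))
B = rng.random((400, 12))

# Both versions should agree up to floating-point tolerance.
print(np.allclose(org_app(A, B), proposed_app(A, B)))

# Rough wall-clock comparison over a few repetitions (seconds, machine dependent).
print(timeit(lambda: org_app(A, B), number=3))
print(timeit(lambda: proposed_app(A, B), number=3))
```

Outside IPython, `timeit.timeit` plays the role of the `%timeit` magic used in the session above.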
