Python matrix computation is very slow: extremely slow row summation on a SciPy sparse LIL matrix

I have written this Python code, which gives the expected results but is extremely slow. The bottleneck is summing multiple rows of a scipy.sparse.lil_matrix. How can I make it fast?

# D1 is a 1.5M x 1.3M sparse matrix, read as scipy.sparse.lil_matrix.

# D2 is a 1.5M x 111 matrix, read as numpy.array

# F1 is a csv file, read using csv.reader

for row in F1:
    user_id = row[0]
    clust = D2[user_id, 110]
    neighbors = D2[ D2[:, 110] == clust][:,1]
    score = np.zeros(1300000)
    for neigh in neighbors:
        score = score + D1 [neigh, :] # the most expensive operation
    toBeWritten = np.argsort(score)[:,::-1].A[0,:]

Please also let me know if anything else here is suboptimal.

Solution

First, a demo with a very small matrix:

In [523]: idx=np.arange(0,8,2)

In [526]: D=np.arange(24).reshape(8,3)

In [527]: Dll=sparse.lil_matrix(D)

In [528]: D[idx,:].sum(axis=0)

Out[528]: array([36, 40, 44])

In [529]: Dll[idx,:].sum(axis=0)

Out[529]: matrix([[36, 40, 44]], dtype=int32)

In [530]: timeit D[idx,:].sum(axis=0)

100000 loops, best of 3: 17.3 µs per loop

In [531]: timeit Dll[idx,:].sum(axis=0)

1000 loops, best of 3: 1.16 ms per loop

In [532]: score=np.zeros(3) # your looping version

In [533]: for i in idx:
   .....:     score = score + Dll[i,:]

In [534]: score

Out[534]: matrix([[ 36., 40., 44.]])

In [535]: %%timeit
   .....: score=np.zeros(3)
   .....: for i in idx:
   .....:     score = score + Dll[i,:]
   .....:

100 loops, best of 3: 2.76 ms per loop

For some operations the csr format is faster:

In [537]: timeit Dll.tocsr()[idx,:].sum(axis=0)

1000 loops, best of 3: 955 µs per loop

or if I preconvert to csr:

In [538]: Dcsr=Dll.tocsr()

In [539]: timeit Dcsr[idx,:].sum(axis=0)

1000 loops, best of 3: 724 µs per loop

Still slow relative to dense.

I was going to talk about working with the data attributes of the sparse matrix as a way of selecting rows faster. But if the only purpose of selecting these rows is to sum their values, we don't need to do that.

Sparse matrices compute row or column sums as a matrix product with a column or row matrix of ones. I just answered another question the same way:

https://stackoverflow.com/a/37120235/901925

Efficiently compute columnwise sum of sparse array where every non-zero element is 1

For example:

In [588]: I=np.asmatrix(np.zeros((1,Dll.shape[0])))

In [589]: I[:,idx]=1

In [590]: I

Out[590]: matrix([[ 1., 0., 1., 0., 1., 0., 1., 0.]])

In [591]: I*Dll

Out[591]: matrix([[ 36., 40., 44.]])

In [592]: %%timeit
   .....: I=np.asmatrix(np.zeros((1,Dll.shape[0])))
   .....: I[:,idx]=1
   .....: I*Dll
   .....:

1000 loops, best of 3: 919 µs per loop

For this small matrix the product did not help the speed with Dll, but with Dcsr in its place the time drops to 215 µs (csr is much better for math). With large matrices the advantage of this product version should grow.
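As a rough sketch of the product version on a csr matrix: the helper below (sum_rows is a made-up name, not a SciPy function) builds the indicator row as a sparse 1 x n_rows matrix, so it stays small even when the matrix has millions of rows, and multiplies it with the matrix. It assumes the matrix was converted with tocsr() once, outside any loop, and that the row indices are integers.

```python
import numpy as np
from scipy import sparse


def sum_rows(mat_csr, row_idx):
    """Sum the selected rows of a CSR matrix with a single sparse product.

    sum_rows is a hypothetical helper name used for illustration only.
    """
    row_idx = np.asarray(row_idx)
    n_rows = mat_csr.shape[0]
    # Sparse 1 x n_rows indicator; repeated indices are summed on
    # construction, so a row listed twice is counted twice.
    ones = np.ones(len(row_idx), dtype=mat_csr.dtype)
    zeros = np.zeros(len(row_idx), dtype=int)
    indicator = sparse.csr_matrix((ones, (zeros, row_idx)), shape=(1, n_rows))
    # (1 x n_rows) @ (n_rows x n_cols) -> 1 x n_cols, returned as a 1-D array.
    return (indicator @ mat_csr).toarray().ravel()


# Same small demo matrix as above.
D = np.arange(24).reshape(8, 3)
Dcsr = sparse.csr_matrix(D)
idx = np.arange(0, 8, 2)
total = sum_rows(Dcsr, idx)
print(total)  # [36 40 44]
```

In the question's code this would stand in for the inner loop over neighbors, along the lines of score = sum_rows(D1csr, neighbors), where D1csr = D1.tocsr() is computed once before the outer loop (D1csr is an assumed name).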

=================

I just found out, in another question, that an A_csr[[1,1,0,3],:] row selection is actually done with a matrix product. It constructs an 'extractor' csr matrix that looks like:

matrix([[0, 1, 0, 0],
        [0, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1]])
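To make the extractor idea concrete, here is a minimal sketch that builds such a matrix by hand and checks that the product matches ordinary row indexing. The extractor function is a hypothetical helper written for illustration; it is not SciPy's internal code, and the way SciPy implements row indexing may differ between versions.

```python
import numpy as np
from scipy import sparse


def extractor(rows, n_rows):
    """Build a len(rows) x n_rows CSR matrix with a single 1 in each row.

    Hypothetical helper: row k of the result has its 1 in column rows[k].
    """
    rows = np.asarray(rows)
    data = np.ones(len(rows), dtype=int)
    # indptr = 0, 1, 2, ...: exactly one stored value per output row.
    indptr = np.arange(len(rows) + 1)
    return sparse.csr_matrix((data, rows, indptr), shape=(len(rows), n_rows))


A = np.arange(12).reshape(4, 3)
A_csr = sparse.csr_matrix(A)

E = extractor([1, 1, 0, 3], A_csr.shape[0])
selected = (E @ A_csr).toarray()

print(E.toarray())
print(selected)
```

Summing the selected rows afterwards is the same as first collapsing the extractor's rows into one indicator row, which is why a single 1 x n product is enough when only the sum is needed.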
