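Below is a performance comparison of the three most popular answers, run in a Jupyter notebook. The input is a 1M x 100K random sparse matrix with density 0.001, i.e. 100M non-zero values: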
from scipy.sparse import random
matrix = random(1000000, 100000, density=0.001, format='csr')
matrix
<1000000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>
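The same setup can be reproduced at a much smaller scale for a quick check. This is a minimal sketch: the 1000 x 100 shape and 0.01 density are arbitrary illustrative choices, picked so that density × rows × columns gives 1000 stored elements, mirroring how 0.001 × 1M × 100K gives 100M.

```python
from scipy.sparse import random

# Scaled-down stand-in for the benchmark input (illustrative sizes only)
m, n, density = 1000, 100, 0.01
small = random(m, n, density=density, format='csr')

# Stored elements = density * m * n, the same relation behind the 100M figure
print(small.format, small.shape, small.nnz)
```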
io.mmwrite/io.mmread
from scipy import io
%time io.mmwrite('test_io.mtx', matrix)
CPU times: user 4min 37s, sys: 2.37 s, total: 4min 39s
Wall time: 4min 39s
%time matrix = io.mmread('test_io.mtx')
CPU times: user 2min 41s, sys: 1.63 s, total: 2min 43s
Wall time: 2min 43s
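`%time` is an IPython/Jupyter magic, so it is not available in a plain Python script. A rough substitute, assuming wall-clock time is all you need, is to wrap the call with `time.perf_counter()`. The helper name `timed`, the reduced 10000 x 1000 input and the temporary file path are hypothetical choices for illustration, not part of the benchmark above.

```python
import os
import tempfile
import time

from scipy import io
from scipy.sparse import random


def timed(label, fn, *args, **kwargs):
    """Hypothetical helper: rough stand-in for %time (wall time of one call only)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s wall time")
    return result, elapsed


# Much smaller input than the benchmark so the sketch runs in seconds
matrix = random(10000, 1000, density=0.001, format='csr')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_io.mtx')
    _, write_s = timed('mmwrite', io.mmwrite, path, matrix)
    loaded, read_s = timed('mmread', io.mmread, path)
```

Unlike `%time`, this reports only wall time, not the user/sys CPU split.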
matrix
<1000000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 100000000 stored elements in COOrdinate format>
Filesize: 3.0G.
(Note that the format has changed from csr to coo.)
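Since `io.mmread` returns a COO matrix, code that expects CSR has to convert it explicitly. A minimal round-trip sketch on a small hypothetical input (2000 x 500, density 0.01), which also measures the file size with `os.path.getsize`:

```python
import os
import tempfile

import numpy as np
from scipy import io
from scipy.sparse import random

original = random(2000, 500, density=0.01, format='csr')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip.mtx')
    io.mmwrite(path, original)
    size_bytes = os.path.getsize(path)  # analogue of the "Filesize" figure
    loaded = io.mmread(path)            # Matrix Market coordinate data comes back as COO

print(loaded.format)                    # prints: coo
restored = loaded.tocsr()               # convert back to CSR explicitly
print(size_bytes, restored.format)
```

`tocsr()` builds a new copy, so on a 100M-element matrix the conversion adds its own memory and time cost, which the timings above do not include.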
cPickl