python如何创建不同元素的矩阵_如何在python上逐步创建稀疏矩阵?

1586010002-jmsa.png

I am creating a co-occurring matrix, which is of size 1M by 1M integer numbers.

After the matrix is created, the only operation I will do on it is to get top N values per each row (or column. as it is a symmetric matrix).

I have to create matrix as sparse to be able to fit it in memory. I read input data from a big file, and update co-occurance of two indexes (row, col) incrementally.

The sample code for Sparse dok_matrix specifies that I should declare the size of matrix before hand. I know the upper boundary for my matrix (1m by 1m), but in reality it might has less than that.

Do I have to specify the size beforehand, or can i just create it incrementally?

import numpy as np

from scipy.sparse import dok_matrix

S = dok_matrix((5, 5), dtype=np.float32)

for i in range(5):

for j in range(5):

S[i, j] = i + j # Update element

解决方案

A SO question from a couple of days ago, creating sparse matrix of unknown size, talks about creating a sparse matrix from data read from a file. There the OP wanted to use lil format; I recommended building the input arrays for a coo format.

In other SO questions I've found that adding values to a plain dictionary is faster than adding them to a dok matrix - even though a dok is a dictionary subclass. There's quite a bit of over head in the dok indexing method. In some cases I suggested building a dict with a tuple key, and using update to add the values to a defined dok. But I suspect in your case the coo route is better.

dok and lil are the best formats for incremental construction, but neither is that great compared to python list and dict methods.

As for the top N values of each row, I recall exploring that, but back some time, so can't off hand pull up a good SO question. You probably want a row oriented format such as lil or csr.

As for the question - 'do you need to specify the size on creation'. Yes. Because a sparse matrix, regardless of format, only stores nonzero values, there's little harm in creating a matrix that is too large.

I can't think of anything in a dok or coo format that hinges on the shape - at least not in terms of data storage or creation. lil and csr will have some extra values. If you really need to explore this, read up on how values are stored, and play with small matrices.

==================

It looks like all the code for dok format is Python in

/usr/lib/python3/dist-packages/scipy/sparse/dok.py

Scanning that file, I see that dok does have a resize method

d.resize?

Signature: d.resize(shape)

Docstring:

Resize the matrix in-place to dimensions given by 'shape'.

Any non-zero elements that lie outside the new shape are removed.

File: /usr/lib/python3/dist-packages/scipy/sparse/dok.py

Type: method

So if you want to initial the matrix to 1M x 1M and resize to 100 x 100 you can do so - it will step through all the keys to make sure there aren't any outside the new range. So it isn't cheap, even though the main action is to change the shape parameter.

newM, newN = shape

M, N = self.shape

if newM < M or newN < N:

# Remove all elements outside new dimensions

for (i, j) in list(self.keys()):

if i >= newM or j >= newN:

del self[i, j]

self._shape = shape

If you know for sure that there aren't any outside keys, you could change shape directly. The other sparse formats don't have a resize method.

In [31]: d=sparse.dok_matrix((10,10),int)

In [32]: d

Out[32]:

<10x10 sparse matrix of type ''

with 0 stored elements in Dictionary Of Keys format>

In [33]: d.resize((5,5))

In [34]: d

Out[34]:

<5x5 sparse matrix of type ''

with 0 stored elements in Dictionary Of Keys format>

In [35]: d._shape=(9,9)

In [36]: d

Out[36]:

<9x9 sparse matrix of type ''

with 0 stored elements in Dictionary Of Keys format>

See also:

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值