Sparse matrices: CSR / COO usage, and file operations

biu________

Article link: https://blog.csdn.net/biu________/article/details/105295342

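1. COO matrix:

Advantages: easy to convert to other sparse storage formats (CSR, etc.).

Disadvantages: does not directly support arithmetic or slicing; convert to CSR or CSC first.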
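A minimal sketch of the usual COO workflow, assuming a reasonably recent SciPy: build the matrix in COO, then convert with `tocsr()` / `tocsc()` before doing arithmetic. It also shows a detail that matters during conversion: COO keeps duplicate (i, j) entries as separate stored values, and converting to CSR/CSC sums them. The numbers are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Two entries share the coordinate (0, 0); COO simply stores both
row = np.array([0, 0, 1, 2])
col = np.array([0, 0, 2, 1])
data = np.array([1, 4, 2, 3])
coo = sp.coo_matrix((data, (row, col)), shape=(3, 3))
print(coo.nnz)          # 4 stored values, duplicates included

# Converting to CSR (or CSC) sums the duplicates into one entry
csr = coo.tocsr()
csc = coo.tocsc()
print(csr.nnz)          # 3
print(csr.toarray())    # (0, 0) now holds 1 + 4 = 5
```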

General form: coo_matrix((coo_data, (coo_rows, coo_cols)), shape=shape)

Example:

>>> import numpy as np
>>> import scipy.sparse as sp

>>> row = np.array([0, 3, 1, 0])
>>> col = np.array([0, 3, 1, 2])
>>> data = np.array([6, 5, 7, 8])
>>> matrix = sp.coo_matrix((data, (row, col)), shape=(4, 4))
>>> print(matrix)
  (0, 0)	6
  (3, 3)	5
  (1, 1)	7
  (0, 2)	8

>>> matrix = matrix.toarray()  # convert to a dense ndarray
>>> matrix
array([[6, 0, 8, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])

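row: the row index of each stored value

col: the column index of each stored value

data: the stored values themselves

In other words, COO records the row index, column index, and value of every nonzero entry.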
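To make the triplet layout concrete, here is a pure-NumPy sketch that rebuilds the dense matrix from `row`, `col` and `data`, using the numbers from the example above. The helper name `coo_to_dense` is hypothetical (not a SciPy function); accumulating with `+=` mirrors SciPy's behavior of summing duplicate (i, j) entries on conversion.

```python
import numpy as np

def coo_to_dense(data, row, col, shape):
    """Hypothetical helper: expand COO triplets into a dense array."""
    dense = np.zeros(shape, dtype=data.dtype)
    for value, i, j in zip(data, row, col):
        dense[i, j] += value  # duplicates at the same (i, j) are summed
    return dense

row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([6, 5, 7, 8])
print(coo_to_dense(data, row, col, (4, 4)))
```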

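.toarray() converts a coo_matrix into a dense NumPy array.

Note: it is best to pass shape in the coo_matrix() call of the example. COO keeps only the coordinates of stored values, so without shape SciPy infers the dimensions from the largest row and column indices, and trailing all-zero rows or columns cannot be recovered.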
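A small sketch of why shape matters: the same triplets give a smaller matrix when shape is omitted, because SciPy infers it as (max row index + 1, max column index + 1). The values are made up.

```python
import numpy as np
import scipy.sparse as sp

# Only (0, 0) and (1, 1) are stored; the intended matrix is 3x3
row = np.array([0, 1])
col = np.array([0, 1])
data = np.array([3, 4])

inferred = sp.coo_matrix((data, (row, col)))                # shape guessed from max indices
explicit = sp.coo_matrix((data, (row, col)), shape=(3, 3))  # shape given explicitly

print(inferred.shape)  # (2, 2): the all-zero third row/column is lost
print(explicit.shape)  # (3, 3)
```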

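2. CSR matrix:

Advantages: efficient arithmetic such as CSR + CSR and CSR * CSR; efficient row slicing; efficient matrix products. CSR is commonly used for sparse-matrix computations once the data has been read in.

Disadvantages: column slicing is slow (compared with CSC), and changing the sparsity structure is expensive (compared with LIL or DOK).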
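A short sketch of the operations listed above on a made-up 3×3 matrix. Column slicing is included for contrast: it works on CSR, but it is the slower direction for this format.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1, 2, 0],
                            [0, 0, 3],
                            [4, 5, 6]]))

total = A + A                   # CSR + CSR, result stays sparse
product = A @ A                 # CSR * CSR matrix product
row_2 = A[2, :]                 # row slicing: cheap in CSR
col_0 = A[:, 0]                 # column slicing: works, but slower for CSR
vec = A @ np.array([1, 1, 1])   # matrix-vector product gives a dense 1-D array

print(product.toarray())
print(vec)                      # row sums: [ 3  3 15]
```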

Example:

>>> import numpy as np
>>> import scipy.sparse as sp

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 1, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> matrix = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
>>> print(matrix)
  (0, 0)	1
  (0, 1)	2
  (1, 2)	3
  (2, 0)	4
  (2, 1)	5
  (2, 2)	6

>>> matrix = matrix.toarray()  # convert to a dense ndarray
>>> matrix
array([[1, 2, 0],
       [0, 0, 3],
       [4, 5, 6]])

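Tip:

CSR stores a sparse matrix row by row.

indptr gives, for each row i, the half-open range [indptr[i], indptr[i+1]) of positions in data (and indices) that belong to that row.

indices gives, for each value in data, the column that value occupies within its row.

For example, row 0 of the matrix:

its entries sit at positions in the range [0, 2) of indices and data (from indptr);

their column indices are 0 and 1 (from indices);

their values are 1 and 2 (from data);

so the stored entries are (0, 0) = 1 and (0, 1) = 2.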
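The indptr/indices/data rules can be checked by building the three arrays by hand. The sketch below scans a dense array row by row; the helper name `dense_to_csr_arrays` is hypothetical, not a SciPy function. It should reproduce the arrays from the example and agree with what `scipy.sparse.csr_matrix` builds.

```python
import numpy as np
import scipy.sparse as sp

def dense_to_csr_arrays(dense):
    """Hypothetical helper: return (data, indices, indptr) for a 2-D array."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)    # the nonzero value itself
                indices.append(j)     # its column within the current row
        indptr.append(len(data))      # row i occupies [indptr[i], indptr[i+1])
    return np.array(data), np.array(indices), np.array(indptr)

dense = np.array([[1, 2, 0], [0, 0, 3], [4, 5, 6]])
data, indices, indptr = dense_to_csr_arrays(dense)
print(indptr)   # [0 2 3 6]
print(indices)  # [0 1 2 0 1 2]
print(data)     # [1 2 3 4 5 6]

# Row 0: read positions indptr[0]:indptr[1] of indices and data
start, end = indptr[0], indptr[1]
print([(int(j), int(v)) for j, v in zip(indices[start:end], data[start:end])])  # [(0, 1), (1, 2)]

A = sp.csr_matrix(dense)
print(np.array_equal(A.indptr, indptr) and np.array_equal(A.indices, indices))  # True
```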

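3. File operations:

from scipy import io, sparse
import numpy as np

# Save (training set): store each CSR component under its own key
train = sparse.csr_matrix(np.array([[1, 2, 0], [0, 0, 3], [4, 5, 6]]))  # example data
d_data1, d_col1, d_row1 = train.data, train.indices, train.indptr
io.savemat('train.mat', {'data': d_data1, 'col': d_col1, 'row': d_row1,
                         'shape': np.array(train.shape)})

# Load: loadmat returns 1-D arrays as (1, n) row vectors, so flatten them
mat = io.loadmat('train.mat')
data1 = mat['data'].ravel()
col1 = mat['col'].ravel()   # CSR indices: column of each value
row1 = mat['row'].ravel()   # CSR indptr: row pointers, not per-value row numbers
matrix = sparse.csr_matrix((data1, col1, row1), shape=tuple(mat['shape'].ravel()), dtype=int)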
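An alternative worth considering, assuming SciPy 0.19 or later: `scipy.sparse.save_npz` / `load_npz` store a whole sparse matrix (format, shape, dtype) in one call, so the components do not have to be split and reassembled by hand. `scipy.io.savemat` also accepts a sparse matrix directly; `loadmat` returns it as a sparse matrix in CSC layout, MATLAB's native sparse format. The file names are placeholders written to a temporary directory, and float data is used because MATLAB sparse variables are double precision.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

train = sparse.csr_matrix(np.array([[1.0, 2.0, 0.0],
                                    [0.0, 0.0, 3.0],
                                    [4.0, 5.0, 6.0]]))

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: SciPy's own .npz container keeps format, shape and dtype
    npz_path = os.path.join(tmp, "train.npz")
    sparse.save_npz(npz_path, train)
    from_npz = sparse.load_npz(npz_path)      # comes back as CSR

    # Option 2: savemat writes the matrix as a MATLAB sparse variable
    mat_path = os.path.join(tmp, "train.mat")
    io.savemat(mat_path, {"train": train})
    from_mat = io.loadmat(mat_path)["train"]  # comes back as a sparse matrix

print(from_npz.format, from_mat.format)
print(np.array_equal(from_npz.toarray(), train.toarray()))  # True
print(np.array_equal(from_mat.toarray(), train.toarray()))  # True
```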

4. Other functions:

Convert to a dense matrix (returns a numpy.matrix):

matrix.todense()

Access a single element:

matrix[row, column]
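A quick sketch contrasting `todense()` and `toarray()` on a `csr_matrix`, plus single-element access; the matrix is made up. Positions that are not stored read back as 0.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1, 2, 0], [0, 0, 3], [4, 5, 6]]))

dense_matrix = A.todense()   # numpy.matrix (always 2-D)
dense_array = A.toarray()    # plain numpy.ndarray
print(type(dense_matrix).__name__, type(dense_array).__name__)  # matrix ndarray

# Single-element access with [row, column]
print(A[1, 2])  # 3
print(A[1, 0])  # 0 (not stored)
```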

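4.1 Matrix product

When the two matrices are of comparable size, choosing CSC or CSR makes little difference; the running times are about the same.

When one matrix is large and the other small, however, the operand order matters and can save time.

Let B be the large matrix and S the small one.

In CSR format, S×B is faster, taking roughly half the time of B×S (small × large).
In CSC format, B×S is faster, taking roughly half the time of S×B (large × small).
These two fast options take about the same time as each other.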

c_mtx = a_mtx.dot(b_mtx)
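The timing comparison above is easy to re-measure on one's own data. A rough sketch with `timeit` and random test matrices from `scipy.sparse.random`; the sizes and densities are arbitrary, and "large" vs "small" is interpreted here as the number of stored nonzeros of two square matrices with the same shape (an assumption, since the text does not say). Results depend on the data and the SciPy build, so no particular outcome is claimed.

```python
import timeit

import scipy.sparse as sp

def time_products(fmt, n=2000, repeat=5):
    """Average seconds for S @ B and B @ S with both operands in format `fmt`."""
    # Assumption: "large" B and "small" S differ in nonzeros, not in shape
    B = sp.random(n, n, density=0.01, format=fmt, random_state=0)
    S = sp.random(n, n, density=0.001, format=fmt, random_state=1)
    s_times_b = timeit.timeit(lambda: S @ B, number=repeat) / repeat
    b_times_s = timeit.timeit(lambda: B @ S, number=repeat) / repeat
    return s_times_b, b_times_s

for fmt in ("csr", "csc"):
    sb, bs = time_products(fmt)
    print(f"{fmt}: S@B {sb * 1e3:.2f} ms, B@S {bs * 1e3:.2f} ms")
```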

References:

https://www.jianshu.com/p/9671c568096d

https://blog.csdn.net/mantoureganmian/article/details/80612137
