Understanding scipy.sparse.csr_matrix

yellowTvT

Tags: python, machine learning

Article link: https://blog.csdn.net/yellowetao/article/details/120577783

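This article introduces the CSR (Compressed Sparse Row) matrix format, an efficient way to store sparse matrices that is particularly well suited to row slicing and matrix-vector products. Examples show how to construct and convert CSR matrices with `scipy.sparse.csr_matrix`: from a 2-D array, from another sparse matrix, and from the `data`, `indices` and `indptr` arrays. CSR matrices play an important role in scientific computing and data analysis.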

The API reference for csr_matrix (scipy.sparse.csr_matrix, SciPy v1.7.1 Manual) describes it as follows: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html#:~:text=csr_matrix%20%28%28data%2C%20indices%2C%20indptr%29%2C%20%5Bshape%3D%20%28M%2C%20N%29%5D%29%20is,matrix%20dimensions%20are%20inferred%20from%20the%20index%20arrays.

csr_matrix (Compressed Sparse Row matrix): the compressed sparse row format

Why use csr_matrix?

Efficient arithmetic operations (CSR + CSR, CSR * CSR, and so on)
Efficient row slicing
Fast matrix-vector products
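
The three advantages above can be tried out on a small matrix. This is only a minimal sketch: the matrix `A` and the vector `v` are made-up illustration values, and a matrix this small says nothing about real performance.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 3x3 matrix (values chosen only for this sketch).
A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))

# Row slicing: take rows 1 and 2; the result is still a sparse matrix.
rows = A[1:3]

# Matrix-vector product with a dense 1-D vector.
v = np.array([1, 1, 1])
product = A @ v

# Arithmetic between two CSR matrices stays sparse.
doubled = A + A

print(rows.toarray())
print(product)
print(doubled.toarray())
```

With a vector of ones, `product` is simply the sum of each row, which makes the result easy to check by eye.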

Ways to construct it

①csr_matrix(D)

where D is a dense matrix or a 2-D ndarray

②csr_matrix(S)

where S is another sparse matrix (of any sparse format)
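
Forms ① and ② can be sketched as follows. The array `dense` is an illustration value, and using `coo_matrix` as the "other sparse format" is just one possible choice for this sketch.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# Illustrative dense input (values chosen only for this sketch).
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

# Form 1: build from a dense 2-D array; only nonzero entries are stored.
from_dense = csr_matrix(dense)

# Form 2: build from another sparse format (here COO).
coo = coo_matrix(dense)
from_sparse = csr_matrix(coo)

print(from_dense.nnz)      # number of stored entries
print(from_sparse.format)  # storage format after conversion
print(np.array_equal(from_dense.toarray(), from_sparse.toarray()))
```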

③csr_matrix((M, N), [dtype])

constructs an empty matrix with shape (M, N); dtype is optional and defaults to dtype='d'

④csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

where data, row_ind and col_ind satisfy a[row_ind[i], col_ind[i]] = data[i], and a is the resulting matrix

⑤csr_matrix((data, indices, indptr), [shape=(M, N)])

where, for row i:

the column indices are indices[indptr[i]:indptr[i+1]]

the values are data[indptr[i]:indptr[i+1]]
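
To see the relation in form ⑤ on a concrete matrix, the three arrays can be read back from an existing CSR matrix through its `data`, `indices` and `indptr` attributes. A minimal sketch, with an illustrative matrix `A`:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative matrix (values chosen only for this sketch).
A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))

# For each row i, slice indices and data with the same indptr window.
for i in range(A.shape[0]):
    start, end = A.indptr[i], A.indptr[i + 1]
    cols = A.indices[start:end]
    vals = A.data[start:end]
    print(f"row {i}: columns {cols.tolist()}, values {vals.tolist()}")
```

Note that `indptr` has one more entry than the matrix has rows, and its last entry equals the number of stored values.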

Examples

csr_matrix((M, N), [dtype])

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> csr_matrix((3, 4), dtype=np.int8).toarray()  # convert to an ndarray
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

This produces an empty matrix with 3 rows and 4 columns (no stored elements, so every entry reads as 0) and dtype int8.
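
"Empty" here refers to the stored elements, not to the shape. A small sketch that inspects the result (the variable name `empty` is just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

empty = csr_matrix((3, 4), dtype=np.int8)

print(empty.shape)   # requested dimensions
print(empty.nnz)     # number of explicitly stored entries
print(empty.indptr)  # one row pointer per row, plus a final one
```

Since nothing is stored, every window `indptr[i]:indptr[i+1]` is empty, which is why every row converts to zeros.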

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])

>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()  # convert to an ndarray
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

In other words, the result matrix gets 1 at [0,0], 2 at [0,2], 3 at [1,2], and so on, up to 6 at [2,2].
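
The relation a[row_ind[i], col_ind[i]] = data[i] can also be checked entry by entry. This sketch assumes that each (row, col) pair appears only once, as in the example above; SciPy sums entries that share the same coordinates, so the relation would not hold for repeated pairs.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

A = csr_matrix((data, (row, col)), shape=(3, 3))

# Compare every stored coordinate against the value it was given.
matches = [A[r, c] == value for r, c, value in zip(row, col, data)]
print(all(matches))
```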

csr_matrix((data, indices, indptr), [shape=(M, N)])

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

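This form is probably the hardest one to understand:

For row 0 of the result matrix (rows are numbered from 0):

its column indices are indices[indptr[0]:indptr[0+1]], i.e. indices[0:2], i.e. elements 0 and 1 of indices, which are 0 and 2;

its values are data[indptr[0]:indptr[0+1]], i.e. data[0:2], i.e. elements 0 and 1 of data, which are 1 and 2.

Now the row index, the column indices and the values are all determined: position 0 of row 0 holds 1, and position 2 of row 0 holds 2.

For row 1 of the result matrix (again numbering rows from 0):

its column indices are indices[indptr[1]:indptr[1+1]], i.e. indices[2:3], i.e. element 2 of indices, which is 2;

its values are data[indptr[1]:indptr[1+1]], i.e. data[2:3], i.e. element 2 of data, which is 3.

Again the row index, the column index and the value are all determined: position 2 of row 1 holds 3.

Row 2 (the third row) of the result matrix works the same way: indices[3:6] gives columns 0, 1 and 2, and data[3:6] gives the values 4, 5 and 6.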
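
The row-by-row reasoning above can be written out as a loop. The helper `csr_to_dense` below is a hypothetical function written only for this illustration, not part of SciPy; it assumes no column index is repeated within a row, and its result is compared against what `csr_matrix` itself produces.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])


def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: expand the three CSR arrays into a dense ndarray."""
    dense = np.zeros(shape, dtype=data.dtype)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # Columns indices[start:end] of row i receive values data[start:end].
        dense[i, indices[start:end]] = data[start:end]
    return dense


manual = csr_to_dense(data, indices, indptr, (3, 3))
reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(manual)
print(np.array_equal(manual, reference))
```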

If there are any errors or omissions, corrections are welcome. Thanks!