Sparse matrices (coo_matrix, csr_matrix, csc_matrix): definition, saving and loading

chao2016

Categories: L_Python, A_数据结构和算法 (data structures and algorithms). Tags: sparse matrix

Article link: https://blog.csdn.net/chao2016/article/details/80344828


Contents

1. Definitions
2. Saving and loading

Preface: I first ran into sparse matrices in the following situation:

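from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit(data[feature].values.reshape(-1, 1))
# transform() returns a csr_matrix
test_a = enc.transform(test[feature].values.reshape(-1, 1))
# hstack() returned a coo_matrix here; the default result format can depend on the input formats and the SciPy version
test_x = sparse.hstack((test_x, test_a))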
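
Because the format returned by hstack() is not something to rely on, it can be requested explicitly with the `format` argument. The sketch below is only an illustration: the two small matrices are hypothetical stand-ins for `test_x` and `test_a`, not the author's data.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the existing features and the one-hot block.
test_x = sparse.csr_matrix(np.array([[1, 0], [0, 2], [0, 0]]))
test_a = sparse.csr_matrix(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))

# Ask for CSR explicitly instead of relying on the default result format.
stacked = sparse.hstack((test_x, test_a), format="csr")

print(stacked.format)  # csr
print(stacked.shape)   # (3, 5)
print(stacked.toarray())
```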

1. Definitions

1.1 coo_matrix

>>> from scipy import sparse
>>> import numpy as np

>>> sparse.coo_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)
       
>>> row = np.array([0, 3, 1, 0])
>>> col = np.array([0, 3, 1, 2])
>>> data = np.array([6, 5, 7, 8])
>>> sparse.coo_matrix((data, (row, col)), shape=(4, 4)).toarray()
array([[6, 0, 8, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])
# Row 0, column 0 holds 6: data[k] is placed at position (row[k], col[k])
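
To make the (row, col, data) triplet layout concrete, here is a minimal sketch that rebuilds the same 4x4 matrix, lists the stored triplets, and converts to CSR before reading a single element. The variable names `m`, `triplets` and `csr` are introduced only for this illustration.

```python
import numpy as np
from scipy import sparse

row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([6, 5, 7, 8])
m = sparse.coo_matrix((data, (row, col)), shape=(4, 4))

# COO keeps the triplets exactly as they were passed in.
triplets = list(zip(m.row.tolist(), m.col.tolist(), m.data.tolist()))
for r, c, v in triplets:
    print(f"({r}, {c}) -> {v}")

# Convert to CSR for element access, row slicing and arithmetic.
csr = m.tocsr()
print(csr[0, 2])  # 8
```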

1.2 csr_matrix

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> sparse.csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
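# Compressed by row
# For row i, the column indices of the non-zero entries are indices[indptr[i]:indptr[i+1]] and their values are data[indptr[i]:indptr[i+1]]
# In this example:
# Row 0: the non-zero columns are indices[indptr[0]:indptr[1]] = indices[0:2] = [0,2]
# and the values are data[indptr[0]:indptr[1]] = data[0:2] = [1,2], so row 0 holds 1 in column 0 and 2 in column 2
# Row 1: the non-zero columns are indices[indptr[1]:indptr[2]] = indices[2:3] = [2]
# and the values are data[indptr[1]:indptr[2]] = data[2:3] = [3], so row 1 holds 3 in column 2
# Row 2: the non-zero columns are indices[indptr[2]:indptr[3]] = indices[3:6] = [0,1,2]
# and the values are data[indptr[2]:indptr[3]] = data[3:6] = [4,5,6], so row 2 holds 4 in column 0, 5 in column 1 and 6 in column 2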
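
The row-by-row rule above can be written out as a small decoder. This is a sketch for illustration only: `csr_to_dense` is a hypothetical helper, not a SciPy function, and it assumes no column index is repeated within a row.

```python
import numpy as np
from scipy import sparse

def csr_to_dense(data, indices, indptr, shape):
    """Expand CSR arrays into a dense array, one row at a time."""
    dense = np.zeros(shape, dtype=data.dtype)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # Columns indices[start:end] of row i receive data[start:end].
        dense[i, indices[start:end]] = data[start:end]
    return dense

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

manual = csr_to_dense(data, indices, indptr, (3, 3))
reference = sparse.csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(np.array_equal(manual, reference))  # True
```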

1.3 csc_matrix

>>> sparse.csc_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> sparse.csc_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> sparse.csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])
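# Compressed by column
# For column i, the row indices of the non-zero entries are indices[indptr[i]:indptr[i+1]] and their values are data[indptr[i]:indptr[i+1]]
# In this example:
# Column 0: the non-zero rows are indices[indptr[0]:indptr[1]] = indices[0:2] = [0,2]
# and the values are data[indptr[0]:indptr[1]] = data[0:2] = [1,2], so column 0 holds 1 in row 0 and 2 in row 2
# Column 1: the non-zero rows are indices[indptr[1]:indptr[2]] = indices[2:3] = [2]
# and the values are data[indptr[1]:indptr[2]] = data[2:3] = [3], so column 1 holds 3 in row 2
# Column 2: the non-zero rows are indices[indptr[2]:indptr[3]] = indices[3:6] = [0,1,2]
# and the values are data[indptr[2]:indptr[3]] = data[3:6] = [4,5,6], so column 2 holds 4 in row 0, 5 in row 1 and 6 in row 2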
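
Since the CSR and CSC examples use identical `data`, `indices` and `indptr` arrays, a quick way to see the relationship is to check that one result is the transpose of the other. A minimal sketch (the names `as_csr` and `as_csc` are only for this illustration):

```python
import numpy as np
from scipy import sparse

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Same three arrays, read row-wise (CSR) and column-wise (CSC).
as_csr = sparse.csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
as_csc = sparse.csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(as_csc, as_csr.T))  # True
```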

2. Saving and loading

2.1 Method 1: the built-in API

# test_x is a scipy.sparse matrix
# Save it to an npz file
sparse.save_npz('./data/npz/test_x.npz', test_x)
# Load it back from the npz file
test_x = sparse.load_npz('./data/npz/test_x.npz')
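
A self-contained round trip can confirm that the format, shape and values survive save_npz()/load_npz(). The sketch below assumes a small made-up matrix and writes to a temporary directory rather than to `./data/npz/`.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical sample matrix; the last row is deliberately all zeros.
test_x = sparse.csr_matrix(np.array([[1, 0, 2], [0, 0, 3], [0, 0, 0]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_x.npz")
    sparse.save_npz(path, test_x)
    loaded = sparse.load_npz(path)

same_format = loaded.format == test_x.format
same_shape = loaded.shape == test_x.shape
same_values = np.array_equal(loaded.toarray(), test_x.toarray())
print(same_format, same_shape, same_values)
```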

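2.2 Method 2: a clumsy homemade approach

# test_x must be a coo_matrix here (it needs the .row and .col attributes); call test_x.tocoo() first if it is not
# Save to an npz file, including the shape so that trailing all-zero rows or columns are not lost
np.savez('./data/npz/test_x.npz', data=test_x.data, row=test_x.row, col=test_x.col, shape=test_x.shape)
# Load from the npz file
my_testx = np.load('./data/npz/test_x.npz')
test_x = sparse.coo_matrix((my_testx['data'], (my_testx['row'], my_testx['col'])), shape=tuple(my_testx['shape']))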
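
A point worth checking with the manual approach is the matrix shape: when coo_matrix() is given only the triplets, it infers the shape from the largest row and column index. The sketch below uses a made-up 3x4 matrix, held in memory only, to show the difference.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 matrix whose last row and last column are all zero.
original = sparse.coo_matrix(
    (np.array([1, 2]), (np.array([0, 1]), np.array([0, 2]))), shape=(3, 4)
)

# Without shape, the dimensions are inferred from the largest indices.
inferred = sparse.coo_matrix((original.data, (original.row, original.col)))
# Passing the stored shape restores the original dimensions.
restored = sparse.coo_matrix(
    (original.data, (original.row, original.col)), shape=original.shape
)

print(inferred.shape)  # (2, 3)
print(restored.shape)  # (3, 4)
```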

2.3 Changing the dtype of the underlying ndarray

# test_x.data is initially int64
test_x.data = np.array(test_x.data, dtype=np.float64)
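
As an alternative to reassigning `.data`, sparse matrices also provide astype(), which returns a converted copy and leaves the original untouched. A minimal sketch with a made-up 3x3 matrix:

```python
import numpy as np
from scipy import sparse

# Hypothetical int64 matrix standing in for test_x.
test_x = sparse.coo_matrix(
    (np.array([6, 5, 7], dtype=np.int64), (np.array([0, 1, 2]), np.array([0, 1, 2]))),
    shape=(3, 3),
)

# astype() returns a new matrix; test_x itself keeps its dtype.
as_float = test_x.astype(np.float64)

print(test_x.dtype)    # int64
print(as_float.dtype)  # float64
```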
