csr_matrix matrices

得克特

Category: python  Tags: csr_matrix

Article link: https://blog.csdn.net/weixin_40548136/article/details/111679329

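CSR (Compressed Sparse Row) compresses a matrix row by row and represents the original matrix with three arrays.
The csr_matrix constructor accepts the three arrays in two forms.
The first form gives coordinate triplets (row, column, value), which scipy converts to CSR: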

from scipy.sparse import csr_matrix

row = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # row indices
col = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # column indices
data = [1, 0, 1, 0, 1, 1, 1, 1, 0]  # corresponding values
t = csr_matrix((data, (row, col)), shape=(3, 3))
print(t)
print(t.todense())
>>
  (0, 0)	1
  (0, 1)	0
  (0, 2)	1
  (1, 0)	0
  (1, 1)	1
  (1, 2)	1
  (2, 0)	1
  (2, 1)	1
  (2, 2)	0
[[1 0 1]
 [0 1 1]
 [1 1 0]]

This form is the easier one to understand: the three arrays hold the row index, the column index, and the value of each entry.
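The printed output above lists the zero entries as well, because they were passed in explicitly and are therefore stored. As a minimal sketch (the variable names `stored_before` and `stored_after` are introduced here for illustration), the stored-entry count and the internal `indptr` array can be inspected before and after dropping those explicit zeros with `eliminate_zeros()`:

```python
from scipy.sparse import csr_matrix

row = [0, 0, 0, 1, 1, 1, 2, 2, 2]
col = [0, 1, 2, 0, 1, 2, 0, 1, 2]
data = [1, 0, 1, 0, 1, 1, 1, 1, 0]
t = csr_matrix((data, (row, col)), shape=(3, 3))

# nnz counts stored entries, so the explicit zeros are included here
stored_before = t.nnz
indptr_before = t.indptr.tolist()
print(stored_before, indptr_before)

# drop the explicitly stored zeros in place
t.eliminate_zeros()
stored_after = t.nnz
indptr_after = t.indptr.tolist()
print(stored_after, indptr_after)

# the dense view is the same either way
print(t.toarray())
```

In practice only the non-zero entries would normally be passed in, which is where the memory saving of a sparse format comes from.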
Reference: "Notes on csr_matrix usage"
The second form:

import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6])         # all non-zero values, row by row
indices = np.array([0, 2, 2, 0, 1, 2])      # column index of each value in data
indptr = np.array([0, 2, 3, 6])             # row i's values are data[indptr[i]:indptr[i+1]]
mtx = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
print(mtx.todense())

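The harder part to understand is indptr. indptr[i] is the position in data (and in indices) where row i starts, so indptr has one more element than the matrix has rows, and its last element equals the number of stored values. The slice indptr[0]:indptr[1] selects the first row: data[indptr[0]:indptr[1]] holds the values of the first row and indices[indptr[0]:indptr[1]] holds their column indices, which together determine where each value goes.
Reference: "How to understand sparse.csr_matrix"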
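To make the role of indptr concrete, here is a minimal sketch that rebuilds the dense matrix by hand from the same three arrays and compares it with what scipy produces. The loop and the names `n_rows`, `n_cols`, and `dense` are illustrative, not how scipy does it internally:

```python
import numpy as np
from scipy import sparse

data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])

n_rows, n_cols = 3, 3
dense = np.zeros((n_rows, n_cols), dtype=data.dtype)
for i in range(n_rows):
    # row i owns the half-open slice [indptr[i], indptr[i + 1])
    start, end = indptr[i], indptr[i + 1]
    # indices[start:end] are the columns, data[start:end] the values
    dense[i, indices[start:end]] = data[start:end]

print(dense)
# [[1 0 2]
#  [0 0 3]
#  [4 5 6]]

mtx = sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))
print((mtx.toarray() == dense).all())  # True
```

Row 1, for example, owns the slice 2:3, so it has the single value 3 in column 2.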

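Computation on CSR matrices appears to be more efficient; the matrix multiplication in item-based collaborative filtering is also done with csr_matrix. The snippet below builds a document-word matrix and multiplies it by its transpose to obtain word co-occurrence counts: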

import itertools
import numpy as np
from scipy.sparse import csr_matrix

# `documents` (a list of token lists) and `word_to_id` (a dict from word to integer id) are assumed to be defined already
# map each word to its id
documents_as_ids = [np.sort([word_to_id[w] for w in doc if w in word_to_id]).astype('uint32') for doc in documents]
# row_ind is the index of the doc each word occurrence belongs to; col_ind is the id of that word
row_ind, col_ind = zip(*itertools.chain(*[[(i, w) for w in doc] for i, doc in enumerate(documents_as_ids)]))
data = np.ones(len(row_ind), dtype='uint32')  # use unsigned int for better memory utilization
max_word_id = max(itertools.chain(*documents_as_ids)) + 1
docs_words_matrix = csr_matrix((data, (row_ind, col_ind)), shape=(len(documents_as_ids), max_word_id))  # efficient arithmetic operations with CSR * CSR
words_cooc_matrix = docs_words_matrix.T * docs_words_matrix  # multiplying the transpose of docs_words_matrix by docs_words_matrix generates the co-occurrence matrix
words_cooc_matrix.setdiag(0)  # zero out the diagonal (each word's co-occurrence with itself)
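
The snippet above depends on `documents` and `word_to_id` being defined elsewhere. As a sketch of how it behaves end to end, the version below runs it on a made-up three-document corpus; the corpus, the vocabulary, and the names `n_words` and `cooc` are hypothetical and chosen only for illustration. It uses the `@` operator, which for csr_matrix performs the same matrix product as `*`:

```python
import itertools

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy corpus and vocabulary, made up for illustration only.
documents = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
word_to_id = {"a": 0, "b": 1, "c": 2}

documents_as_ids = [
    np.sort([word_to_id[w] for w in doc if w in word_to_id]).astype("uint32")
    for doc in documents
]
# one (doc index, word id) pair per word occurrence
row_ind, col_ind = zip(
    *itertools.chain(*[[(i, w) for w in doc] for i, doc in enumerate(documents_as_ids)])
)
data = np.ones(len(row_ind), dtype="uint32")
n_words = int(max(itertools.chain(*documents_as_ids))) + 1
docs_words_matrix = csr_matrix(
    (data, (row_ind, col_ind)), shape=(len(documents_as_ids), n_words)
)

# entry (u, v) counts the documents in which words u and v both appear
words_cooc_matrix = docs_words_matrix.T @ docs_words_matrix
words_cooc_matrix.setdiag(0)
cooc = words_cooc_matrix.toarray()
print(cooc)
# [[0 2 1]
#  [2 0 2]
#  [1 2 0]]
```

Here "a" and "b" share two documents, "b" and "c" share two, and "a" and "c" share one, which matches the off-diagonal counts.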
