scipy.sparse.vstack() and np.sum()

Jennie_J

Category: Python data processing

Article link: https://blog.csdn.net/weixin_43685844/article/details/104430023

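1. scipy.sparse.vstack() stacks sparse matrices row-wise (the row count grows); all blocks must have the same number of columns.

2. np.sum(s_multiplyed, axis=0) sums over all rows within each column; the result has one row and the same number of columns.

3. Example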

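from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Ignore English and Portuguese stop words; keep at most 5000 unigram/bigram features.
stopwords_list = stopwords.words('english') + stopwords.words('portuguese')
vectorizer = TfidfVectorizer(stop_words=stopwords_list, ngram_range=(1, 2), max_df=0.5, min_df=0.001, max_features=5000)
# articles_df is assumed to be a pandas DataFrame with 'title' and 'text' columns, loaded beforehand.
corpus = articles_df['title'] + '' + articles_df['text']
Tfidf_matrix = vectorizer.fit_transform(corpus)
Tfidf_matrix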

<3047x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 638202 stored elements in Compressed Sparse Row format>

print(Tfidf_matrix)

(0, 1607) 0.6982198905259501
(0, 4826) 0.05016397320629753
(0, 1128) 0.0914809174284883
(0, 1507) 0.038620021333355174
(0, 4613) 0.040996310293707997
(0, 530) 0.36826514746823585
(0, 4941) 0.0197965383290968
(0, 4328) 0.03321711519981726
(0, 1433) 0.014645895339224089
(0, 1849) 0.01923512191045725
(0, 1934) 0.013449663472538311
(0, 3630) 0.03198848256271393
(0, 4808) 0.028335733068442087
(0, 4212) 0.05946752721690727
(0, 3747) 0.045487454490611506
(0, 3804) 0.0532422986807451
(0, 4418) 0.13171617172370684
(0, 1073) 0.03143331806534215
(0, 1728) 0.03580697939385251
(0, 4473) 0.016615613307880417
(0, 2564) 0.021110819684069474
(0, 3539) 0.0441493565796537
(0, 2743) 0.05880872809007701
(0, 3972) 0.04010389675404812
(0, 4022) 0.01621413503947185
: :
(3045, 2554) 0.04379656517720175
(3046, 4977) 0.231656783998254
(3046, 614) 0.07545310682213181
(3046, 4680) 0.06812738278369762
(3046, 3573) 0.10662644807409383
(3046, 3318) 0.08953668068222567

tfidf1 = Tfidf_matrix[1:2]
tfidf2 = Tfidf_matrix[2:3]
item_profiles_list = [tfidf1,tfidf2]
item_profiles_list

[<1x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 109 stored elements in Compressed Sparse Row format>,
 <1x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 90 stored elements in Compressed Sparse Row format>]

scipy.sparse.vstack()

Stacks sparse matrices row-wise (the row count grows); all blocks must have the same number of columns.

https://cloud.tencent.com/developer/article/1525041

scipy.sparse.vstack(blocks, format=None, dtype=None)

Stack sparse matrices vertically (row wise)

Parameters

blocks : sequence of sparse matrices with compatible shapes

format : str, optional

Sparse format of the result (e.g. "csr"); by default an appropriate sparse matrix format is returned. This choice is subject to change.

dtype : dtype, optional

The data-type of the output matrix. If not given, the dtype is determined from that of blocks.

>>> from scipy.sparse import coo_matrix, vstack
>>> A = coo_matrix([[1, 2], [3, 4]])
>>> B = coo_matrix([[5, 6]])
>>> vstack([A, B]).toarray()
array([[1, 2],
       [3, 4],
       [5, 6]])
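
The optional format and dtype parameters listed above can be tried on the same two blocks. This is a minimal sketch, not part of the original post; the variable name stacked is chosen here only for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix, vstack

A = coo_matrix([[1, 2], [3, 4]])   # shape (2, 2)
B = coo_matrix([[5, 6]])           # shape (1, 2), same column count as A

# Ask explicitly for a CSR result with float64 values
# instead of relying on the default choices.
stacked = vstack([A, B], format="csr", dtype=np.float64)

print(stacked.format, stacked.dtype, stacked.shape)
print(stacked.toarray())
```

Passing format explicitly is useful when later steps (row slicing, for example) expect CSR, since the default result format may change between SciPy versions.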

import numpy as np
from scipy.sparse import coo_matrix, vstack
AB = []
A = coo_matrix([[1, 2],[3,4]])
B = coo_matrix([[5, 6],[7,8]])
AB.append(A)
AB.append(B)
AB

[<2x2 sparse matrix of type '<class 'numpy.int32'>'
	with 4 stored elements in COOrdinate format>,
 <2x2 sparse matrix of type '<class 'numpy.int32'>'
	with 4 stored elements in COOrdinate format>]

vstack(AB)

<4x2 sparse matrix of type '<class 'numpy.int32'>'
	with 8 stored elements in COOrdinate format>
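
To illustrate the requirement that all blocks share the same number of columns, the sketch below tries to stack a 2-column block on a 3-column block. The names C and mismatch_rejected are made up for this illustration; the exact error message depends on the SciPy version, so only the exception type is relied upon.

```python
from scipy.sparse import coo_matrix, vstack

A = coo_matrix([[1, 2], [3, 4]])   # shape (2, 2)
C = coo_matrix([[5, 6, 7]])        # shape (1, 3): column count differs from A

try:
    vstack([A, C])
    mismatch_rejected = False
except ValueError as exc:
    # SciPy rejects blocks whose column counts do not match.
    mismatch_rejected = True
    print("vstack refused the blocks:", exc)
```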

import scipy.sparse  # needed for the fully qualified scipy.sparse.vstack call

item_profiles = scipy.sparse.vstack(item_profiles_list)
item_profiles

<2x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 199 stored elements in Compressed Sparse Row format>

s = np.array(interactions_test_indexed_df['eventType_strength'])[:2].reshape(-1,1)
print(s.shape)
print(type(s))
s

(2, 1)
<class 'numpy.ndarray'>

array([[1.5849625],
       [2.       ]])

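item_profiles.multiply(s) multiplies element-wise, like np.multiply(a, b); it is not a dot product. Because s has shape (2, 1), it is broadcast across the columns, so each row of item_profiles is scaled by its own weight.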

s_multiplyed = item_profiles.multiply(s)
s_multiplyed

<2x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 199 stored elements in COOrdinate format>
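
The row-wise scaling is easier to see on a tiny matrix. The following is a small sketch with made-up values (profiles and weights are illustrative names, not data from the post), assuming a recent SciPy in which multiplying a sparse matrix by a dense column vector returns a sparse result.

```python
import numpy as np
from scipy.sparse import csr_matrix

profiles = csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0]]))
weights = np.array([[2.0], [10.0]])    # shape (2, 1): one weight per row

# Element-wise product; the column vector is broadcast over all columns.
scaled = profiles.multiply(weights)
dense = scaled.toarray()

print(dense)
```

Row 0 is multiplied by 2 and row 1 by 10; zero entries stay zero, so the number of stored elements does not grow.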

np.sum(s_multiplyed, axis=0) sums over all rows within each column; the result has one row and the same number of columns.

s_multiplyed_sumed = np.sum(s_multiplyed,axis=0)
s_multiplyed_sumed.shape

(1, 5000)
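
The column-wise sum can be checked on a small matrix as well. This sketch uses made-up values and illustrative names (m, col_sums, flat); it assumes a csr_matrix input, for which np.sum delegates to the matrix's own sum method and returns a 2-D np.matrix rather than a 1-D array.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 4.0]]))

col_sums = np.sum(m, axis=0)     # same result as m.sum(axis=0)
print(col_sums.shape)            # one row, three columns

# Flatten to an ordinary 1-D array if a plain vector is more convenient.
flat = np.asarray(col_sums).ravel()
print(flat)
```

Converting with np.asarray(...).ravel() avoids surprises later, since np.matrix objects keep two dimensions through most operations.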
