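1. scipy.sparse.vstack() stacks matrices row-wise (the row count grows); all blocks must have the same number of columns.
2. np.sum(s_multiplyed, axis=0) sums over all rows within each column; the result has 1 row and the same number of columns.
3. Example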
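from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
# Ignore English and Portuguese stop words; keep unigrams and bigrams, at most 5000 features.
stopwords_list = stopwords.words('english') + stopwords.words('portuguese')
vectorizer = TfidfVectorizer(stop_words=stopwords_list, ngram_range=(1, 2), max_df=0.5, min_df=0.001, max_features=5000)
corpus = articles_df['title'] + '' + articles_df['text']
Tfidf_matrix = vectorizer.fit_transform(corpus)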
Tfidf_matrix
<3047x5000 sparse matrix of type '<class 'numpy.float64'>'
    with 638202 stored elements in Compressed Sparse Row format>
print(Tfidf_matrix)
(0, 1607) 0.6982198905259501
(0, 4826) 0.05016397320629753
(0, 1128) 0.0914809174284883
(0, 1507) 0.038620021333355174
(0, 4613) 0.040996310293707997
(0, 530) 0.36826514746823585
(0, 4941) 0.0197965383290968
(0, 4328) 0.03321711519981726
(0, 1433) 0.014645895339224089
(0, 1849) 0.01923512191045725
(0, 1934) 0.013449663472538311
(0, 3630) 0.03198848256271393
(0, 4808) 0.028335733068442087
(0, 4212) 0.05946752721690727
(0, 3747) 0.045487454490611506
(0, 3804) 0.0532422986807451
(0, 4418) 0.13171617172370684
(0, 1073) 0.03143331806534215
(0, 1728) 0.03580697939385251
(0, 4473) 0.016615613307880417
(0, 2564) 0.021110819684069474
(0, 3539) 0.0441493565796537
(0, 2743) 0.05880872809007701
(0, 3972) 0.04010389675404812
(0, 4022) 0.01621413503947185
: :
(3045, 2554) 0.04379656517720175
(3046, 4977) 0.231656783998254
(3046, 614) 0.07545310682213181
(3046, 4680) 0.06812738278369762
(3046, 3573) 0.10662644807409383
(3046, 3318) 0.08953668068222567
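Each printed line above is a (row, column) index pair followed by the stored value. A minimal sketch of how to read the same information programmatically, using a small hypothetical matrix (the names dense, m and triples are illustrative and not part of the original notebook):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 2x4 matrix standing in for a TF-IDF matrix.
dense = np.array([[0.0, 0.5, 0.0, 0.2],
                  [0.3, 0.0, 0.0, 0.0]])
m = csr_matrix(dense)

# Walk the stored entries as (row, column, value) triples,
# the same information that print(m) displays.
coo = m.tocoo()
triples = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
print(triples)   # [(0, 1, 0.5), (0, 3, 0.2), (1, 0, 0.3)]

print(m.shape)   # (2, 4)
print(m.nnz)     # 3 stored (non-zero) elements
```

Only non-zero entries are stored, which is why a 3047x5000 matrix can report far fewer stored elements than 3047 * 5000.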
tfidf1 = Tfidf_matrix[1:2]
tfidf2 = Tfidf_matrix[2:3]
item_profiles_list = [tfidf1,tfidf2]
item_profiles_list
[<1x5000 sparse matrix of type '<class 'numpy.float64'>'
    with 109 stored elements in Compressed Sparse Row format>,
 <1x5000 sparse matrix of type '<class 'numpy.float64'>'
    with 90 stored elements in Compressed Sparse Row format>]
scipy.sparse.vstack()
Stacks matrices row-wise (the row count grows); all blocks must have the same number of columns.
https://cloud.tencent.com/developer/article/1525041
scipy.sparse.vstack(blocks, format=None, dtype=None)
Stack sparse matrices vertically (row wise)
Parameters
blocks : sequence of sparse matrices with compatible shapes
format : str, optional
    sparse format of the result (e.g. "csr"); by default an appropriate sparse matrix format is returned. This choice is subject to change.
dtype : dtype, optional
    The data-type of the output matrix. If not given, the dtype is determined from that of blocks.
>>> from scipy.sparse import coo_matrix, vstack
>>> A = coo_matrix([[1, 2], [3, 4]])
>>> B = coo_matrix([[5, 6]])
>>> vstack([A, B]).toarray()
array([[1, 2],
       [3, 4],
       [5, 6]])
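The "compatible shapes" requirement and the format parameter can be checked with a short sketch. The matrices A and C and the flag mismatch_raised are hypothetical names chosen for illustration:

```python
from scipy.sparse import coo_matrix, vstack

A = coo_matrix([[1, 2], [3, 4]])   # 2 columns
C = coo_matrix([[5, 6, 7]])        # 3 columns: incompatible with A

# Stacking blocks with different column counts is rejected.
try:
    vstack([A, C])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)

# Passing format asks for a specific sparse format for the result.
stacked = vstack([A, coo_matrix([[5, 6]])], format="csr")
print(stacked.shape)    # (3, 2)
print(stacked.format)   # csr
```

Requesting "csr" explicitly is useful when later steps rely on row slicing, since the default output format is not guaranteed.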
import numpy as np
from scipy.sparse import coo_matrix, vstack
AB = []
A = coo_matrix([[1, 2],[3,4]])
B = coo_matrix([[5, 6],[7,8]])
AB.append(A)
AB.append(B)
AB
[<2x2 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in COOrdinate format>,
 <2x2 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in COOrdinate format>]
vstack(AB)
<4x2 sparse matrix of type '<class 'numpy.int32'>'
    with 8 stored elements in COOrdinate format>
import scipy.sparse
item_profiles = scipy.sparse.vstack(item_profiles_list)
item_profiles
<2x5000 sparse matrix of type '<class 'numpy.float64'>'
    with 199 stored elements in Compressed Sparse Row format>
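The stored-element count adds up as expected (109 + 90 = 199), because stacking single-row slices simply puts those rows back together. A small sketch with a hypothetical 4x3 matrix m (not from the original notebook) confirms this:

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Hypothetical 4x3 matrix; rows 1 and 2 play the role of tfidf1 and tfidf2.
m = csr_matrix(np.arange(12, dtype=float).reshape(4, 3))

rows = [m[1:2], m[2:3]]   # two 1x3 sparse row slices
stacked = vstack(rows)    # 2x3 sparse matrix

# The stacked result matches slicing both rows at once.
same = np.array_equal(stacked.toarray(), m[1:3].toarray())
print(stacked.shape)      # (2, 3)
print(same)               # True
```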
s = np.array(interactions_test_indexed_df['eventType_strength'])[:2].reshape(-1,1)
print(s.shape)
print(type(s))
s
(2, 1)
<class 'numpy.ndarray'>
array([[1.5849625],
       [2.       ]])
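item_profiles.multiply(s) is element-wise multiplication (not a dot product), the sparse counterpart of np.multiply(a, b). Because s has shape (2, 1), it is broadcast across the columns, so each row of item_profiles is scaled by its own strength.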
s_multiplyed = item_profiles.multiply(s)
s_multiplyed
<2x5000 sparse matrix of type '<class 'numpy.float64'>'
    with 199 stored elements in COOrdinate format>
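The broadcasting behaviour of multiply can be verified on a tiny case. This is a minimal sketch; profiles and weights are hypothetical stand-ins for item_profiles and s:

```python
import numpy as np
from scipy.sparse import csr_matrix

profiles = csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 4.0]]))
weights = np.array([[2.0], [0.5]])   # shape (2, 1): one weight per row

# Element-wise product; the column vector is broadcast across columns.
weighted = profiles.multiply(weights)
print(weighted.toarray())
# [[2.  0.  4. ]
#  [0.  1.5 2. ]]

# Same result as dense NumPy broadcasting.
matches_dense = np.allclose(weighted.toarray(), profiles.toarray() * weights)
print(matches_dense)   # True
```

Zeros stay zero under scaling, so the number of stored elements does not change, which matches the 199 reported above.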
np.sum(s_multiplyed, axis=0) sums over all rows within each column, so the result has 1 row and the column count is unchanged.
s_multiplyed_sumed = np.sum(s_multiplyed,axis=0)
s_multiplyed_sumed.shape
(1, 5000)
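A small sketch of the column-wise sum, assuming a scipy sparse matrix (csr_matrix) as input; the variable names are hypothetical. For this input type the result is a dense np.matrix that keeps two dimensions, and np.asarray(...).ravel() turns it into a flat vector if one is needed:

```python
import numpy as np
from scipy.sparse import csr_matrix

weighted = csr_matrix(np.array([[2.0, 0.0, 4.0],
                                [0.0, 1.5, 2.0]]))

# Sum down each column: 2 rows collapse into 1, columns stay at 3.
col_sums = np.sum(weighted, axis=0)
print(type(col_sums).__name__)   # matrix
print(col_sums.shape)            # (1, 3)

# Flatten to a plain 1-D array for later use.
flat = np.asarray(col_sums).ravel()
print(flat)                      # [2.  1.5 6. ]
```

Calling weighted.sum(axis=0) directly gives the same values; np.sum hands the work to that method.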