scipy.sparse.vstack() and np.sum()

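1, scipy.sparse.vstack() stacks matrices row-wise: the row count grows, and every block must have the same number of columns.

2, np.sum(s_multiplyed, axis=0) sums all rows within each column: the result has 1 row and the same number of columns.

3, Example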

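from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Stop words for both languages that appear in the corpus
stopwords_list = stopwords.words('english') + stopwords.words('portuguese')
vectorizer = TfidfVectorizer(stop_words=stopwords_list, ngram_range=(1, 2), max_df=0.5, min_df=0.001, max_features=5000)
# Join title and text with a space so the adjacent words do not fuse
corpus = articles_df['title'] + ' ' + articles_df['text']
Tfidf_matrix = vectorizer.fit_transform(corpus)
Tfidf_matrix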

<3047x5000 sparse matrix of type '<class 'numpy.float64'>'
with 638202 stored elements in Compressed Sparse Row format>

print(Tfidf_matrix)

(0, 1607) 0.6982198905259501
(0, 4826) 0.05016397320629753
(0, 1128) 0.0914809174284883
(0, 1507) 0.038620021333355174
(0, 4613) 0.040996310293707997
(0, 530) 0.36826514746823585
(0, 4941) 0.0197965383290968
(0, 4328) 0.03321711519981726
(0, 1433) 0.014645895339224089
(0, 1849) 0.01923512191045725
(0, 1934) 0.013449663472538311
(0, 3630) 0.03198848256271393
(0, 4808) 0.028335733068442087
(0, 4212) 0.05946752721690727
(0, 3747) 0.045487454490611506
(0, 3804) 0.0532422986807451
(0, 4418) 0.13171617172370684
(0, 1073) 0.03143331806534215
(0, 1728) 0.03580697939385251
(0, 4473) 0.016615613307880417
(0, 2564) 0.021110819684069474
(0, 3539) 0.0441493565796537
(0, 2743) 0.05880872809007701
(0, 3972) 0.04010389675404812
(0, 4022) 0.01621413503947185
: :
(3045, 2554) 0.04379656517720175
(3046, 4977) 0.231656783998254
(3046, 614) 0.07545310682213181
(3046, 4680) 0.06812738278369762
(3046, 3573) 0.10662644807409383
(3046, 3318) 0.08953668068222567
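
Each printed line above is a (row, column) value triple for one stored (non-zero) element. As a minimal sketch on a hypothetical 2x3 matrix (toy is a stand-in, not the article's data), the same triples can be read back through the COO view of a matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy matrix standing in for Tfidf_matrix
toy = csr_matrix(np.array([[0.0, 0.5, 0.0],
                           [0.25, 0.0, 0.75]]))

coo = toy.tocoo()  # COO exposes explicit row / col / data arrays
triples = [(int(r), int(c), float(v))
           for r, c, v in zip(coo.row, coo.col, coo.data)]
for r, c, v in triples:
    print((r, c), v)

print(toy.nnz)  # number of stored elements
```

Only the stored elements are listed, which is why a 3047x5000 matrix prints far fewer than 3047 * 5000 entries.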

tfidf1 = Tfidf_matrix[1:2]
tfidf2 = Tfidf_matrix[2:3]
item_profiles_list = [tfidf1,tfidf2]
item_profiles_list 

[<1x5000 sparse matrix of type '<class 'numpy.float64'>'
with 109 stored elements in Compressed Sparse Row format>,
<1x5000 sparse matrix of type '<class 'numpy.float64'>'
with 90 stored elements in Compressed Sparse Row format>]
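
Slicing with a range such as 1:2 keeps the result two-dimensional (1 x number of columns), which is the shape vstack expects for each block. A small sketch with a hypothetical 3x4 matrix (toy, row1 and row2 are illustrative names):

```python
from scipy.sparse import csr_matrix

# Hypothetical toy matrix standing in for Tfidf_matrix
toy = csr_matrix([[1, 0, 0, 2],
                  [0, 3, 0, 0],
                  [4, 0, 5, 6]])

row1 = toy[1:2]  # second row, still a 2-D sparse matrix
row2 = toy[2:3]  # third row

print(row1.shape, row1.nnz)
print(row2.shape, row2.nnz)
```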

scipy.sparse.vstack()

Stacks matrices row-wise (the row count grows); every block must have the same number of columns.

https://cloud.tencent.com/developer/article/1525041

scipy.sparse.vstack(blocks, format=None, dtype=None)

Stack sparse matrices vertically (row wise)

Parameters

blocks: sequence of sparse matrices with compatible shapes

format: str, optional. Sparse format of the result (e.g. "csr"); by default an appropriate sparse matrix format is returned. This choice is subject to change.

dtype: dtype, optional. The data-type of the output matrix. If not given, the dtype is determined from that of blocks.

>>> from scipy.sparse import coo_matrix, vstack
>>> A = coo_matrix([[1, 2], [3, 4]])
>>> B = coo_matrix([[5, 6]])
>>> vstack([A, B]).toarray()
array([[1, 2],
       [3, 4],
       [5, 6]])
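
The optional format and dtype parameters described above can be exercised on the same two matrices. This is only a sketch; the choice of "csr" and float64 is an assumption made for illustration:

```python
import numpy as np
from scipy.sparse import coo_matrix, vstack

A = coo_matrix([[1, 2], [3, 4]])
B = coo_matrix([[5, 6]])

# Request a specific output format and dtype instead of the defaults
stacked = vstack([A, B], format="csr", dtype=np.float64)
print(stacked.format, stacked.dtype, stacked.shape)
```

Asking for CSR up front can save a later conversion when the result is going to be row-sliced or multiplied.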
import numpy as np
from scipy.sparse import coo_matrix, vstack
AB = []
A = coo_matrix([[1, 2],[3,4]])
B = coo_matrix([[5, 6],[7,8]])
AB.append(A)
AB.append(B)
AB

[<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 4 stored elements in COOrdinate format>,
<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 4 stored elements in COOrdinate format>]

vstack(AB)

<4x2 sparse matrix of type '<class 'numpy.int32'>'
with 8 stored elements in COOrdinate format>
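
The requirement that all blocks share the same number of columns can be checked directly. A minimal sketch, assuming a hypothetical 3-column matrix C; in the SciPy versions I am aware of, the mismatch surfaces as a ValueError:

```python
from scipy.sparse import coo_matrix, vstack

A = coo_matrix([[1, 2], [3, 4]])  # 2 columns
C = coo_matrix([[5, 6, 7]])       # 3 columns, incompatible with A

try:
    vstack([A, C])
    mismatch_error = None
except ValueError as exc:
    # The exact message differs between SciPy versions
    mismatch_error = exc
    print("vstack failed:", exc)
```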

import scipy.sparse

item_profiles = scipy.sparse.vstack(item_profiles_list)
item_profiles

<2x5000 sparse matrix of type '<class 'numpy.float64'>'
with 199 stored elements in Compressed Sparse Row format>

s = np.array(interactions_test_indexed_df['eventType_strength'])[:2].reshape(-1,1)
print(s.shape)
print(type(s))
s

(2, 1)
<class 'numpy.ndarray'>

array([[1.5849625],
       [2.       ]])

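item_profiles.multiply(s) multiplies element-wise (a Hadamard product, not a dot product), like np.multiply(a, b) on dense arrays. Because s has shape (2, 1), it is broadcast across the columns, so each row of item_profiles is scaled by its own strength.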

s_multiplyed = item_profiles.multiply(s)
s_multiplyed

<2x5000 sparse matrix of type '<class 'numpy.float64'>'
with 199 stored elements in COOrdinate format>
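
To see what the element-wise multiplication does to each row, here is a small sketch with made-up numbers (profiles and strengths are hypothetical stand-ins for item_profiles and s):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two hypothetical item profiles with 3 features each
profiles = csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0]]))
# One strength per row, as a column vector of shape (2, 1)
strengths = np.array([2.0, 10.0]).reshape(-1, 1)

# The column vector is broadcast across the columns of each row
weighted = profiles.multiply(strengths)
print(weighted.toarray())
```

Row 0 is scaled by 2 and row 1 by 10, while zero entries stay zero, so the number of stored elements does not grow.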

np.sum(s_multiplyed, axis=0) sums all rows within each column: the result has 1 row and the same number of columns.

s_multiplyed_sumed = np.sum(s_multiplyed,axis=0)
s_multiplyed_sumed.shape

(1, 5000)
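
The column-wise sum can be traced on a small hypothetical matrix (weighted, col_sums and flat are illustrative names). With the csr_matrix class used in this sketch, the result is a dense 1 x n matrix rather than a sparse one, so flattening it to a plain 1-D array is often convenient:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical weighted profiles: 2 rows, 3 columns
weighted = csr_matrix(np.array([[2.0, 0.0, 4.0],
                                [0.0, 30.0, 0.0]]))

col_sums = np.sum(weighted, axis=0)   # one row, same number of columns
flat = np.asarray(col_sums).ravel()   # plain 1-D ndarray

print(col_sums.shape)
print(flat)
```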

