Recommendation System


Dataset

In this project, I use the Amazon Review Data (2018) dataset.

For now (2022-03-31), I have only downloaded Office_Products.csv and tried to process it with NMF, so all the ideas below are based on that file.

Office_Products.csv (ratings only)

There are 1,048,575 rows (ratings).

They cover 11,210 products and 799,315 users.

The fraction of filled entries in the item–user rating matrix is 0.00011702426536352439 (1,048,575 / (11,210 × 799,315) ≈ 1.17 × 10⁻⁴), so the matrix is extremely sparse.

The row format is |item|user|rating|timestamp|.
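For reference, here is a minimal sketch of loading the file with pandas and checking these numbers (the column names are assumptions; the ratings-only CSV has no header row):

import pandas as pd

# Ratings-only CSV: no header, columns assumed to be item, user, rating, timestamp.
df = pd.read_csv('Office_Products.csv', names=['item', 'user', 'rating', 'timestamp'])

n_ratings = len(df)
n_items = df['item'].nunique()
n_users = df['user'].nunique()
print(n_ratings, n_items, n_users)
print('fill ratio:', n_ratings / (n_items * n_users))   # ~1.17e-4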

Sparse Matrix

In order to hold such a large matrix in memory, I need to store it as a sparse matrix:
from scipy.sparse import csc_matrix

For this dataset, a CSC sparse matrix is very suitable: each row of the CSV maps directly to one entry of the matrix.
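A minimal sketch of how the CSV rows can be turned into a CSC matrix; the factorize step that maps string IDs to row/column indices is my assumption, not code from the original project:

import pandas as pd
from scipy.sparse import csc_matrix

df = pd.read_csv('Office_Products.csv', names=['item', 'user', 'rating', 'timestamp'])

# Map the string item/user IDs to contiguous integer indices.
item_idx, item_ids = pd.factorize(df['item'])
user_idx, user_ids = pd.factorize(df['user'])

# Rows are items, columns are users; only the ~1M observed ratings are stored.
csc = csc_matrix((df['rating'].to_numpy(), (item_idx, user_idx)),
                 shape=(len(item_ids), len(user_ids)))
print(csc.shape, csc.nnz)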

Learn more about sparse matrices

Use NMF

I use scikit-learn (version 1.0.2):

from sklearn.decomposition import NMF

with the following parameters:

model = NMF(n_components=2, init='random', random_state=0, verbose=True)

With n_components=2, model.reconstruction_err_ comes out to 4556.902650495943, and the whole fit took 119.2437425 seconds.

I tried to change the parameters:

n_components | beta-divergence | time consuming (secs)
2            | 4556            | 119
20           | ?               | ?
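The missing entries could be filled in with a small sweep like the following sketch (assuming csc is the sparse rating matrix built above):

import time
from sklearn.decomposition import NMF

for k in (2, 20):
    start = time.perf_counter()
    model = NMF(n_components=k, init='random', random_state=0, verbose=True)
    W = model.fit_transform(csc)
    elapsed = time.perf_counter() - start
    # reconstruction_err_ is the beta-divergence of the fit (Frobenius norm with the default loss).
    print(k, model.reconstruction_err_, round(elapsed, 1))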

About the Result

Get the transformed data with:

model = NMF(n_components=20, init='random', random_state=0, verbose=True)
W = model.fit_transform(csc)

And the factorization matrix:

H = model.components_
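One natural next step (not shown in the original post, just a sketch) is to approximate the full rating matrix from the two factors and rank unseen items for a user:

import numpy as np

# Predicted affinity of every item for user j (column j of the item-user matrix).
j = 0
scores = W @ H[:, j]

# Hide items the user has already rated, then take the top 10 as recommendations.
already_rated = csc[:, j].toarray().ravel() > 0
scores[already_rated] = -np.inf
top_items = np.argsort(scores)[::-1][:10]
print(top_items)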

To show the process more clearly, I walk through it on a small toy example.

The toy matrix looks like this:

    uA  uB  uC  uD  uE  uF  uG  uH  uI  uJ  uK  uL  uM  uN  uO
iA   5   5   3   0   5   5   4   3   2   1   4   1   3   4   5
iB   5   0   4   0   4   4   3   2   1   2   4   4   3   4   0
iC   0   3   0   5   4   5   0   4   4   5   3   0   0   0   0
iD   5   4   3   3   5   5   0   1   1   3   4   5   0   2   4
iE   5   4   3   3   5   5   3   3   3   4   5   0   5   2   4
iF   5   4   2   2   0   5   3   3   3   4   4   4   5   2   5
iG   5   4   3   3   2   0   0   0   0   0   0   0   2   1   0
iH   5   4   3   3   2   0   0   0   0   0   0   0   1   0   1
iI   5   4   3   3   1   0   0   0   0   0   0   0   0   2   2
iJ   5   4   3   3   1   0   0   0   0   0   0   0   0   1   1

The rows are items A to J, the columns are users A to O, and each value is the user's rating of that item.

Let's see what happens in W and H.

W

[[0.81240799 0.71153396 0.47062388 0.43807017 1.39456425 2.24323719
  1.02417204 1.25356481 1.10517661 1.47624595 1.84626347 0.97437242
  1.14921406 0.8159644  1.14200028]
 [2.23910382 1.70186882 1.34300272 1.09192602 0.68045441 0.
  0.0542231  0.         0.         0.         0.04426552 0.12260418
  0.34109613 0.51642843 0.6157604 ]]

H

[[2.20401687 1.53852775]
 [1.9083879  0.83214869]
 [1.95596132 0.        ]
 [1.87637018 1.65573674]
 [2.48959328 1.41632377]
 [2.38108536 1.08460665]
 [0.         2.29342959]
 [0.         2.27353353]
 [0.         2.32513876]
 [0.         2.23196277]]

Graphic (generated by H)

[Figure: the distribution of the items]

It is obvious that the items are split into two clusters.

This is because we set n_components to 2 at the beginning.
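The figure can be reproduced with a sketch like the following (matplotlib assumed); each item is plotted at its two component values taken from H above:

import numpy as np
import matplotlib.pyplot as plt

# Two-component coordinates of items A-J, copied from H above.
item_coords = np.array([
    [2.20401687, 1.53852775],
    [1.9083879,  0.83214869],
    [1.95596132, 0.0],
    [1.87637018, 1.65573674],
    [2.48959328, 1.41632377],
    [2.38108536, 1.08460665],
    [0.0, 2.29342959],
    [0.0, 2.27353353],
    [0.0, 2.32513876],
    [0.0, 2.23196277],
])

plt.scatter(item_coords[:, 0], item_coords[:, 1])
for (x, y), name in zip(item_coords, 'ABCDEFGHIJ'):
    plt.annotate('i' + name, (x, y))
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.title('distribution of the items')
plt.show()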

Making It Faster

  1. Zero-masking: not tried yet…

Use Timestamp?
