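Original title: python-recsys: a Python library for implementing recommender systems
This resource was compiled by beyondwu of Jobbole (伯乐在线).
python-recsys is a Python library for implementing recommender systems.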
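Installation
Dependencies
python-recsys is built on top of Divisi2 (a library for common-sense reasoning over semantic networks) and uses csc-pysparse (a sparse matrix computation library). Divisi2 in turn depends on NumPy and NetworkX. python-recsys also depends on SciPy.
Install the dependencies as follows (using Ubuntu as an example):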
Shell
sudo apt-get install python-scipy python-numpy
sudo apt-get install python-pip
sudo pip install csc-pysparse networkx divisi2
# If you don't have pip installed then do:
# sudo easy_install csc-pysparse
# sudo easy_install networkx
# sudo easy_install divisi2
Download the package from GitHub, then install python-recsys:
Shell
tar xvfz python-recsys.tar.gz
cd python-recsys
sudo python setup.py install
Examples
Load the Movielens dataset:
Python
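from recsys.algorithm.factorize import SVD

svd = SVD()
# 'col', 'row' and 'value' are field positions within each '::'-separated line;
# 'ids': int converts the row and column ids to integers.
svd.load_data(filename='./data/movielens/ratings.dat',
              sep='::',
              format={'col': 0, 'row': 1, 'value': 2, 'ids': int})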
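To make the `format` argument more concrete, here is a minimal sketch in plain Python of how one line in this layout could be split into a row id, a column id and a value. It is not python-recsys code: the helper `parse_rating_line` and the two sample lines are hypothetical, and the field order (user, movie, rating, timestamp) is assumed from the usual Movielens `ratings.dat` layout.

```python
def parse_rating_line(line, sep="::", fmt=None):
    """Split one ratings line into (row_id, col_id, value).

    Hypothetical helper for illustration only; it mirrors the meaning of
    the 'col', 'row', 'value' and 'ids' keys used in the example above.
    """
    if fmt is None:
        fmt = {"col": 0, "row": 1, "value": 2, "ids": int}
    fields = line.strip().split(sep)
    to_id = fmt["ids"]                      # converter applied to both ids
    row_id = to_id(fields[fmt["row"]])      # field 1: the item (movie) id
    col_id = to_id(fields[fmt["col"]])      # field 0: the user id
    value = float(fields[fmt["value"]])     # field 2: the rating
    return row_id, col_id, value


# Illustrative lines in the assumed "user::movie::rating::timestamp" layout.
sample_lines = ["1::1193::5::978300760", "1::661::3::978302109"]
parsed = [parse_rating_line(line) for line in sample_lines]
print(parsed)
```

With this mapping, rows correspond to movies and columns to users, which matches the later remark that "cols are users and rows are items".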
Compute the singular value decomposition (SVD), M = U Σ V^T:
Python
k = 100
svd.compute(k=k,
            min_values=10,
            pre_normalize=None,
            mean_center=True,
            post_normalize=True,
            savefile='/tmp/movielens')
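The factorization M = U Σ V^T and the role of `k` can be illustrated with NumPy alone. The sketch below is not what `svd.compute` runs internally; it uses a small, made-up item-by-user matrix and `numpy.linalg.svd`, and it ignores the `min_values`, normalization and mean-centering options.

```python
import numpy as np

# Hypothetical 4 items x 5 users rating matrix (rows are items, columns are users).
M = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
    [1.0, 0.0, 4.0, 5.0, 0.0],
])

# Full (thin) SVD: M = U * diag(s) * Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
full = U @ np.diag(s) @ Vt

# Keep only the k largest singular values to get a rank-k approximation.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
M_k = U_k @ np.diag(s_k) @ Vt_k

print("exact reconstruction:", np.allclose(full, M))
print("rank of the truncated matrix:", np.linalg.matrix_rank(M_k))
print("approximation error (Frobenius):", np.linalg.norm(M - M_k))
```

A smaller `k` gives a coarser approximation; the error equals the norm of the discarded singular values.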
Compute the similarity between two movies:
Python
ITEMID1 = 1     # Toy Story (1995)
ITEMID2 = 2355  # A bug's life (1998)
svd.similarity(ITEMID1, ITEMID2)
# 0.67706936677315799
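One common way to compare two items after an SVD is the cosine of the angle between their vectors in the reduced k-dimensional space. The sketch below shows that idea with NumPy on a made-up matrix; treating the similarity as a cosine in the latent space is an assumption made for illustration, and `cosine_similarity` is a hypothetical helper, not a python-recsys function.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either one is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


# Hypothetical 4 items x 5 users rating matrix (rows are items, columns are users).
M = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
    [1.0, 0.0, 4.0, 5.0, 0.0],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
# Each row is one item expressed in the k-dimensional latent space.
item_vectors = U[:, :k] * s[:k]

sim_01 = cosine_similarity(item_vectors[0], item_vectors[1])  # rated by the same users
sim_02 = cosine_similarity(item_vectors[0], item_vectors[2])  # rated by different users
print(sim_01, sim_02)
```

Items 0 and 1 are liked by the same users in this toy matrix, so their similarity comes out higher than that of items 0 and 2.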
Get the movies most similar to Toy Story:
Python
svd.similar(ITEMID1)
# Returns:
[(1,    0.99999999999999978),  # Toy Story
 (3114, 0.87060391051018071),  # Toy Story 2
 (2355, 0.67706936677315799),  # A bug's life
 (588,  0.5807351496754426),   # Aladdin
 (595,  0.46031829709743477),  # Beauty and the Beast
 (1907, 0.44589398718134365),  # Mulan
 (364,  0.42908159895574161),  # The Lion King
 (2081, 0.42566581277820803),  # The Little Mermaid
 (3396, 0.42474056361935913),  # The Muppet Movie
 (2761, 0.40439361857585354)]  # The Iron Giant
Predict the rating that a user (USERID) would give to a movie (ITEMID):
Python
MIN_RATING = 0.0
MAX_RATING = 5.0
ITEMID = 1
USERID = 1
svd.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
# Predicted value 5.0
svd.get_matrix().value(ITEMID, USERID)
# Real value 5.0
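A plausible reading of this call is that the prediction is the corresponding entry of the rank-k reconstruction, limited to the range [MIN_RATING, MAX_RATING]. The NumPy sketch below illustrates that reading on a made-up matrix; the clamping behavior and the helper `predict_rating` are assumptions for illustration, not a description of the library's implementation.

```python
import numpy as np


def predict_rating(M_k, item_index, user_index, min_rating=0.0, max_rating=5.0):
    """Read one entry of the reconstructed matrix and clamp it to the rating scale."""
    raw = M_k[item_index, user_index]
    return float(min(max(raw, min_rating), max_rating))


# Hypothetical 4 items x 5 users rating matrix (rows are items, columns are users).
M = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
    [1.0, 0.0, 4.0, 5.0, 0.0],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k reconstruction of M

predicted = predict_rating(M_k, item_index=0, user_index=0)
print("predicted:", predicted, "real:", M[0, 0])
```

Note that this sketch indexes the matrix by position, whereas the library call above takes item and user ids.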
Recommend movies that a user has not yet rated:
Python
svd.recommend(USERID, is_row=False)  # cols are users and rows are items, thus we set is_row=False
# Returns:
[(2905, 5.2133848204673416),  # Shaggy D.A., The
 (318,  5.2052108435956033),  # Shawshank Redemption, The
 (2019, 5.1037438278755474),  # Seven Samurai (The Magnificent Seven)
 (1178, 5.0962756861447023),  # Paths of Glory (1957)
 (904,  5.0771405690055724),  # Rear Window (1954)
 (1250, 5.0744156653222436),  # Bridge on the River Kwai, The
 (858,  5.0650911066862907),  # Godfather, The
 (922,  5.0605327279819408),  # Sunset Blvd.
 (1198, 5.0554543765500419),  # Raiders of the Lost Ark
 (1148, 5.0548789542105332)]  # Wrong Trousers, The
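The idea behind such a recommendation list can be sketched as: take the user's column in the rank-k reconstruction, drop the items the user has already rated, and return the highest-scoring remainder. The NumPy example below does this on a made-up matrix. It assumes that a 0 in the original matrix means "not rated", and `recommend_items` is a hypothetical helper rather than the library's method.

```python
import numpy as np


def recommend_items(M, M_k, user_index, n=2):
    """Return up to n (item_index, score) pairs for items the user has not rated.

    Assumes a 0 in M means "not rated"; scores come from the reconstruction M_k.
    """
    unrated = np.flatnonzero(M[:, user_index] == 0)
    scores = M_k[unrated, user_index]
    order = np.argsort(-scores)[:n]          # highest scores first
    return [(int(unrated[i]), float(scores[i])) for i in order]


# Hypothetical 4 items x 5 users rating matrix (rows are items, columns are users).
M = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
    [1.0, 0.0, 4.0, 5.0, 0.0],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k reconstruction of M

user_index = 4                          # this user has rated only item 1
recommendations = recommend_items(M, M_k, user_index, n=2)
print(recommendations)
```

Swapping the roles of rows and columns gives the reverse question, that is, which users to suggest a given item to.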
Which users should see Toy Story? (That is, which users who have not rated Toy Story would be expected to give it a high rating?)
Python
svd.recommend(ITEMID)
# Returns:
[(283,  5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446,  5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]
Documentation
Build the HTML documentation from the doc/source directory:
cd doc
make html
The HTML is generated at the following path:
doc/build/html/index.html
Source code: https://github.com/ocelma/python-recsys