Experiment Title
Product Recommendation with MLlib
Experiment Objectives
Learn collaborative filtering with Spark MLlib:
- User-based collaborative filtering
- Item-based collaborative filtering
- ALS-based collaborative filtering
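The difference between the user-based and item-based approaches can be illustrated with a minimal sketch in plain Python. It assumes cosine similarity as the similarity measure. The toy ratings and the helper names `cosine` and `transpose` are hypothetical and are not part of the experiment data or of MLlib.

```python
import math

# Hypothetical toy ratings: {user: {item: rating}} (not the experiment data)
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 3.0, "C": 5.0},
    "u3": {"A": 1.0, "B": 5.0},
}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def transpose(user_ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, items in user_ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item

# User-based: compare two users by the items they rated
user_sim = cosine(ratings["u1"], ratings["u2"])

# Item-based: compare two items by the users who rated them
item_ratings = transpose(ratings)
item_sim = cosine(item_ratings["A"], item_ratings["C"])

print(user_sim, item_sim)
```

In the user-based view, a user would be recommended items liked by the most similar users. In the item-based view, a user would be recommended items similar to those they already rated highly.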
Experiment Environment
- VMware Workstation
- Ubuntu 16.04
- PyCharm
- PySpark
Experiment Steps
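# -*- coding: utf-8 -*-
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

# Data format: user id, product id, rating
sc = SparkContext("local[2]", "second spark app")

# Load the raw data
rawData = sc.textFile("/home/lxs/Downloads/ratingdata.txt")
print(rawData.first())

# Split each line on tabs and keep the first three fields
rawRating = rawData.map(lambda line: line.split("\t")[:3])
print(rawRating.first())
ratings = rawRating.map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
print(ratings.first())

# Train the model: 50 latent factors, 10 iterations, regularization 0.01
model = ALS.train(ratings, rank=50, iterations=10, lambda_=0.01)
print(model.userFeatures().count())     # number of user factor vectors
print(model.productFeatures().count())  # number of product factor vectors

# Predict the rating of user 789 for product 123
predictedRating = model.predict(789, 123)
print(predictedRating)

# Recommend the top 10 products for the user
userId = 789
K = 10
topKRecs = model.recommendProducts(userId, K)
print(topKRecs)

# Release the Spark resources
sc.stop()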
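The prediction and recommendation steps can be illustrated without Spark. The sketch below assumes that a matrix factorization model scores a (user, product) pair as the dot product of the two factor vectors, and that the top-K recommendation is simply the K products with the highest scores. The factor values and the helper names `predict` and `recommend_products` are hypothetical and are not MLlib's implementation.

```python
import heapq

# Hypothetical 2-dimensional factor vectors (the experiment uses rank 50)
user_factors = {789: [1.0, 2.0]}
product_factors = {
    101: [0.5, 1.5],
    102: [2.0, 0.0],
    103: [1.0, 1.0],
    104: [0.0, 0.5],
}

def predict(user_id, product_id):
    """Score a pair as the dot product of its two factor vectors."""
    u = user_factors[user_id]
    p = product_factors[product_id]
    return sum(a * b for a, b in zip(u, p))

def recommend_products(user_id, k):
    """Return the k highest-scoring (product_id, score) pairs for a user."""
    scores = ((pid, predict(user_id, pid)) for pid in product_factors)
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

top2 = recommend_products(789, 2)
print(top2)
```

With a trained model, the factor vectors returned by `userFeatures()` and `productFeatures()` would take the place of these toy dictionaries.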