SparkALS.py
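This Python file implements the ALS recommendation algorithm on Spark. The ALS (alternating least squares) algorithm supported by Spark MLlib is a machine-learning approach to collaborative filtering. Collaborative filtering infers each user's preferences from the ratings that all users have given to products, and then recommends suitable products to each user.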
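To make the idea concrete, here is a minimal pure-Python sketch of alternating least squares with a single latent factor per user and per movie. It is an illustration only, not Spark's implementation: the toy ratings, the function names als_rank1 and squared_error, and the parameter values are assumptions chosen for the example.

```python
# Toy ratings: (user, item) -> rating. Hypothetical data for illustration.
ratings = {
    ("u1", "m1"): 5.0, ("u1", "m2"): 3.0,
    ("u2", "m1"): 4.0, ("u2", "m3"): 1.0,
    ("u3", "m2"): 2.0, ("u3", "m3"): 1.0,
}


def squared_error(ratings, user_f, item_f):
    """Sum of squared errors over the observed ratings."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())


def als_rank1(ratings, iterations=5, reg=0.01):
    """Alternating least squares with one latent factor per user and item.

    Each rating is approximated by user_f[u] * item_f[i]. With the item
    factors held fixed, the best user factor has a closed form (regularized
    least squares), and vice versa, so the two updates alternate.
    """
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    user_f = {u: 1.0 for u in users}
    item_f = {i: 1.0 for i in items}
    history = [squared_error(ratings, user_f, item_f)]
    for _ in range(iterations):
        # Fix the item factors and solve for each user factor.
        for u in users:
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = reg + sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u)
            user_f[u] = num / den
        # Fix the user factors and solve for each item factor.
        for i in items:
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = reg + sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i)
            item_f[i] = num / den
        history.append(squared_error(ratings, user_f, item_f))
    return user_f, item_f, history


user_f, item_f, history = als_rank1(ratings)
# Predicted rating for a (user, movie) pair that was never observed.
prediction_u1_m3 = user_f["u1"] * item_f["m3"]
```

The history list records the fit error before training and after each iteration, so comparing its first and last entries shows how much the alternating updates improve the fit on the observed ratings.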
The file uses the libraries below. MovieLens is a Python file written for this project; the main third-party library is pyspark.
from pyspark.sql import SparkSession
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row
from time import time
from MovieLens import MovieLens
A SparkSession instance is created through SparkSession.builder.
if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("ALSExample")\
        .getOrCreate()
Load the two rating datasets.
lines1 = spark.read.option("header", "true").csv("C:\\Users\\12753\\Desktop\\13栋525\\MovieRecommendationSystem-master\\ratings.csv").rdd
lines2 = spark.read.option("header", "true").csv("C:\\Users\\12753\\Desktop\\13栋525\\MovieRecommendationSystem-master\\ratings_Becky.csv").rdd
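Because no schema is supplied and schema inference is not enabled, the CSV reader delivers every column as a string, which is why the next step casts each field explicitly. The same conversion can be sketched without Spark using the standard csv module; the sample rows, the Rating tuple, and the parse_ratings helper are made up for illustration.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample in the same column layout as ratings.csv.
SAMPLE = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
0,260,5.0,1500000000
"""

Rating = namedtuple("Rating", ["userId", "movieId", "rating", "timestamp"])


def parse_ratings(text):
    """Skip the header row and cast each string field to a numeric type."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # header line, comparable to option("header", "true")
    return [Rating(int(p[0]), int(p[1]), float(p[2]), int(p[3])) for p in reader]


rows = parse_ratings(SAMPLE)
```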
Label and type each field, merge the two datasets, and split the result into a training set and a test set at a ratio of 4:1.
ratingsRDD1 = lines1.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                       rating=float(p[2]), timestamp=int(p[3])))
ratingsRDD2 = lines2.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                       rating=float(p[2]), timestamp=int(p[3])))
ratings1 = spark.createDataFrame(ratingsRDD1)
ratings2 = spark.createDataFrame(ratingsRDD2)
ratings = ratings1.union(ratings2)
(training, test) = ratings.randomSplit([0.8, 0.2])
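Note that randomSplit assigns rows by independent random draws, so the 4:1 ratio holds only approximately, and without a seed the split differs from run to run (randomSplit accepts an optional seed argument for reproducibility). The behaviour can be sketched in plain Python; the random_split helper and the stand-in rows are assumptions for illustration, not Spark code.

```python
import random


def random_split(rows, train_weight=0.8, seed=42):
    """Assign each row to train or test by an independent random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_weight else test).append(row)
    return train, test


rows = list(range(1000))  # stand-in for rating rows
train, test = random_split(rows)
```

Every row ends up in exactly one of the two lists, but the training list holds roughly, not exactly, 80% of the rows.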
Use ALS as the model, fit it on the training data, and predict ratings for the test set.
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")
model = als.fit(training)
predictions = model.transform(test)
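The coldStartStrategy="drop" setting matters here: a user or movie that appears only in the test split has no learned factors, and with Spark's default strategy ("nan") its prediction is NaN, which would turn the evaluation metric into NaN as well. The effect can be sketched in plain Python; the rating pairs and the rmse helper below are hypothetical.

```python
import math

# Hypothetical (actual, predicted) pairs; NaN marks a user or movie
# that was absent from the training split.
pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, float("nan"))]


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


rmse_with_nan = rmse(pairs)  # NaN propagates through the sum
# Dropping rows with NaN predictions is what the "drop" strategy amounts to.
kept = [(a, p) for a, p in pairs if not math.isnan(p)]
rmse_after_drop = rmse(kept)
```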
Compute the RMSE, time the top-15 recommendation step, then stop the Spark session.
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
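For reference, the root-mean-square error over the n rows of the prediction table can be written as follows, where r_i is the actual rating and \hat{r}_i the predicted one. The three-rating example is made up purely to show the arithmetic.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - \hat{r}_i\right)^2}

% Worked example with hypothetical ratings r = (1, 2, 3)
% and predictions \hat{r} = (2, 2, 5):
\mathrm{RMSE} = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (3-5)^2}{3}}
             = \sqrt{\frac{5}{3}} \approx 1.29
```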
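# Time recommendation generation (model training already finished in als.fit).
# DataFrame operations are evaluated lazily, so collect() is timed as well.
t0 = time()
userRecs = model.recommendForAllUsers(15)
testUser0 = userRecs.filter(userRecs['userId'] == 0).collect()
tt = time() - t0
print("Recommendations generated in %s seconds" % round(tt, 3))
spark.stop()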
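Conceptually, recommendForAllUsers(15) keeps, for every user, the 15 movies with the highest predicted rating. A minimal plain-Python sketch of that top-N selection follows; the predicted scores and the recommend_for_all_users helper are hypothetical, and N is set to 2 to keep the example small.

```python
import heapq

# Hypothetical predicted ratings per user: {userId: {movieId: score}}.
predicted = {
    0: {10: 4.8, 20: 3.1, 30: 4.9, 40: 2.0},
    1: {10: 2.5, 20: 4.4, 30: 1.0, 40: 3.9},
}


def recommend_for_all_users(predicted, n):
    """Return {userId: [(movieId, score), ...]} with the n best scores first."""
    return {
        user: heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
        for user, scores in predicted.items()
    }


user_recs = recommend_for_all_users(predicted, 2)
test_user0 = user_recs[0]  # mirrors filtering userRecs on userId == 0
```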
Print the names of the recommended movies.
ml = MovieLens()
ml.loadMovieLensDataset()
for row in testUser0:
    for number, rec in enumerate(row.recommendations, start=1):
        print(number, " - ", ml.getMovieName(rec.movieId))