PySpark Recommender System


from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row, SparkSession

# Create a SparkSession if one is not already available
# (in the pyspark shell, `spark` is predefined)
spark = SparkSession.builder.appName("ALSExample").getOrCreate()

lines = spark.read.text("data/mllib/als/sample_movielens_ratings.txt").rdd
parts = lines.map(lambda row: row.value.split("::"))
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                     rating=float(p[2]), timestamp=int(p[3])))  # int, not Python 2's long
ratings = spark.createDataFrame(ratingsRDD)
(training, test) = ratings.randomSplit([0.8, 0.2])

# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")  # drop NaN predictions for users/items unseen in training
model = als.fit(training)

# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
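To make the ALS idea concrete without a Spark cluster, here is a minimal local sketch of alternating least squares on a tiny, made-up rating matrix (the data, rank, and iteration count are illustrative assumptions, not values from the example above). Holding one factor matrix fixed, each row of the other is the solution of a small regularized least-squares problem; alternating the two sides drives down the reconstruction error on the observed entries, which is what `pyspark.ml`'s ALS does at scale:

```python
import numpy as np

# Toy observed rating matrix (0 = missing); hypothetical data for illustration only.
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
mask = R > 0            # only observed ratings enter the loss
k, lam = 2, 0.1         # latent rank and regularization (analogue of regParam)

rng = np.random.default_rng(0)
U = rng.normal(size=(R.shape[0], k))   # user factors
V = rng.normal(size=(R.shape[1], k))   # item factors

def solve(fixed, R, mask, lam, axis):
    """Solve the regularized least squares for each user (axis=0) or item (axis=1)."""
    out = np.empty((mask.shape[axis], fixed.shape[1]))
    for i in range(mask.shape[axis]):
        obs = mask[i] if axis == 0 else mask[:, i]       # which entries are observed
        F = fixed[obs]                                   # factors of the other side
        r = R[i][obs] if axis == 0 else R[:, i][obs]     # the observed ratings
        out[i] = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ r)
    return out

for _ in range(10):                    # alternate: fix V, solve U; fix U, solve V
    U = solve(V, R, mask, lam, axis=0)
    V = solve(U, R, mask, lam, axis=1)

pred = U @ V.T
rmse = np.sqrt(np.mean((pred[mask] - R[mask]) ** 2))
print(f"training RMSE = {rmse:.4f}")
```

Each alternating step has a closed-form solution, which is why ALS parallelizes well: every user's (or item's) factor vector can be solved independently once the other side is fixed.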

PySpark's recommender-system module is pyspark.ml.recommendation. It provides collaborative filtering based on the ALS (alternating least squares) algorithm. More details are available in the official documentation at api/python/pyspark.ml.html#module-pyspark.ml.recommendation. [1]

To use the module from a terminal, prepare as follows:

1. Change to your Python working directory, e.g.: cd ~/pythonwork/ipynotebook
2. Launch PySpark with the relevant settings:
   PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
3. Prepare your data. [2]

If you want to filter rows as part of preparing data in PySpark, use the filter transformation to keep only the lines containing a given keyword:

```
lines = sc.parallelize(['Spark is very fast', 'My name is Li Lei'])
# Keep lines containing "Spark"; the filter runs in parallel across partitions
linesWithSpark = lines.filter(lambda line: "Spark" in line)
# Print each remaining line (output order is not guaranteed across partitions)
linesWithSpark.foreach(print)
# Output: Spark is very fast
```

This code filters out the lines containing the keyword "Spark" and prints the result. [3]

References:
1. PySpark-推荐系统-RecommenderSystem: https://blog.csdn.net/geek6/article/details/104274739
2. 基于Python Spark的推荐系统: https://blog.csdn.net/weixin_40170902/article/details/82585607
3. PySpark基本入门(附python代码示例): https://blog.csdn.net/weixin_54707168/article/details/122757247
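For readers without a running Spark cluster, the semantics of RDD.filter can be checked locally: it behaves like a plain-Python filter over the collection, here sketched with a list comprehension on the same two strings:

```python
# Local analogue of the RDD filter example above (plain Python, no Spark required)
lines = ['Spark is very fast', 'My name is Li Lei']
lines_with_spark = [line for line in lines if 'Spark' in line]
print(lines_with_spark)  # ['Spark is very fast']
```

The difference in Spark is only execution: the predicate is shipped to each partition and applied in parallel, while the logical result is the same.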