Start the clusters
cd /export/servers/hadoop-2.7.4/sbin/
sh start-all.sh
cd /export/servers/spark/sbin
sh start-all.sh
Troubleshooting HDFS uploads
-------- If the upload fails --------
Skip this section if uploads work fine ---------------------------------------------
1. Turn off safe mode
2. Reconfigure the datanode
Reference:
https://blog.csdn.net/weixin_41998650/article/details/123961886?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522165555226016782395387498%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=165555226016782395387498&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~rank_v31_ecpm-4-123961886-null-null.142^v17^control,157^v15^new_3&utm_term=%E5%BC%80%E5%90%AFdatanode&spm=1018.2226.3001.4187
1. Create a new directory:
mkdir file
2. Download the dataset:
cd file
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
This walkthrough uses Spark's local mode.
Unzip the archive:
unzip -j ml-100k.zip
file:///export/servers/spark/sbin/u.data (the directory ml-100k was extracted to; create or adjust this path for your own setup)
Write the program and train the model
Launch spark-shell, read the u.data file, and convert it into an RDD:
cd /export/servers/spark/sbin
spark-shell --master local[2]
// Each line of u.data is "user id \t item id \t rating \t timestamp"; take(3) drops the timestamp
val dataRdd = sc.textFile("file:///export/servers/spark/sbin/u.data").map(_.split("\t").take(3))
Print the first record:
dataRdd.first()
Import the ALS class:
import org.apache.spark.mllib.recommendation.ALS
Convert the data into Rating objects:
import org.apache.spark.mllib.recommendation.Rating
val ratings=dataRdd.map{case Array(user,movie,rating) => Rating(user.toInt,movie.toInt,rating.toDouble)}
ratings.first()
val model = ALS.train(ratings,50,10,0.01)
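The three numeric arguments to ALS.train are easy to misread. In MLlib's explicit-feedback ALS they are, in order, the rank (number of latent factors), the iteration count, and the regularization parameter lambda. The same call with the hyperparameters named, purely for readability:

```scala
// Same training call as above, with each hyperparameter named.
// Signature: ALS.train(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double)
val rank = 50        // number of latent factors per user and per movie
val iterations = 10  // number of alternating least-squares sweeps
val lambda = 0.01    // regularization strength
val model = ALS.train(ratings, rank, iterations, lambda)
```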
Predict how user 100 would rate movie 200:
val predictedRating = model.predict(100, 200)
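model.predict also has an overload that takes an RDD of (user, product) pairs, which is the usual way to score many pairs at once. A sketch (run inside the same spark-shell session) that reuses the ratings RDD from above to compute the training-set mean squared error:

```scala
// Batch prediction: score every (user, movie) pair that appears in the ratings,
// then join the predictions with the actual ratings and compute the MSE.
val usersProducts = ratings.map { case Rating(user, product, _) => (user, product) }
val predictions = model.predict(usersProducts)
  .map { case Rating(user, product, rating) => ((user, product), rating) }
val ratingsAndPreds = ratings
  .map { case Rating(user, product, rating) => ((user, product), rating) }
  .join(predictions)
val mse = ratingsAndPreds
  .map { case (_, (actual, predicted)) => math.pow(actual - predicted, 2) }
  .mean()
println(s"Training MSE = $mse")
```

A low training MSE only shows the model fits the data it was trained on; a proper evaluation would hold out a test split.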
Recommend multiple movies for one user
Set the user id:
val userId=100
Set the number of recommendations:
val num=10
val topRecoPro = model.recommendProducts(userId,num)
Map movie ids to movie titles
val moviesRdd=sc.textFile("file:///export/servers/spark/sbin/u.item")
// Each line of u.item is "movie id | movie title | release date | ..."; keep the id and the title
val titles = moviesRdd.map(line => line.split("\\|").take(2)).map(array => (array(0).toInt,array(1))).collectAsMap()
Print the results:
topRecoPro.map(rating => (titles(rating.product),rating.rating)).foreach(println)