Dataset
MovieLens 1M Dataset
users.dat
UserID::Gender::Age::Occupation::Zip-code
movies.dat
MovieID::Title::Genres
ratings.dat
UserID::MovieID::Rating::Timestamp
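All three files use `::` as the field separator. Age in users.dat is a code, not a raw age: the MovieLens 1M README lists 1 = "Under 18", 18 = "18-24", 25 = "25-34", 35 = "35-44", 45 = "45-49", 50 = "50-55", 56 = "56+". Genres in movies.dat are `|`-separated. Below is a minimal sketch of typed parsing for the three formats. The object name `MovieLensParse`, the case classes, and the field types are illustrative assumptions, not part of the original code.

```scala
// Sketch: typed records for the three MovieLens 1M files ("::"-separated).
// All names here are hypothetical; the original code works on raw Array[String].
object MovieLensParse {
  case class User(userId: Int, gender: String, age: Int, occupation: Int, zip: String)
  case class Movie(movieId: Int, title: String, genres: List[String])
  case class Rating(userId: Int, movieId: Int, rating: Int, timestamp: Long)

  // Age codes as documented in the MovieLens 1M README
  val ageLabels: Map[Int, String] = Map(
    1 -> "Under 18", 18 -> "18-24", 25 -> "25-34", 35 -> "35-44",
    45 -> "45-49", 50 -> "50-55", 56 -> "56+")

  // split("::") takes a regex, but ':' has no special meaning, so this splits literally
  def parseUser(line: String): User = {
    val f = line.split("::")
    User(f(0).toInt, f(1), f(2).toInt, f(3).toInt, f(4))
  }

  // split('|') with a Char splits on the literal pipe character
  def parseMovie(line: String): Movie = {
    val f = line.split("::")
    Movie(f(0).toInt, f(1), f(2).split('|').toList)
  }

  def parseRating(line: String): Rating = {
    val f = line.split("::")
    Rating(f(0).toInt, f(1).toInt, f(2).toInt, f(3).toLong)
  }
}
```

With typed records, the age filter becomes `u.age == 18` and the code to label mapping is explicit instead of implied by a numeric range.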
Implementation with RDD operators
1. Which 10 movies do young male users in the "18-24" age group watch most?
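// root URL of the classpath; the three .dat files are read from there
val root = work.getClass.getResource("/")
// 3 = minimum number of partitions; each line becomes an Array[String] of its "::" fields
val movieRdd = sc.textFile(root + "movies.dat", 3).map(_.split("::"))
val ratingRdd = sc.textFile(root + "ratings.dat", 3).map(_.split("::"))
val userRdd = sc.textFile(root + "users.dat", 3).map(_.split("::"))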
// Age is a code: 18 means "18-24" and 25 means "25-34", so match 18 exactly (a range up to 25 would include 25-34)
val youngMale = userRdd.filter(a => a(1) == "M" && a(2) == "18").map(a => (a(0), 1))
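// (UserID, MovieID) pairs
val rating = ratingRdd.map(a => (a(0), a(1)))
// join on UserID -> (MovieID, 1), count ratings per movie, keep the 10 most-rated
val movieID = rating.join(youngMale).map(_._2).reduceByKey(_ + _).sortBy(_._2, false).take(10)
// false = descending; omit it for ascending (the default)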
// MovieID -> Title lookup table, collected to the driver
val movieName = movieRdd.map(x => (x(0), x(1))).collect().toMap
val ans = movieID.map(x => movieName.getOrElse(x._1, x._1))
ans foreach println
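To check the query logic without a Spark cluster, here is a minimal sketch of the same pipeline on plain Scala collections: filter the young male users, keep only their ratings (the join), count per MovieID (the reduceByKey), sort by count descending, take the top n, and look up titles. The object `YoungMaleTop` and its `topMovies` signature are hypothetical, chosen only for illustration.

```scala
// Sketch: query 1 on in-memory collections instead of RDDs (names are hypothetical).
object YoungMaleTop {
  /** users, ratings, movies are lines already split on "::", as in the RDD code. */
  def topMovies(users: Seq[Array[String]],
                ratings: Seq[Array[String]],
                movies: Seq[Array[String]],
                n: Int = 10): Seq[String] = {
    // UserIDs of male users whose age code is 18 ("18-24")
    val youngMale: Set[String] =
      users.filter(a => a(1) == "M" && a(2) == "18").map(_(0)).toSet
    // Ratings by those users, counted per MovieID (join + reduceByKey in the RDD version)
    val counts: Map[String, Int] = ratings
      .filter(a => youngMale.contains(a(0)))
      .groupBy(_(1))
      .map { case (movieId, rs) => movieId -> rs.size }
    // MovieID -> Title lookup, like collect().toMap
    val titles: Map[String, String] = movies.map(a => a(0) -> a(1)).toMap
    counts.toSeq
      .sortBy { case (id, c) => (-c, id) } // count descending; MovieID breaks ties
      .take(n)
      .map { case (id, _) => titles.getOrElse(id, id) }
  }
}
```

One design difference: the RDD version sorts by count only, so the order among movies with equal counts is not fixed; the sketch adds MovieID as a tie-breaker so its output is deterministic.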
2. The 10 highest-rated movies
val ratmovie = ratingRdd.map(x => (x(