Spark商业案例与性能调优实战100课》第2课:商业案例之通过RDD实现分析大数据电影点评系统中电影流行度分析

Spark商业案例与性能调优实战100课》第2课:商业案例之通过RDD实现分析大数据电影点评系统中电影流行度分析



package com.dt.spark.cores

import org.apache.spark.{SparkConf, SparkContext}

object Movie_Users_Analyzer {
def main (args:Array[String]): Unit ={
    var masterUrl="local[4]"
    var dataPath="data/movielens/medium/"
   if(args.length>0){
     masterUrl=args(0)
        }else if (args.length>1) {
     dataPath =args(1)
   }
   val sc =new SparkContext(new SparkConf().setMaster(masterUrl).setAppName("Movie_Users_Analyzer"))
   val usersRDD =sc.textFile(dataPath+"users.dat")
  val moviessRDD =sc.textFile(dataPath+"movies.dat")
  val occupationsRDD =sc.textFile(dataPath+"occupation.dat")
  val ratingsRDD =sc.textFile(dataPath+"ratings.dat")

   val usersBasic =usersRDD.map(_.split("::")).map{user => (
     user(3),
     (user(0),user(1),user(2))
   )}

  val occupations =occupationsRDD.map(_.split("::")).map(job =>(job(0),job(1)))
  val userInformation=usersBasic.join(occupations)
  userInformation.cache()
 for (elem <- userInformation.collect()){
   println(elem)
 }

  val targetMoive=ratingsRDD.map(_.split("::")).map(x=>(x(0),x(1))).filter(_._2.equals("1193"))
  val targetUsers =userInformation.map(x=>(x._2._1._1,x._2))
  val userInformationForSpecificMovie =targetMoive.join(targetUsers)
  for (elem <- userInformationForSpecificMovie.collect()){
    println(elem)
  }

  //users.dat   UserID::Gender::Age::Occupation::Zip-code
  //ratings.dat  UserID::MovieID::Rating::Timestamp
  //Occupation    6:  "doctor/health care"
 // movies.dat  MovieID::Title::Genres

   val ratings=ratingsRDD.map(_.split("::")).map(x=>(x(0),x(1),x(2))).cache()
     ratings.map(x=>(x._2,(x._3.toInt,1)))
     .reduceByKey((x,y)=>(x._1+y._1,x._2+y._2)) // 总分,总人数
     .map(x => (x._2._1.toDouble / x._2._2 , x._1))
       .sortByKey(false)
      .take(10)
    .foreach(println)
  //观看人数最多的电影  //ratings.dat  UserID::MovieID::Rating::Timestamp
  ratings.map(x => (x._1,1)).reduceByKey(_+_).map(x=>(x._2,x._1)).sortByKey(false)
    .take(10).foreach(print)
}
}
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值