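Spark Mini Exercise: Finding the TopN Most Popular Teachers per Subject
[Note] This article is based on the tutorial videos from Xiaoniu Xuetang (小牛学堂).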
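Data format: http://bigdata.edu360.cn/laozhang
(the first label of the host name is the subject, and the last path segment is the teacher)
1. Splitting the data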
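import java.net.URL

// Split one line such as http://bigdata.edu360.cn/laozhang
val func = (line: String) => {
  val index = line.lastIndexOf("/")
  val teacher = line.substring(index + 1)
  val httpHost = line.substring(0, index)
  // The subject is the first label of the host name, e.g. "bigdata"
  val subject = new URL(httpHost).getHost.split("[.]")(0)
  // Section 2.1 only needs (teacher, 1); per-subject counting would key on (subject, teacher) instead
  (teacher, 1)
}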
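The splitting function above assumes every line is well formed. As a minimal sketch of a more defensive variant, the object below returns None for lines that have no "/" separator, no teacher name, or no parsable host. The names TeacherLineParser and parse are made up for this illustration and do not come from the original course material.

```scala
import java.net.URL
import scala.util.Try

// Hypothetical helper: a defensive version of the splitting step.
object TeacherLineParser {
  // Returns Some((subject, teacher)) for a well-formed line, None otherwise.
  def parse(line: String): Option[(String, String)] = {
    val trimmed = line.trim
    val index = trimmed.lastIndexOf("/")
    // No separator at all, or nothing after the last "/"
    if (index < 0 || index == trimmed.length - 1) None
    else {
      val teacher = trimmed.substring(index + 1)
      // new URL(...) throws on a missing protocol, so wrap it in Try
      Try(new URL(trimmed.substring(0, index)).getHost).toOption
        .filter(_.nonEmpty)
        .map(host => (host.split("[.]")(0), teacher))
    }
  }
}
```

With an RDD of lines, lines.flatMap(TeacherLineParser.parse(_)) would then silently drop malformed records instead of failing the job.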
2. Computation logic
2.1 TopN most popular teachers across all subjects
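// Load the data source (sc is the SparkContext, path is the input file)
val lines = sc.textFile(path)
// Map each line to (teacher, 1)
val teacherAndOne = lines.map(func)
// Sum the visits per teacher
val reduced = teacherAndOne.reduceByKey(_ + _)
// Sort by visit count, descending
val sorted = reduced.sortBy(_._2, false)
// take keeps the sorted order; top would re-rank by the tuple's default ordering
val result = sorted.take(topN)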
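When only the first few entries are needed, sorting the whole RDD is not strictly required. The sketch below is one possible alternative, not the method from the course: it calls RDD.top with an explicit ordering on the visit count. The object name TopTeachers and its parameter names are assumptions made for this example.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: pick the n most visited teachers without a full sort.
object TopTeachers {
  // reduced holds (teacher, visitCount) pairs, e.g. the output of reduceByKey
  def topN(reduced: RDD[(String, Int)], n: Int): Array[(String, Int)] =
    // top returns the n largest elements under the given ordering, largest first
    reduced.top(n)(Ordering.by[(String, Int), Int](_._2))
}
```

The explicit ordering matters: without it, top compares the (teacher, count) tuples by teacher name first, which is not the ranking this exercise asks for.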
2.2 TopN most popular teachers within each subject
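One way to approach this step is to count visits per (subject, teacher) pair, regroup the counts by subject, and keep the first topN entries of each group. The following is a minimal sketch under that assumption; the object SubjectTopN and its method names are invented for illustration, and the original course may solve it differently.

```scala
import java.net.URL
import org.apache.spark.rdd.RDD

// Hypothetical helper for the per-subject ranking.
object SubjectTopN {
  // Parse one line into ((subject, teacher), 1), using the same splitting as in section 1
  def parse(line: String): ((String, String), Int) = {
    val index = line.lastIndexOf("/")
    val teacher = line.substring(index + 1)
    val subject = new URL(line.substring(0, index)).getHost.split("[.]")(0)
    ((subject, teacher), 1)
  }

  // For every subject, keep the topN teachers by visit count, highest first
  def topNPerSubject(lines: RDD[String], topN: Int): RDD[(String, List[(String, Int)])] =
    lines
      .map(parse)
      .reduceByKey(_ + _) // visits per (subject, teacher)
      .map { case ((subject, teacher), count) => (subject, (teacher, count)) }
      .groupByKey() // all (teacher, count) pairs of one subject
      .mapValues(teachers => teachers.toList.sortBy { case (_, count) => -count }.take(topN))
}
```

Note that groupByKey followed by toList brings all teachers of one subject into memory on a single executor. That is acceptable when each subject has few teachers; for large groups, filtering the RDD one subject at a time and sorting with RDD operations would be a safer choice.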