Using Spark Accumulators
Use Spark accumulators to compute the average video play count, as well as the mean of the squared play counts.
// One accumulator per statistic (old sc.accumulator API, Spark 1.x; deprecated in 2.0)
val totalTimes = sc.accumulator(0L)      // sum of play counts
val totalVids = sc.accumulator(0)        // number of videos counted
val totalPow2Times = sc.accumulator(0d)  // sum of squared play counts
val timesFile = sc.textFile("/user/zhenyuan.yu/DumpIdTimesJob_tmp_out")
timesFile.foreach(f => {
  val vid_times = f.split("\t")          // each line: vid \t times
  var times = vid_times(1).toInt
  if (times > 10000000) times = 10000000 // cap outliers
  if (times > 500) {                     // skip rarely played videos
    totalTimes += times
    totalPow2Times += Math.pow(times.toDouble, 2)
    totalVids += 1
  }
})
// Read .value only on the driver, after the action has run;
// convert to Double to avoid truncating the average via integer division
val avgTimes = totalTimes.value.toDouble / totalVids.value
val avgPow2Times = totalPow2Times.value / totalVids.value
println("totalTimes:" + totalTimes.value + ",totalVids:" + totalVids.value + ",totalPow2Times:" + totalPow2Times.value)
println("avgTimes:" + avgTimes + ",avgPow2Times:" + avgPow2Times)
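The two averages above are exactly the ingredients for the play-count variance, via Var(X) = E[X²] − (E[X])². A minimal local sketch of that final step, in plain Scala with made-up sample values standing in for the accumulated totals:

```scala
// Hypothetical play counts, standing in for the accumulator sums
val playCounts = Seq(600.0, 1200.0, 3000.0)
val totalVids = playCounts.size

val avgTimes     = playCounts.sum / totalVids                   // E[X]
val avgPow2Times = playCounts.map(x => x * x).sum / totalVids   // E[X^2]

// Var(X) = E[X^2] - (E[X])^2
val variance = avgPow2Times - avgTimes * avgTimes
val stdDev   = math.sqrt(variance)

println(s"avgTimes=$avgTimes variance=$variance stdDev=$stdDev")
```

This is why the job accumulates both the sum and the sum of squares in a single pass: the driver can derive variance and standard deviation without a second scan of the data.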
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compute the proportion of videos falling into each play-count bucket.
val totalVids = sc.accumulator(0)
val timesFile = sc.textFile("/user/zhenyuan.yu/DumpIdTimesJob_tmp_out")
// Bucket boundaries (exclusive upper bounds)
val keysList = List(100, 500, 1000, 2000, 5000, 10000, 20000, 40000, 80000,
  100000, 200000, 300000, 500000, 1000000, 2000000, 5000000, 10000000)
val timesRDD = timesFile.map(f => {
  val vid_times = f.split("\t")
  vid_times(1).toInt
}).filter(_ > 50).map(times => {
  // Caveat: updating an accumulator inside a transformation can double-count
  // when tasks are retried; Spark guarantees exactly-once updates only in actions
  totalVids += 1
  // Find the first boundary greater than times; values at or above the
  // last boundary are lumped into the last bucket
  var key = 0
  var end = false
  var i = 0
  val size = keysList.size
  while (i < size && !end) {
    key = keysList(i)
    if (times < key) end = true
    i += 1
  }
  (key, 1)
}).reduceByKey(_ + _)
val rdd = timesRDD.collect()  // collect() triggers the job, populating totalVids
println("totalVid:" + totalVids.value)
// reduceByKey returns buckets in arbitrary order, so sort before printing
for ((key, vidNum) <- rdd.sortBy(_._1)) {
  val percent = vidNum.toFloat / totalVids.value
  println("times:<" + key + ",vid_num:" + vidNum + ",percent:" + percent)
}
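The bucket lookup done by the while loop can be isolated as a pure function, which makes its edge cases easy to check without Spark. A sketch (the name `bucketOf` is mine; unlike the original loop, values at or above the last boundary get an explicit `Int.MaxValue` overflow bucket instead of silently reusing the last key):

```scala
val keysList = List(100, 500, 1000, 2000, 5000, 10000, 20000, 40000, 80000,
  100000, 200000, 300000, 500000, 1000000, 2000000, 5000000, 10000000)

// First boundary strictly greater than `times`;
// Int.MaxValue marks the overflow bucket (times >= last boundary)
def bucketOf(times: Int): Int =
  keysList.find(times < _).getOrElse(Int.MaxValue)

println(bucketOf(60))        // → 100   (smallest bucket)
println(bucketOf(100))       // → 500   (boundaries are exclusive upper bounds)
println(bucketOf(25000000))  // → 2147483647 (overflow bucket)
```

Checking `bucketOf(100)` is a quick way to confirm the boundary semantics: a count exactly equal to a boundary belongs to the next bucket up, matching the strict `times < key` test in the loop.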