Author: Syn良子  Source: http://www.cnblogs.com/cssdongl  Please credit the source when reposting.
Spark makes it very easy to compute per-group averages, and the code is concise. Without further ado, here it is:
import org.apache.spark.{SparkConf, SparkContext}

object ColumnValueAvg extends App {
  /* Input format: ID,Name,ADDRESS,AGE
   * 001,zhangsan,chaoyang,20
   * 002,zhangsa,chaoyang,27
   * 003,zhangjie,chaoyang,35
   * 004,lisi,haidian,24
   * 005,lier,haidian,40
   * 006,wangwu,chaoyang,90
   * 007,wangchao,haidian,80
   */
  val conf = new SparkConf().setAppName("test column value sum and avg").setMaster("local[1]")
  val sc = new SparkContext(conf)
  val textRdd = sc.textFile(args(0))

  // Careful: the toInt cast is necessary; without it, reduceByKey would
  // concatenate the age strings instead of summing numbers.
  val addressAgeMap = textRdd.map(x => (x.split(",")(2), x.split(",")(3).toInt))

  // Sum of ages per address
  val sumAgeResult = addressAgeMap.reduceByKey(_ + _)
  sumAgeResult.collect().foreach(println)

  // One common way to compute the per-key average: carry (sum, count)
  // through reduceByKey, then divide
  val avgAgeResult = addressAgeMap
    .mapValues(age => (age, 1))
    .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
    .mapValues { case (sum, count) => sum.toDouble / count }
  avgAgeResult.collect().foreach(println)
}
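The (sum, count) aggregation behind the average can be checked without a Spark cluster: pair every age with a count of 1, combine the pairs per key, then divide. Here is a minimal plain-Scala sketch of that logic on the sample rows above (object and value names are illustrative, not from the original post):

```scala
object GroupAvgSketch {
  // Mirror of the (ADDRESS, AGE) pairs extracted from the sample data
  val rows = Seq(
    ("chaoyang", 20), ("chaoyang", 27), ("chaoyang", 35),
    ("haidian", 24), ("haidian", 40),
    ("chaoyang", 90), ("haidian", 80)
  )

  // Same shape reduceByKey produces on (age, 1) pairs:
  // fold each group into (sum, count), then divide
  val avgByAddress: Map[String, Double] =
    rows
      .map { case (addr, age) => (addr, (age, 1)) }
      .groupBy(_._1)
      .map { case (addr, pairs) =>
        val (sum, count) = pairs
          .map(_._2)
          .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
        (addr, sum.toDouble / count)
      }

  def main(args: Array[String]): Unit =
    avgByAddress.toSeq.sortBy(_._1).foreach(println)
}
```

For the sample data this yields 43.0 for chaoyang and 48.0 for haidian, matching what the Spark job prints in local mode.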