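(1) Write a Spark application that generates a data file, peopleage.txt, in the local file system. The file contains a number of records (for example 1,000 lines, or 1,000,000 lines), one per line, and each record has just two columns: the first is a sequence number and the second is an age. Sample output: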
1 89
2 67
3 69
4 78
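// Code file: GeneratePeopleAge.scala
import java.io.FileWriter
import java.io.File
import scala.util.Random

object GeneratePeopleAge {
  def main(args: Array[String]): Unit = {
    // The target directory must already exist; append = false overwrites any old file
    val fileWriter = new FileWriter(new File("/usr/local/spark/mycode/exercise/peopleage/peopleage.txt"), false)
    val rand = new Random()
    try {
      for (i <- 1 to 1000) { // number of lines to generate
        // sequence number, a space, then a random age in the range 0 to 99
        fileWriter.write(i + " " + rand.nextInt(100))
        fileWriter.write(System.getProperty("line.separator"))
      }
      fileWriter.flush()
    } finally {
      fileWriter.close() // release the file handle even if a write fails
    }
  }
}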
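Before handing the generated file to Spark, it can help to confirm that every line really has the "sequence number, space, age" shape shown in the sample output. The following is only a minimal sketch of such a check, not part of the original solution: the names CheckPeopleAge, isValidLine and validate are hypothetical, and the age bound of 100 assumes the generator above (rand.nextInt(100)). To stay self-contained it writes its own 10-line sample to a temporary file instead of reading the path used above.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source
import scala.util.Random

object CheckPeopleAge {
  // Hypothetical helper: true when a line looks like "<index> <age>" with 0 <= age < 100.
  def isValidLine(line: String): Boolean = {
    val parts = line.trim.split(" ")
    parts.length == 2 &&
      parts.forall(p => p.nonEmpty && p.forall(_.isDigit)) &&
      parts(1).length <= 3 &&   // guards toInt against overly long digit strings
      parts(1).toInt < 100
  }

  // Returns (number of lines, whether every line passed the check).
  def validate(path: String): (Int, Boolean) = {
    val source = Source.fromFile(path)
    try {
      val lines = source.getLines().toList
      (lines.length, lines.forall(isValidLine))
    } finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Write a small sample in the same format to a temporary file (assumption:
    // a stand-in for peopleage.txt so the sketch runs anywhere).
    val tmp = File.createTempFile("peopleage", ".txt")
    tmp.deleteOnExit()
    val rand = new Random()
    val writer = new PrintWriter(tmp)
    try {
      for (i <- 1 to 10) writer.println(s"$i ${rand.nextInt(100)}")
    } finally writer.close()

    val (count, ok) = validate(tmp.getPath)
    println(s"lines=$count valid=$ok")
  }
}
```

To check the real file, validate could be called with the peopleage.txt path instead of the temporary one.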
(2) Write a Spark application that processes the data file peopleage.txt in the local file system and computes the average age of all the people.
// Code file: CountAvgAge.scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object CountAvgAge {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      println("Usage: CountAvgAge inputdatafile")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("Count Average Age")
    val sc = new SparkContext(conf)
    // Read the input file into an RDD with at least 3 partitions
    val lines = sc.textFile(args(0), 3)
    val count = lines.count()
    // Take the second column of each line and sum it on the executors;
    // calling reduce on the RDD avoids collecting every age to the driver first
    val totalAge = lines.map(line => line.split(" ")(1)).map(t => t.trim.toInt).reduce((a, b) => a + b)
    println("Total Age is: " + totalAge + "; Number of People is:" + count)
    val avgAge: Double = totalAge.toDouble / count.toDouble
    println("Average Age is:" + avgAge)
    sc.stop() // shut down the SparkContext once the job is done
  }
}
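The program above makes two passes over the data, one for count() and one for the sum. The total and the record count can also be accumulated together as a (sum, count) pair, which is the pattern Spark's RDD.aggregate supports through a zero value, a per-partition function and a merge function. The sketch below illustrates that pattern with plain Scala collections so that it runs without a Spark installation. The names AvgAgeSketch, seqOp, combOp, parseAge and average are hypothetical, and grouped(chunkSize) merely stands in for RDD partitions; it is an illustration under those assumptions, not the solution's own method.

```scala
object AvgAgeSketch {
  // Partial result: (sum of ages, number of records)
  type Acc = (Long, Long)

  // Folds one parsed age into a partial result (per-partition step).
  def seqOp(acc: Acc, age: Int): Acc = (acc._1 + age, acc._2 + 1)

  // Merges two partial results (cross-partition step).
  def combOp(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 + b._2)

  // Parses the age column of a "<index> <age>" line.
  def parseAge(line: String): Int = line.trim.split(" ")(1).toInt

  // chunkSize must be positive; each chunk plays the role of one partition.
  def average(lines: Seq[String], chunkSize: Int): Double = {
    val partials = lines
      .grouped(chunkSize)
      .map(chunk => chunk.map(parseAge).foldLeft((0L, 0L))(seqOp))
    val (sum, count) = partials.foldLeft((0L, 0L))(combOp)
    if (count == 0) Double.NaN else sum.toDouble / count
  }

  def main(args: Array[String]): Unit = {
    // The four sample records from task (1)
    val sample = Seq("1 89", "2 67", "3 69", "4 78")
    println(average(sample, 2)) // prints 75.75
  }
}
```

Using Long for the running sum also leaves headroom for files much larger than the 1,000,000-line example.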