Spark Shell Basic Operations
Entering the Spark shell
First run jps to confirm that the Hadoop and Spark daemons are running:
jps
spark-shell --master spark://master:7077
Example: finding the top values
Task description:
Each record has the fields orderid,userid,payment,productid. Find the five largest payment values across all input files.
Data files: /usr/sort/file1.txt
/usr/sort/file2.txt
Code file: /usr/code/topfive.scala
Data:
file1.txt
1,1768,50,155
2,1218,600,211
3,2239,788,242
4,3101,28,599
5,4899,290,129
6,3110,54,1201
7,4436,259,877
8,2369,7890,27
file2.txt
100,4287,226,233
101,6562,489,124
102,1124,33,17
103,3267,159,179
104,4569,57,125
105,1438,37,116
Code:
// Read every file under /usr/sort (file1.txt and file2.txt) into 2 partitions
val lines = sc.textFile("file:///usr/sort", 2)
// payment is the third field (index 2); skip blank lines before parsing
val payments = lines.filter(_.trim.nonEmpty).map(_.split(",")(2).trim.toInt)
// Sort in descending order and take the five largest values
val top5 = payments.sortBy(x => x, false).take(5)
top5.foreach(println)
Run the script from the terminal (the -i option preloads the file into the shell):
spark-shell -i /usr/code/topfive.scala
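The top-5 logic can be sanity-checked locally with plain Scala collections, no cluster required. This is only a sketch for verifying the expected answer; it hard-codes the sample rows from file1.txt and file2.txt above and mirrors the RDD pipeline (extract the payment field, sort descending, take 5):

```scala
// Sample rows copied from file1.txt and file2.txt
val rows = Seq(
  "1,1768,50,155", "2,1218,600,211", "3,2239,788,242", "4,3101,28,599",
  "5,4899,290,129", "6,3110,54,1201", "7,4436,259,877", "8,2369,7890,27",
  "100,4287,226,233", "101,6562,489,124", "102,1124,33,17",
  "103,3267,159,179", "104,4569,57,125", "105,1438,37,116"
)

// Same steps as the RDD version: parse payment (index 2), sort descending, take 5
val top5 = rows
  .filter(_.trim.nonEmpty)
  .map(_.split(",")(2).trim.toInt)
  .sortBy(x => -x)
  .take(5)

println(top5.mkString(","))
```

With this sample data the five largest payments are 7890, 788, 600, 489, 290, which is what the Spark job should print.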