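This post takes the Spark program written in the previous two posts and submits it to Spark for a test run, using two submission modes: yarn-cluster and yarn-client.
1> Package the Spark program as a jar.
2> Upload the jar (the Spark dependencies are bundled into it).
3> Upload the data file to HDFS: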
hadoop fs -put /data/platform.txt /data/
Check the uploaded data:
[root@sp1 ~]# hadoop fs -lsr /data
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hdfs supergroup 3721651813 2016-12-21 07:21 /data/platform.txt
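As the deprecation warning above says, `-lsr` has been replaced by `-ls -R`. Below is a minimal sketch of the upload and check steps using the current flag and the paths from this post. The `-mkdir -p` line and the `run` dry-run helper are assumptions added for illustration, not part of the original steps.

```shell
#!/usr/bin/env bash
# Print each command by default; set DRY_RUN=0 to really execute it
# (executing requires a configured hadoop client on the PATH).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upload_and_list() {
  # Assumption: make sure the target directory exists before uploading.
  run hadoop fs -mkdir -p /data
  # Same upload command as in the post.
  run hadoop fs -put /data/platform.txt /data/
  # Non-deprecated replacement for "hadoop fs -lsr /data".
  run hadoop fs -ls -R /data
}

upload_and_list
```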
4> yarn-cluster mode
Note: in the command below, as it was run, --master is a standalone master URL (spark://...), not yarn.
spark-submit --class com.hadoop.usercounter.PlatFormInfoCounter \
--master spark://192.168.54.11:7077 --executor-memory 5G --total-executor-cores 2 /data/sparksql-exercise.jar
Result:
16/12/22 02:16:14 INFO storage.ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
16/12/22 02:16:14 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/12/22 02:16:14 WARN spark.SparkContext: Requesting executors is only supported in coarse-grained mode
16/12/22 02:16:14 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 228). 2246 bytes result sent to driver
16/12/22 02:16:14 INFO scheduler.DAGScheduler: ResultStage 2 (show at PlatFormInfoCounter.scala:41) finished in 0.060 s
16/12/22 02:16:14 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 228) in 59 ms on localhost (executor driver) (1/1)
16/12/22 02:16:14 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/12/22 02:16:14 INFO scheduler.DAGScheduler: Job 0 finished: show at PlatFormInfoCounter.scala:41, took 382.254426 s
+-----+---------+------+----+
| name|phoneType|clicks| _c4|
+-----+---------+------+----+
|Role0|    Apple|     8|  81|
|Role0|    Apple|     0|1368|
|Role0|    Apple|     4| 100|
|Role0|    Apple|     5|  95|
|Role0|    Apple|     9| 103|
|Role0|    Apple|    18| 105|
|Role0|   Huawei|     4|  88|
|Role0|   Huawei|     5|  87|
|Role0|   Huawei|    18| 100|
|Role0|   Huawei|    19|  90|
+-----+---------+------+----+
16/12/22 02:16:14 WARN spark.SparkContext: Requesting executors is only supported in coarse-grained mode
16/12/22 02:16:14 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.54.11:4040
16/12/22 02:16:14 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
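For comparison, here is a hedged sketch of what a submission to YARN in cluster mode typically looks like, using `--master yarn --deploy-mode cluster`. It assumes `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) points at the cluster configuration. `--num-executors 2` is an illustrative value, not an exact equivalent of the `--total-executor-cores 2` used above (that flag applies to standalone and Mesos masters). The `run` helper is a hypothetical dry-run wrapper.

```shell
#!/usr/bin/env bash
# Print the command by default; set DRY_RUN=0 to really execute it
# (executing requires spark-submit on the PATH and HADOOP_CONF_DIR set).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_yarn_cluster() {
  # Class name and jar path are taken from the post.
  # --num-executors 2 is an assumed, illustrative resource setting.
  run spark-submit \
    --class com.hadoop.usercounter.PlatFormInfoCounter \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 5G \
    --num-executors 2 \
    /data/sparksql-exercise.jar
}

submit_yarn_cluster
```

In cluster mode the driver runs inside YARN, so the table printed by show() normally ends up in the application's container logs rather than in the submitting terminal. With log aggregation enabled it can usually be read with `yarn logs -applicationId <application id>`. Also note that a master set in the program itself through SparkConf takes precedence over the `--master` flag.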
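5> yarn-client mode
Note: the command below, as it was run, sets only --deploy-mode client; no --master is given on the command line.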
spark-submit --class com.hadoop.usercounter.PlatFormInfoCounter \
--deploy-mode client /data/sparksql-exercise.jar
Result:
+-----+---------+------+----+
| name|phoneType|clicks| _c4|
+-----+---------+------+----+
|Role0|    Apple|     8|  81|
|Role0|    Apple|     0|1368|
|Role0|    Apple|     4| 100|
|Role0|    Apple|     5|  95|
|Role0|    Apple|     9| 103|
|Role0|    Apple|    18| 105|
|Role0|   Huawei|     4|  88|
|Role0|   Huawei|     5|  87|
|Role0|   Huawei|    18| 100|
|Role0|   Huawei|    19|  90|
+-----+---------+------+----+
The results are the same, but the yarn-client run was somewhat faster than the yarn-cluster run, taking about 2 minutes less.
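A timing comparison like this one can be made repeatable with a small wrapper that submits the same jar in both deploy modes and reports wall-clock seconds. The sketch below assumes YARN as the master and relies on spark-submit waiting for the application to finish in cluster mode (the default behaviour of `spark.yarn.submit.waitAppCompletion`). The `run` and `timed_submit` helpers are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Print the command by default; set DRY_RUN=0 to really execute it
# (executing requires spark-submit on the PATH and HADOOP_CONF_DIR set).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Submit the job in the given deploy mode and print the elapsed seconds.
timed_submit() {
  local mode="$1"
  local start end
  start=$(date +%s)
  run spark-submit \
    --class com.hadoop.usercounter.PlatFormInfoCounter \
    --master yarn \
    --deploy-mode "$mode" \
    /data/sparksql-exercise.jar
  end=$(date +%s)
  echo "$mode: $((end - start)) s"
}

for mode in cluster client; do
  timed_submit "$mode"
done
```

A single run per mode is a noisy measurement, since caching and cluster load both affect it. Repeating each mode a few times gives a fairer comparison.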