Today I wrote my own JavaWordCount against Spark 1.0.1, packaged it into a jar, and submitted it to YARN:
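For reference, here is a minimal sketch of what such a JavaWordCount looks like against the Spark 1.x Java API. This is an illustrative reconstruction, not the exact code I submitted; the input path `args[0]` and the whitespace tokenizer are assumptions. It needs the Spark 1.x jars on the classpath and must be launched through `spark-submit`.

```java
import java.util.Arrays;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

public class JavaWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input file given on the command line (assumed to be args[0]).
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split each line into words on single spaces.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String line) {
                return Arrays.asList(line.split(" "));
            }
        });

        // Map each word to (word, 1), then sum the counts per word.
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String word) {
                    return new Tuple2<String, Integer>(word, 1);
                }
            })
            .reduceByKey(new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer a, Integer b) {
                    return a + b;
                }
            });

        // Collect and print the results; on YARN these lines end up in
        // the container's stdout log.
        for (Tuple2<String, Integer> pair : counts.collect()) {
            System.out.println(pair._1() + ": " + pair._2());
        }

        sc.stop();
    }
}
```

The anonymous-class style matches the Java 7 era of Spark 1.0.x; with Java 8 the same pipeline can be written with lambdas.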
At the end of the run, the expected word counts show up in Hadoop's stdout:
- hadoop: 1
- : 1
- hello: 3
- 2.2.0: 1
- world: 1
But stderr contains:
- in 4.089 s
- 14/08/05 13:29:30 INFO DAGScheduler: looking for newly runnable stages
- 14/08/05 13:29:30 INFO DAGScheduler: running: Set()
- 14/08/05 13:29:30 INFO DAGScheduler: waiting: Set(Stage 0)
- 14/08/05 13:29:30 INFO DAGScheduler: failed: Set()
- 14/08/05 13:29:30 INFO DAGScheduler: Missing parents for Stage 0: List()
- 14/08/05 13:29:30 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[6] at reduceByKey at JavaWordCount.java:40), which is now runnable
- 14/08/05 13:29:30 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[6] at reduceByKey at JavaWordCount.java:40)
- 14/08/05 13:29:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
- 14/08/05 13:29:30 INFO TaskSetManager: Starting task 0.0:0 as TID 1 on executor 0: localhost (PROCESS_LOCAL)
- 14/08/05 13:29:30 INFO TaskSetManager: Serialized task 0.0:0 as 2278 bytes in 5 ms
- 14/08/05 13:29:30 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@localhost:57231
- 14/08/05 13:29:30 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 127 bytes
- 14/08/05 13:29:30 INFO TaskSetManager: Finished TID 1 in 90 ms on localhost (progress: 1/1)
- 14/08/05 13:29:30 INFO DAGScheduler: Completed ResultTask(0, 0)
- 14/08/05 13:29:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
- 14/08/05 13:29:30 INFO DAGScheduler: Stage 0 (collect at JavaWordCount.java:47) finished in 0.094 s
- 14/08/05 13:29:30 INFO SparkContext: Job finished: collect at JavaWordCount.java:47, took 4.634113197 s
- 14/08/05 13:29:30 INFO SparkUI: Stopped Spark web UI at http://192.168.200.233:43733
- 14/08/05 13:29:30 INFO DAGScheduler: Stopping DAGScheduler