在spark on yarn提交任务是出现org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory a://bi-aws-users/sbatheja/PRD/PriceEngineOutput already exists的原因及解决方案
16/07/07 11:27:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/07 11:27:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/07 11:27:47 INFO SparkContext: Successfully stopped SparkContext
16/07/07 11:27:47 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: **org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory a://bi-aws-users/sbatheja/PRD/PriceEngineOutput already exists)**
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/07 11:27:47 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
16/07/07 11:27:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/07 11:27:47 INFO ApplicationMaster: Deleting staging directory .sparkStaging/application_1467889642439_0001
16/07/07 11:27:47 INFO ShutdownHookManager: Shutdown hook called
16/07/07 11:27:47 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1467889642439_0001/spark-7f836950-a040-4216-9308-2bb4565c5649
原因:spark的版本没有对应
解决方法:
百度搜索spark-core对应的scala-library的依赖添加进pom.xml文件(前提是项目通过maven管理)
本次测试使用的spark是2.4.7版本对应的scala-library版本是2.12以上
所以在pom文件中添加以下依赖即可
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.11</version>
</dependency>
注意:对应好自己的spark版本!!!