Every time a Spark job is submitted to YARN, an "uploading resource" step kicks in: the Spark jars are packaged and uploaded to HDFS. In bad cases, the submission can hang at this step for a long time.
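For context, a typical submission that produces the log below might look like this (the class name and application jar are illustrative placeholders, not from the original setup):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /opt/apps/my-app.jar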
17/01/13 17:21:47 INFO Client: Preparing resources for our AM container
17/01/13 17:21:47 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/01/13 17:21:58 INFO Client: Uploading resource file:/tmp/spark-28ebde0d-c77a-4be3-8248-a6d3bcccc253/__spark_libs__7542776655448713545.zip -> hdfs://dipperCluster/user/hadoop/.sparkStaging/application_1484215273436_0050/__spark_libs__7542776655448713545.zip
17/01/13 17:22:08 INFO Client: Uploading resource file:/tmp/spark-28ebde0d-c77a-4be3-8248-a6d3bcccc253/__spark_conf__8972755978315292177.zip -> hdfs://dipperCluster/user/hadoop/.sparkStaging/application_1484215273436_0050/__spark_conf__.zip
Solution:
Create a directory on HDFS:
hdfs dfs -mkdir /home/hadoop/spark_jars
Upload Spark's jars (for Spark 1.6, only spark-assembly-1.6.0-SNAPSHOT-hadoop2.6.0.jar needs to be uploaded):
hdfs dfs -put /opt/spark/jars/* /home/hadoop/spark_jars/
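Before wiring up the config, it's worth confirming the jars actually landed on HDFS; the listing below should show the uploaded files (names will vary with your Spark version):

hdfs dfs -ls /home/hadoop/spark_jars/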
In Spark's conf/spark-defaults.conf, add the setting (the path must point at the HDFS directory created above; globs are allowed):
spark.yarn.jars=hdfs://master:9000/home/hadoop/spark_jars/*
That resolves it: the jars are no longer uploaded on every submission.
On the next launch, the log shows:
Source and destination file systems are the same. Not copying hdfs://master:9000/home/hadoop/spark_jars/zookeeper-3.4.6.jar
and the job is submitted and started quickly from there.
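As the WARN line in the original log hints, spark.yarn.archive is an alternative to spark.yarn.jars: pack all the jars into a single archive, upload it once, and point the property at it, so YARN localizes one file instead of many. A rough sketch, with the archive name and target path chosen arbitrarily:

cd /opt/spark/jars
# the jars must sit at the root of the archive, so zip from inside the directory
zip -q -r spark-libs.zip *
hdfs dfs -put spark-libs.zip /home/hadoop/spark_jars/
# then in conf/spark-defaults.conf:
spark.yarn.archive=hdfs://master:9000/home/hadoop/spark_jars/spark-libs.zip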