Using the spark.yarn.jar Property with Spark on YARN

Reference:

http://www.cnblogs.com/luogankun/p/4191796.html


While testing spark-sql on YARN today, I happened to notice something in the logs:

spark-sql --master yarn
14/12/29 15:23:17 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:23:17 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:23:17 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:23:17 INFO Client: Setting up container launch context for our AM
14/12/29 15:23:17 INFO Client: Preparing resources for our AM container
14/12/29 15:23:17 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:23:18 INFO Client: Setting up the launch environment for our AM container

Starting a second spark-sql shell, the logs show the same thing:

14/12/29 15:24:03 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:24:03 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:24:03 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:24:03 INFO Client: Setting up container launch context for our AM
14/12/29 15:24:03 INFO Client: Preparing resources for our AM container
14/12/29 15:24:03 INFO Client: Uploading resource file:/home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar -> hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:24:05 INFO Client: Setting up the launch environment for our AM container

Then check the files on HDFS:

hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/
drwx------   - spark supergroup          0 2014-12-29 15:23 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0093
drwx------   - spark supergroup          0 2014-12-29 15:24 hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0094

Every application uploads its own copy of spark-assembly-x.x.x-SNAPSHOT-hadoopx.x.x-cdhx.x.x.jar, which both degrades HDFS performance and wastes HDFS space.
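
To see how much space these per-application copies consume, a quick check is to sum the staging directory sizes (using the staging path from the logs above):

hadoop fs -du -h hdfs://hadoop000:8020/user/spark/.sparkStaging/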

 

The Spark documentation (http://spark.apache.org/docs/latest/running-on-yarn.html) describes the spark.yarn.jar property: store spark-assembly-xxxxx.jar under hdfs://hadoop000:8020/spark_lib/ so that every application can reference the same copy.
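
A minimal sketch of that upload, assuming the local jar path from the logs above and that the spark user is allowed to create /spark_lib:

hadoop fs -mkdir -p hdfs://hadoop000:8020/spark_lib/
hadoop fs -put /home/spark/software/source/compile/deploy_spark/assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar hdfs://hadoop000:8020/spark_lib/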

Add the property to spark-defaults.conf:

spark.yarn.jar hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
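
If you would rather not change spark-defaults.conf, the same property should also work per invocation via --conf, which spark-sql passes through like any spark-submit option:

spark-sql --master yarn --conf spark.yarn.jar=hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar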

Launch spark-sql --master yarn again and watch the logs:

14/12/29 15:39:02 INFO Client: Requesting a new application from cluster with 1 NodeManagers
14/12/29 15:39:02 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
14/12/29 15:39:02 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
14/12/29 15:39:02 INFO Client: Setting up container launch context for our AM
14/12/29 15:39:02 INFO Client: Preparing resources for our AM container
14/12/29 15:39:02 INFO Client: Source and destination file systems are the same. Not copying hdfs://hadoop000:8020/spark_lib/spark-assembly-1.3.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar
14/12/29 15:39:02 INFO Client: Setting up the launch environment for our AM container

Check the files on HDFS again:

hadoop fs -ls hdfs://hadoop000:8020/user/spark/.sparkStaging/application_1416381870014_0097

The application's staging directory no longer contains spark-assembly-xxxxx.jar, so each submission skips the assembly upload and the corresponding HDFS space is saved.

 

During my testing I also hit an error like the following:

Application application_xxxxxxxxx_yyyy failed 2 times due to AM Container for application_xxxxxxxxx_yyyy exited with exitCode: -1000 due to: java.io.FileNotFoundException: File /tmp/hadoop-spark/nm-local-dir/filecache does not exist

Creating a filecache directory under /tmp/hadoop-spark/nm-local-dir resolves the error.
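
The path is on the NodeManager's local disk, so the directory needs to exist on every NodeManager host (assuming the NodeManager user can write under /tmp/hadoop-spark):

mkdir -p /tmp/hadoop-spark/nm-local-dir/filecache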

