After the web task kicks off, the code calls SparkLauncher to submit a pre-built Spark jar, but the job never actually starts; the client just keeps retrying its connection to the ResourceManager:
13/12/14 20:12:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/12/14 20:12:07 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:08 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:09 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:10 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:11 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:12 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:13 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:14 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Cause:
The machine I submitted from is not a YARN node, so the directory configured as HADOOP_CONF_DIR contains no YARN configuration files, or at least none of the key YARN settings such as the ResourceManager address. With that setting missing, the client falls back to the default 0.0.0.0:8032 seen in the log above.
HashMap<String, String> env = new HashMap<>();
// Read the Hadoop/Spark environment settings, falling back to typical CDH defaults
env.put("HADOOP_CONF_DIR", System.getenv().getOrDefault("HADOOP_CONF_DIR", "/etc/hadoop/conf"));
env.put("JAVA_HOME", System.getenv().getOrDefault("JAVA_HOME", "/usr/java/jdk1.8.0_181-cloudera"));
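Since an empty or incomplete conf dir fails only at runtime with the retry loop above, it can help to fail fast before handing the environment to SparkLauncher. A minimal sketch (the class name `LauncherEnv` and the `IllegalStateException` check are my own illustration, not part of the original code):

```java
import java.nio.file.*;
import java.util.*;

public class LauncherEnv {
    // Build the child-process environment for SparkLauncher, failing fast if the
    // Hadoop conf dir has no readable yarn-site.xml -- the situation that makes
    // the YARN client fall back to 0.0.0.0:8032.
    static Map<String, String> buildEnv(String confDir, String javaHome) {
        Path yarnSite = Paths.get(confDir, "yarn-site.xml");
        if (!Files.isReadable(yarnSite)) {
            throw new IllegalStateException(
                "No readable yarn-site.xml under " + confDir +
                "; the YARN client would retry 0.0.0.0:8032");
        }
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", confDir);
        env.put("JAVA_HOME", javaHome);
        return env;
    }
}
```

The resulting map can then be passed to SparkLauncher's `Map<String, String>` constructor as before; the only change is that a misconfigured node now fails immediately with a clear message instead of retrying for ten attempts.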
Solution:
With the cause understood, deploy a YARN gateway role on the non-YARN node in CDH so the node picks up YARN configuration changes automatically. After adding the role, redeploy the client configuration from the YARN page in the CDH (Cloudera Manager) UI, and that is all it takes.
Also, if your client machine was added after the cluster was built and is not managed by CDH at all, you need to copy the configuration files onto that node by hand whenever they change. (Note this is not just about YARN: it is best to also copy over the Spark, Hadoop, HBase, Hive, etc. client configs. Any tutorial on setting up an extra client node should cover this, so it is not repeated here.)
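After redeploying or hand-copying the configs, a quick way to confirm the fix stuck is to read yarn.resourcemanager.address back out of yarn-site.xml and make sure it is a real host, not absent (which is what produces the 0.0.0.0:8032 fallback). A small stdlib-only sketch (the class name `YarnConfCheck` is my own; it does not depend on Hadoop libraries):

```java
import java.nio.file.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class YarnConfCheck {
    // Return the value of yarn.resourcemanager.address from a yarn-site.xml,
    // or null if the property is missing.
    static String rmAddress(Path yarnSite) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(yarnSite.toFile());
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                           .item(0).getTextContent().trim();
            if (name.equals("yarn.resourcemanager.address")) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null; // missing property -> client falls back to 0.0.0.0:8032
    }

    public static void main(String[] args) throws Exception {
        String confDir = System.getenv().getOrDefault("HADOOP_CONF_DIR", "/etc/hadoop/conf");
        Path yarnSite = Paths.get(confDir, "yarn-site.xml");
        if (!Files.isReadable(yarnSite)) {
            System.out.println("yarn-site.xml not found under " + confDir);
        } else {
            System.out.println("yarn.resourcemanager.address = " + rmAddress(yarnSite));
        }
    }
}
```

If this prints `null` (or the file is missing), the gateway deployment or manual copy did not reach the node and the retry loop will come back.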