After the web task kicks off, the code calls SparkLauncher to submit a pre-built Spark jar, but the job never actually starts; the client just keeps retrying its connection to the ResourceManager:
13/12/14 20:12:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/12/14 20:12:07 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:08 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:09 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:10 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:11 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:12 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:13 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/12/14 20:12:14 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Cause:
The machine I submitted from is not a YARN node, so the directory configured as HADOOP_CONF_DIR contains no YARN configuration files, or at least none of the key YARN settings such as the ResourceManager address. With that setting missing, the client falls back to the default 0.0.0.0:8032 seen in the log above.
HashMap<String, String> env = new HashMap<>();
// Read the Hadoop/Spark environment settings, falling back to typical CDH defaults
env.put("HADOOP_CONF_DIR", System.getenv().getOrDefault("HADOOP_CONF_DIR", "/etc/hadoop/conf"));
env.put("JAVA_HOME", System.getenv().getOrDefault("JAVA_HOME", "/usr/java/jdk1.8.0_181-cloudera"));
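Since an empty or incomplete conf dir fails only at runtime with the retry loop above, it can help to fail fast before handing the environment to SparkLauncher. A minimal sketch (the class name `LauncherEnv` and the `IllegalStateException` check are my own illustration, not part of the original code):

```java
import java.nio.file.*;
import java.util.*;

public class LauncherEnv {
    // Build the child-process environment for SparkLauncher, failing fast if the
    // Hadoop conf dir has no readable yarn-site.xml -- the situation that makes
    // the YARN client fall back to 0.0.0.0:8032.
    static Map<String, String> buildEnv(String confDir, String javaHome) {
        Path yarnSite = Paths.get(confDir, "yarn-site.xml");
        if (!Files.isReadable(yarnSite)) {
            throw new IllegalStateException(
                "No readable yarn-site.xml under " + confDir +
                "; the YARN client would retry 0.0.0.0:8032");
        }
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", confDir);
        env.put("JAVA_HOME", javaHome);
        return env;
    }
}
```

The resulting map can then be passed to SparkLauncher's `Map<String, String>` constructor as before; the only change is that a misconfigured node now fails immediately with a clear message instead of retrying for ten attempts.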
Solution:
With the cause understood, deploy a YARN gateway role on the non-YARN node in CDH so the node picks up YARN configuration changes automatically. After adding the role, redeploy the client configuration from the YARN page in the CDH (Cloudera Manager) UI, and that is all it takes.
Also, if your client machine was added after the cluster was built and is not managed by CDH at all, you need to copy the configuration files onto that node by hand whenever they change. (Note this is not just about YARN: it is best to also copy over the Spark, Hadoop, HBase, Hive, etc. client configs. Any tutorial on setting up an extra client node should cover this, so it is not repeated here.)
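After redeploying or hand-copying the configs, a quick way to confirm the fix stuck is to read yarn.resourcemanager.address back out of yarn-site.xml and make sure it is a real host, not absent (which is what produces the 0.0.0.0:8032 fallback). A small stdlib-only sketch (the class name `YarnConfCheck` is my own; it does not depend on Hadoop libraries):

```java
import java.nio.file.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class YarnConfCheck {
    // Return the value of yarn.resourcemanager.address from a yarn-site.xml,
    // or null if the property is missing.
    static String rmAddress(Path yarnSite) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(yarnSite.toFile());
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                           .item(0).getTextContent().trim();
            if (name.equals("yarn.resourcemanager.address")) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null; // missing property -> client falls back to 0.0.0.0:8032
    }

    public static void main(String[] args) throws Exception {
        String confDir = System.getenv().getOrDefault("HADOOP_CONF_DIR", "/etc/hadoop/conf");
        Path yarnSite = Paths.get(confDir, "yarn-site.xml");
        if (!Files.isReadable(yarnSite)) {
            System.out.println("yarn-site.xml not found under " + confDir);
        } else {
            System.out.println("yarn.resourcemanager.address = " + rmAddress(yarnSite));
        }
    }
}
```

If this prints `null` (or the file is missing), the gateway deployment or manual copy did not reach the node and the retry loop will come back.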