While setting up a cluster with Cloudera, YARN failed to start.
1. Check the cloudera-scm-server and cloudera-scm-agent log files for errors
tail -f /opt/cloudera-manager/cm-5.14.1/log/cloudera-scm-server/cloudera-scm-server.log
tail -f /opt/cloudera-manager/cm-5.14.1/log/cloudera-scm-agent/cloudera-scm-agent.log
Neither log contained any obvious error message.
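Watching a log with tail -f only shows new lines, so earlier errors are easy to miss. A minimal sketch of a filter that prints the most recent error-like lines from a log file follows. The function name recent_errors and the keywords ERROR, FATAL and Exception are assumptions for illustration, not something Cloudera Manager defines.

```shell
# Print the last few lines that look like errors, prefixed with line numbers.
# Usage: recent_errors <logfile> [max_lines]
# The keyword list is an assumption; adjust it to the log format at hand.
recent_errors() {
  logfile="$1"
  max="${2:-20}"
  grep -nE 'ERROR|FATAL|Exception' "$logfile" | tail -n "$max"
}
```

For example: recent_errors /opt/cloudera-manager/cm-5.14.1/log/cloudera-scm-agent/cloudera-scm-agent.log 50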
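2. Suspect a port problem in the YARN configuration files of the Cloudera cluster
The error pointed at the JobHistory role, whose settings usually live in mapred-site.xml.
Search for it directly: find / -name mapred-site.xml
The candidates were these files: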
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/24-yarn-JOBHISTORY/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/23-yarn-RESOURCEMANAGER/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/22-yarn-JOBHISTORY/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/21-yarn-JOBHISTORY/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/20-yarn-JOBHISTORY/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/19-yarn-RESOURCEMANAGER/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/18-yarn-JOBHISTORY/mapred-site.xml
/opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process/17-yarn-NODEMANAGER/mapred-site.xml
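For why these files, rather than any others, are the configuration actually in effect, see the article "Cloudera Manager 之二 (架构)" (Cloudera Manager, Part 2: Architecture).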
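The directory names in the list above start with a number. Assuming the highest number belongs to the most recent launch attempt of a role, a small helper can pick that directory instead of reading the find output by eye. This is only a sketch; latest_process_dir is a hypothetical name, and the base path and role suffix are taken from the listing above.

```shell
# Print the highest-numbered "<id>-<role>" directory under a process base dir.
# Usage: latest_process_dir <process_base> <role>
# Assumption: a larger numeric prefix means a more recent launch attempt.
latest_process_dir() {
  base="$1"
  role="$2"
  best=""
  best_id=-1
  for d in "$base"/*-"$role"; do
    [ -d "$d" ] || continue
    name=${d##*/}      # e.g. 24-yarn-JOBHISTORY
    id=${name%%-*}     # e.g. 24
    case "$id" in ''|*[!0-9]*) continue ;; esac   # skip non-numeric prefixes
    if [ "$id" -gt "$best_id" ]; then
      best_id=$id
      best=$d
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

For example: latest_process_dir /opt/cloudera-manager/cm-5.14.1/run/cloudera-scm-agent/process yarn-JOBHISTORY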
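Click Resume.
A new mapred-site.xml appeared under ******/process/id, where the id had been incremented.
So I looked at that configuration and found that port 10020, which it uses, was already occupied.
I temporarily stopped the service holding the port and clicked Resume again. To my surprise, it succeeded.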
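The port check described above can be sketched as two small helpers: one reads a property value out of a Hadoop-style XML file, the other lists whatever is listening on a port. The names hadoop_prop and who_listens are hypothetical. The sketch assumes the JobHistory address is stored under the standard Hadoop key mapreduce.jobhistory.address and that the ss utility from iproute2 is installed (showing process names usually requires root).

```shell
# Print the <value> of a <name>key</name> property in a Hadoop XML config.
# Usage: hadoop_prop <file> <key>
# Assumption: the <value> element sits on the same line as <name> or on a
# later line, and the value itself is on a single line.
hadoop_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-char "<value>" and the 8-char "</value>"
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# List TCP listeners whose local address ends in :<port>.
# Usage: who_listens <port>
who_listens() {
  ss -ltnp | awk -v p=":$1" '$4 ~ (p "$")'
}
```

For example, addr=$(hadoop_prop mapred-site.xml mapreduce.jobhistory.address) followed by who_listens "${addr##*:}" shows the process occupying the configured port, if any.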
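3. Update
The approach above was admittedly roundabout. Later analysis showed that when YARN starts, each role writes its own log file under /var/log.
So I opened this file: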
vi /var/log/hadoop-mapreduce/hadoop-cmf-yarn-JOBHISTORY-kerberos.hadoop003.com.log.out
Then I looked for the error in it:
Sure enough, it was there.
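Role logs can be long, so paging through them in vi is slow. As a rough sketch, the helper below prints the first line matching a pattern together with a few following lines, which is usually where a stack trace sits. The name first_match_context is hypothetical, and the pattern in the example is an assumption: when a port is already taken, Java normally reports java.net.BindException with the message "Address already in use".

```shell
# Print the first line matching an extended regex, plus the next <ctx> lines,
# each prefixed with its line number.
# Usage: first_match_context <logfile> <regex> [ctx]
first_match_context() {
  awk -v pat="$2" -v ctx="${3:-10}" '
    !hit && $0 ~ pat      { hit = NR }          # remember the first match
    hit && NR <= hit + ctx { print NR ": " $0 } # match line and its context
    hit && NR >  hit + ctx { exit }
  ' "$1"
}
```

For example: first_match_context /var/log/hadoop-mapreduce/hadoop-cmf-yarn-JOBHISTORY-kerberos.hadoop003.com.log.out 'BindException|Address already in use'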
This post cannot guarantee a fix for your problem, but it offers some lines of investigation you can try.