现象
机房断电后,HDP Hadoop各重要组件启动异常。
单独从Ambari界面启动Zookeeper报错,报错截图如下
解决
根据报错信息,定位到错误发现在ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.4.3.0-227 | tail -1这条命令,单独在节点上执行此命令,输出如下,
[root@namenode-1 hadoop]# ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.4.3.0-227 | tail -1
Traceback (most recent call last):
File "/usr/bin/hdp-select", line 382, in <module>
printVersions()
File "/usr/bin/hdp-select", line 239, in printVersions
result[tuple(map(int, versionRegex.split(f)))] = f
ValueError: invalid literal for int() with base 10: 'spark'
以上语句执行也报错,报错信息与Ambari界面输出一致。查看发现错误发生在 /usr/bin/hdp-select脚本第240行,主要代码如下,
# Print the installed packages
def printVersions():
result = {}
for f in os.listdir(root):
if f not in [".", "..", "current", "share", "lost+found"]:
result[tuple(map(int, versionRegex.split(f)))] = f
keys = result.keys()
keys.sort()
for k in keys:
print result[k]
通过在python脚本中添加打印信息打印f的值,输出如下,
[root@namenode-1 hadoop]# /usr/bin/hdp-select versions
share
2.4.3.0-227
current
spark-2.3.0-bin-hadoop2.7.tgz
Traceback (most recent call last):
File "/usr/bin/hdp-select", line 383, in <module>
printVersions()
File "/usr/bin/hdp-select", line 240, in printVersions
result[tuple(map(int, versionRegex.split(f)))] = f
ValueError: invalid literal for int() with base 10: 'spark'
根据以上信息,spark-2.3.0-bin-hadoop2.7.tgz这条输出嫌疑较大,通过find命令找到此压缩包在/usr/hdp下面,mv移到其他路径下再次执行不报错。Zookeeper现在可以正常启动,