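Symptom:
An impalad node in the old production offline cluster (CDH 5.2) was malfunctioning, yet the Cloudera Manager (CM) page showed the service on that node as healthy.
Every connection to the node failed, whether from a client over JDBC/ODBC or from impala-shell run on the node itself. (This cluster does not use the hiveserver2 + haproxy + keepalive setup for Impala; clients connect directly to the impalad nodes.) The error was: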
hostname:/tmp# impala-shell
Starting Impala Shell without Kerberos authentication
Error connecting: TTransportException, Could not connect to hostname:21000
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.0.1-cdh5 (cc09df0) built on Wed Nov 19 10:57:34 PST 2014)
Checking the node shows that nothing is listening on port 21000 (the command below returns no output):
hostname:/tmp# netstat -anp|grep -i 21000
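From this we conclude that the impalad service on this node is unhealthy: it actually failed to start, but CM did not detect the failure.
Troubleshooting:
1. The impalad log on this node shows that impalad is stuck registering with the statestore: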
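Because CM reported the role as healthy while the port was closed, an independent probe of the client port can catch this kind of failure. The following is only a minimal sketch, not part of the original investigation: the function name `port_is_listening`, the 3-second timeout, and the `localhost` target are assumptions chosen for illustration; port 21000 is the impala-shell port seen in the transcript above.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only checks that something
    accepts connections on the port, not that impalad can serve queries.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed target: replace "localhost" with the impalad node to check.
    host, port = "localhost", 21000
    state = "listening" if port_is_listening(host, port) else "NOT listening"
    print(f"{host}:{port} is {state}")
```

Run from cron or an external monitoring system, a check like this is the scripted equivalent of the netstat check above and does not depend on the health status that CM reports.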
I0312 10:10:56.865545 1