HBase connection timeout
Problem description:
WARN TaskSetManager: Lost task 0.0 in stage 21.0 (TID 1832, persp-18.persp.net, executor 4): UnknownReason
20/10/28 15:52:36 WARN TaskSetManager: Lost task 7.0 in stage 21.0 (TID 1839, persp-15.persp.net, executor 1): org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Oct 28 15:52:05 CST 2020, null, java.net.SocketTimeoutException: callTimeout=120000, callDuration=120106: Call to persp-38.persp.net/172.18.0.38:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=16731, waitTime=120003, rpcTimeout=120000 row '99D123123,_' on table 'table_name' at region=table_name,99D123123,A3,1,1598243593877.b681a73d152504748568e8e603f7d1b5., hostname=persp-38.persp.net,16020,1603173609290, seqNum=788127
Solution:
- In hbase shell, scan the rowkey named in the error. The scan also times out:
ERROR: Call id=246, waitTime=60007, rpcTimetout=60000
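- Check the cluster node status: 3 DataNodes were found dead. Restart the DataNodes.
- After the DataNodes were restarted, HBase access was still failing. After evaluating the situation, restart HBase.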
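
The first step needs the rowkey and table name from the RetriesExhaustedException message. The sketch below shows one way to pull them out of a log line and build a matching hbase shell scan command. The function names `parse_failed_call` and `build_scan_command` are hypothetical helpers, not part of HBase or Spark. The regular expression assumes the message layout seen in the log above, and `STARTROW` with `LIMIT => 1` is only one possible way to probe the row.

```python
import re

# Matches the fragment:
#   row '<rowkey>' on table '<table>' at region=..., hostname=<host>,<port>,<startcode>
# Assumption: the layout is the one in the log above; other HBase versions may differ.
LOCATION_RE = re.compile(
    r"row '(?P<row>[^']*)' on table '(?P<table>[^']*)' at region=.*?"
    r"hostname=(?P<host>[^,\s]+),(?P<port>\d+),"
)


def parse_failed_call(message):
    """Return row, table, host and port of the failed call, or None if not found."""
    match = LOCATION_RE.search(message)
    if match is None:
        return None
    return {
        "row": match.group("row"),
        "table": match.group("table"),
        "host": match.group("host"),
        "port": int(match.group("port")),
    }


def build_scan_command(info):
    """Build an hbase shell scan command that starts at the failing rowkey.

    Assumption: the rowkey and table name contain no single quotes.
    """
    return "scan '{table}', {{STARTROW => '{row}', LIMIT => 1}}".format(**info)


if __name__ == "__main__":
    line = (
        "Call id=16731, waitTime=120003, rpcTimeout=120000 row '99D123123,_' "
        "on table 'table_name' at region=table_name,99D123123,A3,1,"
        "1598243593877.b681a73d152504748568e8e603f7d1b5., "
        "hostname=persp-38.persp.net,16020,1603173609290, seqNum=788127"
    )
    info = parse_failed_call(line)
    print(info["host"], info["port"])
    print(build_scan_command(info))
```

The host and port identify the RegionServer that did not answer, which is a useful starting point when checking node status in the second step.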