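I haven't worked on HBase for more than a year, and it brings back memories of fighting alongside 营神 the year before last... Today we hit the following problem in production: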
hbase(main):002:0> get 'mynamespace:user_basic_info','BAC3510A922CF026500874EA3975E123'
COLUMN CELL
ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region mynamespace:user_basic_info,BA5968E36ADB91CE1EA37D44267F5865,1489326561674.0250284baa6119d676821e86cfaa29f4. is not online on *******,60020,1491385979553
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2922)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1053)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2006)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
All other regions were fine, and the same error persisted after restarting the RegionServer.
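The region name in the error follows the pattern table,start key,timestamp.encoded name. (with a trailing dot), and the encoded name is also the region's directory name on HDFS. Below is a minimal sketch for pulling the encoded name out of such a message; the function name encoded_region_name is a hypothetical helper, not part of HBase.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the encoded region name (32 hex characters)
# found in a NotServingRegionException message passed as the first argument.
encoded_region_name() {
  # Region names look like: <table>,<start key>,<timestamp>.<encoded name>.
  printf '%s\n' "$1" |
    sed -n 's/.*,[0-9]\{1,\}\.\([0-9a-f]\{32\}\)\..*/\1/p'
}

# The message from the error above (host name masked as in the original).
msg='Region mynamespace:user_basic_info,BA5968E36ADB91CE1EA37D44267F5865,1489326561674.0250284baa6119d676821e86cfaa29f4. is not online on *******,60020,1491385979553'
encoded_region_name "$msg"
```

Having the encoded name on hand makes it easier to find the same region in the hbck details output and under the HBase data directory.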
First, check whether this table has any consistency problems:
hbase hbck -details table
It did indeed report 2 inconsistencies:
2 inconsistencies detected.
Since the table is inconsistent, let's try to repair it:
hbase hbck -repair table
This operation requires administrative privileges, so use it with caution! After the repair, the result was:
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: ctum2f0602005.idc.wanda-group.net,60020,1482504754412
Table idctag:user_basic_info is okay.
Number of regions: 124
0 inconsistencies detected.
Status: OK
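When this check has to be repeated, for example before and after a repair, the inconsistency count can be read from the summary line instead of by eye. This is a rough sketch that assumes the hbck report arrives on standard input and contains a line of the form "N inconsistencies detected." as shown above; inconsistency_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read hbck output on stdin and print the number from
# the last "N inconsistencies detected." line (prints nothing if absent).
inconsistency_count() {
  sed -n 's/^ *\([0-9][0-9]*\) inconsistencies detected.*/\1/p' | tail -n 1
}

# Demo on a few lines copied from the summary above.
printf '%s\n' \
  'Table hbase:meta is okay.' \
  '0 inconsistencies detected.' \
  'Status: OK' | inconsistency_count
```

Against a live cluster this would be used as `hbase hbck -details table | inconsistency_count`. If the report turns out to be written to standard error in a given setup, add `2>&1` before the pipe.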
Test whether it is fixed:
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
17/04/06 11:10:15 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.7.1, rUnknown, Wed Jun 1 16:27:04 PDT 2016
hbase(main):001:0> get 'mynamespace:user_basic_info','BAC3510A922CF026500874EA3975E123'
COLUMN CELL
index:chineseName_encrypt timestamp=1489324693470, value=950887757EDFFFDE26E9961E8998591A
index:city timestamp=1489324693470, value=\xE9\x95\x87\xE6\xB1\x9F
index:ffan_enrollment_time timestamp=1489324693470, value=2016-10-29 08:54:13
index:from_*** timestamp=1489324693470, value=0
index:from_child timestamp=1489324693470, value=0
index:from_** timestamp=1489324693470, value=0
index:from_** timestamp=1489324693470, value=1
index:from_*** timestamp=1489324693470, value=0
index:from_*** timestamp=1489324693470, value=0
index:from_*** timestamp=1489324693470, value=0
index:from_theme timestamp=1489324693470, value=0
index:from_travel timestamp=1489324693470, value=0
index:mobile_encrypt timestamp=1489324693470, value=BAC3510A922CF026500874EA3975E123
index:*** timestamp=1489324693470, value=\xE6\xB1\x9F\xE8\x8B\x8F
index:*** timestamp=1489324693470, value=\xE6\xB1\x9F\xE8\x8B\x8F
15 row(s) in 0.3020 seconds
The test passed.
If hbase hbck reports corrupt files during the check, you can use the following hdfs command to check the files belonging to the region:
hdfs fsck /hbase/data/mynamespace/tablename/0250284baa6119d676821e86cfaa29f4/index/f142db2e1d844d48858ee2d919299ca0 -locations -blocks -files
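The path in that command follows the layout root/data/namespace/table/encoded region name/column family/store file. The sketch below builds the region directory from a table name and an encoded region name. It assumes the HBase root directory is /hbase, as in the command above (the real value comes from hbase.rootdir), and that tables without a namespace prefix live under the default namespace; region_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HDFS directory of a region.
#   $1 = table name, optionally prefixed with "namespace:"
#   $2 = encoded region name
#   $3 = HBase root directory (assumed to be /hbase when omitted)
region_dir() {
  local table="$1" encoded="$2" root="${3:-/hbase}"
  local ns="default" name="$table"
  case "$table" in
    *:*) ns="${table%%:*}"; name="${table#*:}" ;;
  esac
  printf '%s/data/%s/%s/%s\n' "$root" "$ns" "$name" "$encoded"
}

region_dir 'mynamespace:user_basic_info' '0250284baa6119d676821e86cfaa29f4'
```

Passing the resulting directory to hdfs fsck, for example `hdfs fsck "$(region_dir 'mynamespace:user_basic_info' '0250284baa6119d676821e86cfaa29f4')" -files -blocks -locations`, checks every store file of the region in one run instead of one file at a time.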
I will analyze the cause of this problem in a follow-up post.