HBase ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
Table of Contents
- HBase ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
- Preface
- Problem Discovery
- Scenario
- Error from the IDEA program
- Error in the HBase shell
- Troubleshooting
- Testing the cause of the error
- Solution
- Completely deleting HBase data
- Uninstalling HBase
- References
Preface
- Linux version: CentOS 7.5
- Hadoop version: Hadoop 3.1.3
- ZooKeeper version: ZooKeeper 3.5.7
- HBase version: HBase 2.0.5
- Cluster: fully distributed, three nodes (hdp01, hdp02, hdp03)
Problem Discovery
Scenario
While loading a simulated dataset of 1,000,000 rows into HBase, the disk on the third node filled up. The HBase RegionServer on hdp03 aborted, the node shut down, and the data load failed.
Note: hdp03 was shut down because it ran out of disk space. After restarting it and bringing its services back up, both the ZooKeeper and HBase processes were intact. This is confirmed below in "Testing the cause of the error".
Error from the IDEA program
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 20 actions: ConnectException: 20 times, servers with issues: hdp03,16020,1677226920150
at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1226)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:455)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:553)
at cn.whybigdata.dynamic_rule.datagen.UserProfileDataGen.main(UserProfileDataGen.java:59)
Process finished with exit code 1
This is a direct consequence of the full disk: the client exhausted its 20 retries because the RegionServer on hdp03 was unreachable.
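Before retrying the load, it is worth confirming how much headroom each node actually has. A minimal local sketch (the /opt/apps mount point and the 90% threshold are my assumptions, not values from this cluster):

```shell
# Show usage of the volume holding Hadoop/HBase (path is an assumption;
# adjust to wherever your data directories actually live).
df -h /opt/apps 2>/dev/null || df -h /

# Flag any filesystem above 90% used, parsing POSIX df output:
# column 5 is "Use%", column 6 is the mount point.
df -P | awk 'NR > 1 { gsub("%", "", $5); if ($5 + 0 > 90) print $6 " is " $5 "% full" }'
```

Run this on every node (hdp01, hdp02, hdp03); a RegionServer whose data volume fills up will abort again no matter how the tables are repaired.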
Error in the HBase shell
Listing the tables in HBase fails with:
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
hbase(main):001:0> list
TABLE
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2932)
at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:1084)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
List all user tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
Took 8.9350 seconds
解决方法
- 进入到hadoop安装目录下的
bin
目录,执行以下命令
[whybigdata@hdp01 hbase-2.0.5]$ cd ../hadoop-3.1.3/bin/
[whybigdata@hdp01 bin]$ pwd
/opt/apps/hadoop-3.1.3/bin
- Take HDFS out of safe mode:
[whybigdata@hdp01 bin]$ ./hadoop dfsadmin -safemode leave
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
Safe mode is OFF
With safe mode off, HBase can list its tables again, but scanning the table, counting its rows, and similar operations still fail.
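The deprecation warning above already names the replacement command. A small sketch that only forces safe mode off when it is actually on (the case-pattern check on the status string is my own defensive wrapper, not part of the original steps):

```shell
# "hdfs dfsadmin" is the non-deprecated form the warning suggests.
state=$(hdfs dfsadmin -safemode get 2>/dev/null)

# Only force-leave when the NameNode reports safe mode is ON;
# otherwise just report the current state.
case "$state" in
  *ON*) hdfs dfsadmin -safemode leave ;;
  *)    echo "Safe mode already off: $state" ;;
esac
```

Bear in mind that the NameNode usually enters safe mode for a reason (here, the full disk), so leaving it manually treats the symptom; free up space first.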
- Other operations still fail:
hbase(main):002:0> count 'user_profile'
ERROR: org.apache.hadoop.hbase.NotServingRegionException: user_profile,,1677228015439.69f3f6a477f90bdc138e31f08ee909d8. is not online on hdp03,16020,1677229361798
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3272)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3249)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2948)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3285)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Count the number of rows in a table. Return value is the number of rows.
This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar
hbase.jar rowcount' to run a counting mapreduce job). Current count is shown
every 1000 rows by default. Count interval may be optionally specified. Scan
caching is enabled on count scans by default. Default cache size is 10 rows.
If your rows are small in size, you may want to increase this
parameter. Examples:
hbase> count 'ns1:t1'
hbase> count 't1'
hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000
hbase> count 't1', FILTER => "
(QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"
hbase> count 't1', COLUMNS => ['c1', 'c2'], STARTROW => 'abc', STOPROW => 'xyz'
The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding commands would be:
hbase> t.count
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
hbase> t.count INTERVAL => 10, CACHE => 1000
hbase> t.count FILTER => "
(QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"
hbase> t.count COLUMNS => ['c1', 'c2'], STARTROW => 'abc', STOPROW => 'xyz'
Took 8.8512 seconds
Scanning the table fails with the same NotServingRegionException:
hbase(main):003:0> scan 'user_profile',{LIMIT => 10}
ROW COLUMN+CELL
ERROR: org.apache.hadoop.hbase.NotServingRegionException: user_profile,,1677228015439.69f3f6a477f90bdc138e31f08ee909d8. is not online on hdp03,16020,1677229361798
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3272)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3249)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2948)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3285)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Scan a table; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP,
MAXLENGTH, COLUMNS, CACHE, RAW, VERSIONS, ALL_METRICS, METRICS,
REGION_REPLICA_ID, ISOLATION_LEVEL, READ_TYPE, ALLOW_PARTIAL_RESULTS,
BATCH or MAX_RESULT_SIZE
If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family'.
The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.
If you wish to see metrics regarding the execution of the scan, the
ALL_METRICS boolean should be set to true. Alternatively, if you would
prefer to see only a subset of the metrics, the METRICS array can be
defined to include the names of only the metrics you care about.
Some examples:
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]}
hbase> scan 't1', {REVERSED => true}
hbase> scan 't1', {ALL_METRICS => true}
hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => "
(QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
hbase> scan 't1', {ISOLATION_LEVEL => 'READ_UNCOMMITTED'}
hbase> scan 't1', {MAX_RESULT_SIZE => 123456}
For setting the Operation Attributes
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default. Example:
hbase> scan 't1', {RAW => true, VERSIONS => 10}
There is yet another option -- READ_TYPE -- which instructs the scanner to
use a specific read type. Example:
hbase> scan 't1', {READ_TYPE => 'PREAD'}
Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column. A user can define a FORMATTER by adding it to the column name in
the scan specification. The FORMATTER can be stipulated:
1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifier). You can set a
formatter for all columns (including, all key parts) using the "FORMATTER"
and "FORMATTER_CLASS" options. The default "FORMATTER_CLASS" is
"org.apache.hadoop.hbase.util.Bytes".
hbase> scan 't1', {FORMATTER => 'toString'}
hbase> scan 't1', {FORMATTER_CLASS => 'org.apache.hadoop.hbase.util.Bytes', FORMATTER => 'toString'}
Scan can also be used directly from a table, by first getting a reference to a
table, like such:
hbase> t = get_table 't'
hbase> t.scan
Note in the above situation, you can still provide all the filtering, columns,
options, etc as described above.
Took 8.2657 seconds
测试Error原因
新建一张表,来检测是否是表
user_profile
本身的问题,其实这里显而易见了,但是还是操作一下,稳妥些。
- 创建新的表stu,成功创建并可以插入、查询数据
hbase(main):004:0> create 'stu', 'info'
Created table stu
Took 4.3756 seconds
=> Hbase::Table - stu
hbase(main):005:0> list
TABLE
stu
user_profile
2 row(s)
Took 0.0085 seconds
=> ["stu", "user_profile"]
hbase(main):006:0> put 'stu', '1001', 'info:name', 'zhangsan'
Took 0.1275 seconds
hbase(main):007:0> scan 'stu'
ROW COLUMN+CELL
1001 column=info:name, timestamp=1677230857034, value=zhangsan
1 row(s)
Took 0.0197 seconds
hbase(main):008:0> count 'stu'
1 row(s)
Took 0.0277 seconds
=> 1
This confirms that the problem lies with the user_profile table itself.
(Only while writing this up did I realize that, since the damage is confined to user_profile, I could simply have dropped the table, recreated it under the same name, and rerun the data load. Lesson learned!)
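The interactive sanity check above can also be scripted. A hedged sketch using the shell's non-interactive mode (`hbase shell -n` in HBase 2.x; the probe table name is arbitrary):

```shell
# Create, write, read, and drop a throwaway table; any ERROR lines in the
# output mean the cluster itself, not just one table, is unhealthy.
echo "create 'probe_t', 'info'
put 'probe_t', '1001', 'info:name', 'zhangsan'
scan 'probe_t'
disable 'probe_t'
drop 'probe_t'" | hbase shell -n 2>&1 | grep -c ERROR
```

A count of 0 here, combined with the failures on user_profile, isolates the damage to that one table.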
Solution
Check the state of the user_profile table; the console prints the cluster status followed by the table's consistency report:
[whybigdata@hdp01 hbase-2.0.5]$ hbase hbck 'user_profile'
2023-02-24 17:34:49,265 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=hdp01:2181,hdp02:2181,hdp03:2181
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=hdp01
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_212
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: Client environment:java.home=/opt/apps/jdk1.8.0_212/jre
2023-02-24 17:34:49,273 INFO [main] zookeeper.ZooKeeper: /hadoop/yarn/hadoop-yarn-server-web-proxy-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-api-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-client-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-common-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-server-tests-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-services-api-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-services-core-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-server-common-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.1.3.jar:/opt/apps/hadoop-3.1.3/share/hadoop/yarn/hadoop-yarn-server-router-3.1.3.jar
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=/opt/apps/hadoop-3.1.3/lib/native
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.el7.x86_64
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=whybigdata
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/home/whybigdata
2023-02-24 17:34:49,274 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/opt/apps/hbase-2.0.5
2023-02-24 17:34:49,275 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7a362b6b
2023-02-24 17:34:49,290 INFO [main-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp02/192.168.10.12:2181. Will not attempt to authenticate using SASL (unknown error)
Allow checking/fixes for table: user_profile
HBaseFsck command line options: user_profile
2023-02-24 17:34:49,294 INFO [main] util.HBaseFsck: Launching hbck
2023-02-24 17:34:49,295 INFO [main-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Socket connection established to hdp02/192.168.10.12:2181, initiating session
2023-02-24 17:34:49,306 INFO [main-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp02/192.168.10.12:2181, sessionid = 0x3000001abaf0003, negotiated timeout = 40000
2023-02-24 17:34:49,353 INFO [main] zookeeper.ReadOnlyZKClient: Connect 0x45fd9a4d to hdp01:2181,hdp02:2181,hdp03:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
2023-02-24 17:34:49,359 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/371452875@5321776b
2023-02-24 17:34:49,360 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp02/192.168.10.12:2181. Will not attempt to authenticate using SASL (unknown error)
2023-02-24 17:34:49,361 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Socket connection established to hdp02/192.168.10.12:2181, initiating session
2023-02-24 17:34:49,365 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp02/192.168.10.12:2181, sessionid = 0x3000001abaf0004, negotiated timeout = 40000
Version: 2.0.5
2023-02-24 17:34:49,970 INFO [main] util.HBaseFsck: Computing mapping of all store files
2023-02-24 17:34:50,240 INFO [main] util.HBaseFsck: Validating mapping using HDFS state
2023-02-24 17:34:50,240 INFO [main] util.HBaseFsck: Computing mapping of all link files
.
2023-02-24 17:34:50,292 INFO [main] util.HBaseFsck: Validating mapping using HDFS state
Number of live region servers: 3
Number of dead region servers: 0
Master: hdp01,16000,1677229359514
Number of backup masters: 0
Average load: 1.0
Number of requests: 76
Number of regions: 3
Number of regions in transition: 0
2023-02-24 17:34:50,406 INFO [main] util.HBaseFsck: Loading regionsinfo from the hbase:meta table
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2023-02-24 17:34:50,501 INFO [main] util.HBaseFsck: getTableDescriptors == tableNames => [user_profile]
2023-02-24 17:34:50,502 INFO [main] zookeeper.ReadOnlyZKClient: Connect 0x1ce93c18 to hdp01:2181,hdp02:2181,hdp03:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
2023-02-24 17:34:50,504 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/371452875@5321776b
2023-02-24 17:34:50,505 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp01:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp01/192.168.10.11:2181. Will not attempt to authenticate using SASL (unknown error)
2023-02-24 17:34:50,506 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp01:2181)] zookeeper.ClientCnxn: Socket connection established to hdp01/192.168.10.11:2181, initiating session
2023-02-24 17:34:50,521 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp01:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp01/192.168.10.11:2181, sessionid = 0x20000018ff60005, negotiated timeout = 40000
2023-02-24 17:34:50,537 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService
2023-02-24 17:34:50,537 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x1ce93c18 to hdp01:2181,hdp02:2181,hdp03:2181
Number of Tables: 1
2023-02-24 17:34:50,542 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18] zookeeper.ZooKeeper: Session: 0x20000018ff60005 closed
2023-02-24 17:34:50,542 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x20000018ff60005
2023-02-24 17:34:50,550 INFO [main] util.HBaseFsck: Loading region directories from HDFS
2023-02-24 17:34:50,588 INFO [main] util.HBaseFsck: Loading region information from HDFS
2023-02-24 17:34:50,638 INFO [hbasefsck-pool1-t5] sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-24 17:34:50,638 INFO [hbasefsck-pool1-t9] sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-24 17:34:50,638 INFO [hbasefsck-pool1-t10] sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-24 17:34:50,765 INFO [main] util.HBaseFsck: Checking and fixing region consistency
ERROR: Region { meta => user_profile,,1677228015439.69f3f6a477f90bdc138e31f08ee909d8., hdfs => hdfs://hdp01:8020/hbase/data/default/user_profile/69f3f6a477f90bdc138e31f08ee909d8, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => user_profile,003155,1677228015439.690658266a0b11c87aada6935c91a1f7., hdfs => hdfs://hdp01:8020/hbase/data/default/user_profile/690658266a0b11c87aada6935c91a1f7, deployed => , replicaId => 0 } not deployed on any region server.
2023-02-24 17:34:50,804 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table user_profile
Summary:
Table user_profile is okay.
Number of regions: 0
Deployed on:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: hdp02,16020,1677229361706
3 inconsistencies detected.
Status: INCONSISTENT
2023-02-24 17:34:50,881 INFO [main] zookeeper.ZooKeeper: Session: 0x3000001abaf0003 closed
2023-02-24 17:34:50,881 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3000001abaf0003
2023-02-24 17:34:50,881 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService
2023-02-24 17:34:50,882 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x45fd9a4d to hdp01:2181,hdp02:2181,hdp03:2181
2023-02-24 17:34:50,887 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d] zookeeper.ZooKeeper: Session: 0x3000001abaf0004 closed
2023-02-24 17:34:50,888 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3000001abaf0004
As the output above shows, the final verdict is INCONSISTENT, and the report also flags a hole in the region chain (a "region hole").
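The two "not deployed" regions and the hole can be cross-checked against what hbase:meta still records. A sketch (my assumption: the info:server column holds the hosting RegionServer and is empty for an unassigned region; ROWPREFIXFILTER and COLUMNS are the same scan options documented in the shell help above):

```shell
# Show the meta rows for user_profile: region boundaries (info:regioninfo)
# and current assignment (info:server, empty when the region is offline).
echo "scan 'hbase:meta', {ROWPREFIXFILTER => 'user_profile', COLUMNS => ['info:regioninfo', 'info:server']}" \
  | hbase shell -n
```

In this case both data regions should show no server assignment, matching hbck's "not deployed on any region server" errors above.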
- For comparison, run the same check against the healthy stu table:
[whybigdata@hdp01 hbase-2.0.5]$ hbase hbck 'stu'
2023-02-24 17:42:33,198 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=hdp01:2181,hdp02:2181,hdp03:2181
(ZooKeeper client environment lines omitted; identical to the first run)
2023-02-24 17:42:33,208 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7a362b6b
2023-02-24 17:42:33,222 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp03/192.168.10.13:2181. Will not attempt to authenticate using SASL (unknown error)
Allow checking/fixes for table: stu
HBaseFsck command line options: stu
2023-02-24 17:42:33,226 INFO [main] util.HBaseFsck: Launching hbck
2023-02-24 17:42:33,227 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Socket connection established to hdp03/192.168.10.13:2181, initiating session
2023-02-24 17:42:33,235 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp03/192.168.10.13:2181, sessionid = 0x40000034a3b0005, negotiated timeout = 40000
2023-02-24 17:42:33,281 INFO [main] zookeeper.ReadOnlyZKClient: Connect 0x45fd9a4d to hdp01:2181,hdp02:2181,hdp03:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
2023-02-24 17:42:33,286 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/499703683@58ca425e
2023-02-24 17:42:33,288 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp02/192.168.10.12:2181. Will not attempt to authenticate using SASL (unknown error)
2023-02-24 17:42:33,289 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Socket connection established to hdp02/192.168.10.12:2181, initiating session
2023-02-24 17:42:33,294 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp02/192.168.10.12:2181, sessionid = 0x3000001abaf0006, negotiated timeout = 40000
Version: 2.0.5
2023-02-24 17:42:33,788 INFO [main] util.HBaseFsck: Computing mapping of all store files
2023-02-24 17:42:34,024 INFO [main] util.HBaseFsck: Validating mapping using HDFS state
2023-02-24 17:42:34,025 INFO [main] util.HBaseFsck: Computing mapping of all link files
.
2023-02-24 17:42:34,075 INFO [main] util.HBaseFsck: Validating mapping using HDFS state
Number of live region servers: 3
Number of dead region servers: 0
Master: hdp01,16000,1677229359514
Number of backup masters: 0
Average load: 1.0
Number of requests: 97
Number of regions: 3
Number of regions in transition: 0
2023-02-24 17:42:34,168 INFO [main] util.HBaseFsck: Loading regionsinfo from the hbase:meta table
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
2023-02-24 17:42:34,245 INFO [main] util.HBaseFsck: getTableDescriptors == tableNames => [stu]
2023-02-24 17:42:34,245 INFO [main] zookeeper.ReadOnlyZKClient: Connect 0x1ce93c18 to hdp01:2181,hdp02:2181,hdp03:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
2023-02-24 17:42:34,246 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$13/499703683@58ca425e
2023-02-24 17:42:34,247 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp02/192.168.10.12:2181. Will not attempt to authenticate using SASL (unknown error)
2023-02-24 17:42:34,248 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Socket connection established to hdp02/192.168.10.12:2181, initiating session
2023-02-24 17:42:34,252 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-SendThread(hdp02:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp02/192.168.10.12:2181, sessionid = 0x3000001abaf0007, negotiated timeout = 40000
2023-02-24 17:42:34,264 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService
2023-02-24 17:42:34,264 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x1ce93c18 to hdp01:2181,hdp02:2181,hdp03:2181
Number of Tables: 1
2023-02-24 17:42:34,269 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18] zookeeper.ZooKeeper: Session: 0x3000001abaf0007 closed
2023-02-24 17:42:34,270 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x1ce93c18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3000001abaf0007
2023-02-24 17:42:34,273 INFO [main] util.HBaseFsck: Loading region directories from HDFS
2023-02-24 17:42:34,306 INFO [main] util.HBaseFsck: Loading region information from HDFS
2023-02-24 17:42:34,343 INFO [hbasefsck-pool1-t1] sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-24 17:42:34,343 INFO [hbasefsck-pool1-t16] sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-02-24 17:42:34,444 INFO [main] util.HBaseFsck: Checking and fixing region consistency
2023-02-24 17:42:34,471 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: hdp02,16020,1677229361706
Table stu is okay.
Number of regions: 1
Deployed on: hdp01,16020,1677229361937
0 inconsistencies detected.
Status: OK
2023-02-24 17:42:34,541 INFO [main] zookeeper.ZooKeeper: Session: 0x40000034a3b0005 closed
2023-02-24 17:42:34,542 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService
2023-02-24 17:42:34,541 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x40000034a3b0005
2023-02-24 17:42:34,542 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x45fd9a4d to hdp01:2181,hdp02:2181,hdp03:2181
2023-02-24 17:42:34,547 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d] zookeeper.ZooKeeper: Session: 0x3000001abaf0006 closed
2023-02-24 17:42:34,547 INFO [ReadOnlyZKClient-hdp01:2181,hdp02:2181,hdp03:2181@0x45fd9a4d-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3000001abaf0006
The check on the healthy table reports the expected number of live region servers, none dead, and a final status of OK.
- Try to repair the user_profile table:
[whybigdata@hdp01 hbase-2.0.5]$ hbase hbck -fix "user_profile"
2023-02-24 18:17:24,321 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=hdp01:2181,hdp02:2181,hdp03:2181
(ZooKeeper client environment lines omitted; identical to the first run)
2023-02-24 18:17:24,330 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hdp01:2181,hdp02:2181,hdp03:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@7a362b6b
This option is deprecated, please use -fixAssignments instead.
2023-02-24 18:17:24,344 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Opening socket connection to server hdp03/192.168.10.13:2181. Will not attempt to authenticate using SASL (unknown error)
Allow checking/fixes for table: user_profile
HBaseFsck command line options: -fix user_profile
2023-02-24 18:17:24,350 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Socket connection established to hdp03/192.168.10.13:2181, initiating session
2023-02-24 18:17:24,360 INFO [main-SendThread(hdp03:2181)] zookeeper.ClientCnxn: Session establishment complete on server hdp03/192.168.10.13:2181, sessionid = 0x4000043ee360001, negotiated timeout = 40000
2023-02-24 18:17:24,462 INFO [pool-6-thread-1] util.HBaseFsck: Failed to create lock file hbase-hbck.lock, try=1 of 5
2023-02-24 18:17:24,665 INFO [pool-6-thread-1] util.HBaseFsck: Failed to create lock file hbase-hbck.lock, try=2 of 5
2023-02-24 18:17:25,069 INFO [pool-6-thread-1] util.HBaseFsck: Failed to create lock file hbase-hbck.lock, try=3 of 5
2023-02-24 18:17:25,873 INFO [pool-6-thread-1] util.HBaseFsck: Failed to create lock file hbase-hbck.lock, try=4 of 5
2023-02-24 18:17:27,476 INFO [pool-6-thread-1] util.HBaseFsck: Failed to create lock file hbase-hbck.lock, try=5 of 5
2023-02-24 18:17:30,677 ERROR [main] util.HBaseFsck: Another instance of hbck is fixing HBase, exiting this instance. [If you are sure no other instance is running, delete the lock file hdfs://hdp01:8020/hbase/.tmp/hbase-hbck.lock and rerun the tool]
Exception in thread "main" java.io.IOException: Duplicate hbck - Abort
at org.apache.hadoop.hbase.util.HBaseFsck.connect(HBaseFsck.java:555)
at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:5105)
at org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(HBaseFsck.java:4928)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:4916)
The error claims that another running hbck instance is already repairing HBase, but this was the first time I ran a repair, so in theory no other instance could exist. Following the suggestion in the message, I deleted the lock file `hbase-hbck.lock` under the HDFS path `/hbase/.tmp`.
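In command form, removing the stale lock file is a single HDFS delete (the full path is taken from the error message above; adjust the NameNode address to your cluster):

```shell
# Remove the stale hbck lock file named in the error message
hdfs dfs -rm hdfs://hdp01:8020/hbase/.tmp/hbase-hbck.lock
```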
- Re-run the repair command above:
[whybigdata@hdp01 hbase-2.0.5]$ hbase hbck -fix "user_profile"
......
ERROR: option '-fix' is not supportted!
-----------------------------------------------------------------------
NOTE: As of HBase version 2.0, the hbck tool is significantly changed.
In general, all Read-Only options are supported and can be be used
safely. Most -fix/ -repair options are NOT supported. Please see usage
below for details on which options are not supported.
-----------------------------------------------------------------------
Usage: fsck [opts] {only tables}
where [opts] are:
-help Display help options (this)
-details Display full report of all regions.
-timelag <timeInSeconds> Process only regions that have not experienced any metadata updates in the last <timeInSeconds> seconds.
-sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with -fix
-summary Print only summary of the tables and status.
-metaonly Only check the state of the hbase:meta table.
-sidelineDir <hdfs://> HDFS path to backup existing meta.
-boundaries Verify that regions boundaries are the same between META and store files.
-exclusive Abort if another hbck is exclusive or fixing.
Datafile Repair options: (expert features, use with caution!)
-checkCorruptHFiles Check all Hfiles by opening them to make sure they are valid
-sidelineCorruptHFiles Quarantine corrupted HFiles. implies -checkCorruptHFiles
Replication options
-fixReplication Deletes replication queues for removed peers
Metadata Repair options supported as of version 2.0: (expert features, use with caution!)
-fixVersionFile Try to fix missing hbase.version file in hdfs.
-fixReferenceFiles Try to offline lingering reference store files
-fixHFileLinks Try to offline lingering HFileLinks
-noHdfsChecking Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap
-ignorePreCheckPermission ignore filesystem permission pre-check
NOTE: Following options are NOT supported as of HBase version 2.0+.
UNSUPPORTED Metadata Repair options: (expert features, use with caution!)
-fix Try to fix region assignments. This is for backwards compatiblity
-fixAssignments Try to fix region assignments. Replaces the old -fix
-fixMeta Try to fix meta problems. This assumes HDFS region info is good.
-fixHdfsHoles Try to fix region holes in hdfs.
-fixHdfsOrphans Try to fix region dirs with no .regioninfo file in hdfs
-fixTableOrphans Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
-fixHdfsOverlaps Try to fix region overlaps in hdfs.
-maxMerge <n> When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
-sidelineBigOverlaps When fixing region overlaps, allow to sideline big overlaps
-maxOverlapsToSideline <n> When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
-fixSplitParents Try to force offline split parents to be online.
-removeParents Try to offline and sideline lingering parents and keep daughter regions.
-fixEmptyMetaCells Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)
UNSUPPORTED Metadata Repair shortcuts
-repair Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles-fixHFileLinks
-repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles
......
As the usage message shows, the `-fix` option is no longer supported as of HBase 2.x; the options that are still supported are listed in the usage text above.
After digging through the docs, it turns out that repairs in HBase 2.x require HBCK2, which is not shipped with the release: you have to fetch the hbase-operator-tools package matching your installation, build it yourself, and pick the operations you need before anything can be run against the cluster.
Ugh...
After all that back and forth, it still comes down to wiping it and reinstalling!
Given my limited ability here, I opted to simply uninstall and reinstall HBase.
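For completeness, obtaining HBCK2 roughly looks like the following. The repository and launcher syntax are from the Apache hbase-operator-tools project; the jar path and version are placeholders, so check the build output for the actual names:

```shell
# Build HBCK2 from the hbase-operator-tools sources
git clone https://github.com/apache/hbase-operator-tools.git
cd hbase-operator-tools
mvn clean package -DskipTests

# Run an HBCK2 operation through the hbase launcher, e.g. re-assigning a region
hbase hbck -j hbase-hbck2/target/hbase-hbck2-<version>.jar assigns <encoded-region-name>
```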
Completely delete HBase data
- Connect to ZooKeeper and open the zk client:
[whybigdata@hadoop103 zookeeper-3.5.7]$ bin/zkCli.sh
- Inspect the children of the root znode:
[zk: localhost:2181(CONNECTED) 4] ls -s /
[admin, brokers, cluster, config, consumers, controller_epoch, hbase, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x600000002
cversion = 19
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 11
- Delete the `/hbase` znode:
[zk: localhost:2181(CONNECTED) 5] rmr /hbase
The command 'rmr' has been deprecated. Please use 'deleteall' instead.
[zk: localhost:2181(CONNECTED) 6] ls -s /
[admin, brokers, cluster, config, consumers, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x800000396
cversion = 20
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 10
- Restart ZooKeeper, restart Hadoop (HDFS and YARN), and restart HBase.
However, when stopping the HBase cluster, the HMaster process on hdp01 was still there after a very long wait, and the stop script never finished.
So I force-killed the HMaster process (knowing full well this could cause problems), and sure enough, after restarting all the services the HMaster process never showed up again. Ugh!!!
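A slightly safer way to handle a hanging daemon is to try SIGTERM first and fall back to SIGKILL only if the process refuses to die. The sketch below demonstrates the pattern on a dummy `sleep` process so it is runnable anywhere; on the real cluster you would take the pid of HMaster from `jps` instead:

```shell
# Last-resort shutdown pattern: SIGTERM first, SIGKILL only if needed.
# A dummy `sleep` stands in for the hanging HMaster here.
sleep 300 &
pid=$!                      # on the cluster: pid=$(jps | awk '/HMaster/{print $1}')
kill "$pid"                 # graceful SIGTERM first
wait "$pid" 2>/dev/null || true   # reap the child so the liveness check is accurate
if kill -0 "$pid" 2>/dev/null; then
  kill -9 "$pid"            # force kill; expect to clean up state afterwards
fi
```

Even with this pattern, a SIGKILLed HMaster can leave stale state behind (which is what happened here), so a restart of the dependent services is usually still needed.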
Uninstall HBase
Before uninstalling HBase, first go through the "Completely delete HBase data" steps above:
- Delete the `/hbase` directory on HDFS (this path is configured by the `hbase.rootdir` property in `hbase-site.xml`).
- Delete the `/hbase` znode in ZooKeeper.
- Restart the ZooKeeper and Hadoop clusters.
- Delete the HBase installation directory on all three nodes.
- Re-extract and reinstall HBase.
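The steps above can be sketched as a short command sequence. The installation directory `/opt/apps/hbase-2.0.5` and the hostnames come from this article's setup; the tarball name is an assumption, so substitute your own paths:

```shell
# 1. Delete HBase data on HDFS (the hbase.rootdir location)
hdfs dfs -rm -r /hbase

# 2. Delete the /hbase znode in ZooKeeper (deleteall replaces the deprecated rmr)
bin/zkCli.sh -server hdp01:2181 deleteall /hbase

# 3. (Restart ZooKeeper and Hadoop here.)

# 4. Remove the installation directory on all three nodes
for host in hdp01 hdp02 hdp03; do
  ssh "$host" rm -rf /opt/apps/hbase-2.0.5
done

# 5. Re-extract the release (tarball name is illustrative)
tar -zxvf hbase-2.0.5-bin.tar.gz -C /opt/apps/
```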
References
https://www.playpi.org/2019101201.html
https://www.cnblogs.com/data-magnifier/p/15383318.html