A case of HBase procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=128898

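Project scenario:

Adding a coprocessor to a table took down the production HBase cluster, and after the restart regions were stuck in transition (RIT).
Thrilling!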


Problem description

Adding the coprocessor

Adding the AggregateImplementation coprocessor to a table caused the RegionServers to abort.

2022-07-11 17:10:35,890 ERROR [RS_OPEN_REGION-regionserver/node119:16020-0] coprocessor.CoprocessorHost: The coprocessor org.apache.Hadoop.hbase.coprocessor.AggregateImplementation threw java.io.IOException: No jar path specified for org.apache.Hadoop.hbase.coprocessor.AggregateImplementation
2022-07-11 17:10:35,890 ERROR [RS_OPEN_REGION-regionserver/node119:16020-2] coprocessor.CoprocessorHost: The coprocessor org.apache.Hadoop.hbase.coprocessor.AggregateImplementation threw java.io.IOException: No jar path specified for org.apache.Hadoop.hbase.coprocessor.AggregateImplementation
2022-07-11 17:10:35,937 ERROR [RS_OPEN_REGION-regionserver/node119:16020-0] regionserver.HRegionServer: ***** ABORTING region server node119,16020,1655708005885: The coprocessor org.apache.Hadoop.hbase.coprocessor.AggregateImplementation threw java.io.IOException: No jar path specified for org.apache.Hadoop.hbase.coprocessor.AggregateImplementation *****
2022-07-11 17:10:35,937 ERROR [RS_OPEN_REGION-regionserver/node119:16020-2] regionserver.HRegionServer: ***** ABORTING region server node119,16020,1655708005885: The coprocessor org.apache.Hadoop.hbase.coprocessor.AggregateImplementation threw java.io.IOException: No jar path specified for org.apache.Hadoop.hbase.coprocessor.AggregateImplementation *****
2022-07-11 17:10:35,937 ERROR [RS_OPEN_REGION-regionserver/node119:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.hadoop.hbase.coprocessor.AggregateImplementation, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.PhoenixTTLRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ChildLinkMetaDataEndpoint, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.phoenix.hbase.index.IndexRegionObserver]
2022-07-11 17:10:35,937 ERROR [RS_OPEN_REGION-regionserver/node119:16020-2] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.SequenceRegionObserver, org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.hadoop.hbase.coprocessor.AggregateImplementation, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.PhoenixTTLRegionObserver, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ChildLinkMetaDataEndpoint, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, org.apache.phoenix.hbase.index.IndexRegionObserver]
2022-07-11 17:10:36,105 INFO  [RS_OPEN_REGION-regionserver/node119:16020-2] regionserver.HRegionServer: s": 0,
2022-07-11 17:10:36,111 INFO  [RS_OPEN_REGION-regionserver/node119:16020-0] regionserver.HRegionServer: s": 0,
2022-07-11 17:10:36,346 INFO  [RS_OPEN_REGION-regionserver/node119:16020-0] regionserver.HRegionServer: ***** STOPPING region server 'node119,16020,1655708005885' *****
2022-07-11 17:10:36,346 INFO  [RS_OPEN_REGION-regionserver/node119:16020-2] regionserver.HRegionServer: ***** STOPPING region server 'node119,16020,1655708005885' *****

HBase log

After the restart, the table that had the coprocessor added could be neither deleted nor disabled.

2022-07-11 19:30:55,572 INFO  [PEWorker-4] procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=128898, state=RUNNABLE:DISABLE_TABLE_PREPARE, locked=true; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST
2022-07-11 19:30:55,589 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=128898, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotEnabledException via master-disable-table:org.apache.hadoop.hbase.TableNotEnabledException: tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 58 mins, 13.627 sec
2022-07-11 19:30:55,594 INFO  [PEWorker-4] procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=139703, state=RUNNABLE:DISABLE_TABLE_PREPARE, locked=true; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST
2022-07-11 19:30:55,598 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=139703, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotEnabledException via master-disable-table:org.apache.hadoop.hbase.TableNotEnabledException: tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 37 mins, 54.003 sec
2022-07-11 19:30:55,602 INFO  [PEWorker-4] procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=139706, state=RUNNABLE:DISABLE_TABLE_PREPARE, locked=true; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST
2022-07-11 19:30:55,605 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=139706, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotEnabledException via master-disable-table:org.apache.hadoop.hbase.TableNotEnabledException: tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 30 mins, 9.834 sec
2022-07-11 19:30:55,613 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=139712, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotDisabledException via master-delete-table:org.apache.hadoop.hbase.TableNotDisabledException: Not DISABLED; tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DeleteTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 28 mins, 20.029 sec
2022-07-11 19:30:55,620 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=139723, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotDisabledException via master-delete-table:org.apache.hadoop.hbase.TableNotDisabledException: Not DISABLED; tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DeleteTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 10 mins, 41.716 sec
2022-07-11 19:30:55,623 INFO  [PEWorker-4] procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=139732, state=RUNNABLE:DISABLE_TABLE_PREPARE, locked=true; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST
2022-07-11 19:30:55,627 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Rolled back pid=139732, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.TableNotEnabledException via master-disable-table:org.apache.hadoop.hbase.TableNotEnabledException: tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST exec-time=1 hrs, 8 mins, 55.011 sec
2022-07-11 19:30:55,631 INFO  [PEWorker-4] procedure.DisableTableProcedure: Not ENABLED, state=ENABLING, skipping disable; pid=139776, state=RUNNABLE:DISABLE_TABLE_PREPARE, locked=true; DisableTableProcedure table=TEST:SAMPLE_TASK_SCAN_NEWEST
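
The master log above repeats the same pattern many times, so it can help to condense it into one row per rolled-back procedure. The following is only a minimal sketch: the function name `summarize_rollbacks` and the two-line `SAMPLE_LOG` are made up for illustration, and the regular expression assumes each log entry sits on a single line in the format shown above.

```python
import re

# Matches the "Rolled back" lines written by procedure2.ProcedureExecutor.
# Assumption: one log entry per line, in the format shown in the log above.
ROLLED_BACK = re.compile(
    r"Rolled back pid=(?P<pid>\d+), state=ROLLEDBACK, "
    r"exception=(?P<exc>[\w.$]+) via (?P<source>[\w-]+):"
)


def summarize_rollbacks(log_text):
    """Return (pid, exception short name, source) for each rolled-back procedure."""
    rows = []
    for match in ROLLED_BACK.finditer(log_text):
        short_name = match.group("exc").rsplit(".", 1)[-1]
        rows.append((int(match.group("pid")), short_name, match.group("source")))
    return rows


# Two entries shortened from the log above, used as illustrative input.
SAMPLE_LOG = (
    "2022-07-11 19:30:55,589 INFO  [PEWorker-4] procedure2.ProcedureExecutor: "
    "Rolled back pid=128898, state=ROLLEDBACK, "
    "exception=org.apache.hadoop.hbase.TableNotEnabledException via "
    "master-disable-table:org.apache.hadoop.hbase.TableNotEnabledException: "
    "tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING\n"
    "2022-07-11 19:30:55,613 INFO  [PEWorker-4] procedure2.ProcedureExecutor: "
    "Rolled back pid=139712, state=ROLLEDBACK, "
    "exception=org.apache.hadoop.hbase.TableNotDisabledException via "
    "master-delete-table:org.apache.hadoop.hbase.TableNotDisabledException: "
    "Not DISABLED; tableName=TEST:SAMPLE_TASK_SCAN_NEWEST, state=ENABLING\n"
)

if __name__ == "__main__":
    for row in summarize_rollbacks(SAMPLE_LOG):
        print(row)
```

Both the disable and the delete attempts fail for the same reason: the table is stuck in ENABLING, which is neither ENABLED (required for disable) nor DISABLED (required for delete).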

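Cause analysis:

Because the coprocessor could never be found, each RegionServer aborted again as soon as it was restarted and could not come up. That had to be solved first, and the table problem second. The error spells the class name org.apache.Hadoop.hbase.coprocessor.AggregateImplementation with a capital H, while the list of loaded coprocessors shows org.apache.hadoop.hbase.coprocessor.AggregateImplementation, so the class name registered on the table may simply have been mistyped.
The trouble clearly came from the table TEST:SAMPLE_TASK_SCAN_NEWEST, so once the restart problem was solved we planned to delete that table.

But the delete failed with: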
org.apache.hadoop.hbase.TableNotDisabledException: org.apache.hadoop.hbase.TableNotDisabledException: test
	at org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:1740)
	at org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:86)
	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1576)
	at org.apache.hadoop.hbase.master.MasterRpcServices.deleteTable(MasterRpcServices.java:463)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:44229)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)

On top of that, TEST:SAMPLE_TASK_SCAN_NEWEST stayed at state=ENABLING the whole time, so we could not disable it either.


Solution:

The CoprocessorHost source in HBase shows:

// If we got here, e is not an IOException. A loaded coprocessor has a
    // fatal bug, and the server (master or regionserver) should remove the
    // faulty coprocessor from its set of active coprocessors. Setting
    // 'hbase.coprocessor.abortonerror' to true will cause abortServer(),
    // which may be useful in development and testing environments where
    // 'failing fast' for error analysis is desired.
    if (env.getConfiguration().getBoolean(ABORT_ON_ERROR_KEY, DEFAULT_ABORT_ON_ERROR)) {
      // server is configured to abort.
      abortServer(env, e);
    } else {
      // If available, pull a table name out of the environment
      if(env instanceof RegionCoprocessorEnvironment) {
        String tableName = ((RegionCoprocessorEnvironment)env).getRegionInfo().getTable().getNameAsString();
        LOG.error("Removing coprocessor '" + env.toString() + "' from table '"+ tableName + "'", e);
      } else {
        LOG.error("Removing coprocessor '" + env.toString() + "' from " +
                "environment",e);
      }

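Adding the following setting to hbase-site.xml lets a RegionServer start even when a coprocessor fails to load, instead of aborting.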

<property>
	<name>hbase.coprocessor.abortonerror</name>
	<value>false</value>
</property>
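
Before restarting the RegionServers it is worth confirming that the setting really is in the configuration file they read. The snippet below is a small sketch for that check, not part of HBase: `get_hbase_property` and the inline `SAMPLE_SITE_XML` are hypothetical names, and in practice you would pass in the contents of your own hbase-site.xml.

```python
import xml.etree.ElementTree as ET


def get_hbase_property(xml_text, name):
    """Return the value of a <property> in Hadoop-style XML, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


# Illustrative stand-in for the contents of hbase-site.xml.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>hbase.coprocessor.abortonerror</name>
    <value>false</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    value = get_hbase_property(SAMPLE_SITE_XML, "hbase.coprocessor.abortonerror")
    # None means the property is not set, in which case HBase falls back
    # to its default for this key (true, i.e. abort on coprocessor error).
    print(value)
```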

Use HBCK2 to fix the RIT problems that followed the restart

HBCK2 options
 setTableState <TABLENAME> <STATE>
   Possible table states: ENABLED, DISABLED, DISABLING, ENABLING
   To read current table state, in the hbase shell run:
     hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'
   A value of \x08\x00 == ENABLED, \x08\x01 == DISABLED, etc.
   Can also run a 'describe "<TABLENAME>"' at the shell prompt.
   An example making table name 'user' ENABLED:
     $ HBCK2 setTableState users ENABLED
   Returns whatever the previous table state was.
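
The help text only spells out two of the raw `table:state` values. As a hedged sketch, the helper below decodes all four, assuming the second byte follows the order of HBase's table state enum (ENABLED, DISABLED, DISABLING, ENABLING); only the first two mappings come from the help text itself, and `decode_table_state` is a made-up name.

```python
# Assumed mapping of the second byte to a table state. The first two entries
# are stated in the HBCK2 help text; the last two are assumed to follow the
# order of HBase's table state enum.
TABLE_STATES = {0: "ENABLED", 1: "DISABLED", 2: "DISABLING", 3: "ENABLING"}


def decode_table_state(raw):
    """Decode the two-byte 'table:state' cell value, e.g. b'\\x08\\x00'."""
    if len(raw) != 2 or raw[0] != 0x08:
        raise ValueError("unexpected table:state value: %r" % (raw,))
    try:
        return TABLE_STATES[raw[1]]
    except KeyError:
        raise ValueError("unknown table state code: %d" % raw[1])


if __name__ == "__main__":
    print(decode_table_state(b"\x08\x00"))
    print(decode_table_state(b"\x08\x01"))
```

With this mapping, a table stuck the way TEST:SAMPLE_TASK_SCAN_NEWEST was would show `\x08\x03` in hbase:meta.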
Run the command (a single run may not be enough):
/bin/hbase --config /etc/hbase-conf \
  hbck -j ./hbase-operator-tools-1.2.0/hbase-hbck2/target/hbase-hbck2-1.2.0.jar \
  setTableState TEST:SAMPLE_TASK_SCAN_NEWEST DISABLED