HBase HBCK2 User Guide

Basic Concepts

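Each HBCK2 run performs a single, discrete task. Unlike hbck1, it is not a tool that analyzes everything about a running cluster and then fixes "all the problems" it finds.
Although hbck1 is still bundled with hbase-2.x (to minimize surprises), it is deprecated and will be removed in hbase-3.x.
HBCK2 is for fixing. For a list of inconsistencies or blockages in a running cluster, look elsewhere: check the logs and the UI of the running cluster's Master. Once you have found a problem, use the HBCK2 tool to ask the Master to fix it or to skip past the bad state. Another important difference between HBCK2 and hbck1 is that HBCK2 asks the Master to perform the repair, rather than attempting a local repair from within the repair tool. See the sections below for more on how this interactive repair process works and how HBCK2 operates.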

Downloading and Building from Source

  • Download link
 https://hbase.apache.org/downloads.html
  • Download:
    the HBase Operator Tools source (src) package
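  • Prepare the build
    Download Maven:
https://dlcdn.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
  • Extract both archives (Maven and the hbase-operator-tools source)
  • cd hbase-operator-tools-1.2.0
  • Start the build
# Build command (assumes Maven was extracted next to hbase-operator-tools-1.2.0)
../apache-maven-3.6.3/bin/mvn clean install -DskipTests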

Running the HBCK2 Tool

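The HBCK2 jar does not include its dependencies; it is not built as a fat jar, so the dependencies must be provided. When building, adjust the target hbase version in the top-level pom to match your deployment; this gives the smoothest operation when running against that deployment (set hbase.version in the parent pom.xml of hbase-operator-tools).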
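Before building, it can help to check which hbase.version the source tree currently targets and change it to match your cluster. The following is a minimal shell sketch, assuming the property is declared on a single line of the parent pom.xml; pom_hbase_version and set_pom_hbase_version are hypothetical helper names, not part of hbase-operator-tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting/adjusting the hbase.version property
# in the hbase-operator-tools parent pom.xml before building HBCK2.
# Assumption: the property is declared on one line, e.g.
#   <hbase.version>2.x.y</hbase.version>

# Print the first hbase.version value found in the given pom (default ./pom.xml).
pom_hbase_version() {
  local pom="${1:-pom.xml}"
  sed -n 's:.*<hbase\.version>\(.*\)</hbase\.version>.*:\1:p' "$pom" | head -n 1
}

# Rewrite hbase.version in place; the original file is kept as <pom>.bak.
# "-i.bak" is accepted by both GNU and BSD sed.
set_pom_hbase_version() {
  local pom="$1" version="$2"
  if [ -z "$pom" ] || [ -z "$version" ]; then
    echo "usage: set_pom_hbase_version <pom.xml> <hbase-version>" >&2
    return 1
  fi
  sed -i.bak \
    "s:<hbase\.version>[^<]*</hbase\.version>:<hbase.version>${version}</hbase.version>:" \
    "$pom"
}
```

For example, compare the output of pom_hbase_version hbase-operator-tools-1.2.0/pom.xml with what hbase version reports on the cluster; if they differ, run set_pom_hbase_version with your cluster's version and rebuild. Editing the file by hand works just as well; the sed approach is only convenient for scripting.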

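The runtime interaction between HBCK2 and a running cluster gets interesting when HBCK2 is ahead of your hbase deployment, that is, when your hbase does not support all of the APIs in the current HBCK2. Where HBCK2 lacks the server-side support it needs, it should fail gracefully. If you run into this, use an older HBCK2 version or upgrade your cluster (if you can).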

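The simplest way to "provide" HBCK2 with its dependencies is to launch it through the $HBASE_HOME/bin/hbase script. The bin/hbase script already knows about hbck: an hbck option is listed in its help output. By default, bin/hbase hbck runs the built-in hbck1 tool. To run HBCK2 instead, point the -j option at the built HBCK2 jar, as follows: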

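$ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-xxx.jar

Here, /etc/hbase-conf is the directory holding the deployment's configuration (an empty folder also works, in which case HBase falls back to its built-in client defaults; use the -q option to name the ZooKeeper quorum if needed).
The HBCK2 jar is at ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-xxx.jar.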
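Because this command line is long and is repeated for every HBCK2 command, a small shell function can wrap it. This is a sketch under assumptions: hbck2 is a hypothetical function name, HBCK2_JAR is a made-up variable for the jar path, HBASE_CONF_DIR is used for the config directory, and the defaults simply mirror the paths above (the xxx placeholder must still be replaced with your actual jar name).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run HBCK2 through bin/hbase so that bin/hbase puts
# the HBase dependencies on the CLASSPATH. All arguments are forwarded to
# HBCK2 unchanged (options first, then COMMAND <ARGS>).
hbck2() {
  local conf="${HBASE_CONF_DIR:-/etc/hbase-conf}"
  local jar="${HBCK2_JAR:-$HOME/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-xxx.jar}"
  if [ -z "${HBASE_HOME:-}" ] || [ ! -x "$HBASE_HOME/bin/hbase" ]; then
    echo "hbck2: HBASE_HOME must point at an HBase install" >&2
    return 1
  fi
  if [ ! -f "$jar" ]; then
    echo "hbck2: HBCK2 jar not found: $jar" >&2
    return 1
  fi
  "$HBASE_HOME/bin/hbase" --config "$conf" hbck -j "$jar" "$@"
}
```

After sourcing it, hbck2 -v prints the HBCK2 version, and hbck2 assigns -i regions.txt passes an input file to the assigns command.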

HBCK2 Options and Commands

usage: HBCK2 [OPTIONS] COMMAND <ARGS>
Options:
 -d,--debug                                       run with debug output
 -h,--help                                        output this help message
 -p,--hbase.zookeeper.property.clientPort <arg>   port of hbase ensemble
 -q,--hbase.zookeeper.quorum <arg>                hbase ensemble
 -s,--skip                                        skip hbase version check
                                                  (PleaseHoldException)
 -v,--version                                     this hbck2 version
 -z,--zookeeper.znode.parent <arg>                parent znode of hbase
                                                  ensemble
Command:
 addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -d,--force_disable aborts fix for table if disable fails.
   To be used when regions missing from hbase:meta but directories
   are present still in HDFS. Can happen if user has run _hbck1_
   'OfflineMetaRepair' against an hbase-2.x cluster. Needs hbase:meta
   to be online. For each table name passed as parameter, performs diff
   between regions available in hbase:meta and region dirs on HDFS.
   Then for dirs with no hbase:meta matches, it reads the 'regioninfo'
   metadata file and re-creates given region in hbase:meta. Regions are
   re-created in 'CLOSED' state in the hbase:meta table, but not in the
   Masters' cache, and they are not assigned either. To get these
   regions online, run the HBCK2 'assigns' command printed when this
   command-run completes.
   NOTE: If using hbase releases older than 2.3.0, a rolling restart of
   HMasters is needed prior to executing the set of 'assigns' output.
   An example adding missing regions for tables 'tbl_1' in the default
   namespace, 'tbl_2' in namespace 'n1' and for all tables from
   namespace 'n2':
     $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
   Returns an HBCK2 'assigns' command with all re-inserted regions.
   SEE ALSO: reportMissingRegionsInMeta
   SEE ALSO: fixMeta

 assigns [OPTIONS] <ENCODED_REGIONNAME/INPUTFILES_FOR_REGIONNAMES>...
   Options:
    -o,--override  override ownership by another procedure
    -i,--inputFiles  take one or more encoded region names
   A 'raw' assign that can be used even during Master initialization (if
   the -skip flag is specified). Skirts Coprocessors. Pass one or more
   encoded region names. 1588230740 is the hard-coded name for the
   hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of
   what a user-space encoded region name looks like. For example:
     $ HBCK2 assigns 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created AssignProcedure(s) or -1 if none.
   If -i or --inputFiles is specified, pass one or more input file names.
   Each file contains encoded region names, one per line. For example:
     $ HBCK2 assigns -i fileName1 fileName2

 bypass [OPTIONS] <PID>...
   Options:
    -o,--override   override if procedure is running/stuck
    -r,--recursive  bypass parent and its children. SLOW! EXPENSIVE!
    -w,--lockWait   milliseconds to wait before giving up; default=1
   Pass one (or more) procedure 'pid's to skip to procedure finish. Parent
   of bypassed procedure will also be skipped to the finish. Entities will
   be left in an inconsistent state and will require manual fixup. May
   need Master restart to clear locks still held. Bypass fails if
   procedure has children. Add 'recursive' if all you have is a parent pid
   to finish parent and children. This is SLOW, and dangerous so use
   selectively. Does not always work.

 extraRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -f, --fix    fix meta by removing all extra regions found.
   Reports regions present on hbase:meta, but with no related
   directories on the file system. Needs hbase:meta to be online.
   For each table name passed as parameter, performs diff
   between regions available in hbase:meta and region dirs on the given
   file system. Extra regions would get deleted from Meta
   if passed the --fix option.
   NOTE: Before deciding to use the "--fix" option, it's worth checking if
   reported extra regions are overlapping with existing valid regions.
   If so, then "extraRegionsInMeta --fix" is indeed the optimal solution.
   Otherwise, "assigns" command is the simpler solution, as it recreates
   regions dirs in the filesystem, if not existing.
   An example triggering extra regions report for tables 'table_1'
   and 'table_2', under default namespace:
     $ HBCK2 extraRegionsInMeta default:table_1 default:table_2
   An example triggering extra regions report for table 'table_1'
   under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 extraRegionsInMeta default:table_1 ns1
   Returns list of extra regions for each table passed as parameter, or
   for each table on namespaces specified as parameter.

 filesystem [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix    sideline corrupt hfiles, bad links, and references.
   Report on corrupt hfiles, references, broken links, and integrity.
   Pass '--fix' to sideline corrupt files and links. '--fix' does NOT
   fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or
   more tablenames to narrow checkup. Default checks all tables and
   restores 'hbase.version' if missing. Interacts with the filesystem
   only! Modified regions need to be reopened to pick-up changes.

 fixMeta
   Do a server-side fix of bad or inconsistent state in hbase:meta.
   Available in hbase 2.2.1/2.1.6 or newer versions. Master UI has
   matching, new 'HBCK Report' tab that dumps reports generated by
   most recent run of _catalogjanitor_ and a new 'HBCK Chore'. It
   is critical that hbase:meta first be made healthy before making
   any other repairs. Fixes 'holes', 'overlaps', etc., creating
   (empty) region directories in HDFS to match regions added to
   hbase:meta. Command is NOT the same as the old _hbck1_ command
   named similarly. Works against the reports generated by the last
   catalog_janitor and hbck chore runs. If nothing to fix, run is a
   noop. Otherwise, if 'HBCK Report' UI reports problems, a run of
   fixMeta will clear up hbase:meta issues. See 'HBase HBCK' UI
   for how to generate new report.
   SEE ALSO: reportMissingRegionsInMeta

 generateMissingTableDescriptorFile <TABLENAME>
   Trying to fix an orphan table by generating a missing table descriptor
   file. This command will have no effect if the table folder is missing
   or if the .tableinfo is present (we don't override existing table
   descriptors). This command will first check if the TableDescriptor is
   cached in HBase Master in which case it will recover the .tableinfo
   accordingly. If TableDescriptor is not cached in master then it will
   create a default .tableinfo file with the following items:
     - the table name
     - the column family list determined based on the file system
     - the default properties for both TableDescriptor and
       ColumnFamilyDescriptors
   If the .tableinfo file was generated using default parameters then
   make sure you check the table / column family properties later (and
   change them if needed).
   This method does not change anything in HBase, only writes the new
   .tableinfo file to the file system. Orphan tables can cause e.g.
   ServerCrashProcedures to get stuck; you might still need to fix these
   after you generate the missing table info files.

 replication [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix    fix any replication issues found.
   Looks for undeleted replication queues and deletes them if passed the
   '--fix' option. Pass a table name to check for replication barrier and
   purge if '--fix'.

 reportMissingRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   To be used when regions missing from hbase:meta but directories
   are present still in HDFS. Can happen if user has run _hbck1_
   'OfflineMetaRepair' against an hbase-2.x cluster. This is a CHECK only
   method, designed for reporting purposes and doesn't perform any
   fixes, providing a view of which regions (if any) would get re-added
   to hbase:meta, grouped by respective table/namespace. To effectively
   re-add regions in meta, run addFsRegionsMissingInMeta.
   This command needs hbase:meta to be online. For each namespace/table
   passed as parameter, it performs a diff between regions available in
   hbase:meta against existing regions dirs on HDFS. Region dirs with no
   matches are printed grouped under its related table name. Tables with
   no missing regions will show a 'no missing regions' message. If no
   namespace or table is specified, it will verify all existing regions.
   It accepts a combination of multiple namespace and tables. Table names
   should include the namespace portion, even for tables in the default
   namespace; otherwise the value is treated as a namespace.
   An example triggering missing regions report for tables 'table_1'
   and 'table_2', under default namespace:
     $ HBCK2 reportMissingRegionsInMeta default:table_1 default:table_2
   An example triggering missing regions report for table 'table_1'
   under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 reportMissingRegionsInMeta default:table_1 ns1
   Returns list of missing regions for each table passed as parameter, or
   for each table on namespaces specified as parameter.

 setRegionState <ENCODED_REGIONNAME> <STATE>
   Possible region states:
    OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT,
    FAILED_OPEN, FAILED_CLOSE, MERGING, MERGED, SPLITTING_NEW,
    MERGING_NEW, ABNORMALLY_CLOSED
   WARNING: This is a very risky option intended for use as last resort.
   Example scenarios include unassigns/assigns that can't move forward
   because region is in an inconsistent state in 'hbase:meta'. For
   example, the 'unassigns' command can only proceed if passed a region
   in one of the following states: SPLITTING|SPLIT|MERGING|OPEN|CLOSING
   Before manually setting a region state with this command, please
   certify that this region is not being handled by a running procedure,
   such as 'assign' or 'split'. You can get a view of running procedures
   in the hbase shell using the 'list_procedures' command. An example
   setting region 'de00010733901a05f5a2a3a382e27dd4' to CLOSING:
     $ HBCK2 setRegionState de00010733901a05f5a2a3a382e27dd4 CLOSING
   Returns "0" if region state changed and "1" otherwise.

 setTableState <TABLENAME> <STATE>
   Possible table states: ENABLED, DISABLED, DISABLING, ENABLING
   To read current table state, in the hbase shell run:
     hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'
   A value of \x08\x00 == ENABLED, \x08\x01 == DISABLED, etc.
   Can also run a 'describe "<TABLENAME>"' at the shell prompt.
   An example making table name 'users' ENABLED:
     $ HBCK2 setTableState users ENABLED
   Returns whatever the previous table state was.

 scheduleRecoveries <SERVERNAME>...
   Schedule ServerCrashProcedure(SCP) for list of RegionServers. Format
   server name as '<HOSTNAME>,<PORT>,<STARTCODE>' (See HBase UI/logs).
   Example using RegionServer 'a.example.org,29100,1540348649479':
     $ HBCK2 scheduleRecoveries a.example.org,29100,1540348649479
   Returns the pid(s) of the created ServerCrashProcedure(s) or -1 if
   no procedure created (see master logs for why not).
   Command support added in hbase versions 2.0.3, 2.1.2, 2.2.0 or newer.

 unassigns <ENCODED_REGIONNAME>...
   Options:
    -o,--override  override ownership by another procedure
   A 'raw' unassign that can be used even during Master initialization
   (if the -skip flag is specified). Skirts Coprocessors. Pass one or
   more encoded region names. 1588230740 is the hard-coded name for the
   hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example
   of what a userspace encoded region name looks like. For example:
     $ HBCK2 unassigns 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created UnassignProcedure(s) or -1 if none.

   SEE ALSO, org.apache.hbase.hbck1.OfflineMetaRepair, the offline
   hbase:meta tool. See the HBCK2 README for how to use.
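The setTableState entry notes that the table:state cell holds raw bytes such as \x08\x00. A minimal shell sketch for translating that value into a state name follows. It assumes the value is a serialized protobuf TableState message in which \x08 is the tag of field 1 and the next byte is the enum ordinal; only ENABLED (\x00) and DISABLED (\x01) are stated in the help text, and the DISABLING=2, ENABLING=3 mappings are an assumption based on the TableState enum order. decode_table_state and state_from_get_line are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Translate the raw hbase:meta 'table:state' value, as printed by the hbase
# shell (e.g. \x08\x01), into a table state name.
# Assumption: protobuf TableState; \x08 is the tag of field 1 and the next
# byte is the enum ordinal (ENABLED=0, DISABLED=1, DISABLING=2, ENABLING=3).
decode_table_state() {
  case "$1" in
    '\x08\x00') echo ENABLED ;;
    '\x08\x01') echo DISABLED ;;
    '\x08\x02') echo DISABLING ;;
    '\x08\x03') echo ENABLING ;;
    *) printf 'unknown table:state value: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Extract the text after "value=" from the first shell output line that has
# it, e.g. " table:state   timestamp=1536173230599, value=\x08\x00".
state_from_get_line() {
  sed -n 's/.*value=\(.*\)$/\1/p' | head -n 1
}
```

For example, pipe the output of echo "get 'hbase:meta', 'users', 'table:state'" | hbase shell through state_from_get_line and pass the result to decode_table_state, assuming the shell prints the cell on one line containing value=.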

Note

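Note that when you pass the hbck argument to bin/hbase, it accesses the target hbase cluster with the default client. This is enough for most HBCK2 use. If you run into an exception like the following: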

bin/hbase --config hbase-conf  hbck
2019-08-30 05:04:54,467 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:361)
        at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3605)

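This happens because the HDFS jars are not on the CLASSPATH. By default, running hbck through bin/hbase does not put the HDFS jars on the CLASSPATH. Define HADOOP_HOME in the environment so that bin/hbase can find your local hadoop installation; it will then load the HDFS jars from it.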
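A quick pre-flight check can catch a missing HADOOP_HOME before the exception above appears. This is a hypothetical helper (check_hadoop_home), assuming a standard Hadoop layout with its launcher at $HADOOP_HOME/bin/hadoop; it only inspects the environment and does not change how bin/hbase builds its CLASSPATH.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before 'bin/hbase hbck': make sure
# HADOOP_HOME is set and points at something that looks like a Hadoop install.
# Assumption: the install keeps its launcher at $HADOOP_HOME/bin/hadoop.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set; expect 'No FileSystem for scheme: hdfs'" >&2
    return 1
  fi
  if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "no executable bin/hadoop under HADOOP_HOME=$HADOOP_HOME" >&2
    return 1
  fi
  echo "using Hadoop at $HADOOP_HOME"
}
```

For example: export HADOOP_HOME=/opt/hadoop (adjust to your install), then run check_hadoop_home && bin/hbase --config hbase-conf hbck.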

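HBCK2 Overview

HBCK2 is currently a simple tool that does one thing at a time.

In hbase-2.x the Master is the final arbiter of all state, so the general principle behind most HBCK2 commands is that they ask the Master to do all the fixing. This means the Master must be up before you can run HBCK2 commands.

HBCK2 is built on the HbckService hosted by the Master. This service publishes a few methods for the HBCK2 tool to use. So for any HBCK2 command that relies on the Master's HbckService facade, the first thing HBCK2 does is poke the cluster to make sure the service is available. This fails if the remote server does not publish the service or if the HbckService lacks the requested method. In the latter case, update your cluster if you can to get more fix-up tools.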

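Finding Problems

Whereas hbck1 performs an analysis and reports your cluster as GOOD or BAD, HBCK2 is less opinionated. In hbase-2.x, the operator determines what needs fixing and then uses tools, including HBCK2, to make the repairs. The operator may need to go back and forth several times, running HBCK2 and then checking cluster state.

Use the following utilities and methods to troubleshoot cluster problems.

Diagnostic Tools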

Master Logs
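The Master runs all assignments, server crash handling, cluster start and stop, and so on. In hbase-2.x, everything the Master does is converted into procedures that run on a state-machine engine. For details on how this new infrastructure works, see the Procedure Framework and AssignmentManager documentation. Each procedure has a unique procedure id, its pid, which is listed in every log record it emits. Following the pid, you can trace a procedure's lifecycle in the Master log as it moves from start, through its various stages, to completion. Some procedures spawn subprocedures, wait on them, and then finish themselves. Each subprocedure logs its own pid and its ppid, the pid of its parent.

Generally everything runs without issue, but if something unforeseen happens, the assignment framework can be left in a damaged state that requires operator intervention. Some such scenarios are discussed below. In the Master log they can show up as a region reported as STUCK, or as a procedure that is transitioning an entity (a region or a table) being blocked because another procedure holds an exclusive lock and will not let go.

A STUCK procedure looks like this: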

2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
STUCK Region-In-Transition rit=OPENING, location=va1001.example.org,22101,1536173230599, 
table=IntegrationTestBigLinkedList_20180626110336, region=dbdb56242f17610c46ea044f7a42895b
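Since every STUCK record ends with the region's encoded name, the affected regions can be collected from a Master log with standard text tools. A minimal sketch, assuming each record sits on a single log line as in the example above (it is wrapped here only for display); stuck_regions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the distinct encoded region names reported as STUCK in Master logs.
# Reads the files given as arguments, or stdin if none are given.
# Assumed record shape (one line per record):
#   ... STUCK Region-In-Transition rit=OPENING, location=..., table=..., region=<encoded>
stuck_regions() {
  grep -h 'STUCK Region-In-Transition' "$@" \
    | grep -oE 'region=[0-9a-f]+' \
    | cut -d= -f2 \
    | LC_ALL=C sort -u
}
```

For example, stuck_regions hbase-master.log > /tmp/stuck.txt (the log file name is only an example) writes one encoded region name per line, which is the input-file format the HBCK2 assigns -i option accepts. Before feeding the list to HBCK2, it is prudent to confirm with list_procedures that no running procedure still owns those regions.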

Master UI: /master-status#tables
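This section in the middle of the Master UI home page shows a list of tables, with columns indicating whether each table is ENABLED, ENABLING, DISABLING, or DISABLED, along with other attributes. Further columns give counts of regions in various transition states: OPEN, CLOSED, and so on. Reading this table helps determine whether a table's regions are in the expected state. For example, if a table is ENABLED, has regions that are not OPEN, and the Master log is silent about any ongoing assignment, something is wrong.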

Master UI: ‘Procedures & Locks’
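This page, reached from the Procedures & Locks menu item in the header of the Master UI home page, lists all ongoing procedures and locks, as well as the current set of Master Procedure WALs (named pv2-0000000000000000###.log under the MasterProcWALs directory of your hbase installation). At startup on a large cluster, while heavy assignment is underway, this page fills with lists of procedures and locks, and the number of MasterProcWALs balloons. If, after the cluster has settled, there is a stuck lock or procedure, or the WAL count does not drop but only grows, operator intervention is needed to unblock things.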

The lists of locks and procedures are also available through the hbase shell:

$ echo "list_locks"| hbase shell &> /tmp/locks.txt
$ echo "list_procedures"| hbase shell &> /tmp/procedures.txt

Master UI: The ‘HBCK Report’
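In hbase 2.1.6/2.2.1/2.3.0, an HBCK Report page at /hbck.jsp was added to the Master. It shows the output of two checks that the Master runs periodically. One is the output of the CatalogJanitor: if there are overlaps or holes in hbase:meta, the CatalogJanitor half of the page lists what it found (otherwise it is quiet). The other is a background 'chore' that was added to compare hbase:meta against the filesystem content; if it finds anomalies, it records them in its section of the HBCK Report.

See the 'HBCK Report' page itself for how to force the checkers to run.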

The HBase Canary Tool
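The Canary tool is useful for verifying assignment state. It can be run against a particular table or against the whole cluster.

For example, to check cluster assignment: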

$ hbase canary -f false -t 6000000 &>/tmp/canary.log

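-f false tells the Canary to keep going past failed region fetches, and -t 6000000 sets its timeout to 6,000,000 ms (about 100 minutes), capping how long it runs. When it is done, look at /tmp/canary.log and check the ERROR lines to find problematic region assignments.

You can also do Canary-like probes in the hbase shell. For example, given a region of table testtable whose start row is d1dddd0c, do the following: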

hbase> scan 'testtable', {STARTROW => 'd1dddd0c', LIMIT => 10}

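Other Tools
To compute the list of regions that are not open in ENABLED or ENABLING tables, read the info:state column of the hbase:meta table. For example, to find the state of all regions in table IntegrationTestBigLinkedList_20180626064758, do the following: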

$ echo " scan 'hbase:meta', {ROWPREFIXFILTER => 'IntegrationTestBigLinkedList_20180626064758,', COLUMN => 'info:state'}"| hbase shell > /tmp/t.txt

…then grep for regions that are OPENING or CLOSING.

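To move OPENING regions to OPEN so that they are consistent with the table's ENABLED state, use the assign command in the hbase shell to queue a new assign procedure (watch the Master log to see the assign run). If you have many regions to assign, use the HBCK2 tool instead; it can do bulk assigns.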
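The scan-then-grep steps above can be turned into an input file for a bulk assign with HBCK2. A minimal shell sketch, assuming the hbase shell prints each hbase:meta cell on a single line of the form <table>,<startkey>,<regionid>.<encoded>. column=info:state, timestamp=..., value=<STATE>, where the encoded region name is the 32-hex-digit token between the last two dots of the row key; regions_in_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the encoded names of regions whose info:state matches one of the
# given states (default: OPENING CLOSING), reading 'scan hbase:meta' shell
# output on stdin. Assumed line shape (one cell per line):
#   <table>,<startkey>,<regionid>.<encoded>. column=info:state, timestamp=..., value=<STATE>
regions_in_state() {
  local states="${*:-OPENING CLOSING}"
  local alternation
  # Word-splitting of $states is intentional: "A B" -> "A|B|".
  # shellcheck disable=SC2086
  alternation=$(printf '%s|' $states)
  alternation="${alternation%|}"
  # Keep only cells whose value is exactly one of the states (so OPEN does
  # not match OPENING), then cut the 32-hex-digit encoded name out of the
  # row key that precedes "column=".
  grep -E "value=(${alternation})[[:space:]]*\$" \
    | grep -oE '\.[0-9a-f]{32}\.[[:space:]]+column=' \
    | cut -c2-33 \
    | LC_ALL=C sort -u
}
```

For example, regions_in_state < /tmp/t.txt lists the OPENING and CLOSING regions for review, while regions_in_state OPENING < /tmp/t.txt > /tmp/opening.txt produces a file with one encoded name per line that can be passed to HBCK2 as assigns -i /tmp/opening.txt.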
