Basic Concepts
Coprocessor
A Coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: a request runs independently and in parallel on each Region, and HBase provides a framework that lets users flexibly implement custom Coprocessors
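Conceptually, each Region computes a partial result locally and the client merges the partials, much like a map step followed by a reduce. A plain-Java analogy of that aggregation pattern (no HBase APIs; all names here are illustrative, not HBase's actual classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoprocessorAnalogy {

    // Each "region" holds a slice of the table and computes its partial result
    // locally, like an endpoint coprocessor; the client merges the partials.
    static long parallelSum(List<long[]> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] region : regions) {
                partials.add(pool.submit(() -> Arrays.stream(region).sum()));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // "reduce" step on the client side
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> regions = Arrays.asList(new long[]{1, 2}, new long[]{3, 4, 5});
        System.out.println(parallelSum(regions)); // 15
    }
}
```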
Programming Tips
Make full use of CellUtil
// Match on byte[] directly; this is more efficient
// Bad: cf.equals(Bytes.toString(CellUtil.cloneFamily(cell)))
CellUtil.matchingFamily(cell, cf) && CellUtil.matchingQualifier(cell, col)
// Likewise, prefer `Bytes.equals` over `String#equals`
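The calls above compare the raw byte[] contents without decoding them into Strings. A minimal JDK-only illustration of the same idea (method name is hypothetical; HBase's Bytes.equals plays the role of Arrays.equals here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteMatchDemo {

    // Analogue of CellUtil.matchingFamily / Bytes.equals: compare the raw
    // bytes directly instead of decoding both sides into Strings first.
    static boolean matching(byte[] actual, byte[] expected) {
        return Arrays.equals(actual, expected); // no String allocation
    }

    public static void main(String[] args) {
        byte[] cf = "cf".getBytes(StandardCharsets.UTF_8);
        System.out.println(matching(cf, "cf".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```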
Exploit the coprocessor's parallel computing power
// In scenarios where it is hard to distribute table data evenly, you can pre-split the table into regions [00, 01, 02, ..., 99] and disable automatic splitting (see: Common Commands - Splitting), which guarantees that each Region holds only a single xx prefix. Then, when loading data, prepend an xx prefix to each rowkey in round-robin fashion, and no Region becomes a hotspot
// Inside the coprocessor, first obtain the xx prefix of the current Region, then prepend it to the startKey/endKey when building the Scan
static String getStartKeyPrefix(HRegion region) {
    if (region == null) throw new RuntimeException("Region is null!");
    byte[] startKey = region.getStartKey();
    if (startKey == null || startKey.length == 0) return "00";
    String startKeyStr = Bytes.toString(startKey);
    return isEmpty(startKeyStr) ? "00" : startKeyStr.substring(0, 2);
}

private static boolean isEmpty(final String s) {
    return s == null || s.length() == 0;
}
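Building on getStartKeyPrefix above, a standalone sketch of how the two-digit prefix can be prepended to the logical scan boundaries (plain Java, no HBase classes; the method names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class ScanRangeDemo {

    // Mirrors getStartKeyPrefix above, but takes the raw start key so the
    // sketch runs without an HRegion instance; falls back to "00" as before.
    static String prefixOf(byte[] regionStartKey) {
        if (regionStartKey == null || regionStartKey.length == 0) return "00";
        String s = new String(regionStartKey, StandardCharsets.UTF_8);
        return s.length() < 2 ? "00" : s.substring(0, 2);
    }

    // Prepend the Region's two-digit prefix to the logical scan boundaries,
    // the way the Scan's startKey/endKey would be built in the coprocessor.
    static String[] scanRange(byte[] regionStartKey, String logicalStart, String logicalStop) {
        String prefix = prefixOf(regionStartKey);
        return new String[]{prefix + logicalStart, prefix + logicalStop};
    }

    public static void main(String[] args) {
        String[] range = scanRange("07".getBytes(StandardCharsets.UTF_8), "user0001", "user9999");
        System.out.println(range[0] + " .. " + range[1]); // 07user0001 .. 07user9999
    }
}
```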
Handle exceptions in coprocessor code properly
If an exception is thrown inside a coprocessor and the hbase.coprocessor.abortonerror parameter is not enabled, the coprocessor is simply removed from the environment it was loaded into. Otherwise, the behavior depends on the exception type: an IOException is thrown directly; a DoNotRetryIOException is thrown without any retry; anything else is retried 10 times by default (hard-coded in AsyncConnectionImpl#RETRY_TIMER). Handle exceptions carefully according to your own business scenario
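The retry dispatch described above can be sketched in plain Java. This is not HBase's actual client code (the real logic in the client is far more involved); it only mirrors the rule that IOException-typed errors are rethrown immediately while other failures are retried up to a fixed number of attempts:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {

    // Mirrors the dispatch described above: IOException (including
    // DoNotRetryIOException, which subclasses it) is rethrown immediately,
    // while any other failure is retried up to maxAttempts times.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                throw e;      // thrown directly, no retry
            } catch (Exception e) {
                last = e;     // retryable: try again
            }
        }
        throw last;           // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Succeeds on the third attempt; the first two failures are retried.
        int result = RetryDemo.<Integer>callWithRetries(() -> {
            if (++attempts[0] < 3) throw new RuntimeException("transient");
            return 42;
        }, 10);
        System.out.println(result + " after " + attempts[0] + " attempts"); // 42 after 3 attempts
    }
}
```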
Logging
// Only the Apache Commons Log class can be used; otherwise nothing will be printed
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
private static final Log log = LogFactory.getLog(CoprocessorImpl.class.getName());
Deployment
# First upload the coprocessor jar
$ hadoop fs -copyFromLocal /home/hbase/script/coprocessor-0.0.1.jar hdfs://yuzhouwan/hbase/coprocessor/
$ hadoop fs -ls hdfs://yuzhouwan/hbase/coprocessor/
# Unload the old coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att_unset', NAME =>'coprocessor$1'
# Register the new coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att', 'coprocessor' => 'hdfs://yuzhouwan/hbase/coprocessor/coprocessor-0.0.1.jar|com.yuzhouwan.hbase.coprocessor.Aggregation|111|'
# Watch the RegionServer logs to observe the coprocessor's behavior
Common Commands
Cluster-related
$ su - hbase
$ start-hbase.sh
# HMaster ThriftServer
$ jps | grep -v Jps
32538 ThriftServer
9383 HMaster
8423 HRegionServer
# BackUp HMaster ThriftServer
$ jps | grep -v Jps
24450 jar
21882 HMaster
2296 HRegionServer
14598 ThriftServer
5998 Jstat
# BackUp HMaster ThriftServer
$ jps | grep -v Jps
31119 Bootstrap
8775 HMaster
25289 Bootstrap
14823 Bootstrap
12671 Jstat
9052 ThriftServer
26921 HRegionServer
# HRegionServer
$ jps | grep -v Jps
29356 hbase-monitor-process-0.0.3-jar-with-dependencies.jar # monitor
11023 Jstat
26135 HRegionServer
$ export -p | egrep -i "(hadoop|hbase)"
declare -x HADOOP_HOME="/home/bigdata/software/hadoop"
declare -x HBASE_HOME="/home/bigdata/software/hbase"
declare -x PATH="/usr/local/anaconda/bin:/usr/local/R-3.2.1/bin:/home/bigdata/software/java/bin:/home/bigdata/software/hadoop/bin:/home/bigdata/software/hive/bin:/home/bigdata/software/sqoop/bin:/home/bigdata/software/hbase/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin"
$ java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
uintx MaxHeapSize := 32126271488 {product} # 29.919921875 GB
java version "1.7.0_60-ea"
Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
$ top
top - 11:37:03 up 545 days, 18:45, 5 users, load average: 8.74, 10.39, 10.96
Tasks: 653 total, 1 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.9%us, 0.7%sy, 0.0%ni, 66.3%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264484056k total, 260853032k used, 3631024k free, 2235248k buffers
Swap: 10485756k total, 10485756k used, 0k free, 94307776k cached
# Memory: 252 GB
# `hbase classpath` returns all of HBase's dependencies
$ java -classpath ~/opt/hbase/soft/yuzhouwan.jar:`hbase classpath` com.yuzhouwan.hbase.MainApp
# Usage
Usage: hbase [<options>] <command> [<args>]
Options:
--config DIR Configuration direction to use. Default: ./conf
--hosts HOSTS Override the list in 'regionservers' file
Commands:
Some commands take arguments. Pass no args or -h for usage.
shell Run the HBase shell
hbck Run the hbase 'fsck' tool
hlog Write-ahead-log analyzer
hfile Store file analyzer
zkcli Run the ZooKeeper shell
upgrade Upgrade hbase
master Run an HBase HMaster node
regionserver Run an HBase HRegionServer node
zookeeper Run a Zookeeper server
rest Run an HBase REST server
thrift Run the HBase Thrift server
thrift2 Run the HBase Thrift2 server
clean Run the HBase clean up script
classpath Dump hbase CLASSPATH
mapredcp Dump CLASSPATH entries required by mapreduce
pe Run PerformanceEvaluation
ltt Run LoadTestTool
version Print the version
CLASSNAME Run the class named CLASSNAME
# HBase version info
$ hbase version
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: HBase 0.98.8-hadoop2
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: Subversion file:///e/hbase_compile/hbase-0.98.8 -r Unknown
2017-01-13 11:05:07,581 INFO [main] util.VersionInfo: Compiled by 14074019 on Mon Dec 26 20:17:32 2016
$ hadoop fs -ls /hbase
drwxr-xr-x - hbase hbase 0 2017-03-01 00:05 /hbase/.hbase-snapshot
drwxr-xr-x - hbase hbase 0 2016-10-