Basic Concepts
Coprocessor
A Coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: a request runs independently and in parallel on each Region, and HBase provides a framework that lets users flexibly implement custom Coprocessors
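Conceptually, each Region computes a partial result locally and the client merges the partials, much like a map step followed by a reduce. A plain-Java analogy of that aggregation pattern (no HBase APIs; all names here are illustrative, not HBase's actual classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoprocessorAnalogy {

    // Each "region" holds a slice of the table and computes its partial result
    // locally, like an endpoint coprocessor; the client merges the partials.
    static long parallelSum(List<long[]> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] region : regions) {
                partials.add(pool.submit(() -> Arrays.stream(region).sum()));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // "reduce" step on the client side
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> regions = Arrays.asList(new long[]{1, 2}, new long[]{3, 4, 5});
        System.out.println(parallelSum(regions)); // 15
    }
}
```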
Programming Tips
Make full use of CellUtil
// Match on byte[] directly; this is more efficient
// Bad: cf.equals(Bytes.toString(CellUtil.cloneFamily(cell)))
CellUtil.matchingFamily(cell, cf) && CellUtil.matchingQualifier(cell, col)
// Likewise, prefer `Bytes.equals` over `String#equals`
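The calls above compare the raw byte[] contents without decoding them into Strings. A minimal JDK-only illustration of the same idea (method name is hypothetical; HBase's Bytes.equals plays the role of Arrays.equals here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteMatchDemo {

    // Analogue of CellUtil.matchingFamily / Bytes.equals: compare the raw
    // bytes directly instead of decoding both sides into Strings first.
    static boolean matching(byte[] actual, byte[] expected) {
        return Arrays.equals(actual, expected); // no String allocation
    }

    public static void main(String[] args) {
        byte[] cf = "cf".getBytes(StandardCharsets.UTF_8);
        System.out.println(matching(cf, "cf".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```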
Exploit the coprocessor's parallel computing power
// In scenarios where it is hard to distribute table data evenly, you can pre-split the table into regions [00, 01, 02, ..., 99] and disable automatic splitting (see: Common Commands - Splitting), which guarantees that each Region holds only a single xx prefix. Then, when loading data, prepend an xx prefix to each rowkey in round-robin fashion, and no Region becomes a hotspot
// Inside the coprocessor, first obtain the xx prefix of the current Region, then prepend it to the startKey/endKey when building the Scan
static String getStartKeyPrefix(HRegion region) {
    if (region == null) throw new RuntimeException("Region is null!");
    byte[] startKey = region.getStartKey();
    if (startKey == null || startKey.length == 0) return "00";
    String startKeyStr = Bytes.toString(startKey);
    return isEmpty(startKeyStr) ? "00" : startKeyStr.substring(0, 2);
}

private static boolean isEmpty(final String s) {
    return s == null || s.length() == 0;
}
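Building on getStartKeyPrefix above, a standalone sketch of how the two-digit prefix can be prepended to the logical scan boundaries (plain Java, no HBase classes; the method names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class ScanRangeDemo {

    // Mirrors getStartKeyPrefix above, but takes the raw start key so the
    // sketch runs without an HRegion instance; falls back to "00" as before.
    static String prefixOf(byte[] regionStartKey) {
        if (regionStartKey == null || regionStartKey.length == 0) return "00";
        String s = new String(regionStartKey, StandardCharsets.UTF_8);
        return s.length() < 2 ? "00" : s.substring(0, 2);
    }

    // Prepend the Region's two-digit prefix to the logical scan boundaries,
    // the way the Scan's startKey/endKey would be built in the coprocessor.
    static String[] scanRange(byte[] regionStartKey, String logicalStart, String logicalStop) {
        String prefix = prefixOf(regionStartKey);
        return new String[]{prefix + logicalStart, prefix + logicalStop};
    }

    public static void main(String[] args) {
        String[] range = scanRange("07".getBytes(StandardCharsets.UTF_8), "user0001", "user9999");
        System.out.println(range[0] + " .. " + range[1]); // 07user0001 .. 07user9999
    }
}
```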
Handle exceptions in coprocessor code properly
If an exception is thrown inside a coprocessor and the hbase.coprocessor.abortonerror parameter is not enabled, the coprocessor is simply removed from the environment it was loaded into. Otherwise, the behavior depends on the exception type: an IOException is thrown directly; a DoNotRetryIOException is thrown without any retry; anything else is retried 10 times by default (hard-coded in AsyncConnectionImpl#RETRY_TIMER). Handle exceptions carefully according to your own business scenario
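The retry dispatch described above can be sketched in plain Java. This is not HBase's actual client code (the real logic in the client is far more involved); it only mirrors the rule that IOException-typed errors are rethrown immediately while other failures are retried up to a fixed number of attempts:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {

    // Mirrors the dispatch described above: IOException (including
    // DoNotRetryIOException, which subclasses it) is rethrown immediately,
    // while any other failure is retried up to maxAttempts times.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                throw e;      // thrown directly, no retry
            } catch (Exception e) {
                last = e;     // retryable: try again
            }
        }
        throw last;           // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Succeeds on the third attempt; the first two failures are retried.
        int result = RetryDemo.<Integer>callWithRetries(() -> {
            if (++attempts[0] < 3) throw new RuntimeException("transient");
            return 42;
        }, 10);
        System.out.println(result + " after " + attempts[0] + " attempts"); // 42 after 3 attempts
    }
}
```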
Logging
// Only the Apache Commons Log class can be used; otherwise nothing will be printed
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
private static final Log log = LogFactory.getLog(CoprocessorImpl.class.getName());
Deployment
# First upload the coprocessor jar
$ hadoop fs -copyFromLocal /home/hbase/script/coprocessor-0.0.1.jar hdfs://yuzhouwan/hbase/coprocessor/
$ hadoop fs -ls hdfs://yuzhouwan/hbase/coprocessor/
# Unload the old coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att_unset', NAME =>'coprocessor$1'
# Register the new coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att', 'coprocessor' => 'hdfs://yuzhouwan/hbase/coprocessor/coprocessor-0.0.1.jar|com.yuzhouwan.hbase.coprocessor.Aggregation|111|'
# Watch the RegionServer logs to observe the coprocessor's behavior
Common Commands
Cluster-related
$ su - hbase
$ start-hbase.sh
# HMaster ThriftServer
$ jps | grep -v Jps
32538 ThriftServer
9383 HMaster
8423 HRegionServer
# BackUp HMaster ThriftServer
$ jps | grep -v Jps
24450 jar
21882 HMaster
2296 HRegionServer
14598 ThriftServer
5998 Jstat
# BackUp HMaster ThriftServer
$ jps | grep -v Jps
31119 Bootstrap
8775 HMaster
25289 Bootstrap
14823 Bootstrap
12671 Jstat
9052 ThriftServer
26921 HRegionServer
# HRegionServer
$ jps | grep -v Jps
29356 hbase-monitor-process-0.0.3-jar-with-dependencies.jar # monitor
11023 Jstat
26135 HRegionServer
$ export -p | egrep -i "(hadoop|hbase)"
declare -x HADOOP_HOME="/home/bigdata/software/hadoop"
declare -x HBASE_HOME="/home/bigdata/software/hbase"
declare -x PATH="/usr/local/anaconda/bin:/usr/local/R-3.2.1/bin:/home/bigdata/software/java/bin:/home/bigdata/software/hadoop/bin:/home/bigdata/software/hive/bin:/home/bigdata/software/sqoop/bin:/home/bigdata/software/hbase/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin"
$ java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
uintx MaxHeapSize := 32126271488 {product} # 29.919921875 GB
java version "1.7.0_60-ea"
Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
$ top
top - 11:37:03 up 545 days, 18:45, 5 users, load average: 8.74, 10.39, 10.96
Tasks: 653 total, 1 running, 652 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.9%us, 0.7%sy, 0.0%ni, 66.3%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264484056k total, 260853032k used, 3631024k free, 2235248k buffers
Swap: 10485756k total, 10485756k used, 0k free, 94307776k cached
# Memory: 252 GB
# `hbase classpath` returns all of HBase's dependencies
$ java -classpath ~/opt/hbase/soft/yuzhouwan.jar:`hbase classpath` com.yuzhouwan.hbase.MainApp
# Usage
Usage: hbase [<options>] <command> [<args>]
Options:
--config DIR Configuration direction to use. Default: ./conf
--hosts HOSTS Override the list in 'regionservers' file
Commands:
Some commands take arguments. Pass no args or -h for usage.
shell Run the HBase shell
hbck Run the hbase 'fsck' tool
hlog Write-ahead-log analyzer
hfile Store file analyzer
zkcli Run the ZooKeeper shell
upgrade Upgrade hbase
master Run an HBase HMaster node
regionserver Run an HBase HRegionServer node
zookeeper Run a Zookeeper server
rest Run an HBase REST server
thrift Run the HBase Thrift server
thrift2 Run the HBase Thrift2 server
clean Run the HBase clean up script
classpath Dump hbase CLASSPATH
mapredcp Dump CLASSPATH entries required by mapreduce
pe Run PerformanceEvaluation
ltt Run LoadTestTool
version Print the version
CLASSNAME Run the class named CLASSNAME
# HBase version info
$ hbase version
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: HBase 0.98.8-hadoop2
2017-01-13 11:05:07,580 INFO [main] util.VersionInfo: Subversion file:///e/hbase_compile/hbase-0.98.8 -r Unknown
2017-01-13 11:05:07,581 INFO [main] util.VersionInfo: Compiled by 14074019 on Mon Dec 26 20:17:32 2016
$ hadoop fs -ls /hbase
drwxr-xr-x - hbase hbase 0 2017-03-01 00:05 /hbase/.hbase-snapshot
drwxr-xr-x - hbase hbase 0 2016-10-