2021SC@SDUSC HBase Source Code Analysis (8): Coprocessor Analysis (2)
Introduction to Endpoint
An Endpoint Coprocessor is similar to a stored procedure in MySQL: it lets user code be pushed down to the data layer for execution. A typical example is computing the average or sum over a table that spans many Regions; with an Endpoint Coprocessor, the computation is pushed down to the RegionServers.
An Endpoint lets you define your own dynamic RPC protocol for communication between clients and region servers. Because the Coprocessor runs in the same process space as the region server, you can define your own methods (Endpoints) on the region side, move computation there, and reduce network overhead. Endpoints are commonly used to extend HBase's functionality.
Endpoint vs. Observer
Endpoint and Observer are both kinds of HBase coprocessors. The main differences:
- An Observer Coprocessor runs transparently to the user: whenever HBase executes, say, a get operation, the corresponding preGetOp hook runs automatically, with no explicit call needed. An Endpoint Coprocessor must be invoked explicitly by the user.
- An Observer changes how the cluster behaves during normal client operations. An Endpoint extends the cluster's capabilities, exposing new operations to client applications.
- An Observer is analogous to a trigger in an RDBMS; an Endpoint is analogous to a stored procedure. Both run mainly on the server side.
- Observers can implement access control, priority settings, monitoring, DDL control, secondary indexes, and so on. Endpoints can implement min, max, avg, sum, distinct, group by, and similar functions.
Endpoint Workflow
Selected Endpoint Code
AggregateImplementation in HBase is an Endpoint that implements the Coprocessor interface. It provides aggregation functionality through RPC calls such as getMax, getMin, and getSum.
getMax
@Override
public void getMax(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  InternalScanner scanner = null;
  AggregateResponse response = null;
  T max = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    List<Cell> results = new ArrayList<>();
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    // qualifier may be null here
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        max = (max == null || (temp != null && ci.compare(temp, max) > 0)) ? temp : max;
      }
      results.clear();
    } while (hasMoreRows);
    if (max != null) {
      AggregateResponse.Builder builder = AggregateResponse.newBuilder();
      builder.addFirstPart(ci.getProtoForCellType(max).toByteString());
      response = builder.build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  // Log the maximum found in this region
  log.info("Maximum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + max);
  done.run(response);
}
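The heart of getMax is the null-safe fold in the inner loop: null cell values are skipped, and the first non-null value seeds the running maximum, so an empty region yields a null result and no response. A minimal standalone sketch of that fold, using Integer in place of the interpreter's generic type T (the class and method names here are illustrative, not HBase API):

```java
import java.util.Arrays;
import java.util.List;

public class MaxFold {
    // Mirrors the fold in getMax: null values are skipped, and the first
    // non-null value seeds the running maximum.
    static Integer fold(List<Integer> values) {
        Integer max = null;
        for (Integer temp : values) {
            max = (max == null || (temp != null && temp.compareTo(max) > 0)) ? temp : max;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(fold(Arrays.asList(null, 3, 7, null, 5))); // 7
        System.out.println(fold(Arrays.asList((Integer) null)));      // null
    }
}
```

The same shape appears in getMin below, with the comparison direction flipped.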
getMin
@Override
public void getMin(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  T min = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    List<Cell> results = new ArrayList<>();
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        min = (min == null || (temp != null && ci.compare(temp, min) < 0)) ? temp : min;
      }
      results.clear();
    } while (hasMoreRows);
    if (min != null) {
      response = AggregateResponse.newBuilder().addFirstPart(
        ci.getProtoForCellType(min).toByteString()).build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  log.info("Minimum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + min);
  done.run(response);
}
getSum
@Override
public void getSum(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  long sum = 0L;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    S sumVal = null;
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    List<Cell> results = new ArrayList<>();
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        if (temp != null) {
          sumVal = ci.add(sumVal, ci.castToReturnType(temp));
        }
      }
      results.clear();
    } while (hasMoreRows);
    if (sumVal != null) {
      response = AggregateResponse.newBuilder().addFirstPart(
        ci.getProtoForPromotedType(sumVal).toByteString()).build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  log.debug("Sum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + sum);
  done.run(response);
}
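Like getMax, getSum accumulates into a value that starts as null: ci.add is expected to treat a null accumulator as the seed, and an empty region produces no response at all rather than a sum of zero. A tiny standalone sketch of that null-seeded accumulation, using Long in place of the promoted type S (SumFold and its add method are illustrative names, not HBase API):

```java
public class SumFold {
    // Mirrors getSum's accumulation: a null accumulator is seeded by the
    // first value, so "no data" and "sum of zero" stay distinguishable.
    static Long add(Long acc, long v) {
        return acc == null ? v : acc + v;
    }

    public static void main(String[] args) {
        Long sum = null;
        for (Long v : new Long[] { null, 3L, 4L }) {
            if (v != null) sum = add(sum, v);
        }
        System.out.println(sum); // 7
    }
}
```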
getAvg
@Override
public void getAvg(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    S sumVal = null;
    Long rowCountVal = 0L;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    List<Cell> results = new ArrayList<>();
    boolean hasMoreRows = false;
    do {
      results.clear();
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        sumVal = ci.add(sumVal, ci.castToReturnType(ci.getValue(colFamily,
          qualifier, results.get(i))));
      }
      rowCountVal++;
    } while (hasMoreRows);
    if (sumVal != null) {
      ByteString first = ci.getProtoForPromotedType(sumVal).toByteString();
      AggregateResponse.Builder pair = AggregateResponse.newBuilder();
      pair.addFirstPart(first);
      ByteBuffer bb = ByteBuffer.allocate(8).putLong(rowCountVal);
      bb.rewind();
      pair.setSecondPart(ByteString.copyFrom(bb));
      response = pair.build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  done.run(response);
}
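Note that getAvg does not divide on the server: it returns a two-part response, the partial sum in firstPart and the row count encoded as a big-endian 8-byte long in secondPart, and the client combines the partial results from all regions before dividing. A runnable sketch of just the count encoding and the client-side combination (the AvgParts class and a plain double standing in for the promoted type S are assumptions for illustration):

```java
import java.nio.ByteBuffer;

public class AvgParts {
    // Encode the row count the way getAvg builds the response's second part:
    // a big-endian 8-byte long written into a ByteBuffer.
    static byte[] encodeCount(long rowCount) {
        ByteBuffer bb = ByteBuffer.allocate(8).putLong(rowCount);
        bb.rewind();
        byte[] out = new byte[8];
        bb.get(out);
        return out;
    }

    // Client side: decode the count from the second part and divide the
    // decoded sum by it.
    static double average(double sum, byte[] secondPart) {
        long rowCount = ByteBuffer.wrap(secondPart).getLong();
        return sum / rowCount;
    }

    public static void main(String[] args) {
        byte[] part = encodeCount(4L);
        System.out.println(average(10.0, part)); // 2.5
    }
}
```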
Using an Endpoint
Implementing an Endpoint (i.e., defining a custom RPC protocol) involves two steps:
- Define a custom RPC protocol interface extending CoprocessorProtocol; its methods are the contract between the client and the Region.
- Implement that interface in a class extending BaseEndpointCoprocessor; the method bodies are the details of the client-Region interaction.
A CoprocessorProtocol instance is tied to a single Region. When a client issues an RPC, it must identify which Regions of the table should run the protocol methods. However, clients rarely handle Regions directly, and Region names and counts change frequently, so Endpoints use row keys to identify the target Regions. This shows up in three HTable APIs:
<T extends CoprocessorProtocol> T coprocessorProxy(Class<T> protocol, byte[] row)
This API targets a single Region: the one containing the row `row`.
<T extends CoprocessorProtocol, R> Map<byte[],R> coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
<T extends CoprocessorProtocol, R> void coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable, Batch.Callback<R> callback)
These two APIs target multiple Regions: all Regions containing rows in the range [startKey, endKey].
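Conceptually, the client maps [startKey, endKey] to every Region whose key range overlaps it. A simplified standalone sketch of that routing, assuming Regions are described only by their sorted start keys (RegionRouting and selectRegions are hypothetical names; the real client reads Region boundaries from meta):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionRouting {
    // Lexicographic comparison of unsigned byte arrays, as HBase orders row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Return the indices of regions whose range overlaps [startKey, endKey].
    // Each region covers [its start key, the next region's start key); the
    // first region starts at the empty key and the last is unbounded above.
    static List<Integer> selectRegions(List<byte[]> regionStartKeys,
                                       byte[] startKey, byte[] endKey) {
        List<Integer> hit = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            byte[] regionStart = regionStartKeys.get(i);
            byte[] regionEnd = (i + 1 < regionStartKeys.size())
                ? regionStartKeys.get(i + 1) : null;
            boolean startsBeforeRangeEnds = compare(regionStart, endKey) <= 0;
            boolean endsAfterRangeStarts = regionEnd == null
                || compare(regionEnd, startKey) > 0;
            if (startsBeforeRangeEnds && endsAfterRangeStarts) hit.add(i);
        }
        return hit;
    }

    public static void main(String[] args) {
        // Three regions: ["", "g"), ["g", "p"), ["p", +inf); query ["c", "k"].
        List<byte[]> regions = Arrays.asList(new byte[0], "g".getBytes(), "p".getBytes());
        System.out.println(selectRegions(regions, "c".getBytes(), "k".getBytes())); // [0, 1]
    }
}
```

The coprocessorExec calls then invoke the Endpoint method on each selected Region and collect one result per Region.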
Loading a Coprocessor
A user-defined Coprocessor can be loaded into a RegionServer in two ways:
- static loading, via the configuration file
- dynamic loading
1. Static loading
Static loading is done by editing hbase-site.xml. The following enables aggregation globally, so it can operate on data in all tables:
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
This configures the Coprocessor in hbase-site.xml, which defines several related keys:
- hbase.coprocessor.region.classes: RegionObservers and Endpoint Coprocessors.
- hbase.coprocessor.wal.classes: WALObservers.
- hbase.coprocessor.master.classes: MasterObservers.
For example, to deploy an Endpoint Coprocessor for our own business logic, we could configure:
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.endpoint.SumEndPoint</value>
</property>
Then put the Coprocessor code on HBase's classpath; the simplest way is to drop the Coprocessor's jar into HBase's lib directory. Finally, restart the HBase cluster.
2. Dynamic loading
Dynamic loading enables aggregation per table, affecting only the specified table, and is done through the HBase Shell:
- Disable the table:
hbase> disable 'mytable'
- Alter the schema:
hbase> alter 'mytable', METHOD => 'table_att', 'coprocessor' => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'
- Re-enable the table:
hbase> enable 'mytable'
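The value of the 'coprocessor' table attribute follows the format 'jar-path|class-name|priority|arguments'. In the alter command above the jar path, priority, and arguments are left empty, so the class is loaded from the RegionServer classpath with default priority. A sketch of how such a spec string breaks into its four fields (CoprocessorSpec is an illustrative class, not HBase API):

```java
public class CoprocessorSpec {
    final String jarPath;
    final String className;
    final String priority;
    final String args;

    CoprocessorSpec(String spec) {
        // Format: "jar-path|class-name|priority|arguments"; empty fields use defaults.
        String[] parts = spec.split("\\|", -1);
        jarPath   = parts.length > 0 ? parts[0] : "";
        className = parts.length > 1 ? parts[1] : "";
        priority  = parts.length > 2 ? parts[2] : "";
        args      = parts.length > 3 ? parts[3] : "";
    }

    public static void main(String[] args) {
        CoprocessorSpec s = new CoprocessorSpec(
            "|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||");
        // Empty jar path: load from the RegionServer classpath;
        // empty priority: use the default priority.
        System.out.println(s.className);
    }
}
```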