2021SC@SDUSC HBase Source Code Analysis (8): Coprocessor Analysis (2)
Introduction to Endpoint
An Endpoint Coprocessor is similar to a stored procedure in MySQL: it lets user code be pushed down to the data layer for execution. A typical example is computing the average or sum over a table that spans many Regions; with an Endpoint Coprocessor, the computation is pushed down to the RegionServers.
An Endpoint lets you define your own dynamic RPC protocol for communication between clients and region servers. Because the Coprocessor runs in the same process space as the region server, you can define your own methods (Endpoints) on the region side, move computation there, and reduce network overhead. Endpoints are commonly used to extend HBase's functionality.
Endpoint vs. Observer
Endpoint and Observer are both kinds of HBase coprocessors. The main differences:
- An Observer Coprocessor runs transparently to the user: whenever HBase executes, say, a get operation, the corresponding preGetOp hook runs automatically, with no explicit call needed. An Endpoint Coprocessor must be invoked explicitly by the user.
- An Observer changes how the cluster behaves during normal client operations. An Endpoint extends the cluster's capabilities, exposing new operations to client applications.
- An Observer is analogous to a trigger in an RDBMS; an Endpoint is analogous to a stored procedure. Both run mainly on the server side.
- Observers can implement access control, priority settings, monitoring, DDL control, secondary indexes, and so on. Endpoints can implement min, max, avg, sum, distinct, group by, and similar functions.
Endpoint Workflow
Selected Endpoint Code
AggregateImplementation in HBase is an Endpoint that implements the Coprocessor interface. It provides aggregation functionality through RPC calls such as getMax, getMin, and getSum.
getMax
@Override
public void getMax(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  InternalScanner scanner = null;
  AggregateResponse response = null;
  T max = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    List<Cell> results = new ArrayList<>();
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    // qualifier may be null here
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        max = (max == null || (temp != null && ci.compare(temp, max) > 0)) ? temp : max;
      }
      results.clear();
    } while (hasMoreRows);
    if (max != null) {
      AggregateResponse.Builder builder = AggregateResponse.newBuilder();
      builder.addFirstPart(ci.getProtoForCellType(max).toByteString());
      response = builder.build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  // Log the maximum found in this region
  log.info("Maximum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + max);
  done.run(response);
}
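The heart of getMax is the null-safe fold in the inner loop: null cell values are skipped, and the first non-null value seeds the running maximum, so an empty region yields a null result and no response. A minimal standalone sketch of that fold, using Integer in place of the interpreter's generic type T (the class and method names here are illustrative, not HBase API):

```java
import java.util.Arrays;
import java.util.List;

public class MaxFold {
    // Mirrors the fold in getMax: null values are skipped, and the first
    // non-null value seeds the running maximum.
    static Integer fold(List<Integer> values) {
        Integer max = null;
        for (Integer temp : values) {
            max = (max == null || (temp != null && temp.compareTo(max) > 0)) ? temp : max;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(fold(Arrays.asList(null, 3, 7, null, 5))); // 7
        System.out.println(fold(Arrays.asList((Integer) null)));      // null
    }
}
```

The same shape appears in getMin below, with the comparison direction flipped.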
getMin
@Override
public void getMin(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  T min = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    List<Cell> results = new ArrayList<>();
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        min = (min == null || (temp != null && ci.compare(temp, min) < 0)) ? temp : min;
      }
      results.clear();
    } while (hasMoreRows);
    if (min != null) {
      response = AggregateResponse.newBuilder().addFirstPart(
        ci.getProtoForCellType(min).toByteString()).build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  log.info("Minimum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + min);
  done.run(response);
}
getSum
@Override
public void getSum(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  long sum = 0L;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    S sumVal = null;
    T temp;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    List<Cell> results = new ArrayList<>();
    boolean hasMoreRows = false;
    do {
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        temp = ci.getValue(colFamily, qualifier, results.get(i));
        if (temp != null) {
          sumVal = ci.add(sumVal, ci.castToReturnType(temp));
        }
      }
      results.clear();
    } while (hasMoreRows);
    if (sumVal != null) {
      response = AggregateResponse.newBuilder().addFirstPart(
        ci.getProtoForPromotedType(sumVal).toByteString()).build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  log.debug("Sum from this region is "
      + env.getRegion().getRegionInfo().getRegionNameAsString() + ": " + sum);
  done.run(response);
}
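Like getMax, getSum accumulates into a value that starts as null: ci.add is expected to treat a null accumulator as the seed, and an empty region produces no response at all rather than a sum of zero. A tiny standalone sketch of that null-seeded accumulation, using Long in place of the promoted type S (SumFold and its add method are illustrative names, not HBase API):

```java
public class SumFold {
    // Mirrors getSum's accumulation: a null accumulator is seeded by the
    // first value, so "no data" and "sum of zero" stay distinguishable.
    static Long add(Long acc, long v) {
        return acc == null ? v : acc + v;
    }

    public static void main(String[] args) {
        Long sum = null;
        for (Long v : new Long[] { null, 3L, 4L }) {
            if (v != null) sum = add(sum, v);
        }
        System.out.println(sum); // 7
    }
}
```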
getAvg
@Override
public void getAvg(RpcController controller, AggregateRequest request,
    RpcCallback<AggregateResponse> done) {
  AggregateResponse response = null;
  InternalScanner scanner = null;
  try {
    ColumnInterpreter<T, S, P, Q, R> ci = constructColumnInterpreterFromRequest(request);
    S sumVal = null;
    Long rowCountVal = 0L;
    Scan scan = ProtobufUtil.toScan(request.getScan());
    scanner = env.getRegion().getScanner(scan);
    byte[] colFamily = scan.getFamilies()[0];
    NavigableSet<byte[]> qualifiers = scan.getFamilyMap().get(colFamily);
    byte[] qualifier = null;
    if (qualifiers != null && !qualifiers.isEmpty()) {
      qualifier = qualifiers.pollFirst();
    }
    List<Cell> results = new ArrayList<>();
    boolean hasMoreRows = false;
    do {
      results.clear();
      hasMoreRows = scanner.next(results);
      int listSize = results.size();
      for (int i = 0; i < listSize; i++) {
        sumVal = ci.add(sumVal, ci.castToReturnType(ci.getValue(colFamily,
          qualifier, results.get(i))));
      }
      rowCountVal++;
    } while (hasMoreRows);
    if (sumVal != null) {
      ByteString first = ci.getProtoForPromotedType(sumVal).toByteString();
      AggregateResponse.Builder pair = AggregateResponse.newBuilder();
      pair.addFirstPart(first);
      ByteBuffer bb = ByteBuffer.allocate(8).putLong(rowCountVal);
      bb.rewind();
      pair.setSecondPart(ByteString.copyFrom(bb));
      response = pair.build();
    }
  } catch (IOException e) {
    CoprocessorRpcUtils.setControllerException(controller, e);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  done.run(response);
}
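Note that getAvg does not divide on the server: it returns a two-part response, the partial sum in firstPart and the row count encoded as a big-endian 8-byte long in secondPart, and the client combines the partial results from all regions before dividing. A runnable sketch of just the count encoding and the client-side combination (the AvgParts class and a plain double standing in for the promoted type S are assumptions for illustration):

```java
import java.nio.ByteBuffer;

public class AvgParts {
    // Encode the row count the way getAvg builds the response's second part:
    // a big-endian 8-byte long written into a ByteBuffer.
    static byte[] encodeCount(long rowCount) {
        ByteBuffer bb = ByteBuffer.allocate(8).putLong(rowCount);
        bb.rewind();
        byte[] out = new byte[8];
        bb.get(out);
        return out;
    }

    // Client side: decode the count from the second part and divide the
    // decoded sum by it.
    static double average(double sum, byte[] secondPart) {
        long rowCount = ByteBuffer.wrap(secondPart).getLong();
        return sum / rowCount;
    }

    public static void main(String[] args) {
        byte[] part = encodeCount(4L);
        System.out.println(average(10.0, part)); // 2.5
    }
}
```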
Using an Endpoint
Implementing an Endpoint (i.e., defining a custom RPC protocol) involves two steps:
- Define a custom RPC protocol interface extending CoprocessorProtocol; its methods are the contract between the client and the Region.
- Implement that interface in a class extending BaseEndpointCoprocessor; the method bodies are the details of the client-Region interaction.
A CoprocessorProtocol instance is tied to a single Region. When a client issues an RPC, it must identify which Regions of the table should run the protocol methods. However, clients rarely handle Regions directly, and Region names and counts change frequently, so Endpoints use row keys to identify the target Regions. This shows up in three HTable APIs:
<T extends CoprocessorProtocol> T coprocessorProxy(Class<T> protocol, byte[] row)
This API targets a single Region: the one containing the row `row`.
<T extends CoprocessorProtocol, R> Map<byte[],R> coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
<T extends CoprocessorProtocol, R> void coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable, Batch.Callback<R> callback)
These two APIs target multiple Regions: all Regions containing rows in the range [startKey, endKey].
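Conceptually, the client maps [startKey, endKey] to every Region whose key range overlaps it. A simplified standalone sketch of that routing, assuming Regions are described only by their sorted start keys (RegionRouting and selectRegions are hypothetical names; the real client reads Region boundaries from meta):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionRouting {
    // Lexicographic comparison of unsigned byte arrays, as HBase orders row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Return the indices of regions whose range overlaps [startKey, endKey].
    // Each region covers [its start key, the next region's start key); the
    // first region starts at the empty key and the last is unbounded above.
    static List<Integer> selectRegions(List<byte[]> regionStartKeys,
                                       byte[] startKey, byte[] endKey) {
        List<Integer> hit = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            byte[] regionStart = regionStartKeys.get(i);
            byte[] regionEnd = (i + 1 < regionStartKeys.size())
                ? regionStartKeys.get(i + 1) : null;
            boolean startsBeforeRangeEnds = compare(regionStart, endKey) <= 0;
            boolean endsAfterRangeStarts = regionEnd == null
                || compare(regionEnd, startKey) > 0;
            if (startsBeforeRangeEnds && endsAfterRangeStarts) hit.add(i);
        }
        return hit;
    }

    public static void main(String[] args) {
        // Three regions: ["", "g"), ["g", "p"), ["p", +inf); query ["c", "k"].
        List<byte[]> regions = Arrays.asList(new byte[0], "g".getBytes(), "p".getBytes());
        System.out.println(selectRegions(regions, "c".getBytes(), "k".getBytes())); // [0, 1]
    }
}
```

The coprocessorExec calls then invoke the Endpoint method on each selected Region and collect one result per Region.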
Loading a Coprocessor
A user-defined Coprocessor can be loaded into a RegionServer in two ways:
- static loading, via the configuration file
- dynamic loading
1. Static loading
Static loading is done by editing hbase-site.xml. The following enables aggregation globally, so it can operate on data in all tables:
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
This configures the Coprocessor in hbase-site.xml, which defines several related keys:
- hbase.coprocessor.region.classes: RegionObservers and Endpoint Coprocessors.
- hbase.coprocessor.wal.classes: WALObservers.
- hbase.coprocessor.master.classes: MasterObservers.
For example, to deploy an Endpoint Coprocessor for our own business logic, we could configure:
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.endpoint.SumEndPoint</value>
</property>
Then put the Coprocessor code on HBase's classpath; the simplest way is to drop the Coprocessor's jar into HBase's lib directory. Finally, restart the HBase cluster.
2. Dynamic loading
Dynamic loading enables aggregation per table, affecting only the specified table, and is done through the HBase Shell:
- Disable the table:
hbase> disable 'mytable'
- Alter the schema:
hbase> alter 'mytable', METHOD => 'table_att', 'coprocessor' => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'
- Re-enable the table:
hbase> enable 'mytable'
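The value of the 'coprocessor' table attribute follows the format 'jar-path|class-name|priority|arguments'. In the alter command above the jar path, priority, and arguments are left empty, so the class is loaded from the RegionServer classpath with default priority. A sketch of how such a spec string breaks into its four fields (CoprocessorSpec is an illustrative class, not HBase API):

```java
public class CoprocessorSpec {
    final String jarPath;
    final String className;
    final String priority;
    final String args;

    CoprocessorSpec(String spec) {
        // Format: "jar-path|class-name|priority|arguments"; empty fields use defaults.
        String[] parts = spec.split("\\|", -1);
        jarPath   = parts.length > 0 ? parts[0] : "";
        className = parts.length > 1 ? parts[1] : "";
        priority  = parts.length > 2 ? parts[2] : "";
        args      = parts.length > 3 ? parts[3] : "";
    }

    public static void main(String[] args) {
        CoprocessorSpec s = new CoprocessorSpec(
            "|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||");
        // Empty jar path: load from the RegionServer classpath;
        // empty priority: use the default priority.
        System.out.println(s.className);
    }
}
```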