Notes on Understanding HBase Coprocessors

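What is a Coprocessor? Simply stated, a Coprocessor is a framework that provides an easy way to run your custom code on the Region Server (that is, a program you write yourself executes inside the RegionServer).

In a scenario like this (for example, when an aggregation would otherwise pull large amounts of data to the client) it’s better to move the computation to the data itself, just like a stored procedure (a better analogy is the MapReduce model). In other words, the computation runs on the nodes that hold the data.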

The analogies most commonly used in the industry to describe Coprocessors are:

Triggers and Stored Procedure: This is the most common analogy that you will find for Coprocessors (the official documentation uses it). An Observer Coprocessor (discussed below) is compared to a trigger because, like a trigger, it executes your custom code when a certain event occurs (such as a Get or a Put). Similarly, an Endpoint Coprocessor (discussed below) is compared to a stored procedure: it lets you perform custom computation on the data directly inside the region server.
MapReduce: As in MapReduce, you move the computation to the data. A Coprocessor executes your custom computation directly on the Region Servers, i.e. where the data resides. That’s why some people compare Coprocessors to small MapReduce jobs.
AOP: Some people compare it to Aspect Oriented Programming (AOP). In AOP, you apply advice by intercepting the request, running some custom (probably cross-cutting) code, and then forwarding the request on its path as if nothing happened (or even returning it). A Coprocessor gives you the same facility: intercept the request, run custom code, and then forward the request on its path (or return it).

In HBase, implementing a Coprocessor involves the following steps:

Your class should either extend one of the Coprocessor classes (like BaseRegionObserver) or implement the Coprocessor interfaces (like Coprocessor and CoprocessorService).
Load the Coprocessor: Currently there are two ways to load the Coprocessor. One is static (i.e. loading from configuration) and the other is dynamic (i.e. loading from the table descriptor, either through Java code or through ‘hbase shell’). Both are discussed below in detail. In short, static loading picks the jar up from the ‘lib’ directory and requires an HBase restart, while dynamic loading configures the jar through the shell or the API without a restart.
Finally, write your client-side code to call the Coprocessor. This is the easiest step, as HBase handles the Coprocessor transparently and you don’t have to do much to call it. Once loaded, your custom code is registered with HBase and executed as the Coprocessor implementation.

Coprocessors are executed directly on the region server, as part of the region server process; therefore faulty or malicious code can bring your region server down. Currently there is no mechanism to prevent this, but there are efforts going on for this. For more, see JIRA ticket HBASE-4047.

1. Observer Coprocessor: As stated above, these are just like database triggers, i.e. they execute your custom code on the occurrence of certain events. Like a trigger or AOP advice, the code takes the form of overridden pre*** and post*** methods of the RegionObserver class. For example, if you want your code to be executed before or after the put operation, override the following methods:

public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put, final WALEdit edit, final Durability durability) throws IOException {
}

public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put, final WALEdit edit, final Durability durability) throws IOException {
}

Observer Coprocessors come in the following flavors:

  1. RegionObserver: This Coprocessor lets you hook your code into events triggered on a region. The most common examples are ‘preGet’ and ‘postGet’ for the ‘Get’ operation and ‘prePut’ and ‘postPut’ for the ‘Put’ operation. This is the most commonly used flavor.
  2. Region Server Observer: Provides hooks for events related to the RegionServer, such as stopping the RegionServer and performing operations before or after merges, commits, or rollbacks.
  3. WAL Observer: Provides hooks for WAL (Write-Ahead-Log) related operations. It has only two methods, ‘preWALWrite()’ and ‘postWALWrite()’.
  4. Master Observer: This observer provides hooks for DDL-like operations, such as creating, deleting, or modifying a table.
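
To make the pre/post hook idea concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: `Context`, `PutObserver`, and `ToyRegion` are hypothetical stand-ins invented for this illustration, and the choice to skip the post-hook after a bypass belongs to the sketch, not to HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are HBase classes.
class ObserverHookSketch {

    /** Hypothetical stand-in for an observer context; a pre-hook can ask to skip the default action. */
    static class Context {
        private boolean bypassed;
        void bypass() { bypassed = true; }
        boolean isBypassed() { return bypassed; }
    }

    /** Hypothetical observer with one hook before and one hook after a put. */
    interface PutObserver {
        void prePut(Context ctx, String row, String value);
        void postPut(Context ctx, String row, String value);
    }

    /** Toy "region": a sorted map plus the observers registered on it. */
    static class ToyRegion {
        final Map<String, String> store = new TreeMap<>();
        final List<PutObserver> observers = new ArrayList<>();

        void put(String row, String value) {
            Context ctx = new Context();
            for (PutObserver o : observers) {
                o.prePut(ctx, row, value);      // runs before the write
            }
            if (ctx.isBypassed()) {
                return;                         // sketch's choice: skip the write and the post-hooks
            }
            store.put(row, value);              // the default action
            for (PutObserver o : observers) {
                o.postPut(ctx, row, value);     // runs after the write
            }
        }
    }

    /** Registers one observer that vetoes writes to the row "admin" and logs all other writes. */
    static ToyRegion demo(final List<String> log) {
        ToyRegion region = new ToyRegion();
        region.observers.add(new PutObserver() {
            @Override
            public void prePut(Context ctx, String row, String value) {
                if ("admin".equals(row)) {
                    ctx.bypass();
                }
            }

            @Override
            public void postPut(Context ctx, String row, String value) {
                log.add("put:" + row);
            }
        });
        region.put("jverne", "engineer");
        region.put("admin", "secret");
        return region;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToyRegion region = demo(log);
        System.out.println(region.store.keySet()); // [jverne]
        System.out.println(log);                   // [put:jverne]
    }
}
```

The caller of `put` never sees the observer; the hooks run transparently, which is the property the trigger analogy describes.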

Example of Observer Coprocessor:

Table 1: ‘users’ table

(The image of the ‘users’ table is not preserved in these notes.)

Consider a hypothetical example with the ‘users’ table referenced above. In this example, the client can query information about an employee. To demonstrate Coprocessors, we assume that ‘admin’ is a special person whose details shouldn’t be visible to any client querying the table. To achieve this we will use a Coprocessor.

Following are the steps:

  1. Write a class that extends the BaseRegionObserver class.
  2. Override the ‘preGetOp()’ method (note that the ‘preGet()’ method is now deprecated). Use ‘preGetOp()’ here because it lets you first check whether the queried rowkey is ‘admin’. If it is ‘admin’, return from the call without letting the system perform the get operation, which also saves work.
  3. Export your code to a jar file.
  4. Place the jar in HDFS where HBase can locate it.
  5. Load the Coprocessor.
  6. Write a simple program to test it.
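
The observer class looks like this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionObserverExample extends BaseRegionObserver {

    private static final byte[] ADMIN = Bytes.toBytes("admin");
    private static final byte[] COLUMN_FAMILY = Bytes.toBytes("details");
    private static final byte[] COLUMN = Bytes.toBytes("Admin_det");
    private static final byte[] VALUE = Bytes.toBytes("You can’t see Admin details");

    @Override
    public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> e, final Get get, final List<Cell> results) throws IOException {

        // If the client asks for the admin row, answer with a placeholder cell
        // and bypass the normal get so the real data is never read.
        if (Bytes.equals(get.getRow(), ADMIN)) {
            // (byte) 4 is the type code of a Put cell.
            Cell c = CellUtil.createCell(get.getRow(), COLUMN_FAMILY, COLUMN, System.currentTimeMillis(), (byte) 4, VALUE);
            results.add(c);
            e.bypass();
        }

        // Keep compatibility with the deprecated preGet() hook, which works on KeyValues.
        List<KeyValue> kvs = new ArrayList<KeyValue>(results.size());
        for (Cell c : results) {
            kvs.add(KeyValueUtil.ensureKeyValue(c));
        }
        preGet(e, get, kvs);
        results.clear();
        results.addAll(kvs);
    }
}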
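
Overriding ‘preGetOp()’ only covers the ‘Get’ operation. To hide the admin row from ‘Scan’ operations as well, override ‘preScannerOpen()’ in the same class and add a filter that excludes that row:

@Override
public RegionScanner preScannerOpen(final ObserverContext<RegionCoprocessorEnvironment> e, final Scan scan, final RegionScanner s) throws IOException {

    // Exclude the row whose key equals "admin".
    // Note: setFilter() replaces any filter the client already set on the scan.
    Filter filter = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(ADMIN));
    scan.setFilter(filter);
    return s;
}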
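
One caveat with the ‘preScannerOpen()’ code above: calling ‘scan.setFilter(...)’ replaces whatever filter the client attached to the scan. A possible alternative is to AND the two filters together (in HBase, ‘FilterList’ with ‘MUST_PASS_ALL’ is the class for that). The sketch below illustrates only the difference between replacing and combining, using plain Java predicates rather than HBase filters; the row keys and the client filter are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java illustration; Predicate<String> stands in for an HBase Filter on row keys.
class FilterCombineSketch {

    /** Stand-in for the RowFilter that drops the "admin" row. */
    static final Predicate<String> NOT_ADMIN = row -> !"admin".equals(row);

    /** Mirrors scan.setFilter(adminFilter): the client's filter is discarded. */
    static Predicate<String> replace(Predicate<String> clientFilter) {
        return NOT_ADMIN;
    }

    /** Keeps the client's filter (if any) and additionally excludes the admin row. */
    static Predicate<String> combine(Predicate<String> clientFilter) {
        return clientFilter == null ? NOT_ADMIN : clientFilter.and(NOT_ADMIN);
    }

    /** Toy scan: returns the rows accepted by the filter, in order. */
    static List<String> scan(List<String> rows, Predicate<String> filter) {
        return rows.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("admin", "cdickens", "jverne"); // hypothetical row keys
        Predicate<String> clientFilter = row -> row.startsWith("j");      // hypothetical client filter

        System.out.println(scan(rows, replace(clientFilter))); // [cdickens, jverne] (client filter lost)
        System.out.println(scan(rows, combine(clientFilter))); // [jverne] (both filters applied)
    }
}
```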


2. Endpoint Coprocessor: With an Endpoint Coprocessor you can create your own dynamic RPC protocol for communication between the client and the region server, which lets you run your custom code on the region server (on each region of a table). Unlike an Observer Coprocessor (where your custom code is executed transparently when events like a ‘Get’ operation occur), an Endpoint Coprocessor has to be invoked explicitly, using the ‘coprocessorService()’ method of ‘HTableInterface’ (or HTable). In other words, the client and the region server talk to each other directly over RPC, in contrast to the Observer’s event-and-trigger mechanism.

To implement an Endpoint Coprocessor, follow these steps:

  1. Create a ‘.proto’ file defining your service.
  2. Execute the ‘protoc’ command to generate the Java code from the above ‘.proto’ file.
  3. Write a class that should:
    Extend the generated service class.
    Implement the two interfaces Coprocessor and CoprocessorService.
  4. Override the service method.
  5. Load the Coprocessor.
  6. Write client code to call the Coprocessor.

Step 1: Create a ‘proto’ file to define your service, request and response. Let’s call this file ‘sum.proto’. Below is the content of the ‘sum.proto’ file:

option java_package = "org.myname.hbase.Coprocessor.autogenerated";
option java_outer_classname = "Sum";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;
message SumRequest {
    required string family = 1;
    required string column = 2;
}

message SumResponse {
  required int64 sum = 1 [default = 0];
}

service SumService {
  rpc getSum(SumRequest)
    returns (SumResponse);
}


Step 2: Compile the proto file using the proto compiler (for detailed instructions see the official Protocol Buffers documentation).

$ protoc --java_out=src ./sum.proto

(Note: You must create the ‘src’ folder yourself before running the command.)
This will generate a class called ‘Sum.java’.

Step 3: Write your Endpoint Coprocessor. First, your class should extend the service just defined above (i.e. Sum.SumService). Second, it should implement the Coprocessor and CoprocessorService interfaces. Third, override the ‘getService()’, ‘start()’, ‘stop()’ and ‘getSum()’ methods. Below is the full code:

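import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

import org.myname.hbase.Coprocessor.autogenerated.Sum.SumRequest;
import org.myname.hbase.Coprocessor.autogenerated.Sum.SumResponse;
import org.myname.hbase.Coprocessor.autogenerated.Sum.SumService;

public class SumEndPoint extends SumService implements Coprocessor, CoprocessorService {

    private RegionCoprocessorEnvironment env;

    @Override
    public Service getService() {
        return this;
    }

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        // An endpoint only makes sense on a table region.
        if (env instanceof RegionCoprocessorEnvironment) {
            this.env = (RegionCoprocessorEnvironment) env;
        } else {
            throw new CoprocessorException("Must be loaded on a table region!");
        }
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        // do nothing
    }

    @Override
    public void getSum(RpcController controller, SumRequest request, RpcCallback<SumResponse> done) {
        // Scan only the requested column of the requested family.
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes(request.getFamily()));
        scan.addColumn(Bytes.toBytes(request.getFamily()), Bytes.toBytes(request.getColumn()));
        SumResponse response = null;
        InternalScanner scanner = null;
        try {
            scanner = env.getRegion().getScanner(scan);
            List<Cell> results = new ArrayList<Cell>();
            boolean hasMore = false;
            long sum = 0L;
            // Add up every cell value in this region; values are expected to be 8-byte longs.
            do {
                hasMore = scanner.next(results);
                for (Cell cell : results) {
                    sum = sum + Bytes.toLong(CellUtil.cloneValue(cell));
                }
                results.clear();
            } while (hasMore);

            response = SumResponse.newBuilder().setSum(sum).build();

        } catch (IOException ioe) {
            // Report the failure to the client through the RPC controller.
            ResponseConverter.setControllerException(controller, ioe);
        } finally {
            if (scanner != null) {
                try {
                    scanner.close();
                } catch (IOException ignored) {}
            }
        }
        done.run(response);
    }
}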

Step 4: Load the Coprocessor. See the ‘Loading of Coprocessor’ section below. I recommend the static approach for an Endpoint Coprocessor.
Step 5: Now write the client code to test it. In your main method, write the following code:

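// Imports needed at the top of the file:
// import java.io.IOException;
// import java.util.Map;
// import com.google.protobuf.ServiceException;
// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.hbase.HBaseConfiguration;
// import org.apache.hadoop.hbase.client.HConnection;
// import org.apache.hadoop.hbase.client.HConnectionManager;
// import org.apache.hadoop.hbase.client.HTableInterface;
// import org.apache.hadoop.hbase.client.coprocessor.Batch;
// import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;

Configuration conf = HBaseConfiguration.create();
HConnection connection = HConnectionManager.createConnection(conf);
HTableInterface table = connection.getTable("users");
final SumRequest request = SumRequest.newBuilder().setFamily("salaryDet").setColumn("gross").build();
try {
    // null start and end keys mean "call the endpoint on every region of the table".
    Map<byte[], Long> results = table.coprocessorService(SumService.class, null, null,
        new Batch.Call<SumService, Long>() {
            @Override
            public Long call(SumService aggregate) throws IOException {
                BlockingRpcCallback<SumResponse> rpcCallback = new BlockingRpcCallback<SumResponse>();
                aggregate.getSum(null, request, rpcCallback);
                SumResponse response = rpcCallback.get();
                return response.hasSum() ? response.getSum() : 0L;
            }
        });
    // One entry per region: each value is that region's partial sum.
    for (Long sum : results.values()) {
        System.out.println("Sum = " + sum);
    }
} catch (ServiceException e) {
    e.printStackTrace();
} catch (Throwable e) {
    e.printStackTrace();
}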
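
Note that the client code above prints one value per region, because ‘coprocessorService()’ returns a map with one entry for each region the endpoint ran on. If a single table-wide figure is wanted, the partial sums still have to be added on the client. A minimal sketch of that last step, in plain Java with hypothetical region names and values (the real map is keyed by ‘byte[]’ region names):

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of combining per-region partial results on the client.
class RegionSumSketch {

    /** Adds up the partial sums returned by each region; null entries are skipped. */
    static long total(Map<String, Long> partialSums) {
        long total = 0L;
        for (Long partial : partialSums.values()) {
            if (partial != null) {
                total += partial;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical per-region results, standing in for the Map<byte[], Long> above.
        Map<String, Long> results = new TreeMap<>();
        results.put("region-1", 12000L);
        results.put("region-2", 8500L);

        System.out.println("Total = " + total(results)); // Total = 20500
    }
}
```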

Loading of Coprocessor:

Broadly, a Coprocessor can be loaded in two ways. One is static (loading through configuration files) and the other is dynamic.

1. Dynamic loading: Dynamic loading means loading a Coprocessor without restarting HBase. It can be done in three ways:

A. Using Shell: You can load the Coprocessor using the HBase shell as follows:

  1. Disable the table so that you can load the Coprocessor:

hbase(main):001:0> disable 'users'

  2. Load the Coprocessor (i.e. the ‘coprocessor.jar’ that you copied to HDFS) using the following command:

hbase(main):002:0> alter 'users', METHOD => 'table_att', 'coprocessor' => 'hdfs://localhost/user/gbhardwaj/coprocessor.jar|org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|'

where ‘hdfs://localhost/user/gbhardwaj/coprocessor.jar’ is the full path of ‘coprocessor.jar’ in your HDFS and ‘org.myname.hbase.Coprocessor.RegionObserverExample’ is the full name of your class (including the package name).

  3. Enable the table:

hbase(main):003:0> enable 'users'

  4. Verify that the Coprocessor is loaded by typing the following command:

hbase(main):004:0> describe 'users'

You should see output like this:

DESCRIPTION                                                                    ENABLED
'users', {TABLE_ATTRIBUTES => {coprocessor$1 => 'hdfs://localhost/user/gbhardwaj/coprocessor.jar|org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|'}, {NAME => 'personalDet' …………………   true
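
The attribute value used in the shell command above is a single string of the form ‘path|class|priority|arguments’, where the last field may be empty. The priority 1073741823 equals ‘Integer.MAX_VALUE / 2’, which appears to be the value behind ‘Coprocessor.PRIORITY_USER’ used in the Java examples in this section. The following sketch builds and splits such a string in plain Java; ‘build’ and ‘parse’ are hypothetical helpers written for this illustration, not HBase methods.

```java
// Plain-Java illustration of the "path|class|priority|arguments" attribute value.
class CoprocessorSpecSketch {

    /** Builds the attribute value with an empty arguments field. */
    static String build(String jarPath, String className, int priority) {
        return jarPath + "|" + className + "|" + priority + "|";
    }

    /** Splits the attribute value into its fields and trims surrounding whitespace. */
    static String[] parse(String spec) {
        // A limit of -1 keeps the trailing empty arguments field.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected path|class|priority[|arguments]: " + spec);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        int userPriority = Integer.MAX_VALUE / 2; // 1073741823
        String spec = build(
            "hdfs://localhost/user/gbhardwaj/coprocessor.jar",
            "org.myname.hbase.Coprocessor.RegionObserverExample",
            userPriority);
        System.out.println(spec);

        String[] fields = parse(spec);
        System.out.println("jar      = " + fields[0]);
        System.out.println("class    = " + fields[1]);
        System.out.println("priority = " + Integer.parseInt(fields[2]));
    }
}
```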

B. Using the setValue() method of HTableDescriptor: This is done entirely in Java as follows:

String tableName = "users";
String path = "hdfs://localhost/user/gbhardwaj/coprocessor.jar";
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
columnFamily1.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily1);
HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
columnFamily2.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily2);
hTableDescriptor.setValue("COPROCESSOR$1", path +
                  "|" + RegionObserverExample.class.getCanonicalName() +
                  "|" + Coprocessor.PRIORITY_USER);
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);

C. Using the addCoprocessor() method of HTableDescriptor: This method is available from version 0.96 onwards. Personally, this is the approach I prefer:

String tableName = "users";
String path = "hdfs://localhost/user/gbhardwaj/coprocessor.jar";
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
columnFamily1.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily1);
HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
columnFamily2.setMaxVersions(3);
hTableDescriptor.addFamily(columnFamily2);
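// addCoprocessor() expects an org.apache.hadoop.fs.Path for the jar location, not a String.
hTableDescriptor.addCoprocessor(RegionObserverExample.class.getCanonicalName(), new org.apache.hadoop.fs.Path(path), Coprocessor.PRIORITY_USER, null);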
admin.modifyTable(tableName, hTableDescriptor);
admin.enableTable(tableName);

2. Static Loading: Static loading means that your Coprocessor takes effect only after HBase is restarted. The reason is that you configure it in ‘hbase-site.xml’, so HBase has to be restarted for the change to take effect.

Create the following entry in the ‘hbase-site.xml’ file located in the ‘conf’ directory:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.myname.hbase.Coprocessor.endpoint.SumEndPoint</value>
</property>

Make your code available to HBase. I used the following simple steps. First, export your endpoint (SumEndPoint.java), service (Sum.java) and other relevant protocol buffer classes (i.e. all classes found under the ‘/ java/src/main/java/com/google/protobuf’ directory) into a jar file. Second, put this jar in the ‘lib’ folder of HBase. Finally, restart HBase.

You can also load both Observer and Endpoint Coprocessors through the table descriptor (i.e. dynamically), using the following method of HTableDescriptor:

addCoprocessor(String className, org.apache.hadoop.fs.Path jarFilePath, int priority, Map<String,String> kvs) throws IOException

In my case, the above method worked fine for the Observer Coprocessor but didn’t work for the Endpoint Coprocessor: the table became unavailable and I finally had to restart HBase. The same Endpoint Coprocessor worked fine when loaded statically. Use the above method for Endpoint Coprocessors with caution.

Original article:

https://www.3pillarglobal.com/insights/hbase-coprocessors
