Building a Secondary Index by Integrating HBase Coprocessors with Elasticsearch
I. Introduction
HBase provides two kinds of coprocessors: Observers and Endpoints.
1.RegionObserver:
e.g. when a client issues a Get, the preGet hook can be used to enforce access control (a sketch follows the hook list below).
Main hook methods:
preOpen, postOpen: Called before and after the region is reported as online to the master.
preFlush, postFlush: Called before and after the memstore is flushed into a new store file.
preGet, postGet: Called before and after a client makes a Get request.
preExists, postExists: Called before and after the client tests for existence using a Get.
prePut and postPut: Called before and after the client stores a value.
preDelete and postDelete: Called before and after the client deletes a value.
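For instance, a minimal access-control sketch, assuming the HBase 1.x API (where the Get hook is named preGetOp); the "forbidden_" row-key prefix is a hypothetical rule:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class AccessControlObserver extends BaseRegionObserver {
    private static final byte[] FORBIDDEN_PREFIX = Bytes.toBytes("forbidden_");

    @Override
    public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Get get, List<Cell> results) throws IOException {
        // Hypothetical rule: rows with the reserved prefix may not be read.
        if (Bytes.startsWith(get.getRow(), FORBIDDEN_PREFIX)) {
            ctx.bypass(); // skip default processing; the client sees an empty result
        }
    }
}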
2.WALObserver
Provides hooks around writing WALEdits to the WAL and flushing the WAL file; there is a single WAL context per RegionServer. A minimal sketch follows the hook description below.
preWALWrite/postWALWrite: Called before and after a WALEdit is written to the WAL.
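A minimal sketch, assuming the HBase 1.x API (WAL hook signatures changed across versions, so treat this as illustrative):

import java.io.IOException;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.coprocessor.BaseWALObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.WALCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.wal.WALKey;

public class WALAuditObserver extends BaseWALObserver {
    @Override
    public boolean preWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
                               HRegionInfo info, WALKey logKey, WALEdit logEdit)
            throws IOException {
        // Inspect each edit before it reaches the WAL (e.g. for auditing).
        // Returning false lets default WAL processing continue.
        return false;
    }
}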
3.MasterObserver:
Provides hooks for DDL-style operations such as create, delete, and modify table. For example, when a client deletes a table, the observer can check whether the caller has permission to do so (see the sketch after the hook list below). MasterObserver code runs in the Master process.
preCreateTable/postCreateTable: Called before and after a table is created.
preDeleteTable/postDeleteTable: Called before and after a table is deleted.
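As an illustration of the permission check mentioned above, a minimal sketch assuming the HBase 1.x API; the "prod_" protection rule is hypothetical:

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;

public class TableGuardObserver extends BaseMasterObserver {
    @Override
    public void preDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
                               TableName tableName) throws IOException {
        // Hypothetical rule: tables whose names start with "prod_" may not be dropped.
        if (tableName.getNameAsString().startsWith("prod_")) {
            throw new IOException("Table " + tableName + " is protected from deletion");
        }
    }
}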
4.Endpoint Coprocessor:
Endpoint processors allow you to perform computation at the location of the data. An example is the need to calculate a running average or summation for an entire table which spans hundreds of regions.
In contrast to observer coprocessors, where your code is run transparently, endpoint coprocessors must be explicitly invoked using the CoprocessorService() method available in Table or HTable.
An Endpoint coprocessor must be paired with client-side code that gathers and merges the partial results over RPC, whereas an observer coprocessor runs only on the server side and is triggered transparently around specific operations.
Starting with HBase 0.96, endpoint coprocessors are implemented using Google Protocol Buffers (protobuf). For more details on protobuf, see Google's Protocol Buffer Guide. Endpoint coprocessors written for version 0.94 are not compatible with version 0.96 or later (see HBASE-5448). To upgrade your HBase cluster from 0.94 or earlier to 0.96 or later, you need to reimplement your coprocessor. In short, the coprocessor API changed between 0.94 and 0.96 because 0.96 adopted protobuf.
Exercise: given 1 billion rows, how would you compute the top 10,000? (A typical answer: compute a per-region top-K with an endpoint coprocessor and merge on the client, as sketched below.)
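The merge step of that answer is plain Java and independent of the protobuf/RPC plumbing, so it can be sketched on its own. A minimal sketch; the class and method names are illustrative, and per-region results are modeled as lists of longs:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopKMerge {
    // Merge per-region top-K lists into a global top-K using a min-heap:
    // the heap always holds the K largest values seen so far.
    public static List<Long> mergeTopK(List<List<Long>> perRegionTopK, int k) {
        PriorityQueue<Long> minHeap = new PriorityQueue<>(k);
        for (List<Long> regionResult : perRegionTopK) {
            for (long value : regionResult) {
                if (minHeap.size() < k) {
                    minHeap.offer(value);
                } else if (value > minHeap.peek()) {
                    minHeap.poll();       // evict the current smallest
                    minHeap.offer(value);
                }
            }
        }
        List<Long> result = new ArrayList<>(minHeap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}

Each region only ships its local top-K to the client, so the client merges at most (number of regions) x K values instead of scanning 1 billion rows.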
II. RegionObserver implementation
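Deployment note: an observer like this is packaged as a jar and attached to a table from the HBase shell, with the Elasticsearch settings passed as key=value pairs in the coprocessor spec string. A hypothetical example (the jar path, table name, and ES values are placeholders); these are the parameters that readConfiguration() below picks up from the environment's Configuration:

alter 'test_table', METHOD => 'table_att', 'coprocessor' => 'hdfs:///hbase/cp/observer.jar|myAPI3.DataSyncObserver|1001|es_cluster=my-cluster,es_host=localhost,es_port=9300,es_index=hbase_index,es_type=doc'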
package myAPI3;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;
import org.elasticsearch.client.Client;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
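// Observer that mirrors HBase mutations into Elasticsearch.
// Note: Config is a simple static holder for the ES connection settings and
// is defined elsewhere in this project (not shown in this section).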
public class DataSyncObserver extends BaseRegionObserver {
private static Client client = null;
private static final Log LOG = LogFactory.getLog(DataSyncObserver.class);
/**
 * Read the ES connection parameters passed in from the HBase shell.
 * These key=value pairs come from the coprocessor spec string used when
 * the observer is attached to a table, and surface through the
 * environment's Configuration.
 *
 * @param env the coprocessor environment
 */
private void readConfiguration(CoprocessorEnvironment env) {
Configuration conf = env.getConfiguration();
Config.clusterName = conf.get("es_cluster");
Config.nodeHost = conf.get("es_host");
Config.nodePort = conf.getInt("es_port", -1);
Config.indexName = conf.get("es_index");
Config.typeName = conf.get("es_type");
//LOG.info("observer -- started with config: " + Config.getInfo());
}
@Override
public void start(CoprocessorEnvironment env) throws IOException {
LOG.info("-----------------------------------starting-------------------------------------------------------------------------------------"