HBase Coprocessors, Secondary Indexes, and Phoenix

四月天03

Category: HBase

Article link: https://blog.csdn.net/qq_22473611/article/details/88101338

I. Coprocessors

Official documentation

http://hbase.apache.org/book.html#cp

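HBase Coprocessors: Introduction and Practice

Why coprocessors were introduced

As a column-family database, HBase is often criticized because secondary indexes are hard to build and operations such as sum, count, and sort are hard to run.

Before version 0.92, HBase already integrated MapReduce at the data storage layer, which works well for distributed computation over tables. In many cases, however, simple additions or aggregations run much faster when the computation is placed directly on the server side, because this cuts communication overhead. HBase therefore introduced coprocessors in 0.92, which added new capabilities such as easy secondary indexing and access control.

Types of coprocessors

Observer

An Observer is similar to an RDBMS trigger: it intercepts data operations before or after they run. Secondary indexes can be created and maintained through the RegionObserver interface.

Observers come in three kinds:

RegionObserver: provides hooks for data manipulation events;

WALObserver: provides hooks for WAL (write-ahead log) events;

MasterObserver: provides hooks for DDL operations.

RegionObserver provides hooks for data-level operations such as Put, Get, Delete, and Scan. Before and after each operation, the matching pre-hook (pre + operation, e.g. preGet) or post-hook (post + operation, e.g. postGet) is called.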

Endpoint

https://blog.csdn.net/just_young/article/details/50066897

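An Endpoint is similar to a stored procedure in an RDBMS and is driven by the client: the client calls the Endpoint to run a piece of server-side code, and the result is returned to the client for further processing. Aggregation is the typical case. To find the maximum value of a field in a large table without a coprocessor, the client has to scan the whole table and iterate over all results itself, using only its own limited resources and none of HBase's parallel computing capacity. With a coprocessor, the client has each RegionServer compute the maximum for each of its Regions, receives those per-Region results, and takes the maximum of them as the final answer. Pushing the computation down to the server and leaving only the final merge to the client makes such statistics far more efficient. Counting the rows of a table is another very common requirement and follows the same logic; see Coprocessor Introduction for details, as it is not expanded on here.

In short, an Endpoint resembles a stored procedure: each RegionServer performs part of the computation and the results are gathered on the client for final processing, somewhat like a Map/Reduce job.

Endpoints are mainly used for statistics such as COUNT, SUM, and GROUP BY. For simple sorting and for sum, count, and similar operations they can improve performance considerably.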
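The two-stage maximum described above can be sketched in plain Java without any HBase dependency. This is only an illustration of the idea, not the HBase Endpoint API: the class EndpointMaxSketch and the methods regionMax and clientMax are hypothetical names, and each long[] stands in for the values held by one Region.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

public class EndpointMaxSketch {
    // Hypothetical stand-in for the work an endpoint does inside one region:
    // look only at the local values and return a single partial result.
    static OptionalLong regionMax(long[] regionValues) {
        return Arrays.stream(regionValues).max();
    }

    // Client side: merge the per-region partial results into the final answer.
    // Empty regions contribute nothing.
    static OptionalLong clientMax(List<long[]> regions) {
        return regions.stream()
                .map(EndpointMaxSketch::regionMax)
                .filter(OptionalLong::isPresent)
                .mapToLong(OptionalLong::getAsLong)
                .max();
    }

    public static void main(String[] args) {
        // Three simulated regions, one of them empty.
        List<long[]> regions = Arrays.asList(
                new long[] {3, 17, 8},
                new long[] {},
                new long[] {42, 5});
        System.out.println(clientMax(regions).getAsLong());
    }
}
```

Only one value per Region crosses the simulated network boundary, which is where the saving over a client-side full scan comes from.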

Phoenix

Phoenix is in fact implemented with a large number of coprocessors.

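II. Using Coprocessors

Loading methods

Coprocessors can be loaded in two ways: static loading (Static Load) and dynamic loading (Dynamic Load). A statically loaded coprocessor is called a System Coprocessor; a dynamically loaded one is called a Table Coprocessor.

Static loading

Edit hbase-site.xml and restart HBase. This takes effect for all tables.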

cd /export/servers/hbase-1.2.0-cdh5.14.0/conf
vim hbase-site.xml

<property>
	<name>hbase.coprocessor.user.region.classes</name>
	<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

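This class ships with HBase. For a custom Endpoint, the packaged jar must be copied to $HBASE_HOME/lib/ on every node.

Dynamic loading

Dynamic loading takes effect only for the specified table.
First disable the table: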

disable 'mytable'

Then add the aggregation coprocessor:

alter 'mytable', METHOD => 'table_att','coprocessor'=>
'|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'

Re-enable the table:

enable 'mytable'

Unloading a coprocessor

Disable the table:

disable 'test'

Unload the coprocessor:

alter 'test',METHOD => 'table_att_unset',NAME => 'coprocessor$1'

Enable the table:

enable 'test'

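I. Observer Coprocessor in Practice

Requirement

Use an Observer coprocessor so that when data is inserted into one HBase table, a copy is saved to a second table, but only some of the columns from the first table are copied.

Steps

1. Create the first table proc1 and the second table proc2 in HBase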

create 'proc1','info'
create 'proc2','info'

2. Develop the HBase coprocessor

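import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class ObserverProcessor extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit, Durability durability) throws IOException {
        // Only info:name is copied; skip Puts that do not carry it
        if (!put.has("info".getBytes(), "name".getBytes())) {
            return;
        }
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "node01,node02,node03");
        Cell nameCell = put.get("info".getBytes(), "name".getBytes()).get(0);
        Put put1 = new Put(put.getRow());
        put1.add(nameCell);
        // try-with-resources closes the table and the connection
        try (Connection connection = ConnectionFactory.createConnection(configuration);
             Table reverseuser = connection.getTable(TableName.valueOf("proc2"))) {
            reverseuser.put(put1);
        }
    }
}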

3. Package the Java code as a jar and upload it to HDFS

First upload the jar to /export/servers on the Linux host, then run the following commands:

mv original-day12_HBaseANDMapReduce-1.0-SNAPSHOT.jar  processor.jar
hdfs dfs -mkdir -p /processor
hdfs dfs -put processor.jar /processor

4. Attach the jar to table proc1

Run the following commands in the HBase shell:

describe 'proc1'

alter 'proc1',METHOD => 'table_att','Coprocessor'=>'hdfs://node01:8020/processor/processor.jar|cn.itcast.mr.demo7.ObserverProcessor|1001|'

describe 'proc1'

5. Add data to table proc1 with the Java API

@Test
public void testPut() throws Exception {
    // Open a connection
    Configuration configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.quorum", "node01,node02");
    // try-with-resources closes the table and the connection
    try (Connection connection = ConnectionFactory.createConnection(configuration);
         Table user5 = connection.getTable(TableName.valueOf("proc1"))) {
        Put put1 = new Put(Bytes.toBytes("hello_world"));
        put1.addColumn(Bytes.toBytes("info"), "name".getBytes(), "helloworld".getBytes());
        put1.addColumn(Bytes.toBytes("info"), "gender".getBytes(), "abc".getBytes());
        put1.addColumn(Bytes.toBytes("info"), "nationality".getBytes(), "test".getBytes());
        user5.put(put1);
        byte[] row = put1.getRow();
        System.out.println(Bytes.toString(row));
    }
}

6. Check the data in tables proc1 and proc2

7. To unload the coprocessor

disable 'proc1'
alter 'proc1',METHOD=>'table_att_unset',NAME=>'coprocessor$1'
enable 'proc1'

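Notes:

1. A coprocessor configured in hbase-site.xml applies to all tables by default.

2. A coprocessor that is not configured in the XML file but attached to a specific table applies only to that table.

3. Compile and package the consumer code, upload the jar to HBase's lib directory, and be sure to distribute the jar to every node.

4. Use coprocessors with great care. If a problem occurs, configure it in hbase-site.xml.

Example 1: this example uses a RegionObserver to write index data to another table before the main table is written: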


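package com.dengchuanhua.testhbase;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class TestCoprocessor extends BaseRegionObserver {
    @Override
    public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e,
                       final Put put, final WALEdit edit, final Durability durability) throws IOException {
        // Set configuration; add the conf.set(...) calls the cluster needs
        Configuration conf = new Configuration();
        HTable table = new HTable(conf, "indexTableName");
        try {
            List<Cell> kv = put.get("familyName".getBytes(), "columnName".getBytes());
            for (Cell tmp : kv) {
                // Index row key = indexed value; stored value = main-table row key
                Put indexPut = new Put(CellUtil.cloneValue(tmp));
                indexPut.add("familyName".getBytes(), "columnName".getBytes(), CellUtil.cloneRow(tmp));
                table.put(indexPut);
            }
        } finally {
            table.close();
        }
    }
}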

Once written, the class has to be loaded onto the table. Package it as test.jar, upload it to the /demo directory on HDFS, and then run:

1. disable 'testTable'

2. alter 'testTable', METHOD=>'table_att', 'coprocessor'=>'hdfs:///demo/test.jar|com.dengchuanhua.testhbase.TestCoprocessor|1001'

3. enable 'testTable'

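After that, inserting data into testTable automatically writes data to indexTableName.

The client issues a put request;
The request is dispatched to the appropriate RegionServer and Region;
CoprocessorHost intercepts the request and calls prePut() on every RegionObserver registered on the table;
If prePut() does not intercept it, the request continues to the region and is processed;
The result produced by the Region is intercepted again by CoprocessorHost, which calls postPut();
If postPut() does not intercept the response, the final result is returned to the client.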
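The request flow above can be mimicked with a small, self-contained Java sketch. It is a simplification under stated assumptions and not the HBase API: PutObserver, Region, and demo are hypothetical names, and a boolean return value from prePut stands in for the way a real observer would stop the operation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ObserverFlowSketch {
    // Hypothetical, simplified observer: prePut returns false to stop the put.
    interface PutObserver {
        boolean prePut(String row, String value);
        void postPut(String row, String value);
    }

    // Stand-in for a region together with its coprocessor host.
    static class Region {
        final Map<String, String> store = new HashMap<>();
        final List<PutObserver> observers = new ArrayList<>();

        boolean put(String row, String value) {
            for (PutObserver o : observers) {
                if (!o.prePut(row, value)) {
                    return false; // intercepted before reaching the store
                }
            }
            store.put(row, value); // the region processes the request
            for (PutObserver o : observers) {
                o.postPut(row, value);
            }
            return true;
        }
    }

    // Runs two puts through one observer and returns the hook call log.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Region region = new Region();
        region.observers.add(new PutObserver() {
            @Override
            public boolean prePut(String row, String value) {
                log.add("prePut " + row);
                return !value.isEmpty(); // this observer rejects empty values
            }

            @Override
            public void postPut(String row, String value) {
                log.add("postPut " + row);
            }
        });
        region.put("r1", "v1"); // passes both hooks
        region.put("r2", "");   // stopped by prePut
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The second put never reaches the store and never triggers postPut, which mirrors the "if prePut() does not intercept it" branch in the steps.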

Example 2:

 /**
   * Requirement 1
   * 1. Accumulate cf:countCol: on every insert, add the new value to the previous one.
   * Requires overriding prePut.
   */
  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit,
      Durability durability) throws IOException {
    if (put.has(columnFamily, countCol)) {
      // Read the old countCol value
      Result rs = e.getEnvironment().getRegion().get(new Get(put.getRow()));
      int oldNum = 0;
      for (Cell cell : rs.rawCells()) {
        if (CellUtil.matchingColumn(cell, columnFamily, countCol)) {
          oldNum = Integer.valueOf(Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      // Read the new countCol value
      List<Cell> cells = put.get(columnFamily, countCol);
      int newNum = 0;
      for (Cell cell : cells) {
        if (CellUtil.matchingColumn(cell, columnFamily, countCol)) {
          newNum = Integer.valueOf(Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      // Sum the two values and update the Put instance
      put.addColumn(columnFamily, countCol, Bytes.toBytes(String.valueOf(oldNum + newNum)));
    }
  }
  /**
   * Requirement 2
   * 2. unDeleteCol cannot be deleted directly; deleting countCol also deletes unDeleteCol.
   * Requires overriding preDelete.
   */
  @Override
  public void preDelete(ObserverContext<RegionCoprocessorEnvironment> e, Delete delete,
      WALEdit edit,
      Durability durability) throws IOException {
    // Check whether the delete touches the cf column family
    List<Cell> cells = delete.getFamilyCellMap().get(columnFamily);
    if (cells == null || cells.size() == 0) {
      return;
    }
    boolean deleteFlag = false;
    for (Cell cell : cells) {
      byte[] qualifier = CellUtil.cloneQualifier(cell);
      if (Arrays.equals(qualifier, unDeleteCol)) {
        throw new IOException("can not delete unDel column");
      }
      if (Arrays.equals(qualifier, countCol)) {
        deleteFlag = true;
      }
    }
    if (deleteFlag) {
      delete.addColumn(columnFamily, unDeleteCol);
    }
  }

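II. Endpoint Coprocessors

An endpoint is like a stored procedure in an RDBMS. The row key determines which region handles the request. Through endpoints, coprocessors offer dynamic invocation that moves computation to the server (for example, when computing over several columns, the worst case otherwise is to transfer all the columns to the client).

HBase Coprocessors: Introduction and Practice

Abstract

Source: https://blog.csdn.net/alphags/article/details/53786777

This article shows how sound HBase row key (rowkey) design enables fast multi-condition queries. Every column used in queries is processed and stored in the rowkey, and queries go through the rowkey, which makes better use of the rowkey and speeds up lookups. The rowkey is not a plain concatenation of the queried column values; instead each column's data is converted to integer (int) data before being stored.

Two custom comparators are then implemented:

One is an equality comparator, which provides multi-condition exact matching similar to SQL:
select * from table where col1='a' and col2='b'

The other is a range comparator, which provides range lookups similar to the SQL statement
select * from table where col3 > '10' and col4<'100'
When the two comparators are used together with HBase filters, they support multi-condition queries like the following SQL statement:
select * from table where col1='a' and col2='b' and col3 > '10' and col4<'100'
The source code for the article is at https://github.com/alphg/hbase.rowkeycomparator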
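To make the idea of int-encoded rowkey fields concrete, here is a minimal sketch in plain Java. It is not the code from the linked repository: the class IntRowKeySketch, its methods, the fixed 4-byte big-endian layout, and the field order (sdate, code) are all assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;

public class IntRowKeySketch {
    // Pack each column value as a 4-byte big-endian int, in a fixed field order.
    static byte[] encode(int... fields) {
        ByteBuffer buf = ByteBuffer.allocate(4 * fields.length);
        for (int f : fields) {
            buf.putInt(f);
        }
        return buf.array();
    }

    // Read back the field stored at the given position in the rowkey.
    static int field(byte[] rowKey, int index) {
        return ByteBuffer.wrap(rowKey).getInt(4 * index);
    }

    // Equality check on one field, like "col = expected".
    static boolean fieldEquals(byte[] rowKey, int index, int expected) {
        return field(rowKey, index) == expected;
    }

    // Exclusive range check on one field, like "col > low and col < high".
    static boolean fieldInRange(byte[] rowKey, int index, int low, int high) {
        int v = field(rowKey, index);
        return v > low && v < high;
    }

    public static void main(String[] args) {
        // Hypothetical field order: position 0 = sdate, position 1 = code.
        byte[] key = encode(20161209, 404);
        boolean match = fieldEquals(key, 1, 404)
                && fieldInRange(key, 0, 20161208, 20161210);
        System.out.println(match);
    }
}
```

Because every field has a fixed width, a comparator can jump straight to a field's offset instead of parsing the whole key, which is the property the equality and range comparators rely on.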

The data below describes web page connectivity; each line of JSON holds the connectivity scan information for one URL.

{ "_id" : { "$oid" : "584a6e030cf29ba18da2fcd5"} , "url" : "http://www.nmlc.gov.cn/zsyz.htm" , "md5url" : "ea67a96f233d6fcfd7cabc9a6a389283" , "status" : -1 , "code" : 404 , "stime" : 1481272834722 , "sdate" : 20161209 , "sitecode" : "1509250008" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272835222} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 1 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf224463e76c162"} , "url" : "http://www.xzxzzx.gov.cn:8000/wbsprj/indexlogin.do" , "md5url" : "fd38c0fb8f6e839be56b67c69ad2baa5" , "status" : -1 , "code" : 503 , "stime" : 1481272828174 , "sdate" : 20161209 , "sitecode" : "3203000002" , "ip" : "10.117.8.89" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834887} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf27d1a31f617e0"} , "url" : "http://www.nmds.gov.cn/portal/bsfw/nsfd/list_1.shtml" , "md5url" : "d51abcd8edff79d23ca4a9a0576a1996" , "status" : -1 , "code" : 404 , "stime" : 1481272822971 , "sdate" : 20161209 , "sitecode" : "15BM010001" , "ip" : "10.162.86.176" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834846} , "free" : 0 , "close" : 0 , "queue" : 0 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf29ba18da2fcd4"} , "url" : "http://beijing.customs.gov.cn/publish/portal159/tab60561/" , "md5url" : "e27bbc9192e760bacc23c226ffd90219" , "status" : -1 , "code" : 503 , "stime" : 1481272832559 , "sdate" : 20161209 , "sitecode" : "bm28020001" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834766} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf29ba18da2fcd3"} , "url" : "http://www.nss184.com/web2/newlist_index.aspx?classid=1" , "md5url" : "cbc2c0571464621024c89aa019cd09ef" , "status" : -1 , "code" : 404 , "stime" : 1481272826788 , "sdate" : 20161210 , "sitecode" : "BT10000001" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834732} , "free" : 0 , "close" : 1 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf2847bb13af52c"} , "url" : "http://cgw.bjdch.gov.cn/n1569/n4860273/n9719314/index.html" , "md5url" : "00a18048ed95f1c057fccc8928ddf610" , "status" : -1 , "code" : 503 , "stime" : 1481272803601 , "sdate" : 20161208 , "sitecode" : "1101010059" , "ip" : "10.117.187.7" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834150} , "free" : 1 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e020cf29ba18da2fcd2"} , "url" : "http://www.qdn.gov.cn/zwdt/ztfw/shbzfw.htm" , "md5url" : "e6bfa0a07e773e3bab27a37f36ff221a" , "status" : -1 , "code" : 404 , "stime" : 1481272833479 , "sdate" : 20161209 , "sitecode" : "5226000038" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272834046} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e010cf29ba18da2fcd1"} , "url" : "http://www.caac.gov.cn/E1/E2/" , "md5url" : "e6217482388cbc57aa80422c3f64bb35" , "status" : -1 , "code" : 404 , "stime" : 1481272833297 , "sdate" : 20161209 , "sitecode" : "bm70000001" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272833723} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e010cf22c906fb6f846"} , "url" : "http://www.ny.xwie.com/Thought/" , "md5url" : "b7912f3bdb50be7b58f5a67d65273201" , "status" : -1 , "code" : 404 , "stime" : 1481272821713 , "sdate" : 20161209 , "sitecode" : "4408250003" , "ip" : "10.168.156.196" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272833498} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}
{ "_id" : { "$oid" : "584a6e010cf29ba18da2fcd0"} , "url" : "http://www.guoluo.gov.cn/html/1746/List.html" , "md5url" : "e353cd577fd721eb71538d0938d041f7" , "status" : -1 , "code" : 404 , "stime" : 1481272832723 , "sdate" : 20161209 , "sitecode" : "6326000004" , "ip" : "10.168.106.153" , "port" : 5200 , "type" : 2 , "intime" : { "$date" : 1481272833472} , "free" : 0 , "close" : 0 , "queue" : 1 , "scantype" : 0 , "scanmemo" : ""}

md5url: MD5 of the URL
status: scan status
code: HTTP response code
sdate: scan date
sitecode: owning site
type: scan type
free: whether a fee is charged
close: whether it is closed
queue: waiting queue
scantype: scan type

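A query may use one or more of these attributes. For example,
to find the URLs that could not be reached on a given day (response code 404):
select * from table where sdate='somedate' and code='404'
or to find the data for one URL over a range of past days:
select * from table where sdate<'enddate' and sdate>'startdate' and md5url='somemd5url'
These are only a few simple examples; production systems have many more kinds of queries, so the design of the HBase table structure becomes the key to the problem.

The data is stored in HBase, and HBase happens to be weak at exactly this kind of query: it is not good at queries with complex business logic, let alone fuzzy queries.

HBase queries generally go through the RowKey (concatenating every field of a multi-condition query into the RowKey is clearly impractical),

or through a full table scan combined with filters to pick out the target data (too inefficient).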

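A possible rowkey design rule is

reversed id + date + title
The reversed id prevents hotspotting, and the date and title are part of the rowkey so that RowFilter can later be used for fuzzy queries; HBase operates on the rowkey much faster than on columns.
Paging can be implemented with PageFilter.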
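A minimal sketch of building such a rowkey in plain Java follows. The class RowKeySketch, the method rowKey, and the "_" separator are assumptions for illustration; the text above only fixes the order of the three parts.

```java
public class RowKeySketch {
    // Build "reversed id + date + title". Reversing the decimal digits of a
    // sequential id spreads consecutive ids across different key prefixes.
    static String rowKey(long id, String date, String title) {
        String reversedId = new StringBuilder(Long.toString(id)).reverse().toString();
        // The "_" separator is an assumption made for readability.
        return reversedId + "_" + date + "_" + title;
    }

    public static void main(String[] args) {
        // Consecutive ids 10023 and 10024 get different leading characters.
        System.out.println(rowKey(10023L, "20161209", "news"));
        System.out.println(rowKey(10024L, "20161209", "news"));
    }
}
```

With ids 10023 and 10024 the keys start with 3 and 4 respectively, so writes for neighbouring ids no longer land next to each other in key order.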

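I. Solving the problem with HBase secondary indexes

Common secondary index approaches
1. MapReduce
2. ITHBASE (Indexed-Transactional HBase)
3. IHBASE (Index HBase)
4. HBase Coprocessor
5. Solr (ES) + HBase

Since 0.92 HBase has had coprocessors, which provide a set of hooks that make access control and secondary indexes easy to implement.

Types of HBase secondary indexes

2.1 Single-column index

2.2 Several single-column indexes created at once

2.3 Composite index (up to 3 columns at a time)

2.4 Index built on the rowkey only

The primary index in HBase is the rowkey, and lookups are only possible by rowkey. To run combined queries over the columns of a column family, an HBase secondary index scheme is needed for the multi-condition queries.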

Original table:
row  1      f1:name  zhangsan
row  2      f1:name  lisi
row  3      f1:name  wangwu

Index table:
row     zhangsan    f1:id   1
row     lisi        f1:id   2
row     wangwu      f1:id   3
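The relationship between the two tables can be sketched with two in-memory maps. This is a toy model and not HBase code: IndexTableSketch and its methods are hypothetical, names are assumed to be unique, and each map stands in for one table.

```java
import java.util.Map;
import java.util.TreeMap;

public class IndexTableSketch {
    // Main table: row key -> f1:name
    final Map<String, String> mainTable = new TreeMap<>();
    // Index table: name (used as the index row key) -> f1:id (main-table row key)
    final Map<String, String> indexTable = new TreeMap<>();

    // Every write to the main table also maintains the index entry.
    void put(String rowKey, String name) {
        String oldName = mainTable.put(rowKey, name);
        if (oldName != null) {
            indexTable.remove(oldName); // drop the stale index entry
        }
        indexTable.put(name, rowKey);
    }

    // Query by name: a point lookup in the index instead of a full scan.
    String findRowKeyByName(String name) {
        return indexTable.get(name);
    }

    public static void main(String[] args) {
        IndexTableSketch t = new IndexTableSketch();
        t.put("1", "zhangsan");
        t.put("2", "lisi");
        t.put("3", "wangwu");
        System.out.println(t.findRowKeyByName("lisi"));
    }
}
```

A lookup by name becomes one read of the index followed by one read of the main table by row key, at the cost of an extra write on every put.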

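Option 1: Coprocessor-based approach:

Based on coprocessors (introduced in version 0.92 to support behavior similar to triggers in a traditional RDBMS),
custom data processing logic is developed with a dual-write strategy: whenever data is written, it is synchronously written to the secondary index table as well.

Advantages: from a design point of view, many details of managing the secondary index are encapsulated in the concrete Coprocessor implementation class. Readers and writers outside do not see them, which simplifies data access.

Disadvantages: the coprocessor approach is fairly intrusive. It adds code inside the RegionServer that must run and maintain the secondary index mapping tables, which has some impact on RegionServer performance.

Source: https://blog.csdn.net/whdxjbw/article/details/81146440 (untested; it seems similar to Phoenix)

Only the table structure matters here. The structure of hyper_table is as follows: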


Field	rowkey	num	country	rd
Type	string	int	int	string

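Creating a secondary index (global index)
There are two ways to create an index. One is to use SQL through the Inceptor distributed SQL engine, which talks to HBase.

The other is to create the secondary index directly in the HBase shell. Both require the index name, the name of the table being indexed, and the indexed columns.

Creating a secondary index with SQL in Inceptor:

-- Create a secondary index (global index) on the num column of hyper_table;
Create an index named index_num on the num field of hyper_table.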

create global index index_num on hyper_table(num);

Create an index named my_index on the name field of the company table.

create index my_index on company(name);

Creating a secondary index in the HBase shell:

add_index 'hyper_table','index_num','COMBINE_INDEX|INDEXED=f:q2:8|rowKey:rowKey:9'

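Once created, the index exists in HBase as a table named "<table name>_<index name>".

Rebuilding the secondary index

The rebuild command generates the index: it inserts index data into the index table created in the previous step.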

rebuild_global_index '{$yourDBName}:hyper_table', 'index_num';

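Generating the secondary index uses the index information created earlier and the table's region information.

Under the hood it runs as a MapReduce job.

By default Inceptor runs in Cluster Mode and does not use indexes for queries, because access to index files can be slow. In Cluster Mode, Inceptor queries through an index only when one is requested explicitly (/*+USE_INDEX(e USING index_num)*/ *).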

Testing secondary index performance

1. Exact-match query performance:

Without the index:

select * from hyper_table where num=503;

With the index:

select /*+USE_INDEX(e USING index_num)*/ * from hyper_table e where num =503;

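With the secondary index on the num column, exact-match query performance improves noticeably. Without the index, HBase traverses the whole table from the first record; with the index, the matching num value is found directly through the index table.

If Inceptor is switched to Local Mode, it automatically picks a suitable index for the query. To switch:

# Switch Inceptor to Local Mode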
set ngmr.exec.mode=local;
set hyperbase.integer.transform=true;


To switch back to Cluster Mode:

# Switch Inceptor to Cluster Mode
set ngmr.exec.mode=cluster;

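Option 2: Phoenix

Apache Phoenix centers on SQL on HBase and supports several HBase versions; secondary indexing is only one of its features. Secondary indexes are created and managed directly through SQL syntax, which makes them simple to use, and the project's community activity and release cadence are both good.

Among the current open-source options, Apache Phoenix is one of the better choices. It focuses on SQL on HBase, supports HBase CRUD operations through SQL, and supports the JDBC protocol.

3. Characteristics of Phoenix secondary indexes:

Covered indexes: the data columns of interest are also stored in the index table, so a query can be answered from the index table alone. The index must therefore contain every column the query needs (both the SELECT columns and the WHERE columns).
Functional indexes: an index is not limited to columns; it can be built on an arbitrary expression.
Global indexes: suited to read-heavy, write-light workloads. Because a global index table is maintained, every update and write also updates the index, which hurts write performance. On reads, Phoenix SQL runs a fast query based on the indexed column.
Local indexes: suited to write-heavy, read-light workloads. On writes, index data and table data are stored locally together. On reads, the region holding the index data cannot be determined in advance, so every region has to be checked, which adds some performance (network) overhead.

Many other articles online describe home-grown coprocessor-based secondary indexes. They broadly follow the same idea: build an "index" mapping and store it in another HBase table or another database.

Option 3: Solr and Elasticsearch

A common choice is Elasticsearch (ES) or Apache Solr, both built on Apache Lucene, to provide powerful indexing and search capabilities such as fuzzy queries, full-text search, combined queries, and sorting.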
