Phoenix 二级索引

1. 什么是 Phoenix 的二级索引

  • 二级索引是从主键访问数据的正交方式。Hbase中有一个按照字典排序的主键Rowkey作为单一的索引。不按照Rowkey去读取记录都要遍历整张表,然后按照你指定的过滤条件过滤。通过二级索引,索引的列或表达式形成一个备用行键,以允许沿着这个新轴进行点查找和范围扫描。

2. Phoenix 中的盐表

Phoenix 中的盐表

  • 其本质就是类似于 HBase RowKey 盐表一样,Phoenix 通过指定主键以及 salt_buckets ,会把主键前面加上一个 salt_buckets 以内的随机数,来使 数据分散插入不同的 Region 中。
  • 优点:
    • 由于本质是随机数,有助于数据均匀的落在每个 Region 中,从而提高写的性能。
  • 缺点:
    • 数据查询的顺序和插入的顺序不一致。
    • 当数据量比较小的时候,使用盐表插入,查询性能特别低。
  • 建议:
    • salt_buckets 数量和 Region Server 数量一致。
    • 如果一开始有三台 Region Server 设置的 salt_buckets 为3,后面增加两台,Phoenix 无法修改已经创建表的 salt_buckets 的数量,需要重新建表 + 导数。
0: jdbc:phoenix:bigdata:2181> create table bigdata.salt 
. . . . . . . . . . . . . . > (id integer primary key, 
. . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . > age integer,
. . . . . . . . . . . . . . > address varchar) 
. . . . . . . . . . . . . . > salt_buckets = 20;
No rows affected (4.409 seconds)
  • 查看 HBase 中的表结构
hbase(main):002:0> describe 'BIGDATA:SALT'
Table BIGDATA:SALT is ENABLED

BIGDATA:SALT, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apac
he.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop
.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}

COLUMN FAMILIES DESCRIPTION

{NAME => '0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '
65536', REPLICATION_SCOPE => '0'}

1 row(s) in 0.0760 seconds
  • 查看 HBase Master WEB 页面
    • BIGDATA:SALT 表中 Region Count 为 20
  • 插入数据看下数据是否均匀分布
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.salt  values (1,'hadoop1',18,'北京');
1 row affected (0.036 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.salt  values (2,'hadoop2',19,'上海');
1 row affected (0.012 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.salt  values (3,'hadoop3',17,'杭州');
1 row affected (0.011 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.salt  values (4,'hadoop4',12,'深圳');
1 row affected (0.026 seconds)
  • 查看 WEB 发现数据均匀的分布在不同的 Region 上面

Salt-Region

3. Phoenix 二级索引

二级索引(官网)

3.1 全局索引
  • 特点:重写读取
Global Indexes
Global indexing targets read heavy uses cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write (DELETE, UPSERT VALUES and UPSERT SELECT), build the index update and then sent any necessary updates to all interested index tables. At read time, Phoenix will select the index table to use that will produce the fastest query time and directly scan it just like any other HBase table. An index will not be used for a query that references a column that isn’t part of the index.
  • 全局索引适合那些读多写少的场景。如果使用全局索引,读数据基本不损耗性能,所有的性能损耗都来源于写数据。数据表的添加、删除和修改都会更新相关的索引表(数据删除了,索引表中的数据也会删除;数据增加了,索引表的数据也会增加)。而查询数据的时候,Phoenix会通过索引表来快速低损耗的获取数据。默认情况下,如果你的查询语句中没有索引相关的列的时候,Phoenix不会使用索引。
  • 全局索引其本质是一个索引表,是把在创建索引时使用的索引字段,作为索引表的联合主键来创建一张表的,索引表存储在 HBase 中,索引表的联合主键作为 HBase 中存储的 RowKey ,由于 HBase 的底层存储的特性,所以索引的失效应该是由左往右,例如,如果索引的第一个字段不在本次查询的 Where 条件中,则会扫描整个索引表。
  • 优点:
    • 读取性能高,先通过查询索引表中的数据,找到对应查询表的主键,以此通过主键来查询到原表。
  • 缺点:
    • 数据在进行写入、更新、删除的时候都会操作索引表,以此增加这些操作的消耗。
  • 创建表以及插入数据
0: jdbc:phoenix:bigdata:2181> create table bigdata.global_index
. . . . . . . . . . . . . . > (id integer primary key, 
. . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . > age integer,
. . . . . . . . . . . . . . > address varchar) 
. . . . . . . . . . . . . . > salt_buckets = 20;
No rows affected (4.344 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.global_index  values (1,'hadoop1',18,'北京');
1 row affected (0.007 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.global_index  values (2,'hadoop2',19,'上海');
1 row affected (0.006 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.global_index  values (3,'hadoop3',17,'杭州');
1 row affected (0.006 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.global_index  values (4,'hadoop4',12,'深圳');
1 row affected (0.004 seconds)
  • 创建索引
0: jdbc:phoenix:bigdata:2181> CREATE  INDEX global_index_idx ON  bigdata.global_index(name,age);
4 rows affected (9.312 seconds)
  • 查看 HBase 中的 索引表
hbase(main):003:0> list
TABLE

BIGDATA:GLOBAL_INDEX

BIGDATA:GLOBAL_INDEX_IDX

BIGDATA:SALT

SYSTEM:CATALOG

SYSTEM:FUNCTION

SYSTEM:LOG

SYSTEM:MUTEX

SYSTEM:SEQUENCE

SYSTEM:STATS

9 row(s) in 0.0110 seconds

=> ["BIGDATA:GLOBAL_INDEX", "BIGDATA:GLOBAL_INDEX_IDX", "BIGDATA:SALT", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:LOG", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS"]
hbase(main):004:0> scan 'BIGDATA:GLOBAL_INDEX_IDX'
ROW                                                           COLUMN+CELL

 \x01hadoop2\x00\xC1\x14\x00\x80\x00\x00\x02                  column=0:\x00\x00\x00\x00, timestamp=1612843905504, value=x                                                                                                                        
 \x05hadoop3\x00\xC1\x12\x00\x80\x00\x00\x03                  column=0:\x00\x00\x00\x00, timestamp=1612843905505, value=x                                                                                                                        
 \x0Ahadoop1\x00\xC1\x13\x00\x80\x00\x00\x01                  column=0:\x00\x00\x00\x00, timestamp=1612843905511, value=x                                                                                                                        
 \x0Ehadoop4\x00\xC1\x0D\x00\x80\x00\x00\x04                  column=0:\x00\x00\x00\x00, timestamp=1612843905512, value=x                                                                                                                        
4 row(s) in 0.1330 seconds
  • 通过 Phoenix 查询索引表
0: jdbc:phoenix:bigdata:2181> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
| TABLE_CAT  | TABLE_SCHEM  |    TABLE_NAME     |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_ |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
|            | BIGDATA      | GLOBAL_INDEX_IDX  | INDEX         |          |            |                            |                 | ACTIVE       | false           | 20            | false         |                 |            | GLOBAL |
|            | SYSTEM       | CATALOG           | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | FUNCTION          | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | LOG               | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |        |
|            | SYSTEM       | SEQUENCE          | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | STATS             | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | BIGDATA      | GLOBAL_INDEX      | TABLE         |          |            |                            |                 |              | false           | 20            | false         |                 |            |        |
|            | BIGDATA      | SALT              | TABLE         |          |            |                            |                 |              | false           | 20            | false         |                 |            |        |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
0: jdbc:phoenix:bigdata:2181> select * from BIGDATA.GLOBAL_INDEX_IDX
. . . . . . . . . . . . . . > ;
+----------+--------+------+
|  0:NAME  | 0:AGE  | :ID  |
+----------+--------+------+
| hadoop2  | 19     | 2    |
| hadoop3  | 17     | 3    |
| hadoop1  | 18     | 1    |
| hadoop4  | 12     | 4    |
+----------+--------+------+
4 rows selected (0.089 seconds)
  • 查看查询全表的执行计划
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.global_index;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX  | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
  • 可以看出扫描的是 GLOBAL_INDEX(索引表) ,并且是 FULL SCAN(全表扫描)
  • 查看 Where 条件中 带有第一个索引字段的执行计划
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.global_index where name = 'hadoop1';
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX  | null            | null           | null         |
|     SERVER FILTER BY NAME = 'hadoop1'                                            | null            | null           | null         |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.03 seconds)
  • 可以看出先 FULL SCAN 索引表,然后再把索引表里面的数据进行一下 FILTER。
  • 查看 Where 条件中 带有第一个索引字段强制走索引的执行计划
  • 强制走索引的语法:/*+ INDEX( 库 名 . 库名. .表名 $索引名) */ *
0: jdbc:phoenix:bigdata:2181> explain select /*+ INDEX(bigdata.global_index global_index_idx) */ * from bigdata.global_index where name = 'hadoop1';
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                             PLAN                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX                                              | null            | null           | null         |
|     SKIP-SCAN-JOIN TABLE 0                                                                                                   | null            | null           | null         |
|         CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0,'hadoop1'] - [19,'hadoop1']  | null            | null           | null         |
|             SERVER FILTER BY FIRST KEY ONLY                                                                                  | null            | null           | null         |
|     DYNAMIC SERVER FILTER BY "BIGDATA.GLOBAL_INDEX.ID" IN ($11.$13)                                                          | null            | null           | null         |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.049 seconds)
  • 和上面不强制走索引效果差不多
  • 查看 Where 条件中和需要查询的字段都在索引表的执行计划
0: jdbc:phoenix:bigdata:2181> explain select age from bigdata.global_index where name = 'hadoop1';
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                         PLAN                                                         | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0,'hadoop1'] - [19,'hadoop1']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                  | null            | null           | null         |
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.034 seconds)
0: jdbc:phoenix:bigdata:2181> explain select name from bigdata.global_index where age = 18;
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0] - [19]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                                   | null            | null           | null         |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.027 seconds)
  • 可以看出 RANGE SCAN 了索引表,这种是最优化的方案
  • 总结:
    • where条件全部是索引字段,select条件全部是索引字段,最优的。
    • where条件全部是索引字段,select条件不部是索引字段,为了走index,强制走。
3.2 本地索引
  • 特点:重写轻读
Local Indexes
Local indexing targets write heavy, space constrained use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time. With local indexes, index data and table data co-reside on same server preventing any network overhead during writes. Local indexes can be used even when the query isn’t fully covered (i.e. Phoenix automatically retrieve the columns not in the index through point gets against the data table). Unlike global indexes, all local indexes of a table are stored in a single, separate shared table prior to 4.8.0 version. From 4.8.0 onwards we are storing all local index data in the separate shadow column families in the same data table. At read time when the local index is used, every region must be examined for the data as the exact region location of index data cannot be predetermined. Thus some overhead occurs at read-time.
  • 本地索引适合那些写多读少,或者存储空间有限的场景。和全局索引一样,Phoenix也会在查询的时候自动选择是否使用本地索引。本地索引之所以是本地,只要是因为索引数据和真实数据存储在同一台机器上,这样做主要是为了避免网络数据传输的开销。如果你的查询条件没有完全覆盖索引列,本地索引还是可以生效。因为无法提前确定数据在哪个Region上,所以在读数据的时候,还需要检查每个Region上的数据而带来一些性能损耗。
  • 本地索引本质上面索引数据和原数据存储在同一个表里面,本质上面就是在原表的基础上面再增加一个列簇,索引数据写入到新增加的列簇里面。
  • 优点:
    • 写的性能高 无需写index table ,其实实际上是没有index table这个东西的。
  • 缺点:
    • 在使用本地索引时,必须检查所有的region的数据,因为无法先确定索引数据的准确区域位置,所以
      读的开销较大。
0: jdbc:phoenix:bigdata:2181> create table bigdata.local_index
. . . . . . . . . . . . . . > (id integer primary key, 
. . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . > age integer,
. . . . . . . . . . . . . . > address varchar) 
. . . . . . . . . . . . . . > salt_buckets = 20;
No rows affected (4.358 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.local_index  values (1,'hadoop1',18,'北京');
1 row affected (0.008 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.local_index  values (2,'hadoop2',19,'上海');
1 row affected (0.008 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.local_index  values (3,'hadoop3',17,'杭州');
1 row affected (0.007 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.local_index  values (4,'hadoop4',12,'深圳');
1 row affected (0.005 seconds)
  • 创建本地索引
0: jdbc:phoenix:bigdata:2181> CREATE LOCAL INDEX local_index_idx ON  bigdata.local_index(name,age);
4 rows affected (11.532 seconds)
  • 查看 HBase 中原表的表结构
hbase(main):006:0> describe 'BIGDATA:LOCAL_INDEX'
Table BIGDATA:LOCAL_INDEX is ENABLED

BIGDATA:LOCAL_INDEX, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|o
rg.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache
.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder', coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReaderGenerator|805306366|', ME
TADATA => {'DATA_TABLE_NAME' => 'BIGDATA.LOCAL_INDEX', 'SPLIT_POLICY' => 'org.apache.phoenix.hbase.index.IndexRegionSplitPolicy'}}

COLUMN FAMILIES DESCRIPTION

{NAME => '0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '
65536', REPLICATION_SCOPE => '0'}

{NAME => 'L#0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =>
 '65536', REPLICATION_SCOPE => '0'}
 
 2 row(s) in 0.0310 seconds
  • 查看 Phoenix 中是否创建了专门的索引表
0: jdbc:phoenix:bigdata:2181> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
| TABLE_CAT  | TABLE_SCHEM  |    TABLE_NAME     |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABLE_ROWS  | SALT_BUCKETS  | MULTI_TENANT  | VIEW_STATEMENT  | VIEW_TYPE  | INDEX_ |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
|            | BIGDATA      | GLOBAL_INDEX_IDX  | INDEX         |          |            |                            |                 | ACTIVE       | false           | 20            | false         |                 |            | GLOBAL |
|            | BIGDATA      | LOCAL_INDEX_IDX   | INDEX         |          |            |                            |                 | ACTIVE       | false           | null          | false         |                 |            | LOCAL  |
|            | SYSTEM       | CATALOG           | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | FUNCTION          | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | LOG               | SYSTEM TABLE  |          |            |                            |                 |              | true            | 32            | false         |                 |            |        |
|            | SYSTEM       | SEQUENCE          | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | SYSTEM       | STATS             | SYSTEM TABLE  |          |            |                            |                 |              | false           | null          | false         |                 |            |        |
|            | BIGDATA      | GLOBAL_INDEX      | TABLE         |          |            |                            |                 |              | false           | 20            | false         |                 |            |        |
|            | BIGDATA      | LOCAL_INDEX       | TABLE         |          |            |                            |                 |              | false           | 20            | false         |                 |            |        |
|            | BIGDATA      | SALT              | TABLE         |          |            |                            |                 |              | false           | 20            | false         |                 |            |        |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+--------+
  • 发现创建索引表,并查询索引表中的数据
0: jdbc:phoenix:bigdata:2181> select * from BIGDATA.LOCAL_INDEX_IDX;
+----------+--------+------+
|  0:NAME  | 0:AGE  | :ID  |
+----------+--------+------+
| hadoop4  | 12     | 4    |
| hadoop3  | 17     | 3    |
| hadoop2  | 19     | 2    |
| hadoop1  | 18     | 1    |
+----------+--------+------+
4 rows selected (0.043 seconds)
  • 查看查询全表的执行计划
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.local_index;
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                      PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:LOCAL_INDEX  | null            | null           | null         |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
  • FULL SCAN BIGDATA:LOCAL_INDEX 里面的数据
  • 查看 Where 后面跟着索引字段查询所有字段执行计划
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.local_index where name = 'hadoop1';
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                      PLAN                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:LOCAL_INDEX  | null            | null           | null         |
|     SERVER FILTER BY NAME = 'hadoop1'                                           | null            | null           | null         |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
  • FULL SCAN BIGDATA:LOCAL_INDEX 里面的数据
  • 查看 Where 里面带着索引字段,并且查询非索引字段的执行计划
0: jdbc:phoenix:bigdata:2181> explain select address from bigdata.local_index where name = 'hadoop1';
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                              PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:LOCAL_INDEX [1,'hadoop1']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                                            | null            | null           | null         |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.039 seconds)
0: jdbc:phoenix:bigdata:2181> explain select address from bigdata.local_index where age = 18;
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                         PLAN                                         | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:LOCAL_INDEX [1]  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18                       | null            | null           | null         |
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.025 seconds
  • 总结:要有where条件,且字段不要以*
3.3 覆盖索引
  • 特点:空间换时间,增加查询数据
Phoenix is particularly powerful in that we provide covered indexes - we do not need to go back to the primary table once we have found the index entry. Instead, we bundle the data we care about right in the index rows, saving read-time overhead.

For example, the following would create an index on the v1 and v2 columns and include the v3 column in the index as well to prevent having to get it from the data table:

CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)
  • Phoenix提供了一种叫Covered Index覆盖索引的二级索引。这种索引在获取数据的过程中,内部不需要再去HBase上获取任何数据,你查询需要返回的列的数据都会被存储在索引中。要想达到这种效果,你的select 的列,where 的列都需要在索引中出现。举个例子,如果你的SQL语句是 select name from hao1 where age=2,要最大化查询效率和速度最快,你就需要建立覆盖索引。
  • 其本质也需要创建一个索引表,不过 Where 条件中的所有字段以及需要查询的所有的字段都在索引表中。
  • 优点:
    • 空间换时间 查询快。
  • 缺点:
    • 存储空间浪费。
  • 创建表并插入数据
0: jdbc:phoenix:bigdata:2181> create table bigdata.covered_index
. . . . . . . . . . . . . . > (id integer primary key, 
. . . . . . . . . . . . . . > name varchar,
. . . . . . . . . . . . . . > age integer,
. . . . . . . . . . . . . . > address varchar) 
. . . . . . . . . . . . . . > salt_buckets = 20;
No rows affected (2.329 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.covered_index  values (1,'hadoop1',18,'北京');
1 row affected (0.009 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.covered_index  values (2,'hadoop2',19,'上海');
1 row affected (0.006 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.covered_index  values (3,'hadoop3',17,'杭州');
1 row affected (0.011 seconds)
0: jdbc:phoenix:bigdata:2181> upsert into bigdata.covered_index  values (4,'hadoop4',12,'深圳');
1 row affected (0.006 seconds)
  • 创建索引
0: jdbc:phoenix:bigdata:2181> CREATE INDEX covered_index_idx ON  bigdata.covered_index(name,age) INCLUDE(address);
4 rows affected (7.271 seconds)
  • 查看 HBase 中索引表的表结构
hbase(main):006:0> describe 'BIGDATA:COVERED_INDEX_IDX'
Table BIGDATA:COVERED_INDEX_IDX is ENABLED

BIGDATA:COVERED_INDEX_IDX, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 
=> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', METADATA => {'DATA_TABLE_NAME' => 'BIGDATA.COVERED_INDEX', 'PRIORITY' =>
 '1000'}}
 
 COLUMN FAMILIES DESCRIPTION

{NAME => '0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '
65536', REPLICATION_SCOPE => '0'}

1 row(s) in 0.0240 seconds
  • 查看 Phoenix 里面的索引表的数据
0: jdbc:phoenix:bigdata:2181> select * from BIGDATA.COVERED_INDEX_IDX;
+----------+--------+------+------------+
|  0:NAME  | 0:AGE  | :ID  | 0:ADDRESS  |
+----------+--------+------+------------+
| hadoop2  | 19     | 2    | 上海         |
| hadoop3  | 17     | 3    | 杭州         |
| hadoop1  | 18     | 1    | 北京         |
| hadoop4  | 12     | 4    | 深圳         |
+----------+--------+------+------------+
4 rows selected (0.054 seconds)
  • 查看 Where 条件和查询列都在索引表中查询的执行计划
0: jdbc:phoenix:bigdata:2181> explain select address from bigdata.covered_index where age = 18;
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                                                | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:COVERED_INDEX_IDX [0] - [19]  | null            | null           | null         |
|     SERVER FILTER BY TO_INTEGER("AGE") = 18                                                       | null            | null           | null         |
+---------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.021 seconds)
  • 可以看出是通过 RANGE SCAN 直接查询了索引表
3.4 异步索引
  • 一般我们可以使用CREATE INDEX来创建一个索引,这是一种同步的方法。但是有时候我们创建索引的表非常大,我们需要等很长时间。Phoenix 4.5以后有一个异步创建索引的方式,使用关键字ASYNC来创建索引。
CREATE INDEX async_index ON my_schema.my_table (v) ASYNC
  • 这时候创建的索引表中不会有数据。你还必须要单独的使用命令行工具来执行数据的创建。当语句给执行的时候,后端会启动一个map reduce任务,只有等到这个任务结束,数据都被生成在索引表中后,这个索引才能被使用。启动工具的方法。
${HBASE_HOME}/bin/hbase 
org.apache.phoenix.mapreduce.index.IndexTool
  --schema MY_SCHEMA 
  --data-table MY_TABLE 
  --index-table ASYNC_IDX
  --output-path ASYNC_IDX_HFILES
  • 如果一口气导入海量的数据到 HBase 可以使用异步来执行。
  • 但是还可以使用 Bulk_DataLoad。

Bulk_DataLoad(官网)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值