Global Indexes
Global indexing targets read-heavy use cases. With global indexes, all the performance penalties for indexes occur at write time. Phoenix intercepts data table updates on write (DELETE, UPSERT VALUES, and UPSERT SELECT), builds the index updates, and then sends any necessary updates to all interested index tables. At read time, Phoenix selects the index table that will produce the fastest query and scans it directly, just like any other HBase table. An index will not be used for a query that references a column that isn't part of the index.
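The DDL behind the plans below is not shown in the original; the following sketch is a hypothetical reconstruction (table, index, and column names are assumptions inferred from the plans):

```sql
-- Hypothetical DDL; the 20 salt buckets match the 20-chunk parallel scans below.
CREATE TABLE bigdata.global_index (
    id      VARCHAR PRIMARY KEY,
    name    VARCHAR,
    age     INTEGER,
    address VARCHAR
) SALT_BUCKETS = 20;

-- A global index on (name, age): on every data-table write, the index update
-- is sent to a separate index table, BIGDATA:GLOBAL_INDEX_IDX.
CREATE INDEX global_index_idx ON bigdata.global_index (name, age);
```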
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.global_index;
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
The plan shows a FULL SCAN over BIGDATA:GLOBAL_INDEX, which (despite its name) is the data table, not the index table GLOBAL_INDEX_IDX. Because `SELECT *` references columns the index does not cover, the index cannot be used.
Execution plan when the WHERE clause filters on the leading indexed column:
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.global_index where name = 'hadoop1';
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX | null | null | null |
| SERVER FILTER BY NAME = 'hadoop1' | null | null | null |
+----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.03 seconds)
Again the data table is FULL SCANned, with a server-side FILTER applied to the rows; the index is skipped because `SELECT *` is not covered by it.
Execution plan when the WHERE clause filters on the leading indexed column and the index is forced:
Syntax for forcing an index: `/*+ INDEX(schema_name.table_name index_name) */`
0: jdbc:phoenix:bigdata:2181> explain select /*+ INDEX(bigdata.global_index global_index_idx) */ * from bigdata.global_index where name = 'hadoop1';
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:GLOBAL_INDEX | null | null | null |
| SKIP-SCAN-JOIN TABLE 0 | null | null | null |
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0,'hadoop1'] - [19,'hadoop1'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY | null | null | null |
| DYNAMIC SERVER FILTER BY "BIGDATA.GLOBAL_INDEX.ID" IN ($11.$13) | null | null | null |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.049 seconds)
The effect is about the same as not forcing the index: Phoenix RANGE SCANs the index table GLOBAL_INDEX_IDX, then joins back to the data table (the SKIP-SCAN-JOIN and DYNAMIC SERVER FILTER lines) to fetch the columns the index does not cover.
Execution plan when both the WHERE columns and the selected columns are in the index table:
0: jdbc:phoenix:bigdata:2181> explain select age from bigdata.global_index where name = 'hadoop1';
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0,'hadoop1'] - [19,'hadoop1'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY | null | null | null |
+----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.034 seconds)
0: jdbc:phoenix:bigdata:2181> explain select name from bigdata.global_index where age = 18;
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:GLOBAL_INDEX_IDX [0] - [19] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.027 seconds)
Here the index table itself is RANGE SCANned; this is the optimal case.
Summary:
If all WHERE columns and all SELECT columns are indexed columns, the plan is optimal.
If all WHERE columns are indexed but not all SELECT columns are, the index hint is needed to force the index.
3.2 Local Indexes
Characteristics: write-optimized; reads pay the penalty.
Local Indexes
Local indexing targets write-heavy, space-constrained use cases. Just like with global indexes, Phoenix automatically selects whether or not to use a local index at query time. With local indexes, index data and table data co-reside on the same server, preventing any network overhead during writes. Local indexes can be used even when the query isn't fully covered (i.e., Phoenix automatically retrieves the columns not in the index through point gets against the data table). Unlike global indexes, all local indexes of a table are stored in a single, separate shared table prior to version 4.8.0. From 4.8.0 onwards, all local index data is stored in separate shadow column families in the same data table. At read time, when the local index is used, every region must be examined for the data because the exact region location of index data cannot be predetermined, so some overhead occurs at read time.
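Mirroring the hypothetical schema above, a local index is declared with the LOCAL keyword (table and index names here are likewise assumptions):

```sql
-- Hypothetical DDL: the LOCAL keyword keeps the index data co-located with
-- the data table's regions (in shadow column families from 4.8.0 onwards),
-- so writes avoid cross-region-server network hops.
CREATE TABLE bigdata.local_index (
    id      VARCHAR PRIMARY KEY,
    name    VARCHAR,
    age     INTEGER,
    address VARCHAR
) SALT_BUCKETS = 20;

CREATE LOCAL INDEX local_index_idx ON bigdata.local_index (name, age);
```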
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.local_index;
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:LOCAL_INDEX | null | null | null |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
A FULL SCAN over the data in BIGDATA:LOCAL_INDEX.
Execution plan when the WHERE clause filters on an indexed column and all columns are selected:
0: jdbc:phoenix:bigdata:2181> explain select * from bigdata.local_index where name = 'hadoop1';
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN FULL SCAN OVER BIGDATA:LOCAL_INDEX | null | null | null |
| SERVER FILTER BY NAME = 'hadoop1' | null | null | null |
+---------------------------------------------------------------------------------+-----------------+----------------+--------------+
Again a FULL SCAN over BIGDATA:LOCAL_INDEX, with a server-side filter.
Execution plan when the WHERE clause filters on an indexed column and a non-indexed column is selected:
0: jdbc:phoenix:bigdata:2181> explain select address from bigdata.local_index where name = 'hadoop1';
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:LOCAL_INDEX [1,'hadoop1'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY | null | null | null |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.039 seconds)
0: jdbc:phoenix:bigdata:2181> explain select address from bigdata.local_index where age = 18;
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 20-CHUNK PARALLEL 20-WAY ROUND ROBIN RANGE SCAN OVER BIGDATA:LOCAL_INDEX [1] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND TO_INTEGER("AGE") = 18 | null | null | null |
+--------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.025 seconds)
Summary: queries should carry a WHERE condition on indexed columns, and should not SELECT *.
3.3 Covered Indexes
Characteristics: trades space for time; extra query columns are stored in the index.
Phoenix is particularly powerful in that we provide covered indexes - we do not need to go back to the primary table once we have found the index entry. Instead, we bundle the data we care about right in the index rows, saving read-time overhead.
For example, the following would create an index on the v1 and v2 columns and include the v3 column in the index as well to prevent having to get it from the data table:
CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)
Phoenix provides a kind of secondary index called a Covered Index. When fetching data through this index type, Phoenix never needs to go back to HBase for any additional data: every column your query returns is stored directly in the index. To get this behavior, every column in your SELECT list and in your WHERE clause must appear in the index. For example, if your SQL statement is `select name from hao1 where age=2`, then to maximize query efficiency and speed you would build a covering index.
Under the hood, a covered index still creates an index table; the difference is that every column in the WHERE clause and every column being selected is stored in that index table.
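For the `hao1` query above, a covering index sketch would look like this (the index name is an assumption):

```sql
-- Index on AGE, with NAME carried along as a covered column so that
-- SELECT name FROM hao1 WHERE age = 2 never touches the data table.
CREATE INDEX hao1_age_idx ON hao1 (age) INCLUDE (name);
```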