大数据之Phoenix:Phoenix+HBase、DDL、DML和二级索引的使用

记录下,方便查

官方文档:Phoenix官方文档

DDL

查看所有表

0: jdbc:phoenix:mini1,mini2,mini3:2181> !tables
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABL |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true     |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | false    |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+
0: jdbc:phoenix:mini1,mini2,mini3:2181> 

创建表

0: jdbc:phoenix:mini1,mini2,mini3:2181> CREATE TABLE IF NOT EXISTS us_population (
. . . . . . . . . . . . . . . . . . . >       state CHAR(2) NOT NULL,
. . . . . . . . . . . . . . . . . . . >       city VARCHAR NOT NULL,
. . . . . . . . . . . . . . . . . . . >       population BIGINT
. . . . . . . . . . . . . . . . . . . >       CONSTRAINT my_pk PRIMARY KEY (state, city));
No rows affected (1.844 seconds)

  • 说明

    • char类型必须添加长度限制
    • varchar 可以不用长度限制
    • 主键映射到 HBase 中会成为 Rowkey. 如果有多个主键(联合主键), 会把多个主键的值拼成 rowkey
    • 在 Phoenix 中, 默认会把表名,字段名等自动转换成大写. 如果要使用小写, 需要把他们用双引号括起来.
  • 建表时注意数据类型

    http://phoenix.apache.org/language/datatypes.html

    数据类型Java Map占用大小 (byte)范围
    INTEGERjava.lang.Integer4-2147483648 to 2147483647
    UNSIGNED_INTjava.lang.Integer40 to 2147483647
    BIGINTjava.lang.Long8-9223372036854775807 to 9223372036854775807
    UNSIGNED_LONGjava.lang.Long80 to 9223372036854775807
    TINYINTjava.lang.Byte1-128 to 127
    UNSIGNED_TINYINTjava.lang.Byte10 to 127
    SMALLINTjava.lang.Short2-32768 to 32767
    UNSIGNED_SMALLINTjava.lang.Short20 to 32767
    FLOATjava.lang.Float4-3.402823466 E + 38 to 3.402823466 E + 38
    UNSIGNED_FLOATjava.lang.Float40 to 3.402823466 E + 38
    DOUBLEjava.lang.Double8-1.7976931348623158 E + 308 to 1.7976931348623158 E + 308
    UNSIGNED_DOUBLEjava.lang.Double80 to 1.7976931348623158 E + 308
    DECIMALjava.math.BigDecimalDECIMAL(p,s)
    BOOLEANjava.lang.Boolean1TRUE and FALSE
    TIMEjava.sql.Time8yyyy-MM-dd hh:mm:ss
    DATEjava.sql.Date8yyyy-MM-dd hh:mm:ss,
    TIMESTAMPjava.sql.Timestamp12yyyy-MM-dd hh:mm:ss[.nnnnnnnnn]
    UNSIGNED_TIMEjava.sql.Time8yyyy-MM-dd hh:mm:ss
    UNSIGNED_DATEjava.sql.Date8yyyy-MM-dd hh:mm:ss
    UNSIGNED_TIMESTAMPjava.sql.Timestamp12
    VARCHARjava.lang.StringVARCHAR(n)
    CHARjava.lang.StringCHAR (n)
    BINARYbyte[]BINARY(n)
    VARBINARYbyte[]VARBINARY
    ARRAYjava.sql.ArrayVARCHAR ARRAY

查看表结构

0: jdbc:phoenix:mini1,mini2,mini3:2181> !describe us_population
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX   |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
|            |              | US_POPULATION  | STATE        | 1          | CHAR       | 2            | null           | null            | null             |
|            |              | US_POPULATION  | CITY         | 12         | VARCHAR    | null         | null           | null            | null             |
|            |              | US_POPULATION  | POPULATION   | -5         | BIGINT     | null         | null           | null            | null             |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+

修改表

Alters an existing table by adding or removing columns or updating table options. When a column is dropped from a table, the data in that column is deleted as well. PK columns may not be dropped, and only nullable PK columns may be added. For a view, the data is not affected when a column is dropped. Note that creating or dropping columns only affects subsequent queries and data modifications. Snapshot queries that are connected at an earlier timestamp will still use the prior schema that was in place when the data was written.

Example:

ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10
ALTER TABLE my_table ADD dept_name char(50), parent_id char(15) null primary key
ALTER TABLE my_table DROP COLUMN d.dept_id, parent_id;
ALTER VIEW my_view DROP COLUMN new_col;
ALTER TABLE my_table SET IMMUTABLE_ROWS=true,DISABLE_WAL=true;

0: jdbc:phoenix:mini1,mini2,mini3:2181> ALTER TABLE us_population ADD dept_id char(10),parent_id char(15);
No rows affected (6.063 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> !describe us_population
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX   |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
|            |              | US_POPULATION  | STATE        | 1          | CHAR       | 2            | null           | null            | null             |
|            |              | US_POPULATION  | CITY         | 12         | VARCHAR    | null         | null           | null            | null             |
|            |              | US_POPULATION  | POPULATION   | -5         | BIGINT     | null         | null           | null            | null             |
|            |              | US_POPULATION  | DEPT_ID      | 1          | CHAR       | 10           | null           | null            | null             |
|            |              | US_POPULATION  | PARENT_ID    | 1          | CHAR       | 15           | null           | null            | null             |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
0: jdbc:phoenix:mini1,mini2,mini3:2181> ALTER TABLE us_population DROP COLUMN dept_id, parent_id ;
No rows affected (0.262 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> !describe us_population
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PREC_RADIX   |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+
|            |              | US_POPULATION  | STATE        | 1          | CHAR       | 2            | null           | null            | null             |
|            |              | US_POPULATION  | CITY         | 12         | VARCHAR    | null         | null           | null            | null             |
|            |              | US_POPULATION  | POPULATION   | -5         | BIGINT     | null         | null           | null            | null             |
+------------+--------------+----------------+--------------+------------+------------+--------------+----------------+-----------------+------------------+

删除表

0: jdbc:phoenix:mini1,mini2,mini3:2181> drop table us_population;
No rows affected (4.121 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUTABL |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true     |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | false    |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | false    |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+----------+

退出命令行

!quit

DML

插入记录

0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into us_population values('NY','NewYork',8143197);
1 row affected (0.035 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into us_population values('CA','Los Angeles',3844829);
1 row affected (0.005 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into us_population values('IL','Chicago',2842518);
1 row affected (0.004 seconds)

说明: upset可以看成是update和insert的结合体.

查询记录

0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from US_POPULATION;
+--------+--------------+-------------+
| STATE  |     CITY     | POPULATION  |
+--------+--------------+-------------+
| CA     | Los Angeles  | 3844829     |
| IL     | Chicago      | 2842518     |
| NY     | NewYork      | 8143197     |
+--------+--------------+-------------+
3 rows selected (0.049 seconds)

复杂查询语法请参看官网说明

删除记录

0: jdbc:phoenix:mini1,mini2,mini3:2181> delete from us_population where state='NY';
1 row affected (0.014 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from US_POPULATION;
+--------+--------------+-------------+
| STATE  |     CITY     | POPULATION  |
+--------+--------------+-------------+
| CA     | Los Angeles  | 3844829     |
| IL     | Chicago      | 2842518     |
+--------+--------------+-------------+
2 rows selected (0.214 seconds)

修改记录

批量修改

Inserts if not present and updates otherwise rows in the table based on the results of running another query. The values are set based on their matching position between the source and target tables. The list of columns is optional and if not present will map to the column in the order they are declared in the schema. If auto commit is on, and both a) the target table matches the source table, and b) the select performs no aggregation, then the population of the target table will be done completely on the server-side (with constraint violations logged, but otherwise ignored). Otherwise, data is buffered on the client and, if auto commit is on, committed in row batches as specified by the UpsertBatchSize connection property (or the phoenix.mutate.upsertBatchSize HBase config property which defaults to 10000 rows)

Example:

UPSERT INTO test.targetTable(col1, col2) SELECT col3, col4 FROM test.sourceTable WHERE col5 < 100
UPSERT INTO foo SELECT * FROM bar;

UPSERT INTO US_POPULATION(STATE, CITY,POPULATION) SELECT 'CA','Los Angeles_update',3844827 FROM US_POPULATION WHERE POPULATION > 3844828;

单值修改

Inserts if not present and updates otherwise the value in the table. The list of columns is optional and if not present, the values will map to the column in the order they are declared in the schema. The values must evaluate to constants.

Use the ON DUPLICATE KEY clause (available in Phoenix 4.9) if you need the UPSERT to be atomic. Performance will be slower in this case as the row needs to be read on the server side when the commit is done. Use IGNORE if you do not want the UPSERT performed if the row already exists. Otherwise, with UPDATE, the expression will be evaluated and the result used to set the column, for example to perform an atomic increment. An UPSERT to the same row in the same commit batch will be processed in the order of execution.

Example:

UPSERT INTO TEST VALUES(‘foo’,‘bar’,3);
UPSERT INTO TEST(NAME,ID) VALUES(‘foo’,123);
UPSERT INTO TEST(ID, COUNTER) VALUES(123, 0) ON DUPLICATE KEY UPDATE COUNTER = COUNTER + 1;
UPSERT INTO TEST(ID, MY_COL) VALUES(123, 0) ON DUPLICATE KEY IGNORE;

0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from US_POPULATION;
+--------+---------------------+-------------+
| STATE  |        CITY         | POPULATION  |
+--------+---------------------+-------------+
| A      | Rose                | 3844822     |
| CA     | Los Angeles         | 3844829     |
| CA     | Los Angeles_update  | 3844827     |
| IL     | Chicago             | 2842518     |
+--------+---------------------+-------------+
4 rows selected (0.039 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> UPSERT INTO US_POPULATION(STATE, CITY,POPULATION) VALUES('A','Rose',3844821);
1 row affected (0.005 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from US_POPULATION;
+--------+---------------------+-------------+
| STATE  |        CITY         | POPULATION  |
+--------+---------------------+-------------+
| A      | Rose                | 3844821     |
| CA     | Los Angeles         | 3844829     |
| CA     | Los Angeles_update  | 3844827     |
| IL     | Chicago             | 2842518     |
+--------+---------------------+-------------+
4 rows selected (0.032 seconds)

如上,如果主键记录不存在会直接insert,如果存在会update

Phoenix表映射

Phoenix 和 HBase 映射关系:默认情况下, 直接在 HBase 中创建的表通过 Phoenix 是查不到的;

如果要在 Phoenix 中操作直接在 HBase 中创建的表,则需要在 Phoenix 中进行表的映射。

映射方式有两种: 1. 视图映射 2. 表映射

准备HBase表来测试

hbase(main):001:0>  create 'test', 'name', 'company'
0 row(s) in 1.5560 seconds

=> Hbase::Table - test

hbase(main):008:0> describe 'test'
Table test is ENABLED                                                                                                                                       
test                                                                                                                                                        
COLUMN FAMILIES DESCRIPTION                                                                                                                                 
{NAME => 'company', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREV
ER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                      
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER'
, COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                         
2 row(s) in 0.1270 seconds

此时在Phoenix中看不到该表

0: jdbc:phoenix:mini1,mini2,mini3:2181> !table
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUT |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |              | true  |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |              | false |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+

视图映射

Phoenix 创建的视图是只读的, 所以只能用来查询, 无法通过视图对数据进行修改等操作.

0: jdbc:phoenix:mini1,mini2,mini3:2181> create view "test"(empid varchar primary key,"name"."firstname" varchar,"name"."lastname" varchar,"company"."name" varchar,"company"."address" varchar);
No rows affected (6.225 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> !table
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUT |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |              | true  |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |              | false |
|            |              | test           | VIEW          |          |            |                            |                 |              | false |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+

之后就可以在phoenix中查询了,但是注意表名如果是小写要加引号

0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from "test";
+--------+------------+-----------+-------+----------+
| EMPID  | firstname  | lastname  | name  | address  |
+--------+------------+-----------+-------+----------+
+--------+------------+-----------+-------+----------+
No rows selected (0.126 seconds)

表映射

表映射可以更改Hbase中表数据

HBase中表不存在时

Hbase中表不存在时,可以直接使用 create table 指令创建需要的表,系统将会自动在 Phoenix 和 HBase 中创建 person_infomation 的表,并会根据指令内的参数对表结构进行初始化。

HBase中表存在时

HBase 中已经存在表时,可以以类似创建视图的方式创建关联表,只需要将create view 改为 create table 即可

在 HBase 中创建表:

hbase(main):010:0>  create 'test1', 'name', 'company'
0 row(s) in 1.3540 seconds

=> Hbase::Table - test1
hbase(main):011:0> list
TABLE                                                                                     ..                                                                                       US_POPULATION                                                                             test                                                                                     test1                                                                                                                                                       
9 row(s) in 0.0160 seconds

=> ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "US_POPULATION", "test", "test1"]

在Phoenix中进行表映射:

0: jdbc:phoenix:mini1,mini2,mini3:2181> create table "test1"(empid varchar primary key,"name"."firstname" varchar,"name"."lastname" varchar,"company"."name" varchar,"company"."address" varchar) column_encoded_bytes=0;
No rows affected (6.22 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> !table
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMUT |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |              | true  |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |              | false |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |              | false |
|            |              | test1          | TABLE         |          |            |                            |                 |              | false |
|            |              | test           | VIEW          |          |            |                            |                 |              | false |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+--------------+-------+

Phoenix 区分大小写,切默认情况下会将小写转成大写,所以表名、列簇、列名需要用双引号。
Phoenix 4.10 版本之后,在创建表映射时需要将 COLUMN_ENCODED_BYTES 置为 0。
删除映射表,会同时删除原有 HBase 表。所以如果只做查询炒作,建议做视图映射。

视图映射和表映射总结

相比于直接创建映射表,视图的查询效率会低, 原因是:创建映射表的时候,Phoenix 会在表中创建一些空的键值对,这些空键值对的存在可以用来提高查询效率。

使用create table创建的关联表,如果对表进行了修改,源数据也会改变,同时如果关联表被删除,源表也会被删除。但是视图就不会,如果删除视图,源数据不会发生改变。

Phoenix创建HBase二级索引

HBase一般只能通过rowkey进行索引,这里说的二级索引就是对非rowkey检索时的索引使用,从 0.94 版本开始, HBase 开始支持二级索引. 这里记录使用 Phoenix 给 HBase 添加二级索引.

配置 HBase 支持 Phoenix 创建二级索引

需要先给 HBase 配置支持创建二级索引

  • 步骤 1: 添加如下配置到 HBase 的 Hregionerver 节点的 hbase-site.xml
<!-- phoenix regionserver 配置参数 -->
<property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>

<property>
    <name>hbase.region.server.rpc.scheduler.factory.class</name>
    <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
<description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>

<property>
    <name>hbase.rpc.controllerfactory.class</name>
    <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
    <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
  • 步骤 2: 添加如下配置到 HBase 的 Hmaster 节点的 hbase-site.xml

    <property>
        <name>hbase.master.loadbalancer.class</name>
        <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
    </property>
    
    <property>
        <name>hbase.coprocessor.master.classes</name>
        <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
    </property>
    
  • 重启HBase

测试二级索引

准备数据

0: jdbc:phoenix:mini1,mini2,mini3:2181> create table user_1(id varchar primary key, name varchar, addr varchar);
No rows affected (2.59 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into user_1 values ('1', 'zs', 'beijing');
1 row affected (0.078 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into user_1 values ('2', 'lisi', 'shanghai');
1 row affected (0.006 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> upsert into user_1 values ('3', 'ww', 'sz');
1 row affected (0.006 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from user_1;
+-----+-------+-----------+
| ID  | NAME  |   ADDR    |
+-----+-------+-----------+
| 1   | zs    | beijing   |
| 2   | lisi  | shanghai  |
| 3   | ww    | sz        |
+-----+-------+-----------+
3 rows selected (0.074 seconds)

如下rowkey查询时是使用了索引的,仍然是使用explain关键字查看

0: jdbc:phoenix:mini1,mini2,mini3:2181> select * from user_1 where ID = '1';
+-----+-------+----------+
| ID  | NAME  |   ADDR   |
+-----+-------+----------+
| 1   | zs    | beijing  |
+-----+-------+----------+
1 row selected (0.036 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> explain select * from user_1 where ID = '1';
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                             PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 1 ROWS 205 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER USER_1  | 205             | 1              | 0            |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.036 seconds)

其他字段是不支持索引的目前

如下full scan

0: jdbc:phoenix:mini1,mini2,mini3:2181> explain select * from user_1 where NAME = 'zs';
+------------------------------------------------------------------+-----------------+----------------+--------------+
|                               PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER USER_1  | null            | null           | null         |
|     SERVER FILTER BY NAME = 'zs'                                 | null            | null           | null         |
+------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.097 seconds)

给 name 字段添加索引

0: jdbc:phoenix:mini1,mini2,mini3:2181> create index idx_user_1 on user_1(name);
3 rows affected (6.584 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> explain select * from user_1 where NAME = 'zs';
+------------------------------------------------------------------+-----------------+----------------+--------------+
|                               PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER USER_1  | null            | null           | null         |
|     SERVER FILTER BY NAME = 'zs'                                 | null            | null           | null         |
+------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.094 seconds)

Phoenix 索引分类

Phoenix 索引分全局索引和局部索引

全局索引

global index 全局索引是默认的索引格式。

  • 适用于多读少写的业务场景。写数据的时候会消耗大量开销,因为索引表也要更新,而索引表是分布在不同的数据节点上的,跨节点的数据传输带来了较大的性能消耗。

  • 在读数据的时候 Phoenix 会选择索引表来降低查询消耗的时间。

  • 如果想查询的字段不是索引字段的话索引表不会被使用,也就是说不会带来查询速度的提升。

创建全局索引方法

CREATE INDEX my_index ON my_table (my_col)
0: jdbc:phoenix:mini1,mini2,mini3:2181> CREATE INDEX my_index_name on user_1(name);
3 rows affected (6.317 seconds)

  • 创建全局索引, 也支持查询其他字段
CREATE INDEX my_index ON my_table (v1) INCLUDE (v2)

SELECT v2 FROM my_table WHERE v1 = 'foo'
0: jdbc:phoenix:mini1,mini2,mini3:2181> CREATE INDEX my_index_name on user_1(name) INCLUDE(ADDR);
3 rows affected (6.293 seconds)
0: jdbc:phoenix:mini1,mini2,mini3:2181> explain select * from user_1 where ADDR = 'beijing';
+-------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                  PLAN                                   | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_INDEX_NAME  | null            | null           | null         |
|     SERVER FILTER BY "ADDR" = 'beijing'                                 | null            | null           | null         |
+-------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.069 seconds)

这里的意思是会在name字段的索引MY_INDEX_NAME的基础上full scan,查询会加快

局部索引

  • local index 适用于写操作频繁的场景。索引数据和数据表的数据是存放在相同的服务器中的,避免了在写操作的时候往不同服务器的索引表中写索引带来的额外开销。

  • 查询的字段不是索引字段索引表也会被使用,这会带来查询速度的提升。

创建局部索引的方法(相比全局索引多了一个关键字 local):

CREATE LOCAL INDEX my_index ON my_table (my_col)
0: jdbc:phoenix:mini1,mini2,mini3:2181> CREATE LOCAL INDEX my_local_index_name on user_1(name);
3 rows affected (6.309 seconds)

注意,建立local索引时,hbase-site.xml配置文件的zk信息不能加2181,否则会报错

Local index 和 Global index区别

  • Local index 由于是数据与索引在同一服务器上,所以要查询的数据在哪台服务器的哪个region是无法定位的,只能先找到region然后再利用索引。

  • Global index 是一种分布式索引,可以直接利用索引定位服务器和region,速度更快,但是由于分布式的原因,数据一旦出现新增变化,分布式的索引要进行跨服务的同步操作,带来大量的通信消耗。所以在写操作频繁的字段上不适合建立Global index。

删除索引

 DROP INDEX my_index ON my_table
0: jdbc:phoenix:mini1,mini2,mini3:2181> DROP INDEX my_index_name ON user_1;
No rows affected (4.003 seconds)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值