1.添加二级索引配置
为使用Phoenix二级索引,需要开启HBase regionServer相关参数配置
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
用户可直接在CM中的HBase配置页,搜索hbase-site.xml配置相关参数(保存并重启服务即可)
2.全局索引
全局索引适用于读多写少的场景,它会建立一张独立的HBase索引表,对原始数据侵入性小。所有对数据表的写操作都会引起索引表的更新(DELETE、UPSERT VALUES和UPSERT SELECT))。索引表分布在不同的数据节点上的,跨节点的数据传输带来了较大的性能消耗。如果查询列不在索引表中,默认索引表将不会被使用,除非使用hint(SELECT /*+ INDEX(table_name,index_name) */ …)
2.1首先,在Phoenix中创建一表。
create table if not exists "S_INDEX"("id" varchar primary key ,
"cf1"."name" varchar,
"cf1"."age" varchar ,
"cf1"."sex" varchar);
2.2插入数据
upsert into S_INDEX values('1','张三','34','男');
upsert into S_INDEX values('2','李四','23','男');
upsert into S_INDEX values('3','小红','27','女');
2.3.全局索引(单字段)
CREATE INDEX IDX_S_INDEX ON S_INDEX("cf1"."name");
查询
select * from S_INDEX where "name"='张三';
2.4分析计划
explain select count(*) from S_INDEX where “name”=‘张三’;
0: jdbc:phoenix:master01:2181:/hbase> explain select count(*) from S_INDEX where "name"='张三';
+-------------------------------------------------------------------+
| PLAN |
+-------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER IDX_S_INDEX ['张三'] |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO SINGLE ROW |
+-------------------------------------------------------------------+
3 rows selected (0.045 seconds)
2.5 删除索引并查看执行计划,发现进行了全局扫描FULL SCAN OVER TMP_STAFF
1删除索引
DROP INDEX IDX_S_INDEX ON S_INDEX;
查询
explain select count(*) from S_INDEX where "name"='张三';
0: jdbc:phoenix:master01:2181:/hbase> explain select count(*) from S_INDEX where "name"='张三';
+-------------------------------------------------------+
| PLAN |
+-------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER S_INDEX |
| SERVER FILTER BY cf1."name" = '张三' |
| SERVER AGGREGATE INTO SINGLE ROW |
+-------------------------------------------------------+
3 rows selected (0.113 seconds)
0: jdbc:phoenix:master01:2181:/hbase>
2 插入数据
upsert into S_INDEX values('4','张三','12','男');
upsert into S_INDEX values('5','张明','19','男');
3查询数据
0: jdbc:phoenix:master01:2181:/hbase> explain select count(*) from S_INDEX where "name"='张三';
+-------------------------------------------------------------------+
| PLAN |
+-------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER IDX_S_INDEX ['张三'] |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO SINGLE ROW |
+-------------------------------------------------------------------+
3 rows selected (0.036 seconds)
0: jdbc:phoenix:master01:2181:/hbase> select * from S_INDEX where "name"='张三';
+-----+-------+------+------+
| id | name | age | sex |
+-----+-------+------+------+
| 1 | 张三 | 34 | 男 |
| 4 | 张三 | 12 | 男 |
+-----+-------+------+------+
2 rows selected (0.11 seconds)
0: jdbc:phoenix:master01:2181:/hbase>
3通过添加一百万条数据(代码导入的方式,可以参考我另外一篇博客)
3.1未创建索引
0: jdbc:phoenix:master01:2181:/hbase> explain select count(*) from S_INDEX where "name"='张三';
+----------------------------------------------------------------------+
| PLAN |
+----------------------------------------------------------------------+
| CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY FULL SCAN OVER S_INDEX |
| SERVER FILTER BY cf1."name" = '张三' |
| SERVER AGGREGATE INTO SINGLE ROW |
+----------------------------------------------------------------------+
3 rows selected (0.078 seconds)
0: jdbc:phoenix:master01:2181:/hbase> select count(*) from S_INDEX where "name"='张三';
+-----------+
| COUNT(1) |
+-----------+
| 2 |
+-----------+
1 row selected (12.58 seconds)
0: jdbc:phoenix:master01:2181:/hbase>
3.2创建索引
0: jdbc:phoenix:master01:2181:/hbase> explain select count(*) from S_INDEX where "name"='张三';
+----------------------------------------------------------------------------------+
| PLAN |
+----------------------------------------------------------------------------------+
| CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY RANGE SCAN OVER IDX_S_INDEX ['张三'] |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO SINGLE ROW |
+----------------------------------------------------------------------------------+
3 rows selected (0.128 seconds)
0: jdbc:phoenix:master01:2181:/hbase>
0: jdbc:phoenix:master01:2181:/hbase> select count(*) from S_INDEX where "name"='张三';
+-----------+
| COUNT(1) |
+-----------+
| 2 |
+-----------+
1 row selected (0.096 seconds)
结论:1百万行数据查询两条数据,未建索引12.58秒,创建索引,查询为0.096秒
4创建多字段索引
CREATE INDEX IDX_S_INDEX_NUMS ON S_INDEX("cf1"."name","cf1"."age") include("cf1"."sex");
0: jdbc:phoenix:master01:2181:/hbase> CREATE INDEX IDX_S_INDEX_NUMS ON S_INDEX("cf1"."name","cf1"."age") include("cf1"."sex");
1,000,006 rows affected (172.484 seconds)
0: jdbc:phoenix:master01:2181:/hbase> select * from IDX_S_INDEX_NUMS limit 1;
+-----------+----------+---------+----------+
| cf1:name | cf1:age | :id | cf1:sex |
+-----------+----------+---------+----------+
| 01abqwe | 38 | 01test | 7 |
+-----------+----------+---------+----------+
1 row selected (0.412 seconds)
0: jdbc:phoenix:master01:2181:/hbase>
5错误处理
0: jdbc:phoenix:master01:2181:/hbase> CREATE INDEX IDX_S_INDEX ON S_INDEX(name);
Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=IDX_S_INDEX (state=42Y88,code=1029)
java.sql.SQLException: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=IDX_S_INDEX
由于没有在cdh中修改配置文件,或者没有生效