Apache HBase series
1. hbase-2.1.0 introduction, distributed cluster deployment, HA cluster deployment, verification, and recommended hardware configuration
2. hbase-2.1.0 shell basic operations in detail
3. HBase Java API basics (creating and dropping tables; adding, deleting, and querying data; multi-condition queries)
4. HBase usage (namespaces, data partitioning, rowkey design, accessing HBase via the native API)
5. Apache Phoenix (5.0.0-5.1.2) introduction, deployment, and usage (basic usage, comprehensive usage, secondary index examples), data partitioning examples
6. HBase bulk loading - Bulk load (example 1: basic usage)
7. HBase bulk loading - Bulk load (example 2: writing tens of millions of rows - MySQL data written to HDFS as ORCFile, then imported into HBase)
8. HBase bulk loading - Bulk load (example 3: writing tens of millions of rows - MySQL data written directly in the format HBase needs, then imported into HBase)
This article introduces Apache Phoenix: its features, deployment, basic usage examples, and a comprehensive example, covering both the shell and the Java API.
It assumes a working HBase cluster.
The article has five parts: an introduction to Apache Phoenix, deployment and verification, basic usage examples, a comprehensive example, and a data partitioning example.
I. Introduction to Apache Phoenix
1. Phoenix overview
Phoenix official site: http://phoenix.apache.org/
The Phoenix homepage puts it as: "We put the SQL back in NoSQL".
- Apache Phoenix brings low-latency OLTP and operational analytics to Hadoop.
- It provides standard SQL and full ACID transaction support.
- With HBase as the storage layer, it gives this NoSQL store schema-based reads: you can operate on HBase with SQL statements, e.g. create tables and insert, update, and delete data.
- Phoenix executes operations server-side via coprocessors, minimizing client/server data transfer.
- Apache Phoenix integrates well with other Hadoop components such as Spark, Hive, Flume, and MapReduce.
At its core, Phoenix is an open-source SQL engine written in Java that operates on HBase through the JDBC API.
Its main features:
- Implements most of the java.sql interfaces through the JDBC API, including the metadata APIs
- DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE to add and drop tables and columns
- DML support: UPSERT VALUES for row-by-row inserts, UPSERT SELECT for bulk data transfer within or between tables, and DELETE for deleting rows
- Transaction support: limited transactions via client-side batching
- Secondary index support
- Follows the ANSI SQL standard
Phoenix also has some rough edges, including the following:
- Recent releases pin strict HBase/Hadoop version requirements; for teams already running HBase, upgrading HBase just to match Phoenix can be very costly
- It is tightly coupled to HBase and starts as a component inside it, so HBase metadata is easy to corrupt
- The official index-creation procedure can lead to failed inserts, failed queries, and even crashes
- Secondary indexes are one of the main reasons most users adopt Phoenix. For historical reasons HBase only indexes the rowkey; a query by rowkey locates data quickly, but real-world queries usually combine multiple fields, and with plain HBase those require a full table scan with filtering, which is slow. Phoenix supports indexes on fields other than the rowkey, i.e. secondary indexes, which can improve query performance dramatically.
2. Index types
1) Covered Indexes
A covered index contains all the column data a query needs in the index table itself, so the query can be answered by reading only the index table, without touching the primary table.
Syntax: create index my_index on test (v1) include(v2);
When executing select v2 from test where v1='…', only the index table is read; the primary table is never scanned.
2) Global Indexes
- Global indexes suit read-heavy, write-light workloads.
- Writing is expensive with a global index: every insert, update, or delete also updates the index table, and since the index table is distributed across nodes, write performance suffers.
- Advantage: in read-heavy workloads, queries whose fields are indexed are fast, because the exact region holding the data can be located quickly. Writes, by contrast, are slow, since every write must update the index data on all nodes.
- Syntax: create index my_index on test (v1);
- A query like select v2 from test where v1='…' cannot actually use this index, because v2 is not an indexed column. With a global index, if a queried field is not covered by the index table, Phoenix falls back to a full scan of the primary table.
3) Local Indexes
- Local indexes suit write-heavy, read-light workloads.
- As with global indexes, Phoenix decides automatically at query time whether to use the index. With a local index, index data lives on the same regionserver as the primary table data, avoiding the cross-node index writes that global indexes incur. When querying through a local index, the index is still used even when the queried fields are not index fields; this is a difference from global indexes.
- Before version 4.8, all local index data was stored in a single shared table; since 4.8 it is stored in a separate column family of the primary table. Because the index is local, a client query must scan the index on every node to find which regions hold the data, so with many regions query latency grows. Hence the fit for write-heavy, read-light workloads.
- Syntax: create local index my_index on test (v1);
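To make the global/local difference concrete, the DDL above can be combined with EXPLAIN to check whether a query actually uses an index. This is a sketch against a hypothetical test table; the exact plan text varies by Phoenix version:

```sql
-- Hypothetical table for trying out index types
CREATE TABLE IF NOT EXISTS test (pk VARCHAR PRIMARY KEY, v1 VARCHAR, v2 VARCHAR);

-- Global index on v1; it covers only pk and v1
CREATE INDEX my_global_index ON test (v1);

-- Uses the global index: v1 is indexed and no uncovered column is referenced
EXPLAIN SELECT v1 FROM test WHERE v1 = 'a';

-- Falls back to a full scan of TEST: v2 is not covered by the global index
EXPLAIN SELECT v2 FROM test WHERE v1 = 'a';

-- A local index is used even though v2 is not an index column
CREATE LOCAL INDEX my_local_index ON test (v1);
EXPLAIN SELECT v2 FROM test WHERE v1 = 'a';
```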
4) Mutable and Immutable Indexing
- Immutable Indexing
Immutable indexes are created on immutable tables, for data that is written once and never updated. The classic use case is time-series data: write once, read many times. In this mode, all index writes (primary and index) either all succeed, or the whole operation fails and an error is returned to the client.
Immutable indexes are used in relatively few scenarios.
Syntax: create table test (pk VARCHAR primary key, v1 VARCHAR, v2 VARCHAR) IMMUTABLE_ROWS=true;
i.e. set the IMMUTABLE_ROWS parameter to true at table creation time (the default is false). To turn an immutable index into a mutable one, use alter:
alter table test set IMMUTABLE_ROWS=false;
- Mutable Indexing
With mutable indexes, the index is updated together with the data on Insert, Update, or Delete. Index updates go through the WAL: when primary table data is updated, the index update is also written to the WAL, and the actual primary/index data is only updated after the WAL has been synced to disk, so that if anything fails midway, both primary and index data can be recovered from the WAL.
3. Does using Phoenix affect HBase performance?
- Phoenix does not hurt HBase performance; it can actually improve it
- Phoenix compiles SQL queries into native HBase scans
- Determines the optimal startKey and endKey for each scan
- Orchestrates parallel execution of scans
- Pushes WHERE-clause predicates to the server side
- Executes aggregate queries via coprocessors
- Secondary indexes to speed up queries on non-rowkey columns
- Statistics collection to improve parallelization and guide choices between optimizations
- Skip-scan filters to optimize IN, LIKE, and OR queries
- Rowkey salting for even distribution and load balancing
4. Official benchmarks
- Phoenix vs. Hive (over HDFS and HBase)
- Phoenix vs. Impala
II. Deploying Phoenix
1. Download
Download the Phoenix release matching your HBase version from the official site. For HBase 2.1, use "5.0.0-HBase-2.0".
Download link: http://phoenix.apache.org/download.html
2. Installation
The following steps are performed on server1; to run Phoenix on multiple machines, just copy the installation to each of them.
Two packages appear below: phoenix-hbase-2.1-5.1.2-bin.tar.gz and apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz. All examples in this article use apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz.
You may see the following warning at startup:
java.lang.IllegalArgumentException: Bad history file syntax! The history file /home/alanchan/.sqlline/history
may be an older history: please remove it or use a different history file.
Delete the history directory (rm -R ~/.sqlline/), then start again.
1) Upload the package to the Linux machine and extract it
cd /usr/local/bigdata
# the file is uploaded to /usr/local/bigdata and extracted there
tar -xvzf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz -C /usr/local/bigdata
2) Add all Phoenix jars to the lib directory of every HBase RegionServer and Master
# copy the jars into HBase's lib directory
cp /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/phoenix-*.jar /usr/local/bigdata/hbase-2.1.0/lib/
# enter HBase's lib directory and check that the jars were copied
cd /usr/local/bigdata/hbase-2.1.0/lib/
# distribute the jars to every HBase node
scp phoenix-*.jar server2:$PWD
scp phoenix-*.jar server3:$PWD
scp phoenix-*.jar server4:$PWD
3) Edit the configuration file
Note for this step:
This works as-is if HBase and Phoenix are being set up together from scratch; otherwise, see the exception notes under "Start the Phoenix client" below.
cd /usr/local/bigdata/hbase-2.1.0/conf/
vim hbase-site.xml
# 1. Append the following to hbase-site.xml
<!-- enable HBase namespace mapping -->
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<!-- WAL codec required by secondary indexes -->
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
# 2. Distribute hbase-site.xml to every node
scp hbase-site.xml server2:$PWD
scp hbase-site.xml server3:$PWD
scp hbase-site.xml server4:$PWD
4) Copy the updated hbase-site.xml into Phoenix's bin directory
# Phoenix is only deployed on server1 here; to deploy it on multiple machines, copy the file to each of them (not verified)
cp /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/bin/
5) Restart HBase
stop-hbase.sh
start-hbase.sh
6) Start the Phoenix client and connect to the Phoenix server
Note: the first time Phoenix connects to HBase is slightly slow.
cd /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/
bin/sqlline.py server2:2118   # any ZooKeeper node works
# run !table to list the tables in Phoenix
!table
# exception that appeared after the configuration above
[alanchan@server1 apache-phoenix-5.0.0-HBase-2.0-bin]$ bin/sqlline.py server1:2118
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:server1:2118 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:server1:2118
22/09/29 17:05:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: ERROR 726 (43M10): Inconsistent namespace mapping properties. Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server. (state=43M10,code=726)
java.sql.SQLException: ERROR 726 (43M10): Inconsistent namespace mapping properties. Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1318)
# Fix
# "if set once, should not be rollback. Old client will not work after this property is enabled."
# i.e. once set, the property should not be rolled back, and old clients stop working once it is enabled. In this environment, Phoenix was added to an HBase cluster that had already been running for a while; the note above appears to target fresh deployments (not verified).
# The fix:
# 1. Remove the namespace-mapping setting from hbase-site.xml in both HBase and Phoenix.
# 2. Restart HBase and enter the hbase shell.
# 3. Follow the steps below.
# rename the HBase table SYSTEM:CATALOG to SYSTEM.CATALOG
[alanchan@server1 conf]$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Took 0.0027 seconds
hbase(main):001:0> disable 'SYSTEM:CATALOG'
Took 1.7845 seconds
hbase(main):002:0> snapshot 'SYSTEM:CATALOG', 'cata_tableSnapshot'
Took 1.0076 seconds
hbase(main):003:0> clone_snapshot 'cata_tableSnapshot', 'SYSTEM.CATALOG'
Took 4.3310 seconds
hbase(main):004:0> drop 'SYSTEM:CATALOG'
Took 1.2898 seconds
# after the steps above, restarting Phoenix works normally
[alanchan@server1 bin]$ ./sqlline.py server2:2118
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:server2:2118 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:server2:2118
22/09/29 17:16:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:server2:2118>
0: jdbc:phoenix:server2:2118> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | MULTI_TENANT | VIEW_STATEMENT | VIEW_TYPE | INDEX_TYPE | TRANSACTIONAL | IS_NA |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true | 32 | false | | | | false | false |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
0: jdbc:phoenix:server2:2118>
7) Check the HBase web UI
You can see that Phoenix created several tables under the default namespace, and that these system tables load a number of coprocessors.
III. Getting started
Official language reference: https://phoenix.apache.org/language/index.html
1. Table operations
In Phoenix, tables can be created quickly with SQL-DDL-style statements.
1) Creating a table
CREATE TABLE IF NOT EXISTS table_name (
    rowkey_column data_type PRIMARY KEY,
    column_family.column1 data_type NOT NULL,
    column_family.column2 data_type NOT NULL,
    column_family.column3 data_type
);
--- order-detail table DDL
create table if not exists order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
);
0: jdbc:phoenix:server2:2118> create table if not exists order_info(
. . . . . . . . . . . . . . > ID varchar primary key,
. . . . . . . . . . . . . . > C1.STATUS varchar,
. . . . . . . . . . . . . . > C1.MONEY float,
. . . . . . . . . . . . . . > C1.PAY_WAY integer,
. . . . . . . . . . . . . . > C1.USER_ID varchar,
. . . . . . . . . . . . . . > C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . > C1.CATEGORY varchar
. . . . . . . . . . . . . . > );
No rows affected (2.449 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+-----+---------+--------+----------+----------+-----------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+-----+---------+--------+----------+----------+-----------------+-----------+
+-----+---------+--------+----------+----------+-----------------+-----------+
No rows selected (0.073 seconds)
Through the HBase web UI, you can see that Phoenix automatically created a table named order_info in HBase.
Note:
Existing HBase tables are not automatically mapped into Phoenix and do not show up there. If you create a table with the same name in Phoenix, however, the existing data is mapped into the corresponding columns (the names must match exactly) and becomes visible. Exactly one column must be declared PRIMARY KEY, which represents the rowkey. Also note that Phoenix converts all names to uppercase; to preserve lowercase, wrap the name in double quotes.
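The quoting rule above can be sketched as follows (hypothetical table names; unquoted identifiers are upper-cased, quoted ones keep their case and must then stay quoted everywhere):

```sql
-- Unquoted: stored as MY_TABLE / ID / C1.NAME
CREATE TABLE IF NOT EXISTS my_table (ID VARCHAR PRIMARY KEY, C1.NAME VARCHAR);

-- Quoted: stored exactly as "my_table2" / "id" / "c1"."name"
CREATE TABLE IF NOT EXISTS "my_table2" ("id" VARCHAR PRIMARY KEY, "c1"."name" VARCHAR);

SELECT ID FROM my_table;        -- works: ID was upper-cased at creation
SELECT "id" FROM "my_table2";   -- works: quoted lowercase matches
-- SELECT id FROM "my_table2";  -- fails: unquoted id is read as ID, which does not exist
```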
2) Describing a table
# !desc order_info
# note: the leading ! is required
0: jdbc:phoenix:server2:2118> !desc order_info
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_TYPE | SQL_DATETIME_SUB | CHAR_OCTET_LENGTH | ORDINAL_POSI |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+
| | | ORDER_INFO | ID | 12 | VARCHAR | null | null | null | null | 0 | | | null | null | null | 1 |
| | | ORDER_INFO | STATUS | 12 | VARCHAR | null | null | null | null | 1 | | | null | null | null | 2 |
| | | ORDER_INFO | MONEY | 6 | FLOAT | null | null | null | null | 1 | | | null | null | null | 3 |
| | | ORDER_INFO | PAY_WAY | 4 | INTEGER | null | null | null | null | 1 | | | null | null | null | 4 |
| | | ORDER_INFO | USER_ID | 12 | VARCHAR | null | null | null | null | 1 | | | null | null | null | 5 |
| | | ORDER_INFO | OPERATION_TIME | 12 | VARCHAR | null | null | null | null | 1 | | | null | null | null | 6 |
| | | ORDER_INFO | CATEGORY | 12 | VARCHAR | null | null | null | null | 1 | | | null | null | null | 7 |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+-------------+----------------+-------------------+--------------------+--------------+
3) Dropping a table
drop table if exists order_info;
4) Case sensitivity
In HBase, if column families and column names are not wrapped in double quotes, Phoenix converts them to uppercase.
To keep a column name lowercase, use double quotes.
For example:
create table if not exists order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
);
--- note: once a name is quoted lowercase, every later reference to that column must also be double-quoted, otherwise you get the following error:
--- Error: ERROR 504 (42703): Undefined column. columnName=order_info.ID
2. Inserting data
In Phoenix, inserts are not done with insert but with the "upsert" command.
It works as insert + update and corresponds to HBase's put:
if the row does not exist it is inserted, otherwise it is updated. The column list is optional; when omitted, values are mapped to columns in the order declared in the schema. The values must evaluate to constants.
-- syntax
upsert into table_name (column_family.column, ...) VALUES (value, ...)
-- example
UPSERT INTO order_info VALUES('000001', '已提交', 4070, 1, '4944191', '2020-04-25 12:09:16', '手机;');
This inserts one row.
3. Querying data
As in standard SQL, Phoenix uses the select statement for queries.
Note: if a table was created with unquoted lowercase names, Phoenix converted them (column names and the table name) to uppercase; queries must then use the converted column names, while the table name itself may still be written in lowercase.
-- query all rows
SELECT * FROM order_info;
-- query by ID
SELECT * FROM order_info WHERE "ID" = '000001';
4. Updating data
In Phoenix, updates also use UPSERT.
-- syntax:
UPSERT INTO table_name (column, ...) VALUES (value, ...);
-- example: set the status of order '000001' to 已付款. ID is the rowkey; if the row exists it is updated, otherwise it is inserted.
UPSERT INTO order_info ("ID", C1."STATUS") VALUES ('000001', '已付款');
5. Deleting by ID
DELETE FROM order_info WHERE "ID" = '000001';
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('000001', '已提交', 4070, 1, '4944191', '2020-04-25 12:09:16', '手机;');
1 row affected (0.084 seconds)
0: jdbc:phoenix:server2:2118> SELECT * FROM order_info;
+---------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001 | 已提交 | 4070.0 | 1 | 4944191 | 2020-04-25 12:09:16 | 手机; |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.045 seconds)
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info ("ID", C1."STATUS") VALUES ('000001', '已付款');
1 row affected (0.009 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+---------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001 | 已付款 | 4070.0 | 1 | 4944191 | 2020-04-25 12:09:16 | 手机; |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.032 seconds)
0: jdbc:phoenix:server2:2118> SELECT * FROM order_info WHERE "ID" = '000001';
+---------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------+---------+---------+----------+----------+----------------------+-----------+
| 000001 | 已付款 | 4070.0 | 1 | 4944191 | 2020-04-25 12:09:16 | 手机; |
+---------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:server2:2118> DELETE FROM order_info WHERE "ID" = '000001';
1 row affected (0.013 seconds)
0: jdbc:phoenix:server2:2118> select * from order_info;
+-----+---------+--------+----------+----------+-----------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+-----+---------+--------+----------+----------+-----------------+-----------+
+-----+---------+--------+----------+----------+-----------------+-----------+
No rows selected (0.029 seconds)
0: jdbc:phoenix:server2:2118> select ID,STATUS from order_info;
+-----+---------+
| ID | STATUS |
+-----+---------+
+-----+---------+
No rows selected (0.018 seconds)
6. Loading test data
To have more data to query against, paste the following SQL statements into the Phoenix client and execute them.
UPSERT INTO order_info VALUES('000002','已提交',4070,1,'4944191','2020-04-25 12:09:16','手机;');
UPSERT INTO order_info VALUES('000003','已完成',4350,1,'1625615','2020-04-25 12:09:37','家用电器;;电脑;');
UPSERT INTO order_info VALUES('000004','已提交',6370,3,'3919700','2020-04-25 12:09:39','男装;男鞋;');
UPSERT INTO order_info VALUES('000005','已付款',6370,3,'3919700','2020-04-25 12:09:44','男装;男鞋;');
UPSERT INTO order_info VALUES('000006','已提交',9380,1,'2993700','2020-04-25 12:09:41','维修;手机;');
UPSERT INTO order_info VALUES('000007','已付款',9380,1,'2993700','2020-04-25 12:09:46','维修;手机;');
UPSERT INTO order_info VALUES('000008','已完成',6400,2,'5037058','2020-04-25 12:10:13','数码;女装;');
UPSERT INTO order_info VALUES('000009','已付款',280,1,'3018827','2020-04-25 12:09:53','男鞋;汽车;');
UPSERT INTO order_info VALUES('000010','已完成',5600,1,'6489579','2020-04-25 12:08:55','食品;家用电器;');
UPSERT INTO order_info VALUES('000011','已付款',5600,1,'6489579','2020-04-25 12:09:00','食品;家用电器;');
UPSERT INTO order_info VALUES('000012','已提交',8340,2,'2948003','2020-04-25 12:09:26','男装;男鞋;');
UPSERT INTO order_info VALUES('000013','已付款',8340,2,'2948003','2020-04-25 12:09:30','男装;男鞋;');
UPSERT INTO order_info VALUES('000014','已提交',7060,2,'2092774','2020-04-25 12:09:38','酒店;旅游;');
UPSERT INTO order_info VALUES('000015','已提交',640,3,'7152356','2020-04-25 12:09:49','维修;手机;');
UPSERT INTO order_info VALUES('000016','已付款',9410,3,'7152356','2020-04-25 12:10:01','维修;手机;');
UPSERT INTO order_info VALUES('000017','已提交',9390,3,'8237476','2020-04-25 12:10:08','男鞋;汽车;');
UPSERT INTO order_info VALUES('000018','已提交',7490,2,'7813118','2020-04-25 12:09:05','机票;文娱;');
UPSERT INTO order_info VALUES('000019','已付款',7490,2,'7813118','2020-04-25 12:09:06','机票;文娱;');
UPSERT INTO order_info VALUES('000020','已付款',5360,2,'5301038','2020-04-25 12:08:50','维修;手机;');
UPSERT INTO order_info VALUES('000021','已提交',5360,2,'5301038','2020-04-25 12:08:53','维修;手机;');
UPSERT INTO order_info VALUES('000022','已取消',5360,2,'5301038','2020-04-25 12:08:58','维修;手机;');
UPSERT INTO order_info VALUES('000023','已付款',6490,0,'3141181','2020-04-25 12:09:22','食品;家用电器;');
UPSERT INTO order_info VALUES('000024','已付款',3820,1,'9054826','2020-04-25 12:10:04','家用电器;;电脑;');
UPSERT INTO order_info VALUES('000025','已提交',4650,2,'5837271','2020-04-25 12:08:52','机票;文娱;');
UPSERT INTO order_info VALUES('000026','已付款',4650,2,'5837271','2020-04-25 12:08:57','机票;文娱;');
7. Paged queries
limit and offset provide simple pagination:
limit is the number of rows per page, and offset is the number of rows to skip before the page starts.
-- page 1
select * from order_info limit 10 offset 0;
-- page 2
-- offset starts at 10
select * from order_info limit 10 offset 10;
-- page 3
select * from order_info limit 10 offset 20;
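One caveat worth noting: without an ORDER BY, page boundaries depend on scan order, so the same offset may return different rows over time. A sketch of deterministic paging over order_info (page n with page size s uses LIMIT s OFFSET (n-1)*s):

```sql
-- page 2, 10 rows per page, ordered by the rowkey so pages are stable
SELECT * FROM order_info ORDER BY ID LIMIT 10 OFFSET 10;
```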
8. Partitioned tables
With the default way of creating tables, sequential writes can hotspot a single RegionServer in HBase. Salting the rowkey solves the hotspot problem. Two approaches are available:
- rowkey pre-splitting
- salting into a specified number of buckets
1) Rowkey pre-splitting
Pre-split on the rowkey (ID) into 4 regions, and set the data compression format to GZ:
drop table if exists order_info;
create table if not exists order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
)
COMPRESSION='GZ'
SPLIT ON ('3','5','7');
# SPLIT ON ('3','5','7'): the rowkey is compared against '3', '5', and '7' to determine which region a row falls into; see the example at the end of this article for details.
0: jdbc:phoenix:server2:2118> drop table if exists order_info;
No rows affected (2.303 seconds)
0: jdbc:phoenix:server3:2118> create table if not exists order_info(
. . . . . . . . . . . . . . > ID varchar primary key,
. . . . . . . . . . . . . . > C1.STATUS varchar,
. . . . . . . . . . . . . . > C1.MONEY float,
. . . . . . . . . . . . . . > C1.PAY_WAY integer,
. . . . . . . . . . . . . . > C1.USER_ID varchar,
. . . . . . . . . . . . . . > C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . > C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > COMPRESSION='GZ'
. . . . . . . . . . . . . . > SPLIT ON ('3','5','7');
No rows affected (6.944 seconds)
0: jdbc:phoenix:server3:2118> !table
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | MULTI_TENANT | VIEW_STATEMENT | VIEW_TYPE | INDEX_TYPE | TRANSACTIONAL | IS_NA |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true | 32 | false | | | | false | false |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | | ORDER_INFO | TABLE | | | | | | false | null | false | | | | false | false |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-------+
0: jdbc:phoenix:server3:2118>
The web UI confirms that the table has been split into regions.
Insert some data, then check in HBase how it is distributed.
0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已提交',4010,1,'6463215','2020-04-25 12:09:29','男鞋;汽车;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已付款',4010,1,'6463215','2020-04-25 12:09:33','男鞋;汽车;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server3:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server3:2118> select * from order_info;
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| 02602f66-adc7-40d4-8485-76b5632b5b53 | 已提交 | 4070.0 | 1 | 4944191 | 2020-04-25 12:09:16 | 手机; |
| 31477850-8b15-4f1b-9ec3-939f7dc47241 | 已付款 | 4650.0 | 2 | 5837271 | 2020-04-25 12:08:57 | 机票;文娱; |
| 526e33d2-a095-4e19-b759-0017b13666ca | 已完成 | 3280.0 | 0 | 5553283 | 2020-04-25 12:09:01 | 食品;家用电器; |
| 70fa0ae0-6c02-4cfa-91a9-6ad929fe6b1b | 已付款 | 4100.0 | 1 | 8598963 | 2020-04-25 12:09:08 | 维修;手机; |
| 0968a418-f2bc-49b4-b9a9-2157cf214cfd | 已完成 | 4350.0 | 1 | 1625615 | 2020-04-25 12:09:37 | 家用电器;;电脑; |
| 39319322-2d80-41e7-a862-8b8858e63316 | 已完成 | 5000.0 | 1 | 5686435 | 2020-04-25 12:08:56 | 家用电器;;电脑; |
| 5a6932f4-b4a4-4a1a-b082-2475d13f9240 | 已提交 | 50.0 | 2 | 1764961 | 2020-04-25 12:10:07 | 家用电器;;电脑; |
| 7170ce71-1fc0-4b6e-a339-67f525536dcd | 已提交 | 9740.0 | 1 | 4816392 | 2020-04-25 12:10:03 | 数码;女装; |
| 0e01edba-5e55-425e-837a-7efb91c56630 | 已付款 | 6370.0 | 3 | 3919700 | 2020-04-25 12:09:44 | 男装;男鞋; |
| 3d2254bd-c25a-404f-8e42-2faa4929a629 | 已完成 | 5000.0 | 1 | 1274270 | 2020-04-25 12:08:43 | 男装;男鞋; |
| 5fc0093c-59a3-417b-a9ff-104b9789b530 | 已提交 | 6310.0 | 2 | 1292805 | 2020-04-25 12:09:36 | 男装;男鞋; |
| 71961b06-290b-457d-bbe0-86acb013b0e3 | 已完成 | 6550.0 | 3 | 2393699 | 2020-04-25 12:08:49 | 男鞋;汽车; |
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751 | 已付款 | 9380.0 | 1 | 2993700 | 2020-04-25 12:09:46 | 维修;手机; |
| 42f7fe21-55a3-416f-9535-baa222cc0098 | 已完成 | 3600.0 | 2 | 2661641 | 2020-04-25 12:09:58 | 维修;手机; |
| 605c6dd8-123b-4088-a047-e9f377fcd866 | 已完成 | 8980.0 | 2 | 6202324 | 2020-04-25 12:09:54 | 机票;文娱; |
The data ends up distributed across every region.
2) Salting with a specified number of buckets
Note: COMPRESSION and SALT_BUCKETS must be separated by a comma, or you get a syntax error.
drop table if exists order_info;
create table if not exists order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
)
COMPRESSION='GZ', SALT_BUCKETS=10;
0: jdbc:phoenix:server3:2118> drop table if exists order_info;
No rows affected (2.649 seconds)
0: jdbc:phoenix:server3:2118> create table if not exists order_info(
. . . . . . . . . . . . . . > ID varchar primary key,
. . . . . . . . . . . . . . > C1.STATUS varchar,
. . . . . . . . . . . . . . > C1.MONEY float,
. . . . . . . . . . . . . . > C1.PAY_WAY integer,
. . . . . . . . . . . . . . > C1.USER_ID varchar,
. . . . . . . . . . . . . . > C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . > C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > COMPRESSION='GZ', SALT_BUCKETS=10;
No rows affected (2.394 seconds)
The HBase web UI shows that 10 regions were created.
After inserting data, the rows are again distributed across every region.
Looking at the table from the HBase side, Phoenix prepended a hash byte to each ID, which is what spreads the rows across different regions.
# query from the Phoenix client
0: jdbc:phoenix:server3:2118> select * from order_info where ID = 'f8f3ca6f-2f5c-44fd-9755-1792de183845';
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| f8f3ca6f-2f5c-44fd-9755-1792de183845 | 已付款 | 5950.0 | 3 | 4060214 | 2020-04-25 12:09:12 | 机票;文娱; |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.029 seconds)
#HBase 客户端查询
hbase(main):025:0> scan "ORDER_INFO", {LIMIT => 1}
ROW COLUMN+CELL
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x00\x00\x00\x00, timestamp=1664447540561, value=x
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0B, timestamp=1664447540561, value=\xE5\xB7\xB2\xE4\xBB\x98\xE6\xAC\xBE
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0C, timestamp=1664447540561, value=\xC6\x12\x90\x01
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0D, timestamp=1664447540561, value=\x80\x00\x00\x01
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0E, timestamp=1664447540561, value=2993700
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0F, timestamp=1664447540561, value=2020-04-25 12:09:46
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x10, timestamp=1664447540561, value=\xE7\xBB\xB4\xE4\xBF\xAE;\xE6\x89\x8B\xE6\x9C\xBA;
1 row(s)
Took 0.0931 seconds
IV. Comprehensive example
1. Creating views
1) Use case
The order_info table already exists, and its data was written with the native PUT API. So rather than creating a new table through Phoenix, we create a Phoenix view that maps onto the HBase table, enabling fast SQL queries over the data.
2) About views
A view can be created over an existing HBase or Phoenix table. The table, column family, and column names must match the existing metadata exactly, or an exception is thrown. Once the view is created, it can be queried with SQL just like a table.
Syntax example
# drop a Phoenix view
DROP VIEW IF EXISTS VIEW_NAME CASCADE;
# create a Phoenix view (TABLE_NAME is the underlying table)
CREATE VIEW VIEW_NAME
(
ROWKEY VARCHAR PRIMARY KEY,
"F1".TEST_ID UNSIGNED_LONG -- F1 is a column family
) AS SELECT * FROM TABLE_NAME;
- First way: a view over an HBase table
Suppose HBase has a table "student" that we want to operate on through a view; create the view as follows.
The view automatically maps to the table of the same name and associates its columns.
-- if a "student" table already exists in Phoenix, drop it first:
CREATE view "student"(
"ROW" varchar primary key,
"info"."age" varchar,
"info"."name" varchar,
"info"."sex" varchar
);
CREATE view "user_info"(
"ROW" varchar primary key,
"ext_info"."age" varchar,
"ext_info"."name" varchar,
"ext_info"."sex" varchar
);
CREATE view "user_info"(
"ROW" varchar primary key,
"ext_info"."age" varchar,
"ext_info"."name" varchar,
"ext_info"."hobby" varchar
);
--- the double-quoted "student" here is identical to the HBase table name, so the association is automatic.
- Second way: a view over a Phoenix table
A view can also be created on top of a Phoenix table:
CREATE VIEW my_VIEW (
new_col VARCHAR,
new_col2 VARCHAR
)
AS
SELECT
*
FROM phoenix_Table
WHERE ……
- Third way: a view over another view
A view can also be built on top of an existing view:
CREATE VIEW my_VIEW (
new_col VARCHAR,
new_col2 VARCHAR
)
AS
SELECT
*
FROM phoenix_view
WHERE ……
-- A table created in Phoenix can be written through the native Java API, but then select * returns no rows in Phoenix, while select column_name does return data;
-- creating a view is the workaround.
-- Map the HBase table wb (already created via the hbase shell) into a Phoenix view:
CREATE VIEW wb
(
ID varchar primary key,
C1.NAME varchar,
C1.ADDRESS varchar,
C1.SEX varchar,
C1.PAY_DATE varchar,
C1.NUM_CURRENT varchar,
C1.NUM_USAGE varchar,
C1.TOTAL_MONEY varchar,
C1.RECORD_DATE varchar,
C1.LATEST_DATE varchar
);
-- a view over a Phoenix table
CREATE VIEW my_view ( new_col SMALLINT )
AS SELECT * FROM my_table WHERE k = 100;
-- a view defined over a SQL query
CREATE VIEW my_view_on_view
AS SELECT * FROM my_view WHERE new_col > 70;
3) Creating a view of ORDER_INFO
-- reference statement:
create view if not exists ORDER_INFO_V AS SELECT * FROM ORDER_INFO;
0: jdbc:phoenix:server3:2118> create view if not exists ORDER_INFO_V AS SELECT * FROM ORDER_INFO ;
No rows affected (0.02 seconds)
0: jdbc:phoenix:server3:2118> select * from ORDER_INFO_V;
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------------------------------------+---------+---------+----------+----------+----------------------+------------+
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751 | 已付款 | 9380.0 | 1 | 2993700 | 2020-04-25 12:09:46 | 维修;手机; |
| 2a01dfe5-f5dc-4140-b31b-a6ee27a6e51e | 已付款 | 7490.0 | 2 | 7813118 | 2020-04-25 12:09:06 | 机票;文娱; |
| 31477850-8b15-4f1b-9ec3-939f7dc47241 | 已付款 | 4650.0 | 2 | 5837271 | 2020-04-25 12:08:57 | 机票;文娱; |
...
0: jdbc:phoenix:server3:2118> select * from ORDER_INFO_V LIMIT 1;
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751 | 已付款 | 9380.0 | 1 | 2993700 | 2020-04-25 12:09:46 | 维修;手机; |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.029 seconds)
2. Building a SQL-based query interface over the data
By default, tables created directly in HBase cannot be queried through Phoenix.
To operate on such tables from Phoenix, they must first be mapped into Phoenix.
There are two ways to map them:
- View mapping. Views query more slowly than a mapping table: when a mapping table is created, Phoenix adds some empty key-values to the table, and those empty key-values help speed up queries.
- Table mapping. With a table created via create table, modifications to the mapped table change the source data, and dropping the mapped table drops the source table as well. A view does not behave this way: dropping the view leaves the source data untouched.
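For contrast with the views used below, a table mapping would look roughly like this. This is a sketch assuming a hypothetical HBase table t_user with column family f1 created via the hbase shell; remember that dropping this mapping table drops the underlying HBase table:

```sql
-- Map an existing HBase table as a Phoenix table (not a view).
-- Quoted names must match the HBase table/qualifier names exactly.
CREATE TABLE "t_user" (
    "ROW" VARCHAR PRIMARY KEY,   -- maps the rowkey
    "f1"."name" VARCHAR,
    "f1"."age"  VARCHAR
) COLUMN_ENCODED_BYTES = 0;      -- keep raw column qualifiers so existing data stays readable
```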
1) Creating the MOMO_CHAT:MESSAGE view
The table was already created in article 5 of this series (HBase usage: namespaces, data partitioning, rowkey design, native API access); here we only map it into Phoenix.
-- create the MOMO_CHAT:MESSAGE view
create view if not exists "MOMO_CHAT"."MESSAGE" (
"pk" varchar primary key, -- 指定ROWKEY映射到主键
"C1"."msg_time" varchar,
"C1"."sender_nickyname" varchar,
"C1"."sender_account" varchar,
"C1"."sender_sex" varchar,
"C1"."sender_ip" varchar,
"C1"."sender_os" varchar,
"C1"."sender_phone_type" varchar,
"C1"."sender_network" varchar,
"C1"."sender_gps" varchar,
"C1"."receiver_nickyname" varchar,
"C1"."receiver_ip" varchar,
"C1"."receiver_account" varchar,
"C1"."receiver_os" varchar,
"C1"."receiver_phone_type" varchar,
"C1"."receiver_network" varchar,
"C1"."receiver_gps" varchar,
"C1"."receiver_sex" varchar,
"C1"."msg_type" varchar,
"C1"."distance" varchar,
"C1"."message" varchar
);
# Executing this raised phoenix.ReadOnlyTableException: ERROR 505 (42000): Table is read only
# 1. Add the following configuration to hbase-site.xml and restart HBase
<!-- phoenix.ReadOnlyTableException: ERROR 505 (42000): Table is read only -->
<!-- Ensure that config phoenix.schema.isNamespaceMappingEnabled is consistent on client and server -->
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<property>
<name>phoenix.schema.mapSystemTablesToNamespace</name>
<value>true</value>
</property>
# 2. Also copy hbase-site.xml into Phoenix's bin directory
cp /usr/local/bigdata/hbase-2.1.0/conf/hbase-site.xml /usr/local/bigdata/apache-phoenix-5.0.0-HBase-2.0-bin/bin/
# 3. Re-run the create-view statement; it succeeds.
0: jdbc:phoenix:server1:2118> create view if not exists "MOMO_CHAT"."MESSAGE" (
. . . . . . . . . . . . . . > "pk" varchar primary key, -- map the ROWKEY to the primary key
. . . . . . . . . . . . . . > "C1"."msg_time" varchar,
. . . . . . . . . . . . . . > "C1"."sender_nickyname" varchar,
. . . . . . . . . . . . . . > "C1"."sender_account" varchar,
. . . . . . . . . . . . . . > "C1"."sender_sex" varchar,
. . . . . . . . . . . . . . > "C1"."sender_ip" varchar,
. . . . . . . . . . . . . . > "C1"."sender_os" varchar,
. . . . . . . . . . . . . . > "C1"."sender_phone_type" varchar,
. . . . . . . . . . . . . . > "C1"."sender_network" varchar,
. . . . . . . . . . . . . . > "C1"."sender_gps" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_nickyname" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_ip" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_account" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_os" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_phone_type" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_network" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_gps" varchar,
. . . . . . . . . . . . . . > "C1"."receiver_sex" varchar,
. . . . . . . . . . . . . . > "C1"."msg_type" varchar,
. . . . . . . . . . . . . . > "C1"."distance" varchar,
. . . . . . . . . . . . . . > "C1"."message" varchar
. . . . . . . . . . . . . . > );
No rows affected (7.836 seconds)
0: jdbc:phoenix:server1:2118> !table
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | MULTI_TENANT | VIEW_STATEMENT | VIEW_TYPE | INDEX_TYPE | TRANSACTIONAL | IS_ |
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | false | | | | false | tru |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | false | | | | false | tru |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true | 32 | false | | | | false | tru |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | null | false | | | | false | tru |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | false | | | | false | tru |
| | | ORDER_INFO | TABLE | | | | | | false | 10 | false | | | | false | fal |
| | | ORDER_INFO_V | VIEW | | | | | | false | 10 | false | | UPDATABLE | | false | fal |
| | MOMO_CHAT | MESSAGE | VIEW | | | | | | false | null | false | | MAPPED | | false | tru |
+------------+--------------+---------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----+
0: jdbc:phoenix:server1:2118>
2)、Querying data with SQL
Query historical messages by date, sender account, and receiver account.
-- query the data for the given date
List<Message> message = hbaseNativeChatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");
log.info("message count={}",message.size());
INFO - message count=87
SELECT * FROM "MOMO_CHAT"."MESSAGE" T
WHERE substr("msg_time", 0, 10) = '2022-09-30'
AND T."sender_account" = '13978325309'
AND T."receiver_account" = '15338014470' LIMIT 100;
0: jdbc:phoenix:server1:2118> SELECT * FROM "MOMO_CHAT"."MESSAGE" T
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . > AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . > AND T."receiver_account" = '15338014470' LIMIT 100;
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
| pk | msg_time | sender_nickyname | sender_account | sender_sex | sender_ip | sender_os | sender_phone_type | sender_network | sender_gps | receiver_nickyname | receiver_ip |
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
| 00005903_13978325309_15338014470_1664505396000 | 2022-09-30 10:36:36 | 许兴文 | 13978325309 | 男 | 215.205.47.66 | Android 6.0 | 一加 OnePlus | 4G | 114.058533,36.247553 | 叶忆南 | 178.220.200.51 |
| 2b977e41_13978325309_15338014470_1664504375000 | 2022-09-30 10:19:35 | 伟飞鸿 | 13978325309 | 男 | 151.37.182.209 | IOS 12.0 | Apple iPhone XR | 4G | 120.607971,31.650084 | 瑞高飞 | 104.73.66.176 |
| 591b649a_13978325309_15338014470_1664505936000 | 2022-09-30 10:45:36 | 池涉 | 13978325309 | 男 | 35.252.199.53 | Android 8.0 | 华为 荣耀9X | 5G | 108.392166,23.665665 | 鞠良骏 | 196.75.11
....
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+-------------------+
87 rows selected (6.301 seconds)
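The `substr("msg_time", 0, 10)` predicate keeps only rows whose timestamp string begins with the target date. As a plain-Java illustration of what that filter matches (Phoenix evaluates it server-side; the class and method names here are made up for illustration):

```java
// Illustrative only: what the substr("msg_time", 0, 10) = '2022-09-30'
// predicate matches, expressed with plain Java string operations.
public class DatePrefixFilter {
    // True if the message timestamp falls on the given day (yyyy-MM-dd prefix)
    static boolean onDate(String msgTime, String date) {
        return msgTime.length() >= 10 && msgTime.substring(0, 10).equals(date);
    }

    public static void main(String[] args) {
        System.out.println(onDate("2022-09-30 10:36:36", "2022-09-30")); // true
        System.out.println(onDate("2022-10-01 08:00:00", "2022-09-30")); // false
    }
}
```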
3、Writing the Java code
This part also builds on the previous article, i.e. it implements the same interface as before.
1)、Implementation steps
- Write PhoenixChatMessageService implementing the ChatMessageService interface
- Create the JDBC connection in the constructor
a)、JDBC driver: PhoenixDriver.class.getName()
b)、JDBC connection URL: jdbc:phoenix:server1:2118
- Implement the getMessage query over JDBC
- Close the JDBC connection in the close method
2)、Code
package org.hbase.momo.service.impl;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.apache.phoenix.jdbc.PhoenixDriver;
import org.hbase.momo.bean.Message;
import org.hbase.momo.service.ChatMessageService;
public class PhoenixChatMessageService implements ChatMessageService {
private Connection connection;
public PhoenixChatMessageService() throws Exception {
// 1. Load the Phoenix JDBC driver
Class.forName(PhoenixDriver.class.getName());
// 2. Obtain the JDBC connection
connection = DriverManager.getConnection("jdbc:phoenix:server1:2118");
}
@Override
public List<Message> getMessage(String date, String sender, String receiver) throws Exception {
// 1. The SQL statement
String sql = "select * from \"MOMO_CHAT\".\"MESSAGE\" where substr(\"msg_time\", 0, 10) = ? and \"sender_account\" = ? and \"receiver_account\" = ?";
// 2. Build a PreparedStatement
PreparedStatement preparedStatement = connection.prepareStatement(sql);
// 3. Set the PreparedStatement parameters
preparedStatement.setString(1, date);
preparedStatement.setString(2, sender);
preparedStatement.setString(3, receiver);
// 4. Execute the SQL and get a ResultSet
ResultSet resultSet = preparedStatement.executeQuery();
ArrayList<Message> msgList = new ArrayList<>();
// 5. Iterate the ResultSet and wrap each row in a Message
while (resultSet.next()) {
Message msg = new Message();
msg.setMsg_time(resultSet.getString("msg_time"));
msg.setSender_nickyname(resultSet.getString("sender_nickyname"));
msg.setSender_account(resultSet.getString("sender_account"));
msg.setSender_sex(resultSet.getString("sender_sex"));
msg.setSender_ip(resultSet.getString("sender_ip"));
msg.setSender_os(resultSet.getString("sender_os"));
msg.setSender_phone_type(resultSet.getString("sender_phone_type"));
msg.setSender_network(resultSet.getString("sender_network"));
msg.setSender_gps(resultSet.getString("sender_gps"));
msg.setReceiver_nickyname(resultSet.getString("receiver_nickyname"));
msg.setReceiver_ip(resultSet.getString("receiver_ip"));
msg.setReceiver_account(resultSet.getString("receiver_account"));
msg.setReceiver_os(resultSet.getString("receiver_os"));
msg.setReceiver_phone_type(resultSet.getString("receiver_phone_type"));
msg.setReceiver_network(resultSet.getString("receiver_network"));
msg.setReceiver_gps(resultSet.getString("receiver_gps"));
msg.setReceiver_sex(resultSet.getString("receiver_sex"));
msg.setMsg_type(resultSet.getString("msg_type"));
msg.setDistance(resultSet.getString("distance"));
msg.setMessage(resultSet.getString("message"));
msgList.add(msg);
}
// close resources
resultSet.close();
preparedStatement.close();
return msgList;
}
@Override
public void close() throws Exception {
connection.close();
}
}
3)、测试类
package org.hbase.momo;
import java.util.List;
import org.hbase.momo.bean.Message;
import org.hbase.momo.service.ChatMessageService;
import org.hbase.momo.service.impl.PhoenixChatMessageService;
import org.springframework.util.StopWatch;
import lombok.extern.slf4j.Slf4j;
/**
* @author chenw
*
*/
@Slf4j
public class TestApp {
public static void main(String[] args) throws Exception {
StopWatch clock = new StopWatch();
clock.start(TestApp.class.getSimpleName());
// ChatMessageService chatMessageService = new HBaseNativeChatMessageService();
ChatMessageService chatMessageService = new PhoenixChatMessageService();
// String msg_time, String sender, String receiver
// rowkey format: MD5Hash_senderAccount_receiverAccount_messageTimestamp
//13978325309_15338014470
List<Message> message = chatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");
for (Message msg : message) {
log.info("message:{}", msg);
}
log.info("message count={}",message.size());
// log.info("message:{}", message);
chatMessageService.close();
clock.stop();
log.info(clock.prettyPrint());
}
}
# Phoenix client query time
INFO - message:Message(msg_time=2022-09-30 10:31:56, sender_nickyname=凤茂彦, sender_account=13978325309, sender_sex=女, sender_ip=251.228.196.190, sender_os=IOS 9.0, sender_phone_type=华为 荣耀Play4T, sender_network=4G, sender_gps=99.487874,33.21035, receiver_nickyname=井采, receiver_ip=183.88.173.174, receiver_account=15338014470, receiver_os=IOS 9.0, receiver_phone_type=一加 OnePlus, receiver_network=4G, receiver_gps=96.397128,28.774328 , receiver_sex=男, msg_type=TEXT, distance=60.27KM, message=爱你,却要无欲无求,好难!爱你,却要偷偷摸摸,好累!爱你,却让自己心碎,好惨!但竟然心甘情愿,好傻!!!)
INFO - message count=87
INFO - StopWatch '': running time (millis) = 4149
-----------------------------------------
ms % Task name
-----------------------------------------
04149 100% TestApp
# HBase native client query time
INFO - StopWatch '': running time (millis) = 9737
-----------------------------------------
ms % Task name
-----------------------------------------
09737 100% TestApp
The results show the Phoenix query takes less than half the time of the native HBase query, which took 9737 ms.
If the program fails with the exception shown below, copy the hbase-site.xml configuration above into the resources folder, or set the following properties directly on the Configuration object:
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<property>
<name>phoenix.schema.mapSystemTablesToNamespace</name>
<value>true</value>
</property>
Exception in thread "main" java.sql.SQLException: ERROR 726 (43M10): Inconsistent namespace mapping properties. Cannot initiate connection as SYSTEM:CATALOG is found but client does not have phoenix.schema.isNamespaceMappingEnabled enabled
4、Secondary indexes
See the introduction above for details on the index types.
1)、Index creation syntax
- Global index
CREATE INDEX index_name ON table_name (col1, col2, col3...)
- Local index
CREATE LOCAL INDEX index_name ON table_name (col1, col2, col3...)
- Covered index
CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)
- Functional index
-- create the index
CREATE INDEX UPPER_NAME_IDX ON EMP (UPPER(FIRST_NAME||' '||LAST_NAME))
-- the following query uses the index
SELECT EMP_ID FROM EMP WHERE UPPER(FIRST_NAME||' '||LAST_NAME)='JOHN DOE'
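To make the functional index concrete: UPPER(FIRST_NAME||' '||LAST_NAME) precomputes an upper-cased full name per row, and the index stores rows under that value, so the query above can hit the index directly. A sketch of the same computation in plain Java (class and method names are illustrative, not Phoenix internals):

```java
// Illustrative only: the value the functional index
// UPPER(FIRST_NAME||' '||LAST_NAME) precomputes for each row.
public class FunctionalIndexKey {
    // Mirrors the index expression: concatenate with a space, then upper-case
    static String indexKey(String firstName, String lastName) {
        return (firstName + " " + lastName).toUpperCase();
    }

    public static void main(String[] args) {
        // A row with FIRST_NAME='John', LAST_NAME='Doe' is indexed under
        // 'JOHN DOE', so the WHERE clause above matches it directly.
        System.out.println(indexKey("John", "Doe")); // JOHN DOE
    }
}
```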
2)、Index example 1: global index + covered index
Query the order ID (the RowKey) and the payment amount (MONEY) by user ID (USER_ID).
For example: query the order IDs and payment amounts of paid orders.
Here we can create an index on the USER_ID column to speed up such queries.
- Create the index
-- create the index
create index GBL_IDX_ORDER_INFO on order_info (C1."USER_ID") INCLUDE("ID", C1."MONEY");
# Creating the index may raise the following error
Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=GBL_IDX_ORDER_INFO (state=42Y88,code=1029)
# Fix: add the following configuration to hbase-site.xml, copy the file to the other HBase cluster nodes and to Phoenix's bin directory, then restart HBase
<!-- configuration required for secondary indexes -->
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
Re-create the index; this time it succeeds.
0: jdbc:phoenix:server2:2118> create index GBL_IDX_ORDER_INFO on order_info (C1."USER_ID") INCLUDE("ID", C1."MONEY");
66 rows affected (7.594 seconds)
scan "GBL_IDX_ORDER_INFO", { LIMIT => 1}
In the HBase shell you can see that Phoenix automatically created a table named GBL_IDX_ORDER_INFO. This is the index table.
Its data looks like this:
hbase(main):004:0> scan "GBL_IDX_ORDER_INFO", { LIMIT => 1}
ROW COLUMN+CELL
\x001274270\x003d2254bd-c25a-404f-8e42-2faa4929a629 column=C1:\x00\x00\x00\x00, timestamp=1664524735818, value=x
\x001274270\x003d2254bd-c25a-404f-8e42-2faa4929a629 column=C1:\x80\x0B, timestamp=1664524735818, value=\xC5\x9C@\x01
1 row(s)
Took 0.0135 seconds
# This table's ROWKEY is: \x00 + user ID + \x00 + the original table's ROWKEY; the column family holds the two fields specified in INCLUDE.
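The rowkey layout described above can be sketched in Java. This only models the apparent layout from the scan output, not Phoenix's actual row-key encoder (which also type-encodes column values); the salt byte assumption comes from the table being created with SALT_BUCKETS=10.

```java
import java.nio.charset.StandardCharsets;

// A rough sketch (not Phoenix's real encoder) of the global index rowkey
// seen in the scan: leading byte + indexed USER_ID + 0x00 separator +
// the data table row's original rowkey.
public class IndexRowKeySketch {
    static byte[] indexRowKey(byte salt, String userId, String dataRowKey) {
        byte[] u = userId.getBytes(StandardCharsets.UTF_8);
        byte[] r = dataRowKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[1 + u.length + 1 + r.length];
        key[0] = salt;                                       // leading byte (salted table)
        System.arraycopy(u, 0, key, 1, u.length);            // indexed USER_ID value
        key[1 + u.length] = 0x00;                            // variable-length separator
        System.arraycopy(r, 0, key, 2 + u.length, r.length); // original row's key
        return key;
    }

    public static void main(String[] args) {
        byte[] key = indexRowKey((byte) 0x00, "1274270",
                "3d2254bd-c25a-404f-8e42-2faa4929a629");
        System.out.println(key.length); // 1 + 7 + 1 + 36 = 45 bytes
    }
}
```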
0: jdbc:phoenix:server2:2118> select * from order_info where USER_ID='1274270';
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| ID | STATUS | MONEY | PAY_WAY | USER_ID | OPERATION_TIME | CATEGORY |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
| 3d2254bd-c25a-404f-8e42-2faa4929a629 | 已完成 | 5000.0 | 1 | 1274270 | 2020-04-25 12:08:43 | 男装;男鞋; |
+---------------------------------------+---------+---------+----------+----------+----------------------+-----------+
1 row selected (0.087 seconds)
# query the data
select user_id, id, money from order_info where user_id = '1274270';
0: jdbc:phoenix:server2:2118> select user_id, id, money from order_info where user_id = '1274270';
+----------+---------------------------------------+---------+
| USER_ID | ID | MONEY |
+----------+---------------------------------------+---------+
| 1274270 | 3d2254bd-c25a-404f-8e42-2faa4929a629 | 5000.0 |
+----------+---------------------------------------+---------+
1 row selected (0.041 seconds)
# view the execution plan
explain select user_id, id, money from order_info where user_id = '1274270';
0: jdbc:phoenix:server2:2118> explain select user_id, id, money from order_info where user_id = '1274270';
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN RANGE SCAN OVER GBL_IDX_ORDER_INFO [0,'1274270'] - [9,'1274270'] | null | null | null |
+---------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.027 seconds)
# The PLAN shows a SCAN over GBL_IDX_ORDER_INFO, meaning Phoenix fetched the data directly from the index table.
- Drop the index
Use drop index index_name on table_name
drop index GBL_IDX_ORDER_INFO on ORDER_INFO;
- List indexes
!table
0: jdbc:phoenix:server2:2118> !table
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | MULTI_TENANT | VIEW_STATEMENT | VIEW_TYPE | INDEX_TYPE | TRANSACTIONAL | IS_NAMESPACE_MAPPED |
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
| | | GBL_IDX_ORDER_INFO | INDEX | | | | | ACTIVE | false | 10 | false | | | GLOBAL | false | false |
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | false | | | | false | true |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | false | | | | false | true |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true | 32 | false | | | | false | true |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | null | false | | | | false | true |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | false | | | | false | true |
| | | ORDER_INFO | TABLE | | | | | | false | 10 | false | | | | false | false |
| | | ORDER_INFO_V | VIEW | | | | | | false | 10 | false | | UPDATABLE | | false | false |
| | MOMO_CHAT | MESSAGE | VIEW | | | | | | false | null | false | | MAPPED | | false | true |
+------------+--------------+---------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+-----------------------+
0: jdbc:phoenix:server2:2118>
- Test whether selecting all columns uses the index
explain select * from order_info where user_id = '1274270';
0: jdbc:phoenix:server2:2118> explain select * from order_info where user_id = '1274270';
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO | null | null | null |
| SERVER FILTER BY C1.USER_ID = '1274270' | null | null | null |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.027 seconds)
# The PLAN shows a FULL SCAN: the index was not used, and a full table scan was performed.
- Force the index with a hint
explain select /*+ INDEX(order_info GBL_IDX_ORDER_INFO) */ * from order_info where user_id = '1274270';
0: jdbc:phoenix:server2:2118> explain select /*+ INDEX(order_info GBL_IDX_ORDER_INFO) */ * from order_info where user_id = '1274270';
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO | null | null | null |
| SKIP-SCAN-JOIN TABLE 0 | null | null | null |
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN RANGE SCAN OVER GBL_IDX_ORDER_INFO [0,'1274270'] - [9,'1274270'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY | null | null | null |
| DYNAMIC SERVER FILTER BY "ORDER_INFO.ID" IN ($11.$13) | null | null | null |
+-----------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.042 seconds)
# The execution plan shows Phoenix first scans the global index to find the matching ROWKEYs, then joins back to the full table, i.e. it uses those ROWKEYs to look up rows in ORDER_INFO.
3)、Index example 2: local index
Query orders by order ID, order status, payment amount, payment method, and user ID.
For this scenario, a local index can improve query efficiency.
Note: a table created with SALT_BUCKETS does not support local indexes.
The following steps use the table definition below to verify whether local indexes work with a salted (SALT_BUCKETS) table.
create table if not exists order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
)
COMPRESSION='GZ', SALT_BUCKETS=10;
- Create the local index
create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id) ;
# Creating the local index raises the following error
Error: ERROR 102 (08001): Malformed connection url. ERROR 102 (08001): ERROR 102 (08001): Malformed connection url. :server1:2118,server2:2118,server3:2118:2181:/hbase; LOCAL_IDX_ORDER_INFO (state=08001,code=102)
java.sql.SQLException: ERROR 102 (08001): Malformed connection url. ERROR 102 (08001): ERROR 102 (08001): Malformed connection url. :server1:2118,server2:2118,server3:2118:2181:/hbase; LOCAL_IDX_ORDER_INFO
# Fix
# modify the ZooKeeper settings in hbase-site.xml
from
<!-- ZooKeeper quorum -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>server1:2118,server2:2118,server3:2118</value>
</property>
to
<!-- ZooKeeper quorum -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>server1,server2,server3</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2118</value>
</property>
# Distribute hbase-site.xml across the HBase cluster, copy it into Phoenix's bin directory, then restart HBase and Phoenix
- Re-run the local index creation
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id) ;
66 rows affected (5.938 seconds)
Checking the Web UI, we find no table named LOCAL_IDX_ORDER_INFO was created.
So where is the index data stored? We can find out with the HBase shell:
hbase(main):002:0> scan 'ORDER_INFO' ,{LIMIT => 1}
ROW COLUMN+CELL
\x00\x00\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751\x00\xE5\xB7\xB2\xE4\xBB column=L#0:\x00\x00\x00\x00, timestamp=1664449069580, value=\x00\x00\x00\x00
\x98\xE6\xAC\xBE\x00\xC2^Q\x00\xC1\x02\x002993700
1 row(s)
Took 0.4211 seconds
As you can see, Phoenix has transformed the data: the original rows are now stored differently.
Building a secondary index is somewhat intrusive: Phoenix re-encodes the original data before storing it. You might expect that plain get/put/delete requests could then no longer operate on the data, but when Phoenix creates the view it installs several coprocessors on the table. These coprocessors intercept client requests (get/put/delete/scan) and decode the data, so the data can still be operated on normally.
- Check the execution plans
explain select * from ORDER_INFO WHERE status = '已提交';
explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交';
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO | null | null | null |
| SERVER FILTER BY C1.STATUS = '已提交' | null | null | null |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.057 seconds)
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 10-CHUNK PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER ORDER_INFO | null | null | null |
| SERVER FILTER BY (C1.STATUS = '已提交' AND C1.PAY_WAY = 1) | null | null | null |
+------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.033 seconds)
Both execution plans show a FULL SCAN rather than an index scan: the local index did not take effect.
- Recreate the table
drop table if exists order_info;
create table order_info(
ID varchar primary key,
C1.STATUS varchar,
C1.MONEY float,
C1.PAY_WAY integer,
C1.USER_ID varchar,
C1.OPERATION_TIME varchar,
C1.CATEGORY varchar
)
COMPRESSION='GZ'
SPLIT ON ('3','5','7');
Before recreating the table, you may need to drop its related indexes, views, etc. first.
0: jdbc:phoenix:server2:2118> create table if not exists order_info(
. . . . . . . . . . . . . . > ID varchar primary key,
. . . . . . . . . . . . . . > C1.STATUS varchar,
. . . . . . . . . . . . . . > C1.MONEY float,
. . . . . . . . . . . . . . > C1.PAY_WAY integer,
. . . . . . . . . . . . . . > C1.USER_ID varchar,
. . . . . . . . . . . . . . > C1.OPERATION_TIME varchar,
. . . . . . . . . . . . . . > C1.CATEGORY varchar
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > COMPRESSION='GZ'
. . . . . . . . . . . . . . > SPLIT ON ('3','5','7');
- Recreate the local index
Note: after creating the table, insert the data first and then build the index; otherwise the following exception occurs:
org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason!
This is reportedly caused by a version incompatibility between HBase and Phoenix; this example uses HBase 2.1.0 with Phoenix 5.0.0. Other versions were not verified.
Phoenix's official compatibility note: Current release 5.1.2 can run on Apache HBase 2.1, 2.2, 2.3 and 2.4.
create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
...
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f642b16b-eade-4169-9eeb-4d5f294ec594','已付款',4010,1,'6463215','2020-04-25 12:09:33','男鞋;汽车;');
1 row affected (0.003 seconds)
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
66 rows affected (5.057 seconds)
The following shows inserts before and after index creation: inserts before creating the index succeed, while inserts after it fail with the IndexBuildingFailureException.
The Phoenix release matching HBase 2.1.0 is 5.1.2 (phoenix-hbase-2.1-5.1.2-bin.tar.gz; see the notes before the deployment section); that combination was verified to work.
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183845','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
1 row affected (0.004 seconds)
0: jdbc:phoenix:server2:2118> create local index LOCAL_IDX_ORDER_INFO on ORDER_INFO(id, status, money, pay_way, user_id);
66 rows affected (5.057 seconds)
0: jdbc:phoenix:server2:2118> UPSERT INTO order_info VALUES('f8f3ca6f-2f5c-44fd-9755-1792de183846','已付款',5950,3,'4060214','2020-04-25 12:09:12','机票;文娱;');
22/10/08 09:18:52 WARN client.AsyncRequestFutureImpl: id=1, table=ORDER_INFO, attempt=1/16, failed=1ops, last exception=org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason!
- Check the execution plans again
explain select * from ORDER_INFO WHERE status = '已提交';
explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交';
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER ORDER_INFO [2] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND "STATUS" = '已提交' | null | null | null |
+---------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.024 seconds)
0: jdbc:phoenix:server2:2118> explain select * from ORDER_INFO WHERE status = '已提交' AND pay_way = 1;
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY ROUND ROBIN RANGE SCAN OVER ORDER_INFO [2] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND ("STATUS" = '已提交' AND TO_INTEGER("PAY_WAY") = 1) | null | null | null |
+-------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.022 seconds)
Both execution plans now show a RANGE SCAN: the local index is effective.
- Drop the local index
drop index LOCAL_IDX_ORDER_INFO on ORDER_INFO ;
0: jdbc:phoenix:server2:2118> drop index LOCAL_IDX_ORDER_INFO on ORDER_INFO ;
66 rows affected (0.923 seconds)
- Run the scan again, and you will see the data restored to its original form.
hbase(main):004:0> scan 'ORDER_INFO' , {LIMIT => 1}
ROW COLUMN+CELL
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x00\x00\x00\x00, timestamp=1664449069580, value=x
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0B, timestamp=1664449069580, value=\xE5\xB7\xB2\xE4\xBB\x98\xE6\xAC\xBE
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0C, timestamp=1664449069580, value=\xC6\x12\x90\x01
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0D, timestamp=1664449069580, value=\x80\x00\x00\x01
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0E, timestamp=1664449069580, value=2993700
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x0F, timestamp=1664449069580, value=2020-04-25 12:09:46
\x000f46d542-34cb-4ef4-b7fe-6dcfa5f14751 column=C1:\x80\x10, timestamp=1664449069580, value=\xE7\xBB\xB4\xE4\xBF\xAE;\xE6\x89\x8B\xE6\x9C\xBA;
1 row(s)
Took 0.0300 seconds
4)、Index example 3: local functional index
- Create the local functional index
CREATE LOCAL INDEX LOCAL_IDX_MOMO_MESSAGE ON MOMO_CHAT.MESSAGE(substr("msg_time", 0, 10), "sender_account", "receiver_account");
0: jdbc:phoenix:server2:2118> CREATE LOCAL INDEX LOCAL_IDX_MOMO_MESSAGE ON MOMO_CHAT.MESSAGE(substr("msg_time", 0, 10), "sender_account", "receiver_account");
976,135 rows affected (24.163 seconds)
0: jdbc:phoenix:server2:2118> select count(*) from "MOMO_CHAT"."MESSAGE";
+-----------+
| COUNT(1) |
+-----------+
| 976135 |
+-----------+
1 row selected (0.844 seconds)
- Query the data
Query time without the functional index:
List<Message> message = chatMessageService.getMessage("2022-09-30", "13978325309", "15338014470");
INFO - message count=87
INFO - StopWatch '': running time (millis) = 4149
Query using the functional index:
SELECT * FROM "MOMO_CHAT"."MESSAGE" T
WHERE substr("msg_time", 0, 10) = '2022-09-30'
AND T."sender_account" = '13978325309'
AND T."receiver_account" = '15338014470' LIMIT 100;
0: jdbc:phoenix:server2:2118> SELECT * FROM "MOMO_CHAT"."MESSAGE" T
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . > AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . > AND T."receiver_account" = '15338014470' LIMIT 100;
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+------------------+-------------------+----+
| pk | msg_time | sender_nickyname | sender_account | sender_sex | sender_ip | sender_os | sender_phone_type | sender_network | sender_gps | receiver_nickyname | receiver_ip | receiver_account | re |
+-------------------------------------------------+----------------------+-------------------+-----------------+-------------+------------------+--------------+--------------------+-----------------+-----------------------+---------------------+------------------+-------------------+----+
| 00005903_13978325309_15338014470_1664505396000 | 2022-09-30 10:36:36 | 许兴文 | 13978325309 | 男 | 215.205.47.66 | Android 6.0 | 一加 OnePlus | 4G | 114.058533,36.247553 | 叶忆南 | 178.220.200.51 | 15338014470
...
87 rows selected (0.33 seconds)
0: jdbc:phoenix:server2:2118> explain SELECT * FROM "MOMO_CHAT"."MESSAGE" T
. . . . . . . . . . . . . . > WHERE substr("msg_time", 0, 10) = '2022-09-30'
. . . . . . . . . . . . . . > AND T."sender_account" = '13978325309'
. . . . . . . . . . . . . . > AND T."receiver_account" = '15338014470' LIMIT 100;
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 6-CHUNK 100 ROWS 6700 BYTES SERIAL 6-WAY ROUND ROBIN RANGE SCAN OVER MOMO_CHAT:MESSAGE [1,'2022-09-30','13978325309','15338014470'] | 6700 | 100 | 0 |
| SERVER FILTER BY FIRST KEY ONLY | 6700 | 100 | 0 |
| SERVER 100 ROW LIMIT | 6700 | 100 | 0 |
| CLIENT 100 ROW LIMIT | 6700 | 100 | 0 |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.06 seconds)
As the output shows, the query is fast: 87 rows come back in 0.33 seconds, and the EXPLAIN plan confirms a range scan over the functional index rather than a full table scan.
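The functional index queried above precomputes substr("msg_time", 0, 10), i.e. the first ten characters of the timestamp string (the date part), so the date filter in the WHERE clause becomes an exact match on an index key. Note that Phoenix's SUBSTR treats a start position of 0 the same as 1. A minimal Java sketch of the value the index stores (class and method names are ours, for illustration only):

```java
public class DatePrefix {
    // Equivalent of Phoenix substr("msg_time", 0, 10): take the first
    // 10 characters of the timestamp string, i.e. the date part.
    static String datePrefix(String msgTime) {
        return msgTime.substring(0, 10);
    }

    public static void main(String[] args) {
        // The sample row from the query output above
        System.out.println(datePrefix("2022-09-30 10:36:36")); // prints 2022-09-30
    }
}
```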
五、Data partitioning examples
1、The rowkey is an integer
create table test1(
rk integer primary key,
C1.id varchar,
C1.name varchar
)
COMPRESSION='GZ'
SPLIT ON (10,20,30,40);
Region 1 : rowkey in [min, 10)
Region 2 : rowkey in [10, 20)
Region 3 : rowkey in [20, 30)
Region 4 : rowkey in [30, 40)
Region 5 : rowkey in [40, max)
--region 1
UPSERT INTO test1 VALUES(9,'a1','alan2452');
--region 2
UPSERT INTO test1 VALUES(10,'a1','alan2452');
UPSERT INTO test1 VALUES(11,'a1','alan2452');
UPSERT INTO test1 VALUES(19,'a1','alan2452');
--region 3
UPSERT INTO test1 VALUES(20,'a1','alan2452');
UPSERT INTO test1 VALUES(21,'a1','alan2452');
UPSERT INTO test1 VALUES(29,'a1','alan2452');
--region 4
UPSERT INTO test1 VALUES(30,'a1','alan2452');
UPSERT INTO test1 VALUES(31,'a1','alan2452');
UPSERT INTO test1 VALUES(39,'a1','alan2452');
--region 5
UPSERT INTO test1 VALUES(40,'a1','alan2452');
UPSERT INTO test1 VALUES(41,'a1','alan2452');
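With split points (10, 20, 30, 40), each row is routed to the first region whose end key is greater than the rowkey (Phoenix encodes INTEGER rowkeys so that byte order matches numeric order, so the routing follows plain numeric comparison). A minimal sketch of that routing logic, using the split points from the example above (the class and helper names are ours; regions here are 0-based, i.e. index 0 corresponds to "Region 1"):

```java
import java.util.Arrays;

public class IntRegionLocator {
    static final int[] SPLITS = {10, 20, 30, 40}; // from SPLIT ON (10,20,30,40)

    // Returns the 0-based region index holding the given rowkey:
    // index 0 = [min, 10), index 1 = [10, 20), ..., index 4 = [40, max)
    static int regionFor(int rowkey) {
        int i = Arrays.binarySearch(SPLITS, rowkey);
        // i >= 0: rowkey equals a split point, which starts the next region;
        // i < 0 : -(i + 1) is the insertion point, i.e. the region index
        return i >= 0 ? i + 1 : -(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(regionFor(9));  // prints 0 ("Region 1")
        System.out.println(regionFor(10)); // prints 1 (a split point starts its region)
        System.out.println(regionFor(41)); // prints 4 ("Region 5")
    }
}
```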
2、The rowkey is a string
create table test(
rk varchar primary key,
C1.id varchar,
C1.name varchar
)
COMPRESSION='GZ'
SPLIT ON ('aa','bb','cc');
--region 0
UPSERT INTO test VALUES('a1','a1','alan2452');
--region 1
UPSERT INTO test VALUES('aa1','aa1','alan2452');
UPSERT INTO test VALUES('ab1','ab1','alan2452');
UPSERT INTO test VALUES('ba1','ba1','alan2452');
UPSERT INTO test VALUES('b1','b1','alan2452');
--region 2
UPSERT INTO test VALUES('bb1','bb1','alan2452');
UPSERT INTO test VALUES('bc1','bc1','alan2452');
UPSERT INTO test VALUES('c1','c1','alan2452');
UPSERT INTO test VALUES('cb','cb','alan2452');
--region 3
UPSERT INTO test VALUES('cc','cc','alan2452');
UPSERT INTO test VALUES('cc1','cc1','alan2452');
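For VARCHAR rowkeys the routing works the same way, but the comparison is lexicographic on the key bytes. That is why 'b1' lands in the region bounded by 'aa' and 'bb': 'b' sorts after 'aa', and 'b1' sorts before 'bb' because '1' < 'b'. A sketch using the split points from the example above (the class and helper names are ours, for illustration):

```java
public class StrRegionLocator {
    static final String[] SPLITS = {"aa", "bb", "cc"}; // from SPLIT ON ('aa','bb','cc')

    // 0-based region index: 0 = keys < 'aa', 1 = ['aa','bb'),
    // 2 = ['bb','cc'), 3 = keys >= 'cc'
    static int regionFor(String rowkey) {
        int r = 0;
        for (String split : SPLITS) {
            if (rowkey.compareTo(split) < 0) break; // lexicographic compare
            r++;
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(regionFor("a1"));  // prints 0 ('a1' < 'aa' since '1' < 'a')
        System.out.println(regionFor("b1"));  // prints 1 ('aa' <= 'b1' < 'bb')
        System.out.println(regionFor("cc1")); // prints 3 ('cc1' >= 'cc')
    }
}
```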