Phoenix二级索引那些事儿（下）

最新推荐文章于 2022-07-20 13:53:56 发布

冥想者-定

最新推荐文章于 2022-07-20 13:53:56 发布

阅读量1.7k

点赞数

分类专栏： __HBase

__HBase 专栏收录该内容

23 篇文章 1 订阅

订阅专栏

http://www.icaijing.com/hot/article4940159/

Phoenix二级索引那些事儿（下）

作者：中兴大数据 | 发表时间：2015-7-30 03:31:18

索引配置

公共配置

hbase-site.xml

hbase.regionserver.wal.codec
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec

全局索引配置

hbase-site.xml配置项

支持（HBase0.98.4+ and Phoenix 4.3.1+ only）

hbase.region.server.rpc.scheduler.factory.class
org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory
Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates

hbase.rpc.controllerfactory.class
org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory
Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates

局部索引配置

hbase-site.xml配置项

hbase.master.loadbalancer.class
org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer

hbase.coprocessor.master.classes
org.apache.phoenix.hbase.index.master.IndexMasterObserver

hbase.coprocessor.regionserver.classes
org.apache.hadoop.hbase.regionserver.LocalIndexMerger

索引配置调优

hbase-site.xml配置项

index.builder.threads.max
Default: 10

根据主表更新建立索引表更新的线程数目
调高这个值可以克服读取Region的row state的瓶颈，如果调的太高，HRegion又会遇到处理太多并发scan requests的瓶颈以及 generalthread-swapping 障碍.

index.builder.threads.keepalivetime
Default: 60

index builder线程池里的线程过期之后存活的时间
超过这个存活时间，未使用会立马被释放，核心线程是不会被保留的．如果从负载角度考虑，是可以手动去释放线程

index.writer.threads.max
Default: 10

将index update写入index table的线程数目
应该大致对应index table的数目

index.writer.threads.keepalivetime
Default: 60

类似index.builder.threads.keepalivetime

hbase.htable.threads.max
Default: 2,147,483,647

索引表可以使用的写线程最大数目
增加这个值会提高索引更新的并发量，提升全局吞吐

hbase.htable.threads.keepalivetime
Default: 60

类似index.builder.threads.keepalivetime

index.tablefactory.cache.size
Default: 10

放入缓存的索引表的数量
增加这个值，可以确保写index时不需要重新创建indexHTable，但是值越大，memory压力越大．

org.apache.phoenix.regionserver.index.priority.min
Default: 1000

Value to specify to bottom (inclusive) of therange in which index priority may lie.

org.apache.phoenix.regionserver.index.priority.max
Default: 1050

Value to specify to top (exclusive) of therange in which index priority may lie.
Higher priorites within the index min/max rangedo not means updates are processed sooner.

org.apache.phoenix.regionserver.index.handler.count
Default: 30

Number of threads to use when serving indexwrite requests for global index maintenance.
Though the actual number of threads is dictatedby the Max(number of call queues, handler count), where the number of callqueues is determined by standard HBase configuration. To further tune thequeues, you can adjust the standard rpc queue length parameters (currently,there are no special knobs for the index queues), specificallyipc.server.max.callqueue.length and ipc.server.callqueue.handler.factor. Seethe HBase Reference Guide for more details.

其它功能

Phoenix子查询

IN和Not In的子查询
例子

SELECT ItemName
FROM Items
WHERE ItemID IN
(SELECT ItemID
FROM Orders
WHERE Date >= to_date('2013/09/02'));

Exists和Not Exists的子查询
例子

SELECT ItemName
FROM Items i
WHERE EXISTS
(SELECT *
FROM Orders
WHERE Date >= to_date('2013/09/02')
AND ItemID = i.ItemID);

半连接、反连接、join
例子

SELECTd.dept_id,e.dept_id,e.name FROM DEPT d JOIN EMPL e ON e.dept_id = d.dept_id;
JOIN支持：
INNER
LEFT OUTER
RIGHT

比较运算
例子

SELECT ID, Name
FROM Contest
WHERE Score >
(SELECT avg(Score)
FROM Contest)
ORDER BY ScoreDESC;

ANY/SOME/ALL运算

例子

SELECT OrderID
FROM Orders
WHERE quantity>= ANY
(SELECT max(quantity)
FROM Orders
GROUP BY ItemID);

相关子查询
例子

SELECT PatentID,Title
FROM Patents p
WHERE FileDate , LIMIT
SELECT title,author, isbn, description
FROM library
WHEREpublished_date > 2010
AND (title, author,isbn) > (?, ?, ?)
ORDER BY title,author, isbn
LIMIT 20

Phoenix 统计收集

统计收集有助于提升query性能。

命令：
UPDATE STATISTICSmy_table

等效于
UPDATE STATISTICSmy_table ALL

如果只收集index或者column
UPDATE STATISTICSmy_table INDEX
UPDATE STATISTICSmy_table COLUMNS

参数配置
phoenix.stats.guidepost.width默认104857600
phoenix.stats.guidepost.per.region
phoenix.stats.updateFrequency默认900000 (15 mins)
phoenix.stats.minUpdateFrequency默认7.5 mins
phoenix.stats.useCurrentTime 默认true

Phoenix 常用语法

Commands

Phoenix二级索引那些事儿（下）-爱财经网

Other Grammar
Phoenix二级索引那些事儿（下）-爱财经网

Phoenix 常用函数

Aggregate Functions

Phoenix二级索引那些事儿（下）-爱财经网

String Functions

Phoenix二级索引那些事儿（下）-爱财经网

Time and Date Functions

Phoenix二级索引那些事儿（下）-爱财经网

Numeric Functions

Array Functions

Other Functions

Phoenix 数据类型

Phoenix二级索引那些事儿（下）-爱财经网

Phoenix 使用任意时间戳

在Property里面设置属性 \"CurrentSCN\"。ts是一个long。
Properties props = new Properties();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
Connection conn = DriverManager.connect(myUrl, props);
conn.createStatement().execute(\"UPSERT INTO myTable VALUES ('a')\");
conn.commit();

相当于:
myTable.put(Bytes.toBytes('a'),ts);

Phoenix 性能提升

1、加盐：

加盐可以将数据存入多个region里，从而提升读写性能。

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=42

如果有16台region server，每台server有4核CPU，则SALT_BUCKETS 设置为32~64之间。即如果集群总的CPU核数为N，则SALT_BUCKETS为 0.5N ~ N 之间。

加盐后的注意事项：

sequential scan 返回的结果可能不是自然排序的，如果sequential scan使用了LIMIT语句，将与不加盐的情况不一样。
Spit point：If no split points are specified for the table, the salted table would be pre-split on salt bytes boundaries to ensure load distribution among region servers even during the initial phase of the table. If users are to provide split points manually, users need to include a salt byte in the split points they provide.
Row Key 排序：Pre-spliting also ensures that all entries in the region server all starts with the same salt byte, and therefore are stored in a sorted manner. When doing a parallel scan across all region servers, we can take advantage of this properties to perform a merge sort of the client side. The resulting scan would still be return sequentially as if it is from a normal table。实际上是改写了Row Key，添加了一个prefix：new_row_key = (++index % BUCKETS_NUMBER) + original_key。数据存储到 Buckects_Number 个Bucket中，每个Bucket的Prefix 相同，在query的时候，同时在各个Bucket进行。

2、split
如果不想通过加盐来分区，可以自己手动设置分区的方法。这样可以不引入额外的byte，或者改变row key的顺序，例子
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

3、使用多个列族
CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

4、使用压缩
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ'

5、使用二级索引

6、优化集群

7、优化phoenix 参数

Phoenix 跳过SCAN

The List for SkipScanFilter forthe above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ] where [ [ a - b], [ d - e ] ] is the range for KEY1and [ 1, 2 ] keys for KEY2.

例子
SELECT * from T

WHERE ((KEY1>='a' AND KEY1 'c' AND KEY1

中兴大数据, 微软, 例子, 中兴

冥想者-定

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
1
评论
Phoenix二级索引那些事儿（下）

http://www.icaijing.com/hot/article4940159/Phoenix二级索引那些事儿（下）作者：中兴大数据| 发表时间：2015-7-30 03:31:18索引配置公共配置hbase-site.xmlhbase.regionserv
复制链接

扫一扫

专栏目录