Phoenix二级索引那些事儿(下)

http://www.icaijing.com/hot/article4940159/


Phoenix二级索引那些事儿(下)

作者:中兴大数据 |  发表时间:2015-7-30 03:31:18
Phoenix二级索引那些事儿(下)-爱财经网

索引配置


公共配置

hbase-site.xml





hbase.regionserver.wal.codec
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec






全局索引配置

hbase-site.xml配置项

支持(HBase0.98.4+ and Phoenix 4.3.1+ only)




hbase.region.server.rpc.scheduler.factory.class
org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory
Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates


hbase.rpc.controllerfactory.class
org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory
Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates




局部索引配置

hbase-site.xml配置项




hbase.master.loadbalancer.class
org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer



hbase.coprocessor.master.classes
org.apache.phoenix.hbase.index.master.IndexMasterObserver




hbase.coprocessor.regionserver.classes
org.apache.hadoop.hbase.regionserver.LocalIndexMerger




索引配置调优

hbase-site.xml配置项



index.builder.threads.max
Default: 10

  • 根据主表更新建立索引表更新的线程数目
  • 调高这个值可以克服读取Region的row state的瓶颈,如果调的太高,HRegion又会遇到处理太多并发scan requests的瓶颈以及 generalthread-swapping 障碍.


index.builder.threads.keepalivetime
Default: 60

  • index builder线程池里的线程过期之后存活的时间
  • 超过这个存活时间,未使用会立马被释放,核心线程是不会被保留的.如果从负载角度考虑,是可以手动去释放线程


index.writer.threads.max
Default: 10

  • 将index update写入index table的线程数目
  • 应该大致对应index table的数目


index.writer.threads.keepalivetime
Default: 60

  • 类似index.builder.threads.keepalivetime


hbase.htable.threads.max
Default: 2,147,483,647

  • 索引表可以使用的写线程最大数目
  • 增加这个值会提高索引更新的并发量,提升全局吞吐


hbase.htable.threads.keepalivetime
Default: 60

  • 类似index.builder.threads.keepalivetime


index.tablefactory.cache.size
Default: 10

  • 放入缓存的索引表的数量
  • 增加这个值,可以确保写index时不需要重新创建indexHTable,但是值越大,memory压力越大.


org.apache.phoenix.regionserver.index.priority.min
Default: 1000

  • Value to specify to bottom (inclusive) of therange in which index priority may lie.


org.apache.phoenix.regionserver.index.priority.max
Default: 1050

  • Value to specify to top (exclusive) of therange in which index priority may lie.
  • Higher priorites within the index min/max rangedo not means updates are processed sooner.


org.apache.phoenix.regionserver.index.handler.count
Default: 30

  • Number of threads to use when serving indexwrite requests for global index maintenance.
  • Though the actual number of threads is dictatedby the Max(number of call queues, handler count), where the number of callqueues is determined by standard HBase configuration. To further tune thequeues, you can adjust the standard rpc queue length parameters (currently,there are no special knobs for the index queues), specificallyipc.server.max.callqueue.length and ipc.server.callqueue.handler.factor. Seethe HBase Reference Guide for more details.


其它功能


Phoenix子查询

IN和Not In的子查询
例子

SELECT ItemName
FROM Items 
WHERE ItemID IN 
(SELECT ItemID
FROM Orders
WHERE Date >= to_date('2013/09/02'));


Exists和Not Exists的子查询
例子

SELECT ItemName
FROM Items i
WHERE EXISTS
(SELECT *
FROM Orders
WHERE Date >= to_date('2013/09/02')
AND ItemID = i.ItemID);


半连接、反连接、join
例子

SELECTd.dept_id,e.dept_id,e.name FROM DEPT d JOIN EMPL e ON e.dept_id = d.dept_id;
JOIN支持:
INNER
LEFT OUTER
RIGHT


比较运算
例子

SELECT ID, Name
FROM Contest
WHERE Score >
(SELECT avg(Score)
FROM Contest)
ORDER BY ScoreDESC;


ANY/SOME/ALL运算

例子

SELECT OrderID
FROM Orders
WHERE quantity>= ANY
(SELECT max(quantity)
FROM Orders
GROUP BY ItemID);


相关子查询
例子

SELECT PatentID,Title
FROM Patents p
WHERE FileDate ,   LIMIT  
SELECT title,author, isbn, description
FROM library
WHEREpublished_date > 2010
AND (title, author,isbn) > (?, ?, ?)
ORDER BY title,author, isbn
LIMIT 20


Phoenix 统计收集

统计收集有助于提升query性能。



命令:
UPDATE STATISTICSmy_table


等效于
UPDATE STATISTICSmy_table ALL


如果只收集index或者column
UPDATE STATISTICSmy_table INDEX
UPDATE STATISTICSmy_table COLUMNS


参数配置
phoenix.stats.guidepost.width默认104857600 
phoenix.stats.guidepost.per.region
phoenix.stats.updateFrequency默认900000 (15 mins)
phoenix.stats.minUpdateFrequency默认7.5 mins
phoenix.stats.useCurrentTime 默认true


Phoenix 常用语法

Commands

Phoenix二级索引那些事儿(下)-爱财经网



Other Grammar
Phoenix二级索引那些事儿(下)-爱财经网



Phoenix 常用函数

Aggregate Functions

Phoenix二级索引那些事儿(下)-爱财经网



String Functions

Phoenix二级索引那些事儿(下)-爱财经网



Time and Date Functions

Phoenix二级索引那些事儿(下)-爱财经网



Numeric Functions

Phoenix二级索引那些事儿(下)-爱财经网



Array Functions

Phoenix二级索引那些事儿(下)-爱财经网



Other Functions

Phoenix二级索引那些事儿(下)-爱财经网



Phoenix 数据类型

Phoenix二级索引那些事儿(下)-爱财经网



Phoenix 使用任意时间戳

在Property里面设置属性 \"CurrentSCN\"。ts是一个long。
Properties props = new Properties();
props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
Connection conn = DriverManager.connect(myUrl, props);
conn.createStatement().execute(\"UPSERT INTO myTable VALUES ('a')\");
conn.commit();


相当于:
myTable.put(Bytes.toBytes('a'),ts);


Phoenix 性能提升

1、加盐:

加盐可以将数据存入多个region里,从而提升读写性能。


CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=42


如果有16台region server,每台server有4核CPU,则SALT_BUCKETS 设置为32~64之间。即如果集群总的CPU核数为N,则SALT_BUCKETS为 0.5N ~ N 之间。


加盐后的注意事项:

  • sequential scan 返回的结果可能不是自然排序的,如果sequential scan使用了LIMIT语句,将与不加盐的情况不一样。
  • Spit point:If no split points are specified for the table, the salted table would be pre-split on salt bytes boundaries to ensure load distribution among region servers even during the initial phase of the table. If users are to provide split points manually, users need to include a salt byte in the split points they provide.
  • Row Key 排序:Pre-spliting also ensures that all entries in the region server all starts with the same salt byte, and therefore are stored in a sorted manner. When doing a parallel scan across all region servers, we can take advantage of this properties to perform a merge sort of the client side. The resulting scan would still be return sequentially as if it is from a normal table。实际上是改写了Row Key,添加了一个prefix:new_row_key = (++index % BUCKETS_NUMBER) + original_key。数据存储到 Buckects_Number 个Bucket中,每个Bucket的Prefix 相同,在query的时候,同时在各个Bucket进行。


2、split
如果不想通过加盐来分区,可以自己手动设置分区的方法。这样可以不引入额外的byte,或者改变row key的顺序,例子
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')


3、使用多个列族
CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)


4、使用压缩
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ'


5、使用二级索引


6、优化集群


7、优化phoenix 参数


Phoenix 跳过SCAN

The List for SkipScanFilter forthe above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ] where [ [ a - b], [ d - e ] ] is the range for KEY1and [ 1, 2 ] keys for KEY2.



例子
SELECT * from T

WHERE ((KEY1>='a' AND KEY1  'c' AND KEY1
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值