18 Hive Indexes

莹火虫的另一半

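Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/woshilovetg/article/details/113065080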

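Hive Indexes

To use any of the indexes below, the global index switch must be turned on:

hive.optimize.index.filter

Hive has three kinds of indexes:

1. Original index (obsolete, no longer used)

2. Row Group Index

3. Bloom Filter Index

Note: the latter two indexes apply only to files in ORC format.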

I. Hive Original Index

Generally not used.

It was removed in Hive 3.0.

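II. Row Group Index

The row group index is mainly used for conditional queries (=, <, >) on numeric columns, for example int or timestamp types.

Note: for the Row Group Index to be used effectively, the data must be sorted on the indexed column when it is loaded into the table.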
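The reason sorting matters can be illustrated with a small simulation. This is only a sketch of the general min/max pruning idea, not ORC's actual reader code: the helper names `build_row_group_stats` and `groups_to_read` are made up for illustration, and the group size of 10,000 rows is assumed to match ORC's default `orc.row.index.stride`.

```python
ROWS_PER_GROUP = 10_000  # assumed to match ORC's default orc.row.index.stride


def build_row_group_stats(values, rows_per_group=ROWS_PER_GROUP):
    """Split values into consecutive row groups and record (min, max) per group."""
    stats = []
    for start in range(0, len(values), rows_per_group):
        group = values[start:start + rows_per_group]
        stats.append((min(group), max(group)))
    return stats


def groups_to_read(stats, low, high):
    """Count row groups whose [min, max] range may hold a value v with low < v <= high."""
    return sum(1 for gmin, gmax in stats if gmax > low and gmin <= high)


ids = list(range(100_000))

# Sorted layout: each row group covers a narrow, non-overlapping id range.
sorted_stats = build_row_group_stats(ids)

# Unsorted layout (hypothetical): ids are interleaved so every row group
# spans almost the whole id range.
interleaved = [i for offset in range(10) for i in range(offset, 100_000, 10)]
interleaved_stats = build_row_group_stats(interleaved)

# Predicate: id > 5 and id <= 10
print("sorted:     ", groups_to_read(sorted_stats, 5, 10), "of", len(sorted_stats))
print("interleaved:", groups_to_read(interleaved_stats, 5, 10), "of", len(interleaved_stats))
```

With sorted data only the first row group can satisfy the predicate, so the other nine can be skipped. With the interleaved layout every group's min/max range overlaps the predicate, so nothing can be skipped even though the index exists.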

How to enable it

1. At table creation

'orc.create.index'='true'

create table test
(id int, pid int)
stored AS ORC
tblproperties ('orc.compress'='SNAPPY', 'orc.create.index'='true');

2. At insert time

insert into test
select * from test123 cluster by id;
-- equivalent to: distribute by id sort by id

Using the index

set hive.optimize.index.filter=true;
select * from test where id > 5 and id <= 10;

III. Bloom Filter Index

Purpose

Any equality condition can use this index, including conditions on String and Int columns.

How to enable it

At table creation

'orc.create.index'='true',

'orc.bloom.filter.columns'='pid'

To index multiple columns, list them separated by ASCII commas:

'orc.create.index'='true',

'orc.bloom.filter.columns'='pid,id,name'

create table test
(id int, pid int)
stored AS ORC
tblproperties ('orc.compress'='SNAPPY',
               'orc.create.index'='true',
               'orc.bloom.filter.columns'='pid'
              );

Using the index

IN is also treated as an equality condition.

select * from test where pid=50 or pid in (100, 20);
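How a Bloom filter can serve a query like the one above is sketched below. This is a toy model rather than ORC's implementation: the `BloomFilter` class, its bit count, the SHA-256 based hashing, and the three sample row groups are all assumptions made for illustration. The idea is that each row group keeps a filter over its pid values, and a group can be skipped only when none of the literals (50, 100, 20) might be present in it.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; ORC's real hashing and sizing differ."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def row_groups_to_read(filters, literals):
    """Indexes of row groups where at least one literal might be present."""
    return [i for i, bf in enumerate(filters)
            if any(bf.might_contain(v) for v in literals)]


# Three hypothetical row groups of pid values.
row_groups = [[20, 21, 22], [50, 51, 52], [300, 301, 302]]
filters = []
for rows in row_groups:
    bf = BloomFilter()
    for pid in rows:
        bf.add(pid)
    filters.append(bf)

# pid = 50 or pid in (100, 20) is treated as a probe for each of 50, 100, 20.
selected = row_groups_to_read(filters, [50, 100, 20])
print(selected)
```

Row groups 0 and 1 are always selected because a Bloom filter never reports a stored value as absent. Row group 2 is normally skipped, although a false positive is possible in principle, in which case the group is read and simply yields no matching rows.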
