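Indexes
Indexes are a standard database feature. Hive has supported them since version 0.7 (note that index support was removed again in Hive 3.0).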
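Pros and cons of Hive indexes
Pros: avoids full table scans or reduces the amount of data scanned, which improves query performance
Cons: the index is redundant storage, and building it adds time to data loading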
Index characteristics
The index file itself is sorted, and it is relatively small
Testing an index
1. Create the table
create external table if not exists log1(
id string COMMENT 'this is id column',
phonenumber bigint,
mac string,
ip string,
url string,
title string,
upflow int,
downflow int
)
row format delimited fields terminated by '\t'
lines terminated by '\n';
2. Load the data
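The original notes give no statement for this step. A minimal sketch, assuming a tab-delimited file on the local filesystem; the path /path/to/log1.txt is a hypothetical placeholder, not something from the original:

```sql
-- Assumption: /path/to/log1.txt is a placeholder for a local, tab-delimited log file
-- whose columns match the log1 table definition.
load data local inpath '/path/to/log1.txt'
into table log1;

-- Quick sanity check that rows were loaded.
select * from log1 limit 5;
```

Drop the local keyword if the file already sits in HDFS.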
3. Run a query and note how long it takes
select
count(l.phonenumber)
from log1 l
group by l.phonenumber;
4. Create a compact index
create index idx_log1_pho
on table log1(phonenumber)
as 'compact'
with deferred rebuild
;
-- list the indexes on the table
show index on log1;
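Because the index is created with deferred rebuild, it stays empty until it is rebuilt explicitly. A rough sketch of the follow-up steps, assuming the table and index names used above and a Hive version that still supports indexes (before 3.0):

```sql
-- Populate the index; with "with deferred rebuild" this step is required
-- before the index holds any data.
alter index idx_log1_pho on log1 rebuild;

-- Re-run the query from step 3 and compare the reported time.
select
count(l.phonenumber)
from log1 l
group by l.phonenumber;
```

Whether the optimizer actually uses the index depends on the Hive version and configuration. A compact index generally helps most with queries that filter on the indexed column, so a plain group by may show little or no difference.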