HiveQL: Indexes

hwaholee

Article link: https://blog.csdn.net/u013667492/article/details/55209999

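When logical partitions would be too numerous and too fine-grained to be practical, building an index becomes an alternative to partitioning. An index can help prune some of a table's data blocks, which reduces the amount of input data that MapReduce has to read.

Creating an index

First, create an employees table: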

hive> create table employees(
name          string,
salary        float,
subordinates  array<string>,
address       struct<street:string,city:string,state:string,zip:int>
)
partitioned by (country string,state string);

Next, we create an index on the partition column country only:

hive> create index employees_index
on table employees(country)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
idxproperties ('creator'='me','created_at'='2017-2-13')
in table employees_index_table
partitioned by (country,name)
comment 'Employees indexed by country and name.';

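The as ... clause specifies the index handler, that is, a Java class that implements the index interface. Because the index is created with deferred rebuild, it starts out empty and has to be populated later with alter index ... rebuild.

Bitmap indexes

Bitmap indexes are commonly used for columns with a small number of distinct values: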

hive> create index employees_index
on table employees (country)
as 'Bitmap'
with deferred rebuild
idxproperties ('creator'='me','created_at'='2017-02-13')
in table employees_index_table
partitioned by (country,name)
comment 'Employees indexed by country and name.';

Rebuilding an index

Use alter index to rebuild an index (if the rebuild fails, the index is left in the state it was in before the rebuild started):

hive> alter index employees_index
on table employees
partition (country='US')
rebuild;
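
The partition clause in the statement above is optional; leaving it out rebuilds the index for all partitions. The following is a minimal sketch rather than a definitive recipe. It assumes the employees table and employees_index defined earlier, and a Hive release that still supports indexes (the feature was removed in Hive 3.0). The hive.optimize.index.filter setting and the select query are not part of the original example. They are included only to illustrate how a filter on the indexed column might benefit once the index has been populated. Whether the optimizer actually uses the index depends on the Hive version and configuration.

```sql
-- Rebuild employees_index for every partition (no PARTITION clause).
alter index employees_index
on table employees
rebuild;

-- Assumption: let the optimizer use indexes for filters (off by default).
set hive.optimize.index.filter=true;

-- Illustrative query that filters on the indexed column.
select name, salary
from employees
where country = 'US';
```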

Showing indexes

hive> show formatted index on employees;
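
The formatted keyword is optional; it adds column titles to the output. As a small sketch of the variants, assuming the same employees table, the statements below list the indexes without headers and use indexes, which Hive accepts as a synonym for index:

```sql
-- Same listing, without column titles.
show index on employees;

-- INDEXES can be used in place of INDEX.
show formatted indexes on employees;
```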

Dropping an index

hive> drop index if exists employees_index on table employees;

My original post on OSChina:
https://my.oschina.net/lonelycode/blog/837420
