Hive之索引

最新推荐文章于 2024-06-27 22:44:53 发布

凤无痕

最新推荐文章于 2024-06-27 22:44:53 发布

阅读量5.3k

点赞数 3

本文链接：https://blog.csdn.net/weixin_38942735/article/details/108827796

版权

Hive只有有限的索引功能，没有普通关系型数据库中键的概念，但是可以对一些字段来建立索引，加速某些操作，一张表的索引数据存储在另外一张表中。维护索引需要额外的存储空间，同时创建索引也需要消耗计算量。
1、创建索引：

CREATE TABLE employees(
	name string,
	salary float,
	subordinates ARRAY<string>,
	deductions MAP<string,float>,
	address struct<street:STRING, city:STRING, state:STRING, zip:INT>
)
partition by(country string,state string);
#对上表的分区字段country建立索引
CREATE TABLE employees_index
on table employees(country)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH deferred rebuild
idxproperties('create ='me','create_at'='some_time')
in table employees_index_table
partition by(country,name)
comment 'employees indexed by country and name.';

上面这种情况，我们所创建的索引表没有使用同原表一样的分区划分。如果省略partition by语句的话，那么索引将会包含原始表所有分区。
AS…语句指定了索引处理器，这里的CompactIndexHandler是其中的一个实现。
WITH deferred rebuild

2、Bitmap索引
bitmap索引普遍用于重排后值较小的列，下面是对前面例子使用bitmap索引处理器重写后的语句：

Create index employees_index
on table employees(country)
AS 'BITMAP'
with deferred rebuild
idxproperties('create ='me','create_at'='some_time')
in table employees_index_table
partition by(country,name)
comment 'employees indexed by country and name.';

3、重建索引

ALTER　INDEX employees_index
on TABLE employees
Partition(country='US')
rebuild;

4、显示索引

show formatted index on employees;

5、删除索引

DROP INDEX IF EXISTS employees_index on TABLE employees;

凤无痕

关注

3
点赞
踩
7

收藏

觉得还不错? 一键收藏
0
评论
Hive之索引

Hive只有有限的索引功能，没有普通关系型数据库中键的概念，但是可以对一些字段来建立索引，加速某些操作，一张表的索引数据存储在另外一张表中。维护索引需要额外的存储空间，同时创建索引也需要消耗计算量。1、创建索引：CREATE TABLE employees( name string, salary float, subordinates ARRAY<string>, deductions MAP<string,float>, address struct<stre
复制链接

扫一扫