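Hive's index support is limited. The index data for a table is stored in a separate table. You can run explain on a query to check whether it actually uses an index.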
Create an index on the partitioned table china_partition:
hive> create index china_partition_index on table china_partition(provinceid) as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler" with deferred rebuild in table china_partition_index_table;
OK
Time taken: 0.446 seconds
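Query the index table (because the index was created with deferred rebuild, this table stays empty until the index is rebuilt):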
hive> select * from china_partition_index_table;
Rebuild the index:
hive> alter index china_partition_index on china_partition rebuild;
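Once the index has been rebuilt, explain can be used to check whether a query picks it up. The following is a minimal sketch, not a verified session: it assumes the automatic index filter optimization is available in your Hive version, and the literal 1 is a hypothetical provinceid value (adjust it to the column's actual type). The minsize setting is lowered only so that a small test table qualifies for index use.

```sql
-- Let the optimizer rewrite filters to use indexes (off by default).
set hive.optimize.index.filter=true;
-- Assumption: the test table is small, so drop the minimum input size
-- required before a compact index is considered.
set hive.optimize.index.filter.compact.minsize=0;
-- Hypothetical filter value; the plan should reference the index table
-- china_partition_index_table if the index is used.
explain select * from china_partition where provinceid = 1;
```

If the plan only shows a plain scan of china_partition, the index is not being used for that query.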
Show the indexes on the table:
hive> show formatted index on china_partition;
Drop the index:
hive> drop index china_partition_index on china_partition;
Implementing a custom index handler:
Use the source code of org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler as a reference implementation.
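A custom handler is plugged in the same way as the built-in one, through the as clause of create index. The sketch below is illustrative only: the jar path, the class name com.example.hive.MyIndexHandler, and the index name china_partition_custom_index are all hypothetical placeholders, not names from this walkthrough.

```sql
-- Hypothetical jar containing the custom handler class.
add jar /path/to/my-index-handler.jar;
-- Hypothetical handler class and index name; same syntax as the
-- CompactIndexHandler example, with a different class in the as clause.
create index china_partition_custom_index
  on table china_partition(provinceid)
  as 'com.example.hive.MyIndexHandler'
  with deferred rebuild;
```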