hive-lateral view+视图+索引

最新推荐文章于 2024-04-14 01:50:08 发布

weixin_46373361

最新推荐文章于 2024-04-14 01:50:08 发布

阅读量157

点赞数

分类专栏： hive hadoop 文章标签： hive

本文链接：https://blog.csdn.net/weixin_46373361/article/details/115830004

版权

hadoop 同时被 2 个专栏收录

16 篇文章 0 订阅

订阅专栏

hive

8 篇文章 0 订阅

订阅专栏

hive-lateral view+视图+索引

hive Lateral View
视图
索引

hive Lateral View

Lateral View用于和UDTF函数（explode、split）结合来使用。
首先通过UDTF函数拆分成多行，再将多行结果组合成一个支持别名的虚拟表。

1	小明1	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
2	小明2	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
3	小明3	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
4	小明4	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
5	小明5	["lol","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
6	小明6	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
7	小明7	["lol","book"]	{"beijing":"xisanqi","shanghai":"pudong"}
8	小明8	["lol","book"]	{"beijing":"xisanqi","shanghai":"pudong"}
9	小明9	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}

hive> select count(explode(hobby)) from bucket1;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

我们在自定义UDTF函数时候是不允许嵌套的，不可以添加其他字段使用，不可以和group by/cluster by/distribute by/sort by一起使用，否则会报错，这时候就可以通过lateral view来查询。
查询hobby共有多少种：

hive> select count(distinct(myCol1)) from bucket1
    > lateral view explode(hobby) mytb1 as myCol1;

查出来结果是3

MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.8 sec   HDFS Read: 11399 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 800 msec
OK
3
Time taken: 42.456 seconds, Fetched: 1 row(s)

视图

• 不支持物化视图
• 只能查询，不能做加载数据操作
• 视图的创建，只是保存一份元数据，查询视图时才执行对应的子查询
• view定义中若包含了ORDER BY/LIMIT语句，当查询视图时也进行ORDER BY/LIMIT语句操作，view当中定义的优先级更高

view: order by age asc;
select order by age desc;
select * from view order by age desc;

view支持迭代视图

view1: select * from tb_user1;
view2: select * from view1;
view3: select * from view2;

语法：

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
  [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ... ;

查询视图：

select colums from view;

删除视图：

DROP VIEW [IF EXISTS] [db_name.]view_name;

创建视图：

create view statistics (likenum, addrsnum) as
select count(distinct(myTable1.myCol1)), count(distinct(myTable2.myCol1)) from tb_user1
LATERAL VIEW explode(hobby) myTable1 AS myCol1 
LATERAL VIEW explode(addrs) myTable2 AS myCol1, myCol3;

create view myv1 (name, age) as select name, age from tb_user2;

可以通过show tables查看视图名，hdfs：50070上并不会显示。

hive> show tables;
myv1

>show formatted myv1
# View Information	 	 
View Original Text: 	select name,age from tb_user2	 
View Expanded Text: 	SELECT `name` AS `name`, `age` AS `age` FROM (select `tb_user2`.`name`,`tb_user2`.`age` from `data1`.`tb_user2`) `data1.myv1`	 
View Rewrite Enabled:	No

索引

目的：优化查询以及检索性能

>create index tbindex1 on table bucket1(name)
>as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild ；

as：指定索引器；
in table：指定索引表，若不指定默认生成在 (数据库名__表名_索引名__) data1__bucket1_tbindex1__表中

hive> show tables;
data1__bucket1_tbindex1__

create index tbindex2 on table bucket1(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' 
with deferred rebuild 
in table bucket_tbindex1;

hive> show tables;
bucket_tbindex1

with deferred rebuild表示在执行alter index xxx_index on xxx rebuild时将调用generateIndexBuildTaskList获取Index的MapReduce，并执行为索引填充数据。

查询索引

hive> show index on bucket1;
OK
tbindex1            	bucket1             	name                	data1__bucket1_tbindex1__	compact             	
tbindex2            	bucket1             	id                  	bucket_tbindex1     	compact

但是没有数据需要重建索引（建立索引之后必须重建索引才能生效）
重建索引

>ALTER INDEX tbindex1 ON bucket1 REBUILD;
>ALTER INDEX tbindex2 ON bucket1 REBUILD;

[0]，[57]指的是数据的偏移量

hive> select * from data1__bucket1_tbindex1__;
OK
小明1	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[0]
小明2	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[57]
小明3	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[114]
小明4	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[171]
小明5	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[228]
小明6	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[280]
小明7	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[337]
小明8	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[388]
小明9	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[439]

hive> select * from bucket_tbindex1;
OK
1	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[0]
2	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[57]
3	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[114]
4	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[171]
5	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[228]
6	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[280]
7	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[337]
8	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[388]
9	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[439]

删除索引

DROP INDEX IF EXISTS tbindex1 ON bucket1;

weixin_46373361

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hive-lateral view+视图+索引

hive-lateral view+视图+索引hive Lateral View视图索引hive Lateral ViewLateral View用于和UDTF函数（explode、split）结合来使用。首先通过UDTF函数拆分成多行，再将多行结果组合成一个支持别名的虚拟表。1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}2 小明2 ["lol","book","movie"] {"beijing":"x
复制链接

扫一扫