hive-lateral view+视图+索引

8 篇文章 0 订阅

hive-lateral view+视图+索引

hive Lateral View

Lateral View用于和UDTF函数(explode、split)结合来使用。
首先通过UDTF函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。

1	小明1	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
2	小明2	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
3	小明3	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
4	小明4	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
5	小明5	["lol","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
6	小明6	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
7	小明7	["lol","book"]	{"beijing":"xisanqi","shanghai":"pudong"}
8	小明8	["lol","book"]	{"beijing":"xisanqi","shanghai":"pudong"}
9	小明9	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
hive> select count(explode(hobby)) from bucket1;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

我们在自定义UDTF函数时候是不允许嵌套的,不可以添加其他字段使用,不可以和group by/cluster by/distribute by/sort by一起使用,否则会报错,这时候就可以通过lateral view来查询。
查询hobby共有多少种:

hive> select count(distinct(myCol1)) from bucket1
    > lateral view explode(hobby) mytb1 as myCol1;

查出来结果是3

MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.8 sec   HDFS Read: 11399 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 800 msec
OK
3
Time taken: 42.456 seconds, Fetched: 1 row(s)

视图

• 不支持物化视图
• 只能查询,不能做加载数据操作
• 视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
• view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT语句操作,view当中定义的优先级更高

view: order by age asc;
select order by age desc;
select * from view order by age desc;

view支持迭代视图

view1: select * from tb_user1;
view2: select * from view1;
view3: select * from view2;

语法:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
  [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ... ;

查询视图:

select colums from view;

删除视图:

DROP VIEW [IF EXISTS] [db_name.]view_name;

创建视图:

create view statistics (likenum, addrsnum) as
select count(distinct(myTable1.myCol1)), count(distinct(myTable2.myCol1)) from tb_user1
LATERAL VIEW explode(hobby) myTable1 AS myCol1 
LATERAL VIEW explode(addrs) myTable2 AS myCol1, myCol3;
create view myv1 (name, age) as select name, age from tb_user2;

可以通过show tables查看视图名,hdfs:50070上并不会显示。

hive> show tables;
myv1
>show formatted myv1
# View Information	 	 
View Original Text: 	select name,age from tb_user2	 
View Expanded Text: 	SELECT `name` AS `name`, `age` AS `age` FROM (select `tb_user2`.`name`,`tb_user2`.`age` from `data1`.`tb_user2`) `data1.myv1`	 
View Rewrite Enabled:	No 

索引

目的:优化查询以及检索性能

>create index tbindex1 on table bucket1(name)
>as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild ;

as:指定索引器;
in table:指定索引表,若不指定默认生成在 (数据库名__表名_索引名__) data1__bucket1_tbindex1__表中

hive> show tables;
data1__bucket1_tbindex1__
create index tbindex2 on table bucket1(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' 
with deferred rebuild 
in table bucket_tbindex1;
hive> show tables;
bucket_tbindex1

with deferred rebuild表示在执行alter index xxx_index on xxx rebuild时将调用generateIndexBuildTaskList获取Index的MapReduce,并执行为索引填充数据。

查询索引

hive> show index on bucket1;
OK
tbindex1            	bucket1             	name                	data1__bucket1_tbindex1__	compact             	
tbindex2            	bucket1             	id                  	bucket_tbindex1     	compact

但是没有数据需要重建索引(建立索引之后必须重建索引才能生效)
重建索引

>ALTER INDEX tbindex1 ON bucket1 REBUILD;
>ALTER INDEX tbindex2 ON bucket1 REBUILD;

[0],[57]指的是数据的偏移量

hive> select * from data1__bucket1_tbindex1__;
OK
小明1	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[0]
小明2	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[57]
小明3	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[114]
小明4	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[171]
小明5	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[228]
小明6	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[280]
小明7	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[337]
小明8	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[388]
小明9	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[439]
hive> select * from bucket_tbindex1;
OK
1	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[0]
2	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[57]
3	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[114]
4	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[171]
5	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[228]
6	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[280]
7	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[337]
8	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[388]
9	hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt	[439]

删除索引

DROP INDEX IF EXISTS tbindex1 ON bucket1;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值