hive-lateral view+视图+索引
hive Lateral View
Lateral View用于和UDTF函数(explode、split)结合来使用。
首先通过UDTF函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
1 小明1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
2 小明2 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
3 小明3 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
4 小明4 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
5 小明5 ["lol","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
6 小明6 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
7 小明7 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
8 小明8 ["lol","book"] {"beijing":"xisanqi","shanghai":"pudong"}
9 小明9 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
hive> select count(explode(hobby)) from bucket1;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
我们在自定义UDTF函数时候是不允许嵌套的,不可以添加其他字段使用,不可以和group by/cluster by/distribute by/sort by一起使用,否则会报错,这时候就可以通过lateral view来查询。
查询hobby共有多少种:
hive> select count(distinct(myCol1)) from bucket1
> lateral view explode(hobby) mytb1 as myCol1;
查出来结果是3
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.8 sec HDFS Read: 11399 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 800 msec
OK
3
Time taken: 42.456 seconds, Fetched: 1 row(s)
视图
• 不支持物化视图
• 只能查询,不能做加载数据操作
• 视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
• view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT语句操作,view当中定义的优先级更高
view: order by age asc;
select order by age desc;
select * from view order by age desc;
view支持迭代视图
view1: select * from tb_user1;
view2: select * from view1;
view3: select * from view2;
语法:
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name
[(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ... ;
查询视图:
select colums from view;
删除视图:
DROP VIEW [IF EXISTS] [db_name.]view_name;
创建视图:
create view statistics (likenum, addrsnum) as
select count(distinct(myTable1.myCol1)), count(distinct(myTable2.myCol1)) from tb_user1
LATERAL VIEW explode(hobby) myTable1 AS myCol1
LATERAL VIEW explode(addrs) myTable2 AS myCol1, myCol3;
create view myv1 (name, age) as select name, age from tb_user2;
可以通过show tables查看视图名,hdfs:50070上并不会显示。
hive> show tables;
myv1
>show formatted myv1
# View Information
View Original Text: select name,age from tb_user2
View Expanded Text: SELECT `name` AS `name`, `age` AS `age` FROM (select `tb_user2`.`name`,`tb_user2`.`age` from `data1`.`tb_user2`) `data1.myv1`
View Rewrite Enabled: No
索引
目的:优化查询以及检索性能
>create index tbindex1 on table bucket1(name)
>as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild ;
as:指定索引器;
in table:指定索引表,若不指定默认生成在 (数据库名__表名_索引名__) data1__bucket1_tbindex1__表中
hive> show tables;
data1__bucket1_tbindex1__
create index tbindex2 on table bucket1(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table bucket_tbindex1;
hive> show tables;
bucket_tbindex1
with deferred rebuild表示在执行alter index xxx_index on xxx rebuild时将调用generateIndexBuildTaskList获取Index的MapReduce,并执行为索引填充数据。
查询索引
hive> show index on bucket1;
OK
tbindex1 bucket1 name data1__bucket1_tbindex1__ compact
tbindex2 bucket1 id bucket_tbindex1 compact
但是没有数据需要重建索引(建立索引之后必须重建索引才能生效)
重建索引
>ALTER INDEX tbindex1 ON bucket1 REBUILD;
>ALTER INDEX tbindex2 ON bucket1 REBUILD;
[0],[57]指的是数据的偏移量
hive> select * from data1__bucket1_tbindex1__;
OK
小明1 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [0]
小明2 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [57]
小明3 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [114]
小明4 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [171]
小明5 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [228]
小明6 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [280]
小明7 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [337]
小明8 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [388]
小明9 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [439]
hive> select * from bucket_tbindex1;
OK
1 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [0]
2 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [57]
3 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [114]
4 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [171]
5 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [228]
6 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [280]
7 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [337]
8 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [388]
9 hdfs://mycluster/user/hive/warehouse/data1.db/bucket1/word2.txt [439]
删除索引
DROP INDEX IF EXISTS tbindex1 ON bucket1;