Hive (7): Hive Lateral View, Views, and Indexes

plenilune-望月

Article link: https://blog.csdn.net/donglinjob/article/details/108767446

1 Hive Lateral View, Views, and Indexes

1.1 Hive Lateral View

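Lateral View is used together with UDTFs such as explode (often applied to the array returned by split, which is itself an ordinary UDF rather than a UDTF).
The UDTF first expands each input row into multiple rows, and the lateral view then joins those rows back to the source row as a virtual table that can be given an alias.
It mainly addresses a restriction on calling a UDTF directly in the select list: such a query may contain only a single UDTF, with no other columns and no additional UDTFs.
Syntax: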

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*

hive> select explode(likes) from person;
OK
col
lol
book
movie
......
hive> select id,explode(likes) from person;
FAILED: SemanticException [Error 10081]: UDTF's are not supported
outside the SELECT clause, nor nested in expressions

select id,name,myCol1,myCol2,myCol3 from person
LATERAL VIEW explode(likes) myTable1 AS myCol1
LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
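The queries in this section only make sense if likes is an array and address is a map, which matches the query output shown later in the article. A minimal sketch of such a table follows; the delimiters in the DDL are assumptions, not the author's actual definition. The second statement shows how split and explode combine on an inline string, assuming a Hive version that allows a SELECT without a FROM clause in a subquery (0.13 or later).

```sql
-- Assumed schema for the person table (delimiters are illustrative guesses).
CREATE TABLE IF NOT EXISTS person (
  id      INT,
  name    STRING,
  likes   ARRAY<STRING>,        -- e.g. ["lol","book","movie"]
  address MAP<STRING, STRING>   -- e.g. {"beijing":"xisanqi","shanghai":"pudong"}
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '-'
  MAP KEYS TERMINATED BY ':';

-- split() is an ordinary UDF that returns an array;
-- explode() is the UDTF that turns that array into one row per element.
SELECT t.id, s.word
FROM (SELECT 1 AS id, 'lol,book,movie' AS csv) t
LATERAL VIEW explode(split(t.csv, ',')) s AS word;
```

The second query produces one row per comma-separated element, each carrying the id of its source row. Exploding a map, as with address above, yields two columns (key and value), which is why that lateral view needs two column aliases.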

Example:

How many distinct hobbies and how many distinct cities appear in the person table?

select count(distinct(myCol1)), count(distinct(myCol2)) from person
LATERAL VIEW explode(likes) myTable1 AS myCol1
LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
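Chaining two lateral views pairs every hobby with every address entry of the same person, so values repeat in the result; that is why the count above uses distinct. As a further sketch on the same person table (the aliases l and hobby are made up for illustration), the exploded column can be grouped like any other column, and LATERAL VIEW OUTER keeps people whose array is empty or NULL.

```sql
-- Number of people per hobby: each array element becomes its own row,
-- so an ordinary GROUP BY works on the exploded column.
SELECT hobby, count(DISTINCT id) AS people
FROM person
LATERAL VIEW explode(likes) l AS hobby
GROUP BY hobby;

-- LATERAL VIEW OUTER keeps rows whose likes array is empty or NULL
-- (hobby is NULL for them); a plain LATERAL VIEW would drop those rows.
SELECT id, name, hobby
FROM person
LATERAL VIEW OUTER explode(likes) l AS hobby;
```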

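1.2 Hive Views

Like ordinary views in relational databases, Hive also supports views.

Characteristics:

Materialized views are not supported (this holds for Hive releases before 3.0, which introduced them).
Views are read-only: they can be queried, but data cannot be loaded into them.
Creating a view only stores metadata; the underlying subquery runs when the view is queried.
If the view definition contains ORDER BY/LIMIT, those clauses are also applied whenever the view is queried, and they are evaluated before any ORDER BY/LIMIT in the querying statement.
Views can be nested: a view may be defined on top of another view.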
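The last two points can be illustrated with a small sketch. The view names v_psn_top3 and v_psn_top3_names are hypothetical; the person table is the one used throughout this article.

```sql
-- A view whose definition carries its own ORDER BY / LIMIT.
CREATE VIEW IF NOT EXISTS v_psn_top3 AS
SELECT id, name FROM person ORDER BY id DESC LIMIT 3;

-- A view defined on top of another view (nesting).
CREATE VIEW IF NOT EXISTS v_psn_top3_names AS
SELECT name FROM v_psn_top3;

-- The inner LIMIT 3 is applied first, so this returns at most 3 rows
-- even though the outer query asks for up to 10.
SELECT name FROM v_psn_top3_names LIMIT 10;

-- Clean up, dropping the dependent view first.
DROP VIEW IF EXISTS v_psn_top3_names;
DROP VIEW IF EXISTS v_psn_top3;
```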

By contrast, MySQL allows rows to be deleted through a view:
CREATE VIEW v_users AS SELECT * FROM myusers;
DELETE FROM v_users WHERE id = '1316403900579872';

View syntax

Create a view:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name
   [(column_name [COMMENT column_comment], ...) ]
   [COMMENT view_comment]
   [TBLPROPERTIES (property_name = property_value, ...)]
   AS SELECT ... ;

hive> create view v_psn as select * from person;
hive> show tables;
...
v_psn
...

Query a view:

select columns from view_name;

Creating a view adds one record to the TBLS table in the metastore database.
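A quick way to confirm that only metadata was stored is to inspect the view definition. The sketch below assumes the v_psn view created above and a metastore kept in a relational database such as MySQL; the last statement runs against that metastore database, not in the Hive shell.

```sql
-- In Hive: inspect the stored definition of the view.
DESCRIBE FORMATTED v_psn;   -- the "Table Type" field shows VIRTUAL_VIEW
SHOW CREATE TABLE v_psn;    -- prints the CREATE VIEW statement

-- In the metastore database (for example MySQL), not in Hive:
SELECT TBL_NAME, TBL_TYPE, VIEW_ORIGINAL_TEXT
FROM TBLS
WHERE TBL_TYPE = 'VIRTUAL_VIEW';
```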

hive> create view v_psn2 as select * from person order by id desc;
hive> select * from v_psn2 order by id; # one job when the ordering matches the view's
Query ID = root_20200302193830_80bcc248-fafc-44e7-b1b9-7cdbe0117e91
Total jobs = 2 # two jobs here because the ordering differs
Launching Job 1 out of 2
number of mappers: 1; number of reducers: 1

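Using order by is discouraged: it runs with a single reducer, so when a large amount of data has to be loaded into memory for sorting, that reducer can easily run out of memory.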
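When a total order is not strictly required, two common alternatives are sketched below on the person table. This is only an illustration of the idea, not a recommendation from the original article.

```sql
-- Global sort, but with a LIMIT so that only a small result is needed.
SELECT id, name FROM person ORDER BY id DESC LIMIT 10;

-- Per-reducer sort: rows are routed to reducers by id and sorted inside
-- each reducer, so no single reducer has to handle the whole data set.
-- The output is ordered within each reducer, not globally.
SELECT id, name FROM person DISTRIBUTE BY id SORT BY id DESC;
```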

Drop a view:

DROP VIEW [IF EXISTS] [db_name.]view_name;

drop view v_psn;

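1.3 Hive Indexes

Purpose: speed up queries and lookups. Note that indexing was removed in Hive 3.0, so the statements below apply to earlier releases.

Create an index: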

create index t1_index on table person(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table t1_index_table;

as: specifies the index handler;

in table: specifies the index table; if omitted, the index is stored by default in a table named default__person_t1_index__

create index t1_index on table person2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild;

At this point the index table contains no index data:

hive> select * from t1_index_table;
OK
t1_index_table.name t1_index_table._bucketname t1_index_table._offsets
Time taken: 0.074 seconds

Show indexes:

show index on person;

Rebuild the index (an index created with deferred rebuild must be rebuilt before it takes effect):

ALTER INDEX t1_index ON person REBUILD;

After the rebuild completes, querying the index table again returns index data: select * from t1_index_table;

Drop the index:

DROP INDEX IF EXISTS t1_index ON person;

hive> select * from person where name='小明 1';
OK
person.id person.name person.likes person.address
1 小明 1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.108 seconds, Fetched: 1 row(s)
hive> DROP INDEX IF EXISTS t1_index ON person;
OK
Time taken: 0.202 seconds
hive> select * from person where name='小明 1';
OK
person.id person.name person.likes person.address
1 小明 1 ["lol","book","movie"] {"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.081 seconds, Fetched: 1 row(s)

Because using an index means querying two tables, it can actually be slower when the data volume is small.
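In pre-3.0 Hive, the optimizer does not use a compact index automatically unless index filtering is enabled, and by default it also skips inputs below a size threshold. The sketch below assumes the t1_index index on person has been re-created and rebuilt; the two settings are standard Hive configuration properties, but whether the index is actually chosen depends on the Hive version and the query, so the plan should be checked with EXPLAIN.

```sql
-- Allow the optimizer to use indexes for filter predicates (off by default).
SET hive.optimize.index.filter=true;
-- Lower the minimum input size for compact-index use so a small test table qualifies.
SET hive.optimize.index.filter.compact.minsize=0;

-- Check whether the plan reads the index table before scanning person.
EXPLAIN SELECT * FROM person WHERE name = '小明 1';
```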
