2021-02-06 Big Data Course Notes, Day 17


Rich Dad



Column: 西行日记 | Tags: big data, hadoop, database, hive, linux


Article link: https://blog.csdn.net/qq_44745905/article/details/113720967



@R星校长

Hive Lateral View, Views, and Indexes

Hive Lateral View

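Lateral View is used together with UDTFs such as explode (often fed by split, a regular function that turns a string into an array).
The UDTF first expands each input row into multiple rows, and Lateral View then combines those rows into a virtual table that can be given an alias.
It mainly solves a limitation of calling a UDTF directly in SELECT: such a query may contain only that single UDTF, with no other columns and no second UDTF.
Syntax: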

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*

Selecting a UDTF on its own works, but adding any other column fails:

hive> select explode(likes) from person;
OK
col
lol
book
movie
......
hive> select id,explode(likes) from person;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

With LATERAL VIEW, ordinary columns and several UDTFs can be combined in one query:

select id,name,myCol1,myCol2,myCol3 from person
LATERAL VIEW explode(likes) myTable1 AS myCol1 
LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
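To make the row semantics concrete, here is a minimal Python sketch (not Hive code) that emulates the two LATERAL VIEW clauses above with nested loops. It assumes the person columns are id, name, likes (array) and address (map), as in the query output later in these notes. The first sample row comes from that output. The second row and the function name `lateral_view` are hypothetical.

```python
from typing import Dict, List, Tuple

# (id, name, likes: array<string>, address: map<string,string>)
Person = Tuple[int, str, List[str], Dict[str, str]]

people: List[Person] = [
    (1, "小明1", ["lol", "book", "movie"], {"beijing": "xisanqi", "shanghai": "pudong"}),
    (2, "小明2", ["lol", "book"], {"beijing": "xisanqi"}),  # hypothetical row
]


def lateral_view(rows: List[Person]) -> List[Tuple[int, str, str, str, str]]:
    """Emulate:
    select id, name, myCol1, myCol2, myCol3 from person
    LATERAL VIEW explode(likes) myTable1 AS myCol1
    LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
    """
    out = []
    for pid, name, likes, address in rows:
        # explode(likes): one row per array element
        for my_col1 in likes:
            # explode(address): each of those rows is paired with every (key, value) entry
            for my_col2, my_col3 in address.items():
                out.append((pid, name, my_col1, my_col2, my_col3))
    # As with a plain (non-OUTER) LATERAL VIEW, a row whose array or map
    # is empty contributes nothing.
    return out


for row in lateral_view(people):
    print(row)
```

The first row yields 3 × 2 = 6 output rows (every hobby paired with every city), which is why a second LATERAL VIEW multiplies the row count.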

Example:
Count how many distinct hobbies and how many distinct cities appear in the person table.

select count(distinct(myCol1)), count(distinct(myCol2)) from person
LATERAL VIEW explode(likes) myTable1 AS myCol1 
LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
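A small Python sketch of what this query computes. The rows are hypothetical and only mirror the likes/address columns; myCol2 is the map key, i.e. the city. The doubled LATERAL VIEW produces duplicate hobbies and cities, and DISTINCT removes them.

```python
from typing import Dict, List, Tuple

# Hypothetical rows shaped like person: (likes, address)
rows: List[Tuple[List[str], Dict[str, str]]] = [
    (["lol", "book", "movie"], {"beijing": "xisanqi", "shanghai": "pudong"}),
    (["lol", "book"], {"beijing": "xisanqi"}),
]


def count_hobbies_and_cities(data: List[Tuple[List[str], Dict[str, str]]]) -> Tuple[int, int]:
    """Emulate count(distinct myCol1), count(distinct myCol2) over the
    doubly exploded rows (myCol2 = map key = city)."""
    hobbies = set()
    cities = set()
    for likes, address in data:
        for my_col1 in likes:
            for my_col2 in address:  # iterating a dict yields its keys
                hobbies.add(my_col1)
                cities.add(my_col2)
    return len(hobbies), len(cities)


print(count_hobbies_and_cities(rows))  # (3, 2)
```

Because the two explodes are nested, a row with an empty address map contributes no hobbies at all, matching the behavior of a non-OUTER LATERAL VIEW.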

Hive Views

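Like ordinary views in relational databases, Hive supports views.
Characteristics:

Materialized views are not supported (they were only added later, in Hive 3.0)
Views are query-only; data cannot be loaded into them
Creating a view only stores its metadata; the defining subquery runs when the view is queried
If the view definition contains ORDER BY/LIMIT, those clauses are also applied when the view is queried, and the ones in the view definition take precedence
Views can be nested: a view may be defined on top of another view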
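The points above (metadata only, lazy execution, LIMIT precedence, nesting) can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not how Hive implements views: the `View` class and sample rows are hypothetical, the stored query is just a Python function, and ORDER BY is not modeled.

```python
from typing import Callable, List, Optional

Row = dict


class View:
    """Toy model of a Hive view: only the defining query is stored
    (the 'metadata'); it is executed each time the view is queried."""

    def __init__(self, query: Callable[[], List[Row]], limit: Optional[int] = None):
        self.query = query  # the saved SELECT, not its result
        self.limit = limit  # a LIMIT written inside the view definition

    def select(self, limit: Optional[int] = None) -> List[Row]:
        rows = self.query()  # the subquery runs only now
        if self.limit is not None:
            rows = rows[: self.limit]  # the view's own LIMIT is applied first
        if limit is not None:
            rows = rows[:limit]  # an outer LIMIT can only shrink the result further
        return rows


person = [{"id": 1}, {"id": 2}]
v = View(lambda: list(person))  # like: create view v as select * from person
person.append({"id": 3})  # the base table changes after the view exists
print(len(v.select()))  # 3: the view sees the new row

v_limited = View(lambda: list(person), limit=2)  # a view defined with LIMIT 2
print(len(v_limited.select(limit=10)))  # 2: the view's LIMIT wins

nested = View(lambda: v_limited.select())  # a view defined on another view
print(len(nested.select()))  # 2
```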

MySQL, by contrast, allows deleting rows through a (simple, updatable) view:
CREATE VIEW v_users AS SELECT * FROM myusers;
DELETE FROM v_users WHERE id = '1316403900579872';

View syntax
Create a view:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
  [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ... ;

hive>create view v_psn as select * from person;
hive> show tables;
...
v_psn
...

Query a view:

select columns from view_name;

Creating a view adds one row to the TBLS table in the metastore database (for a view, its TBL_TYPE is VIRTUAL_VIEW).

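hive> create view v_psn2 as select * from person order by id desc;
hive> select * from v_psn2 order by id;
Query ID = root_20200302193830_80bcc248-fafc-44e7-b1b9-7cdbe0117e91
Total jobs = 2
Launching Job 1 out of 2
 number of mappers: 1; number of reducers: 1

Here the outer ORDER BY (ascending) differs from the one in the view (descending), so Hive runs two jobs. If the outer query used the same ordering as the view, only one job would run.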

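ORDER BY is best avoided on large data: a global ORDER BY runs with a single reducer, so all rows must be sorted by that one task, which is slow and can exhaust its memory.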

Drop a view:

DROP VIEW [IF EXISTS] [db_name.]view_name;

drop view v_psn;

Hive Indexes

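Purpose: speed up queries and lookups. Index support was removed in Hive 3.0, so the commands below apply only to earlier versions.
Create an index: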

create index t1_index on table person(name) 
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild 
in table t1_index_table;

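AS: specifies the index handler class.
IN TABLE: specifies the table that stores the index; if omitted, Hive creates one named default__person_t1_index__.
WITH DEFERRED REBUILD: the index is created empty and is populated only by ALTER INDEX ... REBUILD.

Example without IN TABLE: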

create index t1_index on table person2(name) 
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;

At this point the index table contains no index data:

hive> select * from t1_index_table;
OK
t1_index_table.name	t1_index_table._bucketname	t1_index_table._offsets
Time taken: 0.074 seconds

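Show the indexes on a table:

show index on person;

Rebuild the index (an index created WITH DEFERRED REBUILD only takes effect after it has been rebuilt):

ALTER INDEX t1_index ON person REBUILD;

Once the rebuild finishes, the index table contains data:

select * from t1_index_table;

Drop an index:

DROP INDEX IF EXISTS t1_index ON person;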
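A rough Python sketch of what a rebuilt compact index holds and how a lookup uses it. It follows the index table's columns shown earlier (name, _bucketname, _offsets). It is a simplification, not Hive's CompactIndexHandler: the base file, its byte offsets, and the helper names `rebuild_index` and `lookup` are all hypothetical, and the exact offsets Hive records depend on the file format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical base table stored in one file: (byte_offset, id, name) records.
base_files: Dict[str, List[Tuple[int, int, str]]] = {
    "person01.txt": [(0, 1, "小明1"), (60, 2, "小明2"), (120, 3, "小明1")],
}


def rebuild_index(files: Dict[str, List[Tuple[int, int, str]]]) -> List[Tuple[str, str, List[int]]]:
    """Rough analogue of ALTER INDEX ... REBUILD for an index on name:
    one row per (name, _bucketname) holding the _offsets where it occurs."""
    index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for bucket, records in files.items():
        for offset, _pid, name in records:
            index[(name, bucket)].append(offset)
    return [(name, bucket, offsets) for (name, bucket), offsets in index.items()]


def lookup(index_rows, files, wanted: str) -> List[Tuple[int, str]]:
    """Two reads: first the index table, then only the listed offsets of the base file."""
    hits: List[Tuple[int, str]] = []
    for name, bucket, offsets in index_rows:  # read 1: the index table
        if name != wanted:
            continue
        # A real reader would seek to each offset; a dict stands in for that here.
        by_offset = {off: (pid, nm) for off, pid, nm in files[bucket]}
        hits.extend(by_offset[o] for o in offsets)  # read 2: the base table
    return hits


idx = rebuild_index(base_files)
print(idx)  # [('小明1', 'person01.txt', [0, 120]), ('小明2', 'person01.txt', [60])]
print(lookup(idx, base_files, "小明1"))  # [(1, '小明1'), (3, '小明1')]
```

The two reads in `lookup` are the reason an index only pays off once the base table is large enough that skipping most of it outweighs the extra read of the index table.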

hive> select * from person where name='小明1';
OK
person.id	person.name	person.likes	person.address
1	小明1	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.108 seconds, Fetched: 1 row(s)
hive> DROP INDEX IF EXISTS t1_index ON person;
OK
Time taken: 0.202 seconds
hive> select * from person where name='小明1';
OK
person.id	person.name	person.likes	person.address
1	小明1	["lol","book","movie"]	{"beijing":"xisanqi","shanghai":"pudong"}
Time taken: 0.081 seconds, Fetched: 1 row(s)

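Because a query that uses the index must read two tables (the index table, then the base table), it can be slower than a plain scan when the data set is small: here the query took 0.108 s with the index and 0.081 s after the index was dropped.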

Ways to Run Hive

Command-line interface (CLI): console mode
Script execution (the most common in production)
JDBC: through hiveserver2
Web GUI (hwi, Hue, etc.)

CLI: console mode

Interacting with HDFS (for reference):
run dfs commands directly

hive> dfs -ls /;
hive> dfs -cat /user/hive_remote/warehouse/person/person01.txt;

Interacting with Linux:
prefix the shell command with !

hive> !pwd;
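As a toy illustration of how the CLI tells these three kinds of input apart, here is a hypothetical Python router. `route_cli_command` is invented for this note: it only classifies a single `;`-terminated statement and executes nothing. The real Hive CLI also handles multi-line input, comments and quoting.

```python
import shlex
from typing import List, Tuple, Union


def route_cli_command(line: str) -> Tuple[str, Union[str, List[str]]]:
    """Hypothetical toy router for one Hive CLI statement:
    '!cmd;' -> Linux shell, 'dfs ...;' -> HDFS, anything else -> HiveQL."""
    stmt = line.strip()
    if stmt.endswith(";"):
        stmt = stmt[:-1].strip()  # statements end with ';'
    if stmt.startswith("!"):
        return ("shell", shlex.split(stmt[1:]))  # e.g. !pwd;
    words = stmt.split()
    if words and words[0].lower() == "dfs":
        return ("hdfs", shlex.split(stmt)[1:])  # e.g. dfs -ls /;
    return ("hiveql", stmt)


for cmd in ["!pwd;", "dfs -ls /;", "select * from person;"]:
    print(route_cli_command(cmd))
```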
