Hadoop Study Notes (3): Hive, Part 1

Remoa

Category: Hadoop. Tags: Hadoop, Hive, UDTF, Union, LateralView

Article link: https://blog.csdn.net/Remoa_Dengqinyi/article/details/78342870

Contents:

1. UDTFs (Table-Generating Functions)

2. Lateral View

3. Union and Union all

4. Hive configuration parameters encountered

1. UDTFs (Table-Generating Functions):

(1) Overview:

Hive's Built-in Table-Generating Functions are UDTFs (user-defined table-generating functions).

Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.
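The difference in row counts can be modeled outside Hive. The following is a minimal Python sketch, not Hive code: `concat_udf` and `split_udtf` are hypothetical names that stand in for an ordinary function and a table-generating function, and rows are modeled as tuples.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, ...]


def concat_udf(row: Row) -> Row:
    """Ordinary function: one input row in, exactly one output row out."""
    return ("".join(row),)


def split_udtf(row: Row) -> Iterator[Row]:
    """Table-generating function: one input row in, zero or more rows out."""
    for field in row:
        for word in field.split():
            yield (word,)


rows: List[Row] = [("a b", "c"), ("",)]

# One output row per input row.
scalar_out = [concat_udf(r) for r in rows]

# The first row expands to three rows, the second to none.
table_out = [out for r in rows for out in split_udtf(r)]
```

With these two input rows, `scalar_out` always has two rows, while `table_out` has three because the output row count depends on the data.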

(2) Limitations:

Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations:

① No other expressions are allowed in SELECT. For example, "SELECT pageid, explode(adid_list) AS myCol..." is not supported.

② UDTFs cannot be nested. For example, "SELECT explode(explode(adid_list)) AS myCol..." is not supported.

③ GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY are not supported. For example, "SELECT explode(adid_list) AS myCol ... GROUP BY myCol" is not supported.

See LanguageManual LateralView for an alternative syntax that does not have these limitations. To create a custom UDTF, see Writing UDTFs.

(3) explode:

explode is a table-generating function built into Hive. It is mainly used to split one input row into multiple rows.

① explode(ARRAY<T> a): Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

② explode(MAP<Tkey,Tvalue> m): Explodes a map to multiple rows. Returns a row-set with two columns (key, value), one row for each key-value pair from the input map (as of Hive 0.8.0).
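The two forms of explode can be pictured with a small Python model. This is only a sketch of the described behavior, not Hive's implementation; `explode_array` and `explode_map` are hypothetical helper names, and each output row is a tuple.

```python
from typing import Dict, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K")
V = TypeVar("V")


def explode_array(a: List[T]) -> Iterator[Tuple[T]]:
    """Model of explode(ARRAY<T>): one single-column row per element."""
    for element in a:
        yield (element,)


def explode_map(m: Dict[K, V]) -> Iterator[Tuple[K, V]]:
    """Model of explode(MAP<K,V>): one (key, value) row per entry."""
    for key, value in m.items():
        yield (key, value)


array_rows = list(explode_array([1, 2, 3]))
map_rows = list(explode_map({"a": 1, "b": 2}))

# An empty array produces no rows at all.
empty_rows = list(explode_array([]))
```

The Python dict keeps insertion order; the order in which Hive emits map entries should not be assumed from this model.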

(4) stack:

① stack(int r, T1 V1, ..., Tn/r Vn): Breaks up n values V1, ..., Vn into r rows. Each row will have n/r columns. r must be a constant.
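The reshaping that stack performs can be sketched in Python as follows. This is an illustrative model only: the function below is a hypothetical stand-in, and it assumes n is a multiple of r, rejecting any other input rather than guessing how Hive handles it.

```python
from typing import Any, List, Tuple


def stack(r: int, *values: Any) -> List[Tuple[Any, ...]]:
    """Model of stack(r, V1, ..., Vn): split n values into r rows of n/r columns."""
    if r <= 0:
        raise ValueError("r must be a positive constant")
    n = len(values)
    if n % r != 0:
        # Assumption of this sketch only: n must divide evenly by r.
        raise ValueError("this sketch requires n to be a multiple of r")
    width = n // r
    return [tuple(values[i * width:(i + 1) * width]) for i in range(r)]


# Four values split into two rows of two columns each.
two_rows = stack(2, "a", 1, "b", 2)

# With r = 1 all values stay in a single row.
one_row = stack(1, 1, 2, 3)
```

Values are consumed left to right, so the first n/r values form the first row.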

2. Lateral View:

(1) Overview:

Lateral View is the clause Hive provides for combining a UDTF with the rest of a query. It removes the restriction that a UDTF cannot appear alongside other columns in SELECT. For each row of the base table, Lateral View calls the UDTF, which expands that row into zero or more rows, and then joins those rows back to the source row to produce a virtual table that can be given an alias.

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

(2) Example:

The table pageAds has two columns: pageid STRING and adid_list Array<int>.

It contains two sample rows: ("front_page", [1, 2, 3]) and ("contact_page", [3, 4, 5]).

Run the HiveQL statement:

SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;

Result:

int adid	count(1)
1	1
2	1
3	2
4	1
5	1
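To see where these counts come from, the query can be traced by hand in Python. This is a sketch of the semantics using the two sample rows, not how Hive executes the query; the variable names are illustrative.

```python
from collections import Counter
from typing import List, Tuple

# The two sample rows of pageAds: (pageid, adid_list).
page_ads: List[Tuple[str, List[int]]] = [
    ("front_page", [1, 2, 3]),
    ("contact_page", [3, 4, 5]),
]

# LATERAL VIEW explode(adid_list) adTable AS adid:
# every source row is joined with each element of its own array.
lateral_rows = [
    (pageid, adid_list, adid)
    for pageid, adid_list in page_ads
    for adid in adid_list
]

# SELECT adid, count(1) ... GROUP BY adid
counts = Counter(adid for _, _, adid in lateral_rows)
result = sorted(counts.items())
```

The virtual table has six rows, and ad 3 is counted twice because it appears in the arrays of both pages.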

(3) Multiple Lateral Views:

A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW.

Example:

The base table baseTable has two columns: Array<int> col1 and Array<string> col2.

It contains two sample rows: ([1, 2], ["a", "b", "c"]) and ([3, 4], ["d", "e", "f"]).

① Run the HiveQL statement:

SELECT myCol1, col2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1;

Result:

int myCol1	Array<string> col2
1	["a", "b", "c"]
2	["a", "b", "c"]
3	["d", "e", "f"]
4	["d", "e", "f"]

② Run the HiveQL statement:

SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;

Result:

int myCol1	string myCol2
1	"a"
1	"b"
1	"c"
2	"a"
2	"b"
2	"c"
3	"d"
3	"e"
3	"f"
4	"d"
4	"e"
4	"f"
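Stacking lateral views behaves like nested loops over the arrays of each source row. The Python sketch below models both queries on the sample data; it illustrates the row counts only and is not Hive code, and the function names are hypothetical.

```python
from typing import List, Tuple

# The two sample rows of baseTable: (col1, col2).
base_table: List[Tuple[List[int], List[str]]] = [
    ([1, 2], ["a", "b", "c"]),
    ([3, 4], ["d", "e", "f"]),
]


def single_view(table):
    """Query 1: explode col1 only, keep col2 as the whole array."""
    return [(my_col1, col2) for col1, col2 in table for my_col1 in col1]


def double_view(table):
    """Query 2: explode col1, then explode col2 for each resulting row."""
    return [
        (my_col1, my_col2)
        for col1, col2 in table
        for my_col1 in col1
        for my_col2 in col2
    ]


single = single_view(base_table)
double = double_view(base_table)
```

Each source row contributes 2 rows to the first query and 2 x 3 = 6 rows to the second, giving 4 and 12 rows in total.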

(4) Outer Lateral Views:

The user can specify the optional OUTER keyword to generate rows even when a LATERAL VIEW usually would not generate a row. This happens when the UDTF used does not generate any rows, which happens easily with explode when the column to explode is empty. In this case the source row would never appear in the results. OUTER can be used to prevent that, and rows will be generated with NULL values in the columns coming from the UDTF.
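The effect of OUTER can be modeled in a few lines of Python. This is a sketch under stated assumptions: `lateral_view` is a hypothetical helper, rows are tuples, `None` stands in for NULL, and the two source rows are sample values chosen for illustration.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def lateral_view(
    rows: Sequence[Row],
    udtf: Callable[[Row], List[object]],
    outer: bool = False,
) -> Iterator[Row]:
    """Join each source row with the rows its UDTF call generates."""
    for row in rows:
        generated = udtf(row)
        if generated:
            for value in generated:
                yield row + (value,)
        elif outer:
            # OUTER: keep the source row, fill the UDTF column with NULL.
            yield row + (None,)
        # Without OUTER the source row is dropped when nothing is generated.


src: List[Row] = [(238, "val_238"), (86, "val_86")]


def empty_array(row: Row) -> List[object]:
    """Stands in for exploding an empty array: it never produces a row."""
    return []


inner_rows = list(lateral_view(src, empty_array))
outer_rows = list(lateral_view(src, empty_array, outer=True))
```

Without OUTER the result is empty; with OUTER every source row survives with `None` in the extra column.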

Example:

Run the HiveQL statement:

SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;

Result:

238 val_238 NULL

86 val_86 NULL

311 val_311 NULL

27 val_27 NULL

165 val_165 NULL

409 val_409 NULL

255 val_255 NULL

278 val_278 NULL