[Hive] 08 - Built-in Operators and Functions (UDF)

神是念着倒

Article link: https://blog.csdn.net/weixin_38256474/article/details/92651687

Environment

Host: Windows 10 64-bit
Virtual machine: VMware Pro 12
- CentOS 7.5 64-bit (3 nodes: 1 master, 2 slaves)
- Hadoop-2.6.5
- MariaDB-5.5.60
- Hive 1.2.2
SSH tool: SecureCRT 7.3

0. Built-in Operators

1. Built-in Functions

----|----1.7.0 xpath

----|----1.7.1 get_json_object

2. Built-in Aggregate Functions (UDAF)

3. Built-in Table-Generating Functions (UDTF)

----3.0 Usage Examples

----3.1 explode

----3.2 posexplode

----3.3 json_tuple

----3.4 parse_url_tuple

4. Other

----

Main Text

All Hive keywords are case-insensitive, and so are the names of Hive operators and functions.

In Beeline or the CLI, use the commands below to display the latest documentation:

show functions;

describe function functionName;

describe function extended functionName;

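Expression caching has a bug when a UDF is nested inside another UDF or function:

When hive.cache.expr.evaluation=true (the default), a UDF can give incorrect results if it is nested in another UDF or in a Hive function. This bug affects releases 0.12.0, 0.13.0, and 0.13.1, and was fixed in 0.14.0.
The problem relates to the UDF's implementation of the getDisplayString method.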

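0. Built-in Operators

0.0 Relational Operators

The operators below compare the operands passed to them and produce a true or false value.

Operator	Operand types	Description
`a=b`	All primitive types	`true` if expression a is equal to expression b; otherwise `false`.
`a==b`	All primitive types	Synonym for the `=` operator.
`a<=>b`	All primitive types	Returns the same result as the `=` operator for non-null operands, but returns `true` if both are `null` and `false` if exactly one of them is `null`. As of `Hive 0.9.0`.
`a<>b`	All primitive types	`null` if a or b is `null`; `true` if expression a is not equal to expression b; otherwise `false`.
`a!=b`	All primitive types	Synonym for the `<>` operator.
`a<b`	All primitive types	`null` if a or b is `null`; `true` if expression a is less than expression b; otherwise `false`.
`a<=b`	All primitive types	`null` if a or b is `null`; `true` if expression a is less than or equal to expression b; otherwise `false`.
`a>b`	All primitive types	`null` if a or b is `null`; `true` if expression a is greater than expression b; otherwise `false`.
`a>=b`	All primitive types	`null` if a or b is `null`; `true` if expression a is greater than or equal to expression b; otherwise `false`.
`a [not] between b and c`	All primitive types	`null` if `a`, `b`, or `c` is `null`; `true` if `a` is greater than or equal to `b` and less than or equal to `c`; otherwise `false`. The result can be inverted with the `not` keyword. As of `Hive 0.9.0`.
`a is null`	All types	`true` if expression a evaluates to `null`; otherwise `false`.
`a is not null`	All types	`false` if expression a evaluates to `null`; otherwise `true`.
`a is [not] (true|false)`	Boolean types	`true` only if a meets the condition. As of `Hive 3.0.0`. Note: `null` is `unknown`, so both `unknown is true` and `unknown is false` evaluate to `false`.
`a [not] like b`	Strings	`null` if a or b is `null`; `true` if string a matches the SQL simple regular expression b; otherwise `false`. The comparison is done character by character. `_` in b matches any single character in a, and `%` in b matches any number of characters in a. For example, `'foobar' like 'foo'` evaluates to `false`, while `'foobar' like 'foo___'` and `'foobar' like 'foo%'` both evaluate to `true`.
`a rlike b`	Strings	`null` if a or b is `null`; `true` if any (possibly empty) substring of a matches the Java regular expression b; otherwise `false`. For example, `'foobar' rlike 'foo'` evaluates to `true`, and so does `'foobar' rlike '^f.*r$'`.
`a regexp b`	Strings	Same as `rlike`.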
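
The difference between `like`, `rlike`, and `<=>` is easier to see in a small model. The sketch below is written in Python rather than HiveQL and only mimics the rules in the table: `None` stands in for SQL `null`, the function names are made up for illustration, and Python's `re` module is used as a stand-in for Java regular expressions (the two dialects agree on the simple patterns used here, but not on everything). Escape characters in `like` patterns are not modeled.

```python
import re

def sql_like(a, pattern):
    """Model of `a like pattern`; None stands in for SQL null."""
    if a is None or pattern is None:
        return None
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")            # exactly one character
        elif ch == "%":
            parts.append(".*")           # any number of characters, including zero
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # like has to match the whole string, hence fullmatch.
    return re.fullmatch("".join(parts), a, flags=re.DOTALL) is not None

def sql_rlike(a, regex):
    """Model of `a rlike regex`: true if any substring of a matches."""
    if a is None or regex is None:
        return None
    return re.search(regex, a) is not None

def null_safe_eq(a, b):
    """Model of `a <=> b`: never returns null."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

print(sql_like("foobar", "foo"), sql_like("foobar", "foo___"), sql_like("foobar", "foo%"))
print(sql_rlike("foobar", "foo"), sql_rlike("foobar", "^f.*r$"))
print(null_safe_eq(None, None), null_safe_eq(None, 1))
```

The key design point is that `like` anchors the pattern at both ends of the string, while `rlike` succeeds as soon as any substring matches.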

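0.1 Arithmetic Operators

The operators below support common arithmetic operations on their operands. All of them return number types; if any operand is null, the result is also null.

Operator	Operand types	Description
`a+b`	All number types	Gives the result of adding a and b. The type of the result is the same as the common parent (in the type hierarchy) of the operand types.
`a-b`	All number types	Gives the result of subtracting b from a.
`a*b`	All number types	Gives the result of multiplying a and b.
`a/b`	All number types	Gives the result of dividing a by b.
`a div b`	Integer types	Gives the integer part of dividing a by b. For example, `17 div 3` results in `5`.
`a % b`	All number types	Gives the remainder of dividing a by b.
`a & b`	All number types	Bitwise AND of a and b.
`a | b`	All number types	Bitwise OR of a and b.
`a ^ b`	All number types	Bitwise XOR of a and b.
`~a`	All number types	Bitwise NOT of a.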

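0.2 Logical Operators

The operators below support building logical expressions. All of them return true, false, or null, depending on the boolean values of the operands. null behaves as an unknown flag, so if the result depends on an unknown operand, the result itself is unknown.

Operator	Operand types	Description
`a and b`	Boolean	`true` if both a and b are `true`; `false` if either of them is `false`; otherwise `null`.
`a or b`	Boolean	`true` if a or b or both are `true`; `false or null` is `null`; otherwise `false`.
`not a`	Boolean	`true` if a is `false`; `null` if a is `null`; otherwise `false`.
`!a`	Boolean	Same as `not a`.
`a in (val1, val2, ...)`	Boolean	`true` if a is equal to any of the values. As of `Hive 0.13`, subqueries are supported in `in` expressions.
`a not in (val1, val2, ...)`	Boolean	`true` if a is not equal to any of the values. As of `Hive 0.13`, subqueries are supported in `not in` expressions.
`[not] exists (subquery)`		`true` if the subquery returns at least one row. `Hive 0.13+`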
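
The rule that null only propagates when the result actually depends on it can be written out explicitly. The following Python sketch models standard SQL three-valued logic, with `None` standing in for `null`; the function names are made up for illustration, and the assumption is that Hive's `and`, `or`, and `not` follow these standard rules.

```python
def and3(a, b):
    """Three-valued AND: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None          # result depends on the unknown operand
    return True

def or3(a, b):
    """Three-valued OR: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else (not a)

for a in (True, False, None):
    for b in (True, False, None):
        print(a, b, and3(a, b), or3(a, b))
```

Note that `not3(None)` stays `None`, which is why a `where not cond` filter drops rows whose condition is null.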

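0.3 String Operators

Operator	Operand types	Description
`a || b`	Strings	Concatenates the operands (shorthand for `concat(a, b)`). `Hive 2.2.0+`

0.4 Operators on Complex Types

Complex Type Constructors

Constructor	Operands	Description
`map`	`(key1, value1, key2, value2, ...)`	Creates a map from the given key/value pairs.
`struct`	`(val1, val2, val3, ...)`	Creates a struct from the given field values. The field names will be `col1`, `col2`, ...
`named_struct`	`(name1, val1, name2, val2, ...)`	Creates a struct from the given field names and values. `Hive 0.8.0+`
`array`	`(val1, val2, ...)`	Creates an array from the given elements.
`create_union`	`(tag, val1, val2, ...)`	Creates a union type with the value that the tag parameter points to.

The operators below provide mechanisms to access elements in complex types:

Operator	Operand types	Description
`a[n]`	a is an array and n is an int	Returns the nth element of array a. The first element has index 0.
`m[key]`	m is a `map<k, v>` and key has type k	Returns the value corresponding to key in the map.
`s.x`	s is a struct	Returns the x field of s.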

1. Built-in Functions


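1.0 Mathematical Functions

Hive supports the built-in mathematical functions below; most of them return null when an argument is null.

Return type	Name	Description
`double`	`round(double a)`	Returns the rounded bigint value of a.
`double`	`round(double a, int d)`	Returns a rounded to d decimal places.
`double`	`bround(double a)`	Returns the rounded bigint value of a using `half_even` rounding mode, also known as Gaussian or banker's rounding. `Hive 1.3.0+` and `2.0.0+`. For example: `bround(2.5)=2`, `bround(3.5)=4`
`double`	`bround(double a, int d)`	Returns a rounded to d decimal places using banker's rounding. `Hive 1.3.0+` and `2.0.0+`. For example: `bround(8.25, 1)=8.2`, `bround(8.35, 1)=8.4`
`bigint`	`floor(double a)`	Rounds down: returns the largest integer that is less than or equal to a.
`bigint`	`ceil(double a)`, `ceiling(double a)`	Rounds up: returns the smallest integer that is greater than or equal to a.
`double`	`rand()`, `rand(int seed)`	Returns a random number between 0 and 1. Specifying seed makes the generated random number sequence deterministic.
`double`	`exp(double a)`, `exp(decimal a)`	Returns e raised to the power a, where e is the base of the natural logarithm. The decimal version was added in `Hive 0.13.0`.
`double`	`ln(double a)`, `ln(decimal a)`	Returns the natural logarithm of a. The decimal version was added in `Hive 0.13.0`.
`double`	`log10(double a)`, `log10(decimal a)`	Returns the base-10 logarithm of a. The decimal version was added in `Hive 0.13.0`.
`double`	`log2(double a)`, `log2(decimal a)`	Returns the base-2 logarithm of a. The decimal version was added in `Hive 0.13.0`.
`double`	`log(double base, double a)`, `log(decimal base, decimal a)`	Returns the base-base logarithm of a. The decimal version was added in `Hive 0.13.0`.
`double`	`pow(double a, double p)`, `power(double a, double p)`	Returns a raised to the power p.
`double`	`sqrt(double a)`, `sqrt(decimal a)`	Returns the square root of a. The decimal version was added in `Hive 0.13.0`.
`string`	`bin(bigint a)`	Returns a in binary format.
`string`	`hex(bigint a)`, `hex(string a)`, `hex(binary a)`	Hexadecimal function. If the argument is an integer, returns its hexadecimal representation; if the argument is a string, returns the hexadecimal representation of that string, and so on. The binary version was added in `Hive 0.12.0`.
`binary`	`unhex(string a)`	Inverse of `hex`. Returns the bytes encoded by the hexadecimal string. The binary return type was introduced in `Hive 0.12.0`.
`string`	`conv(bigint num, int from_base, int to_base)`, `conv(string num, int from_base, int to_base)`	Base conversion function. Converts num from base from_base to base to_base.
`double`	`abs(double a)`	Returns the absolute value of a.
`int or double`	`pmod(int a, int b)`, `pmod(double a, double b)`	Modulo function. Returns the positive value of a mod b.
`double`	`sin(double a)`, `sin(decimal a)`	Returns the sine of a. The decimal version was added in `Hive 0.13.0`.
`double`	`asin(double a)`, `asin(decimal a)`	Returns the arc sine of a. The decimal version was added in `Hive 0.13.0`.
`double`	`cos(double a)`, `cos(decimal a)`	Returns the cosine of a. The decimal version was added in `Hive 0.13.0`.
`double`	`acos(double a)`, `acos(decimal a)`	Returns the arc cosine of a. The decimal version was added in `Hive 0.13.0`.
`double`	`tan(double a)`, `tan(decimal a)`	Returns the tangent of a. The decimal version was added in `Hive 0.13.0`.
`double`	`atan(double a)`, `atan(decimal a)`	Returns the arc tangent of a. The decimal version was added in `Hive 0.13.0`.
`double`	`degrees(double a)`, `degrees(decimal a)`	Converts a from radians to degrees. The decimal version was added in `Hive 0.13.0`.
`double`	`radians(double a)`, `radians(decimal a)`	Converts a from degrees to radians. The decimal version was added in `Hive 0.13.0`.
`int or double`	`positive(int a)`, `positive(double a)`	Returns a.
`int or double`	`negative(int a)`, `negative(double a)`	Returns the negation of a.
`int or double`	`sign(int a)`, `sign(double a)`	Returns the sign of a: 1.0 if a is positive, -1.0 if a is negative, and 0.0 otherwise. The decimal version was added in `Hive 0.13.0`.
`double`	`e()`	Returns the mathematical constant e.
`double`	`pi()`	Returns the mathematical constant pi.
`bigint`	`factorial(int a)`	Returns the factorial of a. Valid values of a are 0 to 20. `Hive 1.2.0+`
`double`	`cbrt(double a)`	Returns the cube root of a. `Hive 1.2.0+`
`int or bigint`	`shiftleft(tinyint|smallint|int a, int b)`, `shiftleft(bigint a, int b)`	Bitwise left shift: returns a shifted b positions to the left. `Hive 1.2.0+`
`int or bigint`	`shiftright(tinyint|smallint|int a, int b)`, `shiftright(bigint a, int b)`	Bitwise right shift: returns a shifted b positions to the right. `Hive 1.2.0+`
`int or bigint`	`shiftrightunsigned(tinyint|smallint|int a, int b)`, `shiftrightunsigned(bigint a, int b)`	Bitwise unsigned right shift (>>>): returns a shifted b positions to the right. `Hive 1.2.0+`
`T`	`greatest(T v1, T v2, ...)`	Returns the greatest value in the list of values. `Hive 1.1.0+`
`T`	`least(T v1, T v2, ...)`	Returns the least value in the list of values. `Hive 1.1.0+`
`int`	`width_bucket(numeric expr, numeric min_value, numeric max_value, INT num_buckets)`	`Hive 3.0.0+`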
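
Two rows in this table are easy to misread: `bround` (ties go to the even neighbour) and `pmod` (the result is positive even for a negative a). The Python sketch below models both. It is an illustration, not Hive code: the numbers are passed as decimal strings so that a value such as 8.35 is exact, and `java_rem` is a made-up helper that assumes `%` behaves like Java's truncated remainder, with `pmod` modeled as `((a % b) + b) % b`.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def bround(a, d=0):
    """Banker's rounding of a decimal literal (given as a string) to d places."""
    quantum = Decimal(1).scaleb(-d)      # 1, 0.1, 0.01, ...
    return Decimal(a).quantize(quantum, rounding=ROUND_HALF_EVEN)

def round_half_up(a, d=0):
    """Ordinary 'round half up', shown only for contrast with bround."""
    quantum = Decimal(1).scaleb(-d)
    return Decimal(a).quantize(quantum, rounding=ROUND_HALF_UP)

def java_rem(a, b):
    """Truncated integer remainder: the sign follows the dividend (assumption)."""
    return int(math.fmod(a, b))

def pmod(a, b):
    """Model of pmod for integers: ((a % b) + b) % b with truncated remainders."""
    return java_rem(java_rem(a, b) + b, b)

print(bround("2.5"), bround("3.5"), round_half_up("2.5"))
print(bround("8.25", 1), bround("8.35", 1))
print(java_rem(-7, 3), pmod(-7, 3))
```

With a positive b, `pmod` always lands in the range 0 to b-1, which makes it the safer choice for bucketing values that may be negative.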

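1.1 Collection Functions

Return type	Name	Description
`int`	`size(Map<K,V>)`	Returns the number of elements in the map.
`int`	`size(Array<T>)`	Returns the number of elements in the array.
`array<K>`	`map_keys(Map<K,V>)`	Returns an unordered array containing the keys of the input map.
`array<V>`	`map_values(Map<K,V>)`	Returns an unordered array containing the values of the input map.
`boolean`	`array_contains(Array<T>, value)`	Returns true if the array contains value.
`array<t>`	`sort_array(Array<T>)`	Sorts the input array in ascending order according to the natural ordering of its elements and returns it. `Hive 0.9.0+`

Examples: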

hive> select 11 % 2;
OK
1
Time taken: 1.364 seconds, Fetched: 1 row(s)
hive> select ceil(28.0/6.999999999999999999999);
OK
4
Time taken: 0.136 seconds, Fetched: 1 row(s)
hive> select round(6.8 % 2, 2);
OK
0.8
Time taken: 0.17 seconds, Fetched: 1 row(s)

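1.2 Type Conversion Functions

Return type	Name	Description
`binary`	`binary(string|binary)`	Casts the parameter into a binary.
`<type>`	`cast(expr as <type>)`	Converts the result of the expression expr to `<type>`. For example, `cast('1' as BIGINT)` converts the string `'1'` to the integer 1. Returns `null` if the conversion fails. For a non-empty string, `cast(expr as boolean)` returns `true`.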

1.3 Date Functions

Return type	Name	Description
`string`	`from_unixtime(bigint unixtime[,string format])`	Converts a Unix timestamp (the number of seconds since `1970-01-01 00:00:00 UTC`) to a string that represents that moment in the current time zone, in the given format.
`bigint`	`unix_timestamp()`	Gets the current Unix timestamp in seconds.
`bigint`	`unix_timestamp(string date)`	Converts a time string in the format `"yyyy-MM-dd HH:mm:ss"` to a Unix timestamp. Returns 0 if the conversion fails.
`bigint`	`unix_timestamp(string date, string pattern)`	Converts a time string with the given pattern to a Unix timestamp. Returns 0 if the conversion fails.
`string` before `Hive 2.1.0`; `date` as of `2.1.0`	`to_date(string timestamp)`	Returns the date part of a timestamp string.
`int`	`year(string date)`	Returns the year part of a date.
`int`	`quarter(date/timestamp/string)`	`Hive 1.3.0+`. Returns the quarter of the year for a date, timestamp, or string, in the range 1 to 4.
`int`	`month(string date)`	Returns the month part of a date.
`int`	`day(string date)`, `dayofmonth(date)`	Returns the day part of a date.
`int`	`hour(string date)`	Returns the hour of the timestamp.
`int`	`minute(string date)`	Returns the minute of the timestamp.
`int`	`second(string date)`	Returns the second of the timestamp.
`int`	`weekofyear(string date)`	Returns the week number of the year for the date.
`int`	`extract(field from source)`	`Hive 2.2.0+`. Retrieves fields such as days or hours from source. Source must be a date, timestamp, interval, or a string that can be converted into either a date or timestamp. Supported fields include: day, dayofweek, hour, minute, month, quarter, second, week, and year.
`int`	`datediff(string enddate, string startdate)`	Returns the number of days from startdate to enddate.
`string` before `Hive 2.1.0`; `date` as of `2.1.0`	`date_add(date/timestamp/string startdate, tinyint/smallint/int days)`	Returns the date that is days days after startdate.
`string` before `Hive 2.1.0`; `date` as of `2.1.0`	`date_sub(date/timestamp/string startdate, tinyint/smallint/int days)`	Returns the date that is days days before startdate.
`timestamp`	`from_utc_timestamp({any primitive type} ts, string timezone)`	`Hive 0.8.0+`. Converts a timestamp in UTC to the given time zone.
`timestamp`	`to_utc_timestamp({any primitive type} ts, string timezone)`	`Hive 0.8.0+`. Converts a timestamp in the given time zone to UTC.
`date`	`current_date`	`Hive 1.2.0+`. Returns the current date at the start of query evaluation.
`timestamp`	`current_timestamp`	`Hive 1.2.0+`. Returns the current timestamp at the start of query evaluation.
`string`	`add_months(string start_date, int num_months, output_date_format)`	Returns the date that is num_months after start_date, in the specified format.
`string`	`last_day(string date)`	Returns the last day of the month that the date belongs to.
`string`	`next_day(string start_date, string day_of_week)`	Returns the first date that is later than start_date and falls on day_of_week.
`string`	`trunc(string date, string format)`	Returns the date truncated to the unit given by format, that is, the first day of the month or the first day of the year of date.
`double`	`months_between(date1, date2)`	Returns the number of months between date1 and date2.
`string`	`date_format(date/timestamp/string ts, string fmt)`	Converts a date, timestamp, or string to a string in the format specified by fmt.
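
A few of these functions are simple enough to model with Python's standard library, which helps when checking expected values by hand. The sketch below is an approximation under stated assumptions: the helper names mirror the Hive functions but are plain Python, dates are ISO `yyyy-MM-dd` strings, `from_unixtime_utc` always formats in UTC (Hive uses the current time zone), and it takes Python `strftime` patterns such as `%Y%m%d` rather than Java patterns such as `yyyyMMdd`.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

def datediff(enddate, startdate):
    """Number of days from startdate to enddate."""
    return (date.fromisoformat(enddate) - date.fromisoformat(startdate)).days

def date_add(startdate, days):
    """Date that lies `days` days after startdate."""
    return (date.fromisoformat(startdate) + timedelta(days=days)).isoformat()

def date_sub(startdate, days):
    """Date that lies `days` days before startdate."""
    return date_add(startdate, -days)

def last_day(d):
    """Last day of the month that d belongs to."""
    d = date.fromisoformat(d)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1]).isoformat()

def from_unixtime_utc(unixtime, fmt="%Y-%m-%d %H:%M:%S"):
    """Format a Unix timestamp in UTC (assumption: Hive uses the session zone)."""
    return datetime.fromtimestamp(unixtime, tz=timezone.utc).strftime(fmt)

print(datediff("2019-06-19", "2019-06-01"))
print(date_add("2019-06-19", 12), date_sub("2019-07-01", 12))
print(last_day("2020-02-10"))
print(from_unixtime_utc(86400, "%Y%m%d"))
```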

1.4 Conditional Functions

Return type	Name	Description
`T`	`if(boolean testCondition, T valueTrue, T valueFalseOrNull)`	Returns valueTrue when testCondition is `true`; otherwise returns `valueFalseOrNull`.
`boolean`	`isnull( a )`	Returns `true` if a is `null`; otherwise `false`.
`boolean`	`isnotnull ( a )`	Returns `false` if a is `null`; otherwise `true`.
`T`	`nvl(T value, T default_value)`	Returns default_value if value is `null`; otherwise returns value. `Hive 0.11+`
`T`	`coalesce(T v1, T v2, ...)`	Returns the first v that is not `null`, or `null` if all v's are `null`.
`T`	`case a when b then c [when d then e]* [else f] end`	When a=b, returns c; when a=d, returns e; otherwise returns f.
`T`	`case when a then b [when c then d]* [else e] end`	When a=true, returns b; when c=true, returns d; otherwise returns e.
`T`	`nullif( a, b )`	Returns null if a=b; otherwise returns a. `Hive 2.3.0+`
`void`	`assert_true(boolean condition)`	Throws an exception if condition is not `true`; otherwise returns null. `Hive 0.8.0+`
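
The null-handling functions in this table follow a common pattern that a few lines of Python can make concrete. This is a minimal sketch with `None` standing in for `null`; the functions are plain Python models named after their Hive counterparts, not Hive code.

```python
def nvl(value, default_value):
    """Return default_value only when value is null."""
    return default_value if value is None else value

def coalesce(*values):
    """Return the first non-null argument, or null if there is none."""
    for v in values:
        if v is not None:
            return v
    return None

def nullif(a, b):
    """Return null when a equals b, otherwise a."""
    return None if a == b else a

print(nvl(None, 0), nvl(5, 0))
print(coalesce(None, None, "x", "y"), coalesce(None, None))
print(nullif(1, 1), nullif(1, 2))
```

`nvl(v, d)` behaves like the two-argument case of `coalesce`, and `nullif` is its rough inverse: it turns a sentinel value back into null.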

Example:

hive (test)> select from_unixtime(unix_timestamp(), 'yyyyMMdd');
OK
20190619

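1.5 String Functions

Return type	Name	Description
`int`	`ascii(string str)`	Returns the ASCII code of the first character of str.
`string`	`base64(binary bin)`	Converts the argument from binary to a base64 string.
`int`	`character_length(string str)`	`Hive 2.2.0+`. Returns the number of UTF-8 characters contained in str. `char_length` is shorthand for this function.
`string`	`chr(bigint|double A)`	Returns the ASCII character whose binary equivalent is A. `Hive 1.3.0+`, `Hive 2.1.0+`
`string`	`concat(string|binary A, string|binary B...)`	Returns the result of concatenating the inputs; any number of inputs is supported.
`string`	`concat_ws(string SEP, array<string>)`
`string`	`concat_ws(string SEP, string A, string B...)`	Returns the result of concatenating the input strings, with SEP as the separator between them.
`array<struct<string,double>>`	`context_ngrams(array<array<string>>, array<string>, int K, int pf)`
`string`	`decode(binary bin, string charset)`	`Hive 0.12.0+`. Decodes the first argument into a string using the provided character set (one of `'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'`). If either argument is `null`, the result is also `null`.
`binary`	`encode(string src, string charset)`	`Hive 0.12.0+`. Encodes the first argument into a binary using the provided character set (one of `'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'`).
`string`	`elt(N int,str1 string,str2 string,str3 string,...)`	Returns the string at index N. For example, `elt(2,'hello','world')` returns `world`. Returns `null` if N is less than 1 or greater than the number of strings.
`int`	`field(val T,val1 T,val2 T,val3 T,...)`	Returns the index of val in the list val1, val2, val3, ..., or 0 if it is not found.
`int`	`find_in_set(string str, string strList)`	Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string. Returns 0 if str is not found.
`string`	`format_number(number x, int d)`	Formats the number x in a format like "#,###,###.##", rounded to d decimal places, and returns the result as a string. If d is 0, the result is rounded and has no fractional part.
`string`	`get_json_object(string json_string, string path)`	Parses the JSON string json_string and returns the content specified by path. Returns NULL if the input JSON string is invalid.
`boolean`	`in_file(string str, string filename)`	Returns `true` if str appears as an entire line in the file.
`string`	`initcap(string A)`	`Hive 1.1.0+`. Returns the string with the first letter of each word in uppercase and all other letters in lowercase. Words are delimited by whitespace.
`int`	`instr(string str, string substr)`	Returns the position of the first occurrence of substr in str. Returns 0 if substr is not found and null if either argument is null. Positions are 1-based.
`int`	`length(string A)`	Returns the length of string A.
`int`	`locate(string substr, string str[, int pos])`	Returns the position of the first occurrence of substr in str after position pos. Returns 0 if substr is not found and null if either argument is null. Positions are 1-based.
`string`	`lower(string A)`, `lcase(string A)`	Returns string A converted to lowercase.
`string`	`lpad(string str, int len, string pad)`	Returns str, left-padded with pad to a length of len.
`string`	`ltrim(string A)`	Removes the spaces from the left end of the string.
`array<struct<string,double>>`	`ngrams(array<array<string>>, int N, int K, int pf)`
`int`	`octet_length(string str)`
`string`	`parse_url(string urlString, string partToExtract [, string keyToExtract])`	URL parsing function. Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
`string`	`printf(String format, Obj... args)`
`string`	`quote(String text)`
`string`	`regexp_extract(string subject, string pattern, int index)`	Returns the part of subject extracted with the regular expression pattern; index selects which capture group to return.
`string`	`regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)`	Returns the string that results from replacing every substring of INITIAL_STRING that matches the Java regular expression PATTERN with REPLACEMENT. Note that escape characters are needed in some cases. Similar to the regexp_replace function in Oracle.
`string`	`repeat(string str, int n)`	Returns str repeated n times.
`string`	`replace(string A, string OLD, string NEW)`
`string`	`reverse(string A)`	Returns string A reversed.
`string`	`rpad(string str, int len, string pad)`	Returns str, right-padded with pad to a length of len.
`string`	`rtrim(string A)`	Removes the spaces from the right end of the string.
`array<array<string>>`	`sentences(string str, string lang, string locale)`
`string`	`soundex(string A)`
`string`	`space(int n)`	Returns a string of n spaces.
`array`	`split(string str, string pat)`	Splits str around pat (a regular expression) and returns the resulting array of strings.
`map<string,string>`	`str_to_map(text[, delimiter1, delimiter2])`
`string`	`substr(string|binary A, int start)`, `substring(string|binary A, int start)`	Returns the part of A from position start to the end.
`string`	`substr(string|binary A, int start, int len)`, `substring(string|binary A, int start, int len)`	Returns the part of A that starts at position start and has length len.
`string`	`substring_index(string A, string delim, int count)`
`string`	`translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)`
`string`	`trim(string A)`	Removes the spaces from both ends of the string.
`binary`	`unbase64(string str)`
`string`	`upper(string A)`, `ucase(string A)`	Returns string A converted to uppercase.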

Example:

hive (test)> select elt(2,'hello','world');
OK
world
Time taken: 0.134 seconds, Fetched: 1 row(s)
hive (test)> select elt(3,'hello','world');
OK
NULL
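
The lookup-style string functions (`elt`, `field`, `find_in_set`, `instr`) all use 1-based positions and signal "not found" with 0 or null. The Python sketch below models that convention; `None` stands in for `null`, and the functions are illustrative models rather than Hive code.

```python
def elt(n, *strings):
    """String at 1-based index n, or null when n is out of range."""
    if n < 1 or n > len(strings):
        return None
    return strings[n - 1]

def field(val, *values):
    """1-based index of val among values, or 0 when absent."""
    return values.index(val) + 1 if val in values else 0

def find_in_set(s, str_list):
    """1-based position of s in a comma-delimited list, or 0 when absent."""
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0

def instr(s, substr):
    """1-based position of substr in s, 0 when absent, null on null input."""
    if s is None or substr is None:
        return None
    return s.find(substr) + 1    # str.find gives -1 when absent, so this is 0

print(elt(2, "hello", "world"), elt(3, "hello", "world"))
print(field("world", "hello", "world"))
print(find_in_set("ab", "abc,b,ab,c,def"))
print(instr("foobar", "bar"), instr("foobar", "x"))
```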

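1.6 Data Masking Functions

Return type	Name	Description
`string`	`mask(string str[, string upper[, string lower[, string number]]])`	`Hive 2.1.0+`. Returns a masked version of str. By default, uppercase letters are converted to `X`, lowercase letters to `x`, and digits to `n`. For example, `mask("abcd-EFGH-8765-4321")` returns `xxxx-XXXX-nnnn-nnnn`.
`string`	`mask_first_n(string str[, int n])`	`Hive 2.1.0+`. Returns a masked version of str with the first n characters masked.
`string`	`mask_last_n(string str[, int n])`	`Hive 2.1.0+`. Returns a masked version of str with the last n characters masked.
`string`	`mask_show_first_n(string str[, int n])`	`Hive 2.1.0+`. Returns a masked version of str that leaves the first n characters unmasked.
`string`	`mask_show_last_n(string str[, int n])`	`Hive 2.1.0+`. Returns a masked version of str that leaves the last n characters unmasked.
`string`	`mask_hash(string|char|varchar str)`	`Hive 2.1.0+`. Returns a hashed value based on str.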
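
The masking rule is a per-character substitution, which the following Python sketch reproduces for ASCII input. It is a model for illustration only: the helper names are borrowed from the table, n must be passed explicitly (the built-in default for n is not modeled), and handling of non-ASCII characters is an open assumption.

```python
def _mask_char(ch, upper, lower, number):
    """Replace one ASCII letter or digit; leave every other character alone."""
    if "A" <= ch <= "Z":
        return upper
    if "a" <= ch <= "z":
        return lower
    if "0" <= ch <= "9":
        return number
    return ch

def mask(s, upper="X", lower="x", number="n"):
    """Mask every letter and digit of s."""
    return "".join(_mask_char(ch, upper, lower, number) for ch in s)

def mask_first_n(s, n):
    """Mask only the first n characters."""
    return mask(s[:n]) + s[n:]

def mask_show_last_n(s, n):
    """Mask everything except the last n characters."""
    cut = max(len(s) - n, 0)
    return mask(s[:cut]) + s[cut:]

print(mask("abcd-EFGH-8765-4321"))
print(mask_first_n("1234-5678", 4))
print(mask_show_last_n("1234-5678-8765-4321", 4))
```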

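1.7 Misc. Functions

Return type	Name	Description
`varies`	`java_method(class, method[, arg1[, arg2..]])`	Synonym for `reflect`. `Hive 0.9.0+`
`varies`	`reflect(class, method[, arg1[, arg2..]])`	Calls a Java method by matching the argument signature, using reflection. `Hive 0.7.0+`
`int`	`hash(a1[, a2...])`	Returns a hash value of the arguments. `Hive 0.4+`
`string`	`current_user()`	Returns the current user name from the configured authenticator manager. `Hive 1.2.0+`
`string`	`logged_in_user()`	Returns the current user name from the session state. `Hive 2.2.0+`
`string`	`current_database()`	Returns the name of the current database. `Hive 0.13.0+`
`string`	`md5(string/binary)`	Calculates an MD5 128-bit checksum for the string or binary. `Hive 1.3.0+`
`string`	`sha1(string/binary)`, `sha(string/binary)`	Calculates the SHA-1 digest for the string or binary and returns the value as a hex string. `Hive 1.3.0+`
`bigint`	`crc32(string/binary)`	Computes a cyclic redundancy check value for the string or binary argument and returns a bigint value. `Hive 1.3.0+`
`string`	`sha2(string/binary, int)`	Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). `Hive 1.3.0+`
`binary`	`aes_encrypt(input string/binary, key string/binary)`	Encrypts input using AES. `Hive 1.3.0+`
`binary`	`aes_decrypt(input binary, key string/binary)`	Decrypts input using AES. `Hive 1.3.0+`
`string`	`version()`	Returns the Hive version. `Hive 2.1.0+`
`bigint`	`surrogate_key([write_id_bits, task_id_bits])`	Automatically generates numerical IDs for rows as data is entered into a table. Can only be used as a default value for ACID or insert-only tables.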
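
The digest functions correspond to well-known algorithms, so their output can be cross-checked outside Hive. The Python sketch below uses `hashlib` and `zlib` for that purpose. It assumes the string argument is hashed as its UTF-8 bytes, and returning `None` for an unsupported bit length in `sha2` is also an assumption of this sketch.

```python
import hashlib
import zlib

def md5(s):
    """Hex string of the MD5 128-bit checksum of s (UTF-8 bytes assumed)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def sha1(s):
    """Hex string of the SHA-1 digest of s."""
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

def sha2(s, bits):
    """Hex string of the SHA-2 digest with the requested bit length."""
    algorithms = {224: hashlib.sha224, 256: hashlib.sha256,
                  384: hashlib.sha384, 512: hashlib.sha512}
    if bits not in algorithms:
        return None              # assumption: unsupported lengths yield null
    return algorithms[bits](s.encode("utf-8")).hexdigest()

def crc32(s):
    """Cyclic redundancy check value of s as a non-negative integer."""
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF

print(md5("ABC"))
print(sha1("ABC"))
print(sha2("ABC", 256))
print(crc32("ABC"))
```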

1.7.0 xpath

The functions below are described in LanguageManual XPathUDF:
xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string

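1.7.1 get_json_object

A limited version of JSONPath is supported:

$ root object
. child operator
[] subscript operator for arrays
* wildcard for []

Syntax that is not supported:

Zero-length string as key
.. recursive descent
@ current object/element
() script expression
?() filter (script) expression
[,] union operator
[start:end:step] array slice operator

Example: the src_json table is a single-column, single-row table

{"store":
  {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
   "bicycle":{"price":19.95,"color":"red"}
  },
 "email":"amy@only_for_json_udf_test.net",
 "owner":"amy"
}

The queries below extract fields from the JSON object: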

hive> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
amy
 
hive> SELECT get_json_object(src_json.json, '$.store.fruit[0]') FROM src_json;
{"weight":8,"type":"apple"}
 
hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
NULL
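
To make the supported subset of JSONPath concrete, here is a small Python evaluator that accepts only `$`, `.key`, `[n]`, and `[*]`, and returns `None` for anything else. It is a sketch of the path rules listed above, not Hive's implementation: it returns Python objects instead of JSON text, the key pattern (letters, digits, underscore) is a simplification, and `SRC_JSON` simply repeats the sample document.

```python
import json
import re

# One path step: either .key or [index] / [*].
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+|\*)\]")

def get_json_object(json_string, path):
    try:
        root = json.loads(json_string)
    except (TypeError, ValueError):
        return None                          # invalid JSON input
    if not path.startswith("$"):
        return None
    nodes, pos, used_wildcard = [root], 1, False
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            return None                      # unsupported syntax such as ..
        key, index = step.group(1), step.group(2)
        matched = []
        for node in nodes:
            if key is not None:
                if isinstance(node, dict) and key in node:
                    matched.append(node[key])
            elif index == "*":
                if isinstance(node, list):
                    matched.extend(node)
                    used_wildcard = True
            elif isinstance(node, list) and int(index) < len(node):
                matched.append(node[int(index)])
        nodes, pos = matched, step.end()
    if not nodes:
        return None                          # path does not exist
    return nodes if used_wildcard else nodes[0]

SRC_JSON = """{"store":
  {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
   "bicycle":{"price":19.95,"color":"red"}},
 "email":"amy@only_for_json_udf_test.net",
 "owner":"amy"}"""

print(get_json_object(SRC_JSON, "$.owner"))
print(get_json_object(SRC_JSON, "$.store.fruit[0]"))
print(get_json_object(SRC_JSON, "$.non_exist_key"))
```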

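2. Built-in Aggregate Functions

Return type	Name	Description
`bigint`	`count(*)`, `count(expr)`, `count(DISTINCT expr[, expr...])`	`count(*)` returns the total number of retrieved rows, including rows that contain NULL values. `count(expr)` returns the number of rows for which expr is non-NULL. `count(DISTINCT expr[, expr...])` returns the number of rows for which the supplied expressions are unique and non-NULL.
`double`	`sum(col)`, `sum(DISTINCT col)`	`sum(col)` returns the sum of col in the group; `sum(DISTINCT col)` returns the sum of the distinct values of col in the group.
`double`	`avg(col)`, `avg(distinct col)`	`avg(col)` returns the average of col in the group; `avg(distinct col)` returns the average of the distinct values of col in the group.
`double`	`min(col)`	Returns the minimum of col in the group.
`double`	`max(col)`	Returns the maximum of col in the group.
`double`	`variance(col)`, `var_pop(col)`	Returns the population variance of the non-null values of col in the group (nulls are ignored).
`double`	`var_samp(col)`	Returns the sample variance of the non-null values of col in the group (nulls are ignored).
`double`	`stddev_pop(col)`	Returns the population standard deviation of col in the group, which equals the square root of `var_pop()`.
`double`	`stddev_samp(col)`	Returns the sample standard deviation of col in the group.
`double`	`covar_pop(col1, col2)`	Returns the population covariance of a pair of numeric columns in the group.
`double`	`covar_samp(col1, col2)`	Returns the sample covariance of a pair of numeric columns in the group.
`double`	`corr(col1, col2)`	Returns the Pearson correlation coefficient of a pair of numeric columns in the group.
`double`	`percentile(bigint col, p)`	Returns the exact p-th percentile of col in the group. p must be between 0 and 1. col currently supports only integer types, not floating point types.
`array<double>`	`percentile(bigint col, array(p1 [, p2]...))`	Like the above, but accepts several percentiles and returns the corresponding values as an array.
`double`	`percentile_approx(double col, p [, B])`	Returns an approximate p-th percentile of col in the group, as a double. p must be between 0 and 1, and col may be a floating point column. The parameter B controls approximation accuracy at the cost of memory: the larger B is, the more accurate the result. The default is 10,000. When the number of distinct values in col is smaller than B, the result is the exact percentile.
`array<double>`	`percentile_approx(double col, array(p1 [, p2]...) [, B])`	Like the above, but accepts several percentiles and returns the corresponding values as an array.
`double`	`regr_avgx(independent, dependent)`	`Hive 2.2.0+`. Equivalent to `avg(dependent)`.
`double`	`regr_avgy(independent, dependent)`	`Hive 2.2.0+`. Equivalent to `avg(independent)`.
`double`	`regr_count(independent, dependent)`	`Hive 2.2.0+`. Returns the number of non-null pairs used to fit the linear regression line.
`double`	`regr_intercept(independent, dependent)`	`Hive 2.2.0+`. Returns the y-intercept of the linear regression line, that is, the value of b in the equation `dependent = a * independent + b`.
`double`	`regr_r2(independent, dependent)`	`Hive 2.2.0+`. Returns the coefficient of determination for the regression.
`double`	`regr_slope(independent, dependent)`	`Hive 2.2.0+`. Returns the slope of the linear regression line.
`double`	`regr_sxx(independent, dependent)`	`Hive 2.2.0+`. Equivalent to `regr_count(independent, dependent) * var_pop(dependent)`.
`double`	`regr_sxy(independent, dependent)`	`Hive 2.2.0+`. Equivalent to `regr_count(independent, dependent) * covar_pop(independent, dependent)`.
`double`	`regr_syy(independent, dependent)`	`Hive 2.2.0+`. Equivalent to `regr_count(independent, dependent) * var_pop(independent)`.
`array<struct {'x','y'}>`	`histogram_numeric(col, b)`	Computes a histogram of the numeric column col in the group using b bins.
`array`	`collect_set(col)`	Returns a set of objects with duplicate elements eliminated.
`array`	`collect_list(col)`	Returns a list of objects with duplicates retained. `Hive 0.13.0+`
`integer`	`ntile(integer x)`	Divides an ordered partition into x groups called buckets and assigns a bucket number to each row in the partition. `Hive 0.11.0+`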
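
The population and sample variants differ only in the divisor, and the regression aggregates can be derived from the covariance and variance. The Python sketch below spells out the textbook formulas; it is not Hive's implementation. `None` stands in for `null`, and `fit_line(xs, ys)` is a made-up helper that fits `y = a * x + b` without making any claim about Hive's argument order.

```python
import math

def _clean(xs):
    """Drop nulls, as the aggregates above ignore them."""
    return [x for x in xs if x is not None]

def var_pop(xs):
    xs = _clean(xs)
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)         # divide by n

def var_samp(xs):
    xs = _clean(xs)
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)   # divide by n - 1

def stddev_pop(xs):
    return math.sqrt(var_pop(xs))

def covar_pop(xs, ys):
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)

def corr(xs, ys):
    return covar_pop(xs, ys) / math.sqrt(var_pop(xs) * var_pop(ys))

def fit_line(xs, ys):
    """Least-squares slope a and intercept b of y = a * x + b."""
    a = covar_pop(xs, ys) / var_pop(xs)
    b = sum(ys) / len(ys) - a * sum(xs) / len(xs)
    return a, b

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(var_pop(xs), var_samp(xs), stddev_pop(xs))
print(corr(xs, ys), fit_line(xs, ys))
```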

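3. Built-in Table-Generating Functions

Normal user-defined functions (UDFs), such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row into multiple output rows.

Row-set column types	Name	Description
`T`	`explode(ARRAY<T> a)`	Explodes an array into multiple rows, one row per element.
`T_key,T_value`	`explode(MAP<Tkey,Tvalue> m)`	Explodes a map into multiple rows, one row per key-value pair, with the key in one column and the value in another.
`int,T`	`posexplode(ARRAY<T> a)`	Explodes an array into multiple rows with an additional positional column of type int.
`T1,...,Tn`	`inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a)`	Explodes an array of structs into multiple rows.
`T1,...,Tn/r`	`stack(int r,T1 V1,...,Tn/r Vn)`	Breaks up the n values V1, ..., Vn into r rows.
`string1,...,stringn`	`json_tuple(string jsonStr,string k1,...,string kn)`	Takes a JSON string and a set of n keys, and returns a tuple of n values.
`string1,...,stringn`	`parse_url_tuple(string urlStr,string p1,...,string pn)`	Takes a URL string and a set of n URL parts, and returns a tuple of n values.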

Example 1: explode (array)

hive (test)> select explode(array('A','B','C'));
OK
A
B
C
Time taken: 0.37 seconds, Fetched: 3 row(s)
hive (test)> select explode(array('A','B','C')) as col;
OK
A
B
C
Time taken: 0.115 seconds, Fetched: 3 row(s)
hive (test)> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
OK
A
B
C
Time taken: 3.067 seconds, Fetched: 3 row(s)
hive (test)> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
OK
A
B
C
Time taken: 0.106 seconds, Fetched: 3 row(s)

Example 2: explode (map)

hive (test)> select explode(map('A',10,'B',20,'C',30));
OK
A       10
B       20
C       30
Time taken: 0.153 seconds, Fetched: 3 row(s)
hive (test)> select explode(map('A',10,'B',20,'C',30)) as (key,value);
OK
A       10
B       20
C       30
Time taken: 0.108 seconds, Fetched: 3 row(s)
hive (test)> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
OK
A       10
B       20
C       30
Time taken: 0.529 seconds, Fetched: 3 row(s)
hive (test)> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;
OK
A       10
B       20
C       30
Time taken: 0.237 seconds, Fetched: 3 row(s)

Example 3: posexplode (array)

select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;

Example 4: inline (array of structs)

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;

Example 5: stack (values)

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;
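
Since examples 3 to 5 are shown without output, the row shapes they produce can be sketched with Python generators. This is a model of the one-row-in, many-rows-out behaviour rather than Hive code: each yielded tuple is one output row, structs are represented as tuples, dates are plain strings, and `stack` here requires the number of values to be divisible by r, which is a simplification.

```python
def explode(array):
    """One single-column row per array element."""
    for item in array:
        yield (item,)

def explode_map(mapping):
    """One (key, value) row per map entry."""
    for key, value in mapping.items():
        yield (key, value)

def posexplode(array):
    """Like explode, with the element position as an extra first column."""
    for pos, item in enumerate(array):
        yield (pos, item)

def inline(structs):
    """One row per struct, one column per struct field."""
    for struct in structs:
        yield tuple(struct)

def stack(r, *values):
    """Break n values into r rows of n / r columns each."""
    if r <= 0 or len(values) % r != 0:
        raise ValueError("sketch only supports n divisible by r")
    width = len(values) // r
    for i in range(r):
        yield tuple(values[i * width:(i + 1) * width])

print(list(explode(["A", "B", "C"])))
print(list(explode_map({"A": 10, "B": 20, "C": 30})))
print(list(posexplode(["A", "B", "C"])))
print(list(inline([("A", 10, "2015-01-01"), ("B", 20, "2016-02-02")])))
print(list(stack(2, "A", 10, "2015-01-01", "B", 20, "2016-01-01")))
```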

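The SELECT udtf(col) AS colAlias... syntax has a few limitations:

No other expressions are allowed in the SELECT. For example, SELECT pageid, explode(adid_list) AS myCol... is not supported.
UDTFs cannot be nested. For example, SELECT explode(explode(adid_list)) AS myCol... is not supported.
GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported. For example, SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported.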
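
The lateral view form used in the examples above avoids these limitations, because it joins every input row with the rows its UDTF call generates. The Python sketch below models that join and a follow-up grouping. Only the column names `pageid` and `adid_list` come from the text above; the sample rows and the helper name `lateral_view_explode` are hypothetical.

```python
from collections import Counter

def lateral_view_explode(rows, array_col, alias):
    """Yield one output row per (input row, array element) pair."""
    for row in rows:
        for item in row[array_col]:
            out = dict(row)          # keep the other columns of the source row
            out[alias] = item        # add the exploded value under its alias
            yield out

# Hypothetical sample data for illustration.
page_ads = [
    {"pageid": "front_page", "adid_list": [1, 2, 3]},
    {"pageid": "contact_page", "adid_list": [3, 4, 5]},
    {"pageid": "empty_page", "adid_list": []},
]

exploded = list(lateral_view_explode(page_ads, "adid_list", "adid"))

# Models selecting pageid next to the exploded column.
pairs = [(row["pageid"], row["adid"]) for row in exploded]

# Models grouping by the exploded column and counting.
ad_counts = Counter(row["adid"] for row in exploded)

print(pairs)
print(ad_counts)
```

In this model a row with an empty array contributes no output rows at all, which is worth keeping in mind when the source rows themselves need to be preserved.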