Spark SQL内置函数

最新推荐文章于 2024-01-30 19:31:19 发布

流年ln

最新推荐文章于 2024-01-30 19:31:19 发布

阅读量3.4k

点赞数 1

本文链接：https://blog.csdn.net/admin_15082037343/article/details/104805509

版权

Spark版本号：2.4.3，Spark SQL命令行输入

show functions;

示例

abs：取绝对值

> select abs(-10);

+------+
| _c0  |
+------+
| 10   |
+------+

acos：反余弦函数

> select acos(0.5);

+---------------------+
|         _c0         |
+---------------------+
| 1.0471975511965979  |
+---------------------+

add_months(start_date, num_months)，增加月份

> select add_months('2016-02-29', 1);

+-------------+
|     _c0     |
+-------------+
| 2016-03-31  |
+-------------+

aggregate
and
approx_count_distinct(expr[, relativeSD])
approx_percentile(col, percentage [, accuracy])
array(expr, …) - 返回给定值组成的数组。

> select array(1, 2 , 3, 4, 5, 6);

+----------------+
|      _c0       |
+----------------+
| [1,2,3,4,5,6]  |
+----------------+

array_contains(array, value) - 如果数组包含了 value，则返回 true。

> select array_contains(array(1, 2, 3, 4, 5, 6), 1);

+------------------------------------------+
|array_contains(array(1, 2, 3, 4, 5, 6), 1)|
+------------------------------------------+
|                                      true|
+------------------------------------------+

array_distinct(array) - 数组去重

> select array_distinct(array(1, 1, 3, 3));

+---------------------------------+
|array_distinct(array(1, 1, 3, 3))|
+---------------------------------+
|                           [1, 3]|
+---------------------------------+

array_except(array1, array2) - 数组移除

> select array_except(array(1, 2, 3) , array(1));

+--------------------------------------+
|array_except(array(1, 2, 3), array(1))|
+--------------------------------------+
|                                [2, 3]|
+--------------------------------------+

array_intersect(array1, array2) - 数组求交

> select array_intersect(array(1, 2, 3) , array(1, 2));

+--------------------------------------------+
|array_intersect(array(1, 2, 3), array(1, 2))|
+--------------------------------------------+
|                                      [1, 2]|
+--------------------------------------------+

array_join(array, String[, String]): String：使用分隔符将数组的元素进行拼接。另外，我们还可以指定可选的字符来替换 null 值。如果我们没有指定 null 值的替换字符串，那么结果中将会把 null 值替换掉。

> select array_join(array('hello', null, 'world'), ',', 'null');

+--------------------------------------------------------------+
|array_join(array(hello, CAST(NULL AS STRING), world), ,, null)|
+--------------------------------------------------------------+
|                                              hello,null,world|
+--------------------------------------------------------------+

第一个参数为：stirng的数组
第二个参数为：拼接字符串
第三个参数为：null替换为字符串

array_max(array): T：返回数组中的最大值， null 元素将会被忽略。

> select array_max(array(1, 2, 10, null, 3));

+------------------------------------------------+
|array_max(array(1, 2, 10, CAST(NULL AS INT), 3))|
+------------------------------------------------+
|                                              10|
+------------------------------------------------+

array_min(array): T：返回数组中的最小值， null 元素将会被忽略。

> select array_min(array(1, 2, 10, null, 3));

+------------------------------------------------+
|array_min(array(1, 2, 10, CAST(NULL AS INT), 3))|
+------------------------------------------------+
|                                               1|
+------------------------------------------------+

array_position(array, T): Long：返回元素 T 在数组第一次出现的位置，从1开始算。

> select array_position(array(1, 3, 4, 1), 1);

+------------------------------------+
|array_position(array(1, 3, 4, 1), 1)|
+------------------------------------+
|                                   1|
+------------------------------------+

array_remove(array, T): array：从给定数组中删除所有与给定元素相等的元素。

> select array_remove(array(1, 3, 4, 1, 1), 1);

+-------------------------------------+
|array_remove(array(1, 3, 4, 1, 1), 1)|
+-------------------------------------+
|                               [3, 4]|
+-------------------------------------+

array_repeat(T, Int): array：返回包含元素计数次数的数组。

> select array_repeat('1', 2);

+------------------+
|array_repeat(1, 2)|
+------------------+
|            [1, 1]|
+------------------+

array_sort(array): array：按升序对输入数组进行排序。输入数组的元素必须是可排序的。null 元素将放置在返回数组的末尾。

> select array_sort(array(3, 4, 1, null, 0));

+------------------------------------------------+
|array_sort(array(3, 4, 1, CAST(NULL AS INT), 0))|
+------------------------------------------------+
|                                   [0, 1, 3, 4,]|
+------------------------------------------------+

array_union(array, array): array：返回给定两个数组元素的并集，结果已经去重了。使用如下

> select array_union(array(1, 2, 3), array(1, 3, 5));

+-------------------------------------------+
|array_union(array(1, 2, 3), array(1, 3, 5))|
+-------------------------------------------+
|                               [1, 2, 3, 5]|
+-------------------------------------------+

arrays_overlap(array, array): array：如果数组1最少包含数组2中一个非空的元素，则返回 true；如果两个数组没有公共元素，而且两个数组均为空则返回 false，如果其中任何一个数组包含 null 元素则返回 null。

> select arrays_overlap(array(1, 2, 3), array(3, 4, 5));

+----------------------------------------------+
|arrays_overlap(array(1, 2, 3), array(3, 4, 5))|
+----------------------------------------------+
|                                          true|
+----------------------------------------------+

array_zip(array …)：数组组合

> select arrays_zip(array(1, 2), array(2, 3));

+------------------------------------+
|arrays_zip(array(1, 2), array(2, 3))|
+------------------------------------+
|                    [[1, 2], [2, 3]]|
+------------------------------------+

ascii(string)：返回str中第一个字符的ascii值

> select ascii('China 100');

+----------------+
|ascii(China 100)|
+----------------+
|              67|
+----------------+

asin(string)：反正弦

> select ascii('0.5');

+----------+
|ascii(0.5)|
+----------+
|        48|
+----------+

assert_true(expr)：如果 expr 表达式的返回值不是 true 则抛出异常。

> select assert_true(0 < 1);

+--------------------+
|assert_true((0 < 1))|
+--------------------+
|                null|
+--------------------+

atan(double)：反正切函数

> select atan(0.5);

spark.sql("select atan(0.5)").show();
+-------------------------+
|ATAN(CAST(0.5 AS DOUBLE))|
+-------------------------+
|       0.4636476090008061|
+-------------------------+

atan2(double, double)：返回平面的正 x 轴与由坐标（expr1，expr2）点之间的弧度角度。

> select atan2(1, 1);

+-------------------------------------------+
|ATAN2(CAST(1 AS DOUBLE), CAST(1 AS DOUBLE))|
+-------------------------------------------+
|                         0.7853981633974483|
+-------------------------------------------+

avg：求平均
base64(bin)：将参数从二进制文件转换为 base64 的字符串

> select base64('hello world');

+-----------------------------------+
|base64(CAST(hello world AS BINARY))|
+-----------------------------------+
|                   aGVsbG8gd29ybGQ=|
+-----------------------------------+

bigint(expr)：将值 expr 转换为 bigint 数据类型

> select bigint('121');

+-------------------+
|CAST(121 AS BIGINT)|
+-------------------+
|                121|
+-------------------+

bin(expr)：返回 long 类型的参数 expr 的二进制字符串表示形式

> select bin(100);

+------------------------+
|bin(CAST(100 AS BIGINT))|
+------------------------+
|                 1100100|
+------------------------+

binary(expr)：将值 expr 转换为 binary 数据类型。

> select binary(bin(100));

+----------------------------------------+
|CAST(bin(CAST(100 AS BIGINT)) AS BINARY)|
+----------------------------------------+
|                    [31 31 30 30 31 3...|
+----------------------------------------+

bit_length(expr)：返回字符串数据的位长度或二进制数据的位数。

> select bit_length(binary(bin(100)));

+----------------------------------------------------+
|bit_length(CAST(bin(CAST(100 AS BIGINT)) AS BINARY))|
+----------------------------------------------------+
|                                                  56|
+----------------------------------------------------+


> select bit_length('hello world');

+-----------------------+
|bit_length(hello world)|
+-----------------------+
|                     88|
+-----------------------+

boolean(expr)：将值 expr 转换为 boolean 数据类型

> select boolean(0);

+------------------+
|CAST(0 AS BOOLEAN)|
+------------------+
|             false|
+------------------+

bround(expr, len)：四舍五入，保留len长度的小数位

> select bround(3.1415926, 2);

+--------------------+
|bround(3.1415926, 2)|
+--------------------+
|                3.14|
+--------------------+

~~cardinality~~
cast(expr AS type)：将 expr 转换成 type 类型的数据

> select cast('10' as integer);

+---------------+
|CAST(10 AS INT)|
+---------------+
|             10|
+---------------+

cbrt(expr)：返回expr的立方根

> select cbrt(8);

+-----------------------+
|CBRT(CAST(8 AS DOUBLE))|
+-----------------------+
|                    2.0|
+-----------------------+

ceil(expr)：向上取整

> select ceil(-3.1415926);

+----------------+
|CEIL(-3.1415926)|
+----------------+
|              -3|
+----------------+

ceiling(expr)：向上取整

> select ceiling(-3.1415926);

+----------------+
|CEIL(-3.1415926)|
+----------------+
|              -3|
+----------------+

char(expr)：返回二进制等效于 expr 的 ASCII 字符。如果 n 大于256，则结果等于 chr(n%256)

> select char(100);

+------------------------+
|chr(CAST(100 AS BIGINT))|
+------------------------+
|                       d|
+------------------------+

char_length(expr)：返回字符串数据的字符长度或二进制数据的字节数。字符串数据的长度包括尾随空格，二进制数据的长度包括二进制零。

> select char_length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+

character_length(expr)：返回字符串数据的字符长度或二进制数据的字节数。字符串数据的长度包括尾随空格，二进制数据的长度包括二进制零。

> select character_length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+

chr(expr) ：返回二进制等效于 expr 的 ASCII 字符。如果 n 大于256，则结果等于 chr(n%256)

> select chr(100);

+------------------------+
|chr(CAST(100 AS BIGINT))|
+------------------------+
|                       d|
+------------------------+

coalesce(expr1, expr2, …)：返回第一个非空参数（如果存在）。否则，返回 null。

> select coalesce(null, 1, null, 2);

+----------------------------------------------------+
|coalesce(CAST(NULL AS INT), 1, CAST(NULL AS INT), 2)|
+----------------------------------------------------+
|                                                   1|
+----------------------------------------------------+

collect_list(expr)：返回list，不去重

> select collect_list(carstate) from gps_car;

+----------------------+
|collect_list(carstate)|
+----------------------+
|  [2.0, 2.0, 2.0, 2...|
+----------------------+

collect_set(expr)：返回set，去重

> select collect_set(carstate) from gps_car;

+---------------------+
|collect_set(carstate)|
+---------------------+
|      [2.0, 0.0, 1.0]|
+---------------------+

concat(str1, str2,…,strN)：返回由 str1, str2, …, strN 组成的字符串

> select concat('hello', 'world');

+--------------------+
|concat(hello, world)|
+--------------------+
|          helloworld|
+--------------------+

concat_ws(sep, [str | array(str)]+) ：返回由 sep 分隔组成的字符串连接。

> select concat_ws(' ', 'hello', 'world');

+--------------------------+
|concat_ws( , hello, world)|
+--------------------------+
|               hello world|
+--------------------------+


> select concat_ws(' ', array('hello', 'world'));

+---------------------------------+
|concat_ws( , array(hello, world))|
+---------------------------------+
|                      hello world|
+---------------------------------+

conv(num, from_base, to_base) ：将 num 从 from_base 进制转换为 to_base 进制。

> select conv(100, 2, 10);

+--------------------------------+
|conv(CAST(100 AS STRING), 2, 10)|
+--------------------------------+
|                               4|
+--------------------------------+

~~corr(expr1, expr2)~~
cos(expr)：余弦函数

> select cos(0.5);

+------------------------+
|COS(CAST(0.5 AS DOUBLE))|
+------------------------+
|      0.8775825618903728|
+------------------------+

cosh(expr)：双曲余弦

> select cosh(0);

+-----------------------+
|COSH(CAST(0 AS DOUBLE))|
+-----------------------+
|                    1.0|
+-----------------------+

cot(expr)：余切

> select cot(1);

+----------------------+
|COT(CAST(1 AS DOUBLE))|
+----------------------+
|    0.6420926159343306|
+----------------------+

count
count_min_sketch
covar_pop
covar_samp
crc32
cube
cume_dist
current_database：查询当前数据库

> select current_database();

+------------------+
|current_database()|
+------------------+
|           default|
+------------------+

current_date：查询当前时间

> select current_date();

+--------------+
|current_date()|
+--------------+
|    2020-03-11|
+--------------+

current_timestamp()：当前时间戳

> select current_timestamp();

+--------------------+
| current_timestamp()|
+--------------------+
|2020-03-11 04:26:...|
+--------------------+

date(expr)：

> select date('2010-10-10');

+------------------------+
|CAST(2010-10-10 AS DATE)|
+------------------------+
|              2010-10-10|
+------------------------+

date_add(start_date, num_days)：增加时间

> select date_add('2010-10-10', 1);

+-------------------------------------+
|date_add(CAST(2010-10-10 AS DATE), 1)|
+-------------------------------------+
|                           2010-10-11|
+-------------------------------------+

date_format(timestamp, fmt)：时间格式化

> select date_format('2010-10-10', 'yyyy');

+------------------------------------------------+
|date_format(CAST(2010-10-10 AS TIMESTAMP), yyyy)|
+------------------------------------------------+
|                                            2010|
+------------------------------------------------+

date_sub(start_date, num_days)：时间减少天数

> select date_sub('2020-10-10', 1);

+-------------------------------------+
|date_sub(CAST(2020-10-10 AS DATE), 1)|
+-------------------------------------+
|                           2020-10-09|
+-------------------------------------+

date_trunc(fmt, ts)：时间精确，fmt：[“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]

> select date_trunc('yyyy', '2011-11-11 11:11:11.111');

+------------------------------------------------------------+
|date_trunc(yyyy, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+------------------------------------------------------------+
|                                         2011-01-01 00:00:00|
+------------------------------------------------------------+


> select date_trunc('month', '2011-11-11 11:11:11.111');

+-------------------------------------------------------------+
|date_trunc(month, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+-------------------------------------------------------------+
|                                          2011-11-01 00:00:00|
+-------------------------------------------------------------+


> select date_trunc('day', '2011-11-11 11:11:11.111');

+-----------------------------------------------------------+
|date_trunc(day, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+-----------------------------------------------------------+
|                                        2011-11-11 00:00:00|
+-----------------------------------------------------------+

datediff(start_date, end_date)：返回天数差

> select datediff('2011-11-11', '2011-11-10');

+------------------------------------------------------------+
|datediff(CAST(2011-11-11 AS DATE), CAST(2011-11-10 AS DATE))|
+------------------------------------------------------------+
|                                                           1|
+------------------------------------------------------------+

day(date)：返回天数

> select day('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+

dayofmonth(date)：返回天数

> select dayofmonth('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+

dayofweek(date)：返回天数

> select dayofmonth('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+

dayofyear(date)：返回天数

> select dayofyear('2020-2-11');

+----------------------------------+
|dayofyear(CAST(2020-2-11 AS DATE))|
+----------------------------------+
|                                42|
+----------------------------------+

decimal(expr)：转换为decimal

> select decimal(1.0);

+--------------------------+
|CAST(1.0 AS DECIMAL(10,0))|
+--------------------------+
|                         1|
+--------------------------+

decode(bin, charset)：解码

> select decode('abcd', 'utf-8');

+-----------------------------------+
|decode(CAST(abcd AS BINARY), utf-8)|
+-----------------------------------+
|                               abcd|
+-----------------------------------+

degrees
dense_rank
double(expr)：转成double

> select double(1);

+-----------------+
|CAST(1 AS DOUBLE)|
+-----------------+
|              1.0|
+-----------------+

e()：返回e的值

> select e();

+-----------------+
|              E()|
+-----------------+
|2.718281828459045|
+-----------------+

element_att(n, input1, input2, …)：返回第几个
elt

> select elt(2, 'java', 'php', 'R');

+--------------------+
|elt(2, java, php, R)|
+--------------------+
|                 php|
+--------------------+

encode(str, charset)：编码

> select encode('abcd', 'utf-8');

+-------------------+
|encode(abcd, utf-8)|
+-------------------+
|      [61 62 63 64]|
+-------------------+

exists
exp
explode(expr)：拆分数组等

> select explode(array(1, 3, 5));

+---+
|col|
+---+
|  1|
|  3|
|  5|
+---+

explode_outer(expr)：拆分数组等

> select explode_outer(array(1, 3, 5));

+---+
|col|
+---+
|  1|
|  3|
|  5|
+---+

expm1
factorial(expr)：阶乘

> select factorial(5);

+------------+
|factorial(5)|
+------------+
|         120|
+------------+

filter
find_in_set(str, str_array)：查询索引

> select find_in_set('ab','abc,b,ab,c,def');

+-------------------------------+
|find_in_set(ab, abc,b,ab,c,def)|
+-------------------------------+
|                              3|
+-------------------------------+

first
first_value
flatten
float(expr)：转成float

> select float(1);

+----------------+
|CAST(1 AS FLOAT)|
+----------------+
|             1.0|
+----------------+

floor(expr)：向下取整

> select floor(-0.1);

+-----------+
|FLOOR(-0.1)|
+-----------+
|         -1|
+-----------+

format_number(expr1, expr2)：数字格式化

> select format_number(3141.5926, 3);

+---------------------------+
|format_number(3141.5926, 3)|
+---------------------------+
|                  3,141.593|
+---------------------------+

format_string(strfmt, obj, …)：字符串格式化

> select format_string('Hello World %d %s', 100, 'days');

+-------------------------------------------+
|format_string(Hello World %d %s, 100, days)|
+-------------------------------------------+
|                       Hello World 100 days|
+-------------------------------------------+

from_json(jsonStr, schema[, options])：

> select from_json("{'id': 1, 'name': 'zhangsan'}", "id int, name string");

+--------------------------------------------+
|jsontostructs({'id': 1, 'name': 'zhangsan'})|
+--------------------------------------------+
|                               [1, zhangsan]|
+--------------------------------------------+

from_unixtime(unix_time, format)：unix_time为java.util.Date().getTime() / 1000

> select from_unixtime(1583888136, 'yyyy-MM-dd HH:mm:ss');

+--------------------------------------------------------------+
|from_unixtime(CAST(1583888136 AS BIGINT), yyyy-MM-dd HH:mm:ss)|
+--------------------------------------------------------------+
|                                           2020-03-11 08:55:36|
+--------------------------------------------------------------+

from_utc_timestamp(timestamp, timezone)

> select from_utc_timestamp('2020-01-01', 'Asia/Seoul');

+-------------------------------------------------------------+
|from_utc_timestamp(CAST(2020-01-01 AS TIMESTAMP), Asia/Seoul)|
+-------------------------------------------------------------+
|                                          2020-01-01 09:00:00|
+-------------------------------------------------------------+

get_json_object(json_txt, path)：取最大值
greatest(expr, …)：

> select greatest(1, 3, 4, 10);

+---------------------+
|greatest(1, 3, 4, 10)|
+---------------------+
|                   10|
+---------------------+

grouping
grouping_id
hash(expr1, …)：

> select hash('hello', 'world');

+------------------+
|hash(hello, world)|
+------------------+
|        -165041762|
+------------------+

hex(expr)：转成hexadecimal

> select hex('hello world');

+--------------------+
|    hex(hello world)|
+--------------------+
|68656C6C6F20776F7...|
+--------------------+

hour(timestamp)：返回小时

> select hour('2020-11-11 11:11:11');

+--------------------------------------------+
|hour(CAST(2020-11-11 11:11:11 AS TIMESTAMP))|
+--------------------------------------------+
|                                          11|
+--------------------------------------------+

hypot(expr1, expr2)：返回值为

sqrt(expr^2 + expr2^2)

> select hypot(3, 4);

+-------------------------------------------+
|HYPOT(CAST(3 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                        5.0|
+-------------------------------------------+

if(expr1, expr2, expr3)：expr1=true返回expr2，否则返回expr3

> select if(1 > 2, '1', '2');

+-------------------+
|(IF((1 > 2), 1, 2))|
+-------------------+
|                  2|
+-------------------+

ifnull(expr1, expr2)：如果expr1为返回expr2

> select ifnull(null, 'expr2');

+---------------------+
|ifnull(NULL, 'expr2')|
+---------------------+
|                expr2|
+---------------------+

in：expr1 in(expr2, expr3, …)

> select 1 in(1, 3, 4);

+----------------+
|(1 IN (1, 3, 4))|
+----------------+
|            true|
+----------------+

initcap(str)：每个单词首字母大写

> select initcap('hello world');

+--------------------+
|initcap(hello world)|
+--------------------+
|         Hello World|
+--------------------+

inline

> select inline(array(struct(1, 'a'), struct(2, 'b')));

+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+

inline_outer

> select inline_outer(array(struct(1, 'a'), struct(2, 'b')));

+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+

input_file_block_length()
input_file_block_start
input_file_name
instr(str, substr)：查询索引

> select instr('hello world', 'world');

+-------------------------+
|instr(hello world, world)|
+-------------------------+
|                        7|
+-------------------------+

int(expr)：转成int

> select int(3.14);

+-----------------+
|CAST(3.14 AS INT)|
+-----------------+
|                3|
+-----------------+

isnan
isnotnull(expr)：非空返回true

> select isnotnull(1);

+---------------+
|(1 IS NOT NULL)|
+---------------+
|           true|
+---------------+

isnull：空返回true

> select isnull(null);

+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+

java_method(class, method[, arg1[, arg2 …]])：调用java方法

> select java_method('java.util.UUID', 'randomUUID');

+-----------------------------------+
|reflect(java.util.UUID, randomUUID)|
+-----------------------------------+
|               3aaebaa8-6335-440...|
+-----------------------------------+

json_tuple
kurtosis
lag
last

> select last(array(1, 3, null, 5));

+----------------------------------------------+
|last(array(1, 3, CAST(NULL AS INT), 5), false)|
+----------------------------------------------+
|                                    [1, 3,, 5]|
+----------------------------------------------+

last_day(date)：返回最后一天

> select last_day('2020-02-01');

+----------------------------------+
|last_day(CAST(2020-02-01 AS DATE))|
+----------------------------------+
|                        2020-02-29|
+----------------------------------+

last_value
lcase(str)：转换为小写

> select lcase('Hello World');

+------------------+
|lower(Hello World)|
+------------------+
|       hello world|
+------------------+

lead
least(expr, …)：返回最小值

> select least(null, 10, 3, 4);

+----------------------------------+
|least(CAST(NULL AS INT), 10, 3, 4)|
+----------------------------------+
|                                 3|
+----------------------------------+

left(str, len)：返回左边几个字符

> select left('hello world', 3);

+----------------------+
|left('hello world', 3)|
+----------------------+
|                   hel|
+----------------------+

length(expr)：返回长度

> select length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+

levenshtein
like
ln
locate(substr, str[, pos])：
log
log10
log1p
log2
lower(str)：转成小写

> select lower('Hello World');

+------------------+
|lower(Hello World)|
+------------------+
|       hello world|
+------------------+

lpad
ltrim(trimStr, str)：去除前后trimStr，不传责去除前空格

> select ltrim('ha', 'haHello worldha');

+--------------------------+
|ltrim(haHello worldha, ha)|
+--------------------------+
|             Hello worldha|
+--------------------------+

map(key0, value0, key1, value1, …)：创建map

> select map(1.0, '2', 3.0, '4');

+--------------------+
| map(1.0, 2, 3.0, 4)|
+--------------------+
|[1.0 -> 2, 3.0 -> 4]|
+--------------------+

map_concat(map<K, V>, …)：返回多个 map 的并集

> select map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'));

+--------------------------------------------+
|map_concat(map(1, a, 2, b), map(2, c, 3, d))|
+--------------------------------------------+
|                        [1 -> a, 2 -> b, ...|
+--------------------------------------------+

map_from_arrays(array, array)

> select map_from_arrays(array(1.0, 3.0), array('2', '4'));

+---------------------------------------------+
|map_from_arrays(array(1.0, 3.0), array(2, 4))|
+---------------------------------------------+
|                         [1.0 -> 2, 3.0 -> 4]|
+---------------------------------------------+

map_from_entries(array<struct<K, V>>)

> select map_from_entries(array(struct(1, 'a'), struct(2, 'b')));

+---------------------------------------------------------------------------------------+
|map_from_entries(array(named_struct(col1, 1, col2, a), named_struct(col1, 2, col2, b)))|
+---------------------------------------------------------------------------------------+
|                                                                       [1 -> a, 2 -> b]|
+---------------------------------------------------------------------------------------+

map_keys(map)

> select map_keys(map(1, 'a', 2, 'b'));

+-------------------------+
|map_keys(map(1, a, 2, b))|
+-------------------------+
|                   [1, 2]|
+-------------------------+

map_values(map)

> select map_values(map(1, 'a', 2, 'b'));

+---------------------------+
|map_values(map(1, a, 2, b))|
+---------------------------+
|                     [a, b]|
+---------------------------+

max
md5(expr)：md5加密

> select md5('hello world');

+--------------------------------+
|md5(CAST(hello world AS BINARY))|
+--------------------------------+
|            5eb63bbbe01eeed09...|
+--------------------------------+

mean
min
minute(timestamp)：返回分钟数

> select minute('2011-11-11 11:11:11');

+----------------------------------------------+
|minute(CAST(2011-11-11 11:11:11 AS TIMESTAMP))|
+----------------------------------------------+
|                                            11|
+----------------------------------------------+

mod(expr1, expr2)：%

> select mod(11, 2);

+--------+
|(11 % 2)|
+--------+
|       1|
+--------+

monotonically_increasing_id
month(date)：返回月份

> select month('2011-11-11');

+-------------------------------+
|month(CAST(2011-11-11 AS DATE))|
+-------------------------------+
|                             11|
+-------------------------------+

months_between(timestamp1, timestamp2)：月份差

> select months_between('2000-10-1', '2000-11-2');

+--------------------------------------------------------------------------------+
|months_between(CAST(2000-10-1 AS TIMESTAMP), CAST(2000-11-2 AS TIMESTAMP), true)|
+--------------------------------------------------------------------------------+
|                                                                     -1.03225806|
+--------------------------------------------------------------------------------+

named_struct(name1, val1, name2, val2, …)

> select named_struct('a', 1, 'b', 2, 'c', 3);

+------------------------------+
|named_struct(a, 1, b, 2, c, 3)|
+------------------------------+
|                     [1, 2, 3]|
+------------------------------+

nanvl
negative(expr)：取反

> select negative(100);

+-------+
|(- 100)|
+-------+
|   -100|
+-------+

next_day(start_date, day_of_week)

> select next_day('2020-03-11', 'FRI');

+---------------------------------------+
|next_day(CAST(2020-03-11 AS DATE), FRI)|
+---------------------------------------+
|                             2020-03-13|
+---------------------------------------+

not
now()

> select now();

+--------------------+
| current_timestamp()|
+--------------------+
|2020-03-11 23:07:...|
+--------------------+

ntile
nullif
nvl(expr1, expr2)：expr1为空返回expr2

> select nvl(null, 'hello');

+------------------+
|nvl(NULL, 'hello')|
+------------------+
|             hello|
+------------------+

nvl2(expr1, expr2, expr3)：
octet_length(expr)：长度

> select octet_length('hello world');

+-------------------------+
|octet_length(hello world)|
+-------------------------+
|                       11|
+-------------------------+

or
parse_url
percent_rank
percentile
percentile_approx
pi()：圆周率

> select pi();

+-----------------+
|             PI()|
+-----------------+
|3.141592653589793|
+-----------------+

pmod
posexplode(expr)：分割数组

> select posexplode(array(1, 3, 4));

+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  3|
|  2|  4|
+---+---+

posexplode_outer(expr)：分割数组

> select posexplode_outer(array(1, 3, 4));

+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  3|
|  2|  4|
+---+---+

position(substr, str[, pos])：返回位置

> select position('he', 'world hello', 1);

+--------------------------+
|locate(he, world hello, 1)|
+--------------------------+
|                         7|
+--------------------------+

positive
pow(expr1, expr2)：乘方

> select pow(2, 4);

+-------------------------------------------+
|POWER(CAST(2 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                       16.0|
+-------------------------------------------+

power(expr1, expr2)：乘方

> select power(2, 4);

+-------------------------------------------+
|POWER(CAST(2 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                       16.0|
+-------------------------------------------+

printf(strfmt, obj, …)：字符串格式化

> select printf('Hello World %d %s', 100, 'days');

+-------------------------------------------+
|format_string(Hello World %d %s, 100, days)|
+-------------------------------------------+
|                       Hello World 100 days|
+-------------------------------------------+

quarter(date)：

> select quarter('2020-11-11');

+---------------------------------+
|quarter(CAST(2020-11-11 AS DATE))|
+---------------------------------+
|                                4|
+---------------------------------+

radians(expr)：

> select radians(180);

+----------------------------+
|RADIANS(CAST(180 AS DOUBLE))|
+----------------------------+
|           3.141592653589793|
+----------------------------+

rand([seed])：随机数

> select rand();

+-------------------------+
|rand(7896507467301193574)|
+-------------------------+
|       0.1643759947938076|
+-------------------------+

randn([seed])：随机数

> select randn();

+---------------------------+
|randn(-2498753337637632801)|
+---------------------------+
|        -0.4827279803917336|
+---------------------------+

rank
reflect(class, method[, arg1[, arg2 …]])，反射调用java方法

> select reflect('java.util.UUID', 'randomUUID');

+-----------------------------------+
|reflect(java.util.UUID, randomUUID)|
+-----------------------------------+
|               3930dab2-d5e0-49e...|
+-----------------------------------+

regexp_extract
regexp_replace(str, regexp, rep)：
repeat(str, n)：重复

> select repeat('ha', 4);

+-------------+
|repeat(ha, 4)|
+-------------+
|     hahahaha|
+-------------+

replace(str, search[, replace])：替换

> select replace('hello world', 'lo', 'ol');

+----------------------------+
|replace(hello world, lo, ol)|
+----------------------------+
|                 helol world|
+----------------------------+

reverse(str)：反转

> select reverse('hello world');

+--------------------+
|reverse(hello world)|
+--------------------+
|         dlrow olleh|
+--------------------+

right(str, len)：从右截取字符串

> select right('hello world', 3);

+-----------------------+
|right('hello world', 3)|
+-----------------------+
|                    rld|
+-----------------------+

rint(expr)：返回int部分的值

> select rint(3.14159);

+------------------------------+
|ROUND(CAST(3.14159 AS DOUBLE))|
+------------------------------+
|                           3.0|
+------------------------------+

rlike
rollup
round(expr, d)：

> select round(2.5, 0);

+-------------+
|round(2.5, 0)|
+-------------+
|            3|
+-------------+

row_number
rpad
rtrim(trimStr, str)：字符串右边截取

> select rtrim('ha', 'hahello worldha');

+--------------------------+
|rtrim(hahello worldha, ha)|
+--------------------------+
|             hahello world|
+--------------------------+

schema_of_json
second(timestamp)：查询秒

> select second('2011-11-11 11:11:11');

+----------------------------------------------+
|second(CAST(2011-11-11 11:11:11 AS TIMESTAMP))|
+----------------------------------------------+
|                                            11|
+----------------------------------------------+

sentences(str[, lang, country])

> select sentences('hello world!my dreams');

+------------------------------------+
|sentences(hello world!my dreams, , )|
+------------------------------------+
|                [[hello, world], ...|
+------------------------------------+

sequence
sha(expr)：加密

> select sha('hello world');

+---------------------------------+
|sha1(CAST(hello world AS BINARY))|
+---------------------------------+
|             2aae6c35c94fcfb41...|
+---------------------------------+

sha1(expr)：加密

> select sha1('hello world');

+---------------------------------+
|sha1(CAST(hello world AS BINARY))|
+---------------------------------+
|             2aae6c35c94fcfb41...|
+---------------------------------+

sha2(expr, bitLength)：加密

> select sha2('hello world', 256);

+--------------------------------------+
|sha2(CAST(hello world AS BINARY), 256)|
+--------------------------------------+
|                  b94d27b9934d3e08a...|
+--------------------------------------+

shiftleft
shiftright
shiftrightunsigned
shuffle
sign
signum
sin(expr)：正弦函数

> select sin(0);

+----------------------+
|SIN(CAST(0 AS DOUBLE))|
+----------------------+
|                   0.0|
+----------------------+

sinh
size(expr)：数据大小

> select size(array(1, 2, 3));

+--------------------+
|size(array(1, 2, 3))|
+--------------------+
|                   3|
+--------------------+

skewness
slice
smallint(expr)：转成smallint

> select smallint(100);

+---------------------+
|CAST(100 AS SMALLINT)|
+---------------------+
|                  100|
+---------------------+

sort_array(array[, ascendingOrder])：数组排序

> select sort_array(array(2, 3, 1, 0), true);

+-----------------------------------+
|sort_array(array(2, 3, 1, 0), true)|
+-----------------------------------+
|                       [0, 1, 2, 3]|
+-----------------------------------+

soundex
space
spark_partition_id()
split(str, regex)：字符串分割

> select split('hehahehahehaa', 'ha');

+------------------------+
|split(hehahehahehaa, ha)|
+------------------------+
|         [he, he, he, a]|
+------------------------+

sqrt(expr)：平方根

> select sqrt(9);

+-----------------------+
|SQRT(CAST(9 AS DOUBLE))|
+-----------------------+
|                    3.0|
+-----------------------+

stack
std
stddev
stddev_pop
stddev_samp
str_to_map(text[, pairDelim[, keyValueDelim]])：转成map

> select str_to_map('a:1,b:2,c:3', ',', ':');

+-----------------------------+
|str_to_map(a:1,b:2,c:3, ,, :)|
+-----------------------------+
|         [a -> 1, b -> 2, ...|
+-----------------------------+

string(expr)：转成string
struct(col1, col2, col3, …)

> select struct(1, 3, 4);

+---------------------------------------+
|named_struct(col1, 1, col2, 3, col3, 4)|
+---------------------------------------+
|                              [1, 3, 4]|
+---------------------------------------+

substr(str, pos[, len])：字符串截取

> select substr('hello world', 5, 1);

+----------------------------+
|substring(hello world, 5, 1)|
+----------------------------+
|                           o|
+----------------------------+

substring(str, pos[, len])：字符串截取

> select substring('hello world', 5, 1);

+----------------------------+
|substring(hello world, 5, 1)|
+----------------------------+
|                           o|
+----------------------------+

substring_index
sum(expr)：求和
tan(expr)：正切函数

> select tan(0);

+----------------------+
|TAN(CAST(0 AS DOUBLE))|
+----------------------+
|                   0.0|
+----------------------+

tanh
timestamp(expr)：转成时间戳
tinyint(expr)：转成tinyint
to_date(date_str[, fmt])：转成时间

> select to_date('2011-11-11 11:11:11');

+------------------------------+
|to_date('2011-11-11 11:11:11')|
+------------------------------+
|                    2011-11-11|
+------------------------------+

to_json(expr[, options])：生成json

> select to_json(named_struct('a', 1, 'b', 2));

+---------------------------------------+
|structstojson(named_struct(a, 1, b, 2))|
+---------------------------------------+
|                          {"a":1,"b":2}|
+---------------------------------------+

to_timestamp(timestamp[, fmt])：

> select to_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss');

+----------------------------------------------------------+
|to_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss')|
+----------------------------------------------------------+
|                                       2011-11-11 11:11:11|
+----------------------------------------------------------+

to_unix_timestamp(expr[, pattern])

> select to_unix_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss');

+-----------------------------------------------------------+
|to_unix_timestamp(2011-11-11 11:11:11, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------------------------+
|                                                 1320981071|
+-----------------------------------------------------------+

to_utc_timestamp(timestamp, timezone)：

> select to_utc_timestamp('2011-11-11 11:11:11', 'Asia/Seoul');

+--------------------------------------------------------------------+
|to_utc_timestamp(CAST(2011-11-11 11:11:11 AS TIMESTAMP), Asia/Seoul)|
+--------------------------------------------------------------------+
|                                                 2011-11-11 02:11:11|
+--------------------------------------------------------------------+

transform
translate
trim(str)：字符串去除前后空格

> select trim('ha', 'hahello worldha');

+-------------------------+
|trim(hahello worldha, ha)|
+-------------------------+
|               ello world|
+-------------------------+

trunc(date, fmt)：fmt - [“year”, “yyyy”, “yy”, “mon”, “month”, “mm”]

> select trunc('2011-11-11 02:11:11', 'yyyy');

+----------------------------------------------+
|trunc(CAST(2011-11-11 02:11:11 AS DATE), yyyy)|
+----------------------------------------------+
|                                    2011-01-01|
+----------------------------------------------+

ucase(str)：转换为大写

> select ucase('hello world');

+------------------+
|upper(hello world)|
+------------------+
|       HELLO WORLD|
+------------------+

unbase64(str)：

> select unbase64('aGVsbG8gd29ybGQ=');

+--------------------------+
|unbase64(aGVsbG8gd29ybGQ=)|
+--------------------------+
|      [68 65 6C 6C 6F 2...|
+--------------------------+

unhex(str)：

> select unhex('68656C6C6F');

+-----------------+
|unhex(68656C6C6F)|
+-----------------+
| [68 65 6C 6C 6F]|
+-----------------+

unix_timestamp([expr[, pattern]])：

> select unix_timestamp('2020-01-01', 'yyyy-MM-dd');

+--------------------------------------+
|unix_timestamp(2020-01-01, yyyy-MM-dd)|
+--------------------------------------+
|                            1577808000|
+--------------------------------------+

upper(str)：转换为大写

> select upper('hello world');

+------------------+
|upper(hello world)|
+------------------+
|       HELLO WORLD|
+------------------+

uuid()

> select uuid();

+--------------------+
|              uuid()|
+--------------------+
|bd8aa547-9d08-46c...|
+--------------------+

var_pop
var_samp
variance
weekday
weekofyear(date)：

> select weekofyear('2011-11-11');

+------------------------------------+
|weekofyear(CAST(2011-11-11 AS DATE))|
+------------------------------------+
|                                  45|
+------------------------------------+

when

> select case when 1 > 0 then 1 when 2 > 0 then 2.0 else 1.2 end;

+--------------------------------------------------------------------------------------------------------------------------------+
|CASE WHEN (1 > 0) THEN CAST(1 AS DECIMAL(11,1)) WHEN (2 > 0) THEN CAST(2.0 AS DECIMAL(11,1)) ELSE CAST(1.2 AS DECIMAL(11,1)) END|
+--------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                             1.0|
+--------------------------------------------------------------------------------------------------------------------------------+

window
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year(date)：

> select year('2011-11-11');

+------------------------------+
|year(CAST(2011-11-11 AS DATE))|
+------------------------------+
|                          2011|
+------------------------------+