hive常用函数实例解析--是真的细

本文链接：https://blog.csdn.net/and52696686/article/details/108732098

hive常用的内置函数实例解析

一、日期函数
- 1.1 日期>>>>时间戳：unix_timestamp()
- 1.2 时间戳>>>>日期：from_unixtime(bigint unixtime,string format)
- 1.3 获取当前日期：current_date()
- 1.4 获取当前日期 + 分钟：current_timestamp()
- 1.5 日期格式字符串转化为日期：date(string date)
- 1.6 获取日期格式字符串中具体的年、月、日、时、分、秒：year()、month()、day()、hour()、minute()、second()
- 1.7 日期比较函数，返回两个日期的差值：datediff(string date1,string date2)
- 1.8 日期减少函数，返回减少 x 天后的日期：date_sub(string date,int x)
- 1.9 日期增加函数,返回增加 x 天后的日期：date_add(string date, int x)
- 1.10 日期转周函数，获取一年当中的第几周：weekofyear(string date)
- 1.11 日期转天函数,获取当天日期是一周当中的第几天：dayofweek(string date)
二、字符串函数
- 2.1 获取字符串长度：length(string A)
- 2.2 字符串反转：reverse(string A)
- 2.3 字符串连接：concat(string A, string B…)
- 2.4 带分隔符的字符串连接：concat_ws((string SEP, string A, string B…)
- 2.5 字符串截取：substr,substring
- 2.6 字符串转大写： upper(string A) 、ucase(string A)
- 2.7 字符串转小写：lower(string A) 、lcase(string A)
- 2.8 去除字符串两边的空格：trim(string A)
- 2.9 去除字符串左边的空格：ltrim(string A)
- 2.10 去除字符串右边的空格：rtrim(string A)
- 2.11 使用正则表达式替换：regexp_replace(string A, string B, string C)
- 2.12 使用正则表达式匹配：regexp_extract(string subject, string pattern, int index)
- 2.13 URL 解析函数：parse_url(string urlString, string partToExtract [, string keyToExtract])
- 2.14 json解析函数：get_json_object(string json_string, string path)
- 2.15 空格字符串函数，返回长度为n的字符串：space(int n)
- 2.16 重复字符串函数，返回重复n次后的str字符串：repeat(string str, int n)
- 2.17 获取首字符ASCII值函数，返回字符串str第一个字符的ascii码：ascii(string str)
- 2.18 左补足函数，将str用pad进行左补足到len位：lpad(string str, int len, string pad)
- 2.19 右补足函数，将str用pad进行左补足到len位：rpad(string str, int len, string pad)
- 2.20 分割字符串函数：split(string str, string pat)
- 2.21 集合查找函数，返回str在strlist第一次出现的位置：find_in_set(string str, string strList)
三、集合统计函数
- 3.1 个数统计函数：count()
- 3.2 总和统计函数：sum(col), sum(DISTINCT col)
- 3.3 平均值统计函数：avg(col), avg(DISTINCT col)
- 3.4 最小值统计函数： min(col)
- 3.5 最大值统计函数： max(col)
- 3.6 非空集合总体变量函数：var_pop(col)
- 3.7 非空集合样本变量函数：var_samp (col)
- 3.8 总体标准偏离函数：stddev_pop(col)
- 3.9 样本标准偏离函数：stddev_samp(col)
- 3.10 中位数函数：percentile()
- 3.12 近似中位数函数：percentile_approx()
- 3.13 直方图：histogram_numeric(col, b)
四、数值计算
- 4.1 取整函数(四舍五入)：round()
- 4.2 向上取整：ceil(double a), ceiling(double a)
- 4.3 向下取整：floor(double a)
- 4.4 取随机函数：rand()
- 4.5 自然指数函数，返回自然对数e的a次方：exp(double a)
- 4.6 以10为底对数函数，返回以10为底的a的对数：log10(double a)
- 4.7 以2为底对数函数，返回以2为底的a的对数：log2(double a)
- 4.8 对数函数,返回以base为底的a的对数： log(double base, double a)
- 4.9 幂运算函数，返回a的p次幂：pow(double a, double p)， power(double a, double p)
- 4.10 开平方函数，返回a的平方根：sqrt(double a)
- 4.11 二进制函数，返回a的二进制码：bin(BIGINT a)
- 4.12 十六进制函数：hex(BIGINT a)
- 4.13 反转十六进制函数，返回该十六进制字符串所代码的字符串： unhex(string a)
- 4.14 进制转换函数，将数值num从from_base进制转化到to_base进制：conv(BIGINT num, int from_base, int to_base)
- 4.15 绝对值函数，返回数值a的绝对值： abs(double a) ， abs(int a)
- 4.16 取余函数，返回a除以b的余数：pmod(int a, int b),pmod(double a, double b)
- 4.17 正弦函数，返回a的正弦值： sin(double a)
- 4.18 反正弦函数，返回a的反正弦值：asin(double a)
- 4.19 余弦函数，返回a的余弦值：cos(double a)
- 4.20 反余弦函数，返回a的反余弦值：acos(double a)
五、数学运算
- 5.1 加法：+
- 5.2 减法：-
- 5.3 乘法：*
- 5.4 除法：/
- 5.5 取余：%
- 5.6 位与：&
- 5.7 位或：|
- 5.8 位异或：^
- 5.9 位取反：~
六、关系运算
- 6.1 等值比较： =
- 6.2 不等值比较： <>, !=
- 6.3 小于比较：<
- 6.4 小于等于比较：<=
- 6.5 大于比较：>
- 6.6 大于等于比较：>=
- 6.7 空值判断：is null
- 6.8 非空判断：is not null
- 6.9 like比较,匹配字符串A中符合B的部分：string A like string B
- 6.10 rlike比较，匹配字符串A中符合正则表达式B的部分：string A like string B
- 6.11 正则regexp匹配：string A regexp string B
七、逻辑运算
- 7.1 逻辑与：and
- 7.2 逻辑或：or
- 7.3 逻辑非：not
八、条件函数
- 8.1 if 函数
- 8.2 条件判断函数 case
- 8.3 非空查找函数：COALESCE(T v1, T v2, …)
九、表生成函数
- 9.1 map类型构建：map (key1, value1, key2, value2, …)
- 9.2 struct类型构建：struct(val1, val2, val3, …)
- 9.3 array类型构建：array(val1, val2, …)
- 9.4 将array 或 map 扩展成多行多列：explode(array())，explode(map())
- 9.5 将array转单列多行输出，在每行结果前增加序号：posexplode(array())
- 9.6 按指定需求，将array转多列多行，灵活输出：stack(int num,array())
- 9.7 根据元祖中josn字符串构建：json_tuple()
- 9.8 将分组中的某列转为一个数组返回：collect_list、collect_set
十、复杂类型访问操作
- 10.1 map类型访问：M[key]
- 10.2 struct类型访问: S.x
- 10.3 array类型访问： A[n]
- 10.4 Map类型长度函数，返回map类型的长度：size(map)
- 10.5 array类型长度函数，返回array类型的长度：size(Array<T>)
十一、类型转换函数
- 11.1 类型转换函数，返回转换后的数据类型的结果：cast(expr as <type>)

一、日期函数

1.1 日期>>>>时间戳：unix_timestamp()

返回值：bigint

①获取当前时间戳： unix_timestamp()

select unix_timestamp();      
--1565858389

②将日期格式字符串转为时间戳： unix_timestamp(string timestame)

输入的时间戳格式必须为 ‘yyyy-MM-dd HH:mm:ss’ ,如不符合则返回 null

select unix_timestamp('2019-08-15 16:40:00');   
--1565858400

select unix_timestamp('2019-08-15');  
--null

③将指定时间字符串格式字符串转化成unix时间戳： unix_timestamp(string date,string pattern)
如日期不符合对应格式则返回 null

select unix_timestamp('2019-08-15','yyyy-MM-dd');   
--1565798400

select unix_timestamp('2019-08-15','yyyy-MM-dd HH:mm:ss');
--null

select unix_timestamp('2019-08-15 16:40:00','yyyy-MM-dd HH:mm:ss');
--1565858400

1.2 时间戳>>>>日期：from_unixtime(bigint unixtime,string format)

返回值：string

①将时间戳秒数转化为UTC时间，并用字符串表示，可通过format规定的时间格式，指定输出的时间

select from_unixtime(1565858389,'yyyy-MM-dd HH:mm:ss');
--2019-08-15 16:39:49

select from_unixtime(1565858389,'yyyy-MM-dd');
-- 2019-08-15

②如果 unixtime 是13位的，需要先转成10位

select from_unixtime(cast(1553184000488/1000 as int),'yyyy-MM-dd HH:mm:ss'); 
-- 2019-03-22 00:00:00

--或者使用字符串函数 substr 从第一位截取10个数字，截取时间戳前10位 
select from_unixtime(cast(substr(1553184000488,1,10) as int),'yyyy-MM-dd HH:mm:ss');
-- 2019-03-22 00:00:00

1.3 获取当前日期：current_date()

返回值：string

select current_date();
-- 2020-09-28

1.4 获取当前日期 + 分钟：current_timestamp()

返回值：string

select current_timestamp();  
-- 2020-09-19 20:46:46.232

1.5 日期格式字符串转化为日期：date(string date)

返回值：string

select date('2020-07-25');  
-- 2020-07-25

select date('2020-07-25 00:00:00');  
-- 2020-07-25

1.6 获取日期格式字符串中具体的年、月、日、时、分、秒：year()、month()、day()、hour()、minute()、second()

返回值：int

select year('2020-09-22 15:16:02');
-- 2020

select month('2020-09-22 15:16:02');
-- 9

select day('2020-09-22 15:16:02');
-- 22

select hour('2020-09-22 15:16:02');
-- 15

select minute('2020-09-22 15:16:02');
-- 16

select second('2020-09-22 15:16:02');
-- 2

-- 注：字符串不带分钟，查询 时、分、秒 返回 null
select hour('2020-09-22');
-- null

1.7 日期比较函数，返回两个日期的差值：datediff(string date1,string date2)

返回值：int

函数输入参数的日期格式为：yyyy-MM-dd HH:mm:ss 、 yyyy-MM-dd

select datediff('2020-01-20','2020-01-10');
-- 10 

select datediff('2020-01-10','2020-01-20');
-- 返回-10，所以日期相减，应该是新的日期在前

1.8 日期减少函数，返回减少 x 天后的日期：date_sub(string date,int x)

返回值：string

select date_sub('2020-09-04',3);
-- 2020-09-01
select date_sub('2020-09-04 09:12:41',2); 
-- 2020-09-02

1.9 日期增加函数,返回增加 x 天后的日期：date_add(string date, int x)

返回值：string

select date_add('2020-09-04',3);
-- 2020-09-07
select date_add('2020-09-04 09:12:41',2); 
-- 2020-09-06

1.10 日期转周函数，获取一年当中的第几周：weekofyear(string date)

返回值：int

select weekofyear('2020-01-02 13:49:02');
-- 1
select weekofyear('2020-01-02'); 
-- 1

1.11 日期转天函数,获取当天日期是一周当中的第几天：dayofweek(string date)

返回值：int

select dayofweek('2020-09-22'); 
-- 返回3 (当前是星期二，但是一周是从星期天开始，星期天是1，所以星期二就是返回3)
select dayofweek('2020-09-22 13:49:02'); 
-- 3

二、字符串函数

2.1 获取字符串长度：length(string A)

返回值： int

select length("abcd");
-- 4

2.2 字符串反转：reverse(string A)

返回值： string

select reverse("abcd");
-- dcba

2.3 字符串连接：concat(string A, string B…)

返回值: string

select concat('abc','def','g');
-- abcdefg

2.4 带分隔符的字符串连接：concat_ws((string SEP, string A, string B…)

返回值: string
返回输入字符串连接后的结果，SEP可以指定各个字符串间的分隔符

select concat_ws(',','abc','def','gh');
-- abc,def,gh

注： concat函数和concat_ws函数的区别：

concat函数在连接字符串的时候，只要其中一个是NULL，那么将返回NULL

select concat('a','b');
-- ab
select concat('a','b',null);
NULL

concat_ws函数在连接字符串的时候，只要有一个字符串不是NULL，就不会返回NULL

select concat_ws('-','a','b');
-- ab
select concat_ws('-','a','b',null);
-- a-b
select concat_ws('','a','b',null);
-- ab

2.5 字符串截取：substr,substring

二者用法一致，返回值为 string

① 从指定位置 start 开始截取，截取到末尾，起始下标为 1： substr(string A, int start),substring(string A, int start)

select substr('abcdef',3);
-- cdef

select substring('abcdef',3);
-- cdef

-- 输入负值 -x，末尾往头部从 "倒数第x位" 反转截取，截取到末尾
select substr('abcdef',-3);
-- def

select substring('abcdef',-3);
-- def

②从指定位置 start 开始截取，截取长度为len的字符串，起始下标为 1：substr(string A, int start, int len),substring(string A, int start, int len)

select substr('abcdef',3,2);
-- cd

select substring('abcdef',3,2);
-- cd

-- 重倒数第3位开始截取，保留两个字符
select substr('abcdef',-3,2);
-- de

select substring('abcdef',-3,2);
-- de

2.6 字符串转大写： upper(string A) 、ucase(string A)

返回值: string

select upper('abcde');
-- ABCDE

select upper('aBcDe');
-- ABCDE

select ucase('abcde');
-- ABCDE

select ucase('aBcDe');
-- ABCDE

2.7 字符串转小写：lower(string A) 、lcase(string A)

返回值: string

select lower('ABCDE');
-- abcde

select lower('ABCde');
-- abcde

select lcase('ABCDE');
-- abcde

select lcase('ABCde');
-- abcde

2.8 去除字符串两边的空格：trim(string A)

返回值: string

select trim(' abc ');
-- abc

select trim(' a b c ');
-- 不会取出字符串中间的空格a b c

2.9 去除字符串左边的空格：ltrim(string A)

返回值: string

select ltrim(' abc ');
-- abc(空格) 

select length(ltrim(' abc '));
-- 4

2.10 去除字符串右边的空格：rtrim(string A)

返回值: string

select rtrim(' abc ');
-- (空格)abc

select length(rtrim(' abc '));
-- 4

2.11 使用正则表达式替换：regexp_replace(string A, string B, string C)

将 字符串A 中的符合 java正则表达式B 的部分替换为 字符串C
返回值: string

select regexp_replace('oneDay-2019-03-15','\\d{4}-\\d{2}-\\d{2}','2020-09-22');
-- oneDay-2020-09-22

2.12 使用正则表达式匹配：regexp_extract(string subject, string pattern, int index)

将字符串subject按照pattern正则表达式的规则拆分，返回index指定的字符
subject 是需要匹配的字符串
pattern 代表正则表达式，表达式中需包含一个或两个括号 字符串选择器 ()
index 包含0、1、2 三个数字，0代表匹配整个 pattern 中的内容，1代表匹配 第一个() 中的内容，2代表匹配 第二 () 中的内容
无论 subject 中有有几个符合匹配条件的元素，默认取第一个
返回值: string

例①：

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',0);
-- x=111abc

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',1);
-- 111

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([0-9]+)([a-z]+)',2);
-- abc

例②：

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',0);
-- x=ab123

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',1);
-- ab

select regexp_extract('x=ab123&x=111abc&x=123&x=3aaa','x=([a-z]+)([0-9]+)',2);
-- 123

例③：

select regexp_extract('https://detail.tmall.com/item.htm?spm=608.7065813.ne.1.Ni3rsN&id=522228774076&tracelog=fromnonactive','id=([0-9]+)',0);   
-- id=522228774076

select regexp_extract('https://detail.tmall.com/item.htm?spm=608.7065813.ne.1.Ni3rsN&id=522228774076&tracelog=fromnonactive','id=([0-9]+)',1);    
-- 522228774076

例④：

select regexp_extract('http://a.m.taobao.com/i41915173660.htm','i([0-9]+)',0);   
-- i41915173660

select regexp_extract('http://a.m.taobao.com/i41915173660.htm','i([0-9]+)',1);
-- 41915173660

例⑤：

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',0);
-- basketball

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',1);
-- ket

select regexp_extract('123-basketball-#@','bas(.*?)(ball)',2);
-- ball

2.13 URL 解析函数：parse_url(string urlString, string partToExtract [, string keyToExtract])

返回值：string
返回URL中指定的部分，partToExtract 的选项包含：[ host， path，query，ref，protocol，file， authority，userinfo ]

select parse_url('http://facebook.com/path/p1.php?query=1', 'PROTOCOL');   
-- http

select parse_url('http://facebook.com/path/p1.php?query=1', 'HOST');
-- facebook.com

select parse_url('http://facebook.com/path/p1.php?query=1', 'REF');
-- NULL

select parse_url('http://facebook.com/path/p1.php?query=1', 'PATH');
-- /path/p1.php

select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY');
-- query=1

select parse_url('http://facebook.com/path/p1.php?query=1', 'FILE');
-- /path/p1.php?query=1

select parse_url('http://facebook.com/path/p1.php?query=1', 'AUTHORITY');
-- facebook.com

select parse_url('http://facebook.com/path/p1.php?query=1', 'USERINFO');
-- NULL

2.14 json解析函数：get_json_object(string json_string, string path)

解析json的字符串json_string,返回path指定的内容
如果输入的json字符串无效，那么返回null
返回值：string

select  get_json_object(
'{
"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"
}'
,'$.owner');
-- amy

2.15 空格字符串函数，返回长度为n的字符串：space(int n)

返回值：string

select space(10);
-- 返回10个空格，不可见

select length(space(10));
-- 10

2.16 重复字符串函数，返回重复n次后的str字符串：repeat(string str, int n)

返回值：string

select repeat('abc',5);
-- abcabcabcabcabc

2.17 获取首字符ASCII值函数，返回字符串str第一个字符的ascii码：ascii(string str)

返回值：int

select ascii('abcde');
-- 97

select ascii('123ABC');
-- 49

2.18 左补足函数，将str用pad进行左补足到len位：lpad(string str, int len, string pad)

返回值：string

 select lpad('abc',10,'123');
 -- 1231231abc

2.19 右补足函数，将str用pad进行左补足到len位：rpad(string str, int len, string pad)

返回值：string

 select rpad('abc',10,'123');
 -- abc1231231

2.20 分割字符串函数：split(string str, string pat)

将字符串 str 按照 pat 进行分割，返回分割后的字符串数组
返回值：array

select split('a1b1c1d1',"1");
-- ["a","b","c","d",""]

select split("a1 b1 c1 d1"," ");
-- ["a1","b1","c1","d1"]

select split("a,b,c,d",",");
-- ["a","b","c","d"]

2.21 集合查找函数，返回str在strlist第一次出现的位置：find_in_set(string str, string strList)

strlist是用逗号分割的字符串
起始下标从1开始，如果没有找该str字符，返回0
返回值：int

select find_in_set('ab','ab,ac,ad');
-- 1 

select find_in_set('ab','ab,ac,ad,ab');
-- 1 有多个重复的元素，默认返回第一个

select find_in_set('ad','ab,ac,ad');
-- 3

select find_in_set('abc','ab,ac,ad,ab');
-- 0  元素不存在，返回0

select find_in_set('ab','ab ac ad');
-- 0 用空格分割的字符串不识别，默认'ab ab ac ad'为一个整体

三、集合统计函数

建表：

create table if not exists student(
id int,
name string,
score int
)
;

插入数据：

insert into student values
(1,'a',100),
(2,'b',90),
(3,'c',80),
(4,'c',80),
(5,null,null)
;

查看表数据：

select * from student;

在这里插入图片描述

3.1 个数统计函数：count()

参数：count(*), count(expr), count(DISTINCT expr[, expr_.])
count(*) 统计检索出的行的个数，包括NULL值的行；
count(expr) 返回指定字段的非空值的个数；
count(DISTINCT expr[, expr_.]) 返回指定字段的不同的非空值的个数(去重)
返回值: int

select count(*) from student;
-- 5

select count(1) from student;
-- 5  注：count(1)效率更高一些

select count(name) from student;
-- 4

select count(distinct name ) from student;
-- 3

3.2 总和统计函数：sum(col), sum(DISTINCT col)

sum(col) 统计结果集中col的相加的结果
sum(DISTINCT col) 统计结果中col不同值相加的结果
返回值：double

select sum(score) from student;
-- 350

select sum(distinct score) from student;
-- 270

3.3 平均值统计函数：avg(col), avg(DISTINCT col)

avg(col) 统计结果集中col的平均值
avg(DISTINCT col) 统计结果中col不同值相加的平均值
返回值：double

select avg(score) from student;
-- 87.5

select avg(distinct score) from student;
-- 90

3.4 最小值统计函数： min(col)

返回统计结果col字段的最小值
返回值：double

select min(score) from student;
-- 80

3.5 最大值统计函数： max(col)

返回统计结果col字段的最大值
返回值：double

select max(score) from student;
-- 100

3.6 非空集合总体变量函数：var_pop(col)

统计结果集中col非空集合的总体变量（忽略null）
返回值：double

select var_pop(score) from student;
-- 68.75

3.7 非空集合样本变量函数：var_samp (col)

统计结果集中col非空集合的样本变量（忽略null）
返回值：double

select var_samp(score) from student;
-- 91.66666666666667

3.8 总体标准偏离函数：stddev_pop(col)

计算总体标准偏离，并返回总体变量的平方根，其返回值与VAR_POP函数的平方根相同
返回值：double

select stddev_pop(score) from student;
-- 8.2915619758885

3.9 样本标准偏离函数：stddev_samp(col)

计算样本标准偏离
返回值：double

select stddev_samp(score) from student;
-- 9.574271077563381

3.10 中位数函数：percentile()

percentile(BIGINT col, p)
求准确的第pth个百分位数，p必须介于0和1之间，但是col字段目前只支持整数，不支持浮点数类型
返回值：double
percentile(BIGINT col, array(p1 [, p2]…))
返回值：array<double>

select percentile(score,0) from student;
-- 80.0

select percentile(score,0.5) from student;
-- 80.0

select percentile(score,1) from student;
-- 100.0

3.12 近似中位数函数：percentile_approx()

percentile_approx(DOUBLE col, p [, B])
求近似的第pth个百分位数，p必须介于0和1之间，返回类型为double，但是col字段支持浮点类型。参数B控制内存消耗的近似精度，B越大，结果的准确度越高。默认为10,000。当col字段中的distinct值的个数小于B时，结果为准确的百分位数
返回值：double
percentile_approx(DOUBLE col, array(p1 [, p2]…) [, B])
功能和上述类似，之后后面可以输入多个百分位数，返回类型也为array，其中为对应的百分位数
返回值：array<double>

3.13 直方图：histogram_numeric(col, b)

以b为基准计算col的直方图信息
返回值：array<struct {‘x’,‘y’}>

select histogram_numeric(score,5) from student;
-- [{"x":80.0,"y":2.0},{"x":90.0,"y":1.0},{"x":100.0,"y":1.0}]

四、数值计算

4.1 取整函数(四舍五入)：round()

round(double a) 去除小数部分，返回四舍五后的整数
返回值：bigint
round(double a, int d) ,返回指定精度d的double类型
返回值：double

select round(3.1415);
-- 3

select round(3.5);
-- 4

select round(3.1415,2);
-- 3.14

4.2 向上取整：ceil(double a), ceiling(double a)

返回大于或等于该double变量 a 的最小整数
返回值：bigint

select ceil(3.14);
-- 4

select ceil(3);
-- 3

select ceiling(3.14);
-- 4

select ceiling(3);
-- 3

4.3 向下取整：floor(double a)

返回小于或等于该double a 变量的最大整数
返回值：

select floor(3.14);
-- 3

select floor(3.99);
-- 3

select floor(3);
-- 3

4.4 取随机函数：rand()

rand() 返回一个0到1范围内的随机数
rand(int seed) 指定一个种子 seed，返回一个稳定的0到1范围内的随机数
返回值：double

select rand();
-- 0.06562041568797305

select rand();
-- 0.15920967950757114

select rand(1);
-- 0.7308781907032909
 
select rand(1);
-- 0.7308781907032909

4.5 自然指数函数，返回自然对数e的a次方：exp(double a)

返回值：double

select exp(1);
-- 2.718281828459045

select exp(2);
-- 7.38905609893065

4.6 以10为底对数函数，返回以10为底的a的对数：log10(double a)

返回值：double

select log10(10);
-- 1.0

select log10(100);
-- 2.0

4.7 以2为底对数函数，返回以2为底的a的对数：log2(double a)

返回值：double

select log2(2);
-- 1.0

select log2(4);
-- 2.0

select log2(8);
-- 3.0

4.8 对数函数,返回以base为底的a的对数： log(double base, double a)

返回值：double

select log(2,32);
-- 5.0

select log(3,10);
-- 2.095903274289385

4.9 幂运算函数，返回a的p次幂：pow(double a, double p)， power(double a, double p)

返回值：double

select pow(2,4);
-- 16.0

select pow(3,3);
-- 27.0

select power(2,4);
-- 16.0

select power(3,3);
-- 27.0

4.10 开平方函数，返回a的平方根：sqrt(double a)

返回值：double

select sqrt(9);
-- 3.0

4.11 二进制函数，返回a的二进制码：bin(BIGINT a)

返回值：string

select bin(10);
-- 1010

4.12 十六进制函数：hex(BIGINT a)

如果变量 a 是int类型，那么返回a的十六进制码
如果变量 a 是string类型，则返回该字符串的十六进制码
返回值：string

select hex(10);
-- A

select hex(16);
-- 10

select hex('abc');
-- 616263

4.13 反转十六进制函数，返回该十六进制字符串所代码的字符串： unhex(string a)

返回值：string

select unhex('A');
-- 不显示

select unhex('10');
-- 不显示

select unhex('616263');
-- abc

4.14 进制转换函数，将数值num从from_base进制转化到to_base进制：conv(BIGINT num, int from_base, int to_base)

返回值：string

select conv(18,10,16);
-- 12

select conv(18,10,2);
-- 10010

4.15 绝对值函数，返回数值a的绝对值： abs(double a) ， abs(int a)

返回值：double， int

select abs(-2.124);
-- 2.124

select abs(-2);
-- 2

4.16 取余函数，返回a除以b的余数：pmod(int a, int b),pmod(double a, double b)

返回值：int，double

①当 被除数a 和 除数b 都为正时，余数为正：

select pmod(5,3);
-- 2 

select pmod(9,4);
-- 1

②当 被除数a 为负， 除数b 为正时，余数为正，此时余数的计算公式为：
余数 = （(a÷b的商)取绝对值 + 1）x b + a

select pmod(-5,3);
-- 1   余数=((5/3)+1)*3-5=1 

select pmod(-9,4);
-- 3   余数=((9/4)+1)*4-9=3

③当 被除数a 为正， 除数b 为负时，余数为负，计算公式同②
余数 = （(a÷b的商)取绝对值 + 1）x b + a

select pmod(5,-3);
--  -1  余数=((5/3)+1)*(-3)+5=-1 

select pmod(9,-4);
--  -3  余数=((9/4)+1)*(-4)+9=-3

④当 被除数a 和 除数b 都为负时，余数为负，计算公式同①中常规计算，结果为负

select pmod(-5,-3);
--  -2 

select pmod(-9,-4);
--  -1

⑤当 a 可以整除 b 时，无论a、b正负，余数都为0

select pmod(6,2);
-- 0
select pmod(-6,2);
-- 0
select pmod(6,-2);
-- 0
select pmod(-6,-2);
-- 0

4.17 正弦函数，返回a的正弦值： sin(double a)

返回值：double

select sin(1);
-- 0.8414709848078965

4.18 反正弦函数，返回a的反正弦值：asin(double a)

返回值：double

select asin(0.8414709848078965);
-- 1.0

4.19 余弦函数，返回a的余弦值：cos(double a)

返回值：double

select cos(1);
-- 0.5403023058681398

4.20 反余弦函数，返回a的反余弦值：acos(double a)

返回值：double

select acos(0.5403023058681398);
-- 1.0

五、数学运算

5.1 加法：+

A + B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 1 + 2;
-- 3

select 1 + 2.5;
-- 3.5

select 1.2 + 2.2;
-- 3.4000000000000004

5.2 减法：-

A - B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 2 - 1;
-- 1

5.3 乘法：*

A * B
返回值：
int, int => int
int,double => double
double, double =>dooble
注：如果A乘以B的结果超过默认结果类型的数值范围，则需要通过cast将结果转换成范围更大的数值类型

select 4 * 5;
-- 20

select 2 * 2.2;
-- 4.4

5.4 除法：/

A / B
返回值：double

select 4 / 2;
-- 2.0

5.5 取余：%

A % B
返回值：
int, int => int
int,double => double
double, double =>dooble

select 5 % 3;
-- 2

select 5.2 % 3;
--  2.2

5.6 位与：&

A & B
A,B是int类型
返回值：int

select 1 & 2;
-- 0

5.7 位或：|

A | B
A,B是int类型
返回值：int

select 1 | 2;
-- 3

select 2 | 3;
-- 3

5.8 位异或：^

A + B
A,B是int类型
返回值：int

select 4 ^ 8;
-- 12

select 6 ^ 4;
-- 2

5.9 位取反：~

~A
A只能是正整数
返回值：int类型，返回 -(A+1)

select ~ 3;
-- -4

select ~ 4;
-- -5

select ~ 5;
-- -6

六、关系运算

6.1 等值比较： =

返回值：boolean

6.2 不等值比较： <>, !=

返回值：boolean

6.3 小于比较：<

返回值：boolean

6.4 小于等于比较：<=

返回值：boolean

6.5 大于比较：>

返回值：boolean

6.6 大于等于比较：>=

返回值：boolean

6.7 空值判断：is null

返回值：boolean

6.8 非空判断：is not null

返回值：boolean

6.9 like比较,匹配字符串A中符合B的部分：string A like string B

返回值：string

6.10 rlike比较，匹配字符串A中符合正则表达式B的部分：string A like string B

rlike 可以使用 java 正则表达式匹配
返回值：string

6.11 正则regexp匹配：string A regexp string B

功能与 rlike 相同
返回值：boolean

七、逻辑运算

7.1 逻辑与：and

A and B
返回值：boolean

7.2 逻辑或：or

A or B
返回值：boolean

7.3 逻辑非：not

not A
返回值：boolean

八、条件函数

8.1 if 函数

if(boolean testCondition, T valueTrue, T valueFalseOrNull)
当条件testCondition为TRUE时，返回valueTrue；否则返回valueFalseOrNull
返回值：T

select if(1=1,10,20);
-- 10

select if(1=2,10,20);
-- 20

8.2 条件判断函数 case

返回值：T
用法① ：case 字段a when 满足c1 then 输出r1 [ when 满足c2 then 输出r2 ] [ else 输出 r3 ] end
如果字段a满足条件c1,则输出r1；
如果满足c2则输出r2；
都不满足则输出r3
用法②：case when 满足c1 then 输出r1 [when 满足c2 then 输出r2] [else 输出r3] end
如果满足条件c1，则输出r1；
满足条件c2，则输出r2；
都不满足则输出r3；

-- 举例两条sql语句：
select *,case sex when 'male' then '男' when 'female' then '女' else '未知' from student;

select *, case when score >=90 then 'good' when score >=60 and score <90 then 'general' else 'bad' end as grade from student;

8.3 非空查找函数：COALESCE(T v1, T v2, …)

返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL
返回值：T

select COALESCE(null,'10','20');
-- 10

九、表生成函数

9.1 map类型构建：map (key1, value1, key2, value2, …)

根据输入的key和value对构建map类型

create table t1 as select map('1','a','2','b','3','c');

select * from t1;
-- {"1":"a","2":"b","3":"c"}

describe t1;
-- _c0                     map<string,string>

9.2 struct类型构建：struct(val1, val2, val3, …)

根据输入的参数构建结构体struct类型

create table t2 as select struct('a1','b1','c1');

select * from t2;
-- {"col1":"a1","col2":"b1","col3":"c1"}

describe t2;
-- _c0                     struct<col1:string ,col2:string,col3:string>

9.3 array类型构建：array(val1, val2, …)

根据输入的参数构建数组array类型

create table t3 as select array('a1','b1','c1');

select * from t3;
-- ["a1","b1","c1"]

describe t3;
-- _c0       array<string>

9.4 将array 或 map 扩展成多行多列：explode(array())，explode(map())

explode() 函数可以将一个 array 或者 map 展开：

explode(array)：array 转单列多行，使得结果中将array列表里的每个元素生成一行
explode(map)：map 转多行两列， 使得结果中将map里的每一对元素作为一行，key为一列，value为一列
explode() 函数一般结合侧视图 lateral view 一起使用

select explode(array(1,2,3,4));
+------+--+
| col  |
+------+--+
| 1    |
| 2    |
| 3    |
| 4    |
+------+--+

select explode(map('A','apple','O','orage'));
+------+--------+--+
| key  | value  |
+------+--------+--+
| A    | apple  |
| O    | orage  |
+------+--------+--+

-- 一般需要在 explode() 中嵌套 split() 函数分割某个字段，将其转成数组，然后在explode实现 “行转列”
select explode(split("a,b,c,d,e,f",","));
+------+--+
| col  |
+------+--+
| a    |
| b    |
| c    |
| d    |
| e    |
| f    |
+------+--+

-- 结合侧视图使用，不展示结果：
select id,name,sub
from student
later view explode(subject) subView as sub;

注：对于explode的限制，以下sql语句不支持：

不能和其他字段一起使用
错误： select name,explode(subject) from student;
不能和group by、sort by 和cluster by一起使用
错误： select explode(subject) from employees group by explode(subject);

9.5 将array转单列多行输出，在每行结果前增加序号：posexplode(array())

在扩展的行前增加序号，从0开始递增
参数只支持数组类型 array

select posexplode(array('a','b','c'));
+------+------+--+
| pos  | val  |
+------+------+--+
| 0    | a    |
| 1    | b    |
| 2    | c    |
+------+------+--+

9.6 按指定需求，将array转多列多行，灵活输出：stack(int num,array())

num 可以指定输出几列，但是 num 必须能被 array() 的长度 length 整除

select stack(1,'a','b','c','d');
+-------+-------+-------+-------+--+
| col0  | col1  | col2  | col3  |
+-------+-------+-------+-------+--+
| a     | b     | c     | d     |
+-------+-------+-------+-------+--+


select stack(2,'a','b','c','d');
+-------+-------+--+
| col0  | col1  |
+-------+-------+--+
| a     | b     |
| c     | d     |
+-------+-------+--+

select stack(4,'a','b','c','d');
+-------+--+
| col0  |
+-------+--+
| a     |
| b     |
| c     |
| d     |
+-------+--+

9.7 根据元祖中josn字符串构建：json_tuple()

select json_tuple('{"name":"zhangsan","age":"18"}','name','age');
+-----------+-----+--+
|    c0     | c1  |
+-----------+-----+--+
| zhangsan  | 18  |
+-----------+-----+--+

9.8 将分组中的某列转为一个数组返回：collect_list、collect_set

collect_list、collect_set 都是将分组中的某列转为一个数组返回，不同的是collect_list不去重，而collect_set去重

创建一张实验用表，存放用户每天点播视频的记录：

create table t_visit_video (
username string,
video_name string
) 
partitioned by (day string)
row format delimited fields terminated by ',';

在本地文件系统创建测试数据文件：

张三,大唐双龙传
李四,天下无贼
张三,神探狄仁杰
李四,霸王别姬
李四,霸王别姬
王五,机器人总动员
王五,放牛班的春天
王五,盗梦空间

将数据加载到Hive表：

load data local inpath '/root/hive/visit.data' into table t_visit_video partition (day='20180516');

在这里插入图片描述
按用户分组，取出每个用户每天看过的所有视频的名字：

select username, collect_list(video_name) from t_visit_video group by username ;

在这里插入图片描述
但是上面的查询结果有点问题，因为霸王别姬实在太好看了，所以李四这家伙看了两遍，这直接就导致得到的观看过视频列表有重复的，所以应该增加去重，使用collect_set，其与collect_list的区别就是会去重：

select username, collect_set(video_name) from t_visit_video group by username;

在这里插入图片描述
李四的观看记录中霸王别姬只出现了一次，实现了去重效果

突破group by限制
还可以利用collect来突破group by的限制，Hive中在group by查询的时候要求出现在select后面的列都必须是出现在group by后面的，即select列必须是作为分组依据的列，但是有的时候我们想根据A进行分组然后随便取出每个分组中的一个B，代入到这个实验中就是按照用户进行分组，然后随便拿出一个他看过的视频名称即可：

select username, collect_list(video_name)[0] from t_visit_video group by username;

在这里插入图片描述
video_name不是分组列，依然能够取出这列中的数据

如果查询 collect_set 之前有多个字段 student_id , score，…，这些字段必须要group by

select student_id,score,concat_ws(",",collect_set(question_id)) 
from ex_exam_record 
where score = 1 
group by student_id,score;

详细用法参考：hive笔记之collect_list/collect_set(列转行)

十、复杂类型访问操作

10.1 map类型访问：M[key]

M为map类型，key为map中的key值
返回map类型M中，key值为指定值的value值

create table t1 as select map('1','a','2','b','3','c') as m;

select * from t1;
-- {"1":"a","2":"b","3":"c"}

select m['1'],m['2'],m['3'] from t1;
-- a	b	c

10.2 struct类型访问: S.x

S为struct类型
返回结构体S中的x字段

create table t2 as select struct('a1','b1','c1') as s;

select * from t2;
-- {"col1":"a1","col2":"b1","col3":"c1"}

select s.col1,s.col2,s.col3 from t2;
-- a1	b1	c1

10.3 array类型访问： A[n]

A为array类型，n为int类型
返回数组A中的第n个变量值
数组的起始下标为0

create table t3 as select array('a1','b1','c1') as arr;

select * from t3;
-- ["a1","b1","c1"]

select arr[0],arr[1],arr[2] from t3;
-- a1	b1	c1

10.4 Map类型长度函数，返回map类型的长度：size(map)

size(Map<k .V>)
返回值：int

select size(map('1','a','2','b','3','c'));
-- 3

10.5 array类型长度函数，返回array类型的长度：size(Array)

返回值：int

select size(array('a1','b1','c1'));
-- 3

十一、类型转换函数

11.1 类型转换函数，返回转换后的数据类型的结果：cast(expr as )

返回值：Expected “=” to follow “type”

select cast(1 as bigint);
-- 1

select cast (2.1 as int);
-- 2

select cast(2.99 as int);
-- 2

select cast('1.35' as int);
-- 1

select cast('1.35' as double);
-- 1.35