A Comprehensive List of Common Hive Functions

蒙奇D婵

Article link: https://blog.csdn.net/mqd_chan/article/details/109621296

#Data types

Name	Type
String	string/varchar(65535)/char(255)
Integer	smallint/int/bigint
Fractional	float/double/decimal(m,n)
Boolean	boolean
Date	date/timestamp
List	array<data_type>
Struct	struct<col_name:data_type,…>
Key-value	map<key_type,value_type>
Union type	uniontype<data_type,…>

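#Math functions

Name	Description
pow(double a, double p)	Returns a raised to the power p
hex(bigint a)/hex(string a)	Returns a as a hexadecimal string; if a is a string, each character is converted to its hexadecimal representation
unhex(string a)	Inverse of hex
conv(bigint/string num, int from_base, int to_base)	Converts num from base from_base to base to_base
positive(int/double a)	Returns a
negative(int/double a)	Returns -a
sign(double/decimal a)	Returns the sign of a (1, 0, or -1)
greatest(T v1, T v2, …)	Returns the largest value
least(T v1, T v2, …)	Returns the smallest value
factorial(int a)	Returns the factorial of a
shiftleft(int a, int b)	Bitwise left shift of a by b positions
shiftright(int a, int b)	Bitwise right shift of a by b positions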
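
The base-conversion functions in the table (conv, hex, unhex) can be hard to picture from a one-line description. The following is a minimal Python sketch that emulates their behavior for non-negative numbers and UTF-8 text. It is not Hive code: the helper names hex_of_int and hex_of_string are made up for illustration, and Hive's handling of negative numbers and its binary return type for unhex are not modeled.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv(num, from_base, to_base):
    """Emulates conv(num, from_base, to_base) for non-negative inputs only."""
    value = int(str(num), from_base)  # parse the digits in the source base
    if value < 0:
        raise ValueError("negative inputs are not modeled in this sketch")
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, to_base)  # peel off the lowest digit
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def hex_of_int(a):
    """Emulates hex(bigint a) for non-negative a (hypothetical helper name)."""
    return format(a, "X")


def hex_of_string(s):
    """Emulates hex(string a): every byte of the text becomes two hex digits."""
    return s.encode("utf-8").hex().upper()


def unhex(h):
    """Emulates unhex(string a); decoded to text here, Hive returns binary."""
    return bytes.fromhex(h).decode("utf-8")


print(conv("100", 2, 10))    # 4
print(conv(255, 10, 16))     # FF
print(hex_of_int(17))        # 11
print(hex_of_string("ABC"))  # 414243
print(unhex("414243"))       # ABC
```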

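#Collection functions

Name	Description
size(Map<k,v>/Array)	Returns the number of elements in the collection, as an int
map_keys(Map<k,v>)	Returns all keys of the map as an Array
map_values(Map<k,v>)	Returns all values of the map as an Array
array_contains(Array, value)	Returns true if the Array contains value
sort_array(Array)	Sorts the array in natural (ascending) order and returns it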

#Type conversion

Name	Description
cast(expr as type)	Converts expr to the given type

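#Date functions

Name	Description
from_unixtime(bigint time, string time_format)	Converts a Unix timestamp in seconds to a string in the given format, e.g. from_unixtime(1250111000,"yyyy-MM-dd") gives 2009-08-13 in a UTC+8 session
date_format(date/timestamp/string date, string format)	Returns date formatted with the given pattern, e.g. date_format("2016-06-22","MM-dd") = 06-22
current_date()	Returns the current date (year, month, day)
to_date(string timestamp)	Returns the date part of a timestamp string, e.g. to_date("1970-01-01 00:00:00") = "1970-01-01"
current_timestamp()	Returns the current timestamp (date plus hours, minutes, seconds, milliseconds)
unix_timestamp()	Returns the current Unix timestamp in seconds
unix_timestamp(string date)	Converts a time string in yyyy-MM-dd HH:mm:ss format to a Unix timestamp, e.g. unix_timestamp('2009-03-20 11:30:01') = 1237573801 (the value depends on the session time zone)
unix_timestamp(string datetime, string format_pattern)	Converts a time string in the given format to a Unix timestamp, e.g. unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400 (the value depends on the session time zone)
date_add(string date, int days)	Adds days to date
add_months(string date, int numberOfMonths)	Returns the date numberOfMonths months after date
last_day(string date)	Returns the last day of the month that date falls in; the time part (HH:mm:ss) is ignored
next_day(string date, string dayOfWeek)	Returns the first date after date that falls on dayOfWeek ('MO', 'TU', …)
trunc(string date, string format)	Truncates date to the start of its year or month, e.g. trunc("2016-06-26","MM") = 2016-06-01, trunc("2016-06-26","YY") = 2016-01-01
date_add(next_day(current_date(),'SU'),-7)	First day of the current week (weeks starting on Sunday)
datediff(string enddate, string startdate)	Returns the number of days from startdate to enddate
months_between(string date1, string date2)	Returns the number of months from date2 to date1 (positive when date1 is later)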

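1. from_unixtime

Syntax: from_unixtime(bigint unixtime, string format) converts a Unix timestamp in seconds to a string in the given format. The result depends on the session time zone; the output below corresponds to UTC+8.
Returns: string
Example:

hive> select from_unixtime(1250111000,"yyyy-MM-dd");
=> 2009-08-13

2. date_format

Syntax: date_format(date/timestamp/string date, string format) returns date formatted with the given pattern
Returns: string
Example:

hive> select date_format("2016-06-22","MM-dd");
=> 06-22

3. current_date

Syntax: current_date() returns the current date (year, month, day)
Returns: date
Example:

hive> select current_date();
=> 2021-01-28

4. current_timestamp

Syntax: current_timestamp() returns the current timestamp (date plus hours, minutes, seconds, milliseconds)
Returns: timestamp
Example:

hive> select current_timestamp();
=> 2021-01-01 22:38:43.904

5. to_date

Syntax: to_date(string timestamp) returns the date part of a timestamp string
Returns: string (date as of Hive 2.1.0)
Example:

hive> select to_date("1970-01-01 00:00:00");
=> 1970-01-01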

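6. unix_timestamp

Syntax:

unix_timestamp() returns the current Unix timestamp in seconds
unix_timestamp(string timestamp) converts a time string in yyyy-MM-dd HH:mm:ss format to a Unix timestamp
unix_timestamp(string datetime, string format) converts a time string in the given format to a Unix timestamp

Returns: bigint
Example (string inputs are interpreted in the session time zone, so the last two results vary by cluster; the values shown correspond to UTC-7):

hive> select unix_timestamp();
=> 1611845071

hive> select unix_timestamp('2009-03-20 11:30:01');
=> 1237573801

hive> select unix_timestamp('2009-03-20', 'yyyy-MM-dd');
=> 1237532400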

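7. date_add

Syntax: date_add(string date, int days) adds days to date; a negative value subtracts days
Returns: string (date as of Hive 2.1.0)
Example (assuming the current date is 2021-01-28):

hive> select date_add(current_date(),3);
=> 2021-01-31

hive> select date_add(current_date(),-3);
=> 2021-01-25

8. add_months

Syntax: add_months(string date, int months) returns the date that lies the given number of months after date; a negative value goes back in time
Returns: string
Example (assuming the current date is 2021-01-28):

hive> select add_months(current_date(),2);
=> 2021-03-28

hive> select add_months(current_date(),-5);
=> 2020-08-28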

9. last_day

Syntax: last_day(string date) returns the last day of the month that date falls in
Returns: string
Example:

hive> select last_day('2012-02-02');
=> 2012-02-29

10. next_day

Syntax: next_day(string date, string dayOfWeek) returns the first date after date that falls on dayOfWeek
Returns: string
Example:

dayOfWeek is an abbreviation of the weekday name and is case-insensitive
hive> select next_day("2021-01-01","fr");
=> 2021-01-08

11. trunc

Syntax: trunc(string date, string format) truncates date to the start of its year or month
Returns: string
Example:

hive> select trunc("2016-06-26","MM");
=> 2016-06-01

hive> select trunc("2016-06-26","YYYY");
=> 2016-01-01

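12. datediff

Syntax: datediff(string enddate, string startdate) returns the number of days from startdate to enddate
Returns: int
Example:

hive> select datediff('2021-02-01','2020-12-03');
=> 60

13. months_between

Syntax: months_between(string date1, string date2) returns the number of months from date2 to date1; the result is positive when date1 is later. When the two dates do not fall on the same day of the month (and are not both month ends), the fractional part is computed on a 31-day month.
Returns: double
Example:

hive> select months_between('2021-02-01','2020-12-03');
=> 1.93548387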
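
The fractional result of months_between is easier to follow when the arithmetic is written out. The sketch below is a Python emulation for date-only inputs, based on the rule described in the Hive documentation (whole months plus a day difference divided by 31, rounded to 8 decimal places). It is an illustration under those assumptions, not Hive's implementation, and it ignores the time-of-day component that Hive also takes into account.

```python
from calendar import monthrange
from datetime import date


def months_between(date1: date, date2: date) -> float:
    """Emulates months_between(date1, date2) for date-only inputs."""
    month_diff = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    # A date is a month end when its day equals the length of its month.
    is_last1 = date1.day == monthrange(date1.year, date1.month)[1]
    is_last2 = date2.day == monthrange(date2.year, date2.month)[1]
    if date1.day == date2.day or (is_last1 and is_last2):
        return float(month_diff)  # whole number of months
    # Otherwise the fraction is based on a 31-day month.
    return round(month_diff + (date1.day - date2.day) / 31, 8)


print(months_between(date(2021, 2, 1), date(2020, 12, 3)))   # 1.93548387
print(months_between(date(2021, 2, 28), date(2021, 1, 31)))  # 1.0
```

For the first call, the month difference is 2 and the day difference is 1 - 3 = -2, so the result is 2 - 2/31, which rounds to 1.93548387.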

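14. First day of the current week

Example (weeks start on Sunday; the output assumes the current date is 2021-01-28):

hive> select date_add(next_day(current_date(),'SU'),-7);
=> 2021-01-24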
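
The expression works because next_day always returns a date strictly after its input, so subtracting 7 days lands on the most recent Sunday, including the input date itself when it is a Sunday. The Python sketch below emulates that logic; the function first_day_of_week is a hypothetical helper introduced for illustration, and next_day here takes Python's weekday numbering rather than Hive's string abbreviations.

```python
from datetime import date, timedelta

SUNDAY = 6  # Python's date.weekday(): Monday=0 ... Sunday=6


def next_day(d: date, weekday: int) -> date:
    """First date strictly after d that falls on the given weekday."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1  # always in 1..7
    return d + timedelta(days=days_ahead)


def first_day_of_week(d: date) -> date:
    """Emulates date_add(next_day(d, 'SU'), -7): the Sunday on or before d."""
    return next_day(d, SUNDAY) - timedelta(days=7)


print(first_day_of_week(date(2021, 1, 28)))  # 2021-01-24 (a Thursday input)
print(first_day_of_week(date(2021, 1, 24)))  # 2021-01-24 (a Sunday input)
```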

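15. First day of the current quarter

Example (the result is built by string concatenation, so month and day are not zero-padded):

hive> select concat_ws('-',cast(year(current_date()) as string),
cast(ceil(month(current_date())/3)*3-2 as string),'1');
=> 2021-1-1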
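
The key step in the query is ceil(month/3)*3-2, which maps every month to the first month of its quarter. A quick Python check of that arithmetic over all twelve months, written as a sketch (the function name quarter_start_month is chosen here for illustration):

```python
import math


def quarter_start_month(month: int) -> int:
    """Same arithmetic as ceil(month(d)/3)*3-2 in the Hive query."""
    return math.ceil(month / 3) * 3 - 2


print([quarter_start_month(m) for m in range(1, 13)])
# [1, 1, 1, 4, 4, 4, 7, 7, 7, 10, 10, 10]
```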

#String functions

1. concat

Syntax: concat(string A, string B…) returns the concatenation of the input strings; any number of inputs is accepted
Returns: string
Example:

hive> select concat("a","b");
=> ab

2. concat_ws

Syntax: concat_ws(sep, [string | array(string)]+) joins the inputs using the separator sep
Returns: string
Example:

hive> select concat_ws('.', 'www', array('facebook', 'com'));
=> www.facebook.com

3. regexp_replace

Syntax: regexp_replace(string str, string regex, string replacement) replaces every substring of str that matches regex with replacement
Returns: string
Example:

hive> select regexp_replace('["henry","pola","ariel"]','\\[|\\]|"','');
=> henry,pola,ariel

4. regexp_extract

Syntax: regexp_extract(string str, string regex, int index) returns the part of str matched by capture group index of the regular expression regex
Returns: string
Example:

hive> select regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
=> 100

5. split

Syntax: split(string str, string regex) splits str around matches of regex
Returns: array<string>
Example:

hive> select split('oneAtwoBthreeC', '[ABC]');
=> ["one", "two", "three"]

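6. substr/substring

Syntax: substr/substring(string str, int begin[, int len]) returns the substring of str starting at position begin (1-based; a negative value counts from the end), optionally limited to len characters
Returns: string
Example:

hive> select substr('Facebook', 5);
=> 'book'
hive> select substr('Facebook', -5);
=> 'ebook'
hive> select substr('Facebook', 5,1);
=> 'b'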
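
Because substr counts positions from 1 and accepts negative start positions, it is easy to be off by one when coming from a 0-based language. The following Python sketch emulates the three calls above. It only covers start positions that fall inside the string; how Hive treats 0 or out-of-range positions is deliberately left out rather than guessed.

```python
def substr(s: str, begin: int, length=None) -> str:
    """Emulates substr(str, begin[, len]) for 1 <= |begin| <= len(s)."""
    if begin == 0 or abs(begin) > len(s):
        raise ValueError("start positions outside the string are not modeled")
    # Positive positions are 1-based; negative ones count from the end.
    start = begin - 1 if begin > 0 else len(s) + begin
    end = len(s) if length is None else start + max(length, 0)
    return s[start:end]


print(substr("Facebook", 5))     # book
print(substr("Facebook", -5))    # ebook
print(substr("Facebook", 5, 1))  # b
```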

7. parse_url

Syntax: parse_url(string url, string part[, string key]) returns the given part of the URL. Valid values for part are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO; with QUERY, the optional key selects the value of a single query parameter
Returns: string
Example:

hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'HOST');
=> 'facebook.com'
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY');
=> 'query=1'
hive> select parse_url('http://facebook.com/path/p1.php?query=1', 'QUERY', 'query');
=> '1'

8. get_json_object

Syntax: get_json_object(string json_string, string path) parses the JSON string json_string and returns the content selected by path
Returns: string
Example:

hive> select get_json_object('{"name":"herny","hobbies":["s","b","c"],"address":
{"province":"jiangsu","city":"nanjing"}}','$.name');
=> herny

9. locate

Syntax: locate(string substr, string str[, int pos]) returns the position of the first occurrence of substr in str, starting the search at position pos, or 0 if it is not found
Returns: int
Example:

hive> select locate('bar', 'foobarbar', 5);
=> 7

10. instr

Syntax: instr(string str, string substr) returns the position of the first occurrence of substr in str, or 0 if it is not found
Returns: int
Example:

hive> select instr('Facebook', 'boo');
=> 5
hive> select name from employee where instr(name,'i')>0;
-- returns the employees whose name contains the letter i

#Table-generating functions

1. explode

Syntax: explode(array/map<k,v>) emits one row per array element, or one row per key-value pair of a map; in a query that selects other columns it is used together with lateral view
Returns: N rows
Example:

hive> select name,city from employee_id lateral view explode(cities) ct as city;
hive> select name,course,score from employee_id lateral view explode(scores) ct as course,score;

2. posexplode

Syntax: posexplode(array) works like explode but also returns the position of each element in the array
Returns: N rows
Example:

hive> select posexplode(array("a","b","c"));
 0   a
 1   b
 2   c

3. stack

Syntax: stack(int N, v1, v2, …, vM) splits the M values into N rows, each with M/N columns
Returns: N rows
Example:

hive> select stack(2,'a','b','c','d');
 a   b
 c   d
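
The reshaping done by stack can be pictured as cutting the flat argument list into N equal slices. A minimal Python sketch of that idea follows; it assumes M is an exact multiple of N and does not model what Hive does otherwise.

```python
def stack(n, *values):
    """Emulates stack(N, v1, ..., vM) when M is an exact multiple of N."""
    if n <= 0 or len(values) % n != 0:
        raise ValueError("only the evenly divisible case is modeled here")
    width = len(values) // n  # number of columns per output row
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]


for row in stack(2, "a", "b", "c", "d"):
    print(row)
# ('a', 'b')
# ('c', 'd')
```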

4. json_tuple

Syntax: json_tuple(string json, string…key) extracts several keys from a JSON string and returns them as a tuple
Returns: tuple
Example:

hive> select name,hobbies,age from jsontuple lateral view
json_tuple(line,'name','hobbies','age') ct as name,hobbies,age;

hive> select concat_ws('-',first,last) name,hobby,age from jsontuple
lateral view json_tuple(line,'name','hobbies','age') ct as name,hobbies,age
lateral view json_tuple(name,'first','last') jt as first,last
lateral view explode(split(regexp_replace(hobbies,'\\[|\\]|"',''),',')) jt1 as hobby;

5. parse_url_tuple

Syntax: parse_url_tuple(string url, string…parts) extracts several parts of a URL at once. url is the URL string and parts lists the parts to extract; valid values are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, and QUERY:<KEY>
Returns: tuple
Example:

hive> select parse_url_tuple('http://facebook.com/path/p1.php?query=1', 'HOST','QUERY');
Result:
  c0       	    c1    
 facebook.com   query=1

#Window functions

1. over() clause => over(partition by ??? order by ??? rows|range between ??? and ???)
2. Window specification => rows|range between (unbounded|N) preceding and (N preceding | current row | (unbounded|N) following)
3. When order by is present but the window clause is missing, the window specification defaults to range between unbounded preceding and current row
4. When both order by and the window clause are missing, the window specification defaults to rows between unbounded preceding and unbounded following
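
The default frame matters most for last_value: with order by and no explicit window clause, Hive documents the default frame as range between unbounded preceding and current row, so the frame stops at the current row (and its peers with the same sort key) instead of the end of the partition. The Python sketch below emulates both frames for one partition that is already sorted by the order-by column. It is a simplified model for illustration, with function names invented here, and it assumes the value column and the sort key are the same column.

```python
def last_value_default_frame(sorted_values):
    """last_value over (order by v) with the default frame:
    range between unbounded preceding and current row."""
    out = []
    for current in sorted_values:
        # A range frame ends at the last row whose sort key is <= the
        # current key, so rows tied with the current row are included.
        frame = [v for v in sorted_values if v <= current]
        out.append(frame[-1])
    return out


def last_value_full_frame(sorted_values):
    """last_value over (order by v rows between unbounded preceding
    and unbounded following): the frame is the whole partition."""
    return [sorted_values[-1]] * len(sorted_values)


costs = [10, 20, 20, 50]  # one partition, sorted by cost
print(last_value_default_frame(costs))  # [10, 20, 20, 50]
print(last_value_full_frame(costs))     # [50, 50, 50, 50]
```

With the default frame the result simply repeats each row's own value, which is why an explicit frame is needed to obtain the last value of the whole partition.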

Sample table: platorder (columns used in the examples: name, cost, orderdate)

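1. row_number

Syntax: row_number() assigns a different number to every row; the numbers are unique and consecutive within each partition
Example:

hive> select name,cost,orderdate,row_number() over(partition by name) rn
from platorder;

2. rank

Syntax: rank() assigns the same number to equal values and skips the following numbers (1,1,3)

3. dense_rank

Syntax: dense_rank() assigns the same number to equal values and continues with the next consecutive number (1,1,2)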
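
The difference between the three numbering functions only shows up when there are ties. The Python sketch below emulates them over a single partition that is already sorted by the order-by column; it is an illustration of the numbering rules, not of how Hive evaluates window functions.

```python
def row_number(sorted_values):
    """Unique, consecutive numbers regardless of ties."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """Ties share a number; the next distinct value skips ahead."""
    out = []
    for i, v in enumerate(sorted_values):
        if i > 0 and v == sorted_values[i - 1]:
            out.append(out[-1])  # same rank as the previous tied row
        else:
            out.append(i + 1)    # position in the partition
    return out


def dense_rank(sorted_values):
    """Ties share a number; the next distinct value gets the next integer."""
    out = []
    for i, v in enumerate(sorted_values):
        if i == 0:
            out.append(1)
        elif v == sorted_values[i - 1]:
            out.append(out[-1])
        else:
            out.append(out[-1] + 1)
    return out


costs = [10, 10, 30]
print(row_number(costs))  # [1, 2, 3]
print(rank(costs))        # [1, 1, 3]
print(dense_rank(costs))  # [1, 1, 2]
```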

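4. lag

Syntax: lag(col,n,df) returns the value of col from the row n rows before the current row in the window, or the default df if there is no such row

5. lead

Syntax: lead(col,n,df) returns the value of col from the row n rows after the current row in the window, or the default df if there is no such row
Example:

hive> select name,cost,
lag(cost,1,0) over(partition by name order by cost) fv,
lead(cost,1,0) over(partition by name order by cost) lv
from platorder;

6. first_value

Syntax: first_value(col) returns the first value in the window frame; with order by and the default frame, this is the first value of the sorted partition

7. last_value

Syntax: last_value(col) returns the last value in the window frame; with order by and the default frame, the frame ends at the current row, so an explicit frame is needed to get the last value of the whole partition
Example:

hive> select name,cost,
first_value(cost) over(partition by name order by cost) fv,
last_value(cost) over(partition by name order by cost) lv,
last_value(cost) over(partition by name order by cost rows
between unbounded preceding and unbounded following) lv2
from platorder;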
