Hive常用函数整理

最新推荐文章于 2021-10-14 15:24:30 发布

anonymox

最新推荐文章于 2021-10-14 15:24:30 发布

阅读量530

点赞数

分类专栏： # ——hive

本文链接：https://blog.csdn.net/cathycheny/article/details/109406578

版权

——hive 专栏收录该内容

1 篇文章 0 订阅

订阅专栏

基础函数

hive内置函数查看（show & desc命令）

查看系统自带函数

show functions;

显示自带函数的用法

desc function upper;
desc function extended upper;

一、日期函数

1. 日期获取

查看当前日期

select current_date;
select current_timestamp();
select unix_timestamp();

查询今天是当月的第几天

select dayofmonth(current_date);

得到月末最后一天、当月第一天、下月第一天

select last_day(current_date);  --当月最后一天
select date_sub(current_date, dayofmonth(current_date)-1);  --当月第一天
select add_months(date_sub(current_date,dayofmonth(current_date)-1), 1);  --下月第一天

2. 日期转换

日期 <-> 时间戳

时间戳转换成日期

select from_unixtime(1505456567);
select from_unixtime(1505456567, 'yyyyMMdd');
select from_unixtime(1505456567, 'yyyy-MM-dd HH:mm:ss');

日期转换成时间戳

select unix_timestamp('2019-09-15 14:23:00');

日期、时间戳、字符串类型格式化输出标准时间格式

select date_format(current_timestamp(), 'yyyy-MM-dd HH:mm:ss');
select date_format(current_date(), 'yyyyMMdd');
select date_format('2020-06-01', 'yyyy-MM-dd HH:mm:ss');

日期 <-> 字符串

字符串转时间（注意字符串必须为 yyyy-MM-dd格式）

select to_date('2020-01-01');
select to_date('2020-01-01 12:12:12');

3. 日期计算

计算时间差 datediff

datediff(string enddate, string startdate) 结果默认是天数

select datediff('2019-11-21', '2020-04-18');

日期相减 date_sub

date_sub (string startdate, int days)

select date_sub(current_date, 1)

日期增加 date_add

date_add(string startdate, int days)

select date_add(current_date, 1)

4. 截取日期

取时间的年月日时分秒，以及是在今年的多少周

year(string date)
month(string date)
day(string date)
hour(string date)
minute(string date)
second(string date)
weekofyear(string date)

例子

一个例子，通过生日计算年龄

select *, ceil(datediff(current_date, birthdate)/365) workingyears from emp;

二、字符串函数

转小写

select lower('HELLO WORLD');

转大写

select upper(ename), ename from emp;

字符串长度

select length(ename), ename from emp;

字符串拼接（ concat 或 || ）

select empno || ” ” ||ename idname from emp;
select concat(empno, ” ” ,ename) idname from emp;

指定分隔符的字符串拼接 concat_ws

concat_ws(string SEP, string A, string B…)
concat_ws(separator, [string | array(string)]+

SELECT concat_ws(‘.’, ‘www’, array(‘lagou','com’));
select concat_ws(” “, ename, job) from emp;
select concat_ws(',','abc','def','gh') from lxw_dual;
#abc,def,gh

求子字符串

SELECT substr(‘www.lagou.com’, 5);
SELECT substr(‘www.lagou.com’, -5);
SELECT substr(‘www.lagou.com’, 5, 5);

字符串切分

select split('www.lagou.com', '\\.');

字符串去空格

trim, ltrim, rtrim

类型转换函数

select cast(1 as bigint) from lxw_dual;

复杂函数

一、行列转换

1. 行转列

使用条件判断
- case when then …else end
- if(expr, true_result, false_result)

-- 方法一
select uid
, max(case when fid = 100 then num else 0 end) as fid_100
, max(case when fid = 101 then num else 0 end) as fid_101
, max(case when fid = 103 then num else 0 end) as fid_103
from user
group by uid

-- 方法二
select uid
, max(if(fid = 100, num, 0)) as fid_100
, max(if(fid = 101, num, 0)) as fid_101
, max(if(fid = 103, num, 0)) as fid_103
from user
group by uid

在这里插入图片描述

2. 多行合并

相当于是针对字符串的聚合操作
- collect_list：把列聚合成为数组，不去重
- collect_set：把列聚合成为数据，去重
- concat_ws(separator, str1, str2, …)：把多个字符串用分隔符进行拼接

select uid, concat_ws(" ", collect_list(content)) as content
from comment
group by uid

在这里插入图片描述
注意：

collect_list不去重，collect_set会去重
结合group by 使用，类似于其他地方的listagg函数

3. 列转行

explode简单数据类型

列转行
- explode：处理数组结构的字段，转换成多行；
- LATERAL VIEW：其实explode是一个UDTF函数（一行输入多行输出），这个时候如果要select除了explode得到的字段以外的多个字段，需要创建虚拟表。

select uid, game 
from user_game
LATERAL VIEW explode(split(game_list,",")) tmpTable as game
-- split(game_list,",")相对字符串切割，得到数组
-- game 是给 explode(split(game_list,",")) 列起的别名

在这里插入图片描述

关于explode函数
注意，explode函数在select子句中可以单独使用，但不能与其他列同时出现。如果要与其他列一起使用，就用用到 lateral view
select中单独使用explode的例子：

select explode(array('A','B','C')) as col;  --返回结果为3行
select explode(map('a', 8, 'b', 88, 'c', 888));  --返回结果为3行

explode复杂数据类型

对于hive中的复杂数据类型，explode可以返回二元结果。
看下面这个例子。

– 数据示例

lisi|Chinese:90,Math:80,English:70
wangwu|Chinese:88,Math:90,English:96
maliu|Chinese:99,Math:65,English:60

– 表的DDL

create table studscore
(
name string,
score map<String,string>   --看这里，score是复杂数据类型map
) 
row format delimited
fields terminated by ‘|’
collection items terminated by ‘,’
map keys terminated by ‘:’;

– 使用explode函数，将map结果拆分为多行，单独将结果select出来

select explode(score) as (subject, socre) from studscore;

– 与lateral view函数联用（能select其他字段）

select name, subject, score1 as score from studscore
lateral view explode(score) t1 as subject, score1;

– 在转换后的表里找出每个人的最好成绩

select name, max(mark) maxscore
from (select name, subject, mark
from studscore lateral view explode(score) t1 as subject, mark) t1
group by name;

二、条件函数

有两种写法，一种是if，一种是case when

if 写法

if (boolean testCondition, T valueTrue, T valueFalseOrNull)

select sal, if (sal<1500, 1, if (sal < 3000, 2, 3)) from emp;

case when 写法

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END

select sal, case when sal<=1500 then 1
when sal<=3000 then 2
else 3 end sallevel
from emp;

COALESCE
返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL

COALESCE(T v1, T v2, …)

select COALESCE(null,'100','50′) from lxw_dual;

三、窗口函数

窗口函数可以对每个组返回多行，而聚合函数对于每个组只能返回一行。

partition by : 在over窗口中进行分区，对某一列进行分区统计，窗口的大小就是分区的大小
order by: 对输入的数据进行排序
rows between … and …

unbounded preceding。组内第一行数据
n preceding。组内当前行的前n行数据
current row。当前行数据
n following。组内当前行的后n行数据
unbounded following。组内最后一行数据

1. 分组求和

计算分组求和的例子

select ename, sal, deptno,
sum(sal) over(partition by deptno order by ename)
from emp;

-- 等价于
select ename, sal, deptno,
sum(sal) over(partition by deptno order by ename rows between unbounded preceding and unbounded following)
from emp;

计算running_sum的例子

select ename, sal, deptno,
sum(sal) over(partition by deptno order by ename rows between unbounded preceding and current row)
from emp;

2. 排名函数

row_number()。排名顺序增加不会重复；如1、2、3、4、… …
RANK()。排名相等会在名次中留下空位；如1、2、2、4、5、… …
DENSE_RANK()。排名相等会在名次中不会留下空位；如1、2、2、3、4、… …

select cname, sname, score,
row_number() over (partition by cname order by score desc) rank1,
rank() over (partition by cname order by score desc) rank2,
dense_rank() over (partition by cname order by score desc) rank3
from t2;

3. 序列函数

lag。返回当前数据行的上一行数据
lead。返回当前数据行的下一行数据
first_value。取分组内排序后，截止到当前行，第一个值
last_value。分组内排序后，截止到当前行，最后一个值
ntile。将分组的数据按照顺序切分成n片，返回当前切片值

上一个值和下一个值

select cid, ctime, pv,
lag(pv) over(partition by cid order by ctime) lagpv,
lead(pv) over(partition by cid order by ctime) leadpv
from userpv;

第一个值和最后一个值

select cid, ctime, pv,
first_value(pv) over (partition by cid order by ctime rows between unbounded preceding and unbounded following) as firstpv,
last_value(pv) over (partition by cid order by ctime rows  between unbounded preceding and unbounded following) as lastpv
from userpv;

ntile。按照cid进行分组，每组数据分成2份

select cid, ctime, pv,
ntile(2) over(partition by cid order by ctime) ntile
from userpv;

anonymox

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Hive常用函数整理

文章目录hive内置函数日期函数一、日期获取二、日期转换日期 <-> 时间戳日期 <-> 字符串三、日期计算四、截取日期例子字符串函数hive内置函数查看系统自带函数show functions;显示自带函数的用法desc function upper;desc function extended upper;日期函数一、日期获取查看当前日期select current_date;select current_timestamp();selec
复制链接

扫一扫