一、常用方式整理
快捷计算方式 | Hive/Spark | impala | 备注 |
当天日期 | cast(current_date() as string) cast(date_format(current_date(),'yyyyMMdd') as string) | cast(from_timestamp(now(),'yyyy-MM-dd') as string) cast(from_timestamp(now(),'yyyyMMdd') as string) | 2023-06-30 20230630 |
昨天日期 | cast(date_sub('2023-06-30',1) as string) cast(date_format(date_sub('2023-06-30',1),'yyyyMMdd') as string) | cast(from_timestamp(date_sub(now(),1),'yyyy-MM-dd') as string) cast(from_timestamp(date_sub(now(),1),'yyyyMMdd') as string) | 2023-06-29 20230629 |
去年当天 | cast(date_sub(current_date(),365) as string) cast(date_format(date_sub(current_date(),365),'yyyyMMdd') as string) | cast(from_timestamp(date_sub(now(),365),'yyyy-MM-dd') as string) cast(from_timestamp(date_sub(now(),365),'yyyyMMdd') as string) | 2022-06-30 20220630 |
去年当月 | cast(date_format(date_sub(current_date(),365),'yyyy-MM') as string) cast(date_format(date_sub(current_date(),365),'yyyyMM') as string) | cast(from_timestamp(date_sub(now(),365),'yyyy-MM') as string) cast(from_timestamp(date_sub(now(),365),'yyyyMM') as string) | 2022-06 202206 |
当月 | substr(cast(current_date() as string),1,7) replace(substr(cast(current_date() as string),1,7),'-','') | substr(cast(now() as string),1,7) replace(substr(cast(now() as string),1,7),'-','') | 2023-06 202306 |
上月 | substr(cast(add_months(current_date(),-1) as string),1,7) replace(substr(cast(add_months(current_date(),-1) as string),1,7),'-','') | substr(cast(add_months(now(),-1) as string),1,7) replace(substr(cast(add_months(now(),-1) as string),1,7),'-','') | 2023-05 202305 |
下个月 | substr(cast(add_months(current_date(),1) as string),1,7) replace(substr(cast(add_months(current_date(),1) as string),1,7),'-','') | substr(cast(add_months(now(),1) as string),1,7) replace(substr(cast(add_months(now(),1) as string),1,7),'-','') | 2023-07 202307 |
当年 | substr(cast(current_date() as string),1,4) | cast(year(now()) as string) | 2023 |
去年 | cast(cast(date_format(current_date(),'yyyy') - 1 as decimal(4,0)) as string) | cast(year(years_add(now(),-1)) as string) | 2022 |
明年 | cast(cast(date_format(current_date(),'yyyy') + 1 as decimal(4,0)) as string) | cast(year(years_add(now(),1)) as string) | 2024 |
二、常用函数整理
函数分类 | 细分 | 函数作用 | Hive/Spark | impala | 备注 |
时 间 函 数 | 获取当前 | 当前日期 | current_date() | 无 | 2023-06-30 |
当前日期时间 | current_timestamp() | current_timestamp() now() | 2023-06-30 13:14:57 | ||
转换时间戳(秒)缺省获取当前 | unix_timestamp() unix_timestamp('2021-08-14 10:05:20') | unix_timestamp() unix_timestamp('2021-08-14 10:05:20') | 1628911641 | ||
格式转换 | 日期时间转日期函数 | to_date('2023-06-30 13:34:12') | to_date('2023-06-30 13:34:12') | 2023-06-30 | |
转化UNIX时间戳 | from_unixtime(1323308945,'yyyy-MM-dd') | from_unixtime(1323308945,'yyyy-MM-dd') | 2023-06-30 | ||
格式化时间(字符串/时间戳/日期) | date_format('2023-06-30','yyyyMMdd') | from_timestamp('2023-06-30','yyyyMMdd') | 20230630 | ||
获取年 | year('2023-06-30 13:34:12') | year('2023-06-30 13:34:12') | 2023 | ||
获取季度 | quarter('2023-06-30 13:34:12') | quarter('2023-06-30 13:34:12') | 2 | ||
获取月份 | month('2023-06-30 13:34:12') | month('2023-06-30 13:34:12') | 6 | ||
获取天(从1开始) | day('2023-01-01 13:34:12') | day('2023-01-01 13:34:12') | 1 | ||
获取小时 | hour('2023-01-01 13:34:12') | hour('2023-01-01 13:34:12') | 13 | ||
获取分钟 | minute('2023-01-01 13:34:12') | minute('2023-01-01 13:34:12') | 34 | ||
获取秒 | second('2023-01-01 13:34:12') | second('2023-01-01 13:34:12') | 12 | ||
获取周 | weekofyear('2021-08-14 10:05:20') | weekofyear('2021-08-14 10:05:20') | 32 | ||
当前周的第几天(注:周日为第1天) | dayofweek('2023-06-27 10:05:20') | dayofweek('2023-06-27 10:05:20') | 3 | ||
当前月的第几天 | 无 | dayofmonth(now()) | |||
时间推算 | 日期相减(第一个日期减第二个日期) | datediff('2021-08-14','2021-08-08') | datediff('2021-08-14','2021-08-08') | 6 | |
日期减少函数 | date_sub('2021-08-14',6) | date_sub('2021-08-14',6) date_sub(now(),interval 2 months) | 2021-08-08 2023-08-27 17:40:31.94032 | ||
日期增加函数 | date_add('2021-08-14',6) | date_add('2021-08-14',6) date_add(now(),interval 2 hours) | 2021-08-20 2023-06-27 19:41:21.906604 | ||
两个日期之间包含的月数 | months_between('2021-10-14','2021-05-04') | months_between('2021-10-14','2021-05-04') | 5.32258065 | ||
获取周期最初日期(月初 / 年初) | trunc(current_date(),'MM') trunc(current_date(),'YY') | trunc(current_date(),'month') trunc(current_date(),'year') | 2023-06-01 2023-01-01 | ||
日期当月最后一天 | last_day('2023-01-15') | last_day('2023-01-15') | 2023-01-31 | ||
增加年 | 无 | years_add(cast('2023-01-15' as timestamp),1) | 2024-01-15 00:00:00.0 | ||
减少年 | 无 | years_sub(cast('2023-01-15' as timestamp),1) | 2022-01-15 00:00:00.0 | ||
增加月 | add_months('2023-01-15',1) | months_add(cast('2023-01-15' as timestamp),1) | 2023-02-15 00:00:00.0 | ||
减少月 | add_months('2023-01-15',-1) | months_sub(cast('2023-01-15' as timestamp),1) | 2022-12-15 00:00:00.0 | ||
增加周 | 无 | weeks_add(cast('2023-01-15' as timestamp),1) | 2023-01-22 00:00:00.0 | ||
减少周 | 无 | weeks_sub(cast('2023-01-15' as timestamp),1) | 2023-01-08 00:00:00.0 | ||
增加小时 | 无 | hours_add(cast('2023-01-15' as timestamp),1) | 2023-01-15 01:00:00.0 | ||
减少小时 | 无 | hours_sub(cast('2023-01-15' as timestamp),1) | 2023-01-14 23:00:00.0 | ||
增加分钟 | 无 | minutes_add(cast('2023-01-15' as timestamp),1) | 2023-01-15 00:01:00.0 | ||
减少分钟 | 无 | minutes_sub(cast('2023-01-15' as timestamp),1) | 2023-01-14 23:59:00.0 | ||
增加秒 | 无 | seconds_add(cast('2023-01-15' as timestamp),1) | 2023-01-15 00:00:01.0 | ||
减少秒 | 无 | seconds_sub(cast('2023-01-15' as timestamp),1) | 2023-01-14 23:59:59.0 | ||
下个周周一 | 无 | next_day(cast('2023-01-15' as timestamp),'monday') | 2023-01-16 00:00:00.0 | ||
每月的第几天 | 无 | dayofmonth(cast('2023-01-15' as timestamp)) | 15 | ||
月的名称 | 无 | monthname(cast('2023-01-15' as timestamp)) | January | ||
星期几的名称 | 无 | dayname(cast('2023-01-15' as timestamp)) | Sunday | ||
字符串函数 | 字符串长度 | length('jiaoxin') | length('jiaoxin') | 7 | |
反转字符串 | reverse('jiaoxin') | reverse('jiaoxin') | nixoaij | ||
拼接字符串(任何一个为null返回null) | concat('xiaojiao','xiaojiao','ETL') | concat('xiaojiao','xiaojiao','ETL') | xiaojiaoxiaojiaoETL | ||
带分隔符字符串拼接 | concat_ws("#",'jiao', 'xin') | concat_ws("#",'jiao', 'xin') | jiaoxin#xin | ||
字符串截取 | substr('jiaoxin',1,4) | substr('jiaoxin',1,4) | jiao | ||
字符串转大写 | upper('jiaoxin') ucase('jiaoxin') | upper('jiaoxin') ucase('jiaoxin') | JIAOXIN | ||
字符串转小写 | lower('JIAOXIN') lcase('JIAOXIN') | lower('JIAOXIN') lcase('JIAOXIN') | jiaoxin | ||
去空格 | trim(' haha woshi zhangsan ') | trim(' haha woshi zhangsan ') | haha woshi zhangsan | ||
去左边空格 | ltrim(' haha woshi zhangsan ') | ltrim(' haha woshi zhangsan ') | haha woshi zhangsan | ||
去右边空格 | rtrim(' haha woshi zhangsan ') | rtrim(' haha woshi zhangsan ') | haha woshi zhangsan | ||
正则替换 | regexp_replace('marry','a|r','') | regexp_replace('marry','a|r','') | my | ||
正则解析 (0 是表示把整个表达式对应的结果全部返回 1 表示返回正则表达式中第一个()对应的结果 ...依次类推) | regexp_extract('marry','m(.*?)(ry)',0) regexp_extract('marry','m(.*?)(ry)',1) regexp_extract('marry','m(.*?)(ry)',2) | regexp_extract('marry','m(.*?)(ry)',0) regexp_extract('marry','m(.*?)(ry)',1) regexp_extract('marry','m(.*?)(ry)',2) | marry ar ry | ||
json解析 | get_json_object('{"name":"marry","sex":"F"}','$.name') | get_json_object('{"name":"marry","sex":"F"}','$.name') | marry | ||
返回指定长度的空格字符串 | space(10) | space(10) | |||
重复字符串 | repeat('tom',3) | repeat("tom",3) | tomtomtom | ||
左补足 | lpad('jiao',11,'ha') | lpad("jiao",11,'ha') | hahahahjiao | ||
右补足 | rpad('jiao',10,'6') | rpad("jiao",10,'6') | jiao666666 | ||
分割字符串 | split('jiao,kong,qu',',') | 无 | ["jiao","kong","qu"] | ||
分割字符串(返回指定位置) | 无 | split_part('part1-part2-part3','-',2) | part2 | ||
集合字符串位置查找(未找到返回0) | find_in_set('ab','fwqieur,ab') | find_in_set('ab','fwqieur,ab') | 2 | ||
字符串位置查找 | instr('abcdf','df') | instr('abcdf','df') | 4 | ||
一行转多行 | select tmp_col from xxxxx LATERAL VIEW explode (split(column_1, '\\.')) tmp as tmp_col where xxxx = yyyyy | 无 | 把column_1按照英文符号.拆分 成多行(其他字段会和之前的那 行保持一致) | ||
返回第一个非空(NULL)字符串(两个参数) 注:如果是空字符串'' 则会返回'', 并不会继续往后找后面有内容的值 | nvl(NULL,'1') nvl('','1') | nvl(NULL,'1') nvl('','1') | 1' '' | ||
返回第一个非空字符串(无限多个参数) | coalesce(NULL,'','NULL','123') | coalesce(NULL,'','NULL','123') | '' | ||
开窗函数 | 窗口内的行号 | row_number()over(partition by industry order by sale_amt desc) as amt_rank | row_number() over(partition by weeks order by uid) as uid_num | ||
窗口内向下第n行的值 | lead(dates,1) over(distribute by uid sort by dates asc) as dates_end | lead(dates,1) over(distribute by uid sort by dates asc) as dates_end | 按照销额计算产业排名 | ||
窗口内向上第n行的值 | lag(dates,1) over(distribute by uid sort by dates asc) as dates_end | lag(dates,1) over(distribute by uid sort by dates asc) as dates_end | |||
窗口内排序的序号(考虑重复,挤占后续位置) | rank() over(partition by cookieid order by pv desc) as rank_num | rank() over(partition by cookieid order by pv desc) as rank_num | 1 1 3 4 5 | ||
窗口内排序的序号(考虑重复,不挤占后续位置) | dense_rank() over(partition by cookieid order by pv desc) as rank_num | dense_rank() over(partition by cookieid order by pv desc) as rank_num | 1 1 2 3 4 | ||
窗口内的第一个值 | first_value(url) over(partition by cookieid order by createtime) as url_first_value | first_value(url) over(partition by cookieid order by createtime) as url_first_value | |||
窗口内的最后一个值 | last_value(url) over(partition by cookieid order by createtime) as url_last_value | last_value(url) over(partition by cookieid order by createtime) as url_last_value | |||
窗口内求和 | sum(sale_qty) over(partition by industry order by weeks) as sale_qty | sum(sale_qty) over(partition by industry order by weeks) as sale_qty | |||
窗口内去重计数 | count(distinct uid) over(partition by weeks) as uid_num | 无 | |||
窗口内所有值最大值 | max(uid) over(partition by weeks) as uid_max | max(uid) over(partition by weeks) as uid_max | |||
窗口内所有值最小值 | min(uid) over(partition by weeks) as uid_min | min(uid) over(partition by weeks) as uid_min |