hive 去重字符串_hive常用函数

最新推荐文章于 2021-11-14 18:10:55 发布

Jerry大王

最新推荐文章于 2021-11-14 18:10:55 发布

阅读量363

点赞数

文章标签： hive 去重字符串

本文链接：https://blog.csdn.net/weixin_34070493/article/details/114174448

版权

1.日期函数

将时间戳转化为日期

•from_unixtime(bigint unixtime,string format)

举例：from_unixtime(1237573801,'yyyy-MM-dd HH:mm:ss')

•常用string format 格式

1.'yyyy-MM-dd HH:mm:ss' 年月日时分秒格式2.'yyyy-MM-dd HH:mm' 年月日时分格式3.'yyyy-MM-dd HH' 年月日时格式4.'yyyy-MM-dd ' 年月日格式

将日期转化为时间戳

•unix_timestamp(srtingdate)

举例：unix_timestamp('2020-01-01 06:06:00','yyyy-MM-dd HH:mm:ss')

日期比较函数

•datediff(string enddate,string startdate) 日期间隔天数，结束日期和开始日期间隔天数

举例：datediff('2020-02-02','2020-02-01')

结果：1

日期增加函数

•date_add (string startdate, int days) 返回开始日期startdate增加days天后的日期。

举例: date_add ('2020-02-12', 10)

结果：2020-02-22

日期减少函数

•date_sub (string startdate, int days) 返回开始日期startdate减少days天后的日期。

举例：date_sub ('2020-02-12', 10)

结果：2020-02-02

返回日期时间字段中的日期部分

•to_date('yyyy-MM-dd HH:mm:ss')

2.条件函数

if函数

•if(条件表达式，结果1，结果2) 当条件为true时，返回结果1，否则结果2

举例：if(2>1,'是','否')

结果：是

条件判断函数case when

•case 表达式 when 条件1 then 结果 1 when 条件2 then 结果 2 else 结果3 end as 字段名

举例:select case when age<20 then "20岁以下"when age>=20 and age<30 then "20-30岁"when age>=30 and age<40 then "30-40岁"else "40岁以上"end as age_type,count (uid)from usergroup by case when age<20 then "20岁以下"when age>=20 and age<30 then "20-30岁"when age>=30 and age<40 then "30-40岁"else "40岁以上"end;

非空查找函数

•coalesce(T v1, T v2) 返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL

举例：coalesce( a,b,c) 如果a为null,则选择b，如果b为null，则选择c；

如果a不为null，则选择a；

如果a b c 都均为null，则返回为null。

3.窗口函数

累计计算窗口函数

•sum(A)over(partition by B order by C rows between D1 and D2)

•avg(A)over(partition by B order by C rows between D1 and D2)

1.rows between unbounded preceding and current row 包括本行和之前所有的行2.rows between current row preceding and unbounded following 包括本行和之后所有的行3.rows between 3 preceding and current row 包括本行以内和前三行4.rows between 3 preceding and 1 following 从前三行到下一行(5行)

分区排序窗口函数

•row_number()over(partition by A order by B)

依次排序，序号不会重复

•rank()over(partition by A order by B)

字段值相同，序号相同，跳跃排序，如果有两个第一名，接下来是第三名。

•dense_rank()over(partition by A order by B)

字段值相同，序号相同，连续排序，如果有两个第一名，接下来是第二名。

mmbizpngx9ibo11j92SQEuPbzwmReL7RruFGnPRTnBJeuW4nuDewIThcCO0BibdrCfFR1tcLGiblaowBBKj6Ht2XFh00cenbg640.png

分组排序窗口函数

•ntile(n) over(partition by A order by B)

n为要分组的数量

偏移分析窗口函数

•lag(exp_str,offset,defval)over(partition by A order by B)

同一次查询中取出同一字段的前N行数据

•lead(exp_str,offset,defval)over(partition by A order by B)

同一次查询中取出同一字段的后N行数据

exp_str：字段名称

offset：偏移量，即上一个或者上N个的值，以lag函数为列子，假设当前行在表中排在第5行，则offset为 3，则表示我们要找的数据行就是表中的第二行(即5-3=2)，offset默认值为1

defval：当两个函数取上N/下N个值，当在表中从当前行位置向前数N行已经超出了表的范围时，函数会将

defval这个参数作为函数的返回值，若没有指定默认值，则返回为null

mmbizpngx9ibo11j92SQEuPbzwmReL7RruFGnPRTnL6gjXmEAcgMrZ5xLlzhdgjb3zC5ictxMbePO4Ie5Yz1Q2ibZmLrtCxqw640.png

4.聚合函数

•sum()：求和•avg()：平均值•max()：最大值•min()：最小值•count()：计数•count(distinct ...) 去重计数

5.hive解析json字符串

•get_json_object(string json_string, string path)

string json_string：填写json对象变量

string path：第二个参数使用$表示json变量标识，通常为$.key形式

举例：假设fruit为fruits表中的字段，其结构为fruit: {"type":"apple","weight":2,"price":8,"color":"red"} select get_json_object(fruit, '$.type') from fruits

结果：apple

6.截取字符串函数

•substr(string,int start,int len)

举例: substr('2020-01-01' ,1,7)

结果：2020-01

mmbizpngb96CibCt70iaaTuCKQ0GWZica1bCbcaogd33fFUIPDBoaia52o53HV3Qq1sxh7PkTKLHZuLUZY6CHtDhEsdGGAJOEA640.png mmbizpngx9ibo11j92SRx0oCxG4b9ATbibIegWVobf71OsaUV56ZFktj8XlAkbFlKc10y0AOicrgOR7Hw3ibnMn79hHBtzrjdg640.png

Jerry大王

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hive 去重字符串_hive常用函数

1.日期函数将时间戳转化为日期•from_unixtime(bigint unixtime,string format)举例：from_unixtime(1237573801,'yyyy-MM-dd HH:mm:ss')•常用string format 格式1.'yyyy-MM-dd HH:mm:ss' 年月日时分秒格式2.'yyyy-MM-dd HH:mm' 年月日时分格式3.'yyyy-MM-...
复制链接

扫一扫