Common Hive Functions

Tonystark_sunshine

Last modified: 2022-09-13 20:46:56


Category: Hive. Tags: hive, hadoop, data warehouse

First published: 2022-09-07 11:30:51

Article link: https://blog.csdn.net/tonystark_lz/article/details/126740970


1. Numeric functions
2. String functions
3. Date functions
4. Collection functions

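1. Numeric functions

1) round: rounds to the nearest integer

hive> select round(3.3);
3

2) ceil: rounds up to the nearest integer

hive> select ceil(3.1);
4

3) floor: rounds down to the nearest integer

hive> select floor(4.8);
4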
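
The three functions differ most visibly on negative inputs, and round also accepts a second argument for the number of decimal places. The following is a small sketch with literal values chosen for illustration; they are not part of the original examples.

```sql
-- round takes an optional second argument: the number of decimal places to keep
select round(3.14159, 2);  -- 3.14
-- ceil moves toward positive infinity, floor toward negative infinity
select ceil(-3.1);         -- -3
select floor(-3.1);        -- -4
```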

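2. String functions

1) upper: converts a string to uppercase
2) lower: converts a string to lowercase
3) length: returns the length of a string
4) trim: removes spaces from both ends of a string
5) substring: extracts part of a string
6) replace: replaces every occurrence of a substring
7) regexp_replace: replacement driven by a regular expression

select regexp_replace("a23b45c","\\d","");

abc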
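
Items 1) to 6) in the list above come without examples. A minimal sketch with illustrative literals might look like the following. It assumes a Hive release that ships the replace function; on older releases regexp_replace can serve the same purpose.

```sql
select upper('hive');                    -- HIVE
select lower('HIVE');                    -- hive
select length('hive');                   -- 4
select trim('  hive  ');                 -- hive
-- substring positions are 1-based: start at position 2, take 3 characters
select substring('hadoop', 2, 3);        -- ado
-- replace works on literal text, not on a regular expression
select replace('2022/09/07', '/', '-');  -- 2022-09-07
```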

8) regexp: tests whether a string matches a regular expression

select "abc" regexp "^a.*";
true
select "abc" regexp "\\d";
false

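9) regexp_extract: extracts a capture group from a regular-expression match
Matches the string subject against the regular expression pattern and returns the capture group selected by index (index 0 returns the whole match).

hive> select regexp_extract('1-20-300', '(.*)-(.*)-(.*)', 3);

Output:

300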
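
To make the role of the index argument concrete, here is a short sketch on the same input string. The stricter pattern with \\d+ is an illustrative variation, not taken from the original example.

```sql
-- index 0 returns the text matched by the whole pattern
select regexp_extract('1-20-300', '(\\d+)-(\\d+)-(\\d+)', 0);  -- 1-20-300
-- index 2 returns the second parenthesised group
select regexp_extract('1-20-300', '(\\d+)-(\\d+)-(\\d+)', 2);  -- 20
```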

10) repeat: repeats a string a given number of times

hive> select repeat('123', 3);

Output:

123123123

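11) split: splits a string into an array
12) nvl: replaces NULL with a default value

select nvl(null,1); -- returns the second argument when the first is NULL, otherwise the first argument itself

Output:

1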

13) concat: concatenates strings
14) concat_ws: concatenates strings, or an array of strings, with the given separator

select concat_ws('-',"2022","09","07");
2022-09-07
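
split and concat have no example above, and concat_ws is shown only with separate string arguments. The sketch below, using illustrative literals, shows how the three fit together: split produces an array and concat_ws can join it back.

```sql
-- split returns an array; the separator is treated as a regular expression
select split('2022-09-07', '-');                  -- ["2022","09","07"]
-- array elements are addressed with a 0-based index
select split('2022-09-07', '-')[0];               -- 2022
-- concat joins its arguments with no separator
select concat('2022', '09', '07');                -- 20220907
-- concat_ws also accepts an array, so it can re-join the result of split
select concat_ws('/', split('2022-09-07', '-'));  -- 2022/09/07
```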

15) get_json_object: parses a JSON string
get_json_object(string json_string, string path)
Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.

select get_json_object('[{"name":"大海海","sex":"男","age":"25"},{"name":"小宋宋","sex":"男","age":"47"}]','$.[0].name');
Output: 大海海
select get_json_object('{"id":1001,"name":"lz"}',"$.id");
Output: 1001
select get_json_object('{
    "job": {
        "setting": {
            "speed": {
                 "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "column": [
                            "id",
                            "name"
                        ],
                        "splitPk": "db_id",
                        "connection": [
                            {
                                "table": [
                                    "table"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://127.0.0.1:3306/database"
                                ]
                            }
                        ]
                    }
                },
               "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print":true
                    }
                }
            }
        ]
    }
}','$.job.content[0].reader.parameter.username');

Output: root
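
The path syntax and the NULL behaviour described above can be checked on much smaller inputs. The JSON documents in this sketch are made up for illustration.

```sql
-- [n] selects one element of a JSON array (0-based)
select get_json_object('{"name":"lz","tags":["hive","hadoop"]}', '$.tags[1]');  -- hadoop
-- a path that matches nothing yields NULL
select get_json_object('{"name":"lz"}', '$.age');                                -- NULL
-- so does input that is not valid JSON
select get_json_object('not json', '$.name');                                    -- NULL
```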

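3. Date functions

1) unix_timestamp: returns the Unix timestamp of the current time or of a specified time
2) from_unixtime, from_utc_timestamp: convert a Unix timestamp (the number of seconds from 1970-01-01 00:00:00 UTC to the given moment) into a readable date and time; from_unixtime takes no time zone argument, while from_utc_timestamp treats its input as UTC and shifts it to the given time zone
3) current_date: the current date
4) current_timestamp: the current date and time, accurate to the millisecond
5) month: extracts the month from a date
6) day: extracts the day from a date
7) hour: extracts the hour from a timestamp
8) dayofmonth: the day of the month of the given date (same result as day)
9) datediff: the number of days between two dates (end date minus start date)
10) date_add: adds days to a date
11) date_sub: subtracts days from a date
12) date_format: formats a standard date as a string in the specified pattern
Examples:

-- Current time: 2022-09-07 11:11:07
select unix_timestamp(); -- 1662520267
select from_unixtime(1662520267);-- 2022-09-07 03:11:07  drawback: the time zone cannot be changed
-- An integer is interpreted as milliseconds, a decimal as seconds
select from_utc_timestamp(1662520267,'GMT+8');-- 1970-01-20 13:48:40.267000000
select from_utc_timestamp(1662520267000,'GMT+8');-- 2022-09-07 11:11:07.000000000
select from_utc_timestamp(1662520267.0,'GMT+8');-- 2022-09-07 11:11:07.000000000
select date_format(from_utc_timestamp(1662520267.0,'GMT+8'),"yyyy-MM-dd HH:mm:ss");-- 2022-09-07 11:11:07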
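
The examples above cover only the timestamp conversions. For the remaining functions in the list, a minimal sketch with literal dates (chosen for illustration, so that the results do not depend on the current time or the session time zone) could look like this:

```sql
select month('2022-09-07 11:11:07');                   -- 9
select day('2022-09-07 11:11:07');                     -- 7
select hour('2022-09-07 11:11:07');                    -- 11
-- end date first, start date second
select datediff('2022-09-13', '2022-09-07');           -- 6
select date_add('2022-09-07', 7);                      -- 2022-09-14
-- subtracting days can cross a month boundary
select date_sub('2022-09-07', 7);                      -- 2022-08-31
select date_format('2022-09-07 11:11:07', 'yyyy-MM');  -- 2022-09
```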

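4. Collection functions

1) size: the number of elements in a collection

select size(friends) from test;  -- 2 / 2: one value per row, the number of elements in that row's friends collection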
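
The query above depends on a table test with a collection column friends, whose definition is not shown. A self-contained sketch that needs no table, using literal collections as an assumption, is:

```sql
-- size works on arrays
select size(array('a', 'b', 'c'));           -- 3
-- and on maps, where it counts key-value pairs
select size(map('xiaohai', 1, 'dahai', 2));  -- 2
```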

2) map: creates a map

hive> select map('xiaohai',1,'dahai',2);
Output:
{"xiaohai":1,"dahai":2}

3) map_keys: returns the keys of a map

hive> select map_keys(map('xiaohai',1,'dahai',2));
Output:
["xiaohai","dahai"]

4) map_values: returns the values of a map

hive> select map_values(map('xiaohai',1,'dahai',2));
Output:
[1,2]

5) array: creates an array

hive> select array('1','2','3','4');
Output:
["1","2","3","4"]

6) array_contains: tests whether an array contains a given element

hive> select array_contains(array('a','b','c','d'),'a');
Output:
true

7) sort_array: sorts the elements of an array

hive> select sort_array(array('a','d','c'));
Output:
["a","c","d"]

8) struct: creates a struct from the given values; the fields are named col1, col2, ... automatically

hive> select struct('name','age','weight');
Output:
{"col1":"name","col2":"age","col3":"weight"}

9) named_struct: creates a struct from alternating field names and values

hive> select named_struct('name','xiaosong','age',18,'weight',80);
Output:
{"name":"xiaosong","age":18,"weight":80}
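
Once a collection has been built, its elements are usually read back individually. The following sketch reuses the literals from the examples above; the alias s and the subquery name t are illustrative assumptions.

```sql
-- a map value is looked up by key
select map('xiaohai', 1, 'dahai', 2)['dahai'];  -- 2
-- an array element is looked up by 0-based index
select array('1', '2', '3', '4')[0];            -- 1
-- a struct field is read with dot notation
select t.s.name                                 -- xiaosong
from (select named_struct('name', 'xiaosong', 'age', 18, 'weight', 80) as s) t;
```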
