扩展hive函数案列以及解释

最新推荐文章于 2023-06-02 15:46:49 发布

置顶 weidajiangjiang

最新推荐文章于 2023-06-02 15:46:49 发布

阅读量776

点赞数

文章标签： hive 函数案例解释

本文链接：https://blog.csdn.net/weidajiangjiang/article/details/101320576

版权

扩展hive函数案列以及解释

1.rand()；

rand(int sedd)取随机数返回值为double类型
说明:返回一个0到1范围内的随机数。如果指定种子seed，则会等到一个稳定的随机数序列

2.round(double n,a) ：

对输入的n数，保留a位小数，四舍五入，
round(double a) 四舍五入

3.ceil：

向上取整
select ceil(45.6); --46

4.floor：

向下取整
select floor(45.6); --45

5.split(str,spliter) :

对字符串str 根据字符spliter切割
hive> select split(“a,d,c”,",");
[“a”,“d”,“c”]

6.substr(str,start,length) 或者 substring() :

对字符串str从第start个字符开始，去length个字符
hive> select substr(“abcdef”,2,3);
bcd

7.concat(str,str1…)：

将多个字符串一次连在一起返回值: string
说明：返回输入字符串连接后的结果，支持任意个输入字符串
hive> select concat(“a”, “c”, “e”);
ace

8.concat_ws(seprater,str,str1…) :

指定分割字符的拼接函数，第一个参数是分隔符，第二个参数往后输入类型一定是字符串或者数组，根据第一个参数将多个字符串拼接
注意数组内的类型也应该是string类型
hive> select concat_ws(",", “a”, “b”, array(“1”,“2”,“3”));
a,b,1,2,3

9.cast(col as datatype) :

可以将数据col转换成需要的类型datatype，转换失败则换回null
hive> select cast(1 as double);
1.0

10.case when

case col
when value then …
when value1 then …
else
…
end

case
when col=value then …
when col=value1 then …
else
…
end

12.nvl(val,default_value) :

val为空则返回default
hive > select nvl(null, 12);
OK
12

13.coalesce(v1,v2,v3…):

返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL
hive>
> select coalesce(null, null,1, 2, 3, 4);
OK
1

14.from_unixtime(unix_time, format):

返回unix时间，参数要求为整数，且单位为秒
hive> select from_unixtime(1568199482);
OK
2019-09-11 18:58:02
还可以指定格式
hive> select from_unixtime(1568199482, “yyyy-MM-dd hh:mm”);
OK
2019-09-11 06:58

15.1 unix_timestamp()

Returns the UNIX timestamp，返回当前unix时间戳

hive> select unix_timestamp();
OK
1568199703

15.2 unix_timestamp([date[, pattern]])

返回指定时间的unix时间戳，指定日期参数调用，它返回参数值’1970- 01 – 0100:00:00′到指定日期的秒数。
hive> select unix_timestamp(“2019-09-11 19:09:00”); 不指定格式时要精确到秒
1568200140

15.3 unix_timestamp(string date, string pattern)

hive> select unix_timestamp(“2019-09-11”, “yyyy-MM-dd”);
1568131200

16.to_date(string date):

返回给定时间的年月日
hive> select to_date(“2019-11-11 21:23:23”);
2019-11-11

17.year 返回指定时间的年份

hive> select year(“2019-11-11 21:23:23”);
OK
2019
Time taken: 0.148 seconds, Fetched: 1 row(s)
hive> select day(“2019-11-11 21:23:23”); 返回指定时间的这个月的天数
OK
11
Time taken: 0.133 seconds, Fetched: 1 row(s)
hive> select month(“2019-11-11 21:23:23”);返回指定时间的月份
OK
11
Time taken: 0.099 seconds, Fetched: 1 row(s)
hive> select hour(“2019-11-11 21:23:23”);返回指定时间的小时
OK
21
Time taken: 0.11 seconds, Fetched: 1 row(s)
hive> select minute(“2019-11-11 21:23:23”);返回指定时间的分钟
OK
23
Time taken: 0.109 seconds, Fetched: 1 row(s)
hive> select second(“2019-11-11 21:23:23”);返回指定时间的秒数
OK
23
Time taken: 0.115 seconds, Fetched: 1 row(s)
hive> select weekofyear(“2019-11-11 21:23:23”);返回指定日期所在一年中的星期号
OK
46
Time taken: 0.154 seconds, Fetched: 1 row(s)
hive> select dayofmonth(“2019-11-11 21:23:23”);
OK
11
Time taken: 0.092 seconds, Fetched: 1 row(s)
hive>

18.datediff(string enddate, string startdate)：

两个时间参数的日期之差
hive> select datediff(“2019-11-11 21:23:23”, “2019-11-15 20:23:23”);
OK
-4

19.date_sub(string startdate, int days)：

给定时间，在此基础上减去指定的时间段
hive> select date_sub(“2019-11-11 21:23:23”, 10);
OK
2019-11-01

20.date_add(string startdate, int days)

给定时间，在此基础上加上指定的时间段。
hive> select date_add(“2019-11-11 21:23:23”, 10);
OK
2019-11-21

21.last_day(date)

Returns the last day of the month which the date belongs to.

hive> select last_day(“2019-11-11 21:23:23”);
OK 好像是返回给定时间当前月最后一天
2019-11-30

22.next_day(start_date, day_of_week)

Returns the first date which is later than start_date and named as indicated.

应该是返回当前日期的下一个给定星期数的日期，9.11号周三，我填了TU，表示周二返回最近的周二的日期是 9.17号
第二个参数英文字符串Mo, tue, FRIDAY
hive> select next_day(“2019-9-11 21:23:23”, “TU”);
OK
2019-09-17
hive> select next_day(“2019-9-11 21:23:23”, “tue”);
OK
2019-09-17

hive> select next_day("2019-9-11 21:23:23", "FRIDAY");
OK
2019-09-13
hive> select next_day("2019-9-11 21:23:23", "Mo");
OK
2019-09-16

23.current_date:获取当天时间

hive> select current_date();
OK
2019-09-11

24.current_timestamp:

精确到毫秒返回当前时间
hive> select current_timestamp();
OK
2019-09-11 19:44:02.014

25.md5(string/binary)

hive 1.3以上版本，返回md5码 1.2.1版本没有

26.base64(bin)

Convert the argument from binary to a base 64 string 将二进制格式转换成 base 64位的字符串

括号内要输入二进制数不回输入一直报错

27.encode(string src, string charset):

编码，使用指定的字符集charset将字符串编码成二进制值，支持的字符集有：‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’，如果任一输入参数为NULL都将返回NULL
hive> select encode(“1ss23”, “ISO-8859-1”);
OK 为啥还是原来的字符？？？
1ss23

28.decode(binary bin, string charset):

解码，使用指定的字符集charset将二进制值bin解码成字符串，支持的字符集有：‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’，如果任意输入参数为NULL都将返回NULL

hive> select decode(“1ss23”, “ISO-8859-1”);
FAILED: SemanticException [Error 10016]: Line 1:14 Argument type mismatch ‘“1ss23”’: The first argument to Decode() must be binary

咋输入二进制

29.format_number(X, D)

将数值X转换成"#,###,###.##"格式字符串，并保留d位小数，如果d为0，将进行四舍五入且不保留小数
)- Formats the number X to a format like ‘#,###,###.##’, rounded
to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. This is supposed to function like MySQL’s FORMAT

hive> select format_number(123.213, 0);
OK
123
hive> select format_number(123.213, 2);
OK
123.21

30.instr(string str, string substr)

查找字符串str中子字符串substr出现的位置，如果查找失败将返回0，
如果任一参数为Null将返回null，注意位置为从1开始的

    hive> select instr("asdb", "db");
    OK
    3

31.trim(string A):去掉字段A值前后空格

hive> select trim(" asdf asdf ");
OK
asdf asdf

32.ltrim(string A):去掉字符串A前面的空格

33.lpad(string str, int len, string pad):

左补足，从左边开始对字符串str使用字符串pad填充，最终len长度为止，如果字符串str本身长度比len大的话，将去掉多余的部分
hive> select lpad(“aaa”, 10, “b”);
OK
bbbbbbbaaa

34.substring_index(string A, string delim, int count)

截取第count分隔符之前的字符串，如count为正则从左边开始截取，如果为负则从右边开始截取
1.2.1貌似没有， 1.3.0之后出现

35.initcap(string A):首字符变大写

hive> select initcap(“a123”);
OK
A123

36.find_in_set(string str, string strList):

查找str是否在strList中，返回位置，费性能
返回以逗号分隔的字符串中str出现的位置，如果参数str为逗号或查找失败将返回0，如果任一参数为NULL将返回NULL回
hive> select find_in_set(‘ab’, ‘abc,b,ab,c,def’);
OK
3

37.str_to_map(text[, delimiter1, delimiter2]):

转map
将字符串str按照指定分隔符转换成Map，第一个参数是需要转换字符串，第二个参数是键值对之间的分隔符，
默认为逗号;第三个参数是键值之间的分隔符，默认为"="

hive> select str_to_map(“name:shi,age:21,sex:nan”, “,”, “:”);
OK
{“sex”:“nan”,“name”:“shi”,“age”:“21”}

38.get_json_object(string json_string, string path):

获取json对象
从指定路径上的JSON字符串抽取出JSON对象，并返回这个对象的JSON格式，如果输入的JSON是非法的将返回NULL,注意此路径上JSON字符串只能由数字字母下划线组成且不能有大写字母和特殊字符，且key不能由数字开头，这是由于Hive对列名的限制
第一个参数填写json对象变量，第二个参数使用$表示json变量标识，然后用 . 或 [] 读取对象或数组；如果输入的json字符串无效，那么返回NULL。
每次只能返回一个数据项。
数据如下
{“name”:“ta”,“age”:12,“sex”:1}
{“name”:“la”,“age”:13,“sex”:2}

create table if not exists json_test (
json string
)
stored as textfile;

load data local inpath ‘/home/hivedata/json.dat’ overwrite into table json_test;

hive> select get_json_object(t.json,’ $name'), get_json_object(t.json,'$ .age’) from json_test t;
OK
ta 12
la 13

39.regexp_extract(string subject, string pattern, int index):

正则抽取
抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串，
注意些预定义字符的使用，如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格

.*？  表示匹配任意字符到下一个符合条件的字符
正则表达式a.*?xxx   可以匹配 abxxx  axxxxx  abbbbbxxx

    > select regexp_extract('hitdecisiondlist','(i)(.*?)(e)',0);
   OK
   itde

40.regexp_replace(string A, string B, string C):

字符串A中的B字符被C字符替代

    > select regexp_replace("abcdabdv", "b", "ee");
    OK
    aeecdaeedv

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串，
如里REPLACEMENT这空的话，抽符合正则的部分将被去掉
如：regexp_replace(“foobar”, “oo|ar”, “”) = ‘fb.’
注意些预定义字符的使用，如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格

41.parse_url():解析url

parse_url(string urlString, string partToExtract [, string keyToExtract])
返回从URL中抽取指定部分的内容，参数url是URL字符串，而参数partToExtract是要抽取的部分，这个参数包含(HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO,例如：parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘HOST’) =‘facebook.com’，如果参数partToExtract值为QUERY则必须指定第三个参数key 如：parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘QUERY’, ‘k1’) =‘v1’

 hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST');
    OK
    facebook.com

hive> select parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’, ‘QUERY’,‘k1’);
OK
v1

42.size(a1):

size(Map<K.V>) size(Map<K.V>)
size(Array)返回数组类型的元素数量
hive> select size(array(1,2,3));
OK
3

43.length(str): 返回字符串的长度

hive> select length(“asdfasd”);
OK
744.数据
1 job:80,team:60,person:70
2 job:60,team:80
3 job:90,team:70,person:100

hive> create table maptest(id string, perf map<string, int>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ‘\t’
> COLLECTION ITEMS TERMINATED BY ‘,’
> MAP KEYS TERMINATED BY ‘:’;
OK

hive> LOAD DATA LOCAL INPATH ‘/home/hivedata/test7’ INTO TABLE maptest;
Loading data to table qf24.maptest
OK

hive> select * from maptest;
OK
1 {“job”:80,“team”:60,“person”:70}
2 {“job”:60,“team”:80}
3 {“job”:90,“team”:70,“person”:100}
hive>

44. map_keys()：返回map中的所有key
上面表中map的名字为perf
hive> select map_keys(perf) from maptest;
OK
[“job”,“team”,“person”]
[“job”,“team”]
[“job”,“team”,“person”]

45.map_values()：返回map中的所有value

hive> select map_values(perf) from maptest;
OK
[80,60,70]
[60,80]
[90,70,100]

47:row_number:

分组以后的值得个数排序
col1 clo2 rn

1 str1 1
2 str2 1
3 str33 1
3 str31 2
3 str3 3
4 str42 1
4 str41 2

select
uid,
udate,
row_number() over(partition by uid order by udate) as rn,
first_value(udate) over(partition by uid order by udate) as fir,
last_value(udate) over(partition by uid order by udate) as last
from login

first_vlaue
取分组内排序后，截止到当前行，第一个值
last_value

48.CUME_DIST 函数

cume_dist 返回小于等于当前值的行数 / 分组内总行数
比如，我们可以统计小于等于当前薪水的人数，所占总人数的比例
SELECT cookieid, createtime, pv,
round(CUME_DIST() OVER(ORDER BY pv),2) AS cd1,
round(CUME_DIST() OVER(PARTITION BY cookieid ORDER BY pv),2) AS cd2 FROM test1 ;

49.NTILE 函数

NTILE(n)，用于将分组数据按照顺序切分成n片，返回当前切片值
注1：如果切片不均匀，默认增加第一个切片的分布注2：NTILE不支持ROWS BETWEEN
用法举例：统计一个cookie，pv数最多的前1/3的天：
取 ntile = 1 的记录，就是我们想要的结果！

weidajiangjiang

关注

0
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
扩展hive函数案列以及解释

扩展hive函数案列以及解释1.rand()；rand(int sedd)取随机数返回值为double类型说明:返回一个0到1范围内的随机数。如果指定种子seed，则会等到一个稳定的随机数序列2.round(double n,a) ：对输入的n数，保留a位小数，四舍五入，round(double a) 四舍五入3.ceil：向上取整select ceil(45.6); --4...
复制链接

扫一扫