Hive常用的字符串相关函数
-
lower(str) - 将字符串的字母全部转换为小写
SELECT lower(‘Facebook’);
‘facebook’ -
upper(str) - 将字符串的字母全部转换为大写
-
length(str) - 返回字符串的长度
hive (default)> SELECT length(‘Facebook1’);
9 -
trim(str) - 去除字符串两端的空字符串
hive (default)> select trim(’ facebook ');
facebook -
lpad(str, len, pad) - 将一个字符串的左边用给定的字符pad匹配,直到长度等于给定的len
hive (default)> select lpad(‘hi’,4,’|’);
||hi
Time taken: 0.044 seconds, Fetched: 1 row(s)
hive (default)> select lpad(‘hi’,1,’|’);
h
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive (default)> select lpad(‘hi’,4,’’);
NULL
注意: 如果给定的长度小于str自身长度,则会从str的右边开始截取,如果参数pad是空字符串(‘’),那么返回的就是一个NULL -
rpad(str, len, pad) - 与lpad同理,它是从右边开始匹配
-
regexp_replace(str, regexp, rep) - 将str中按regexp匹配到的字符串,用rep替换
hive (default)> select regexp_replace(‘2020-12-16’,’-’,’/’);
2020/12/16
Time taken: 0.044 seconds, Fetched: 1 row(s) -
substr(str, pos[, len]) - 从指定的位置截取指定长度的字符串(如果len没有指定,默认是截取到最后)
hive (default)> select substr(‘facebook’, 3);
cebook
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive (default)> select substr(‘facebook’, -3);
ook
Time taken: 0.039 seconds, Fetched: 1 row(s)
hive (default)> select substr(‘facebook’, 3, 2);
ce
Time taken: 0.038 seconds, Fetched: 1 row(s)
注意: 字符串下标默认是从1开始而不是0,最后一个字符下标是-1 -
split(str, regex) - 根据regex对str进行切割
hive (default)> select split(‘ni-hao-ya’, ‘-’);
[“ni”,“hao”,“ya”]
Time taken: 0.064 seconds, Fetched: 1 row(s) -
concat(str1, str2, … strN) - 拼接str1,str2…strN
hive (default)> select concat(‘ni’,‘hao’);
nihao
Time taken: 0.039 seconds, Fetched: 1 row(s)
hive (default)> select concat(‘ni’,NULL);
NULL
Time taken: 0.039 seconds, Fetched: 1 row(s)
注意: 如果任意一个字符串为NULL,那么最终结果都为NULL,因此在需要字符串拼接的场景下,要记得先将NULL值替换哦 -
concat_ws(separator, [string | array(string)]+) - 将指定的字符串或者字符串数组,根据指定的separator进行拼接
hive (default)> select concat_ws(’.’,‘www’,array(‘facebook’,‘com’));
www.facebook.com
Time taken: 0.047 seconds, Fetched: 1 row(s) -
json_tuple(jsonStr, p1, p2, …, pn) - 从一个json字符串中获取相应字段
这里有一份存放json字符串的表hive (ddl_create)> select * from rating_json limit 5;
OK
rating_json.json
{“movie”:“1193”,“rate”:“5”,“time”:“978300760”,“userid”:“1”}
{“movie”:“661”,“rate”:“3”,“time”:“978302109”,“userid”:“1”}
{“movie”:“914”,“rate”:“3”,“time”:“978301968”,“userid”:“1”}
{“movie”:“3408”,“rate”:“4”,“time”:“978300275”,“userid”:“1”}
{“movie”:“2355”,“rate”:“5”,“time”:“978824291”,“userid”:“1”}
Time taken: 0.077 seconds, Fetched: 5 row(s)现在把json字符串的4个字段解析处理并放到一个新的表中(从time中提取出年、月、日、时、分字段)
hive (ddl_create)> create table t_rate
as
select
movie, rate, time, userid,
year(from_unixtime(cast(time as bigint))) as year,
month(from_unixtime(cast(time as bigint))) as month,
day(from_unixtime(cast(time as bigint))) as day,
hour(from_unixtime(cast(time as bigint))) as hour,
minute(from_unixtime(cast(time as bigint))) as minute,
from_unixtime(cast(time as bigint)) ts
from
(
select json_tuple(json, “movie”,“rate”,“time”,“userid”) as(movie, rate, time, userid) from rating_json
) tmp;
注意: 这里用到了cast类型转换函数,因为from_unixtime的参数是bigint类型,这里解析出的time是string类型,所以需要转换为bignit,然后在用year()等相关函数提取出年、月、日…查看一下解析出来的数据
hive (ddl_create)> select * from t_rate limit 5;
OK
t_rate.movie t_rate.rate t_rate.time t_rate.userid t_rate.year t_rate.month t_rate.day t_rate.hour t_rate.minute t_rate.ts
1193 5 978300760 1 2001 1 1 6 12 2001-01-01 06:12:40
661 3 978302109 1 2001 1 1 6 35 2001-01-01 06:35:09
914 3 978301968 1 2001 1 1 6 32 2001-01-01 06:32:48
3408 4 978300275 1 2001 1 1 6 4 2001-01-01 06:04:35
2355 5 978824291 1 2001 1 7 7 38 2001-01-07 07:38:11
Time taken: 0.035 seconds, Fetched: 5 row(s) -
parse_url_tuple(url, partname1, partname2, …, partnameN):解析出一个URL字段的各个字段
hive (ddl_create)> select parse_url_tuple(‘http://www.google.com/test/film?param1=value1¶m2=value2’, ‘HOST’,‘PATH’,‘QUERY’) as(host, path, query);
host path query
www.google.com /test/film param1=value1¶m2=value2
Time taken: 0.038 seconds, Fetched: 1 row(s)
结语
这里就是一些常用的字符串相关的函数,觉得有用的话可以收藏哦。
若有不足的地方,请多多指正