HIVE常用函数速查

lianchaozhao

已于 2024-06-21 19:23:46 修改

阅读量2.2k

点赞数

分类专栏： hive 文章标签： hive big data

于 2021-08-12 15:32:55 首次发布

本文链接：https://blog.csdn.net/weixin_40809627/article/details/119645746

版权

hive 专栏收录该内容

44 篇文章 6 订阅

订阅专栏

Hive 提供了较完整的 SQL 功能，HQL 与 SQL 基本上一致，旨在让会 SQL 而不懂 MapReduce 编程的用户可以调取 Hadoop 中的数据，进行数据处理和分析。

记录日常数据分析过程中 Hive SQL 需要的查询函数，方便手头随时查询，定期更新补充。

0、常用函数记录

多个值聚合在一个list 数据中
array_join(array_agg( distinct t1.promotion_name), ',') as promotion_list -- 字符串聚合，presto语法
CONCAT_WS(',', COLLECT_LIST(distinct t1.promotion_name)) as promotion_list -- 字符串聚合，sparkSQL语法
例如：concat_ws('&', collect_set(concat(ta.tag, '-' ,tb.sku_id))) as tag-sku_ids

一、关系运行

1、等值比较: =
2、不等值比较: <>
3、小于比较: <
4、小于等于比较: <=< a>
5、大于比较: >
6、大于等于比较: >=
7、空值判断: IS NULL
8、非空判断: IS NOT NULL
9、LIKE比较: LIKE 例如：name like ‘%小兰%’
10、JAVA的LIKE操作: RLIKE
11、REGEXP操作: REGEXP (例如多值like: name regexp ‘小贝|小兰|小红’)

注意：
其中关系判断中注重值为NULL（null）这种情况。
select 1 where null != 1 没有返回
select 1 where ‘’ != 1 有返回
select 1 where 0 != 1 有返回
length(NULL) <> 1 没有返回
select length(NULL) 返回NULL ，而不是0
最好是把null coalesce成目标类型的某个默认值
select 1 where COALESCE(null, 0) != 1

二、条件函数

方法名	返回类型	描述
if(boolean testCondition, T valueTrue, T valueFalseOrNull)	T	如果testCondition 为true就返回valueTrue,否则返回valueFalseOrNull
nvl(T value, T default_value)	T	如果value值为NULL就返回default_value,否则返回value
COALESCE(T v1, T v2, …)	T	返回第一个非null的值，如果全部未NULL就返回NULL
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END	T	如果a=b就返回c,a=d就返回e，否则返回f(支持嵌套操作)
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END	T	上面变相，如果a=ture就返回b,c= ture就返回d，否则返回e
isnull( a )	boolean	如果a为null就返回true，否则返回false
isnotnull ( a )	boolean	如果a为非null就返回true，否则返回false

例子：

hive> select if(3>4,1,2);
OK
2
Time taken: 0.097 seconds, Fetched: 1 row(s)
hive> select if(3>2,'hello',2);
OK
hello
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> select nvl(null,'a');
OK
a
Time taken: 0.04 seconds, Fetched: 1 row(s)
hive> select nvl('a','b');
OK
a
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select coalesce(null,'hello');
OK
hello
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select coalesce(null,null,null,'hello');
OK
hello
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive> select case when 3>4 then 'hello' when 3<4 then 'world' end;
OK
world
Time taken: 0.088 seconds, Fetched: 1 row(s)
hive> select case 3 when 3 then 'hello' when 4 then 'world' end;
OK
hello
Time taken: 0.028 seconds, Fetched: 1 row(s)
hive> select isnull(null);
OK
true
Time taken: 0.16 seconds, Fetched: 1 row(s)
hive> select isnull('a');
OK
false
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select isnotnull(null);
OK
false
Time taken: 0.034 seconds, Fetched: 1 row(s)
hive> select isnotnull('a');
OK
true
Time taken: 0.099 seconds, Fetched: 1 row(s)

三、类型转换函数

方法名	返回值类型	描述
binary(string	binary)	binary
cast(expr as)	NULL 或 type	将expr转换成type类型如：cast(“1” as INT) 将字符串1转换成了INT类型，如果转换失败将返回NULL
hive> select cast(‘1’ as int)*cast(‘10’ as int);
OK
10
Time taken: 0.043 seconds, Fetched: 1 row(s)
hive> select binary(‘123456’);
OK
123456
Time taken: 0.121 seconds, Fetched: 1 row(s)

四、日期转换函数

方法名称	返回值类型	描述
from_unixtime(bigint unixtime, [string format])	String	将时间的秒值转换成format格式
unix_timestamp()	bigint	获取本地时区下的时间戳
unix_timestamp(string date)	bigint	将格式为yyyy-MM-dd HH:mm:ss的时间字符串转换成时间戳
unix_timestamp(string date,string pattern)	bigint	将指定时间字符串格式字符串转换成Unix时间戳，如果格式不对返回0
to_date(string date)	string	返回时间字符串的日期部分
year(string date)	int	返回时间字符串的年份部分
month(string date)	int	返回时间字符串的月份部分
day(string date)	int	返回时间字符串的天(月份)
hour(string date)	int	返回时间字符串的小时
minute(string date)	int	返回时间字符串的分钟
second(string date)	int	返回时间字符串的秒
weekofyear(string date)	int	返回时间字符串位于一年中的第几周内
dayofweek(string date)	int	返回时间字符串位于一周中的第几天，星期天位于第一天
datediff(string enddate,string startdate)	int	计算开始时间startdate到结束时间enddate相差的天数
date_add(string startdate ,int days)	string	从开始时间startdate 加上days天数
date_sub(string startdate, int days)	string	从开始时间startdate减去days天数
current_date	date	返回当前时间日期
current_timestamp	timestamp	返回当前时间戳,返回的是字符串格式
add_months(string start_date, int num_months)	string	返回当前时间下再增加num_months个月的日期
last_day(string date)	string	返回这个月的最后一天的日期，忽略时分秒部分（HH:mm:ss）
next_day(string start_date, string day_of_wee)	string	返回当前时间的下一个星期X所对应的日期
trunc(string date,string fromat)	sting	返回时间的最开始年份或月份 ,注意所支持的格式为MONTH/MON/MM, YEAR/YYYY/YY
months_between(date1, date2)	double	返回date1与date2之间相差的月份，如date1>date2，则返回正，如果date1<date2,则返回负，否则返回0.0
date_format(date/timestamp/string ts, string fmt)	string	fmt格式

hive> select from_unixtime(1594604100);
OK
2020-07-13 09:35:00
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> select from_unixtime(1594604100,'yyyy-MM-dd');
OK
2020-07-13
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select unix_timestamp();
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
1594616493
Time taken: 0.058 seconds, Fetched: 1 row(s)
hive> select unix_timestamp('2020-07-13 09:35:00');
OK
1594604100
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> select to_date('2020-07-13 09:35:00');
OK
2020-07-13
Time taken: 0.096 seconds, Fetched: 1 row(s)
hive> select year('2020-07-13 09:35:00');
OK
2020
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select month('2020-07-13 09:35:00');
OK
7
Time taken: 0.051 seconds, Fetched: 1 row(s)
OK
13
Time taken: 0.196 seconds, Fetched: 1 row(s)
hive> select hour('2020-07-13 09:35:00');
OK
9
Time taken: 0.09 seconds, Fetched: 1 row(s)
hive> select minute('2020-07-13 09:35:00');
OK
35
Time taken: 0.058 seconds, Fetched: 1 row(s)
hive> select second('2020-07-13 09:35:00');
OK
0
hive> select weekofyear('2020-07-13 09:35:00');
OK
29
Time taken: 0.047 seconds, Fetched: 1 row(s)
select case dayofweek(current_date())-1 when 1 then '星期一' 
										when 2 then '星期二'
										when 3 then '星期三'
										when 4 then '星期四'
										when 5 then '星期五'
										when 6 then '星期六'
										else  '星期天' end
星期一
hive> select datediff('2020-07-13 00:00:00','2020-07-11 09:35:00');
OK
2
Time taken: 0.11 seconds, Fetched: 1 row(s)
hive> select date_add('2020-07-13 09:35:00',30);
OK
2020-08-12
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> select date_add('2020-07-13 09:35:00',-30);
OK
2020-06-13
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> select date_sub('2020-07-13 09:35:00',30);
OK
2020-06-13
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> select current_date();
OK
2020-07-13
Time taken: 0.039 seconds, Fetched: 1 row(s)
hive> select current_timestamp();
OK
2020-07-13 13:47:37.025
Time taken: 0.038 seconds, Fetched: 1 row(s)
hive> select add_months('2020-07-13 09:35:00', 4);
OK
2020-11-13
Time taken: 0.074 seconds, Fetched: 1 row(s)
hive> select last_day(current_date());
OK
2020-07-31
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select next_day(current_date(),'TU');
OK
2020-07-14
Time taken: 0.052 seconds, Fetched: 1 row(s)
hive> select trunc(current_date(),'MM');
OK
2020-07-01
Time taken: 0.071 seconds, Fetched: 1 row(s)
hive> select months_between(current_date(),'2020-05-18');
OK
1.83870968
Time taken: 0.055 seconds, Fetched: 1 row(s)
hive> select date_format('2020-07-13','yyyy-MM-dd HH:mm:ss');
OK
2020-07-13 00:00:00
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select date_format('2020-07-13','yyyy-dd');
OK
2020-13
Time taken: 0.048 seconds, Fetched: 1 row(s)

五、数学函数：

方法名	返回值类型	描述
round(DOUBLE a)	DOUBLE	返回对a四舍五入的最大整数值，类型为double
round(DOUBLE a, INT n)	DOUBLE	对a取几位小数返回
floor(DOUBLE a)	BIGINT	向下取整
ceil(DOUBLE a)/ceiling(double a)	BIGINT/double	向上取整
rand(), rand(INT seed)	DOUBLE	返回一个随机数 seed是种子,种子一样随机数一样
exp(DOUBLE a), exp(DECIMAL a)	DOUBLE	返回e的a幂次方， a可为小数
pow(double a,double b)	DOUBLE	a的b次方
sqrt(double a)	DOUBLE	开根号,若a为负数返回null
abs(double a)	DOUBLE	取绝对值
pmod(int a,int b)	int	a对b取余

hive> select round(10.234);
OK
10.0
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive> select round(10.234,2);
OK
10.23
Time taken: 0.043 seconds, Fetched: 1 row(s)
hive> select floor(10.3);
OK
10
Time taken: 0.088 seconds, Fetched: 1 row(s)
hive> select floor(-10.3);
OK
-11
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive> select ceil(10.3);
OK
11
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select ceiling(10.3);
OK
11
Time taken: 0.05 seconds, Fetched: 1 row(s)
hive> select rand();
OK
0.4474707641445139
Time taken: 0.066 seconds, Fetched: 1 row(s)
hive> select rand(10);
OK
0.7304302967434272
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> select exp(3);
OK
20.085536923187668
Time taken: 0.054 seconds, Fetched: 1 row(s)
hive> select pow(3,2);
OK
9.0
Time taken: 0.072 seconds, Fetched: 1 row(s)
hive> select sqrt(10);
OK
3.1622776601683795
Time taken: 0.043 seconds, Fetched: 1 row(s)
hive> select abs(-10.1);
OK
10.1
Time taken: 0.039 seconds, Fetched: 1 row(s)
hive> select pmod(16,3);
OK
1
Time taken: 0.036 seconds, Fetched: 1 row(s)

六、字符函数

方法名	返回值类型	描述
length(string A)	int	返回字符串的长度
concat(string\|binary A, string\|binary B…)	string	对二进制字节码或字符串按次序进行拼接
concat_ws(string SEP, string A,String B)	string	与concat 类似，但使用指定分隔符分隔比较方便。
find_in_set(string str,string strList)	int	返回以逗号分隔的字符串中str出现的位置，如果参数str为逗号或查找失败将返回0，如果任一参数为NULL将返回NULL
format_number(number x, int d)	string	将数值x转换成"x,xxx,xxx.xx"格式字符串，并保留d位小数，如果d为0，将进行四舍五入且不保留小数
get_json_object(string json_string, string path)	string	get_json_object(‘{“userid”:“1”,“username”:“zs”}’,‘$.username’)从指定路径上的JSON字符串抽取出JSON对象，并返回这个对象的JSON格式，如果输入的JSON是非法的将返回NULL,注意此路径上JSON字符串只能由数字字母下划线组成且不能有大写字母和特殊字符，且key不能由数字开头，这是由于Hive对列名的限制
instr	int	查找字符串str中子字符串substr出现的位置，如果查找失败将返回0，如果任一参数为Null将返回null，注意位置为从1开始的
locate(string substr, string str[, int pos])	int	查找字符串str中pos位置后字符串substr第一次出现的位置
lower(string A) lcase(string A)	string	将字符串A的所有字母转换成小写字母
upper(string A)	string	将字符串A的所有字母转换成大写字母
lpad(string str, int len, string pad)	string	从左边开始对字符串str使用字符串pad填充，最终len长度为止，如果字符串str本身长度比len大的话，将去掉多余的部分
ltrim(string A)	string	去掉字符串A前面的空格
rtrim(string A)	string	去掉字符串A后面的空格
trim(string A)	string	去掉字符串A两端空格
pars_urlstring urlString, string partToExtract [, string keyToExtract])	string	返回从URL中抽取指定部分的内容，参数url是URL字符串，而参数partToExtract是要抽取的部分，这个参数包含(HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO)
regexp_extract(string subject, string pattern, int index)	string	正则截取
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)	string	正则替换
repeat(string str, int n)	string	重复输出n次字符串str
reverse(string A)	string	reverse(string A)
split(string str, string pat)	string	按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回
substr(string A, [int start],int length) substring(string, [int start],int length)	string	对于字符串A,从start位置开始截取length长度字符串并返回
translate(string input, string from, string to)	string	将input出现在from中的字符串替换成to中的字符串如：translate(“MOBIN”,“BIN”,“M”)=“MOM”
initcap(string A)	string	将字符串A转换第一个字母大写其余字母小写的字符串

extract 提取函数，例如提取字符串日期。
select extract(month from “2016-10-20”) results
select extract(hour from “2016-10-20 05:06:07”) results
select extract(dayofweek from “2016-10-20 05:06:07”) results
select extract(month from interval ‘1-3’ year to month) results
select extract(minute from interval ‘3 12:20:30’ day to second) results

hive> select length('abc');
OK
3
Time taken: 0.033 seconds, Fetched: 1 row(s)
hive> select concat('userid',',','username',',','birthday');
OK
userid,username,birthday
Time taken: 0.053 seconds, Fetched: 1 row(s)
hive> select concat_ws(',','userid','username');
OK
userid,username
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> select find_in_set('ab','ab,abc,ede,acd,abe');
OK
1
Time taken: 0.058 seconds, Fetched: 1 row(s)
hive> select format_number(1234456.565,2);
OK
1,234,456.56
Time taken: 0.08 seconds, Fetched: 1 row(s)
hive> select get_json_object('{"userid":"1","username":"zs"}','$.username');
OK
zs
Time taken: 0.046 seconds, Fetched: 1 row(s)
hive> select instr('abcdfe','fe');
OK
5
Time taken: 0.122 seconds, Fetched: 1 row(s)
hive> select locate('ef','abefcdefd',1);
OK
3
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select locate('ef','abefcdefd',5);
OK
7
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> select lower('ABCde');
OK
abcde
Time taken: 0.038 seconds, Fetched: 1 row(s)
hive> select lcase('ABCde');
OK
abcde
Time taken: 0.044 seconds, Fetched: 1 row(s)
hive> select upper('addA');
OK
ADDA
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select lpad('123',2,'0');
OK
12
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select lpad('123',8,'0');
OK
00000123
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> select lpad('123',length('123')+1,'$');
OK
$123
Time taken: 0.076 seconds, Fetched: 1 row(s)
hive> select trim(' abc ');
OK
abc
Time taken: 0.066 seconds, Fetched: 1 row(s)
hive> select ltrim(' abc ');
OK
abc
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> select rtrim(' abc ');
OK
 abc
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select length(ltrim(' abc '));
OK
4
Time taken: 0.034 seconds, Fetched: 1 row(s)
hive> select length(rtrim(' abc '));
OK
4
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive> select length(trim(' abc '));
OK
3
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','QUERY');
OK
name=1&pwd=123456
Time taken: 0.06 seconds, Fetched: 1 row(s)
hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','QUERY','pwd');
OK
123456
Time taken: 0.022 seconds, Fetched: 1 row(s)
hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','HOST');
OK
facebook.com
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','PATH');
OK
/path/p1.php
Time taken: 0.037 seconds, Fetched: 1 row(s)
hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','PROTOCOL');
OK
https
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive> select regexp_extract("hello,world","^((\\w+),(\\w+))$",2); -- 正则提取;
OK
hello
Time taken: 0.039 seconds, Fetched: 1 row(s)
提取数字正则表达
regexp_extract('需要35天', '\d{1,}')
结果返回 35
hive> select regexp_replace("hello,world","o|l","e"); -- 正则替换;
OK
heeee,wered
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive> select repeat("abc",5);  -- 重复输出;
OK
abcabcabcabcabc
Time taken: 0.062 seconds, Fetched: 1 row(s)
hive> select reverse("abc"); -- 反转字符;
OK
cba
Time taken: 0.04 seconds, Fetched: 1 row(s)
hive> select split('hello|world,abc|edf kyy',"[\\|, ]"); -- 正则分割;
OK
["hello","world","abc","edf","kyy"]
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive> select substr('hello,world',1,5); -- 字符串截取 开始位置 长度;
OK
hello
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> select translate('hello,world','world','func'); -- 密码常用 一一对应 没有则截掉;
OK
heccu,func
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> select translate('hello,world','world','cc');
OK
hec,cc
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select translate('hello,world','world','c'); -- 一个对一个
    > ;
OK
he,c
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> select initcap('hello world'); -- 首字母大写 空格分隔;
OK
Hello World
Time taken: 0.031 seconds, Fetched: 1 row(s)

七、集合函数

方法名称	返回值类型	描述
size(Map<K.V>)	int	求map的长度
size(Array)	int	求数组的长度
map_keys(Map<k，v>)	array	返回map中所有的key
map_values(Map<K.V>)	array	返回map中的所有value
array_contains(Array,value)	boolean	若该数组Array包含value返回true。，否则返回false
sort_array(Array)	array	按自然顺序对数组进行排序并返回,只能升序

hive> select size(map(1,'a',2,'b'));
OK
2
Time taken: 0.052 seconds, Fetched: 1 row(s)
hive> select size(array(1,2,3,4,5,6));
OK
6
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> select map_keys(map(1,'a',2,'b'));
OK
[1,2]
Time taken: 0.059 seconds, Fetched: 1 row(s)
hive> select map_values(map(1,'a',2,'b'));
OK
["a","b"]
Time taken: 0.058 seconds, Fetched: 1 row(s)
hive> select array_contains(array(1,2,3,4,5,6),6);
OK
true
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive> select sort_array(array(1,2,3,6,5,7,8,0));
OK
[0,1,2,3,5,6,7,8]
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive>

八.聚合函数

方法名称	返回值类型	描述
ount(*), count(expr), count(DISTINCT expr[, expr…])	bigint	count(*)统计总行数其中包含null的行数。count(列名)统计列中非null行数，count(DISTINCT) 统计列中去重后非null的行数
sum(col), sum(DISTINCT col)	double	求指定类得和，列可去重
avg(col), avg(DISTINCT col)	double	表示求指定列的平均值,列可去重
min(col)	double	求指定列的最小值
max(col)	double	求指定列的最大值
collect_list()	array	返回每个组列中的对象集,不去重
collect_set()	array	返回每个组列中的对象集，去重
variance(col), var_pop(col)	double	求指定列数值的方差
var_samp(col)	double	求指定列数值的样本方差
stddev_pop(col)	DOUBLE	求指定列数值的标准偏差
stddev_samp(col)	DOUBLE	求指定列数值的样本标准偏差
covar_pop(col1, col2)	DOUBLE	求指定列数值的协方差
covar_samp(col1, col2)	DOUBLE	求指定列数值的样本协方差
corr(col1, col2)	DOUBLE	返回两列述职相关系数
percentile(BIGINT col, p)	DOUBLE	返回col的p%分位数

多行取最大最小值
greatest函数：
1.取多列最大值
select greatest(-99, 0, 73) --73
2 存在 null 或者字符串
select greatest(-99, 0, 73, null) --null
select greatest(-99, 0, 73, ‘string’) --null
3 存在日期
select greatest(‘2022-01-01’,‘2022-06-01’,‘2022-06-09’) --2022-06-09
least函数：
1 取多列最小值
select least(-99, 0, 73) – -99
2 存在 null 或者字符串
select least(-99, 0, 73, null) --null
select least(-99, 0, 73, ‘string’) --null
3 存在日期
select least(‘2022-01-01’,‘2022-06-01’,‘2022-06-09’) – 2022-01-01

九.表生成函数

函数名称	返回值类型	描述
explode(array)	N row	每行对应数组中一个元素
explode(map)	N row	每行对应每个map键值对，其中一个字段是map的键，另一个字段是map的值
posexplode(array)	N row	与explode类似，不同的是还返回各元素在数组中的位置
stack(INT n, v_1, v_2, …, v_k)	N rows	把M列转换成N行，每行有M/N个字段，其中n必须是一个常数
json_tuple(jsonStr, k1, k2, …)	tuple	从一个json字符串中获取多个键并作为一个元组返回。

hive> select explode(array(1,2,3,4));
OK
1
2
3
4
Time taken: 0.048 seconds, Fetched: 4 row(s)
hive> select explode(map('A','apple','o','orage'));
OK
A       apple
o       orage
Time taken: 0.056 seconds, Fetched: 2 row(s)
hive> select posexplode(array('a','b','c'));
OK
0       a
1       b
2       c
Time taken: 0.044 seconds, Fetched: 3 row(s)
hive> select stack(2,'a','b','c','d');
OK
a       b
c       d
Time taken: 0.032 seconds, Fetched: 2 row(s)
hive> select json_tuple('{"name":"zs","age":"20"}','name','age');
OK
zs      20
Time taken: 0.057 seconds, Fetched: 1 row(s)

工作常用函数总结：
1、筛选出不包含某个字符串的条件比如_ 和%
size(split(employee_no,‘‘))<2 或instr(employee_no,’’)=0
2、将一列转成行
concat_ws(‘,’,collect_set(cast(合并字段 as string)))
或者筛选出特点维度值放到一个新的字段中
3、[100,200,300] 一行膨胀转成多行
LATERAL VIEW explode(split(regexp_replace(ids, ‘[|]’, ‘’), ‘,’)) exploded_table AS number
4、plit(‘1,2,3’,‘,’)，取最后一个
用split(‘1,2,3’,‘,’)[size(split(‘1,2,3’,‘,’))]
5、

列1	列2
A	B
A	c
输出A (B，c) ,参考：select concat_ws(‘,’,collect_list(列2)) from xxx group by 列1

6、A字段是group_id，637152，B字段是-group_ids，是数字+逗号拼接，调用啥函数，判断A是否在B内呀
array_contains(split(group_ids,‘,’),group_id) = true
7、两个或多个字段取最大的那个
用greatest 或least 使用例如LEAST(column1, column2)
8、

10、json_arry 结构查询出某个值
可以采用类似：
第一中
JSON_EXTRACT(collected_states, ‘ $0].index') AS index_1, JSON_EXTRACT(collected_states, '$ [0].current’) AS battery_current_1,
JSON_EXTRACT(collected_states, ‘ $KaTeX parse error: Expected '}', got 'EOF' at end of input: \dotsertId":0}]' ),'$ .data[0].name’) as name

[{"index":1,"packVoltage":51203,"current":4118,"displaySoc":88,"batteryProtectCode":0,"batteryErrorCode":0,"cellTemperatureMax":12,"cellTemperatureMin":12,"mosTemperatureMax":13,"pcbTemperatureMax":13,"soh":95,"cycleNo":114,"capacity":21873,"cellVoltageMax":3938,"cellVoltageMin":3918,"mosStatus":6,"realSoc":88,"balanceTubeStatus":0},{"index":2,"packVoltage":51734,"current":62954,"displaySoc":88,"batteryProtectCode":0,"batteryErrorCode":0,"cellTemperatureMax":12,"cellTemperatureMin":12,"mosTemperatureMax":14,"pcbTemperatureMax":13,"soh":95,"cycleNo":114,"capacity":21847,"cellVoltageMax":3983,"cellVoltageMin":3953,"mosStatus":6,"realSoc":88,"balanceTubeStatus":0},{"index":3,"packVoltage":51455,"current":59,"displaySoc":88,"batteryProtectCode":0,"batteryErrorCode":0,"cellTemperatureMax":12,"cellTemperatureMin":12,"mosTemperatureMax":14,"pcbTemperatureMax":13,"soh":95,"cycleNo":114,"capacity":21803,"cellVoltageMax":3957,"cellVoltageMin":3941,"mosStatus":6,"realSoc":88,"balanceTubeStatus":0},{"index":4,"packVoltage":49156,"current":9874,"displaySoc":87,"batteryProtectCode":0,"batteryErrorCode":0,"cellTemperatureMax":17,"cellTemperatureMin":15,"mosTemperatureMax":17,"pcbTemperatureMax":19,"soh":95,"cycleNo":114,"capacity":21778,"cellVoltageMax":3907,"cellVoltageMin":3448,"mosStatus":6,"realSoc":87,"balanceTubeStatus":0},{"index":5,"packVoltage":46438,"current":8364,"displaySoc":87,"batteryProtectCode":0,"batteryErrorCode":0,"cellTemperatureMax":17,"cellTemperatureMin":15,"mosTemperatureMax":18,"pcbTemperatureMax":24,"soh":95,"cycleNo":114,"capacity":21728,"cellVoltageMax":3900,"cellVoltageMin":3059,"mosStatus":6,"realSoc":87,"balanceTubeStatus":0},{"index":6,"packVoltage":45754,"current":0,"displaySoc":0,"batteryProtectCode":8196,"batteryErrorCode":192,"cellTemperatureMax":21,"cellTemperatureMin":16,"mosTemperatureMax":19,"pcbTemperatureMax":24,"soh":95,"cycleNo":114,"capacity":0,"cellVoltageMax":3956,"cellVoltageMin":1450,"mosStatus":0,"realSoc":0,"balanceTubeStatus":0}]

11、hive 中jsonArray或Array 信息筛选出特定类型并拼接成一行
参照：regexp_replace(concat_ws(‘’,transform(json_to_array(json_arr_str),x->if(get_json_object(x,‘ $KaTeX parse error: Expected group after '_' at position 30: \dots') like 'COUPON_̲%',concat(get_j\dots$ .text’),‘\;’),‘’))),‘.$’,‘’) as coupon,

regexp_replace(regexp_replace(yh_preference_group_info,‘start_str(.+?)end_str’,‘end_str’),‘start_2str(.+?)end_2str’,‘end_2str’) as format_preference_group_info

12、字符串中替换某开头和某结尾的中间的字符串信息。

非常重要： spark sql 函数查询：https://spark.apache.org/docs/latest/api/sql/index.html#max_by
https://blog.csdn.net/liuzhoulong/article/details/77969224
https://www.huaweicloud.com/articles/6dbba690185b6e8dbc3cad8e72caeb19.html
https://blog.csdn.net/sun_0128/article/details/107296322

lianchaozhao

关注

0
点赞
踩
7

收藏

觉得还不错? 一键收藏
1
评论
HIVE常用函数速查

Hive 提供了较完整的 SQL 功能，HQL 与 SQL 基本上一致，旨在让会 SQL 而不懂 MapReduce 编程的用户可以调取 Hadoop 中的数据，进行数据处理和分析。记录日常数据分析过程中 Hive SQL 需要的查询函数，方便手头随时查询，定期更新补充。...
复制链接

扫一扫

专栏目录