Hive Study Notes, Part 2: Common Built-in Functions

五花尾巴

Article link: https://blog.csdn.net/NotMeYa/article/details/108855283

Using Hive functions

1. Type conversion functions
e.g.: select id,cast(birthday as date) as bir,cast(salary as float) from t_fun;
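
The cast above depends on the t_fun table, so here is a minimal sketch on literals instead. It assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later); the column aliases are made up for illustration.

```sql
-- cast() on literal values, no table required
SELECT
  cast('2020-09-29' AS date) AS bir,       -- string to date
  cast('5200.5' AS float)    AS salary,    -- string to float
  cast('abc' AS int)         AS not_a_num; -- a value that cannot be converted yields NULL
```

A failed conversion returns NULL rather than raising an error, so it is worth checking for NULLs after casting dirty data.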
	
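2. Math functions
round(5,4)   ->  5   rounds to 4 decimal places (an integer is returned unchanged)
round(5.1314,3)  ->  5.131   rounds to three decimal places
ceil(5.4)   ->    6   rounds up
floor(5.4)    ->    5  rounds down
abs(-5.4)    ->    5.4   absolute value
greatest(3,5,6)  ->   6   largest of the given values (at least two arguments)
least(3,5,6)   ->   3   smallest of the given values
max()    min()   aggregate functions (usually with group by): the largest / smallest value within a group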
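
The scalar functions above can be checked in a single query. This is a small sketch on literals, again assuming SELECT without FROM is available; the aliases are arbitrary.

```sql
SELECT
  round(5.1314, 3)  AS r,  -- 5.131
  ceil(5.4)         AS c,  -- 6
  floor(5.4)        AS f,  -- 5
  abs(-5.4)         AS a,  -- 5.4
  greatest(3, 5, 6) AS g,  -- 6
  least(3, 5, 6)    AS l;  -- 3
```

Note the difference from max() and min(): greatest() and least() compare values within one row, while max() and min() aggregate one column across many rows.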

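3. String functions
substr(string,int start)    extracts the substring from position start (1-based) to the end
substr(string,int start,int len)    extracts len characters starting at position start
concat(string A,string B,...)   concatenates strings
concat_ws(string SEP,string A,string B,...)   concatenates with a separator
length(string A)     length of the string
split(string str,string pat)   splits str around the regular expression pat and returns an array
upper(string str)   converts to upper case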
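
A short sketch of the string functions on literal values (aliases are arbitrary; SELECT without FROM is assumed to be supported):

```sql
SELECT
  substr('jackson', 5)               AS s1,  -- 'son'  (positions start at 1)
  substr('jackson', 1, 4)            AS s2,  -- 'jack'
  concat('hello', ' ', 'jack')       AS c1,  -- 'hello jack'
  concat_ws('-', '2020', '09', '29') AS c2,  -- '2020-09-29'
  length('jackson')                  AS len, -- 7
  split('a,b,c', ',')[0]             AS p0,  -- 'a'  (array indexes start at 0)
  upper('hive')                      AS up;  -- 'HIVE'
```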

4. Date and time functions
from_unixtime(bigint unixtime[,string format])     Unix timestamp to string
unix_timestamp(string date,string pattern)         string to Unix timestamp
to_date("2020-09-29 18:46:20")              string to date (returns the date part, 2020-09-29)
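
A sketch of the three functions together. The numeric timestamp and the formatted string depend on the time zone settings and on the Hive version, so no values are shown for them here.

```sql
SELECT
  -- date part of a timestamp string: 2020-09-29
  to_date('2020-09-29 18:46:20') AS d,
  -- seconds since 1970-01-01, parsed with the given pattern
  unix_timestamp('2020-09-29 18:46:20', 'yyyy-MM-dd HH:mm:ss') AS ts,
  -- seconds since the epoch rendered with a custom pattern
  from_unixtime(1601405180, 'yyyy/MM/dd HH:mm:ss') AS formatted;
```

The pattern strings follow Java date format conventions, for example MM for the month and mm for minutes.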

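5. Table-generating functions
	explode(): expands an array into multiple rows, one element per row; can be combined with distinct to remove duplicates
	lateral view: joins the rows produced by a table-generating function back to the source row, much like a join between two tables
	
	Suppose a text file xx.txt
	has the following content: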
	hello jackson  hello  rich
	hello jack  hello jackson
	jackson loves rich rich loves jackson
	jack loves jackson love is what
	what is love

A simple word count with Hive

Create the table: create table t_wc(sentence string);
Load the data: load data inpath 'root/hivetest/xx.txt' into table t_wc;

HQL:
	SELECT 
		word,count(1) as cnts
	FROM
		(
			SELECT explode(split(sentence,' ')) AS word 
			FROM t_wc
		) tmp
	GROUP BY word
	order by cnts desc;
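
The same word count can also be written with lateral view instead of a subquery. This is a sketch of an alternative form, not a replacement for the query above. It splits on ' +' (one or more spaces) as an assumption, because the first sample line contains double spaces that would otherwise produce empty-string words.

```sql
-- word count using LATERAL VIEW; the alias w and the column word are arbitrary names
SELECT w.word, count(1) AS cnts
FROM t_wc
LATERAL VIEW explode(split(sentence, ' +')) w AS word
GROUP BY w.word
ORDER BY cnts DESC;
```

With lateral view the original columns of t_wc stay available next to each exploded word, which a bare explode() in the select list does not allow.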

6. Collection functions
array_contains(Array<T>,value)  returns a boolean
sort_array(Array<T>)  returns the sorted array
size(Array<T>)      returns an int
size(Map<K,V>)      returns an int
map_keys(Map<K,V>)    returns an array
map_values(Map<K,V>)   returns an array
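
A sketch of the collection functions, using the array() and map() constructors to build literal values (aliases are arbitrary):

```sql
SELECT
  array_contains(array(1, 2, 3), 2) AS has_two,  -- true
  sort_array(array(3, 1, 2))        AS sorted,   -- [1,2,3]
  size(array(1, 2, 3))              AS arr_size, -- 3
  size(map('a', 1, 'b', 2))         AS map_size, -- 2
  map_keys(map('a', 1, 'b', 2))     AS ks,       -- array holding the keys 'a' and 'b'
  map_values(map('a', 1, 'b', 2))   AS vs;       -- array holding the values 1 and 2
```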

7. Conditional functions
	a. case when
	e.g.: CASE [expression]
				WHEN condition1  THEN  result1
				WHEN condition2  THEN  result2
				...
				WHEN conditionN  THEN  resultN
				ELSE  result
			END
			
	b. IF
	e.g.: if(condition, value_if_true, value_if_false)
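
A sketch showing both forms side by side. The two input rows are built inline with UNION ALL so no table is needed; the age thresholds and labels are invented for illustration.

```sql
SELECT
  age,
  CASE
    WHEN age < 18 THEN 'minor'
    WHEN age < 60 THEN 'adult'
    ELSE 'senior'
  END                        AS age_group,
  if(age >= 18, 'yes', 'no') AS is_adult
FROM (
  SELECT 16 AS age
  UNION ALL
  SELECT 22 AS age
) t;
-- age 16 -> 'minor', 'no'
-- age 22 -> 'adult', 'yes'
```

CASE evaluates its WHEN branches in order and returns the first match, so the narrower condition has to come first.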

8. JSON parsing function (a table-generating function)
	json_tuple function
	e.g.: json_tuple(json,'movie','rate')   pass in a JSON string and the keys to extract
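
Because json_tuple is a table-generating function, it is usually combined with lateral view. The sketch below uses an inline JSON literal; the sample values and the aliases src and jt are made up.

```sql
SELECT jt.movie, jt.rate
FROM (
  SELECT '{"movie":"1193","rate":"5","uid":"1"}' AS json
) src
LATERAL VIEW json_tuple(src.json, 'movie', 'rate') jt AS movie, rate;
-- movie = 1193, rate = 5 (both returned as strings)
```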

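9. row_number() over(): top N per group (analytic function)
Given the following data in table t_user (columns: id, age, name, sex), find the two oldest rows for each sex: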
	1,18,a,male
	2,19,b,male
	3,22,c,female
	4,16,d,female
	5,30,e,male
	6,26,f,female

First number the rows within each group:
SELECT id,age,name,sex,
	row_number() over(partition by sex order by age desc) as rn
FROM t_user;

Then filter on the row number:
SELECT *
FROM
(select id,age,name,sex,
row_number() over(partition by sex order by age desc) as rn
FROM t_user)  tmp
where rn<3;
This returns the top 2 rows by age for each sex.
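
To try this end to end, the table has to exist first. The DDL below is an assumption: the column names follow the queries above, and the data is taken to be a comma-delimited text file. The expected rows were worked out by hand from the six sample records.

```sql
-- hypothetical DDL for the sample data (comma-separated text)
CREATE TABLE t_user(id int, age int, name string, sex string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

SELECT id, age, name, sex, rn
FROM (
  SELECT id, age, name, sex,
         row_number() OVER (PARTITION BY sex ORDER BY age DESC) AS rn
  FROM t_user
) tmp
WHERE rn <= 2;
-- expected rows for the sample data (output order may vary):
-- 6,26,f,female,1
-- 3,22,c,female,2
-- 5,30,e,male,1
-- 2,19,b,male,2
```

The filter has to sit in an outer query because rn is computed by the window function and is not visible to the WHERE clause of the same SELECT.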

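10. Window functions for complex reports
	sum(amount) over(partition by uid order by month 
		rows between unbounded preceding and current row);
	The frame covers every preceding row in the partition up to and including the current row.

sum() over() can therefore accumulate a running total row by row within a window.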
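
A sketch of the running total on a few inline rows. The rows and the alias t_sales are invented for illustration; only the column names uid, month and amount come from the expression above.

```sql
SELECT uid, month, amount,
       sum(amount) OVER (PARTITION BY uid ORDER BY month
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
  SELECT 'u1' AS uid, '2020-01' AS month, 10 AS amount
  UNION ALL
  SELECT 'u1' AS uid, '2020-02' AS month, 20 AS amount
  UNION ALL
  SELECT 'u1' AS uid, '2020-03' AS month, 5 AS amount
  UNION ALL
  SELECT 'u2' AS uid, '2020-01' AS month, 7 AS amount
) t_sales;
-- u1: running_total = 10, 30, 35 for 2020-01, 2020-02, 2020-03
-- u2: running_total = 7
```

The total restarts for each uid because of PARTITION BY, and ORDER BY month fixes the order in which the amounts are added up.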