Row-to-column transformation
explode and lateral view:
Example:
select explode(category) from movie;
select * from movie lateral view explode(category) table_tmp as category_name;
category_name is the alias for the exploded column; lateral view explode(...) table_tmp behaves like a derived table in the FROM clause.
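A minimal sketch of what the query does, assuming a hypothetical movie table whose category column is an array<string> (table and data are illustrative, not from the original notes):

```sql
-- Hypothetical row:  name = 'Interstellar', category = array('Sci-Fi','Drama')
select m.name, tmp.category_name
from movie m
lateral view explode(category) tmp as category_name;
-- Each array element becomes its own row:
--   Interstellar  Sci-Fi
--   Interstellar  Drama
```

Note that plain `select explode(category) from movie;` can return only the exploded column; to keep other columns (such as name) alongside it, lateral view is required.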
Window functions
When you need aggregated results but still want to see the pre-aggregation detail rows, use window functions.
With an ORDER BY in the window, the default frame runs from the start of the partition to the current row, which suits running aggregates such as sum and count.
To extend the frame to the whole partition (useful with max, min, and similar aggregates):
rows between unbounded preceding and unbounded following
Reference: https://zhuanlan.zhihu.com/p/92654574
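The difference between the default frame and the full-partition frame can be sketched as follows, assuming a hypothetical orders table (name, orderdate, cost):

```sql
select name, orderdate, cost,
       -- default frame (unbounded preceding to current row): a running total
       sum(cost) over (partition by name order by orderdate) as running_total,
       -- explicit full-partition frame: the same total on every row of the partition
       sum(cost) over (partition by name
                       rows between unbounded preceding and unbounded following) as partition_total
from orders;
```

running_total grows row by row within each name, while partition_total repeats the partition-wide sum on every row.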
Ranking functions
RANK(): ties get the same rank, and the following rank is skipped, so the highest rank still equals the row count (e.g. 1,1,3)
DENSE_RANK(): ties get the same rank with no gaps, so the highest rank can be smaller than the row count (e.g. 1,1,2)
ROW_NUMBER(): assigns consecutive numbers in order, ignoring ties (e.g. 1,2,3)
Example: rank scores within each subject
select name,subject,score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score;
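To make the three functions' semantics concrete, consider one subject with scores 90, 90, 80 (illustrative data):

```sql
-- score | rank | dense_rank | row_number
--  90   |  1   |     1      |     1
--  90   |  1   |     1      |     2
--  80   |  3   |     2      |     3
```

rank skips 2 after the tie, dense_rank does not, and row_number numbers the tied rows arbitrarily but consecutively.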
UDF
UDF (User-Defined Function): one row in, one row out
UDAF (User-Defined Aggregation Function): many rows in, one row out,
similar to count/max/min
UDTF (User-Defined Table-Generating Function): one row in, many rows out,
e.g. explode(), typically combined with split() via lateral view
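A common pairing of the two: split() is an ordinary UDF that returns an array, and explode() is the UDTF that turns that array into rows:

```sql
select explode(split('hive,spark,flink', ',')) as word;
-- word
-- hive
-- spark
-- flink
```

This split-then-explode pattern is the usual first step of a word count in HQL.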
Steps to write a custom UDF:
1. Extend org.apache.hadoop.hive.ql.exec.UDF
2. Implement an evaluate method; evaluate supports overloading
3. Register the function in the Hive CLI
   a) add the jar: add jar linux_jar_path
   b) create the function: create [temporary] function [dbname.]function_name AS class_name;
4. Remove the function in the Hive CLI
   drop [temporary] function [if exists] [dbname.]function_name;
Example:
Maven dependency:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>
Implementation class:
import org.apache.hadoop.hive.ql.exec.UDF;

// A UDF that lower-cases its input; the method must be named evaluate
public class Lower extends UDF {
    public String evaluate(final String s) {
        if (s == null) {
            return null;  // pass NULL through unchanged
        }
        return s.toLowerCase();
    }
}
Build a jar and upload it to the server, e.g. /opt/module/jars/udf.jar
Add the jar to Hive's classpath:
hive (default)> add jar /opt/module/jars/udf.jar;
Create a temporary function bound to the compiled class (assuming Lower lives in package com._51doit.func):
hive (default)> create temporary function mylower as "com._51doit.func.Lower";
The custom function can now be used in HQL:
hive (default)> select ename, mylower(ename) lowername from emp;
Drop it with: drop temporary function mylower;