Row-to-column transformation
explode and lateral view:
Example:
select explode(category) from movie;
select * from movie lateral view explode(category) table_tmp as category_name;
category_name is the alias for the exploded column; lateral view explode(...) table_tmp behaves like a derived table in the FROM clause.
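A minimal sketch of what the query does, assuming a hypothetical movie table whose category column is an array<string> (table and data are illustrative, not from the original notes):

```sql
-- Hypothetical row:  name = 'Interstellar', category = array('Sci-Fi','Drama')
select m.name, tmp.category_name
from movie m
lateral view explode(category) tmp as category_name;
-- Each array element becomes its own row:
--   Interstellar  Sci-Fi
--   Interstellar  Drama
```

Note that plain `select explode(category) from movie;` can return only the exploded column; to keep other columns (such as name) alongside it, lateral view is required.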
Window functions
When you need aggregated results but still want to see the pre-aggregation detail rows, use window functions.
With an ORDER BY in the window, the default frame runs from the start of the partition to the current row, which suits running aggregates such as sum and count.
To extend the frame to the whole partition (useful with max, min, and similar aggregates):
rows between unbounded preceding and unbounded following
Reference: https://zhuanlan.zhihu.com/p/92654574
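The difference between the default frame and the full-partition frame can be sketched as follows, assuming a hypothetical orders table (name, orderdate, cost):

```sql
select name, orderdate, cost,
       -- default frame (unbounded preceding to current row): a running total
       sum(cost) over (partition by name order by orderdate) as running_total,
       -- explicit full-partition frame: the same total on every row of the partition
       sum(cost) over (partition by name
                       rows between unbounded preceding and unbounded following) as partition_total
from orders;
```

running_total grows row by row within each name, while partition_total repeats the partition-wide sum on every row.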
Ranking functions
RANK(): ties get the same rank, and the following rank is skipped, so the highest rank still equals the row count (e.g. 1,1,3)
DENSE_RANK(): ties get the same rank with no gaps, so the highest rank can be smaller than the row count (e.g. 1,1,2)
ROW_NUMBER(): assigns consecutive numbers in order, ignoring ties (e.g. 1,2,3)
Example: rank scores within each subject
select name,subject,score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score;
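To make the three functions' semantics concrete, consider one subject with scores 90, 90, 80 (illustrative data):

```sql
-- score | rank | dense_rank | row_number
--  90   |  1   |     1      |     1
--  90   |  1   |     1      |     2
--  80   |  3   |     2      |     3
```

rank skips 2 after the tie, dense_rank does not, and row_number numbers the tied rows arbitrarily but consecutively.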
UDF
UDF (User-Defined Function): one row in, one row out
UDAF (User-Defined Aggregation Function): many rows in, one row out,
similar to count/max/min
UDTF (User-Defined Table-Generating Function): one row in, many rows out,
e.g. explode(), typically combined with split() via lateral view
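A common pairing of the two: split() is an ordinary UDF that returns an array, and explode() is the UDTF that turns that array into rows:

```sql
select explode(split('hive,spark,flink', ',')) as word;
-- word
-- hive
-- spark
-- flink
```

This split-then-explode pattern is the usual first step of a word count in HQL.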
Steps to write a custom UDF:
1. Extend org.apache.hadoop.hive.ql.exec.UDF
2. Implement an evaluate method; evaluate supports overloading
3. Register the function in the Hive CLI
   a) add the jar: add jar linux_jar_path
   b) create the function: create [temporary] function [dbname.]function_name AS class_name;
4. Remove the function in the Hive CLI
   drop [temporary] function [if exists] [dbname.]function_name;
Example:
Maven dependency:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>
Implementation class:
import org.apache.hadoop.hive.ql.exec.UDF;

// A UDF that lower-cases its input; the method must be named evaluate
public class Lower extends UDF {
    public String evaluate(final String s) {
        if (s == null) {
            return null;  // pass NULL through unchanged
        }
        return s.toLowerCase();
    }
}
Build a jar and upload it to the server, e.g. /opt/module/jars/udf.jar
Add the jar to Hive's classpath:
hive (default)> add jar /opt/module/jars/udf.jar;
Create a temporary function bound to the compiled class (assuming Lower lives in package com._51doit.func):
hive (default)> create temporary function mylower as "com._51doit.func.Lower";
The custom function can now be used in HQL:
hive (default)> select ename, mylower(ename) lowername from emp;
Drop it with: drop temporary function mylower;