Big Data: Hive Functions

Article link: https://blog.csdn.net/JeitZz/article/details/116614373

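Functions

Built-in functions (the commonly used ones)

hive> show functions;
Date functions:
-- Convert a Unix timestamp to a date string
select from_unixtime(1505456567);
select from_unixtime(1505456567,'yyyyMMdd');
select from_unixtime(1505456567,'yyyy-MM-dd HH:mm:ss');

-- Get the current Unix timestamp
select unix_timestamp();

-- Convert a date string to a Unix timestamp
select unix_timestamp('2017-09-15 14:23:00');

-- Difference in days: datediff(enddate, startdate); negative here because the first date is the earlier one
select datediff('2018-06-18','2018-11-21');

-- Day of the month for the current date
select dayofmonth(current_date);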
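
The date functions above can be approximated outside Hive, which is handy for checking expected values. Below is a minimal sketch in plain Java, assuming a fixed time zone is passed in explicitly (Hive's from_unixtime formats in the time zone it runs in, so its output can differ from the UTC result shown by this sketch). The class and method names are hypothetical and are not part of Hive.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class; the names are illustrative, not Hive APIs.
final class HiveDateSketch {
    // Rough analogue of from_unixtime(seconds, pattern), pinned to the given zone.
    static String fromUnixtime(long epochSeconds, String pattern, ZoneId zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(zone)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    // Rough analogue of datediff(endDate, startDate): end minus start, in days.
    static long dateDiff(String endDate, String startDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startDate), LocalDate.parse(endDate));
    }

    public static void main(String[] args) {
        // Same inputs as the Hive queries above, formatted in UTC.
        System.out.println(fromUnixtime(1505456567L, "yyyy-MM-dd HH:mm:ss", ZoneOffset.UTC));
        System.out.println(dateDiff("2018-06-18", "2018-11-21"));
    }
}
```

The argument order of dateDiff mirrors datediff: the end date comes first, so passing the earlier date first yields a negative number.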


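String functions:
lower -- convert to lowercase
select lower('ABC');

concat -- concatenate strings
select concat("A", 'B');

concat_ws -- concatenate with a separator
select concat_ws('-','a' ,'b','c');

substr -- substring from a 1-based start position (returns 'cde' here)
select substr('abcde',3);

upper -- convert to uppercase
select upper('abc');

Type conversion functions:
cast(value as type) -- type conversion
select cast('123' as int)+1;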

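Math functions:
round -- round to the nearest integer (42.3 => 42)
select round(42.3);

ceil -- round up (42.3 => 43)
select ceil(42.3);

floor -- round down (42.3 => 42)
select floor(42.3);

Null-handling function:
nvl(expr1, expr2)
-- Purpose: replace a NULL result with a specified value.
-- Returns expr2 if expr1 is NULL, otherwise returns expr1.
-- count is a column name here; a real query also needs a FROM clause
select nvl(count,2);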

Window functions

select * ,count(*) over() from t_order;

select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from t_order;

select name, orderdate, cost, sum(cost) over (partition by month(orderdate) order by orderdate) from t_order;

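Window clause

preceding: rows before the current row
following: rows after the current row
current row: the current row
unbounded: the partition boundary. unbounded preceding starts at the first row of the partition; unbounded following ends at the last row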
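
To make the frame keywords concrete, here is a minimal sketch in plain Java of what sum(cost) would compute over a frame such as "rows between 1 preceding and 1 following", and over "rows between unbounded preceding and current row". It assumes the rows of one partition are already in ORDER BY order. The class and method names are hypothetical and only simulate the semantics; this is not how Hive implements windowing.

```java
// Hypothetical simulation of ROWS-based window frames over one ordered partition.
final class WindowFrameSketch {
    // For each row i, sum the rows from (i - preceding) to (i + following),
    // clipped to the partition boundaries.
    static long[] slidingSum(long[] costs, int preceding, int following) {
        long[] out = new long[costs.length];
        for (int i = 0; i < costs.length; i++) {
            int from = Math.max(0, i - preceding);
            int to = Math.min(costs.length - 1, i + following);
            long sum = 0;
            for (int j = from; j <= to; j++) {
                sum += costs[j];
            }
            out[i] = sum;
        }
        return out;
    }

    // "rows between unbounded preceding and current row": a running total.
    static long[] runningSum(long[] costs) {
        return slidingSum(costs, costs.length, 0);
    }

    public static void main(String[] args) {
        long[] costs = {10, 20, 30, 40};
        System.out.println(java.util.Arrays.toString(slidingSum(costs, 1, 1)));
        System.out.println(java.util.Arrays.toString(runningSum(costs)));
    }
}
```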

Sequence functions

Partition the rows by name and split each partition into 3 buckets

select name,orderdate,cost, ntile(3) over(partition by name) from t_order;
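
How ntile distributes rows when the partition size is not a multiple of the bucket count can be sketched as follows. This is a plain-Java simulation that assumes the usual SQL convention, in which the first buckets each receive one extra row. The class name is hypothetical. Note also that the query above has no order by inside over(), so which row lands in which bucket is not guaranteed.

```java
// Hypothetical simulation of ntile(buckets) over one partition of rowCount rows.
final class NtileSketch {
    // Returns the bucket number (starting at 1) assigned to each row, in row order.
    static int[] ntile(int rowCount, int buckets) {
        int[] out = new int[rowCount];
        int base = rowCount / buckets;   // minimum rows per bucket
        int extra = rowCount % buckets;  // the first `extra` buckets get one more row
        int row = 0;
        for (int b = 1; b <= buckets && row < rowCount; b++) {
            int size = base + (b <= extra ? 1 : 0);
            for (int k = 0; k < size; k++) {
                out[row++] = b;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Seven rows split into three buckets.
        System.out.println(java.util.Arrays.toString(ntile(7, 3)));
    }
}
```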

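Ranking functions

They must be used together with a window (over) clause
row_number(): no ties; rows with equal values are numbered in sequence    1 2 3 4 5

rank(): ties share a rank and the following ranks are skipped            1 2 2 2 5

dense_rank(): ties share a rank and no ranks are skipped                 1 2 2 2 3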
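
The difference between the three ranking functions can be sketched in plain Java. The sketch assumes the values of one partition are already sorted in the window's order by order. The class and method names are hypothetical and only simulate the numbering rules.

```java
// Hypothetical simulation of row_number(), rank() and dense_rank()
// over one partition whose values are already sorted.
final class RankSketch {
    // row_number(): consecutive numbers, ties are not treated specially.
    static int[] rowNumber(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = i + 1;
        }
        return out;
    }

    // rank(): ties share a rank; the next distinct value takes its row position.
    static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            boolean tie = i > 0 && sorted[i] == sorted[i - 1];
            out[i] = tie ? out[i - 1] : i + 1;
        }
        return out;
    }

    // dense_rank(): ties share a rank; the next distinct value gets previous rank + 1.
    static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                out[i] = sorted[i] == sorted[i - 1] ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] costs = {100, 90, 90, 90, 80}; // sorted descending, three-way tie
        System.out.println(java.util.Arrays.toString(rowNumber(costs)));
        System.out.println(java.util.Arrays.toString(rank(costs)));
        System.out.println(java.util.Arrays.toString(denseRank(costs)));
    }
}
```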

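User-defined functions

They implement what Hive's built-in functions cannot.

UDF: user-defined function. One row in, one row out.
UDTF: user-defined table-generating function. One row in, multiple rows out, e.g. explode used with lateral view.
UDAF: user-defined aggregate function. Multiple rows in, one row out, e.g. count, sum, max.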
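
As an analogy only, the three input/output shapes can be illustrated with ordinary Java methods. This sketch does not use any Hive interface. The class and method names are hypothetical and exist purely to show one-to-one, one-to-many and many-to-one behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the three function shapes; not Hive APIs.
final class FunctionShapesSketch {
    // UDF shape: one value in, one value out (compare upper()).
    static String upper(String s) {
        return s == null ? null : s.toUpperCase();
    }

    // UDTF shape: one value in, several rows out (compare explode(split(s, ','))).
    static List<String> explode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    // UDAF shape: several rows in, one value out (compare sum()).
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(upper("abc"));
        System.out.println(explode("a,b,c"));
        System.out.println(sum(Arrays.asList(1L, 2L, 3L)));
    }
}
```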

Implementing a user-defined function

Using a UDF as the example

1. Add the following Maven dependency to pom.xml

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-exec</artifactId>
	<version>2.1.1</version>
</dependency>

2. Make the function class extend org.apache.hadoop.hive.ql.exec.UDF

3. Implement evaluate()

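package com.fun.hive; // assumed from the class name used in the create function statement

import org.apache.commons.lang.StringUtils; // assumed: Apache Commons Lang; adjust to the StringUtils on your classpath
import org.apache.hadoop.hive.ql.exec.UDF;

public class FirstUDF extends UDF {
    // Hive calls evaluate() once per input row.
    public String evaluate(String str) {
        // Whether the default output is null or "" depends on the requirement; null is used here.
        String result = null;
        // 1. Check the input argument
        if (!StringUtils.isEmpty(str)) {
            result = str.toUpperCase();
        }
        return result;
    }

    // Debug the custom function locally
    public static void main(String[] args) {
        System.out.println(new FirstUDF().evaluate("qianfengedu"));
    }
}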

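4. Package the jar and upload it to Linux

1. Load via commands
-- In the hive client, run the following commands
hive> add jar /opt/jar/udf.jar;

-- Create a temporary function (the class name must match the compiled UDF class)
hive> create temporary function toUP as 'com.fun.hive.FirstUDF';

-- Check that the function was created
hive> show functions;

-- Test it
hive> select toUp('abcdef');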

2. Load via a startup parameter
 vi ./hive-init
 
 Add the following script to the file
 add jar /opt/jar/udf.jar;
 create temporary function toUP as 'com.fun.hive.FirstUDF';
 
 Start hive with the init file
 hive -i ./hive-init
 
 -- Test it
hive> select toUp('abcdef');

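3. Load via a configuration file
 In the bin directory of the hive installation, create a configuration file named .hiverc
 vi ./bin/.hiverc

 Add the following script to the file
 add jar /opt/jar/udf.jar;
 create temporary function toUP as 'com.fun.hive.FirstUDF';
 
 Start hive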