Hive函数及性能优化
Hive函数分类
- 从输入输出角度分类
标准函数:一行数据中的一列或多列为输入,结果为单一值
聚合函数:多行的零列到多列为输入,结果为单一值
表生成函数:零个或多个输入,结果为多列或多行 - 从实现方式分类
1、内置函数
2、自定义函数: -
- 2.1、UDF:自定义标准函数
-
- 2.2、UDAF:自定义聚合函数
-
- 2.3、UDTF:自定义表生成函数
内置函数
Hive提供大量内置函数供开发者使用
标准函数
字符函数
案例:
select locate("Plaza",customer_street) rst from customers where locate("Plaza",customer_street) <>0
select split(customer_street," ") rst from customers
select customer_street,split(customer_street," ") rst,str_to_map(customer_street,":"," ") from customers
#将customers表中所有顾客姓名转换成大写
select upper(concat(customer_fname,"·",customer_lname)) from customers
类型转换函数
案例:
select customer_street,binary(customer_street,"latin")rst from customers
select customer_street,binary(customer_street,"latin"),cast(split(customer_street," ")[0] as bigint)r from customers where cast(split(customer_street," ")[0] as bigint)<6000
数学函数
案例:
#将orders表中订单金额保留两位小数
select round(order_item_subtotal,2) from order_items
日期函数
案例:
select from_unixtime(1600740000,"yyyy-MM-dd HH:mm:ss.S")rst1, unix_timestamp()rst2, unix_timestamp("1970-01-01 08:00:00")rst3,
to_date("2020-09-22 09:43:20")rst4, datediff("2020-09-22 09:43:20","2020-09-22 23:43:20")rst5,
date_add("2020-09-22 09:43:20",-1)rst6, date_format("2020-09-22 09:43:20","yyyy/MM/dd HH:mm:ss")rst7
集合函数
条件函数
案例:
select if(1=2,1,2)rst1, nvl(null,"abc")rst2, isnull(null)rst3, isnotnull("null")rst4
select if(1=1,1,2)rst1, nvl("def","abc")rst2, isnull("a")rst3, isnotnull(null)rst4