一、自定义UDF
在hql中可以满足大部分的需求,但是可能还有部分特定的业务需要自己自定义函数来实现,下面来模拟实现:
1、表的建立:
t_p
0: jdbc:hive2://mini01:10000> desc t_p;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| id | string | |
| name | string | |
+-----------+------------+----------+--+
t_flow
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| phone | int | |
| flow | int | |
+-----------+------------+----------+--+
hql表的建立就省略了,看看数据:
0: jdbc:hive2://mini01:10000> select * from t_p;
+---------+-----------+--+
| t_p.id | t_p.name |
+---------+-----------+--+
| 14 | TIANJUN |
| 1 | ni |
| 2 | hao |
| 3 | wo |
| 4 | shi |
| 5 | zhong |
| 6 | guo |
| 7 | ren |
| 8 | ha |
| 9 | wa |
| 10 | ni |
| 12 | wo |
| 13 | ta |
+---------+-----------+--+
0: jdbc:hive2://mini01:10000> select * from t_flow;
+---------------+--------------+--+
| t_flow.phone | t_flow.flow |
+---------------+--------------+--+
| 136214 | 10 |
| 137123 | 14 |
| 137123 | 14 |
| 136123 | 14 |
| 137123 | 14 |
| 138123 | 14 |
| 137123 | 14 |
| 136127 | 14 |
+---------------+--------------+--+
2、自定义函数的实现:
可以通过重载实现函数的多样性,函数必须是public
package hive.lower;
import org.apache.hadoop.hive.ql.exec.UDF;
import java.util.HashMap;
import java.util.Map;
/**
* hive自定义函数
* Created by tianjun on 2017/3/21.
*/
public class ToLowerCase extends UDF {
public Map<String,String> province = new HashMap<>();
//hive使用,必须是public
public String evaluate(String field){
String result = field.toLowerCase();
return result;
}
public String evaluate(int number){
province.put("136","beigjing");
province.put("137","shanghai");
province.put("138","wuhan");
String pnb = String.valueOf(number);
return province.get(pnb.substring(0,3))==null?"unknow":province.get(pnb.substring(0,3));
}
public static void main(String[] args) {
System.out.println(new ToLowerCase().evaluate(137214));
}
}
二、步骤
1、打成jar包上传到服务器
2、将jar包添加到hive的classpath
hive>add JAR /root/original-mr.demo-1.0-SNAPSHOT.jar;
3、创建临时函数与开发好的java class关联
Hive>create temporary function province as 'hive.lower.ToLowerCase';
t_p的查询:
0: jdbc:hive2://mini01:10000> select id, province(name) lowerName from t_p;
+-----+------------+--+
| id | lowername |
+-----+------------+--+
| 14 | tianjun |
| 1 | ni |
| 2 | hao |
| 3 | wo |
| 4 | shi |
| 5 | zhong |
| 6 | guo |
| 7 | ren |
| 8 | ha |
| 9 | wa |
| 10 | ni |
| 12 | wo |
| 13 | ta |
+-----+------------+--+
t_flow的查询
0: jdbc:hive2://mini01:10000> select phone,province(phone) province ,flow from t_flow;
+---------+-----------+-------+--+
| phone | province | flow |
+---------+-----------+-------+--+
| 136214 | beigjing | 10 |
| 137123 | shanghai | 14 |
| 137123 | shanghai | 14 |
| 136123 | beigjing | 14 |
| 137123 | shanghai | 14 |
| 138123 | wuhan | 14 |
| 137123 | shanghai | 14 |
| 136127 | beigjing | 14 |
+---------+-----------+-------+--+