Reposted from http://blog.csdn.net/wisgood/article/details/26167367
In an earlier post, a generic UDTF was used in a small example to compute each student's total score; below, the same job is done with a UDAF.
1. Write the UDAF.
package com.wz.udf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

import java.util.HashMap;
import java.util.Map;

public class helloUDAF extends UDAF {
    public static class Evaluator implements UDAFEvaluator {
        // Running total score per student (must not be static:
        // each evaluator instance needs its own aggregation state)
        private Map<String, Integer> ret;

        public Evaluator() {
            super();
            init();
        }

        // Reset the aggregation state
        public void init() {
            ret = new HashMap<String, Integer>();
        }

        // Map phase: called once per input row
        public boolean iterate(String strStudent, int nScore) {
            if (ret.containsKey(strStudent)) {
                ret.put(strStudent, ret.get(strStudent) + nScore);
            } else {
                ret.put(strStudent, nScore);
            }
            return true;
        }

        // Combiner phase: return the partial aggregation
        public Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Reduce phase: fold a partial result into this evaluator's state.
        // Values for the same student must be summed, not overwritten.
        public boolean merge(Map<String, Integer> other) {
            if (other == null) {
                return true;
            }
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                if (ret.containsKey(e.getKey())) {
                    ret.put(e.getKey(), ret.get(e.getKey()) + e.getValue());
                } else {
                    ret.put(e.getKey(), e.getValue());
                }
            }
            return true;
        }

        // Return the final result
        public Map<String, Integer> terminate() {
            return ret;
        }
    }
}
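The evaluator's lifecycle can be exercised outside Hive by hand-driving the sequence the framework uses: `init`, then `iterate` per row on each mapper, then `terminatePartial`/`merge` across partial aggregations, then `terminate`. A minimal sketch with the same logic copied into a plain class (no Hive base classes, and made-up scores rather than the post's actual studentScore data):

```java
import java.util.HashMap;
import java.util.Map;

public class HelloUdafDemo {
    // Standalone copy of the evaluator logic so it can run without a cluster.
    static class Evaluator {
        private Map<String, Integer> ret = new HashMap<String, Integer>();

        // One input row: add the score to the student's running total
        void iterate(String student, int score) {
            ret.put(student, ret.containsKey(student) ? ret.get(student) + score : score);
        }

        Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Fold another evaluator's partial totals into this one, summing per key
        void merge(Map<String, Integer> other) {
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                ret.put(e.getKey(), ret.containsKey(e.getKey())
                        ? ret.get(e.getKey()) + e.getValue()
                        : e.getValue());
            }
        }

        Map<String, Integer> terminate() {
            return ret;
        }
    }

    public static void main(String[] args) {
        // Two "mappers" each see part of the rows (scores are invented here)
        Evaluator map1 = new Evaluator();
        map1.iterate("A", 90);
        map1.iterate("B", 80);

        Evaluator map2 = new Evaluator();
        map2.iterate("A", 100);
        map2.iterate("B", 95);

        // The "reducer" merges the partial aggregations into final totals
        Evaluator reducer = new Evaluator();
        reducer.merge(map1.terminatePartial());
        reducer.merge(map2.terminatePartial());

        System.out.println(reducer.terminate());
    }
}
```

Running this through both code paths is a quick way to catch the classic merge bug, where a reducer overwrites one mapper's total with another's instead of adding them.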
2. Compile and package into a jar.
javac -classpath /home/wangzhun/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/wangzhun/hive/hive-0.8.1/lib/hive-exec-0.8.1.jar helloUDAF.java
jar cvf helloUDAF.jar com/wz/udf/helloUDAF*.class
3. In the Hive CLI, add the jar, create a temporary function, and run the query to get the result.
hive> add jar /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar;
Added /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar to class path
Added resource: /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar
hive> create temporary function helloudaf as 'com.wz.udf.helloUDAF';
OK
Time taken: 0.02 seconds
hive> select helloudaf(studentScore.name,studentScore.score) from studentScore;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201311282251_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201311282251_0009
Kill Command = /home/wangzhun/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201311282251_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-11-29 00:34:01,290 Stage-1 map = 0%, reduce = 0%
2013-11-29 00:34:04,316 Stage-1 map = 100%, reduce = 0%
2013-11-29 00:34:13,403 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201311282251_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 40 HDFS Write: 12 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{"A":290,"B":325}
Time taken: 32.275 seconds