Reposted from http://blog.csdn.net/wisgood/article/details/26167367
In an earlier post, a generic UDTF was used in a small example to compute each student's total score; below, the same job is done with a UDAF.
1. Write the UDAF.
package com.wz.udf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

import java.util.HashMap;
import java.util.Map;

public class helloUDAF extends UDAF {
    public static class Evaluator implements UDAFEvaluator {
        // Running total score per student (must not be static:
        // each evaluator instance needs its own aggregation state)
        private Map<String, Integer> ret;

        public Evaluator() {
            super();
            init();
        }

        // Reset the aggregation state
        public void init() {
            ret = new HashMap<String, Integer>();
        }

        // Map phase: called once per input row
        public boolean iterate(String strStudent, int nScore) {
            if (ret.containsKey(strStudent)) {
                ret.put(strStudent, ret.get(strStudent) + nScore);
            } else {
                ret.put(strStudent, nScore);
            }
            return true;
        }

        // Combiner phase: return the partial aggregation
        public Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Reduce phase: fold a partial result into this evaluator's state.
        // Values for the same student must be summed, not overwritten.
        public boolean merge(Map<String, Integer> other) {
            if (other == null) {
                return true;
            }
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                if (ret.containsKey(e.getKey())) {
                    ret.put(e.getKey(), ret.get(e.getKey()) + e.getValue());
                } else {
                    ret.put(e.getKey(), e.getValue());
                }
            }
            return true;
        }

        // Return the final result
        public Map<String, Integer> terminate() {
            return ret;
        }
    }
}
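The evaluator's lifecycle can be exercised outside Hive by hand-driving the sequence the framework uses: `init`, then `iterate` per row on each mapper, then `terminatePartial`/`merge` across partial aggregations, then `terminate`. A minimal sketch with the same logic copied into a plain class (no Hive base classes, and made-up scores rather than the post's actual studentScore data):

```java
import java.util.HashMap;
import java.util.Map;

public class HelloUdafDemo {
    // Standalone copy of the evaluator logic so it can run without a cluster.
    static class Evaluator {
        private Map<String, Integer> ret = new HashMap<String, Integer>();

        // One input row: add the score to the student's running total
        void iterate(String student, int score) {
            ret.put(student, ret.containsKey(student) ? ret.get(student) + score : score);
        }

        Map<String, Integer> terminatePartial() {
            return ret;
        }

        // Fold another evaluator's partial totals into this one, summing per key
        void merge(Map<String, Integer> other) {
            for (Map.Entry<String, Integer> e : other.entrySet()) {
                ret.put(e.getKey(), ret.containsKey(e.getKey())
                        ? ret.get(e.getKey()) + e.getValue()
                        : e.getValue());
            }
        }

        Map<String, Integer> terminate() {
            return ret;
        }
    }

    public static void main(String[] args) {
        // Two "mappers" each see part of the rows (scores are invented here)
        Evaluator map1 = new Evaluator();
        map1.iterate("A", 90);
        map1.iterate("B", 80);

        Evaluator map2 = new Evaluator();
        map2.iterate("A", 100);
        map2.iterate("B", 95);

        // The "reducer" merges the partial aggregations into final totals
        Evaluator reducer = new Evaluator();
        reducer.merge(map1.terminatePartial());
        reducer.merge(map2.terminatePartial());

        System.out.println(reducer.terminate());
    }
}
```

Running this through both code paths is a quick way to catch the classic merge bug, where a reducer overwrites one mapper's total with another's instead of adding them.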
2. Compile and package into a jar.
javac -classpath /home/wangzhun/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/wangzhun/hive/hive-0.8.1/lib/hive-exec-0.8.1.jar helloUDAF.java
jar cvf helloUDAF.jar com/wz/udf/helloUDAF*.class
3. In the Hive CLI, add the jar, create a temporary function, and run the query to get the result.
hive> add jar /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar;
Added /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar to class path
Added resource: /home/wangzhun/hive/hive-0.8.1/lib/helloUDAF.jar
hive> create temporary function helloudaf as 'com.wz.udf.helloUDAF';
OK
Time taken: 0.02 seconds
hive> select helloudaf(studentScore.name,studentScore.score) from studentScore;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201311282251_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201311282251_0009
Kill Command = /home/wangzhun/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201311282251_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-11-29 00:34:01,290 Stage-1 map = 0%, reduce = 0%
2013-11-29 00:34:04,316 Stage-1 map = 100%, reduce = 0%
2013-11-29 00:34:13,403 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201311282251_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 40 HDFS Write: 12 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{"A":290,"B":325}
Time taken: 32.275 seconds