Examples of using Hive's built-in row_number() function

Source: http://zhaoyan0922-photo.blog.163.com/blog/static/262384412014111542856545/

  • row_number is a built-in function as of Hive 0.11.0

    org.apache.hadoop.hive.ql.exec.FunctionRegistry
    
    
    registerHiveUDAFsAsWindowFunctions();
    registerWindowFunction("row_number", new GenericUDAFRowNumber());  // implementation class for row_number
    registerWindowFunction("rank", new GenericUDAFRank());
    registerWindowFunction("dense_rank", new GenericUDAFDenseRank());
    registerWindowFunction("percent_rank", new GenericUDAFPercentRank());
    registerWindowFunction("cume_dist", new GenericUDAFCumeDist());
    registerWindowFunction("ntile", new GenericUDAFNTile());
    registerWindowFunction("first_value", new GenericUDAFFirstValue());
    registerWindowFunction("last_value", new GenericUDAFLastValue());
    registerWindowFunction(LEAD_FUNC_NAME, new GenericUDAFLead(), false);
    registerWindowFunction(LAG_FUNC_NAME, new GenericUDAFLag(), false);
    
  • Usage demo

    • Deduplicate table test_s on column c1, keeping the row with the largest c2 value for each c1, and load the result into table test_c.
  • DEMO
    hive> 
    desc test_s;
    OK
    col_name data_type      comment
    c1                      int                     None                
    c2                      string                  None                
    Time taken: 1.87 seconds, Fetched: 2 row(s)

    hive> 
        > select * from test_s;
    OK
    c1      c2
    1       str1
    2       str2
    3       str3
    3       str31
    3       str33
    4       str41
    4       str42
    Time taken: 0.671 seconds, Fetched: 7 row(s)

    hive> select c1,c2,row_number() over (distribute by c1
        > sort by c2 desc) rownum
        >   from test_s ;

    OK
    c1      c2      rownum
    1       str1    1
    2       str2    1
    3       str33   1
    3       str31   2
    3       str3    3
    4       str42   1
    4       str41   2
    Time taken: 17.23 seconds, Fetched: 7 row(s)

    hive> 
        > 
        > desc test_c;
    OK
    col_name data_type      comment
    c1                      int                     None                
    c2                      string                  None                
    Time taken: 0.118 seconds, Fetched: 2 row(s)


    hive> 
        > insert into table test_c
        > select c1,c2
        > from (
        > select c1,c2,row_number() over (distribute by c1
        > sort by c2 desc) rownum
        >   from test_s 
        > ) aa
        > where aa.rownum = 1;
    OK
    Time taken: 19.609 seconds

    hive> 
        > 
        > 
        > 
        > select * from test_c;
    OK
    c1      c2
    1       str1
    2       str2
    3       str33
    4       str42
    Time taken: 0.144 seconds, Fetched: 4 row(s)
Everything above is reposted from http://blog.csdn.net/yangjun2/article/details/9339641
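
The `distribute by ... sort by` window clause in the demo can also be written with the `partition by ... order by` keywords, which Hive's windowing syntax accepts as well. Here is a minimal, untested sketch of the same deduplicating insert. It assumes the `test_s` and `test_c` tables from the demo and a Hive version with windowing support (0.11 or later):

```sql
-- For each c1, keep only the row with the largest c2 (string comparison),
-- then append the result to test_c.
insert into table test_c
select c1, c2
from (
  select c1,
         c2,
         -- number the rows within each c1 group, largest c2 first
         row_number() over (partition by c1 order by c2 desc) as rownum
  from test_s
) aa
where aa.rownum = 1;
```

With the sample data from the demo, this should select the same four rows (str1, str2, str33, str42), since both spellings describe the same partitioning and ordering.
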
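My own experience follows.
I am currently on Hive 0.12, but a query written in the style above still fails with an error and does not run. For example: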
hive> select sale_ord_id,ivc_tm from 
    > (select sale_ord_id,ivc_tm,row_number() over (distribute by sale_ord_id sort by ivc_tm desc) rn 
    > from gdm_mXX_inv_actual_det_sum_da 
    > where dt='2014-12-09'
    > and valid_flag=1) a
    > where a.rn=1
    > limit 50
    > ;
FAILED: NullPointerException null
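A colleague later told me that this version still does not support that form, and that the column has to be written inside the parentheses of row_number(). So I changed the query to the following: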
select sale_ord_id,ivc_title,row_number(ivc_tm) as rn 
from 
(select sale_ord_id,ivc_tm,ivc_title 
from gdm_mXX_inv_actual_det_sum_da 
where dt='2014-12-09'
and valid_flag=1
distribute by sale_ord_id 
sort by ivc_tm desc) a
where row_number(ivc_tm)=1
limit 50
;
This time it worked.
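
If window functions are unavailable or fail on a given installation, the "latest row per key" requirement can also be met without `row_number` at all, by joining each row against the maximum of its group. This is a different technique from the one used in this post. The sketch is untested and assumes the same table and columns as the failing query above. Unlike the `rn = 1` filter, it returns more than one row for a `sale_ord_id` whose latest `ivc_tm` occurs more than once.

```sql
-- Latest ivc_tm per sale_ord_id via a self-join on the group maximum
-- (no window function needed).
select a.sale_ord_id, a.ivc_tm
from gdm_mXX_inv_actual_det_sum_da a
join (
  -- one row per order: its most recent ivc_tm
  select sale_ord_id, max(ivc_tm) as max_ivc_tm
  from gdm_mXX_inv_actual_det_sum_da
  where dt = '2014-12-09'
    and valid_flag = 1
  group by sale_ord_id
) b
  on a.sale_ord_id = b.sale_ord_id
 and a.ivc_tm = b.max_ivc_tm
where a.dt = '2014-12-09'
  and a.valid_flag = 1
limit 50;
```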
