【ClickHouse】row_number() over (partition by)的几种实现方法

        hive中有row_number() over (partition by)函数,可以一句SQL实现想要的排序,在ClickHouse中有很多种实现方式,本篇就介绍一下几种方法。

目录

1.row_number排序

2.row_number排序后取出rank=1的结果

3.特殊场景


1.row_number排序

HIVE中写法:

select number,
       row_number() over (partition by number order by time desc) as rank
  from table a
 GROUP BY number

ClickHouse写法:

select number,
       groupArray(time) AS arr_val,
       arrayEnumerate(arr_val) as row_number
  from (select distinct orderid as number,
               toDate(operatetime) as time
          from table
         order by time desc
         ) a
 GROUP BY number

2.row_number排序后取出rank=1的结果

hive写法:

select orderid
  from (select orderid,
               row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
          from table
         where d = '${CurrentDate}'
         ) a
 where row_num = 1;

ClickHouse写法:

方法1:利用groupArray

 

select orderid, 
       groupArray(1)(datachange_lasttime) as dates
  from (select orderid, 
               datachange_lasttime
          from table
         ORDER BY orderid, datachange_lasttime desc
        ) a
 group by orderid

方法2:利用max函数实现倒序,如果正序使用min函数即可

 

select orderid,
       max(datachange_lasttime) as datachange_lasttime
  from table
 group by orderid

方法3:利用rowNumberInAllBlocks函数

select orderid, status
  from (select orderid, status, rowNumberInAllBlocks() as rank
          from (select orderid, status, datachange_lasttime
                  from table
                 order by orderid, datachange_lasttime desc
                 ) a
       ) b LIMIT 1 BY orderid

方法4:利用arrayEnumerate函数

select orderid
  from (select orderid,
               groupArray(datachange_lasttime) AS arr_val,
               arrayEnumerate(arr_val) as row_number
          from (select orderid, datachange_lasttime
                  from table
                 order by datachange_lasttime desc
                 ) a
         GROUP BY number
         ) b
 where row_number = 1

3.特殊场景

要求:

对于以下场景,需要按照orderid分组,按照日期倒序,取最新一条,若日期一致,则随机取一条作为结果即可

hive写法:

select orderid
  from (select orderid,
               status,
               row_number() over(partition by orderid order by datachange_lasttime desc) as row_num
          from table
         where d = '${CurrentDate}'
         ) as b
 where row_num = 1

ClickHouse写法:

 通过上面的案例,我们很容易想到,把上面的结果作为一个子表,与原表进行关联,只是这样关联,随便举一个关联的写法:

select a.orderid as orderid_a, a.status as status
  from olap_htlmaindb.tmp_ord_orders_status_s_pre a
 inner join (select orderid, groupArray(1)(datachange_lasttime) as dates
               from (select orderid, datachange_lasttime
                       from table
                      ORDER BY orderid, datachange_lasttime desc
                     ) a
              group by orderid) b
    on a.orderid = b.orderid
   and cast(a.datachange_lasttime as String) = cast(b.dates [ 1 ] as String) 

 

 

这里我们是先把符合要求的orderid和时间取出来,再回去关联,取出需要的列,因为这些函数都有一个缺点是只能有partition by的字段和排序字段,不能有其他字段,所以要返回关联,所以上面四种方法,ininer join原表,都不能解决上面案例的问题。

 

这里就想到了LIMIT 1 BY这个方法,这个方法其实是最有效的,如下:

select orderid, 
       status, 
       datachange_lasttime
  from table
 order by orderid, datachange_lasttime desc 
 LIMIT 1 BY orderid

 

评论 7
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值