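When processing incremental data in Hive, a job may be run several times over the same data. Deduplicating the rows and keeping only the first row for each key makes the job safe to rerun.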
Function:
ROW_NUMBER() OVER
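Put simply, row_number() returns a sequence number, starting at 1, for each row within its group. For example, ROW_NUMBER() OVER (ORDER BY xlh DESC) first sorts the rows by column xlh in descending order, then assigns each sorted row a sequence number.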
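To make the numbering rule concrete, here is a minimal plain-Python sketch of what ROW_NUMBER() OVER (ORDER BY xlh DESC) computes for a single group. The function name `row_number_desc` is hypothetical and only illustrates the semantics; it is not part of Hive.

```python
def row_number_desc(values):
    """Mimic ROW_NUMBER() OVER (ORDER BY x DESC) for one group of values.

    The values are sorted in descending order and numbered from 1.
    Equal values still get distinct, consecutive numbers (unlike RANK()).
    """
    ordered = sorted(values, reverse=True)
    return [(v, n) for n, v in enumerate(ordered, start=1)]


print(row_number_desc([3, 10, 7]))
# [(10, 1), (7, 2), (3, 3)]
```

The largest value always receives 1, which is why filtering on the number 1 later picks out the "first" row after sorting.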
For example:
select ownerid,createdatetime,row_number() over (partition by ownerid order by createdatetime desc) num
from stage.src_t_order where ownerid=56000 limit 10
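This groups the rows by ownerid, sorts each group by createdatetime in descending order, and assigns each sorted row a sequence number num, so the most recent row in each group gets num = 1.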
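The partitioned numbering can be tried locally without a Hive cluster. The sketch below assumes SQLite 3.25 or newer (the first version with window functions) as a stand-in for Hive; the table mirrors the article's column names, but the rows are made-up sample data and the table lives in an in-memory database rather than `stage`.

```python
import sqlite3

# In-memory SQLite stands in for Hive; the rows below are hypothetical samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_t_order (ownerid INTEGER, createdatetime TEXT)")
conn.executemany(
    "INSERT INTO src_t_order VALUES (?, ?)",
    [
        (56000, "2020-01-01 10:00:00"),
        (56000, "2020-01-03 09:30:00"),
        (56000, "2020-01-02 08:15:00"),
        (56001, "2020-01-05 12:00:00"),
    ],
)

# Same window expression as the Hive query: numbering restarts for each ownerid.
rows = conn.execute(
    """
    SELECT ownerid, createdatetime,
           ROW_NUMBER() OVER (PARTITION BY ownerid
                              ORDER BY createdatetime DESC) AS num
    FROM src_t_order
    ORDER BY ownerid, num
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Each ownerid gets its own sequence, and within 56000 the latest timestamp is numbered 1. The timestamps are stored as `YYYY-MM-DD HH:MM:SS` strings, so text ordering matches chronological ordering.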
Then filter on num to keep only the first row of each group:
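select * from
(select ownerid,createdatetime,row_number() over (partition by ownerid order by createdatetime desc) num
from stage.src_t_order where ownerid=56000) t where num=1 limit 10
-- limit belongs in the outer query: a limit inside the subquery may discard the num=1 row before the filter runs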
The result is as follows:
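Separately from the Hive output, the rerun scenario from the opening paragraph can be sketched locally. This again assumes SQLite 3.25+ as a stand-in for Hive; the batch contents and the helper `latest_per_owner` are hypothetical. The same batch is loaded one or more times to imitate reruns, and the num = 1 filter returns the same rows regardless of how many duplicates were loaded.

```python
import sqlite3

# Hypothetical incremental batch; column names follow the article.
BATCH = [
    (56000, "2020-01-01 10:00:00"),
    (56000, "2020-01-03 09:30:00"),
    (56001, "2020-01-05 12:00:00"),
]

# Deduplication query: number rows per ownerid, newest first, keep num = 1.
DEDUP_SQL = """
    SELECT ownerid, createdatetime
    FROM (
        SELECT ownerid, createdatetime,
               ROW_NUMBER() OVER (PARTITION BY ownerid
                                  ORDER BY createdatetime DESC) AS num
        FROM src_t_order
    ) t
    WHERE num = 1
    ORDER BY ownerid
"""


def latest_per_owner(runs):
    """Load BATCH `runs` times (simulating reruns), then keep one row per ownerid."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here
    try:
        conn.execute("CREATE TABLE src_t_order (ownerid INTEGER, createdatetime TEXT)")
        for _ in range(runs):
            conn.executemany("INSERT INTO src_t_order VALUES (?, ?)", BATCH)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


print(latest_per_owner(1))
print(latest_per_owner(3))
```

When duplicate rows tie on createdatetime, which copy receives num = 1 is arbitrary, but because the copies are identical the output does not change. If rows can differ in other columns while sharing a timestamp, adding a tie-breaking column to the ORDER BY makes the choice deterministic.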