hive 分析函数缺陷

 
------  table(1)  数据全集
use databasename;
drop table if exists databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020;
create  table  databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020 as
select a.*,b.city
from
(select *
from databasename.CiiFcst_hotel_Cii_basetable
where d='2016-08-29') a
inner join 
(select distinct masterhotel, city  from databasename.CiiFrcst_Hotel_Info  ---有重复的数据
where d='2016-10-19') b on a.hotelid=b.masterhotel
where b.city in (375
,28
,258
,38
,660
,3996
,3997
,4004
,4029
,4125
,4130
,4131
,4137
,4139
,4144);


--- select hotelid, count(*)
--- from databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020
--- group by hotelid;
------   各组均为31
------ table (2)
---- distribute by, sort by
use databasename;
drop table if exists databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020_1;
create  table  databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020_1 as
select a.*, 
rank()over(partition by hotelid order by starttime asc) rank   ----- 用row_num()会导致重复,为何? 
---,sum(notcancelcii)over(partition by hotelid) /sum(notcancelcii)over() r1, round(100000*notcancelcii/sum(notcancelcii)over()) cnt
from databasename.Tmp_CiiFcst_hotel_Cii_up_sample_1020 a;
 
 
 
Hive bug: rank()over() 与row_num()over() 与 sum()over()分析函数不能混用,会导致数据重复
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值