关于一条查询语句的讨论

由于数据仓库中某表数据达到137,669,168条记录,但是我们需要对该表中的数据做某些特殊的业务处理。因此,我们决定按天来划分出一些统计表。
我首先写出的SQL语句是:
[code]
create table views_ads_date
as select a.user_id,count(b.id) as Impressions,b.crdate as DateRange
from ads a join views b on a.id = b.ad_id
group by a.user_id,b.crdate;
[/code]
Marcus指出这样存在问题:
No, this will not group per day. This will group per millisecond.

select a.user_id,count(b.id) as Impressions,
to_char(b.crdate, 'YYYY-MM-DD') as DateRange
from ads a join views b on a.id = b.ad_id
group by a.user_id,DateRange;

Marcus再次指出这样在性能上会有问题:
No. You'll convert the date to a string which is slow to search on.
[code]
select a.user_id,count(b.id) as Impressions,b.crdate::date as DateRange
from ads a join views b on a.id = b.ad_id
group by a.user_id,DateRange;
[/code]
Marcus反问为什么不直接使用date函数。
因为我在测试的时候,发现用date_extract返回的格式是:0000-00-00 00:00:00,而我想要YYYY-MM-DD的格式,所以就那样写了。
结果,Marcus告知我:No, the format isn't that. It is PRINTED as that. The format is a binary number of the amount of microseconds since 1976-01-01 I think it is.

Big difference. You want this type because it is fast to search at.

最终:
[code]
create table views_ads_date as select a.user_id,count(b.id) as Impressions,date_trunc('day',b.crdate) as DateRange
from ads a join views b on a.id = b.ad_id
group by a.user_id,DateRange;
[/code]
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值