Time series querying in Postgres

This is a follow-up question to @Erwin's answer to "Efficient time series querying in Postgres".

To keep things simple, I'll use the same table structure as that question:

id | widget_id | for_date | score |
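For anyone who wants to reproduce the example, a minimal table definition could look like the sketch below. Only the column names come from the question; the column types and constraints are assumptions chosen to fit the sample data.

```sql
-- Sketch only: column types and constraints are assumed,
-- the question lists nothing but the column names.
CREATE TABLE score (
    id        integer PRIMARY KEY,
    widget_id integer NOT NULL,
    for_date  date    NOT NULL,
    score     integer
);
```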

The original question was how to get the score for each widget for every date in a range. If there was no entry for a widget on a date, the score from the previous entry for that widget should be shown. The solution using a cross join and a window function worked well as long as all the data was contained in the range being queried. My problem is that I want the previous score even if it lies outside the date range we are looking at.

Example data:

    INSERT INTO score (id, widget_id, for_date, score) VALUES
      (1, 1337, '2012-04-07', 52),
      (2, 2222, '2012-05-05', 99),
      (3, 1337, '2012-05-07', 112),
      (4, 2222, '2012-05-07', 101);

When I query for the range May 5th to May 10th 2012 (i.e., generate_series('2012-05-05'::date, '2012-05-10'::date, '1d')), I would like to get the following:

    DAY           WIDGET_ID  SCORE
    May, 05 2012  1337       52
    May, 05 2012  2222       99
    May, 06 2012  1337       52
    May, 06 2012  2222       99
    May, 07 2012  1337       112
    May, 07 2012  2222       101
    May, 08 2012  1337       112
    May, 08 2012  2222       101
    May, 09 2012  1337       112
    May, 09 2012  2222       101
    May, 10 2012  1337       112
    May, 10 2012  2222       101

The best solution so far (also by @Erwin) is:

    SELECT a.day, a.widget_id, s.score
    FROM  (
       SELECT d.day, w.widget_id
             ,max(s.for_date) OVER (PARTITION BY w.widget_id ORDER BY d.day) AS effective_date
       FROM  (SELECT generate_series('2012-05-05'::date, '2012-05-10'::date, '1d')::date AS day) d
       CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
       LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
       ) a
    LEFT   JOIN score s ON s.for_date = a.effective_date AND s.widget_id = a.widget_id
    ORDER  BY a.day, a.widget_id;

But as you can see in this SQL Fiddle, it produces NULL scores for widget 1337 on the first two days. I would like to see the earlier score of 52 from row 1 in their place.

Is it possible to do this in an efficient way?

Solution

As @Roman mentioned, DISTINCT ON can solve this. Details in this related answer:

Subqueries are generally a bit faster than CTEs, though:

    SELECT DISTINCT ON (d.day, w.widget_id)
           d.day, w.widget_id, s.score
    FROM   generate_series('2012-05-05'::date, '2012-05-10'::date, '1d') d(day)
    CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
    LEFT   JOIN score s ON s.widget_id = w.widget_id AND s.for_date <= d.day
    ORDER  BY d.day, w.widget_id, s.for_date DESC;

You can use a set-returning function like a table in the FROM list.
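As a small illustration of that point, the sketch below uses generate_series() on its own as a table source, with a table alias and a column alias. One detail to be aware of: called with date bounds and an interval step, generate_series() yields timestamps rather than dates, so the explicit cast to date shown here is an optional addition for cases where plain dates are wanted in the output.

```sql
-- generate_series() used like a table: d is the table alias, day the column alias.
-- The ::date cast is optional; without it the column holds timestamps.
SELECT d.day::date AS day
FROM   generate_series('2012-05-05'::date, '2012-05-10'::date, '1d') AS d(day)
ORDER  BY 1;
```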

One multicolumn index should be the key to performance:

    CREATE INDEX score_multi_idx ON score (widget_id, for_date, score);

The third column, score, is included only to make this a covering index, which allows index-only scans in Postgres 9.2 or later. You would not include it in earlier versions.
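On Postgres 11 or later there is another way to get a covering index: the INCLUDE clause stores a column in the index as payload without making it part of the key. This is not part of the original answer, and the index name below is made up for illustration.

```sql
-- Requires Postgres 11 or later.
-- score is stored in the index but is not a key column.
-- The index name is hypothetical.
CREATE INDEX score_widget_date_incl_idx
    ON score (widget_id, for_date) INCLUDE (score);
```

Whether this performs noticeably differently from the plain three-column index depends on your data, so compare both with EXPLAIN ANALYZE before settling on one.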

Of course, if you have many widgets and a wide range of days, the CROSS JOIN produces a lot of rows, which comes at a cost. Select only the widgets and days you actually need.
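One possible way to limit the widgets is to replace the SELECT DISTINCT subquery with an explicit list. The sketch below assumes you already know the IDs you care about; the two values are simply the widget IDs from the sample data.

```sql
-- Same query as above, but cross joining against an explicit widget list
-- instead of every distinct widget_id in the table.
SELECT DISTINCT ON (d.day, w.widget_id)
       d.day, w.widget_id, s.score
FROM   generate_series('2012-05-05'::date, '2012-05-10'::date, '1d') d(day)
CROSS  JOIN (VALUES (1337), (2222)) AS w(widget_id)  -- assumed list of wanted widgets
LEFT   JOIN score s ON s.widget_id = w.widget_id AND s.for_date <= d.day
ORDER  BY d.day, w.widget_id, s.for_date DESC;
```

This also avoids scanning the whole table just to collect the distinct widget IDs.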
