How to calculate a moving average in MySQL in a correlated subquery?

I want to create a timeline report that shows, for each date in the timeline, a moving average of the latest N data points in a data set of measures and the dates on which they were measured. I have a calendar table populated with every day to provide the dates. I can fairly simply calculate a timeline showing the overall average up to each date with a correlated subquery (the real situation is much more complex than this, but it essentially simplifies to this):

    SELECT c.date
         , ( SELECT AVG(m.value)
             FROM measures AS m
             WHERE m.measured_on_dt <= c.date
           ) AS `average_to_date`
    FROM calendar c
    WHERE c.date BETWEEN date1 AND date2 -- graph boundaries
    ORDER BY c.date ASC
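To make the semantics concrete, here is a minimal sketch that runs this "average to date" query (written with AVG(), the aggregate name that both MySQL and SQLite use) against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in for MySQL here, and the table contents are hypothetical toy data. Dates are stored as ISO-8601 strings so that `<=` compares them correctly.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical toy data (not from the question): (measured_on_dt, value).
MEASURES = [
    ("2012-01-02", 10), ("2012-01-03", 20), ("2012-01-05", 30),
    ("2012-01-06", 40), ("2012-01-09", 50), ("2012-01-10", 60),
]

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measures (measured_on_dt TEXT, value REAL)")
    conn.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO measures VALUES (?, ?)", MEASURES)
    start = date(2012, 1, 1)
    # One calendar row per day, 2012-01-01 .. 2012-01-10.
    conn.executemany(
        "INSERT INTO calendar VALUES (?)",
        [((start + timedelta(days=i)).isoformat(),) for i in range(10)],
    )
    return conn

AVERAGE_TO_DATE = """
SELECT c.date,
       (SELECT AVG(m.value)
        FROM measures AS m
        WHERE m.measured_on_dt <= c.date) AS average_to_date
FROM calendar c
WHERE c.date BETWEEN ? AND ?
ORDER BY c.date ASC
"""

conn = build_db()
rows = conn.execute(AVERAGE_TO_DATE, ("2012-01-01", "2012-01-10")).fetchall()
for day, avg in rows:
    print(day, avg)  # None for dates before the first measurement
```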

I've spent days reading around this and haven't found any good solutions. Some have suggested that LIMIT might work in the subquery (LIMIT is supported in subqueries in the current version of MySQL); however, LIMIT applies to the returned result set, not to the rows going into the aggregate, so adding it makes no difference.
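The LIMIT behaviour described above is easy to confirm. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for MySQL, with hypothetical data. It shows that LIMIT on an aggregating query only trims the single output row, not AVG's inputs. Putting LIMIT inside a derived table does restrict the inputs, but only when the cutoff date is a literal rather than a correlated reference such as `c.date`.

```python
import sqlite3

# Hypothetical data: six measurements with values 10..60.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measures (measured_on_dt TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measures VALUES (?, ?)",
    [("2012-01-02", 10), ("2012-01-03", 20), ("2012-01-05", 30),
     ("2012-01-06", 40), ("2012-01-09", 50), ("2012-01-10", 60)],
)

# LIMIT applies to the aggregate's result (one row), so AVG still sees all six rows.
with_limit = conn.execute(
    "SELECT AVG(value) FROM measures "
    "WHERE measured_on_dt <= '2012-01-10' LIMIT 3"
).fetchone()[0]

# LIMIT inside a derived table does restrict AVG's inputs (fixed date, no correlation).
last_3 = conn.execute(
    "SELECT AVG(value) FROM ("
    "  SELECT value FROM measures WHERE measured_on_dt <= '2012-01-10'"
    "  ORDER BY measured_on_dt DESC LIMIT 3"
    ") AS latest"
).fetchone()[0]

print(with_limit, last_3)  # average of all rows vs. average of the latest three
```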

Nor can I write a non-aggregated SELECT with a LIMIT and then aggregate over that, because a correlated subquery is not allowed in a FROM clause (the derived table cannot refer to c.date from the outer query). So this (sadly) WON'T work:

    SELECT c.date
         , ( SELECT AVG(last_5.value)
             FROM ( SELECT m.value
                    FROM measures AS m
                    WHERE m.measured_on_dt <= c.date  -- outer reference inside a derived table
                    ORDER BY m.measured_on_dt DESC
                    LIMIT 5
                  ) AS `last_5`
           )
    FROM calendar c
    WHERE c.date BETWEEN date1 AND date2 -- graph boundaries
    ORDER BY c.date ASC
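Stated procedurally, the result that query is trying to produce looks like the following plain-Python reference. It is only a sketch for checking SQL output against hypothetical data. The function name and the data are illustrative and do not come from the question.

```python
from bisect import bisect_right
from datetime import date, timedelta

def moving_average_latest_n(calendar_dates, measures, n):
    """For each calendar date, average the latest n measures taken on or before it.

    measures: list of (measured_on_dt, value) pairs. Returns {date: average or None}.
    Ties on measured_on_dt are broken by value here; SQL would leave them unspecified.
    """
    ordered = sorted(measures)                 # ascending by measurement date
    dates = [d for d, _ in ordered]
    result = {}
    for day in calendar_dates:
        end = bisect_right(dates, day)         # count of measures on or before `day`
        window = [v for _, v in ordered[max(0, end - n):end]]
        result[day] = sum(window) / len(window) if window else None
    return result

# Hypothetical data and a smaller N to keep the example short.
measures = [(date(2012, 1, 2), 10), (date(2012, 1, 3), 20), (date(2012, 1, 5), 30),
            (date(2012, 1, 6), 40), (date(2012, 1, 9), 50), (date(2012, 1, 10), 60)]
calendar = [date(2012, 1, 1) + timedelta(days=i) for i in range(10)]
averages = moving_average_latest_n(calendar, measures, n=3)
```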

I'm thinking I need to avoid the subquery approach completely and see if I can do this with a clever join / row-numbering technique using user variables, and then aggregate that. While I'm working on that, I thought I'd ask: does anyone know a better method?

UPDATE: Okay, I've got a solution working, which I've simplified for this example. It relies on some user-variable trickery to number the measures backwards from each calendar date. It also does a cross product with the calendar table (instead of a subquery), but this has the unfortunate side effect of breaking the row-numbering trick (user variables are evaluated when rows are sent to the client, not when each row is evaluated). To work around this, I've had to nest the query one level, order the results, and then apply the row-numbering trick to that ordered set, which then works.
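The row-numbering trick behaves like a running counter over rows that arrive already sorted: the counter restarts at 1 whenever the date changes. The sketch below mimics `@num := IF(@day = ..., @num + 1, 1)` followed by `@day := ...` in plain Python. It assumes, as the nested query does, that rows are ordered by calendar date ascending and measurement date descending; the helper name and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def number_rows_like_user_vars(rows, key):
    """Mimic `@num := IF(@day = key, @num + 1, 1), @day := key` over pre-sorted rows."""
    day, num = None, 0                          # like SET @day / SET @num before the query
    numbered = []
    for row in rows:
        num = num + 1 if key(row) == day else 1  # restart at 1 when the key changes
        day = key(row)
        numbered.append((num, row))
    return numbered

# Hypothetical data: (measured_on_dt, value) pairs and a 10-day calendar.
measures = [(date(2012, 1, 2), 10), (date(2012, 1, 3), 20), (date(2012, 1, 5), 30),
            (date(2012, 1, 6), 40), (date(2012, 1, 9), 50), (date(2012, 1, 10), 60)]
calendar = [date(2012, 1, 1) + timedelta(days=i) for i in range(10)]
LIMIT = 3

# Cross product filtered on measured_on_dt <= full_date, then
# ORDER BY full_date ASC, measured_on_dt DESC (the numbering depends on this order).
full_data = sorted(
    ((day, m_dt, value) for day in calendar for m_dt, value in measures if m_dt <= day),
    key=lambda r: (r[0], -r[1].toordinal()),
)

groups = defaultdict(list)
for num, (full_date, _, value) in number_rows_like_user_vars(full_data, key=lambda r: r[0]):
    if num <= LIMIT:                            # WHERE day_row_number <= @LIMIT
        groups[full_date].append(value)
recent_n_avg = {d: sum(v) / len(v) for d, v in sorted(groups.items())}  # GROUP BY full_date
```

As in the SQL version, dates with no earlier measures (here 2012-01-01) simply do not appear in the result.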

This query only returns calendar dates for which there are measures, so if you wanted the whole timeline you'd simply select from the calendar and LEFT JOIN to this result set.

    SET @day = 0;
    SET @num = 0;
    SET @LIMIT = 5;

    SELECT full_date
         , AVG(value) AS recent_N_AVG
    FROM
        ( SELECT *
               -- restart numbering at 1 whenever full_date changes
               , @num := IF(@day = full_date, @num + 1, 1) AS day_row_number
               , @day := full_date AS dummy
          FROM
              ( SELECT c.full_date
                     , m.value
                     , m.measured_on_dt
                FROM calendar c
                JOIN measures AS m
                  ON m.measured_on_dt <= c.full_date
                WHERE c.full_date BETWEEN date1 AND date2
                ORDER BY c.full_date ASC, m.measured_on_dt DESC  -- newest measures first per date
              ) AS full_data
        ) AS numbered
    WHERE day_row_number <= @LIMIT
    GROUP BY full_date
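On MySQL 8.0 and later, window functions make the user-variable trick unnecessary. `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` numbers the measures per date without relying on the order of user-variable assignments, which the MySQL manual describes as undefined. The sketch below runs the same shape of query on SQLite (3.25 or later, for window-function support) via Python's `sqlite3` module, with hypothetical data and N = 3. It includes the LEFT JOIN back to the calendar that the question suggests for a full timeline. It has not been checked against MySQL itself, so treat it as a starting point.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measures (measured_on_dt TEXT, value REAL)")
conn.execute("CREATE TABLE calendar (full_date TEXT PRIMARY KEY)")
# Hypothetical data.
conn.executemany(
    "INSERT INTO measures VALUES (?, ?)",
    [("2012-01-02", 10), ("2012-01-03", 20), ("2012-01-05", 30),
     ("2012-01-06", 40), ("2012-01-09", 50), ("2012-01-10", 60)],
)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((date(2012, 1, 1) + timedelta(days=i)).isoformat(),) for i in range(10)],
)

RECENT_N_AVG = """
SELECT cal.full_date, recent.recent_n_avg
FROM calendar AS cal
LEFT JOIN (                       -- keep calendar dates that have no earlier measures
    SELECT full_date, AVG(value) AS recent_n_avg
    FROM (
        SELECT c.full_date,
               m.value,
               ROW_NUMBER() OVER (PARTITION BY c.full_date
                                  ORDER BY m.measured_on_dt DESC) AS day_row_number
        FROM calendar AS c
        JOIN measures AS m ON m.measured_on_dt <= c.full_date
        WHERE c.full_date BETWEEN :date1 AND :date2
    ) AS numbered
    WHERE day_row_number <= :n
    GROUP BY full_date
) AS recent ON recent.full_date = cal.full_date
WHERE cal.full_date BETWEEN :date1 AND :date2
ORDER BY cal.full_date
"""

rows = conn.execute(
    RECENT_N_AVG, {"date1": "2012-01-01", "date2": "2012-01-10", "n": 3}
).fetchall()
```

Because the numbering is defined by the window itself, the inner ORDER BY that the user-variable version needs is no longer required.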

The row numbering trick can be generalised to more complex data (my measures are in several dimensions which need aggregating up).
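When the measures have extra dimensions, the same numbering works with a composite partition key. The counter restarts whenever any part of the key changes, which is what comparing several user variables (or listing several columns in PARTITION BY) achieves. Here is a minimal Python sketch, assuming a hypothetical `sensor` dimension and toy data:

```python
from collections import defaultdict
from datetime import date

def latest_n_per_partition(rows, partition_key, order_key, n):
    """Keep the newest n rows of each partition, like
    ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) <= n."""
    ordered = sorted(rows, key=order_key, reverse=True)  # newest first
    ordered = sorted(ordered, key=partition_key)         # stable: newest-first kept within each partition
    kept = defaultdict(list)
    current, num = object(), 0                           # sentinel never equals a real key
    for row in ordered:
        key = partition_key(row)
        num = num + 1 if key == current else 1           # restart when any key part changes
        current = key
        if num <= n:
            kept[key].append(row)
    return dict(kept)

# Hypothetical measures with a `sensor` dimension: (sensor, measured_on_dt, value).
measures = [("a", date(2012, 1, 1), 1), ("a", date(2012, 1, 2), 2), ("a", date(2012, 1, 3), 3),
            ("b", date(2012, 1, 1), 10), ("b", date(2012, 1, 3), 30)]
calendar = [date(2012, 1, 2), date(2012, 1, 3)]

# Cross product with the calendar: (full_date, sensor, measured_on_dt, value).
rows = [(day, s, m_dt, v) for day in calendar for s, m_dt, v in measures if m_dt <= day]
latest = latest_n_per_partition(rows, partition_key=lambda r: (r[0], r[1]),
                                order_key=lambda r: r[2], n=2)
averages = {k: sum(r[3] for r in v) / len(v) for k, v in latest.items()}
```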

Tags: mysql, correlated-subquery, moving-average

Asked Apr 12 '12 at 10:36 by Gruff; edited Apr 12 '12 at 14:51

Comments:

So, is your solution self-resolved, or are you still stuck on something? If so, what is it? Providing some sample data would help too. – DRapp, Apr 12 '12 at 14:00

I've resolved it, but it's a hack. It must be a common problem, so I'm wondering if there's a better solution. – Gruff, Apr 12 '12 at 16:21

Not really a hack if you want a certain number of rows per group. SQL variables are perfect for that type of processing. – DRapp, Apr 12 '12 at 16:49

1 Answer

If your timeline is continuous (1 value each day), you could rewrite your first attempt like this:

    SELECT c.date,
           ( SELECT AVG(m.value)
             FROM measures AS m
             WHERE m.measured_on_dt
                   BETWEEN DATE_SUB(c.date, INTERVAL 4 DAY) AND c.date  -- inclusive: 5 days ending on c.date
           ) AS `average_to_date`
    FROM calendar c
    WHERE c.date BETWEEN date1 AND date2 -- graph boundaries
    ORDER BY c.date ASC

If your timeline has holes in it, this would average fewer than 5 values.
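The difference between the two definitions shows up as soon as the data has gaps. This plain-Python sketch compares a window of N calendar days against the latest N measurements. The helper names and the data are hypothetical.

```python
from datetime import date, timedelta

def calendar_window_avg(measures, day, n_days):
    """Average over the n_days calendar days ending on `day` (inclusive).
    Returns (average or None, number of values used)."""
    start = day - timedelta(days=n_days - 1)
    window = [v for d, v in measures if start <= d <= day]
    return (sum(window) / len(window) if window else None), len(window)

def latest_n_avg(measures, day, n):
    """Average of the latest n measures on or before `day`. Returns (average or None, count)."""
    window = [v for d, v in sorted(measures) if d <= day][-n:]
    return (sum(window) / len(window) if window else None), len(window)

# Hypothetical, irregular measurements (no values on Jan 1, 4, 7 or 8).
measures = [(date(2012, 1, 2), 10), (date(2012, 1, 3), 20), (date(2012, 1, 5), 30),
            (date(2012, 1, 6), 40), (date(2012, 1, 9), 50), (date(2012, 1, 10), 60)]

by_days = calendar_window_avg(measures, date(2012, 1, 10), 5)   # only 3 values fall in the window
by_points = latest_n_avg(measures, date(2012, 1, 10), 5)        # always up to 5 values
```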

Answered Apr 12 '12 at 11:22 by dgw

Comments:

No, unfortunately the measured data is stochastic, so this won't work. – Gruff, Apr 12 '12 at 13:44

@Gruff Oh well, I'll think about your new information... – dgw, Apr 12 '12 at 13:57
