[tbl_votes]
- id
- item_id
- vote
Of course we can fix this by getting:
the smallest observation (so)
the lower quartile (lq)
the median (me)
the upper quartile (uq)
and the largest observation (lo)
..one-by-one using multiple queries but I am wondering if it can be done with a single query.
In Oracle I can use COUNT OVER and RATIO_TO_REPORT, but this is not supported in mySQL.
For those who don't know what a boxplot is: http://en.wikipedia.org/wiki/Box_plot
Any help would be appreciated.
解决方案
Here is an example of calculation of the quartiles for e256 value ranges within e32 groups, an index on (e32, e256) in this case is a must:
SELECT
@group:=IF(e32=@group, e32, GREATEST(@index:=-1, e32)) as e32_,
MIN(e256) as so,
MAX(IF(lq_i=(@index:=@index+1), e256, NULL)) as lq,
MAX(IF(me_i=@index, e256, NULL)) as me,
MAX(IF(uq_i=@index, e256, NULL)) as uq,
MAX(e256) as lo
FROM (SELECT @index:=NULL, @group:=NULL) as init, test t
JOIN (
SELECT e32,
COUNT(*) as cnt,
(COUNT(*) div 4) as lq_i, -- lq value index within the group
(COUNT(*) div 2) as me_i, -- me value index within the group
(COUNT(*) * 3 div 4) as uq_i -- uq value index within the group
FROM test
GROUP BY e32
) as cnts
USING (e32)
GROUP BY e32;
If there is no need in groupings, the query will be slightly simplier.
P.S. test is my playground table of random values where e32 is the result of Python's int(random.expovariate(1.0) * 32), etc.