PostgreSQL SQL优化一列

开发咨询一个业务统计, 场景类似于统计一个学生成绩表总成绩的第一名,有可能有多个值。提交上来的SQL看了下性能较差,优化如下。

一、环境
PostgreSQL 9.2.4
OS CentOS 6.3

二、业务场景模拟
1.数据准备
CREATE TABLE tbl_student_record(
  stu_name varchar,
  stu_record int,
  stu_type int2,
  create_date date
);
INSERT INTO tbl_student_record VALUES('zhangsan',80, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('zhangsan',90, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('zhangsan',100, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('sili',80, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('sili',95, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('sili',95, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('wangwu',75, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('wangwu',35, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('wangwu',55, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('zhaoliu',70, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('zhaoliu',70, 1, '2014/02/26');
INSERT INTO tbl_student_record VALUES('zhaoliu',90, 1, '2014/02/26');
--需要统计出的结果是
stu_name | nn  
----------+-----
 zhangsan | 270
 sili     | 270
(2 rows)
2.优化前
因为没有另外的冗余表提供学生的总成绩,故一般的思路是先计算出第一名的总分是多少然后根据这个总分再去匹配,或者能提前知道总分然后按排序取其最大的值

优化前的脚本:
kenyon=# explain select stu_name,sum(stu_record)  nn from tbl_student_record d 
           group by stu_name having sum(stu_record)= (select t.n from(
              select b.stu_name,sum(b.stu_record)as n from tb l_studen t_record b 
                group by stu_name order by n desc)as t limit 1);
                                               QUERY PLAN                                                
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=66.38..68.88 rows=200 width=36)
   Filter: (sum(d.stu_record) = $0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=36.59..36.61 rows=1 width=8)
           ->  Subquery Scan on t  (cost=36.59..39.09 rows=200 width=8)
                 ->  Sort  (cost=36.59..37.09 rows=200 width=36)
                       Sort Key: (sum(b.stu_record))
                       ->  HashAggregate  (cost=26.95..28.95 rows=200 width=36)
                             ->  Seq Scan on tbl_student_record b  (cost=0.00..21.30 rows=1130 width=36)
   ->  Seq Scan on tbl_student_record d  (cost=0.00..21.30 rows=1130 width=36)
(10 rows)
3.优化后

--使用window function
kenyon=# explain select * from (with tmp as(
       select stu_name,sum(stu_record) as sum_score from tbl_student_record group by stu_name
       select rank() OVER ( order by tmp.sum_score desc) as cc,* from tmp) as tc where tc.cc = 1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Subquery Scan on tc  (cost=40.59..46.59 rows=1 width=48)
   Filter: (tc.cc = 1)
   ->  WindowAgg  (cost=40.59..44.09 rows=200 width=40)
         CTE tmp
           ->  HashAggregate  (cost=26.95..28.95 rows=200 width=36)
                 ->  Seq Scan on tbl_student_record  (cost=0.00..21.30 rows=1130 width=36)
         ->  Sort  (cost=11.64..12.14 rows=200 width=40)
               Sort Key: tmp.sum_score
               ->  CTE Scan on tmp  (cost=0.00..4.00 rows=200 width=40)
(9 rows)
三、分析

相比较之前的SQL其COST值能下降下来,是因为少了一次group by分组统计,如果数据量较大其SQL性能差异更明显;当然对这种事实数据整理到一张冗余表统计其实对查询效率提升最大,可以使用物化视图或者定期程序更新。

四、参考
http://my.oschina.net/Kenyon/blog/79543

转载于:https://my.oschina.net/Kenyon/blog/203724

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值