9,group by语句
Group by主要用来分组统计,这个是开发中经常被使用的语句,先来看几个语句的运行情况:
例如求各部门内的最大工钱值,根据部门来分组统计:
SQL> select deptno,max(sal) from scott.emp group by deptno;
DEPTNO MAX(SAL)
---------- ----------
30 2850
20 3000
10 5000
Execution Plan
----------------------------------------------------------
Plan hash value: 4067220884
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
524 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
增加一个order by deptno子句
SQL> select deptno,max(sal) from scott.emp group by deptno order by deptno;
DEPTNO MAX(SAL)
---------- ----------
10 5000
20 3000
30 2850
Execution Plan
----------------------------------------------------------
Plan hash value: 15469362
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 4 (25)| 00:00:01 |
| 1 | SORT GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
524 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
增加一个order by max(sal)子句
SQL> select deptno,max(sal) from scott.emp group by deptno order by max(sal);
DEPTNO MAX(SAL)
---------- ----------
30 2850
20 3000
10 5000
Execution Plan
----------------------------------------------------------
Plan hash value: 2664716850
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 3 | 21 | 5 (40)| 00:00:01 |
| 2 | HASH GROUP BY | | 3 | 21 | 5 (40)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
524 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
上述三个语句略有不同,不过都是从全表扫描emp表开始执行,然后在此基础上进行group by或者order by,具体方式有所不同,这里仅仅把执行计划和统计信息给出来,具体在性能调整一章中进行详细测试分析与叙述。
主要来看看group by的增强语句,例如:having 子句,grouping sets,roll up,cube,group_id等
Having 子句
改进上述查询语句,只需要工钱大于等于3000的记录:
SQL> select deptno,max(sal) from scott.emp group by deptno having max(sal) >=3000 order by max(sal);
DEPTNO MAX(SAL)
---------- ----------
20 3000
10 5000
Execution Plan
----------------------------------------------------------
Plan hash value: 3611938775
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 7 | 5 (40)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | HASH GROUP BY | | 1 | 7 | 5 (40)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Rollup语句
Rollup可以理解为一维多层的统计,往往在需要按层次统计的时候用到。
修改一下需求,现在需要按照部门,分工作职位来统计工钱的总和,并求全公司之和,可以简单的使用rollup来完成:
SQL> select a.deptno, b.job, sum(b.sal)
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by rollup(a.deptno, b.job);
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
10 8750
20 CLERK 1900
20 ANALYST 6000
20 MANAGER 2000
20 9900
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
30 9400
28050
13 rows selected.
可以看到使用rollup使得sum增加了聚合的级别,即实现了对多层进行分组统计计算。
现在稍稍修改一下需求,不需要对全公司进行统计,只需要计算各部门的总和和部门下不同工种的求和,那么修改后的语句可以如下:
SQL> select a.deptno, b.job, sum(b.sal)
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by a.deptno,rollup( b.job);
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
10 8750
20 CLERK 1900
20 ANALYST 6000
20 MANAGER 2000
20 9900
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
30 9400
12 rows selected.
再增加一个工头的字段,需要知道不同的工头下面工人的情况:
SQL> select a.deptno, b.job,b.mgr, sum(b.sal)
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by a.deptno,rollup( b.job,b.mgr);
DEPTNO JOB MGR SUM(B.SAL)
---------- --------- ---------- ----------
10 CLERK 7782 1300
10 CLERK 1300
10 MANAGER 7839 2450
10 MANAGER 2450
10 PRESIDENT 5000
10 PRESIDENT 5000
10 8750
20 CLERK 7788 1100
20 CLERK 7902 800
20 CLERK 1900
20 ANALYST 7566 6000
DEPTNO JOB MGR SUM(B.SAL)
---------- --------- ---------- ----------
20 ANALYST 6000
20 MANAGER 7839 2000
20 MANAGER 2000
20 9900
30 CLERK 7698 950
30 CLERK 950
30 MANAGER 7839 2850
30 MANAGER 2850
30 SALESMAN 7698 5600
30 SALESMAN 5600
30 9400
22 rows selected.
可以这样理解上面rollup语句:
group by col1,rollup(col2,col3)会进行如下计算:
(col1,col2,col3)
(col1,col2)
(col1)
Cube语句
Cube的意思是立方,主要是用来进行多维度的统计的,Cube有时也可以代替rollup,比如group by col1,rollup(col2)就可以用group by col1,cube(col2)来代替,可以用下面的语句带实现前面rollup的实现2:
SQL> select a.deptno, b.job, sum(b.sal)
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by a.deptno,cube( b.job);
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
10 8750
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
20 9900
20 CLERK 1900
20 ANALYST 6000
20 MANAGER 2000
30 9400
30 CLERK 950
30 MANAGER 2850
DEPTNO JOB SUM(B.SAL)
---------- --------- ----------
30 SALESMAN 5600
12 rows selected.
可以看到除了结果集的排序不同以外,其他均相同。这时候使用rollup和cube分别进行了如下的计算:
Rollup:
(col1,col2)
(col1)
Cube:
(col1)
(col1,col2)
注意上述表达式的顺序
但是cube的group by col1,cube(col2,col3)与rollup差别较大,事实上它的计算如下:
(col1)
(col1,col3)
(col1,col2)
(col1,col2,col3)
而group by cube(col1,col2,col3)则会计算2的3次方次,即维度为3.计算如下:
(col1),(col2),(col3),(col2,col3),(col1,col2),(col1,col3),(col1,col2,col3),()
Grouping sets语句
如果说rollup和cube是oracle预定义了的计算维度,那么grouping sets则可以理解为可以自己设置计算维度的一个表达式,用下面一个例子来看:
SQL> select a.deptno, b.job,b.mgr, sum(b.sal)
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by a.deptno,cube( b.job,b.mgr)
5 minus
6 select a.deptno, b.job,b.mgr, sum(b.sal)
7 from scott.dept a, scott.emp b
8 where a.deptno = b.deptno
9 group by a.deptno,grouping sets((b.job,b.mgr),(b.job),(b.mgr),());
no rows selected
说明了group by a.deptno,cube( b.job,b.mgr)
与group by a.deptno,grouping sets((b.job,b.mgr),(b.job),(b.mgr),())的结果是完全一样的,根据对cube的理解则grouping sets也很容易理解,不在累述。
Grouping_id和group_id函数
对于使用cube或者rollup合作而后grouping sets的语句,可以使用group_id()或者grouping_id()来获得不同的分组:
SQL> select a.deptno, b.job, sum(b.sal),group_id() g_id
2 from scott.dept a, scott.emp b
3 where a.deptno = b.deptno
4 group by a.deptno,cube( a.deptno,b.job);
DEPTNO JOB SUM(B.SAL) G_ID
---------- --------- ---------- ----------
10 CLERK 1300 0
10 MANAGER 2450 0
10 PRESIDENT 5000 0
20 CLERK 1900 0
20 ANALYST 6000 0
20 MANAGER 2000 0
30 CLERK 950 0
30 MANAGER 2850 0
30 SALESMAN 5600 0
10 CLERK 1300 1
10 MANAGER 2450 1
DEPTNO JOB SUM(B.SAL) G_ID
---------- --------- ---------- ----------
10 PRESIDENT 5000 1
20 CLERK 1900 1
20 ANALYST 6000 1
20 MANAGER 2000 1
30 CLERK 950 1
30 MANAGER 2850 1
30 SALESMAN 5600 1
10 8750 0
20 9900 0
30 9400 0
10 8750 1
DEPTNO JOB SUM(B.SAL) G_ID
---------- --------- ---------- ----------
20 9900 1
30 9400 1
24 rows selected.
SQL> select group_id() gp_id,
2 grouping_id(a.deptno, b.job,b.mgr) gpp_id,
3 a.deptno, b.job,b.mgr,sum(b.sal)
4 from scott.dept a, scott.emp b
5 where a.deptno = b.deptno
6 group by a.deptno,grouping sets( a.deptno,b.job,b.mgr);
GP_ID GPP_ID DEPTNO JOB MGR SUM(B.SAL)
---------- ---------- ---------- --------- ---------- ----------
0 2 20 7839 2000
0 2 10 7839 2450
0 2 30 7698 6550
0 2 20 7566 6000
0 2 10 7782 1300
0 2 20 7902 800
0 2 10 5000
0 2 30 7839 2850
0 2 20 7788 1100
0 1 10 CLERK 1300
0 1 10 MANAGER 2450
GP_ID GPP_ID DEPTNO JOB MGR SUM(B.SAL)
---------- ---------- ---------- --------- ---------- ----------
0 1 10 PRESIDENT 5000
0 3 10 8750
0 1 20 CLERK 1900
0 1 20 ANALYST 6000
0 1 20 MANAGER 2000
0 3 20 9900
0 1 30 CLERK 950
0 1 30 MANAGER 2850
0 1 30 SALESMAN 5600
0 3 30 9400
21 rows selected.
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/16179598/viewspace-662904/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/16179598/viewspace-662904/