GROUP BY 子句
GROUP BY子句主要用于对WHERE中得到的结果进行分组,也就是说它是在WHERE子句之后执行,对经过WHERE筛选后的结果按照某些列进行分组,之后进行相应的处理工作。
Specify the GROUP BY clause if you want the database to group the selected rows based on the value of expr(s) for each row and return a single row of summary information for each group.
当使用聚集函数的时候,除非对整个语句的查询结果集进行聚集运算,否则都要通过指定GROUP BY子句来确定是对某类结果集进行聚集运算。请看下面的例子:
NUM
------
50599
17:11:17 SQL> select count(object_name) num from t where STATUS='VALID'; --这里对返回来的所有结果进行count运算,不会出错。
NUM
-----
50578
17:12:00 SQL> select owner,status,count(object_name) num from t; --这里是想对owner和status进行分组,计算出它们的count(object_name)值,没有使用group by来显示分组,出错。
select owner,status,count(object_name) num from t
*
ERROR at line 1:
ORA-00937: not a single-group group function
17:13:36 SQL> select owner,status,count(object_name) num from t group by owner,status;
OWNER STATUS NUM
---------- ----------- -------
PUBLIC VALID 19968
CTXSYS VALID 339
..............................
NING VALID 1
PUBLIC INVALID 19
ORDSYS VALID 1669
28 rows selected.
17:23:13 SQL> select owner,status,count(object_name) num from t group by owner,status,temporary; --这里多出了表t中的列temporary
OWNER STATUS NUM
---------- ------------ ------
WMSYS VALID 1
SCOTT INVALID 2
..............................
ORDSYS VALID 1669
HR VALID 34
IX VALID 53
35 rows selected.
17:29:06 SQL> select owner,status,count(object_name) num from t group by owner; --GROUP BY子句的内容少了status列
select owner,status,count(object_name) num from t group by owner
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression
可以看到,如果想对某一类结果集进行聚集运算,就必须通过GROUP BY来指定这类结果集,我们还可以看到以下两个特点:
1、 Expressions in the GROUP BY clause can contain any columns of the tables, views, or materialized views in the FROM clause, regardless of whether the columns appear in the select list.
2、 在SELECT列表中,除非是函数(主要是聚集函数)、常量以外,其他所有的列名或者表达式(在GROUP BY中必须也要是表达式的原样而不能是它的别名),都必须要在GROUP BY子句中出现。
3、 The GROUP BY clause groups rows but does not guarantee the order of the result set. To order the groupings, use the ORDER BY clause.
使用GROUP BY子句还有下面一些的限制:
1、 你不能在子句中使用LOB/VARRAYS/NESTED TABLE。
2、 子句中的表达式can be of any form except scalar subquery expressions.
3、 如果GROUP BY中引用了对象类型列,则这个查询就不能使用并行。
HAVING 子句
Use the HAVING clause to restrict the groups of returned rows to those groups for which the specified condition is TRUE. If you omit this clause, then the database returns summary rows for all groups.这里的condition是HAVING子句指定的筛选条件。
HAVING子句依附于GROUP BY子句而出现,也就是它不能单独出现。
18:00:43 2 group by owner,status,temporary having count(object_name)>500;
OWNER S NUM
---------- --------- ----------
SYS VALID 23129
XDB VALID 682
SYSMAN VALID 1321
PUBLIC VALID 19968
MDSYS VALID 859
OLAPSYS VALID 706
ORDSYS VALID 1669
7 rows selected.
18:00:47 SQL> select owner,status s,count(object_name) num from t
18:01:58 2 group by owner,status,temporary having num>500; --可见这里不能使用别名
group by owner,status,temporary having num>500
*
ERROR at line 2:
ORA-00904: "NUM": invalid identifier
GROUP BY子句扩展
1、自动汇总ROLLUP与CUBE扩展
我们先来看看ROLLUP的官方定义:
The ROLLUP operation in the simple_grouping_clause groups the selected rows based on the values of the first n, n-1, n-2, … 0 expressions in the GROUP BY specification, and returns a single row of summary for each group. You can use the ROLLUP operation to produce subtotal values by using it with the SUM function.这里simple_grouping_clause表示GROUP BY子句。
这句话可以这样理解,如果是ROLLUP(A, B, C)的话,首先会对(A、B、C)进行GROUP BY,然后对(A、B)进行GROUP BY,然后是(A)进行GROUP BY,最后对全表进行GROUP BY操作(即GROUP BY(0))。即: ROLLUP操作字,对group by子句的各字段从右到左进行再聚合,逐个减少字段。ROLLUP常用于统计小计。
请看一个例子:
A B C
------ -------- ------
a test1 1
a test1 2
a test2 1
a test2 2
18:28:53 SQL> select a,b,c,sum(c) sc from t group by a,b,c order by 1;
A B C SC
------ -------- ------ ------
a test1 1 1
a test1 2 2
a test2 1 1
a test2 2 2
18:28:57 SQL> select a,b,c,sum(c) sc from t group by rollup(a,b,c) order by 1;
A B C SC
----- --------- ------ ------
a test1 1 1
a test1 2 2
a test1 3 -- A
a test2 1 1
a test2 2 2
a test2 3 -- A
a 6 -- B
6 -- C
8 rows selected.
注意看上面最后一个选择的结果,它多了一些显示记录。很显然,这个A部分是通过对GROUP BY(A,B)而得到的,这个B部分是通过对GROUP BY(A),而C部分是通过GROUP BY(0)而来。如果要分别得到这些值,我们可以对等使用下面的方法得到:
A B SC
------ -------- ------
a test1 3
a test2 3
18:35:30 SQL> select a,sum(c) sc from t group by a order by 1;
A SC
------ --------
a 6
18:35:40 SQL> select sum(c) sc from t order by 1; --相当于GROUP BY(0)
SC
------
6
部分ROLLUP:你也可以使用Rollup包含有限的几个小计,其语法是:Group by A,Rollup(B,C)。这种情况下,group by 条款先计算rollup中的部分,然后加上非rollup的部分,故层次为(A,B,C)(A,B)(A)。这里去除A不看就是GROUP BY(B,C),GROUP BY(B),GROUP BY(0),刚好就是ROLLUP(B,C)部分,然后再加上A,就得到GROUP BY A,ROLLUP(B,C)。