《Pro Oracle SQL》Chapter7 Advanced Grouping -- 7.1Basic GROUP BY Usage

caohong286

于 2011-11-09 23:31:52 发布

阅读量124

点赞数

分类专栏： Pro Oracle SQL Chapter 7 文章标签： Pro Oracle SQL Oracle SQL Apress DBA

Pro Oracle SQL Chapter 7 专栏收录该内容

9 篇文章 0 订阅

订阅专栏

Chapter 7 Advanced Grouping 第七章高级分组
   Jared Still    (page 213)

    The GROUP BY clause is a venerable member of the SQL statement family. After learning basic SELECT statements, it is one of first specialized parts of SQL that many practitioners cut their teeth on when learning to create aggregations from raw data and transform that data into useful information.
Before I get too deeply into the use of GROUP BY, this would be a good time provide some information
about the test environment. All SQL in this chapter was prepared using the Oracle 11.2.0.1 64 bit version on Windows 7.   The database account I used has DBA privileges. Using an account with DBA privileges simplifies the use of dbms_xplan.display_cursor, using data from other accounts, and selecting data from system views. I would recommend that if you do the same, do so in a database strictly used for testing.
    GROUP BY子句是SQL语句家族中一位多才多艺的成员。在学习了基本的SELECT语句之后，它是SQL第一个特殊的部分，许多实践者在学习从源数据创建聚集和转换数据成有用的信息等上面摔过跟头。在我深入讲解GROUP BY的使用之前，现在是提供一些测试环境的信息的好时机。在本章中所有的SQL是在Oracle 11.2.0.1 64位版，windows 7上准备的。我所使用的数据库账户有DBA权限。使用一个带有DBA权限的账户将简化dbms_xplan.display_cursor的使用，从其他账户获取数据，以及从系统视图选择数据。我强烈推荐你做的和我一样，在用于测试数据库中严格的一致。

    Much of the test data is provided by Oracle and can be setup on your database by running the
scripts found in ORACLE_HOME/demo/schema and ORACLE_HOME/sqlplus/demo directories. The familiar SCOTT schema is included in the demo directory, along with the HR , SH, and other demo accounts. Where it makes sense to create test data to more clearly explain concepts, I will do so. For more complex examples, the built-in test data may be used. At times the examples may seem trivial. These examples are constructed with the purpose of demonstrating the results of different facets of the GROUP BY clause, without any requirement to focus needlessly on the values in the output. While there are many excellent examples based on financial data throughout the Oracle documentation, these examples are sometimes difficult to follow as too much attention is focused on the output values rather than how they were obtained.
    很多测试数据是Oracle提供的，且通过运行在ORACLE_HOME/demo/schema和ORACLE_HOME/sqlplus/demo目录下的脚本能安装在你的数据库中。熟悉的SCOTT账户包含在demo目录中，还有HR,SH,和其他演示账户。如果某处需要创建测试数据更利于清晰的解释概念，我将这样做。对于更复杂的例子，可能使用到内建的(自带的)测试数据。很多时候例子们可能看上去繁琐。这些例子是基于演示GOURP BY子句不同方面的目的而构建的，而不需要不必要的关注于输出值。纵观Oracle文档有许多优秀的基于财务数据的例子，这些例子有时候很难理解 (follow)因为太多的注意力集中在输出值上了而不是它们是怎样获得的。

    The execution plans seen in the SQL examples were generated by the script showplan_last.sql,
which is using the DBMS_XPLAN.DISPLAY_CURSOR procedure to generate the output. The code for that script is seen in Listing 7-1. You may have noticed that most of the SQL statements shown include the hint /*+ gather_plan_statistics */. This tells the optimizer to gather statistics for use by DBMS_XPLAN. The results of execution plans may be edited to fit the page, so the results may appear somewhat different when you execute the showplan_last.sql script.
     SQL例子的执行计划是由脚本showplan_last.sql生成的。它底层就是用DBMS_XPLAN.DISPLAY_CURSOR过程产生的。列表 7-1展示了脚本的代码。你可能注意到展示的大部分SQL语句带有提示/*+gather_plan_statistics */。它告诉优化器用DBMS_XPLAN收集统计信息.执行计划的结果可能被编辑过以适于页面排版，这样可能有些不同于你执行 showplan_last.sql脚本所生成的结果。
Listing 7-1. showplan_last.sql
-- showplan_last.sql
set pause off
set verify off
set trimspool on
set line 200 arraysize 1
clear break
clear compute
-- serveroutput must be OFF for dbms_xplan.display_cursor to work.
-- but do not turn it off here, or the SET statemeent will be the 'last' cursor
select * from table(dbms_xplan.display_cursor( null,null,'TYPICAL LAST'));

    Now, let’s learn about GROUP BY before moving on to its advanced functions.
    现在，在开始高级功能之前我们先学习GROUP BY.

Basic GROUP BY Usage      GROUP BY的基本用法
    If you needed to know the number of employees in each department of your company, you might use SQL such as that in Listing 7-2 because it will produce one row of output for each row in the DEPT table plus a count of the employees from each department. The output includes the OPERATIONS department, which does not have any employees. This row would not have appeared in the output from a standard JOIN, so the LEFTOUTER JOIN statement was used to include rows from the DEPT table that did not have any matching rows in the EMP table.
    如果你需要知道你公司每个部门的员工数目，你可能用到像列表7-2一样的SQL，它将输出DEPT表中每行和每个部门的员工小计。输出包括 OPERATIONS部门，该部门没有员工。如果使用标准的JOIN，这行将不会出现在结果中，所以就使用LEFTOUTER JOIN语句来包含来自DEPT表中的但是不匹配EMP表任何行的记录。
Listing 7-2. Basic GROUP BY
1 select d.dname, count(empno) empcount
2 from scott.dept d
3 left outer join scott.emp e on d.deptno = e.deptno
4 group by d.dname
5 order by d.dname;

DNAME                EMPCOUNT
--------------            ----------
ACCOUNTING              3
OPERATIONS             0
RESEARCH                  5
SALES                           6

4 rows selected.

    The columns used in the GROUP BY must match the set of columns in the SELECT statement upon
which no aggregation functions are used. In Listing 7-2, for example, there are two columns in the
SELECT list, deptno and empno. The COUNT() function is used to perform aggregation on the EMPNO
column so that the total number of employees in each department can be determined. The only other
column in the SELECT list, deptno, must then be included in the GROUP BY clause.
Failure to include the correct columns will result in an error condition as seen in Listing 7-3.
    GROUP BY 中使用的列必须匹配SELECT语句中的没有使用聚合函数的列集。例如，列表7-2中在SELECT中有两列，deptno和empno。COUNT() 函数用于执行在EMPNO列上的聚合，这样每个部门的员工总数就能求得。只有当SELECT中的其他列,deptno,也必须被包含于GROUP BY子句中(语句才能运行OK)。(在GROUP BY子句中)没有包含正确的列将导致错误发生，如列表7-3的情况。
Listing 7-3. GROUP BY Columns Requirement
1 select d.dname, count(empno) empcount
2 from scott.emp e
3 join scott.dept d on d.deptno = e.deptno
4 order by d.dname;
select d.dname, count(empno) empcount
       *
ERROR at line 1:
ORA-00937: not a single-group group function

    There is a very important point you need to understand about GROUP BY: although the output of a
SELECT statement that includes a GROUP BY clause may always appear to be sorted, you cannot expect GROUP BY to always return your data in sorted order. If the output must be sorted, you must use an ORDER BY clause. This has always been the case with Oracle, and this behavior has been documented since at least Oracle 7.0.
     关于GROUP BY有非常重要的一点：尽管包含GROUP BY子句的SELECT语句的输出结果看起来好像总是排过序的，你不能期望GROUP BY总是返回你排过序的数据。如果输出必须排序，你必须使用ORDER BY子句。这在ORACLE中是一定的，这个行为自少从Oracle 7.0就文档化了。
    While the sorting behavior of GROUP BY is not specifically mentioned in the Oracle 7 documentation,
there was little room for doubt when the 9i documentation was published, which specifically states that GROUP BY does not guarantee the order of the result set.
    尽管GROUP BY的排序行为没有在Oracle7的文档中特别提及，在9i发布的文档中还是有些怀疑的地方，它特别声明GROUP BY不保证结果集的顺序。
    Listing 7-4 provides a good example of GROUP BY not returning results in sorted order. Notice that
the data is not sorted. The only way to guarantee sorted data is by including the ORDER BY clause, which must follow the GROUP BY clause.
    列表7-4提供了GROUP BY没有返回排序结果的好例子。注意数据没有排序。保证得到排序数据的唯一方法是包含ORDER BY子句，它必须跟在GROUP BY子句之后。
Listing 7-4. GROUP BY Not Sorted
1 select deptno, count(*)
2 from emp
3 group by deptno;

    DEPTNO   COUNT(*)
---------- ----------
        30          6
        20          5
        10          3

3 rows selected.
    The GROUP BY clause may just be one of the most under-appreciated workhorses of all the SELECT
clauses. It is quite easy to take it for granted, as once you understand how to include it in a SELECT
statement, it is quite easy to use. Perhaps a better appreciation for just how much work it does (and how much work it saves you from doing) can be gained by trying to write the SELECT statement in Figure 7-2 without using the GROUP BY clause. There are likely many different methods by which this can be done.
    GROUP BY子句可能是SELECT子句中最不受重视的“吃苦耐劳者”。它太平常了，一旦你理解怎么把它包含在SELECT语句中，它非常易用。可能更好的评价来至于它做了多少工作(和它为你省了多少事)，通过重写列表7-2中的SELECT语句就能体会的到。
    Think for just a moment how you might go about writing that SELECT statement. One such attempt
was made by your intrepid author, and this attempt can be seen in Listing 7-5. This is not SQL that
most people would care to maintain. As you can see, it does create nearly the same output as that found in Listing 7-2. In addition to being somewhat convoluted, you must ask yourself, “What will happen when a new department is added to the DEPT table?”
    先想一会，你将怎么重写那条SELECT语句。你勇猛无畏的作者做了尝试，在列表7-5中能看到它。这不是大多数人想要维护的SQL。你能看出，它创建了和列表7-2几乎一样的输出。这里还是有点绕(复杂的)，你必须问自己“当一个新的部门加入到DEPT表了将会发生什么？”
    The answer to that question, of course, is that you will then need to modify the SQL statement in
Listing 7-5 to accommodate the change in the data. While it would be possible use dynamic SQL to
duplicate the functionality of the SQL to cope with changes to the DEPT table data, doing so would create a
piece of SQL that is even more difficult to follow and even harder to maintain.
    问题的答案，当然是，你必须修改列表7-5中的SQL语句适应数据的变化。虽然它可能使用动态SQL仿制SQL功能处理DEPT表数据变化，这样做将创建SQL片断(来拼凑完整的SQL)，将非常难于理解且更难于维护。
Listing 7-5. Convoluted SQL
1 select /*+ gather_plan_statistics */
2 distinct dname, decode(
3     d.deptno,
4     10, (select count(*) from emp where deptno= 10),
5     20, (select count(*) from emp where deptno= 20),
6     30, (select count(*) from emp where deptno= 30),
7     (select count(*) from emp where deptno not in (10,20,30))
8 ) dept_count
9 from (select distinct deptno from emp) d
10 join dept d2 on d2.deptno = d.deptno;

DNAME          DEPT_COUNT
-------------- ----------
SALES                   6
ACCOUNTING              3
RESEARCH                5

3 rows selected.

22:19:51 MORIARTY - jkstill@jks1 SQL> @showplan_last

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
| Id | Operation                                                       | Name    | Rows   | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                 |                 |          |            |     7 (100)      |         |
|   1 |   SORT AGGREGATE                                    |        |     1    |     3    |           |          |
|* 2 |      TABLE ACCESS FULL                            | EMP        |     5    |    15    |     3   (0)       | 00:00:01 |
|   3 |          SORT AGGREGATE                            |           |     1    |     3     |           |            |
|* 4 |             TABLE ACCESS FULL                       | EMP    |     5    |    15    |     3   (0)       | 00:00:01 |
|   5 |                SORT AGGREGATE                     |                  |     1    |     3     |            |         |
|* 6 |                   TABLE ACCESS FULL         | EMP    |     5    |    15    |     3   (0)       | 00:00:01 |
|   7 |                      SORT AGGREGATE               |        |     1     |     3    |              |           |
|* 8 |                         TABLE ACCESS FULL        | EMP          |     4 |    12    |     3   (0)      | 00:00:01 |
|   9 |    HASH UNIQUE                                            |                |     9     |   144   |     7 (29)     | 00:00:01 |
| 10 |      MERGE JOIN                                           |                 |    14    |   224   |     6 (17)    | 00:00:01 |
| 11 |         TABLE ACCESS BY INDEX ROWID    | DEPT        |     4   |    52    |     2   (0)     | 00:00:01 |
| 12 |         INDEX FULL SCAN                                | PK_DEPT |     4     |      |     1   (0)      | 00:00:01 |
|* 13 |     SORT JOIN                                                |        |    14 |    42 |     4 (25)    | 00:00:01 |
| 14 |        TABLE ACCESS FULL                            | EMP          |    14     |    42 |     3   (0)     | 00:00:01 |
-----------------------------------------------------------------------------------------
    In addition to greatly simplifying the SQL that must be written, the GROUP BY clause eliminates
unnecessary IO in the database. Take another look at Listing 7-5. You will see that a full table scan was performed on the EMP table five times. If you think that seems rather excessive, you are on the right track. Listing 7-6 shows the same SQL as executed in Listing 7-2, this time including the execution plan statistics. There is still a full table scan taking place against the EMP table, but only once—not five times as in the convoluted SQL in Listing 7-5.
    除了极大的简化SQL的书写，GROUP BY子句消除了不必要的数据库IO操作。再看一下列表7-5.你将发现在EMP表上执行了5次全表扫描。如果你认为那是相当的过量的话，你就开窍了。列表 7-6展示了列表7-2一样的SQL，这次包含执行计划统计。在EMP表上还是发生了一次全表扫描，但是只有一次，不是列表7-5那样的五次。
Listing 7-6. GROUP BY Execution Plan
1 select /*+ gather_plan_statistics */
2     d.dname
3     , count(empno) empcount
4 from scott.emp e
5 join scott.dept d on d.deptno = e.deptno
6 group by d.dname
7 order by d.dname;
DNAME            EMPCOUNT
-------------- ----------
ACCOUNTING              3
RESEARCH                5
SALES                   6
3 rows selected.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
| Id | Operation                                     | Name        | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |      1   |            |      3        |
|   1 |   SORT GROUP BY                                |                 |      1   |      4    |      3       |
|   2 | MERGE JOIN                                         |                 |      1   |     14    |     14    |
|   3 |         TABLE ACCESS BY INDEX ROWID    | DEPT      |      1   |      4    |      4      |
|   4 |            INDEX FULL SCAN                          | PK_DEPT |      1 |      4    |      4      |
|* 5 |         SORT JOIN                                         |                  |      4 |     14    |     14     |
|   6 |            TABLE ACCESS FULL                      | EMP         |      1 |     14    |     14    |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("D"."DEPTNO"="E"."DEPTNO")
       filter("D"."DEPTNO"="E"."DEPTNO")

caohong286

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
《Pro Oracle SQL》Chapter7 Advanced Grouping -- 7.1Basic GROUP BY Usage

Chapter 7 Advanced Grouping 第七章高级分组 Jared Still (page 213) The GROUP BY clause is a venerable member of the SQL statement family. After learning basic SELECT statements, it is one o...
复制链接

扫一扫

专栏目录