高级分组6

最新推荐文章于 2025-04-01 18:07:55 发布

ckkuc00228

最新推荐文章于 2025-04-01 18:07:55 发布

阅读量90

点赞数

文章标签：数据库

10.2分组扩展功能增强
      10.1节我们只是对分组扩展功能使用了几个简单的例子说明，还足以证明扩展分组功能的强大，这些简单的例子适用于oracle8i以及后续版本，但是在Oracle9i开始，对分组扩展的功能提供了新的可行操作：
      ◆ 在group by子句中可以重复列名。
      ◆ 对复合列进行分组。
◆ 分组连接。
本节内容就是围绕上面3点在9i中引入的分组扩展扩展新功能展开。

10.2.1在group by中使用重复列
      在Oracle8i的时候，在group by中重复列名是不允许的。比如group by中包含了扩展子句(cube,rollup等)，在这些扩展子句内外使用相同的列名是不合法的，比如在oracle8i中这样写：
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY o.year,ROLLUP (o.year, o.month, r.name);
ERROR at line 6:
ORA-30490: Ambiguous expression in GROUP BY ROLLUP or CUBE list
报分组列名混淆错误。但是在Oracle9i之后，则是正确的，结果如下：


   YEAR MONTH REGION                                     TOTAL
---------- ---------------- ---------------------------------------- ----------------------------------------------
   2000 1月             New England                               1018430
   2000 1月             Mid-Atlantic                               1221394
   2000 1月             Southeast US                               758042
   2000 1月                                                       2997866
   2000 2月             New England                               1231492
   2000 2月             Mid-Atlantic                               857352
   2000 2月             Southeast US                               1236846
   2000 2月                                                       3325690
   2000 3月             New England                               1132966
   2000 3月             Mid-Atlantic                               1274062
   2000 3月             Southeast US                               1311986
   2000 3月                                                       3719014
   2001 1月             New England                               509215
   2001 1月             Mid-Atlantic                               610697
   2001 1月             Southeast US                               379021
   2001 1月                                                       1498933
   2001 2月             New England                               615746
   2001 2月             Mid-Atlantic                               428676
   2001 2月             Southeast US                               618423
   2001 2月                                                       1662845
   2001 3月             New England                               566483
   2001 3月             Mid-Atlantic                               637031
   2001 3月             Southeast US                               655993
   2001 3月                                                       1859507
   2000                                                          10042570
   2001                                                             5021285
   2000                                                          10042570
   2001                                                             5021285

      因为o.year同时出现在group by中和rollup中，所以对每年的所有月份的合计重复统计，见黄色部分，在group by中同名列出现多次没有什么多大用处，只不过为了说明在Oracle9i是允许的。

10.2.2对复合列进行分组
      首先说下复合列在Oracle中的使用，想必复合列的使用大家已经比较熟悉了。常见的复合列是在子查询和group by使用。多个列以括号包括，逗号分隔，这样的一个组被当作一个整体。比如下面是一个子查询的例子：
select id,name,trade_id from product
where (name,trade_id) in
(('易达卡',2),('畅听卡',2));

      ID NAME       TRADE_ID
---------- ---------- ------------------------------
      55 易达卡             2
      58 畅听卡             2
      可以看出，查询了name,trade_id分别对应('易达卡',2)和('畅听卡',2)的值。
      下面看下在单独的group by中使用复合列。例如：
      select id,name,trade_id,count(*) from product
group by (id,name),trade_id;

上面的语句就是先找id,name相同的放在一起，然后再找trade_id相同的放在一组，这里的括号相同于优先级。

在Oracle8i中，分组只支持列的独立分组，不支持复合列的分组，Oracle9i开始支持复合列的分组，也就是上面所说的多个列用括号包括，中间短号，括号内的列相当于一个整体，就像一个列一样，不可拆开。比如Oracle8i只允许rollup(a,b,c),但是oracle9i允许rollup(a,(b,c)),那么将(b,c)作为一个列参与分组计算。增强了分组的功能，可以过滤某些结果。如下例：
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY ROLLUP ((o.year, o.month),r.name);


   YEAR MONTH          REGION                                     TOTAL
---------- ---------------- ---------------------------------------- ----------
   2000 1月             New England                               1018430
   2000 1月             Mid-Atlantic                               1221394
   2000 1月             Southeast US                               758042
   2000 1月                                                       2997866
   2000 2月             New England                               1231492
   2000 2月             Mid-Atlantic                               857352
   2000 2月             Southeast US                               1236846
   2000 2月                                                       3325690
   2000 3月             New England                               1132966
   2000 3月             Mid-Atlantic                               1274062
   2000 3月             Southeast US                               1311986
   2000 3月                                                       3719014
   2001 1月             New England                               509215
   2001 1月             Mid-Atlantic                               610697
   2001 1月             Southeast US                               379021
   2001 1月                                                       1498933
   2001 2月             New England                               615746
   2001 2月             Mid-Atlantic                               428676
   2001 2月             Southeast US                               618423
   2001 2月                                                       1662845
   2001 3月             New England                               566483
   2001 3月             Mid-Atlantic                               637031
   2001 3月             Southeast US                               655993
   2001 3月                                                       1859507
                                                                     15063855



由上面的结果分析，GROUP BY ROLLUP ((o.year, o.month),r.name)相当于group by (o.year,o.month),r.name、group by (o.year,o.month),null和group by null,null三个的union结果。与GROUP BY ROLLUP (o.year, o.month,r.name)少了group by rollup o.year,null,null。按年计算所有月份的小计。因为(o.year,o.month)是复合列，相当于一个列，不可分隔。
      虽然Oracle8i未提供复合列分组的功能，但是我们可以用复合列的思想，将两个列通过连接操作，变成一个列就可以了，当然在select显示的时候必须与group by中的一致。
下面是oracle8i的实现：
      SELECT TO_CHAR(o.year)||' '||TO_CHAR(TO_DATE(o.month,'MM'),'Month')
   Year_Month,
   r.name region, SUM(o.tot_sales)
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY
ROLLUP (TO_CHAR(o.year)||' '||
      TO_CHAR(TO_DATE(o.month,'MM'),'Month'), r.name);
      结果与oracle9i的例子一致，只不过year和month通过连接操作符变成一个列显示而已。
      group by中使用复合列的好处在于过滤某些本来有的结果。比如上例就过滤了按年计算所有月份的小计值,但是保留了最后的合计值。如果使用前面说的局部rollup:group by o.year,rollup(o.month,r.name)则会丢掉最后的合计值，而且包含了按年统计所有月份的小计值，总之，应该根据具体需求，选择合适的方案。
10.2.3级联分组
      在Oracle9i及后续版本，允许在group by子句后面使用多个ROLLUP,CUBE,GROUPING SETS操作，这在Oracle8i中是不允许的，在Oracle8i中值允许使用其中的一个。
      下面研究一下rollup级联分组。
比如在Oracle8i中查询如下语句：

SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name);

那么报错：ERROR at line 6:
ORA-30489: Cannot have more than one rollup/cube expression list
在Oracle9i中运行结果为：

   YEAR MONTH REGION                                     TOTAL
---------- ---------------- ---------------------------------------- -----------------------------------------------
                        New England                               5074332
                        Mid-Atlantic                               5029212
                        Southeast US                               4960311
                                                                     15063855
   2000                New England                               3382888
   2000                Mid-Atlantic                               3352808
   2000                Southeast US                               3306874
   2000                                                          10042570
   2000 1月             New England                               1018430
   2000 1月             Mid-Atlantic                               1221394
   2000 1月             Southeast US                               758042
   2000 1月                                                       2997866
   2000 2月             New England                               1231492
   2000 2月             Mid-Atlantic                               857352
   2000 2月             Southeast US                               1236846
   2000 2月                                                       3325690
   2000 3月             New England                               1132966
   2000 3月             Mid-Atlantic                               1274062
   2000 3月             Southeast US                               1311986
   2000 3月                                                       3719014
   2001                New England                               1691444
   2001                Mid-Atlantic                               1676404
   2001                Southeast US                               1653437
   2001                                                             5021285
   2001 1月             New England                               509215
   2001 1月             Mid-Atlantic                               610697
   2001 1月             Southeast US                               379021
   2001 1月                                                       1498933
   2001 2月             New England                               615746
   2001 2月             Mid-Atlantic                               428676
   2001 2月             Southeast US                               618423
   2001 2月                                                       1662845
   2001 3月             New England                               566483
   2001 3月             Mid-Atlantic                               637031
   2001 3月             Southeast US                               655993
   2001 3月                                                       1859507



GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name)等价于：
GROUP BY o.year,o.month,r.name
GROUP BY o.year,null,r.name
GROUP BY null,null,r.name
GROUP BY null,null,null
GROUP BY o.year,o.month,null
GROUP BY o.year,null,null
的union all的结果。提供了比rollup(o.year,o.month,r.name)更多的分组功能，多了一个GROUP BY o.year,null,r.name。

      当在group by中使用多个rollup,cube,grouping sets操作的时候，我们叫这样的分组为级联分组。级联分组的结果就是产生每个单独分组操作的交叉列表，但是要符合每个操作的处理步骤，比如上面的GROUP BY ROLLUP (o.year, o.month), ROLLUP(r.name)操作，第1个rollup(o.year,o.month)的执行顺序是group by o.year,o.monthgroup by o.year,nullgroup by null,null。第2个rollup(r.name)的执行顺序是group by r.namegroup by null。因此才出现上述的6种分组结果：分组的结果为3*2=6,而没有group by null,o.month,r.name和group by null,o.month,null这两种结果。

      根据上述的规则，我们将rollup(o.year,o.month)改为cube(o.year,o.month),那么cube的计算种类是4中，后面的rollup(r.name)为两种，那么这种级联分组应该有相当于group by的分类为4*2=8种，比rollup(o.year,o.month),rollup(r.name)多了group by null,o.month,r.name和group by null,o.month,null这两种结果。我们看下结果：

      SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY cube (o.year, o.month), ROLLUP(r.name);

   YEAR MONTH          REGION                                     TOTAL
---------- ---------------- ---------------------------------------- ----------
   2000 1月             New England                               1018430
   2001 1月             New England                               509215
   2000 1月             Mid-Atlantic                               1221394
   2001 1月             Mid-Atlantic                               610697
   2000 1月             Southeast US                               758042
   2001 1月             Southeast US                               379021
   2000 2月             New England                               1231492
   2001 2月             New England                               615746
   2000 2月             Mid-Atlantic                               857352
   2001 2月             Mid-Atlantic                               428676
   2000 2月             Southeast US                               1236846
   2001 2月             Southeast US                               618423
   2000 3月             New England                               1132966
   2001 3月             New England                               566483
   2000 3月             Mid-Atlantic                               1274062
   2001 3月             Mid-Atlantic                               637031
   2000 3月             Southeast US                               1311986
   2001 3月             Southeast US                               655993
         1月             New England                               1527645
      1月             Mid-Atlantic                               1832091
         1月             Southeast US                               1137063
         2月             New England                               1847238
      2月             Mid-Atlantic                               1286028
         2月             Southeast US                               1855269
         3月             New England                               1699449
      3月             Mid-Atlantic                               1911093
         3月             Southeast US                               1967979
   2000                New England                               3382888
   2001                New England                               1691444
                        New England                               5074332
   2000                Mid-Atlantic                               3352808
   2001                Mid-Atlantic                               1676404
                        Mid-Atlantic                               5029212
   2000                Southeast US                               3306874
   2001                Southeast US                               1653437
                        Southeast US                               4960311
      1月                                                       4496799
      2月                                                       4988535
      3月                                                       5578521
   2000                                                          10042570
   2001                                                             5021285
                                                                     15063855
   2000 1月                                                       2997866
   2001 1月                                                       1498933
   2000 2月                                                       3325690
   2001 2月                                                       1662845
   2000 3月                                                       3719014
   2001 3月                                                       1859507


      分析结果：黄色部分就是与使用rollup的时候多出的结果，正好是group by null,o.month,r.name和group by null,o.month,null的结果，其他与rollup(o.year,o.month),rollup(r.name)完全一致。


      下面研究一下CUBE级联分组。
      有了级联分组，可以将rollup与cube之间相互转换，比如：
      SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY ROLLUP(o.year),  ROLLUP (o.month), ROLLUP (r.name);

3个单列的rollup级联分组，每个分组有2种，那么总共有2^3=8种，正好相当于下列cube运算的结果：
SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY CUBE (o.year, o.month, r.name);

      再来考虑一个问题，如果对于cube级联分组，会出现什么情况下，我们知道cube是对所有可能性进行分组，有n个列，分组的种类是2^n个。那么cube级联分组不管怎么拆分，实际上和单独的cube所有列的结果是一致的。比如下列语句和上面的一致(简写如下)：
GROUP BY CUBE (o.year, o.month), CUBE (r.name);
GROUP BY CUBE (o.year), CUBE (o.month, r.name);
GROUP BY CUBE (o.year), CUBE (o.month), CUBE (r.name);

都是有8种分组方法，当然如果和其它的比如rollup联合起来，cube拆开与不拆开的结果也是一致的。

10.2.3.1 GROUPING SETS级联分组
      Grouping Sets的级联分组很有用，因为可以知道，grouping sets分组只是对单列分别进行小计统计，比如有n列就是分别对这个n列进行单列小计，有n种结果。但是当我们需要使用Grouping sets获得复合列的小计的时候，那么单独的grouping sets分组就没有办法了，但是可以使用级联grouping sets操作。同rollup和cube的级联分组一样，比如grouping sets(a,b),grouping sets(c,d)那么有2*2=4种统计方法：分别是group by(a,c)、
group by(a,d)、group by(b,c)和group by(b,c)，即列的交叉分组。如：
      SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (r.name);



   YEAR MONTH          REGION                                     TOTAL
--------- ---------------- ---------------------------------------- ----------
      3月             Mid-Atlantic                               1911093
      1月             New England                               1527645
      2月             Southeast US                               1855269
      2月             Mid-Atlantic                               1286028
      1月             Southeast US                               1137063
      1月             Mid-Atlantic                               1832091
      3月             New England                               1699449
      3月             Southeast US                               1967979
      2月             New England                               1847238
   2000                Mid-Atlantic                               3352808
   2001                New England                               1691444
   2000                Southeast US                               3306874
   2001                Southeast US                               1653437
   2000                New England                               3382888
   2001                Mid-Atlantic                               1676404

      上面的黄色部分就是对(o.month,r.name)的分组结果，红色部分是(o.year,r.name)的分组结果，有两个复合列的分组统计。

      下面我们对上面的例子，增加一个列，看看结果：
      SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.year, o.month), GROUPING SETS (o.year, r. name);

   YEAR MONTH          REGION                                     TOTAL
---------- ---------------- ---------------------------------------- ----------
      3月             Mid-Atlantic                               1911093
         1月             New England                               1527645
         2月             Southeast US                               1855269
      2月             Mid-Atlantic                               1286028
         1月             Southeast US                               1137063
      1月             Mid-Atlantic                               1832091
         3月             New England                               1699449
         3月             Southeast US                               1967979
         2月             New England                               1847238
   2000                Mid-Atlantic                               3352808
   2001                New England                               1691444
   2000                Southeast US                               3306874
   2001                Southeast US                               1653437
   2000                New England                               3382888
   2001                Mid-Atlantic                               1676404
   2000 1月                                                       2997866
   2000 2月                                                       3325690
   2000 3月                                                       3719014
   2000                                                          10042570
   2001 1月                                                       1498933
   2001 2月                                                       1662845
   2001 3月                                                       1859507
   2001                                                             5021285

      GROUPING SETS (o.year, o.month), GROUPING SETS (o.year, r. name)相当于group by(o.year,o.year)等价于group by o.year,group by(o.year,r.name),group by(o.month,o.year),group by(o.month,r.name)对应上面4个区域的结果。

      其实，因为最终的结果可以转化为对应的group by分组的union结果，而group by分组和列的顺序没有关系，因此级联grouping sets和列的顺序也没有关系，只要分组的种类一致即可，比如上面的等价于：
      GROUPING SETS (o.year, r.name), GROUPING SETS (o.year, o.month);
GROUPING SETS (o.month, o.year), GROUPING SETS (r.name, o.year);

      另外，在一个group by中可以同时使用grouping sets,cube,rollup，比如：
      SELECT o.year, TO_CHAR(TO_DATE(o.month, 'MM'), 'Month') month,
r.name region, SUM(o.tot_sales) total
FROM all_orders o JOIN region r
ON r.region_id = o.region_id
WHERE o.month BETWEEN 1 AND 3
GROUP BY GROUPING SETS (o.month, o.year), ROLLUP(r.name), CUBE (o.year);
上面的有8种统计结果，但是这样的语句其实一般意义不大，如果需要这样复杂的统计，需要仔细分析。

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/24984814/viewspace-704466/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/24984814/viewspace-704466/