oracle cube 交叉分组,oracle学习笔记７: 高级分组

最新推荐文章于 2021-04-16 05:29:53 发布

108杯美式

最新推荐文章于 2021-04-16 05:29:53 发布

阅读量248

点赞数

文章标签： oracle cube 交叉分组

group by 用来在原始数据上创建聚合来将数据转化为有用的信息。

基本的group by 列出个个部门的名称，员工总数

select d.dname, count(empno) empcount

from scott.dept d

left outer join scott.emp e

on d.deptno = e.deptno

group by d.dname

order by d.dname;

select列表中的每一列必须包含在group by子句中。如果没有则会导致错误。如：

SQL> select d.dname, d.loc, count(empno) empcount

2 from scott.emp e

3 join scott.dept d

4 on d.deptno = e.deptno

5 group by d.dname;

select d.dname, d.loc, count(empno) empcount

第 1 行出现错误:

ORA-00979: 不是 GROUP BY 表达式

尽管包含group by子句的select语句输出看上去是按顺序列出的，你不能期待group by每次都排好序地返回数据，如果输出结果必须按照一定的顺序排列，则必须使用order by子句。

--没有排序的group by

select deptno,count(*)

from emp

group by deptno;

--复杂的sql

set serveroutput off;

--复杂的sql

select /* lst7-4 */

distinct dname,

decode(d.deptno,

(select count(*) from emp where deptno = 1),

(select count(*) from emp where deptno = 2),

(select count(*) from emp where deptno = 3),

(select count(*) from emp where deptno not in (1, 2, 3))) dept_count

from (select distinct deptno from emp) d

join dept d2

on d.deptno = d.deptno;

@E:\bjc2016\study\pln lst7-4

上面的写法，会使SQL语句更加复杂难以理解并且难以维护。group by子句极大的简化必须写的sql语句以外，还消除了数据库不必要的IO。

set serveroutput off;

--复杂的sql

select /* lst7-5 */

distinct dname,

count(empno) empcount

from emp e

join dept d

on d.deptno = d.deptno

group by d.dname

order by d.dname;

@E:\bjc2016\study\pln lst7-5

group by 优点：

使sql语句更具有可读性

书写起来比使用很多相关子查询更简单

减少了重复访问同一个数据块的次数，从而得到更好的性能。

在分组之后，还在数据集上应用了having子句。另一方面，在获取数据行之后，进行分组之前，应用了where子句。having 子句中可以使用运算，函数及子查询。

SQL> --having子句

SQL> select /* lst7-6 */

2 d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount

3 from emp e

4 join dept d

5 on e.deptno = e.deptno

6 group by d.dname, trunc(e.hiredate, 'yyyy')

7 having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)

8 from scott.emp) and (select max(hiredate)

9 from scott.emp)

10 order by d.dname;

未选定行

SQL> @E:\bjc2016\study\pln lst7-6

原值 8: WHERE UPPER(SQL_TEXT) LIKE '%&1%'

新值 8: WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

SQL_ID 0wcfknkztdxqt, child number 0

-------------------------------------

select /* lst7-6 */ d.dname, trunc(e.hiredate, 'yyyy') hiredate,

count(empno) empcount from emp e join dept d on e.deptno =

e.deptno group by d.dname, trunc(e.hiredate, 'yyyy') having

count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select

min(hiredate)

from scott.emp) and (select max(hiredate)

from

scott.emp) order by d.dname

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 239717969

---------------------------------------------------------------------------

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | | | | |

|* 1 | FILTER | | | | | |

| 2 | SORT GROUP BY | | 1 | 2048 | 2048 | 2048 (0)|

| 3 | MERGE JOIN CARTESIAN| | 40 | | | |

| 4 | TABLE ACCESS FULL | DEPT | 4 | | | |

| 5 | BUFFER SORT | | 10 | 2048 | 2048 | 2048 (0)|

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

| 6 | TABLE ACCESS FULL | EMP | 10 | | | |

| 7 | SORT AGGREGATE | | 1 | | | |

| 8 | TABLE ACCESS FULL | EMP | 10 | | | |

| 9 | SORT AGGREGATE | | 1 | | | |

| 10 | TABLE ACCESS FULL | EMP | 10 | | | |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))

Note

-----

- Warning: basic plan statistics not available. These are only collected when

* hint 'gather_plan_statistics' is used for the statement or

* parameter 'statistics_level' is set to 'ALL', at session or system leve

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

已选择41行。

SQL>

group by的cube扩展

当与group by子句一起使用时，将会使得对每一行都要考虑包含在cube的参数中的所有可能的元素组合。这个运算将会生成比表中实际存在的行数更多的数据行。

-- hr.emplyees表的cube运算

select last_name, first_name

from hr.employees

group by first_name, last_name;

set autotrace off;

set autotrace on statistics;

with emps as

(select /* lst-7 */

last_name, first_name

from hr.employees

group by cube(last_name, first_name))

select rownum, last_name, first_name from emps;

对于每一对last_name,first_name，cube将会按顺序为每个元素替换为null值。cube生成的数据行在Oracle文档中称为超级聚合行，可以在运算列中加入null值来识别。

SQL> set autotrace off;

SQL> --预测cube返回行数

SQL> with counts as

2 (select count(distinct first_name) first_name_count,

3 count(distinct last_name) last_name_count,

4 count(distinct(first_name || last_name)) full_name_count

5 from hr.employees)

6 select first_name_count,

7 last_name_count,

8 full_name_count,

9 first_name_count + last_name_count + full_name_count + 1 total_count

10 from counts;

FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT

---------------- --------------- --------------- -----------

91 102 107 301

下面用ＳＱＬ语句模拟cube，可以看出cube为我们节省了许多力气。

--用union all生成cube数据行

with emps as (

select last_name,first_name from hr.employees

mycube as (

select last_name,first_name from emps

union all

select last_name,null first_name from emps

union all

select null last_name,first_name from emps

union all

select null last_name,null first_name from emps

)

select /*+ gather_plan_statistics */ *

from mycube group by last_name,first_name;

cube实际应用

sales_history模式中包含1998~2001年的销售数据。

下面的ＳＱＬ展示2001年的所有销售数据。并想要查看各个产品种类的销售情况汇总，包含基于10年消费者年龄段，收入水平的聚合；按照收入水平而不考虑年龄的汇总；以及按年龄而不考虑收入水平的聚合。

--销售数据的union all查询

with tsales as

(select /* lst7-10 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001)),

gb as

(select --Q1 - 所有分类通过收入和年龄范围

'Q1' query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cust_income_level, age_range

union all

select --Q2 - 所有分类通过年龄范围

'Q2' query_tag,

prod_category,

'ALL INCOME' cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, 'ALL INCOME', age_range

union all

select --Q3 - 所有分类通过收入

'Q3' query_tag,

prod_category,

cust_income_level,

'ALL AGE' age_range,

sum(profit) profit

from tsales

group by prod_category, cust_income_level, 'ALL AGE'

union all

select --Q4 - 所有分类

'Q4' query_tag,

prod_category,

'ALL INCOME' cust_income_level,

'ALL AGE' age_range,

sum(profit) profit

from tsales

group by prod_category, 'ALL INCOME', 'ALL AGE'

)

select * from gb order by prod_category, profit;

【语法】NVL (expr1, expr2)

【功能】若expr1为NULL，返回expr2；expr1不为NULL，返回expr1。

注意两者的类型要一致

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level

若c.cust_income_level为null,则返回'A: below 30,000'

mod(x,y)

【功能】返回x除以y的余数

【参数】x,y，数字型表达式

【返回】数字

【示例】

select mod(23,8),mod(24,8) from dual;

返回：7,0

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range

上面是求年龄段，如果56，则求出的范围为50_60

--用cube代替union all

with tsales as

(select /* lst7-11 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)

【功能】根据条件返回相应值

【参数】c1, c2, ...,cn,字符型/数值型/日期型，必须类型相同或null

注：值1……n 不能为条件表达式,这种情况只能用case when then end解决

·含义解释：

decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)

该函数的含义如下：

IF 条件=值1 THEN

RETURN(翻译值1)

ELSIF 条件=值2 THEN

RETURN(翻译值2)

......

ELSIF 条件=值n THEN

RETURN(翻译值n)

ELSE

RETURN(缺省值)

END IF

或：

when case 条件=值1 THEN

RETURN(翻译值1)

ElseCase 条件=值2 THEN

RETURN(翻译值2)

......

ElseCase 条件=值n THEN

RETURN(翻译值n)

ELSE

RETURN(缺省值)

END

'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag

是返回查询分类标识cust_income_level为null返返回decode(age_range, null, 4, 3)否则返回decode(age_range, null, 2, 1)

cust_income_level==null and age_range==null,query_tag=4

cust_income_level==null and age_range!=null,query_tag=3

cust_income_level!=null and age_range==null,query_tag=2

cust_income_level!=null and age_range!=null,query_tag=1

用grouping()函数排除空值

上面的ＳＱＬ有个问题，尽管总行数与之前使用union all运算符所得到的相一致，一些数据行中的cust_income_level和age_range具有空值，并且有一行的这两列都为空值。当cube的参数中包含生成列的所有可能组合时，每一列都有会产生n-1个空值，n为列表中的数目。在查询的例子中有两个例，因此对于每个唯一的age_range值都会在cust_income_level列上产生空值。对于age_range列来说也适用同样的规则。如果这两列中的数据在某些行上原本就有空值，这些空值就可能出问题。如何辨别数据中原有的空值和cube扩展所插入的值呢？在oracle 8i中引入了grouping()函数，可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。例如

decode(grouping(age_range),1,'ALL AGE',age_range) age_range

age_range检测age_range是否有一行由cube产生的空值，或者是否其在数据库中本身就是空值。如果当前行是由cube生成的超聚合行则返回值为１，对于其它所有情况返回值都为０。

当与case()表达式或decode()函数组合时，超聚合行中的空值可以用一个报告中有用的值替换。这种情况下，decode()看上去是更好的选择，因为它更简便并且grouping()函数仅有两种可能的返回值。

--grouping()函数

--无grouping

with tsales as

(select /* lst7-11 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

--有grouping

--case和decode都可以工作，我更喜欢用decode

with tsales as

(select /* lst7-12 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

case grouping(cust_income_level)

when 1 then

'ALL INCOME'

else

cust_income_level

end cust_income_level,

decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

用grouping()扩展报告

另一种使用grouping()的方法是放在having子句中，用来控制在输出中显示哪个层级的聚合。

使用grouping()函数可以被浓缩为对cube扩展中的各行或所有行进行滚动小计。

--在having子句中进行grouping()

with tsales as

(select /* lst7-13 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

case grouping(cust_income_level)

when 1 then

'ALL INCOME'

else

cust_income_level

end cust_income_level,

decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

--having grouping(cust_income_level)=1

--having grouping(age_range)=1

having grouping(cust_income_level)=1 and grouping(age_range)=1

order by prod_category, profit;

上面sql的数据可以看到将grouping()应用到cust_income_level列对所有age_range值跨各个收入层次创建聚合。对age_range列进行这样的操作会得到类似的效果，对所有cust_income_level值进行聚合而不考虑age_range的值。将cube扩展中的所有列作为grouping()函数的参数将会导致聚合被浓缩为一行类似sum(profit)和group by prod_category所实现的功能。但是，使用cube扩展简单修改having子句就可以创建几份不同的报告。

用grouping_id()扩展报告

grouping_id()函数相对grouping()函数来说是相对较新的，在oracle 9i中引入，与grouping()函数在某种程度上是类似的。不同的是grouping()计算一个表达式并返回０或１，而grouping_id()计算一个表达式，确定其参数中的哪一行(如果有的话)用来生成超聚合行，然后创建一个位矢量，并将该值作为整形值返回。

--group_id()位矢量

with rowgen as (

select 1 bit_1,0 bit_0

from dual

), cubed as (

select

grouping_id(bit_1,bit_0) gid,

to_char(grouping(bit_1)) bv_1,

to_char(grouping(bit_0)) bv_0,

decode(grouping(bit_1),1,'GRP BIT 1') gb_1,

decode(grouping(bit_0),1,'GRP BIT 0') gb_0

from rowgen

group by cube(bit_1,bit_0)

)

select gid,bv_1 || bv_0 bit_vector,

gb_1,

gb_0

from cubed

order by gid;

846760d88629

group_id()位矢量运行结果

我们己经知道如何使用grouping()通过having子句来控制输出，但考虑数据库效率时，单独的grouping_id()调用可以用来取代所有不同的having grouping()子句。grouping()函数的功能仅仅用来辨别数据行，因为它仅能返回０或１。由于grouping_id()函数返回一个基于位矢量的数值，它可以轻易被用来进行各种不同的比较而不用修改sql语句。

为什么要关注不改变SQL语句就能改变比较呢？如上面基于销售历史的例子中，用户可能会被给出４个输出选项，任意一个或多个可能会被选中。用户的选择可以用来作为使用having grouping_id()函数的一个单独的sql语句，而不是基于having grouping()的不同组全的多个sql语句的输入，因此需要数据库解析sql语句的次数也比就较少。同时这也会使得需要执行的sql语句更少，使用更小的IO,以及更少的内存。

正如使用cube来避免通过union all将多个sql语句结合起来一样，grouping_id()能够避免在应用中使用多个sql语句。

--显示所有收入层次和年龄段的聚合

variable N_ALL_DATA number

--显示所有年龄段的聚合

variable N_AGE_RANGE number

--显示所有收入层次的聚合

variable N_INCOME_LEVEL number

--只给出汇总

variable N_SUMMAY number

begin

:N_ALL_DATA := 0; -- 1 生效

:N_AGE_RANGE := 2; -- 2 生效

:N_INCOME_LEVEL := 0; -- 3 生效

:N_SUMMAY := 4; -- 4 生效

end;

with tsales as

(select /* lst7-15 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,cube(cust_income_level,age_range)

having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)

order by prod_category,profit;

使用grouping函数也可以实现同的结果，但需要在having子句中进行一些测试。示例销售历史数据查询在cube参数中只包含两列。在having子句中总共需要进行４次测试，因为grouping子句将会返回１或者０，每一列有两个可能的值。从而需要４次测试。如果３列，则需要８次，所需的测试次数将会是２的n次方，其中n为cube中参数列或表达式的个数。

用grouping()代替grouping_id()的having子句的例子

--显示所有收入层次和年龄段的聚合

variable N_ALL_DATA number

--显示所有年龄段的聚合

variable N_AGE_RANGE number

--显示所有收入层次的聚合

variable N_INCOME_LEVEL number

--只给出汇总

variable N_SUMMAY number

begin

:N_ALL_DATA := 0; -- 1 生效

:N_AGE_RANGE := 2; -- 2 生效

:N_INCOME_LEVEL := 0; -- 3 生效

:N_SUMMAY := 4; -- 4 生效

end;

with tsales as

(select /* lst7-16 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,cube(cust_income_level,age_range)

having

(bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)

order by prod_category,profit;

１.　使用grouping可以判断该行是数据库中本来的行，还是有统计产生的行

grouping值为0时说明这个值是数据库中本来的值，为1说明是统计的结果(也可以说该列为空时是1，不为空时是0)

2.　GROUPING_ID()函数可以接受一列或多列，返回GROUPING位向量的十进制值。GROUPING位向量的计算方法是将按照顺序对每一列调用GROUPING函数的结果组合起来，所以说01和10的值不一样的

3.　group_id的使用当group by子句中重复使用一个列时,通过group_id来去除重复值

grouping sets与rollup()

group by的grouping sets()扩展在oracle 9i中初次登场，前面的例子中的整个group by...having子句可以用group by grouping sets()替换。

with tsales as

(select /* lst7-17 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,grouping sets(

rollup(prod_category), --产品分类小计

(cust_income_level),--产品分类和收入层次

(age_range),　--产品分类和年龄范围

(cust_income_level,age_range)　--产品分类，年龄范围和收入层次

)

--having group_id() < 1

order by prod_category,profit;

group by cube having grouping_id()与group by grouping sets一个主要的区别是，前者能将变量设定为正确的值来简便修改输出，而后者的输出不能修改，除非修改或动态生成sql语句。修改sql语句意味着需要维护更多的代码并且占用更多的数据库资源。最好尽量避免使用动态生成sql语句，因为它会消耗的数据库资源更多，并且在出现问题时难以检修。

某些时候grouping_sets()扩展会导致输出中出现重复。重复是由rollup(prod_category)产生的。可以通过去掉rollup()然后重新运行得到验证，重复的行将不复存在。但是，每种产品种类的总计也不存在了。解决的办法就是使用group_id()函数标记重复的行，并将其插入到having子句中。

在上面的sql中将--having group_id() < 1 改成　having group_id() < 1

这样，输出结果就如预期的那样不包含重复的行了。有趣的是如果将rollup(prod_category)改成null,去掉having子句，同时还能得到预期的输出。代码如下：

with tsales as

(select /* lst7-17-1 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,grouping sets(

null,

--rollup(prod_category), --产品分类小计

(cust_income_level),--产品分类和收入层次

(age_range),　--产品分类和年龄范围

(cust_income_level,age_range)　--产品分类，年龄范围和收入层次

)

--having group_id() < 1

order by prod_category,profit

group by的rollup()扩展也可以单独用来计算否则将会需要由union all结合起来的多个查询完成的小计。

例如：创建显示器所有名字以Sul开头的消费者各自的购买总额报告，并且要求对每个消费者分别按年，产品分类进行小计，还要有所有消费的总计。这种类型的任务可以使用rollup()完成。

--rollup()小计

with mysales as (

select c.cust_last_name || ',' || c.cust_first_name cust_name,

p.prod_category,

to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,

p.prod_name,

s.amount_sold

from sh.sales s

join sh.products p on p.prod_id=s.prod_id

join sh.customers c on c.cust_id=s.cust_id

where c.cust_last_name like 'Sul%'

)

select

decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,

decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,

decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,

sum(m.amount_sold) amount_sold

from mysales m

group by rollup(m.cust_name,m.prod_category,m.sale_year)

order by grouping(m.cust_name), 1,2,3;

注意decode()和grouping()函数再一次被用来表示小计行。使用grouping(m.cust_name)将总计显示在报告的最后。由于这个值>0的唯一情况就是当计算所有消费者总计时，这个总计值只会出现在报告的最后。

group by的局限性

LOB列，嵌套表或数组不能用做group by表达式的一部分

SQL> with lobtest as (

2 select to_clob(d.dname) dname

3 from scott.emp e

4 join scott.dept d on d.deptno=e.deptno

5 )

6 select l.dname

7 from lobtest l

8 group by l.dname;

group by l.dname

第 8 行出现错误:

ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB

不允许使用标量子查询表达式

SQL> select d.dname,count(empno) empcount

2 from scott.emp e

3 join scott.dept d on d.deptno=e.deptno

4 group by (select dname from scott.dept d2 where d2.dname = d.dname )

5 order by d.dname;

group by (select dname from scott.dept d2 where d2.dname = d.dname )

第 4 行出现错误:

ORA-22818: 这里不允许出现子查询表达式

如果group by子句引用任何对象类型的列则查询不能并行化

create type dept_location_type as object

(

street_address varchar2(40),

postal_code varchar2(10),

city varchar2(30),

state_province varchar2(10),

country_id char(2),

order member function match (e dept_location_type) return integer

);

create or replace type body dept_location_type

as order member function match (e dept_location_type) return integer

begin

if city

return -1;

elsif city > e.city then

return 1;

else

return 0;

end if;

end;

create table deptobj

select d.deptno,d.dname

from scott.dept d;

alter table deptobj add (dept_location dept_location_type);

select * from deptobj;

update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;

update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;

update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;

update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;

--对象列的并行group by

select /*+ gather_plan_statictics parallel(e 2)*/

d.dept_location,count(e.ename) ecount

from scott.emp e,deptobj d

where e.deptno=d.deptno

group by d.dept_location

order by d.dept_location;

执行结果

846760d88629

对象列的并行group by

dept_location类型体中的成员函数匹配用来进行城市值的比较，然后使用group by将雇员按城市分组。最后一个列出的局限性在后期的版本是可以工作的。

总结

Ｏracle以group by子句扩展的形式为SQL开发者提供了一些极佳的工具，帮助我们不仅能够减少代码量，并且能提高数据库效率。大多数的特性也要与其它不同的功能进行组合。

846760d88629

group by总结

108杯美式

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
oracle cube 交叉分组,oracle学习笔记７: 高级分组

group by 用来在原始数据上创建聚合来将数据转化为有用的信息。基本的group by 列出个个部门的名称，员工总数select d.dname, count(empno) empcountfrom scott.dept dleft outer join scott.emp eon d.deptno = e.deptnogroup by d.dnameorder by d.dname;sel...
复制链接

扫一扫