oracle cube 交叉分组,oracle学习笔记7: 高级分组

group by 用来在原始数据上创建聚合来将数据转化为有用的信息。

基本的group by 列出个个部门的名称,员工总数

select d.dname, count(empno) empcount

from scott.dept d

left outer join scott.emp e

on d.deptno = e.deptno

group by d.dname

order by d.dname;

select列表中的每一列必须包含在group by子句中。如果没有则会导致错误。如:

SQL> select d.dname, d.loc, count(empno) empcount

2 from scott.emp e

3 join scott.dept d

4 on d.deptno = e.deptno

5 group by d.dname;

select d.dname, d.loc, count(empno) empcount

*

第 1 行出现错误:

ORA-00979: 不是 GROUP BY 表达式

尽管包含group by子句的select语句输出看上去是按顺序列出的,你不能期待group by每次都排好序地返回数据,如果输出结果必须按照一定的顺序排列,则必须使用order by子句。

--没有排序的group by

select deptno,count(*)

from emp

group by deptno;

--复杂的sql

set serveroutput off;

--复杂的sql

select /* lst7-4 */

distinct dname,

decode(d.deptno,

1,

(select count(*) from emp where deptno = 1),

2,

(select count(*) from emp where deptno = 2),

3,

(select count(*) from emp where deptno = 3),

(select count(*) from emp where deptno not in (1, 2, 3))) dept_count

from (select distinct deptno from emp) d

join dept d2

on d.deptno = d.deptno;

@E:\bjc2016\study\pln lst7-4

上面的写法,会使SQL语句更加复杂难以理解并且难以维护。group by子句极大的简化必须写的sql语句以外,还消除了数据库不必要的IO。

set serveroutput off;

--复杂的sql

select /* lst7-5 */

distinct dname,

count(empno) empcount

from emp e

join dept d

on d.deptno = d.deptno

group by d.dname

order by d.dname;

@E:\bjc2016\study\pln lst7-5

group by 优点:

使sql语句更具有可读性

书写起来比使用很多相关子查询更简单

减少了重复访问同一个数据块的次数,从而得到更好的性能。

在分组之后,还在数据集上应用了having子句。另一方面,在获取数据行之后,进行分组之前,应用了where子句。having 子句中可以使用运算,函数及子查询。

SQL> --having子句

SQL> select /* lst7-6 */

2 d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount

3 from emp e

4 join dept d

5 on e.deptno = e.deptno

6 group by d.dname, trunc(e.hiredate, 'yyyy')

7 having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)

8 from scott.emp) and (select max(hiredate)

9 from scott.emp)

10 order by d.dname;

未选定行

SQL> @E:\bjc2016\study\pln lst7-6

原值 8: WHERE UPPER(SQL_TEXT) LIKE '%&1%'

新值 8: WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

SQL_ID 0wcfknkztdxqt, child number 0

-------------------------------------

select /* lst7-6 */ d.dname, trunc(e.hiredate, 'yyyy') hiredate,

count(empno) empcount from emp e join dept d on e.deptno =

e.deptno group by d.dname, trunc(e.hiredate, 'yyyy') having

count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select

min(hiredate)

from scott.emp) and (select max(hiredate)

from

scott.emp) order by d.dname

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

Plan hash value: 239717969

---------------------------------------------------------------------------

| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | | | | |

|* 1 | FILTER | | | | | |

| 2 | SORT GROUP BY | | 1 | 2048 | 2048 | 2048 (0)|

| 3 | MERGE JOIN CARTESIAN| | 40 | | | |

| 4 | TABLE ACCESS FULL | DEPT | 4 | | | |

| 5 | BUFFER SORT | | 10 | 2048 | 2048 | 2048 (0)|

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

| 6 | TABLE ACCESS FULL | EMP | 10 | | | |

| 7 | SORT AGGREGATE | | 1 | | | |

| 8 | TABLE ACCESS FULL | EMP | 10 | | | |

| 9 | SORT AGGREGATE | | 1 | | | |

| 10 | TABLE ACCESS FULL | EMP | 10 | | | |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))

Note

-----

- Warning: basic plan statistics not available. These are only collected when

:

* hint 'gather_plan_statistics' is used for the statement or

* parameter 'statistics_level' is set to 'ALL', at session or system leve

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------

l

已选择41行。

SQL>

group by的cube扩展

当与group by子句一起使用时,将会使得对每一行都要考虑包含在cube的参数中的所有可能的元素组合。这个运算将会生成比表中实际存在的行数更多的数据行。

-- hr.emplyees表的cube运算

select last_name, first_name

from hr.employees

group by first_name, last_name;

set autotrace off;

set autotrace on statistics;

with emps as

(select /* lst-7 */

last_name, first_name

from hr.employees

group by cube(last_name, first_name))

select rownum, last_name, first_name from emps;

对于每一对last_name,first_name,cube将会按顺序为每个元素替换为null值。cube生成的数据行在Oracle文档中称为超级聚合行,可以在运算列中加入null值来识别。

SQL> set autotrace off;

SQL> --预测cube返回行数

SQL> with counts as

2 (select count(distinct first_name) first_name_count,

3 count(distinct last_name) last_name_count,

4 count(distinct(first_name || last_name)) full_name_count

5 from hr.employees)

6 select first_name_count,

7 last_name_count,

8 full_name_count,

9 first_name_count + last_name_count + full_name_count + 1 total_count

10 from counts;

FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT

---------------- --------------- --------------- -----------

91 102 107 301

下面用SQL语句模拟cube,可以看出cube为我们节省了许多力气。

--用union all生成cube数据行

with emps as (

select last_name,first_name from hr.employees

),

mycube as (

select last_name,first_name from emps

union all

select last_name,null first_name from emps

union all

select null last_name,first_name from emps

union all

select null last_name,null first_name from emps

)

select /*+ gather_plan_statistics */ *

from mycube group by last_name,first_name;

cube实际应用

sales_history模式中包含1998~2001年的销售数据。

下面的SQL展示2001年的所有销售数据。并想要查看各个产品种类的销售情况汇总,包含基于10年消费者年龄段,收入水平的聚合;按照收入水平而不考虑年龄的汇总;以及按年龄而不考虑收入水平的聚合。

--销售数据的union all查询

with tsales as

(select /* lst7-10 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001)),

gb as

(select --Q1 - 所有分类通过收入和年龄范围

'Q1' query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cust_income_level, age_range

union all

select --Q2 - 所有分类通过年龄范围

'Q2' query_tag,

prod_category,

'ALL INCOME' cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, 'ALL INCOME', age_range

union all

select --Q3 - 所有分类通过收入

'Q3' query_tag,

prod_category,

cust_income_level,

'ALL AGE' age_range,

sum(profit) profit

from tsales

group by prod_category, cust_income_level, 'ALL AGE'

union all

select --Q4 - 所有分类

'Q4' query_tag,

prod_category,

'ALL INCOME' cust_income_level,

'ALL AGE' age_range,

sum(profit) profit

from tsales

group by prod_category, 'ALL INCOME', 'ALL AGE'

)

select * from gb order by prod_category, profit;

【语法】NVL (expr1, expr2)

【功能】若expr1为NULL,返回expr2;expr1不为NULL,返回expr1。

注意两者的类型要一致

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level

若c.cust_income_level为null,则返回'A: below 30,000'

mod(x,y)

【功能】返回x除以y的余数

【参数】x,y,数字型表达式

【返回】数字

【示例】

select mod(23,8),mod(24,8) from dual;

返回:7,0

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range

上面是求年龄段,如果56,则求出的范围为50_60

--用cube代替union all

with tsales as

(select /* lst7-11 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)

【功能】根据条件返回相应值

【参数】c1, c2, ...,cn,字符型/数值型/日期型,必须类型相同或null

注:值1……n 不能为条件表达式,这种情况只能用case when then end解决

·含义解释:

decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)

该函数的含义如下:

IF 条件=值1 THEN

RETURN(翻译值1)

ELSIF 条件=值2 THEN

RETURN(翻译值2)

......

ELSIF 条件=值n THEN

RETURN(翻译值n)

ELSE

RETURN(缺省值)

END IF

或:

when case 条件=值1 THEN

RETURN(翻译值1)

ElseCase 条件=值2 THEN

RETURN(翻译值2)

......

ElseCase 条件=值n THEN

RETURN(翻译值n)

ELSE

RETURN(缺省值)

END

'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag

是返回查询分类标识cust_income_level为null返返回decode(age_range, null, 4, 3)否则返回decode(age_range, null, 2, 1)

cust_income_level==null and age_range==null,query_tag=4

cust_income_level==null and age_range!=null,query_tag=3

cust_income_level!=null and age_range==null,query_tag=2

cust_income_level!=null and age_range!=null,query_tag=1

用grouping()函数排除空值

上面的SQL有个问题,尽管总行数与之前使用union all运算符所得到的相一致,一些数据行中的cust_income_level和age_range具有空值,并且有一行的这两列都为空值。当cube的参数中包含生成列的所有可能组合时,每一列都有会产生n-1个空值,n为列表中的数目。在查询的例子中有两个例,因此对于每个唯一的age_range值都会在cust_income_level列上产生空值。对于age_range列来说也适用同样的规则。如果这两列中的数据在某些行上原本就有空值,这些空值就可能出问题。如何辨别数据中原有的空值和cube扩展所插入的值呢?在oracle 8i中引入了grouping()函数,可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。例如

decode(grouping(age_range),1,'ALL AGE',age_range) age_range

age_range检测age_range是否有一行由cube产生的空值,或者是否其在数据库中本身就是空值。如果当前行是由cube生成的超聚合行则返回值为1,对于其它所有情况返回值都为0。

当与case()表达式或decode()函数组合时,超聚合行中的空值可以用一个报告中有用的值替换。这种情况下,decode()看上去是更好的选择,因为它更简便并且grouping()函数仅有两种可能的返回值。

--grouping()函数

--无grouping

with tsales as

(select /* lst7-11 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

cust_income_level,

age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

--有grouping

--case和decode都可以工作,我更喜欢用decode

with tsales as

(select /* lst7-12 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

case grouping(cust_income_level)

when 1 then

'ALL INCOME'

else

cust_income_level

end cust_income_level,

decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

order by prod_category, profit;

用grouping()扩展报告

另一种使用grouping()的方法是放在having子句中,用来控制在输出中显示哪个层级的聚合。

使用grouping()函数可以被浓缩为对cube扩展中的各行或所有行进行滚动小计。

--在having子句中进行grouping()

with tsales as

(select /* lst7-13 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || decode(cust_income_level,

null,

decode(age_range, null, 4, 3),

decode(age_range, null, 2, 1)) query_tag,

prod_category,

case grouping(cust_income_level)

when 1 then

'ALL INCOME'

else

cust_income_level

end cust_income_level,

decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,

sum(profit) profit

from tsales

group by prod_category, cube(cust_income_level, age_range)

--having grouping(cust_income_level)=1

--having grouping(age_range)=1

having grouping(cust_income_level)=1 and grouping(age_range)=1

order by prod_category, profit;

上面sql的数据可以看到将grouping()应用到cust_income_level列对所有age_range值跨各个收入层次创建聚合。对age_range列进行这样的操作会得到类似的效果,对所有cust_income_level值进行聚合而不考虑age_range的值。将cube扩展中的所有列作为grouping()函数的参数将会导致聚合被浓缩为一行类似sum(profit)和group by prod_category所实现的功能。但是,使用cube扩展简单修改having子句就可以创建几份不同的报告。

用grouping_id()扩展报告

grouping_id()函数相对grouping()函数来说是相对较新的,在oracle 9i中引入,与grouping()函数在某种程度上是类似的。不同的是grouping()计算一个表达式并返回0或1,而grouping_id()计算一个表达式,确定其参数中的哪一行(如果有的话)用来生成超聚合行,然后创建一个位矢量,并将该值作为整形值返回。

--group_id()位矢量

with rowgen as (

select 1 bit_1,0 bit_0

from dual

), cubed as (

select

grouping_id(bit_1,bit_0) gid,

to_char(grouping(bit_1)) bv_1,

to_char(grouping(bit_0)) bv_0,

decode(grouping(bit_1),1,'GRP BIT 1') gb_1,

decode(grouping(bit_0),1,'GRP BIT 0') gb_0

from rowgen

group by cube(bit_1,bit_0)

)

select gid,bv_1 || bv_0 bit_vector,

gb_1,

gb_0

from cubed

order by gid;

846760d88629

group_id()位矢量运行结果

我们己经知道如何使用grouping()通过having子句来控制输出,但考虑数据库效率时,单独的grouping_id()调用可以用来取代所有不同的having grouping()子句。grouping()函数的功能仅仅用来辨别数据行,因为它仅能返回0或1。由于grouping_id()函数返回一个基于位矢量的数值,它可以轻易被用来进行各种不同的比较而不用修改sql语句。

为什么要关注不改变SQL语句就能改变比较呢?如上面基于销售历史的例子中,用户可能会被给出4个输出选项,任意一个或多个可能会被选中。用户的选择可以用来作为使用having grouping_id()函数的一个单独的sql语句,而不是基于having grouping()的不同组全的多个sql语句的输入,因此需要数据库解析sql语句的次数也比就较少。同时这也会使得需要执行的sql语句更少,使用更小的IO,以及更少的内存。

正如使用cube来避免通过union all将多个sql语句结合起来一样,grouping_id()能够避免在应用中使用多个sql语句。

--显示所有收入层次和年龄段的聚合

variable N_ALL_DATA number

--显示所有年龄段的聚合

variable N_AGE_RANGE number

--显示所有收入层次的聚合

variable N_INCOME_LEVEL number

--只给出汇总

variable N_SUMMAY number

begin

:N_ALL_DATA := 0; -- 1 生效

:N_AGE_RANGE := 2; -- 2 生效

:N_INCOME_LEVEL := 0; -- 3 生效

:N_SUMMAY := 4; -- 4 生效

end;

/

with tsales as

(select /* lst7-15 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,cube(cust_income_level,age_range)

having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)

order by prod_category,profit;

使用grouping函数也可以实现同的结果,但需要在having子句中进行一些测试。示例销售历史数据查询在cube参数中只包含两列。在having子句中总共需要进行4次测试,因为grouping子句将会返回1或者0,每一列有两个可能的值。从而需要4次测试。如果3列,则需要8次,所需的测试次数将会是2的n次方,其中n为cube中参数列或表达式的个数。

用grouping()代替grouping_id()的having子句的例子

--显示所有收入层次和年龄段的聚合

variable N_ALL_DATA number

--显示所有年龄段的聚合

variable N_AGE_RANGE number

--显示所有收入层次的聚合

variable N_INCOME_LEVEL number

--只给出汇总

variable N_SUMMAY number

begin

:N_ALL_DATA := 0; -- 1 生效

:N_AGE_RANGE := 2; -- 2 生效

:N_INCOME_LEVEL := 0; -- 3 生效

:N_SUMMAY := 4; -- 4 生效

end;

/

with tsales as

(select /* lst7-16 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,cube(cust_income_level,age_range)

having

(bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)

or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)

order by prod_category,profit;

1. 使用grouping可以判断该行是数据库中本来的行,还是有统计产生的行

grouping值为0时说明这个值是数据库中本来的值,为1说明是统计的结果(也可以说该列为空时是1,不为空时是0)

2. GROUPING_ID()函数可以接受一列或多列,返回GROUPING位向量的十进制值。GROUPING位向量的计算方法是将按照顺序对每一列调用GROUPING函数的结果组合起来,所以说01和10的值不一样的

3. group_id的使用 当group by子句中重复使用一个列时,通过group_id来去除重复值

grouping sets与rollup()

group by的grouping sets()扩展在oracle 9i中初次登场,前面的例子中的整个group by...having子句可以用group by grouping sets()替换。

with tsales as

(select /* lst7-17 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,grouping sets(

rollup(prod_category), --产品分类小计

(cust_income_level),--产品分类和收入层次

(age_range), --产品分类和年龄范围

(cust_income_level,age_range) --产品分类,年龄范围和收入层次

)

--having group_id() < 1

order by prod_category,profit;

group by cube having grouping_id()与group by grouping sets一个主要的区别是,前者能将变量设定为正确的值来简便修改输出,而后者的输出不能修改,除非修改或动态生成sql语句。修改sql语句意味着需要维护更多的代码并且占用更多的数据库资源。最好尽量避免使用动态生成sql语句,因为它会消耗的数据库资源更多,并且在出现问题时难以检修。

某些时候grouping_sets()扩展会导致输出中出现重复。重复是由rollup(prod_category)产生的。可以通过去掉rollup()然后重新运行得到验证,重复的行将不复存在。但是,每种产品种类的总计也不存在了。解决的办法就是使用group_id()函数标记重复的行,并将其插入到having子句中。

在上面的sql中将--having group_id() < 1 改成 having group_id() < 1

这样,输出结果就如预期的那样不包含重复的行了。有趣的是如果将rollup(prod_category)改成null,去掉having子句,同时还能得到预期的输出。代码如下:

with tsales as

(select /* lst7-17-1 */

s.quantity_sold,

s.amount_sold,

to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||

to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,

nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,

p.prod_name,

p.prod_desc,

p.prod_category,

(pf.unit_cost * s.quantity_sold) total_cost,

s.amount_sold - (pf.unit_cost * s.quantity_sold) profit

from sh.sales s

join sh.customers c

on c.cust_id = s.cust_id

join sh.products p

on p.prod_id = s.prod_id

join sh.times t

on t.time_id = s.time_id

join sh.costs pf

on pf.channel_id = s.channel_id

and pf.prod_id = s.prod_id

and pf.promo_id = s.promo_id

and pf.time_id = s.time_id

where (t.fiscal_year = 2001))

select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,

prod_category,

decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,

decode(grouping(age_range),1,'ALL AGE',age_range) age_range,

sum(profit) profit

from tsales

group by prod_category,grouping sets(

null,

--rollup(prod_category), --产品分类小计

(cust_income_level),--产品分类和收入层次

(age_range), --产品分类和年龄范围

(cust_income_level,age_range) --产品分类,年龄范围和收入层次

)

--having group_id() < 1

order by prod_category,profit

group by的rollup()扩展也可以单独用来计算否则将会需要由union all结合起来的多个查询完成的小计。

例如:创建显示器所有名字以Sul开头的消费者各自的购买总额报告,并且要求对每个消费者分别按年,产品分类进行小计,还要有所有消费的总计。这种类型的任务可以使用rollup()完成。

--rollup()小计

with mysales as (

select c.cust_last_name || ',' || c.cust_first_name cust_name,

p.prod_category,

to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,

p.prod_name,

s.amount_sold

from sh.sales s

join sh.products p on p.prod_id=s.prod_id

join sh.customers c on c.cust_id=s.cust_id

where c.cust_last_name like 'Sul%'

)

select

decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,

decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,

decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,

sum(m.amount_sold) amount_sold

from mysales m

group by rollup(m.cust_name,m.prod_category,m.sale_year)

order by grouping(m.cust_name), 1,2,3;

注意decode()和grouping()函数再一次被用来表示小计行。使用grouping(m.cust_name)将总计显示在报告的最后。由于这个值>0的唯一情况就是当计算所有消费者总计时,这个总计值只会出现在报告的最后。

group by的局限性

LOB列,嵌套表或数组不能用做group by表达式的一部分

SQL> with lobtest as (

2 select to_clob(d.dname) dname

3 from scott.emp e

4 join scott.dept d on d.deptno=e.deptno

5 )

6 select l.dname

7 from lobtest l

8 group by l.dname;

group by l.dname

*

第 8 行出现错误:

ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB

不允许使用标量子查询表达式

SQL> select d.dname,count(empno) empcount

2 from scott.emp e

3 join scott.dept d on d.deptno=e.deptno

4 group by (select dname from scott.dept d2 where d2.dname = d.dname )

5 order by d.dname;

group by (select dname from scott.dept d2 where d2.dname = d.dname )

*

第 4 行出现错误:

ORA-22818: 这里不允许出现子查询表达式

如果group by子句引用任何对象类型的列则查询不能并行化

create type dept_location_type as object

(

street_address varchar2(40),

postal_code varchar2(10),

city varchar2(30),

state_province varchar2(10),

country_id char(2),

order member function match (e dept_location_type) return integer

);

/

create or replace type body dept_location_type

as order member function match (e dept_location_type) return integer

is

begin

if city

return -1;

elsif city > e.city then

return 1;

else

return 0;

end if;

end;

end;

/

create table deptobj

as

select d.deptno,d.dname

from scott.dept d;

alter table deptobj add (dept_location dept_location_type);

select * from deptobj;

update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;

update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;

update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;

update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;

--对象列的并行group by

select /*+ gather_plan_statictics parallel(e 2)*/

d.dept_location,count(e.ename) ecount

from scott.emp e,deptobj d

where e.deptno=d.deptno

group by d.dept_location

order by d.dept_location;

执行结果

846760d88629

对象列的并行group by

dept_location类型体中的成员函数匹配用来进行城市值的比较,然后使用group by将雇员按城市分组。最后一个列出的局限性在后期的版本是可以工作的。

总结

Oracle以group by子句扩展的形式为SQL开发者提供了一些极佳的工具,帮助我们不仅能够减少代码量,并且能提高数据库效率。大多数的特性也要与其它不同的功能进行组合。

846760d88629

group by总结

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
提供的源码资源涵盖了安卓应用、小程序、Python应用和Java应用等多个领域,每个领域都包含了丰富的实例和项目。这些源码都是基于各自平台的最新技术和标准编写,确保了在对应环境下能够无缝运行。同时,源码中配备了详细的注释和文档,帮助用户快速理解代码结构和实现逻辑。 适用人群: 这些源码资源特别适合大学生群体。无论你是计算机相关专业的学生,还是对其他领域编程感兴趣的学生,这些资源都能为你提供宝贵的学习和实践机会。通过学习和运行这些源码,你可以掌握各平台开发的基础知识,提升编程能力和项目实战经验。 使用场景及目标: 在学习阶段,你可以利用这些源码资源进行课程实践、课外项目或毕业设计。通过分析和运行源码,你将深入了解各平台开发的技术细节和最佳实践,逐步培养起自己的项目开发和问题解决能力。此外,在求职或创业过程中,具备跨平台开发能力的大学生将更具竞争力。 其他说明: 为了确保源码资源的可运行性和易用性,特别注意了以下几点:首先,每份源码都提供了详细的运行环境和依赖说明,确保用户能够轻松搭建起开发环境;其次,源码中的注释和文档都非常完善,方便用户快速上手和理解代码;最后,我会定期更新这些源码资源,以适应各平台技术的最新发展和市场需求。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值