Overview of Dimensions

最新推荐文章于 2021-04-16 00:24:02 发布

cyxlxp8411

最新推荐文章于 2021-04-16 00:24:02 发布

阅读量376

点赞数

分类专栏： Oracle 基础知识文章标签： hierarchy date table access join loops

Oracle 基础知识专栏收录该内容

110 篇文章 0 订阅

订阅专栏

Overview of Dimensions

A dimension defines hierarchical (parent/child) relationships between pairs of
columns or column sets. Each value at the child level is associated with one and only
one value at the parent level. A hierarchical relationship is a functional dependency
from one level of a hierarchy to the next level in the hierarchy. A dimension is a
container of logical relationships between columns, and it does not have any data
storage assigned to it.

The CREATE DIMENSION statement specifies:

■ Multiple LEVEL clauses, each of which identifies a column or column set in the
dimension

■ One or more HIERARCHY clauses that specify the parent/child relationships
between adjacent levels

■ Optional ATTRIBUTE clauses, each of which identifies an additional column or
column set associated with an individual level

The columns in a dimension can come either from the same table (denormalized) or
from multiple tables (fully or partially normalized). To define a dimension over
columns from multiple tables, connect the tables using the JOIN clause of the
HIERARCHY clause.

For example, a normalized time dimension can include a date table, a month table, and
a year table, with join conditions that connect each date row to a month row, and each
month row to a year row. In a fully denormalized time dimension, the date, month,
and year columns are all in the same table. Whether normalized or denormalized, the
hierarchical relationships among the columns need to be specified in the CREATE
DIMENSION statement.

在数据仓库环境中，我们通常利用物化视图强大的查询重写功能来提升统计查询的性能，但是物化视图的查询重写功能有时候无法智能地判断查询中一些相关联的条件，以至于影响性能。比如我们有一张销售表sales，用于存储订单的详细信息，包含交易日期、顾客编号和销售量。我们创建一张物化视图，按月存储累计销量信息，假如这时候我们要查询按季度或者按年度统计销量信息，Oracle是否能够智能地转换查询重写呢？我们知道交易日期中的日期意味着月，月意味着所处的季度，季度意味着年度，但是Oracle却是无法智能地判断这其中的关系，因此无法利用物化视图查询重写来返回我们季度或年度的销量信息，而是直接查询基表，导致性能产生问题。

这时候Dimension就派上用场了。Dimension用于说明列之间的父子对应关系，以使优化器能够自动转换不同列的关系，利用物化视图的查询功能来提升查询统计性能。下面我们首先创建一张销售交易表sales，包含交易日期、顾客编号和销售量这几个列，用于保存销售订单信息，整个表有42万多条记录；创建另一张表time_hierarchy用于存储交易日期中时间的关系，包含交易日期及其对应的月、季度及年度等信息，然后我们将体验Dimension的强大功能。

Roby@XUE> create table sales

2(trans_date date, cust_id int, sales_amount number );

Table created.

Roby@XUE> insert /*+ APPEND */ into sales

2select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,

3mod(rownum,100) CUST_ID,

4abs(dbms_random.random)/100 SALES_AMOUNT

5from all_objects

5926 rows created.

Roby@XUE> commit;

Commit complete.

Roby@XUE> begin

2for i in 1 .. 6

3loop

4insert /*+ APPEND */ into sales

5select trans_date, cust_id, abs(dbms_random.random)/100 SALES_AMOUNT

6from sales;

7commit;

8end loop;

9end;

10/

PL/SQL procedure successfully completed.

Roby@XUE> select count(*) from sales;

COUNT(*)

----------

426672

创建索引组织表time_hierarchy，里面生成了交易日期中日期DAY、月MMYYYY、季度QTY_YYYY、年度YYYY的关系。

Roby@XUE> create table time_hierarchy

2(day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)

3organization index

4as

5select distinct

6trans_date DAY,

7cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,

8to_char(trans_date,'mon-yyyy') MON_YYYY,

9'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'

10|| to_char(trans_date,'yyyy') QTR_YYYY,

11cast( to_char( trans_date, 'yyyy' ) as number ) YYYY

12from sales

13/

Table created.

接下我们创建一张物化视图mv_sales，用于存储每个客户对应每个月的销量统计信息。

Roby@XUE> create materialized view mv_sales

2build immediate

3refresh on demand

4enable query rewrite

5as

6select sales.cust_id, sum(sales.sales_amount) sales_amount,

7time_hierarchy.mmyyyy

8from sales, time_hierarchy

9where sales.trans_date = time_hierarchy.day

10group by sales.cust_id, time_hierarchy.mmyyyy

11/

Materialized view created.

我们对基表进行分析，以使优化器能够物化视图的查询重写功能：

Roby@XUE> analyze table sales compute statistics;

Table analyzed.

Roby@XUE> analyze table time_hierarchy compute statistics;

Table analyzed.

设置会话的查询重写功能：

Roby@XUE> alter session set query_rewrite_enabled=true;

Session altered.

Roby@XUE> alter session set query_rewrite_integrity=trusted;

Session altered.

接下来我们按月统计总的销量：

Roby@XUE> select time_hierarchy.mmyyyy, sum(sales_amount)

2from sales, time_hierarchy

3where sales.trans_date = time_hierarchy.day

4group by time_hierarchy.mmyyyy

MMYYYY SUM(SALES_AMOUNT)

---------- -----------------

120064.0574E+11

120071.2297E+10

220063.6875E+11

320063.9507E+11

420063.7621E+11

520063.8549E+11

620063.6641E+11

720063.8110E+11

820063.8502E+11

920063.7278E+11

1020063.7983E+11

1120063.7210E+11

1220063.8364E+11

13 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)

1 0SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)

2 1TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327 Bytes=8502)

Statistics

----------------------------------------------------------

17recursive calls

0db block gets

25consistent gets

4physical reads

我们可以看到查询使用了查询重写的功能，直接查询物化视图中的查询方案，而不是查询其表，逻辑IO只有25个，性能相当良好。

假如这时候我们要按季度来查询统计销量信息，结果又会是怎样呢？

Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)

2from sales, time_hierarchy

3where sales.trans_date = time_hierarchy.day

4group by time_hierarchy.qtr_yyyy

QTR_YYYYSUM(SALES_AMOUNT)

------------------------------------------------ -----------------

Q1 FY20061.1696E+12

Q1 FY20071.2297E+10

Q2 FY20061.1281E+12

Q3 FY20061.1389E+12

Q4 FY20061.1356E+12

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=5 Bytes=145)

10 SORT (GROUP BY) (Cost=1681 Card=5 Bytes=145)

21 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)

32 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672

42 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

Statistics

----------------------------------------------------------

14recursive calls

0db block gets

428048consistent gets

599physical reads

可以看到查询将直接查询基表产生了将近428048个逻辑IO，性能无法满足需求。

接下我们创建一个Dimension表time_hierarchy_dim，用于提醒优化器time_hierarchy表中的DAY列暗示着MMYYYY，MMYYYY又意味着QTY_YYYY，QTY_YYYY又意味着YYYY。然后我们将重新运行上面那个查询，看执行计划发生了怎样的变更。

Roby@XUE> create dimension time_hierarchy_dim

2level day is time_hierarchy.day

3level mmyyyy is time_hierarchy.mmyyyy

4level qtr_yyyy is time_hierarchy.qtr_yyyy

5level yyyy is time_hierarchy.yyyy

6hierarchy time_rollup

8day child of

9mmyyyy child of

10qtr_yyyy child of

11yyyy

12)

13attribute mmyyyy

14determines mon_yyyy;

Dimension created.

Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)

2from sales, time_hierarchy

3where sales.trans_date = time_hierarchy.day

4group by time_hierarchy.qtr_yyyy

QTR_YYYYSUM(SALES_AMOUNT)

------------------------------------------------ -----------------

Q1 FY20061.1696E+12

Q1 FY20071.2297E+10

Q2 FY20061.1281E+12

Q3 FY20061.1389E+12

Q4 FY20061.1356E+12

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=5 Bytes=195)

1 0SORT (GROUP BY) (Cost=14 Card=5 Bytes=195)

2 1HASH JOIN (Cost=7 Card=1157 Bytes=45123)

3 2VIEW (Cost=4 Card=46 Bytes=598)

4 3SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)

5 4INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

6 2TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327

Statistics

----------------------------------------------------------

193recursive calls

0db block gets

49consistent gets

2physical reads

可以看到创建Dimension后，Oracle已经能够智能地理解交易日期中月度和季度的转换关系，查询使用到物化视图，逻辑IO由原来的428048个减少到49个，性能有了大幅的提升。

同样我们再来统计一下年度的销量信息：

Roby@XUE> select time_hierarchy.yyyy, sum(sales_amount)

2from sales, time_hierarchy

3where sales.trans_date = time_hierarchy.day

4group by time_hierarchy.yyyy

YYYY SUM(SALES_AMOUNT)

---------- -----------------

20064.5721E+12

20071.2297E+10

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=2 Bytes=66)

10 SORT (GROUP BY) (Cost=10 Card=2 Bytes=66)

21 HASH JOIN (Cost=7 Card=478 Bytes=15774)

--End--

我们再创建一张customer_hierarchy表，用于存储客户代码、邮政编码和地区的关系，然后我们将按不同邮编或地区来查询各自的月度、季度或者年度销量信息。

Roby@XUE> create table customer_hierarchy

2( cust_id primary key, zip_code, region )

3organization index

4as

5select cust_id,

6mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,

7mod( rownum, 6 ) region

8from ( select distinct cust_id from sales)

Table created.

Roby@XUE> analyze table customer_hierarchy compute statistics;

Table analyzed.

改写物化视图，查询方案中添加按不同邮编的月度统计销量。

Roby@XUE> drop materialized view mv_sales;

Materialized view dropped.

Roby@XUE> create materialized view mv_sales

2build immediate

3refresh on demand

4enable query rewrite

5as

6select customer_hierarchy.zip_code,

7time_hierarchy.mmyyyy,

8sum(sales.sales_amount) sales_amount

9from sales, time_hierarchy, customer_hierarchy

10where sales.trans_date = time_hierarchy.day

11and sales.cust_id = customer_hierarchy.cust_id

12group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy

13/

Materialized view created.

Roby@XUE> set autotrace traceonly

Roby@XUE> select customer_hierarchy.zip_code,

2time_hierarchy.mmyyyy,

3sum(sales.sales_amount) sales_amount

4from sales, time_hierarchy, customer_hierarchy

5where sales.trans_date = time_hierarchy.day

6and sales.cust_id = customer_hierarchy.cust_id

7group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy

1216 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=409 Bytes=20450)

1 0TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 Bytes=20450)

Statistics

----------------------------------------------------------

28recursive calls

0db block gets

116consistent gets

5physical reads

可以看到如果按不同邮编、不同月度来统计查询的话，优化器将会查询物化视图中的查询方案，性能也是比较可观的。假如我们查不同地区年度的统计销量信息，结果又会是怎样？

Roby@XUE> select customer_hierarchy.region,

2time_hierarchy.yyyy,

3sum(sales.sales_amount) sales_amount

4from sales, time_hierarchy, customer_hierarchy

5where sales.trans_date = time_hierarchy.day

6and sales.cust_id = customer_hierarchy.cust_id

7group by customer_hierarchy.region, time_hierarchy.yyyy

9 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=9 Bytes=261)

1 0SORT (GROUP BY) (Cost=1681 Card=9 Bytes=261)

2 1NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)

3 2NESTED LOOPS (Cost=35 Card=426672 Bytes=8106768)

4 3TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672

5 3INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

6 2INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)

Statistics

----------------------------------------------------------

0recursive calls

0db block gets

428047consistent gets

745physical reads

可以看到查询性能大有影响。接下我们同样创建dimension sales_dimension，用于说明客户代码和邮编、地区间的关系：

Roby@XUE> drop dimension time_hierarchy_dim

Dimension dropped.

Roby@XUE> create dimension sales_dimension

2level cust_id is customer_hierarchy.cust_id

3level zip_code is customer_hierarchy.zip_code

4level region is customer_hierarchy.region

5level day is time_hierarchy.day

6level mmyyyy is time_hierarchy.mmyyyy

7level qtr_yyyy is time_hierarchy.qtr_yyyy

8level yyyy is time_hierarchy.yyyy

9hierarchy cust_rollup

10(

11cust_id child of

12zip_code child of

13region

14)

15hierarchy time_rollup

16(

17day child of

18mmyyyy child of

19qtr_yyyy child of

20yyyy

21)

22attribute mmyyyy

23determines mon_yyyy;

Dimension created.

再回到原来的查询，我们可以看到查询性能有了大幅的提升：

Roby@XUE> set autotrace on

Roby@XUE> select customer_hierarchy.region,

2time_hierarchy.yyyy,

3sum(sales.sales_amount) sales_amount

4from sales, time_hierarchy, customer_hierarchy

5where sales.trans_date = time_hierarchy.day

6and sales.cust_id = customer_hierarchy.cust_id

7group by customer_hierarchy.region, time_hierarchy.yyyy

REGIONYYYY SALES_AMOUNT

---------- ---------- ------------

02006 7.3144E+11

02007 4484956329

12006 7.8448E+11

22006 7.7257E+11

22007 4684418980

32006 7.7088E+11

42006 7.8004E+11

42007 3127953246

52006 7.3273E+11

9 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=15 Card=9 Bytes=576)

1 0SORT (GROUP BY) (Cost=15 Card=9 Bytes=576)

2 1HASH JOIN (Cost=10 Card=598 Bytes=38272)

3 2VIEW (Cost=3 Card=100 Bytes=700)

4 3SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

5 4INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

6 2HASH JOIN (Cost=7 Card=598 Bytes=34086)

7 6VIEW (Cost=4 Card=19 Bytes=133)

8 7SORT (UNIQUE) (Cost=4 Card=19 Bytes=133)

9 8INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828'

10 6TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409

Statistics

----------------------------------------------------------

364recursive calls

0db block gets

88consistent gets

0physical reads

Roby@XUE> set autot trace

Roby@XUE> select customer_hierarchy.region,

2time_hierarchy.qtr_yyyy,

3sum(sales.sales_amount) sales_amount

4from sales, time_hierarchy, customer_hierarchy

5where sales.trans_date = time_hierarchy.day

6and sales.cust_id = customer_hierarchy.cust_id

7group by customer_hierarchy.region, time_hierarchy.qtr_yyyy;

27 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=22 Bytes=154

10 SORT (GROUP BY) (Cost=23 Card=22 Bytes=1540)

21 HASH JOIN (Cost=11 Card=1447 Bytes=101290)

32 VIEW (Cost=3 Card=100 Bytes=700)

43 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

54 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE) (

62 HASH JOIN (Cost=7 Card=1447 Bytes=91161)

76 VIEW (Cost=4 Card=46 Bytes=598)

87 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)

98 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UN

106 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B

Statistics

----------------------------------------------------------

10recursive calls

0db block gets

19consistent gets

0physical reads

Roby@XUE> select customer_hierarchy.region,

2time_hierarchy.mon_yyyy,

3sum(sales.sales_amount) sales_amount

4from sales, time_hierarchy, customer_hierarchy

5where sales.trans_date = time_hierarchy.day

6and sales.cust_id = customer_hierarchy.cust_id

7group by customer_hierarchy.region, time_hierarchy.mon_yyyy;

75 rows selected.

Execution Plan

----------------------------------------------------------

0SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=56 Bytes=386

10 SORT (GROUP BY) (Cost=41 Card=56 Bytes=3864)

21 HASH JOIN (Cost=11 Card=3775 Bytes=260475)

32 VIEW (Cost=4 Card=120 Bytes=1440)

43 SORT (UNIQUE) (Cost=4 Card=120 Bytes=1440)

54 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQ

62 HASH JOIN (Cost=6 Card=409 Bytes=23313)

76 VIEW (Cost=3 Card=100 Bytes=700)

87 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

98 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)

106 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B

Statistics

----------------------------------------------------------

0recursive calls

0db block gets

14consistent gets

0physical reads

参考：Tomates Kyte 《Expert One-on-One Oracle》

--End--

From:http://hi.baidu.com/edeed/blog/item/014a7becfa6a15d32f2e21b0.html

cyxlxp8411

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Overview of Dimensions

Overview of Dimensions A dimension defines hierarchical (parent/child) relationships between pairs ofcolumns or column sets. Each value at the child level is associated with one and onlyone v
复制链接

扫一扫