Overview of Dimensions
A dimension defines hierarchical (parent/child) relationships between pairs of
columns or column sets. Each value at the child level is associated with one and only
one value at the parent level. A hierarchical relationship is a functional dependency
from one level of a hierarchy to the next level in the hierarchy. A dimension is a
container of logical relationships between columns, and it does not have any data
storage assigned to it.
The CREATE DIMENSION statement specifies:
■ Multiple LEVEL clauses, each of which identifies a column or column set in the
dimension
■ One or more HIERARCHY clauses that specify the parent/child relationships
between adjacent levels
■ Optional ATTRIBUTE clauses, each of which identifies an additional column or
column set associated with an individual level
The columns in a dimension can come either from the same table (denormalized) or
from multiple tables (fully or partially normalized). To define a dimension over
columns from multiple tables, connect the tables using the JOIN clause of the
HIERARCHY clause.
For example, a normalized time dimension can include a date table, a month table, and
a year table, with join conditions that connect each date row to a month row, and each
month row to a year row. In a fully denormalized time dimension, the date, month,
and year columns are all in the same table. Whether normalized or denormalized, the
hierarchical relationships among the columns need to be specified in the CREATE
DIMENSION statement.
在数据仓库环境中,我们通常利用物化视图强大的查询重写功能来提升统计查询的性能,但是物化视图的查询重写功能有时候无法智能地判断查询中一些相关联的条件,以至于影响性能。比如我们有一张销售表sales,用于存储订单的详细信息,包含交易日期、顾客编号和销售量。我们创建一张物化视图,按月存储累计销量信息,假如这时候我们要查询按季度或者按年度统计销量信息,Oracle是否能够智能地转换查询重写呢?我们知道交易日期中的日期意味着月,月意味着所处的季度,季度意味着年度,但是Oracle却是无法智能地判断这其中的关系,因此无法利用物化视图查询重写来返回我们季度或年度的销量信息,而是直接查询基表,导致性能产生问题。
这时候Dimension就派上用场了。Dimension用于说明列之间的父子对应关系,以使优化器能够自动转换不同列的关系,利用物化视图的查询功能来提升查询统计性能。下面我们首先创建一张销售交易表sales,包含交易日期、顾客编号和销售量这几个列,用于保存销售订单信息,整个表有42万多条记录;创建另一张表time_hierarchy用于存储交易日期中时间的关系,包含交易日期及其对应的月、季度及年度等信息,然后我们将体验Dimension的强大功能。
Roby@XUE> create table sales
2(trans_date date, cust_id int, sales_amount number );
Table created.
Roby@XUE> insert /*+ APPEND */ into sales
2select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
3mod(rownum,100) CUST_ID,
4abs(dbms_random.random)/100 SALES_AMOUNT
5from all_objects
6/
5926 rows created.
Roby@XUE> commit;
Commit complete.
Roby@XUE> begin
2for i in 1 .. 6
3loop
4insert /*+ APPEND */ into sales
5select trans_date, cust_id, abs(dbms_random.random)/100 SALES_AMOUNT
6from sales;
7commit;
8end loop;
9end;
10/
PL/SQL procedure successfully completed.
Roby@XUE> select count(*) from sales;
COUNT(*)
----------
426672
创建索引组织表time_hierarchy,里面生成了交易日期中日期DAY、月MMYYYY、季度QTY_YYYY、年度YYYY的关系。
Roby@XUE> create table time_hierarchy
2(day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
3organization index
4as
5select distinct
6trans_date DAY,
7cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
8to_char(trans_date,'mon-yyyy') MON_YYYY,
9'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
10|| to_char(trans_date,'yyyy') QTR_YYYY,
11cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
12from sales
13/
Table created.
接下我们创建一张物化视图mv_sales,用于存储每个客户对应每个月的销量统计信息。
Roby@XUE> create materialized view mv_sales
2build immediate
3refresh on demand
4enable query rewrite
5as
6select sales.cust_id, sum(sales.sales_amount) sales_amount,
7time_hierarchy.mmyyyy
8from sales, time_hierarchy
9where sales.trans_date = time_hierarchy.day
10group by sales.cust_id, time_hierarchy.mmyyyy
11/
Materialized view created.
我们对基表进行分析,以使优化器能够物化视图的查询重写功能:
Roby@XUE> analyze table sales compute statistics;
Table analyzed.
Roby@XUE> analyze table time_hierarchy compute statistics;
Table analyzed.
设置会话的查询重写功能:
Roby@XUE> alter session set query_rewrite_enabled=true;
Session altered.
Roby@XUE> alter session set query_rewrite_integrity=trusted;
Session altered.
接下来我们按月统计总的销量:
Roby@XUE> select time_hierarchy.mmyyyy, sum(sales_amount)
2from sales, time_hierarchy
3where sales.trans_date = time_hierarchy.day
4group by time_hierarchy.mmyyyy
5/
MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
120064.0574E+11
120071.2297E+10
220063.6875E+11
320063.9507E+11
420063.7621E+11
520063.8549E+11
620063.6641E+11
720063.8110E+11
820063.8502E+11
920063.7278E+11
1020063.7983E+11
1120063.7210E+11
1220063.8364E+11
13 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)
1 0SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)
2 1TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327 Bytes=8502)
Statistics
----------------------------------------------------------
17recursive calls
0db block gets
25consistent gets
4physical reads
我们可以看到查询使用了查询重写的功能,直接查询物化视图中的查询方案,而不是查询其表,逻辑IO只有25个,性能相当良好。
假如这时候我们要按季度来查询统计销量信息,结果又会是怎样呢?
Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2from sales, time_hierarchy
3where sales.trans_date = time_hierarchy.day
4group by time_hierarchy.qtr_yyyy
5/
QTR_YYYYSUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY20061.1696E+12
Q1 FY20071.2297E+10
Q2 FY20061.1281E+12
Q3 FY20061.1389E+12
Q4 FY20061.1356E+12
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=5 Bytes=145)
10 SORT (GROUP BY) (Cost=1681 Card=5 Bytes=145)
21 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)
32 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672
42 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
Statistics
----------------------------------------------------------
14recursive calls
0db block gets
428048consistent gets
599physical reads
可以看到查询将直接查询基表产生了将近428048个逻辑IO,性能无法满足需求。
接下我们创建一个Dimension表time_hierarchy_dim,用于提醒优化器time_hierarchy表中的DAY列暗示着MMYYYY,MMYYYY又意味着QTY_YYYY,QTY_YYYY又意味着YYYY。然后我们将重新运行上面那个查询,看执行计划发生了怎样的变更。
Roby@XUE> create dimension time_hierarchy_dim
2level day is time_hierarchy.day
3level mmyyyy is time_hierarchy.mmyyyy
4level qtr_yyyy is time_hierarchy.qtr_yyyy
5level yyyy is time_hierarchy.yyyy
6hierarchy time_rollup
7(
8day child of
9mmyyyy child of
10qtr_yyyy child of
11yyyy
12)
13attribute mmyyyy
14determines mon_yyyy;
Dimension created.
Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2from sales, time_hierarchy
3where sales.trans_date = time_hierarchy.day
4group by time_hierarchy.qtr_yyyy
5/
QTR_YYYYSUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY20061.1696E+12
Q1 FY20071.2297E+10
Q2 FY20061.1281E+12
Q3 FY20061.1389E+12
Q4 FY20061.1356E+12
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=5 Bytes=195)
1 0SORT (GROUP BY) (Cost=14 Card=5 Bytes=195)
2 1HASH JOIN (Cost=7 Card=1157 Bytes=45123)
3 2VIEW (Cost=4 Card=46 Bytes=598)
4 3SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
5 4INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
6 2TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327
Statistics
----------------------------------------------------------
193recursive calls
0db block gets
49consistent gets
2physical reads
可以看到创建Dimension后,Oracle已经能够智能地理解交易日期中月度和季度的转换关系,查询使用到物化视图,逻辑IO由原来的428048个减少到49个,性能有了大幅的提升。
同样我们再来统计一下年度的销量信息:
Roby@XUE> select time_hierarchy.yyyy, sum(sales_amount)
2from sales, time_hierarchy
3where sales.trans_date = time_hierarchy.day
4group by time_hierarchy.yyyy
5/
YYYY SUM(SALES_AMOUNT)
---------- -----------------
20064.5721E+12
20071.2297E+10
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=2 Bytes=66)
10 SORT (GROUP BY) (Cost=10 Card=2 Bytes=66)
21 HASH JOIN (Cost=7 Card=478 Bytes=15774)
--End--
我们再创建一张customer_hierarchy表,用于存储客户代码、邮政编码和地区的关系,然后我们将按不同邮编或地区来查询各自的月度、季度或者年度销量信息。
Roby@XUE> create table customer_hierarchy
2( cust_id primary key, zip_code, region )
3organization index
4as
5select cust_id,
6mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
7mod( rownum, 6 ) region
8from ( select distinct cust_id from sales)
9/
Table created.
Roby@XUE> analyze table customer_hierarchy compute statistics;
Table analyzed.
改写物化视图,查询方案中添加按不同邮编的月度统计销量。
Roby@XUE> drop materialized view mv_sales;
Materialized view dropped.
Roby@XUE> create materialized view mv_sales
2build immediate
3refresh on demand
4enable query rewrite
5as
6select customer_hierarchy.zip_code,
7time_hierarchy.mmyyyy,
8sum(sales.sales_amount) sales_amount
9from sales, time_hierarchy, customer_hierarchy
10where sales.trans_date = time_hierarchy.day
11and sales.cust_id = customer_hierarchy.cust_id
12group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
13/
Materialized view created.
Roby@XUE> set autotrace traceonly
Roby@XUE> select customer_hierarchy.zip_code,
2time_hierarchy.mmyyyy,
3sum(sales.sales_amount) sales_amount
4from sales, time_hierarchy, customer_hierarchy
5where sales.trans_date = time_hierarchy.day
6and sales.cust_id = customer_hierarchy.cust_id
7group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
8/
1216 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=409 Bytes=20450)
1 0TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 Bytes=20450)
Statistics
----------------------------------------------------------
28recursive calls
0db block gets
116consistent gets
5physical reads
可以看到如果按不同邮编、不同月度来统计查询的话,优化器将会查询物化视图中的查询方案,性能也是比较可观的。假如我们查不同地区年度的统计销量信息,结果又会是怎样?
Roby@XUE> select customer_hierarchy.region,
2time_hierarchy.yyyy,
3sum(sales.sales_amount) sales_amount
4from sales, time_hierarchy, customer_hierarchy
5where sales.trans_date = time_hierarchy.day
6and sales.cust_id = customer_hierarchy.cust_id
7group by customer_hierarchy.region, time_hierarchy.yyyy
8/
9 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=9 Bytes=261)
1 0SORT (GROUP BY) (Cost=1681 Card=9 Bytes=261)
2 1NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)
3 2NESTED LOOPS (Cost=35 Card=426672 Bytes=8106768)
4 3TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672
5 3INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
6 2INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
Statistics
----------------------------------------------------------
0recursive calls
0db block gets
428047consistent gets
745physical reads
可以看到查询性能大有影响。接下我们同样创建dimension sales_dimension,用于说明客户代码和邮编、地区间的关系:
Roby@XUE> drop dimension time_hierarchy_dim
2/
Dimension dropped.
Roby@XUE> create dimension sales_dimension
2level cust_id is customer_hierarchy.cust_id
3level zip_code is customer_hierarchy.zip_code
4level region is customer_hierarchy.region
5level day is time_hierarchy.day
6level mmyyyy is time_hierarchy.mmyyyy
7level qtr_yyyy is time_hierarchy.qtr_yyyy
8level yyyy is time_hierarchy.yyyy
9hierarchy cust_rollup
10(
11cust_id child of
12zip_code child of
13region
14)
15hierarchy time_rollup
16(
17day child of
18mmyyyy child of
19qtr_yyyy child of
20yyyy
21)
22attribute mmyyyy
23determines mon_yyyy;
Dimension created.
再回到原来的查询,我们可以看到查询性能有了大幅的提升:
Roby@XUE> set autotrace on
Roby@XUE> select customer_hierarchy.region,
2time_hierarchy.yyyy,
3sum(sales.sales_amount) sales_amount
4from sales, time_hierarchy, customer_hierarchy
5where sales.trans_date = time_hierarchy.day
6and sales.cust_id = customer_hierarchy.cust_id
7group by customer_hierarchy.region, time_hierarchy.yyyy
8/
REGIONYYYY SALES_AMOUNT
---------- ---------- ------------
02006 7.3144E+11
02007 4484956329
12006 7.8448E+11
22006 7.7257E+11
22007 4684418980
32006 7.7088E+11
42006 7.8004E+11
42007 3127953246
52006 7.3273E+11
9 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=15 Card=9 Bytes=576)
1 0SORT (GROUP BY) (Cost=15 Card=9 Bytes=576)
2 1HASH JOIN (Cost=10 Card=598 Bytes=38272)
3 2VIEW (Cost=3 Card=100 Bytes=700)
4 3SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
5 4INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
6 2HASH JOIN (Cost=7 Card=598 Bytes=34086)
7 6VIEW (Cost=4 Card=19 Bytes=133)
8 7SORT (UNIQUE) (Cost=4 Card=19 Bytes=133)
9 8INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828'
10 6TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409
Statistics
----------------------------------------------------------
364recursive calls
0db block gets
88consistent gets
0physical reads
Roby@XUE> set autot trace
Roby@XUE> select customer_hierarchy.region,
2time_hierarchy.qtr_yyyy,
3sum(sales.sales_amount) sales_amount
4from sales, time_hierarchy, customer_hierarchy
5where sales.trans_date = time_hierarchy.day
6and sales.cust_id = customer_hierarchy.cust_id
7group by customer_hierarchy.region, time_hierarchy.qtr_yyyy;
27 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=22 Bytes=154
10 SORT (GROUP BY) (Cost=23 Card=22 Bytes=1540)
21 HASH JOIN (Cost=11 Card=1447 Bytes=101290)
32 VIEW (Cost=3 Card=100 Bytes=700)
43 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
54 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE) (
62 HASH JOIN (Cost=7 Card=1447 Bytes=91161)
76 VIEW (Cost=4 Card=46 Bytes=598)
87 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
98 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UN
106 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B
Statistics
----------------------------------------------------------
10recursive calls
0db block gets
19consistent gets
0physical reads
Roby@XUE> select customer_hierarchy.region,
2time_hierarchy.mon_yyyy,
3sum(sales.sales_amount) sales_amount
4from sales, time_hierarchy, customer_hierarchy
5where sales.trans_date = time_hierarchy.day
6and sales.cust_id = customer_hierarchy.cust_id
7group by customer_hierarchy.region, time_hierarchy.mon_yyyy;
75 rows selected.
Execution Plan
----------------------------------------------------------
0SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=56 Bytes=386
10 SORT (GROUP BY) (Cost=41 Card=56 Bytes=3864)
21 HASH JOIN (Cost=11 Card=3775 Bytes=260475)
32 VIEW (Cost=4 Card=120 Bytes=1440)
43 SORT (UNIQUE) (Cost=4 Card=120 Bytes=1440)
54 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQ
62 HASH JOIN (Cost=6 Card=409 Bytes=23313)
76 VIEW (Cost=3 Card=100 Bytes=700)
87 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
98 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
106 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B
Statistics
----------------------------------------------------------
0recursive calls
0db block gets
14consistent gets
0physical reads
参考:Tomates Kyte 《Expert One-on-One Oracle》
--End--
From:http://hi.baidu.com/edeed/blog/item/014a7becfa6a15d32f2e21b0.html