一次AUTO_SAMPLE_SIZE异常


      今天在为客户分析几个SQL的时候需要删除statistics后再重新收集,过程如下:

SQL> begin
2  dbms_stats.gather_schema_stats(
3        wnname => '&user',
4        cascade => true,
5      estimate_percent => dbms_stats.auto_sample_size);
6 end;
7 .
8 define user=s500
/


old   3:                wnname => '&user',
new   3:                wnname => 's500',

PL/SQL procedure successfully completed.


然后查到如下令人不解的sampl_size信息:

SQL> select table_name, sample_size,NUM_ROWS from dba_tables where wner='S500';

TABLE_NAME SAMPLE_SIZE NUM_ROWS
---------- ----------- ----------
DICTSTATTA 5354 5354
DICTSTATTA 5354 5354
DISTRICT 5000 5000
CUSTOMER 15000000 15000000
HISTORY 15000000 15000000
NEW_ORDERS 4500000 4500000
ORDERS 15000000 15000000
ORDER_LINE
ITEM 100000 100000
STOCK
WAREHOUSE

goole了一番,终于发现了一篇老外写的文章,说的是11g对于dbms_stats的加强,原来11g的auto_sample_size为100%是正常的。

Oracle 11g: Enhancements to DBMS_STATS

September 17, 2007
By Greg Rahn

Many of you are aware of the Oracle 11g Database New Features and while some may be generally interested in new features, one area that I focus on is new features that yield gains in performance. Some of these features can be found in the General Server Performance section of the Oracle 11g Database New Features documentation. There is one area (for now…) that didn’t make this list but I feel is worth mentioning – performance enhancements made to DBMS_STATS.

The Necessity of Representative Statistics
Representative statistics are the foundation that the Optimizer relies on to make the best decisions when choosing execution plans. One recent blog post from Don Seiler, with the help of Wolfgang Breitling, is a prefect real-world case. This blog post dealt with out-of-range values, but one other case that often causes headaches is data skew. In the Real-World Performance Roundtable, Part II session at OracleWorld 2006, I discussed a basic stats gathering strategy that dealt with the exception case of data skew. When using the DBMS_STATS default of DBMS_STATS.AUTO_SAMPLE_SIZE in 10g and 9i, the NDV (Number of Distinct Values) may be statistically inaccurate when there is significant data skew. In order to deal with this exception, a fixed percentage of data that yields statistically representative NDV counts should be chosen.

11g DBMS_STATS
In 11g there have been some enhancements made to the DBMS_STATS package. Overall the GATHER_* processes run faster but what stands out to me is the speed and accuracy that DBMS_STATS.AUTO_SAMPLE_SIZE now gives. As a performance person, I often times make reference to letting the numbers tell the story, so lets dive into a comparison between 10.2.0.3 and 11.1.0.5.

I’ve chosen the same data set that I used in the “Refining the Stats” section of Real-World Performance Roundtable, Part II session. Stats were serially gathered with ESTIMATE_PERCENT of 10%, 100%, and DBMS_STATS.AUTO_SAMPLE_SIZE.

10.2.0.3
run#
AUTO_SAMPLE_SIZE
10%
100%
1
00:07:53.97
00:04:18.87
00:09:22.15
2
00:09:06.09
00:04:18.95
00:09:13.28
3
00:07:46.23
00:03:52.50
00:09:18.11
4
00:07:55.43
00:04:02.94
00:09:20.54
5
00:09:43.30
00:03:49.96
00:09:16.38
11.1.0.5
run#
AUTO_SAMPLE_SIZE
10%
100%
1
00:02:39.31
00:02:38.55
00:07:37.83
2
00:02:21.86
00:02:31.56
00:08:24.10
3
00:02:38.11
00:02:49.49
00:07:38.25
4
00:02:26.60
00:02:27.75
00:07:42.25
5
00:02:29.95
00:02:29.45
00:07:42.49


11g DBMS_STATS Observations
As you can see by the numbers, 11g pulls a win in each of the three GATHER_TABLE_STATS calls. Take note of the AUTO_SAMPLE_SIZE timings. The 11g AUTO_SAMPLE_SIZE gather takes the same time as the 11g 10% sample. Not bad!

NDV Accuracy
We’ve seen that the 11g gather stats is overall faster and that the 11g AUTO_SAMPLE_SIZE shows a significant improvement in speed compared to 10.2.0.3 AUTO_SAMPLE_SIZE for this table, but how do the NDV calculations compare? Again, let’s look at the numbers. I’ve queried USER_TAB_COL_STATISTICS to get the NDV and SAMPLE_SIZE for our skewed data set.

10.2.0.3

ESTIMATE_PERCENT => 10
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 31464 0 2148910
C2 608544 0 2148910
C3 359424 0 2148910

ESTIMATE_PERCENT => 100%
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 60351 0 21456269
C2 1289760 0 21456269
C3 777942 0 21456269

ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 1787 0 5823
C2 367075 0 576909
C3 52464 0 57431

11.1.0.5

ESTIMATE_PERCENT => 10
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 31320 0 2147593
C2 608814 0 2147593
C3 359365 0 2147593

ESTIMATE_PERCENT => 100
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 60351 0 21456269
C2 1289760 0 21456269
C3 777942 0 21456269

ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE
COLUMN_NAME NUM_DISTINCT NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1 59852 0 21456269
C2 1270912 0 21456269
C3 768384 0 21456269

As expected, the 100% samples are identical and the 10% samples are statistically equivalent. One interesting data point is that the SAMPLE_SIZE for the 11g AUTO_SAMPLE_SIZE run shows the exact SAMPLE_SIZE as the 100% gather – the total number of rows in the table. Also note that the NDV counts for the 11g AUTO_SAMPLE_SIZE gather are statistically equivalent to the 100% sample. What does this mean? It means that the 11g AUTO_SAMPLE_SIZE had been enhanced to provide nearly 100% sample accuracy, even on skewed data sets.

Summary
Overall the 11g DBMS_STATS has been enhanced to gather stats in less time, but in my opinion the significant enhancement is to AUTO_SAMPLE_SIZE which yields near 100% sample accuracy in 10% sample time.

As the documentation says:

…Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy.

I couldn’t agree with the documentation more.

If you wish to know more about how the new DBMS_STATS.AUTO_SAMPLE_SIZE works, see section 3 of Efficient and scalable statistics gathering for large databases in Oracle 11g.



来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/7734298/viewspace-680629/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/7734298/viewspace-680629/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值