一次AUTO_SAMPLE_SIZE异常

最新推荐文章于 2021-04-13 16:22:52 发布

cijieguang1797

最新推荐文章于 2021-04-13 16:22:52 发布

阅读量224

点赞数

文章标签：数据库

      今天在为客户分析几个SQL的时候需要删除statistics后再重新收集，过程如下：

SQL> begin
2 dbms_stats.gather_schema_stats(
3        wnname => '&user',
4        cascade => true,
5      estimate_percent => dbms_stats.auto_sample_size);
6 end;
7 .
8 define user=s500
/

old   3:                wnname => '&user',
new   3:                wnname => 's500',

PL/SQL procedure successfully completed.

然后查到如下令人不解的sampl_size信息：

SQL> select table_name, sample_size,NUM_ROWS from dba_tables where wner='S500';

TABLE_NAME SAMPLE_SIZE NUM_ROWS
---------- ----------- ----------
DICTSTATTA 5354 5354
DICTSTATTA 5354 5354
DISTRICT 5000 5000
CUSTOMER 15000000 15000000
HISTORY 15000000 15000000
NEW_ORDERS 4500000 4500000
ORDERS 15000000 15000000
ORDER_LINE
ITEM 100000 100000
STOCK
WAREHOUSE

goole了一番，终于发现了一篇老外写的文章，说的是11g对于dbms_stats的加强，原来11g的auto_sample_size为100%是正常的。

Oracle 11g: Enhancements to DBMS_STATS

September 17, 2007

By Greg Rahn

Many of you are aware of the Oracle 11g Database New Features and while some may be generally interested in new features, one area that I focus on is new features that yield gains in performance. Some of these features can be found in the General Server Performance section of the Oracle 11g Database New Features documentation. There is one area (for now…) that didn’t make this list but I feel is worth mentioning – performance enhancements made to DBMS_STATS.

The Necessity of Representative Statistics
Representative statistics are the foundation that the Optimizer relies on to make the best decisions when choosing execution plans. One recent blog post from Don Seiler, with the help of Wolfgang Breitling, is a prefect real-world case. This blog post dealt with out-of-range values, but one other case that often causes headaches is data skew. In the Real-World Performance Roundtable, Part II session at OracleWorld 2006, I discussed a basic stats gathering strategy that dealt with the exception case of data skew. When using the DBMS_STATS default of DBMS_STATS.AUTO_SAMPLE_SIZE in 10g and 9i, the NDV (Number of Distinct Values) may be statistically inaccurate when there is significant data skew. In order to deal with this exception, a fixed percentage of data that yields statistically representative NDV counts should be chosen.

11g DBMS_STATS
In 11g there have been some enhancements made to the DBMS_STATS package. Overall the GATHER_* processes run faster but what stands out to me is the speed and accuracy that DBMS_STATS.AUTO_SAMPLE_SIZE now gives. As a performance person, I often times make reference to letting the numbers tell the story, so lets dive into a comparison between 10.2.0.3 and 11.1.0.5.

I’ve chosen the same data set that I used in the “Refining the Stats” section of Real-World Performance Roundtable, Part II session. Stats were serially gathered with ESTIMATE_PERCENT of 10%, 100%, and DBMS_STATS.AUTO_SAMPLE_SIZE.

10.2.0.3
run#	AUTO_SAMPLE_SIZE	10%	100%
1	00:07:53.97	00:04:18.87	00:09:22.15
2	00:09:06.09	00:04:18.95	00:09:13.28
3	00:07:46.23	00:03:52.50	00:09:18.11
4	00:07:55.43	00:04:02.94	00:09:20.54
5	00:09:43.30	00:03:49.96	00:09:16.38

11.1.0.5
run#	AUTO_SAMPLE_SIZE	10%	100%
1	00:02:39.31	00:02:38.55	00:07:37.83
2	00:02:21.86	00:02:31.56	00:08:24.10
3	00:02:38.11	00:02:49.49	00:07:38.25
4	00:02:26.60	00:02:27.75	00:07:42.25
5	00:02:29.95	00:02:29.45	00:07:42.49

11g DBMS_STATS Observations
As you can see by the numbers, 11g pulls a win in each of the three GATHER_TABLE_STATS calls. Take note of the AUTO_SAMPLE_SIZE timings. The 11g AUTO_SAMPLE_SIZE gather takes the same time as the 11g 10% sample. Not bad!

NDV Accuracy
We’ve seen that the 11g gather stats is overall faster and that the 11g AUTO_SAMPLE_SIZE shows a significant improvement in speed compared to 10.2.0.3 AUTO_SAMPLE_SIZE for this table, but how do the NDV calculations compare? Again, let’s look at the numbers. I’ve queried USER_TAB_COL_STATISTICS to get the NDV and SAMPLE_SIZE for our skewed data set.

10.2.0.3

ESTIMATE_PERCENT => 10
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                     31464          0     2148910
C2                    608544          0     2148910
C3                    359424          0     2148910

ESTIMATE_PERCENT => 100%
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                     60351          0    21456269
C2                   1289760          0    21456269
C3                    777942          0    21456269

ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                      1787          0        5823
C2                    367075          0      576909
C3                     52464          0       57431

11.1.0.5

ESTIMATE_PERCENT => 10
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                     31320          0     2147593
C2                    608814          0     2147593
C3                    359365          0     2147593

ESTIMATE_PERCENT => 100
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                     60351          0    21456269
C2                   1289760          0    21456269
C3                    777942          0    21456269

ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE
COLUMN_NAME     NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
--------------- ------------ ---------- -----------
C1                     59852          0    21456269
C2                   1270912          0    21456269
C3                    768384          0    21456269

As expected, the 100% samples are identical and the 10% samples are statistically equivalent. One interesting data point is that the SAMPLE_SIZE for the 11g AUTO_SAMPLE_SIZE run shows the exact SAMPLE_SIZE as the 100% gather – the total number of rows in the table. Also note that the NDV counts for the 11g AUTO_SAMPLE_SIZE gather are statistically equivalent to the 100% sample. What does this mean? It means that the 11g AUTO_SAMPLE_SIZE had been enhanced to provide nearly 100% sample accuracy, even on skewed data sets.

Summary
Overall the 11g DBMS_STATS has been enhanced to gather stats in less time, but in my opinion the significant enhancement is to AUTO_SAMPLE_SIZE which yields near 100% sample accuracy in 10% sample time.

As the documentation says:

…Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy.

I couldn’t agree with the documentation more.

If you wish to know more about how the new DBMS_STATS.AUTO_SAMPLE_SIZE works, see section 3 of Efficient and scalable statistics gathering for large databases in Oracle 11g.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/7734298/viewspace-680629/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/7734298/viewspace-680629/