理解CBO

最新推荐文章于 2023-03-20 18:37:36 发布

cuanxiu7017

最新推荐文章于 2023-03-20 18:37:36 发布

阅读量162

点赞数

1.2.2 Understanding the Cost-Based Optimizer

The cost-based optimizer is a more sophisticated facility than the rule-based optimizer. To determine the best execution path for a statement, it uses database information such as table size, number of rows, key spread, and so forth, rather than rigid rules.

CBO 是比RBO 更智能的工具.

使用最优的执行路径由以下database 信息决定,如 table Size,行数量,键值的分布,因此比严格的RBO 要灵活.

The information required by the cost-based optimizer is available once a table has been analyzed via the ANALYZE command, or via the DBMS_STATS facility. If a table has not been analyzed, the cost-based optimizer can use only rule-based logic to select the best access path. It is possible to run a schema with a combination of cost-based and rule-based behavior by having some tables analyzed and others not analyzed.

CBO 需要table 至少一次被 Analyze,或DBMS_STATS 命令分析过.

如果没有分析过,则CBO 只能依照RBO 的逻辑选择最优的访问路径.

如果Schema 中一些table 被分析而另一些table没有分析,

执行是以CBO 和RBO 的合并结果来选择访问路径.

[ The ANALYZE command and the DBMS_STATS functions collect statistics about tables, clusters, and indexes, and store those statistics in the data dictionary.]

Analyze 命令和DBMS_STATS 函数收集 table,Clusters,indexes 的统计信息,并且将这些统计信息保存的数据字典中.

A SQL statement will default to the cost-based optimizer if any one of the tables involved in the statement has been analyzed. The cost-based optimizer then makes an educated guess as to the best access path for the other tables based on information in the data dictionary.

-- Optimizer_mode<>Rule

如果SQL 语句中包含的任一table 有被分析过则此SQL 会默认的选择CBO.CBO 将根据数据字典中信息猜测对其他table访问的访问路径.

The RDBMS kernel defaults to using the cost-based optimizer under a number of situations, including the following:

满足下列条件 RDBMS 将默认使用CBO.

• OPTIMIZER_MODE = CHOOSE has been specified in the INIT.ORA file, and statistics exist for at least one table involved in the statement

• An ALTER SESSION SET OPTIMIZER_MODE = CHOOSE command has been executed, and statistics exist for at least one table involved in the statement

• An ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS (or ALL_ROWS) command has been executed, and statistics exist for at least one table involved in the statement

• A statement uses the FIRST_ROWS or ALL_ROWS hint (e.g., SELECT /*+ FIRST_ROWS */. . .)

在Init.ora 参数文件中设定 optimizer_mode=choose,并且SQL 语句中至少有一个table统计信息存在
执行 alter session set optimizer_mode=choose 命令,并且SQL 语句中至少有一个table统计信息存在
执行 alter session set optimizer_mode=first_rows(或者 all_rows) 命令,并且SQL 语句中至少有一个table统计信息存在
SQL 语句使用 First_rows 或者 all_rows 提示(如: select /*+ first_rows*+/)

--optimizer_mode=rule

SELECT /*+ FIRST_ROWS +*/ emp_no,DEPT_NO,COST_CENTER
FROM emp
WHERE emp_no = 2
AND emp_name = 'PJWANG'
AND dept_no = '12'

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=2 Card=1 B
ytes=22)

1 0 INDEX (RANGE SCAN) OF 'IDX_3' (NON-UNIQUE) (Cost=2 Card=1
Bytes=22)

When you analyze, you can have Oracle look at all rows in a table (ANALYZE COMPUTE) or at a sampling of rows (ANALYZE ESTIMATE).

Typically, I use ANALYZE ESTIMATE for very large tables (1,000,000 rows or more), and ANALYZE COMPUTE for small to medium tables.

当你对Table 进行分析对Table 中所有行进行分析(Analyze compute) 或者取样分析(Analyze estimate)

比较有代表性的是,用Analyze estimate 对大table(大于1百万行),用analyze compute 对小,或中等table.

I strongly recommend that you analyze FOR ALL INDEXED COLUMNS for any table that can have severe data skewness. For example, if a large percentage of rows in a table has the same value in a given column, that represents skewness. The FOR ALL INDEXED COLUMNS option makes the cost-based optimizer aware of the skewness of a column's data in addition to the cardinality (number-distinct values) of that data.

我强烈推荐对比较歪斜的数据(很多column 值为 A,很多为B部分散) 用 for all indexed columns 对这些table 做analyze.For all indexed columns 选项能对歪斜的Columns 数据做协调.

When a table is analyzed using ANALYZE, all associated indexes are analyzed as well. If an index is subsequently dropped and recreated, it must be re-analyzed. Be aware that the procedures DBMS_STATS.GATHER_SCHEMA_STATS and GATHER_TABLE_STATS analyze only tables by default, not their indexes. When using those procedures, you must specify the CASCADE=>TRUE option for indexes to be analyzed as well.

当某一table 用analyze 进行分析后, 所有和此table有关系的Index 也会被分析.

如果某一index 在后来被Drop 和重建,则此index 必须重新分析.

我们知道 procedure DBMS_STATS.GATHER_SCHEMA_STATS 和Gather_TABLE_stats analyze

默认对table 进行分析,不包括indexes.

当使用前两个procedure,必须设定 cascade=>true 时对index 也会分析.

Following are some sample ANALYZE statements:

ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;

ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;

ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;

If you analyze a table by mistake, you can delete the statistics.

For example: ANALYZE TABLE EMP DELETE STATISTICS;

Analyzing can take an excessive amount of time if you use the COMPUTE option on large objects.

We find that on almost every occasion, ANALYZE ESTIMATE 5 PERCENT on a large table forces the optimizer make the same decision as ANALYZE COMPUTE.

下面是一些 Analyze 的语句:

ANALYZE TABLE EMP ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;

ANALYZE INDEX EMP_NDX1 ESTIMATE STATISTICS SAMPLE 5 PERCENT FOR ALL INDEXED COLUMNS;

ANALYZE TABLE EMP COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;

如果分析table 出现错误,可以将统计信息删除.

1.2.2.3 Inner workings of the cost-based optimizer

Unlike the rule-based optimizer, the cost-based optimizer does not have hard and fast path evaluation rules. The cost-based optimizer is flexible and can adapt to its environment. This adaptation is possible only once the necessary underlying object statistics have been refreshed (re-analyzed). What is constant is the method by which the cost-based optimizer calculates each possible execution plan and evaluates its cost (efficiency).

CBO 的内部工作

不想RBO ,CBO 不必评估最快或是最慢的执行路径.CBO 比较灵活并且能适应环境.这种适应可能是将Object 的统计信息重新计算一边.

1.2.2.3 Inner workings of the cost-based optimizer

Unlike the rule-based optimizer, the cost-based optimizer does not have hard and fast path evaluation rules. The cost-based optimizer is flexible and can adapt to its environment.

This adaptation is possible only once the necessary underlying object statistics have been refreshed (re-analyzed). What is constant is the method by which the cost-based optimizer calculates each possible execution plan and evaluates its cost (efficiency).

CBO 内部工作

不像 RBO,CBO 不需要评估执行的最快和最慢路径.CBO 是很灵活并且能适应环境.

这种适应可能是将此object 的统计信息重新刷新(分析).

不变的是 CBO 可能的执行计划和评估他的cost(效果)

The cost-based optimizer's functionality can be (loosely) broken into the following steps:

1. Parse the SQL (check syntax, object privileges, etc.).

2. Generate a list of all potential execution plans.

3. Calculate (estimate) the cost of each execution plan using all available object statistics.

4. Select the execution plan with the lowest cost.

CBO功能如果遇到下列步骤会被中断(释放) :

1: 分析SQL (检查语法,object privileges ,等等)

2: 产生所有可能的执行计划列表

3: 计算每一个执行计划所用到的对象的统计花费

4: 选择花费最少的执行计划.

The cost-based optimizer will be used only if at least one table within a SQL statement has statistics (table statistics for unanalyzed tables are estimated).

If no statistics are available for any table involved in the SQL, the RDBMS will resort to the rule-based optimizer, unless the cost-based optimizer is forced via statement-level HINTS or by an optimizer goal of ALL_ROWS or FIRST_ROWS.

如果SQL中至少有一个TABLE被统计则可以使用CBO .

如果没有一个被统计过,RDBMS 将重新选择RBO,除非SQL 语句级别强制使用 HITS 或优化器目标是all_rows或first_rows.

To understand how the cost-based optimizer works and, ultimately, how to exploit it, we need to understand how it thinks.

为了了解CBO 如何工作, 最终让我们如何使用他.我们需要了解CBO是如何思考的.

Primary key and/or UNIQUE index equality

A UNIQUE index's selectivity is recognized as 100%. No other indexed access method is more precise. For this reason, a unique index is always used when available.

主键和/或 Unique index 的等同性

Unique index的选择性被认定是100%.其他的index 都没有Unique index 精确.

基于这个原因,当unique index 是合法时,他能一直被用到.

Non-UNIQUE index equality

For non-UNIQUE indexes, index selectivity is calculated. The cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option of the ANALYZE. That option will make the cost-based optimizer aware of how the data in the indexed columns is skewed.

非-Unique index 等同性

非-unique index ,index 的选择是计算出来的.

Range evaluation

For index range execution plans, selectivity is evaluated. This evaluation is based on a column's most recent high-value and low-value statistics. Again, the cost-based optimizer makes the assumption that the table (and subsequent indexes) have uniform data spread unless you use the FOR ALL INDEXED COLUMNS option when analyzing the table.

范围评估

对于Index 范围的执行计划,选择性是可评估的. 此评估基于栏位最近的最高和最低值的统计信息.

再次重申,当分析table 时,CBO假定table (和并发的Index) 都有相同的数据分布,除非你使用 for all indexed columns 选项.

Range evaluation over bind variables

For index range execution plans, selectivity is guessed. Prior to Oracle9i, because bind variable values are not available at parse time (values are passed to the cursor after the execution plan has been decided), the optimizer cannot make decisions based on bind variable values. The optimizer assumes a rule of thumb of 25% selectivity for unbounded bind variable ranges (e.g., WHERE dept_no = :b1) and 50% selectivity for bounded ranges (WHERE dept_no > :b1 AND dept_no < :b2). Beginning with Oracle9i, the cost-based optimizer obtains bind variable values prior to determining an execution plan.

范围评估优于绑定变量

对于Index 范围执行计划,选择是猜测的.在Oracle9i 以前版本,因为绑定变量的值不能在分析时间提供(

在值传入游标之前执行计划已经决定),所以优化器不能以绑定变量的值来决定执行计划. 优化器的经验法则是对于无范围限制的选择几率是 25%(如where dept_no=:b1) ,对于有范围限制的选择几率是50%

(where dept_no>:b1 and dept_no<:b2>

Histograms

Prior to the introduction of histograms in Oracle 7.3, The cost-based optimizer could not distinguish grossly uneven key data spreads.

柱状图

在Oracle7.3 以前的柱状图.CBO 不能辨别粗略的辨别主要数据资料的分布.

System resource usage

By default, the cost-based optimizer assumes that you are the only person accessing the database. Oracle9i gives you the ability to store information about system resource usage, and can make much better informed decisions based on workload (read up on the DBMS_STATS.GATHER_SYSTEM_STATS package).

系统资源使用率:

在默认情况下.CBO假定只有一人访问数据库. Oracle9i 提供保留System resource Usage 的资源信息,并能对资源进行基于负载的最优分配.(参考 DBMS_STATS.GATHER_SYSTEM_STATS 包)

Current statistics are important

The cost-based optimizer can make poor execution plan choices when a table has been analyzed but its indexes have not been, or when indexes have been analyzed but not the tables.

You should not force the database to use the cost-based optimizer via inline hints when no statistics are available for any table involved in the SQL.

Using old (obsolete) statistics can be more dangerous than estimating the statistics at runtime, but keep in mind that changing statistics frequently can also blow up in your face, particularly on a mission-critical system with lots of online users. Always back up your statistics before you re-analyze by using DBMS_STATS.EXPORT_SCHEMA_STATS.

Analyzing large tables and their associated indexes with the COMPUTE option will take a long, long time, requiring lots of CPU, I/O, and temporary tablespace resources. It is often overkill. Analyzing with a consistent value, for example, estimate 3%, will usually allow the cost-based optimizer to make optimal decisions

当前统计信息很重要:

CBO 当在Table 进行分析,但所涉及的Index没有被分析或是Index被分析但table 没有分析的情况下所能选择的执行计划很少.

你不能强制对于那些没有统计信息的Table 所组成的SQL语句采用inhint 方法使用CBO.

使用Old(废弃的) 统计信息比在运行是估计统计信息更危险.但要紧记频繁改动的统计信息,也能使你很生气.

显然一个大点的系统有很多在线用户,.在你重新分析时要经常用DBMS_STATS.EXPORT_SCHEMA-STATS来备份你的统计信息.

用Compute 选项分析大Table 和相关联的index 将会花费很长时间,需要大量的CPU,I/O 和临时表空间等资源.

经常太花资源. 分析一致性,例如使用estimate 3% 选项来做CBO将是一个明智选择.

1.2.2.4 EXPLAIN PLAN for the cost-based optimizer

Oracle provides information on the cost of query execution via the EXPLAIN PLAN facility. EXPLAIN PLAN can be used to display the calculated execution cost(s) via some extensions to the utility. In particular, the plan table's COST column returns a value that increases or decreases to show the relative cost of a query. For example:

oracle 是通过Explain Plan 工具提供基于Cost上的查询信息.

Explain Plan 能通过扩展功能显示计算执行的Cost.详细的说,Plan table的Cost 栏位返回此查询的花费是增加还是减少. 例如:

EXPLAIN PLAN FOR

SELECT count(*) FROM winners, horses

WHERE winners.owner=horses.owner AND winners.horse_name LIKE 'Mr %'
COLUMN "SQL" FORMAT a56
SELECT lpad(' ',2*level)||operation||'' ||options ||' '||object_name|| decode(OBJECT_TYPE, '', '', '('||object_type||')') "SQL", cost "Cost", cardinality "Num Rows"
FROM plan_table
CONNECT BY prior id = parent_id START WITH id = 0;
SQL                  Cost     Num Rows
-----------------------------------------------
SELECT STATEMENT       44   1
   SORT AGGREGATE

      HASH JOIN 44 100469
       INDEX RANGE SCAN MG1(NON-UNIQUE)
                       2     1471
          INDEX FAST FULL SCAN OWNER_PK(UNIQUE)

4 683

By manipulating the cost-based optimizer (i.e., via inline hints, by creating/removing indexes, or by adjusting the way that indexes or tables are analyzed), we can see the differences in the execution cost as calculated by the optimizer. Use EXPLAIN PLAN to look at different variations on a query, and choose the variation with the lowest relative cost. For absolute optimal performance, many sites have the majority of the tables and indexes analyzed but a small number of tables that are used in isolation are not analyzed. This is usually to force rule-based behavior on the tables that are not analyzed. However, it is important that tables that have not been analyzed are not joined with tables that have been analyzed.

[@more@]

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/34596/viewspace-799584/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/34596/viewspace-799584/

cuanxiu7017

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
理解CBO

1.2.2 Understanding the Cost-Based Optimizer The cost-based optimizer is a more sophisticated facility than the r...
复制链接

扫一扫