查询优化器介绍（转）

最新推荐文章于 2024-10-09 22:24:16 发布

zhangzijian36

最新推荐文章于 2024-10-09 22:24:16 发布

阅读量150

点赞数

文章标签：优化 join oracle table sql nested

1.1优化器操作
简单的说, 优化器的input, 是SQL语句, 优化器的output, 是一个它自以为最有效率的执行计划, 为了实现这一功能, 优化器会对我们的SQL语句进行如下操作:
   . Evaluation of expressions
   . Statement transformation
   . Choice of optimizer goals
   . Choice of access paths
   . Choice of join orders

1.2 选择优化目标
优化器的目标是得到性能最好的执行计划, 但关键是, 什么是好的性能? 是最少的CPU? 还是高速路不堵车, 还是1个月只花1元钱? 我们总得有个标准。

对Oracle的优化器而言, 好性能有两个不同的标准, 请看Oracle的官方描述:By default, the goal of the query optimizer is the best throughput. This means that it chooses the least amount of resources necessary to process all rows accessed by the statement.Oracle can also optimize a statement with the goal of best response time. This means that it uses the least amount of resources necessary to process the first row accessed by a SQL statement.

这里的两个标准分别是:
Best throughput, 即让从SQL开始执行到SQL完全执行结束之间的时间间隔最短. 适用与只关心SQL的总的执行时间的用户, 比如batch, report…

Best Response time, 即让SQL开始执行到SQL开始返回第一条记录的时间间隔最短. 适用于比较急躁的用户, 如交互式界面, 或分页查询.

通过设置一个Oracle参数, 我们可以告诉优化器我们关心的到底是Best throughput还是Best Response time, 这个参数就是Optimizer_mode.

以下是这个参数的取值和解释：

1. All_Rows即表示我们希望SQL以Best Throughput的方式执行, 这是一种Cost-Based的优化方法.

2. First_Rows_N即表示我们希望SQL以Best Response time的方式执行, 这也是Cost-Based的优化方法, 其中N的取值可以是1, 10, 100, 1000, 分别表示”我们希望SQL结果的第1[10, 100, 1000]行能尽快回来”。

3. First_Rows是仅仅为了向后兼容和Plan Stability而存在的选项. 从目的上讲, First_Rows的目的应该是和First_Rows_1的目的是一样, 只是实现的机制不同, First_rows是Cost-Based和heuristics（rule）方式的结合, 而正如Oracle官方文档中的描述, 使用heuristics有时候会导致Cost比较大的执行计划, 因此Oracle建议不使用First_Rows, 转而使用纯粹基于Cost的First_Rows_N.

1.3 启用和控制查询优化器特性

1.3.1 启用查询优化器特性
设置初始参数即可，如下：
SQL> alter session set OPTIMIZER_FEATURES_ENABLE = '10.2.0.1';
会话已更改。
SQL> show parameter OPTIMIZER_FEATURES_ENABLE
NAME                       TYPE        VALUE
-----------------------    -------     -----------------------
optimizer_features_enable string      10.2.0.1

1.3.2 控制查询优化器特性
通过设置参数，可以有效的控制查询优化器的行为。
1. CURSOR_SHARING
这个参数可以把sql语句中的显示变量转变为绑定变量，更有效的实现游标共享，从而影响sql的执行计划。

2. DB_FILE_MULTIBLOCK_READ_COUNT

3. OPTIMIZER_INDEX_CACHING
这个参数影响nest loop 连接中索引查找的开销，值是0-100。假如值设为100，则oracle认为所有的索引都在buffer cache中，就用调整开销。慎用，会影响sql语句的执行计划。

4. OPTIMIZER_INDEX_COST_ADJ
这个参数用于调整索引查找的开销，值是1-10000。

5. OPTIMIZER_MODE

6. PGA_AGGREGATE_TARGET
控制用于进行排序和hash join的分配内存，如果分配的内存大，开销就会小。

1.4 学习查询分析器

1.4.1 查询分析器组成

大概步骤如下：
（一） Query Transformer:优化器会尝试把复杂的SQL语句转化较为简单的SQL语句, 通常倾向于转化为表的连接方式。Oracle的优化器对SQL转化的手段包括5种, 我们接下来会对这个方面分别进行说明:

• View Merging（视图合并）
对于SQL里的视图, 优化器可以选择先对视图代表的查询单独优化并生成view本身的执行计划, 在把这个计划和SQL语句其他的部分合并在一起生成最终的执行计划, 当然这种技术往往会产生一个不是最好的计划, 因为视图的优化是单独考虑了, 而没有放到整个SQL环境中通盘考虑。而Oracle的查询优化器则可以帮助我们把视图代表的查询merge到SQL的其他部分当中, 从而可以更全面的考虑SQL的执行计划。

• Predicate Pushing（谓词入栈）
对于不能合并的视图，优化器会把他们放到特定的查询数据块中，作为优化的辅助方案，将来可以用来访问索引或者作为过滤条件。

• Subquery Unnesting （子查询分解）

• Query Rewrite with Materialized Views （物化视图）

• OR-expansion

（二） Estimator:优化器会对依据统计信息对SQL语句进行估量, 这些估量主要包括3个方面: Selectivity, Cardinality和Cost, 这三个方面是相互相关的。

• Selectivit：是指一个SQL操作的得出结果集占原来结果集的百分比。

• Cardinality：是指一个SQL操作的得出结果集的行数，CBO是通过selectivity来计算cardinality的，也就是说cardinality=selectivity*原结果集行数。

• Cost：The cost represents units of work or resource used. The query optimizer uses disk I/O, CPU usage, and memory usage as units of work. So, the cost used by the query optimizer represents an estimate of the number of disk I/Os and the amount of CPU and memory used in performing an operation. The operation can be scanning a table, accessing rows from a table by using an index, joining two tables together, or sorting a row set. The cost of a query plan is the number of work units that are expected to be incurred when the query is executed and its result produced.

（三）Plan Generator:优化器会尝试各种执行计划并给出一个代价最低的计划。

1.4.2 学习查询分析器执行计划

通过例子说明：
EXPLAIN PLAN FOR
SELECT e.employee_id, j.job_title, e.salary, d.department_name
FROM employees e, jobs j, departments d
WHERE e.employee_id < 103
AND e.job_id = j.job_id
AND e.department_id = d.department_id;

1.5 学习查询分析器的路径访问

1.5.1 Full Table Scans
(1) 原理:为实现全表扫描，Oracle读取表中所有的行，并检查每一行是否满足语句的 WHERE 限制条件。Oracle 顺序地读取分配给表的每个数据块，直到读到表的最高水线处(high water mark, HWM，标识表的最后一个数据块)。一个多块读操作可以使一次I/O 能读取多块数据块(db_block_multiblock_read_count 参数设定)，而不是只读取一个数据块，这极大的减少了I/O 总次数，提高了系统的吞吐量，所以利用多块读的方法可以十分高效地实现全表扫描，而且只有在全表扫描的情况下才能使用多块读操作。在这种访问模式下，每个数据块只被读一次。
(2）什么时候使用全表扫描
• Lack of Index
• Large Amount of Data
• Small Table
• Full Table Scan Hints
(3）Why a Full Table Scan Is Faster for Accessing Large Amounts of Data?

1.5.2 Rowid Scans
(1)原理：ROWID指出了该行所在的数据文件、数据块以及行在该块中的位置，所以通过ROWID 来存取数据可以快速定位到目标数据上，是 Oracle 存取单行数据的最快方法。为了通过 ROWID存取表，Oracle首先要获取被选择行的 ROWID，或者从语句的WHERE 子句中得到，或者通过表的一个或多个索引的索引扫描得到。Oracle 然后以得到ROWID为依据定位每个被选择的行。这种存取方法不会用到多块读操作，一次 I/O 只能读取一个数据块。
（2）什么时候使用rowid：一般是索引扫描的第二步操作，如果索引包含了查询的全部列，则不会发生这步操作。

1.5.3 Index Scans
■ Assessing I/O for Blocks, not Rows ：评估I/O是根据访问的Blocks,而不是根据查询的行数。
■ Index Unique Scans：通过唯一索引查找一个数值经常返回单个ROWID。如果该唯一索引有多个列组成 (即组合索引)，则至少要有组合索引的引导列参与到该查询中。如果存在 UNIQUE 或PRIMARY KEY 约束（它保证了语句只存取单行）的话，Oracle 经常实现唯一性扫描。

■ Index Range Scans ：使用一个索引存取多行数据。在唯一索引上使用索引范围扫描的典型情况下是在谓词(where限制条件)中使用了范围操作符(如>、<、<>、>=、<=、between)，

■ Index Range Scans Descending（数据返回是降序）：Indexes, by default, are stored in ascending order. An index range scan descending is identical to an index range scan, except that the data is returned in descending order。

■ Index Skip Scans：适用于组合索引。特点：第一列的值比较少，根据第一列的值生成几个子索引，然后进行扫描。比如第一列的值是男女，根据男女生成2个子索引，加快扫描速度。

■ Full Scans ：至少包含一个谓词是index中的列，但是不需要时前导列。

■ Fast Full Index Scans索引中包含全部的数据，不用访问表。

■ Index Joins ：An index join is a hash join of several indexes that together contain all the table columns that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation.

■ Bitmap Indexes：A bitmap join uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions。

1.5.4 Cluster Access

A cluster scan is used to retrieve, from a table stored in an indexed cluster, all rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data block. To perform. a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.

1.5.5 Hash Access
A hash scan is used to locate rows in a hash cluster, based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data block. To perform. a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

1.5.6 How the Query Optimizer Chooses an Access Paths

The query optimizer chooses an access path based on the following factors:
■ The available access paths for the statement
■ The estimated cost of executing the statement, using each access path or combination of paths To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause and its FROM clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan, using the statistics for the index, columns, and tables accessible to the statement. Finally, the optimizer chooses the execution plan with the lowest estimated cost.

1.6 学习表连接

1.6.1 How the Query Optimizer Executes Join Statements
■ Access Paths
As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.
■ Join Method
To join each pair of row sources, Oracle must perform. a join operation. Join methods include nested loop, sort merge, cartesian, and hash joins.
■ Join Order
To execute a statement that joins more than two tables, Oracle joins two of the tables and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.

1.6.2 Nested Loop Joins
一、官方描述：
1. The optimizer determines the driving table and designates it as the outer table.
2. The other table is designated as the inner table.
3. For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in outer table and the inner loop is for every row in theinner table. The outer loop appears before the inner loop in the execution plan, as
NESTED LOOPS
      outer_loop
      inner_loop
二、关于outertable 和innertble的介绍
关于outer table和inner table这种叫法，一般应该是只针对于nested loops的，对于做hash join的两张表，一般叫做build table和probe table比较形象，而对于merge join，叫first table和second table就可以了。
在nested loops的inner table里面，一般都会在被join的字段上有一个索引，nested loops可以以两种方式来扫描这个索引：一种是使用unique scan（当然是只有在索引是主键索引或唯一索引的情况下），一种是使用range scan（不管索引是否是主键/唯一索引，都有可能使用）。

1.6.3 Hash Joins

自从oracke 7.3以来，oracle提供了一种新的join技术，就是hash join。Hash Join只能用于相等连接，且只能在CBO优化器模式下。相对于nested loop join，hash join更适合处理大型结果集。Hash join不需要在驱动表上存在索引。
Hash join算法的一个基本思想就是根据小的row sources(称作build input，我们记较小的表为S，较大的表为B) 建立一个可以存在于hash area内存中的hash table，然后用大的row sources(称作probe input) 来探测前面所建的hash table。

1.6.4 Sort Merge Joins

内部连接过程：
1)首先生成row source1需要的数据，然后对这些数据按照连接操作关联列进行排序。
2 随后生成row source2需要的数据，然后对这些数据按照与sort source1 对应的连
   接操作关联列进行排序。
3)最后两边已排序的行被放在一起执行合并操作，即将2 个 row source 按照连接条
   件连接起来
下面是连接步骤的图形表示：
                 MERGE
                 /       /
            SORT        SORT
               |            |
     Row Source 1         Row Source 2
如果 row source 已经在连接关联列上被排序，则该连接操作就不需要再进行sort      操作，这样可以大大提高这种连接操作的连接速度，因为排序是个极其费资源的操作。
When the Optimizer Uses Sort Merge Joins
The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
■ The join condition between two tables is not an equi-join.
■ Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.

1.6.5 Cartesian Joins

当两个row source做连接，但是它们之间没有关联条件时，就会在两个row source
中做笛卡儿乘积，这通常由编写代码疏漏造成(即程序员忘了写关联条件)。笛卡尔乘积是一个表的每一行依次与另一个表中的所有行匹配。

1.6.6 Outer Joins

• Nested Loop Outer Joins
• Hash Join Outer Joins
• Sort Merge Outer Joins
• Full Outer Joins