No SQL,No cost. SQL语句是造成数据库开销最大的部分。而不良SQL写法直接导致数据库系统性能下降的情形比比皆是。那么如何才能称得
上高效的SQL语句呢?一是查询优化器为当前的SQL语句生成最佳的执行计划,保证数据读写使用最佳路径;二是设置合理的物理存储结构,如表
的类型,字段的顺序,字段的数据类型等。本文主要描述如何编写高效的SQL语句并给出示例。下面的描述主要分为三个部分,一是编写高效SQL
语句,二是使用索引提高查询性能的部分,三是总结部分。
一、编写高效SQL语句
二、合理使用索引以提高性能
索引依赖于表而存在,是真实表的一个缩影,类似于一本书的目录,通过目录以更快获得所需的结果。Oracle使用了一个复杂的自平衡
B数据结构。即任意记录的DML操作将打破索引的平衡,而定期重构索引使得索引重新获得平衡。通常,通过索引查找数据比全表扫描更高效。
任意的DQL或DML操作,SQL优化引擎优先使用索引来计算当前操作的成本以生成最佳的执行计划。一旦使用索引操出参数optimizer_index_cost_adj
设定的值才使用全表扫描。同样对于多表连接使用索引也可以提高效率。同时索引也提供主键(primary key)的唯一性验证。
除了那些LONG或LONG RAW数据类型,你可以索引几乎所有的列.通常,在大型表中使用索引特别有效.当然,你也会发现,在扫描小表时,使用索
引同样能提高效率。
虽然使用索引能得到查询效率的提高,但是索引需要空间来存储,需要定期维护.尤其是在有大量DML操作的表上,任意的DML操作都将引起索
引的变更这意味着每条记录的INSERT , DELETE , UPDATE将为此多付出4 , 5 次的磁盘I/O . 因为索引需要额外的存储空间和处理,
那些不必要的索引反而会使查询反应时间变慢。
DML操作使用索引上存在碎片而失去高度均衡,因此定期的重构索引是有必要的.
1) 避免基于索引列的计算 where 子句中的谓词上存在索引,而此时基于该列的计算将使得索引失效 -->低效: SELECT employee_id, first_name FROM employees WHERE employee_id + 10 > 150; -->索引列上使用了计算,因此索引失效,走全表扫描方式 -->高效: SELECT employee_id, first_name FROM employees WHERE employee_id > 160; -->走索引范围扫描方式 例外情形 上述规则不适用于SQL中的MIN和MAX函数 hr@CNMMBO> SELECT MAX( employee_id ) max_id 2 FROM employees 3 WHERE employee_id 4 + 10 > 150; 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1481384439 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | | 2 | FIRST ROW | | 5 | 20 | 1 (0)| 00:00:01 | |* 3 | INDEX FULL SCAN (MIN/MAX)| EMP_EMP_ID_PK | 5 | 20 | 1 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- 2) 避免在索引列上使用NOT运算或不等于运算(<>,!=) 通常,我们要避免在索引列上使用NOT或<>,两者会产生在和在索引列上使用函数相同的影响。 当ORACLE遇到NOT或不等运算时,他就会停止 使用索引转而执行全表扫描。 -->低效: SELECT * FROM emp WHERE NOT ( deptno = 20 ); -->实际上NOT ( deptno = 20 )等同于deptno <> 20,即deptno <>同样会限制索引 -->高效: SELECT * FROM emp WHERE deptno > 20 OR deptno < 20; -->尽管此方式可以替换且实现上述结果,但依然走全表扫描,如果是单纯的 > 或 < 运算,则此时为索引范围扫描 需要注意的是,在某些时候, ORACLE优化器会自动将NOT转化成相对应的关系操作符 其次如果是下列运算符进行NOT运算,依然有可能选择走索引, 仅仅除了NOT = 之外,因为 NOT = 等价于 <> “NOT >” to <= “NOT >=” to < “NOT <” to >= “NOT <=” to > 来看一个实际的例子 hr@CNMMBO> SELECT * 2 FROM employees 3 where not employee_id<100; -->索引列上使用了not,但是该查询返回了所有的记录,即107条,因此此时选择走全表扫描 107 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1445457117 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 107 | 7276 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| EMPLOYEES | 107 | 7276 | 3 (0)| 00:00:01 | -->执行计划中使用了走全表扫描方式 ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("EMPLOYEE_ID">=100) -->查看这里的谓词信息被自动转换为 >= 运算符 hr@CNMMBO> SELECT * 2 FROM employees 3 where not employee_id<140; -->此例与上面的语句相同,仅仅是查询范围不同返回67条记录,而此时选择了索引范围扫描 67 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 603312277 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 68 | 4624 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 68 | 4624 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | EMP_EMP_ID_PK | 68 | | 1 (0)| 00:00:01 | -->索引范围扫描方式 --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("EMPLOYEE_ID">=140) 3) 用UNION 替换OR(适用于索引列) 通常情况下,使用UNION 替换WHERE子句中的OR将会起到较好的效果.基于索引列使用OR使得优化器倾向于使用全表扫描,而不是扫描索引. 注意,以上规则仅适用于多个索引列有效。 如果有column没有被索引, 查询效率可能会因为你没有选择OR而降低。 -->低效: SELECT deptno, dname FROM dept WHERE loc = 'DALLAS' OR deptno = 20; -->高效: SELECT deptno, dname FROM dept WHERE loc = 'DALLAS' UNION SELECT deptno, dname FROM dept WHERE deptno = 30 -->经测试,由于数据量较少,此时where子句中的谓词上都存在索引列时,两者性能相当. -->假定where子句中存在两列 scott@CNMMBO> create table t6 as select object_id,owner,object_name from dba_objects where owner='SYS' and rownum<1001; scott@CNMMBO> insert into t6 select object_id,owner,object_name from dba_objects where owner='SCOTT' and rownum<6; scott@CNMMBO> create index i_t6_object_id on t6(object_id); scott@CNMMBO> create index i_t6_owner on t6(owner); scott@CNMMBO> insert into t6 select object_id,owner,object_name from dba_objects where owner='SYSTEM' and rownum<=300; scott@CNMMBO> commit; scott@CNMMBO> exec dbms_stats.gather_table_stats('SCOTT','T6',cascade=>true); scott@CNMMBO> select owner,count(*) from t6 group by owner; OWNER COUNT(*) -------------------- ---------- SCOTT 5 SYSTEM 300 SYS 1000 scott@CNMMBO> select * from t6 where owner='SCOTT' and rownum<2; OBJECT_ID OWNER OBJECT_NAME ---------- -------------------- -------------------- 69450 SCOTT T_TEST scott@CNMMBO> select * from t6 where object_id=69450 or owner='SYSTEM'; 301 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 238853296 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 300 | 7200 | 5 (0)| 00:00:01 | | 1 | CONCATENATION | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID| T6 | 1 | 24 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | I_T6_OBJECT_ID | 1 | | 1 (0)| 00:00:01 | |* 4 | TABLE ACCESS BY INDEX ROWID| T6 | 299 | 7176 | 3 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | I_T6_OWNER | 300 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("OBJECT_ID"=69450) 4 - filter(LNNVL("OBJECT_ID"=69450)) 5 - access("OWNER"='SYSTEM') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 46 consistent gets 0 physical reads 0 redo size 11383 bytes sent via SQL*Net to client 712 bytes received via SQL*Net from client 22 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 301 rows processed scott@CNMMBO> select * from t6 where owner='SYSTEM' or object_id=69450; 301 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 238853296 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 300 | 7200 | 5 (0)| 00:00:01 | | 1 | CONCATENATION | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID| T6 | 1 | 24 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | I_T6_OBJECT_ID | 1 | | 1 (0)| 00:00:01 | |* 4 | TABLE ACCESS BY INDEX ROWID| T6 | 299 | 7176 | 3 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | I_T6_OWNER | 300 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("OBJECT_ID"=69450) 4 - filter(LNNVL("OBJECT_ID"=69450)) 5 - access("OWNER"='SYSTEM') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 46 consistent gets 0 physical reads 0 redo size 11383 bytes sent via SQL*Net to client 712 bytes received via SQL*Net from client 22 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 301 rows processed scott@CNMMBO> select * from t6 2 where object_id=69450 3 union 4 select * from t6 5 where owner='SYSTEM'; 301 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 370530636 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 301 | 7224 | 7 (72)| 00:00:01 | | 1 | SORT UNIQUE | | 301 | 7224 | 7 (72)| 00:00:01 | | 2 | UNION-ALL | | | | | | | 3 | TABLE ACCESS BY INDEX ROWID| T6 | 1 | 24 | 2 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | I_T6_OBJECT_ID | 1 | | 1 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T6 | 300 | 7200 | 3 (0)| 00:00:01 | |* 6 | INDEX RANGE SCAN | I_T6_OWNER | 300 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("OBJECT_ID"=69450) 6 - access("OWNER"='SYSTEM') Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 7 consistent gets 0 physical reads 0 redo size 11383 bytes sent via SQL*Net to client 712 bytes received via SQL*Net from client 22 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 301 rows processed -->从上面的统计信息可知,consistent gets由46下降为7,故当where子句中谓词上存在索引时,使用union替换or更高效 -->即使当列object_id与owner上不存在索引时,使用union仍然比or更高效(在Oracle 10g R2与Oracle 11g R2测试) 4) 避免索引列上使用函数 -->下面是一个来自实际生产环境的例子 -->表acc_pos_int_tbl上business_date列存在索引,由于使用了SUBSTR函数,此时索引失效,使用全表扫描 SELECT acc_num , curr_cd , DECODE( '20110728' , ( SELECT TO_CHAR( LAST_DAY( TO_DATE( '20110728', 'YYYYMMDD' ) ), 'YYYYMMDD' ) FROM dual ), 0 , adj_credit_int_lv1_amt + adj_credit_int_lv2_amt - adj_debit_int_lv1_amt - adj_debit_int_lv2_amt ) AS interest FROM acc_pos_int_tbl WHERE SUBSTR( business_date, 1, 6 ) = SUBSTR( '20110728', 1, 6 ) AND business_date <= '20110728'; -->改进的办法 SELECT acc_num , curr_cd , DECODE( '20110728' , ( SELECT TO_CHAR( LAST_DAY( TO_DATE( '20110728', 'YYYYMMDD' ) ), 'YYYYMMDD' ) FROM dual ), 0 , adj_credit_int_lv1_amt + adj_credit_int_lv2_amt - adj_debit_int_lv1_amt - adj_debit_int_lv2_amt ) AS interest FROM acc_pos_int_tbl acc_pos_int_tbl WHERE business_date >= TO_CHAR( LAST_DAY( ADD_MONTHS( TO_DATE( '20110728', 'yyyymmdd' ), -1 ) ) + 1, 'yyyymmdd' ) AND business_date <= '20110728'; -->下面的例子虽然没有使用函数,但字符串连接同样导致索引失效 -->低效: SELECT account_name, amount FROM transaction WHERE account_name || account_type = 'AMEXA'; -->高效: SELECT account_name, amount FROM transaction WHERE account_name = 'AMEX' AND account_type = 'A'; 5) 比较不匹配的数据类型 -->下面的查询中business_date列上存在索引,且为字符型,这种 -->低效: SELECT * FROM acc_pos_int_tbl WHERE business_date = 20090201; Execution Plan ---------------------------------------------------------- Plan hash value: 2335235465 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 37516 | 2857K| 106K (1)| 00:21:17 | |* 1 | TABLE ACCESS FULL| ACC_POS_INT_TBL | 37516 | 2857K| 106K (1)| 00:21:17 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(TO_NUMBER("BUSINESS_DATE")=20090201) -->这里可以看到产生了类型转换 -->高效: SELECT * FROM acc_pos_int_tbl WHERE business_date = '20090201' 6) 索引列上使用 NULL 值 IS NULL和IS NOT NULL会限制索引的使用,因为数据中没有值等于NULL值,即便是NULL值也不等于NULL值.且NULL值不存储在于索引之中 因此应尽可能避免在索引类上使用NULL值 SELECT acc_num , pl_cd , order_qty , trade_date FROM trade_client_tbl WHERE input_date IS NOT NULL; Execution Plan ---------------------------------------------------------- Plan hash value: 901462645 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 44 | 15 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TRADE_CLIENT_TBL | 1 | 44 | 15 (0)| 00:00:01 | -------------------------------------------------------------------------------------- alter table trade_client_tbl modify (input_date not null); 不推荐使用的查询方式 SELECT * FROM table_name WHERE col IS NOT NULL SELECT * FROM table_name WHERE col IS NULL 推荐使用的方式 SELECT * FROM table_name WHERE col >= 0 --尽可能的使用 =, >=, <=, like 等运算符 -->Author: Robinson Cheng -->Blog: http://blog.csdn.net/robinson_0612三、总结
1、尽可能最小化基表数据以及中间结果集(通过过滤条件避免后续产生不必要的计算与聚合)
2、为where子句中的谓词信息提供最佳的访问路径(rowid访问,索引访问)
3、使用合理的SQL写法来避免过多的Oracle内部开销以提高性能
4、合理的使用提示以提高表之间的连接来提高连接效率(如避免迪卡尔集,将不合理的嵌套连接改为hash连接等)
四、更多参考
dbms_xplan之display_cursor函数的使用