the filter pushed in subquery issue in oracle 11g
今天开发的同事来问我一个SQL问题, 在一个字符字段to_number时提示有无效数值, 肉眼看全部为数字格式, 数据库版本 11.2.0.3 当然开始怀疑有不可显示字符(如char(10)), 可以创建个function来验证一下, 脚本如下:
CREATE OR REPLACE function IS_NUMBER(str in varchar2) return number IS dummy number; begin dummy := TO_NUMBER(str); return 1; Exception WHEN OTHERS then return 0; end; /
下面是当时的问题SQL
SELECT * FROM (SELECT s.score AS count_num FROM em_examinee e, em_exam_course_score s WHERE e.exam_id = 444 AND e.id = s.examinee_id AND s.score <> '缺考' AND s.score <> '不需要考' AND s.score <> '其他' AND s.score <> '违纪') a WHERE TO_NUMBER (a.count_num) < 60;
em_exam_course_score.score字段是varchar2,值有数字和汉字(先不讨论为什么不用负数来代替中文),这个简单的SQL一看也可以猜出意图是过滤掉无法转换数值的记录,再从子查询的结果中找出小于60分的记录。 首先我们把子查询单独拿出来执行并加上刚才创建的function 来验证is_number=0,无返回结果。 也许你已经想到了”谓词推进”. 下一步让我们来看一下执行计划。
SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------ Plan hash value: 2955730142 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 54 | 81 (2)| 00:00:01 | |* 1 | HASH JOIN | | 3 | 54 | 81 (2)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 3 | 27 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 3 | | 2 (0)| 00:00:01 | |* 4 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 968 | 8712 | 78 (2)| 00:00:01 | ----------------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("E"."ID"="S"."EXAMINEE_ID") 3 - access("EXAM_ID"=444) 4 - filter(TO_NUMBER("S"."SCORE")<60 AND "S"."SCORE"<>'缺考' AND "S"."SCORE"<>'不需要考' AND "S"."SCORE"<>'其他' AND "S"."SCORE"<>'违纪')
Notice:
注意上面的4# 计划 的fileter 部分, 这就是问题的根源。了解SQL的执行步骤应该清楚在sql parse阶段CBO会帮我们做sql查询转换生成高效的执行计划, 在本案例我们想把TO_NUMBER(“S”.”SCORE”)<60 的条件放到子查询返回的结果集后再过滤, 结果CBO错误把此条件提前和该字段的其它条件一并执行,找到了问题,我开始尝试阻止score <60 条件推进到子查询中去(to prevent the filter is pushed into the inline view),使用Hint来引导CBO.
解决方法1,materialize hint.
with c as (
SELECT /*+materialize*/s.score AS count_num
FROM em_examinee e, em_exam_course_score s
WHERE exam_id = 444
–and s.em_exam_course_id=482 –or s.em_exam_course_id=4841
AND e.id = s.examinee_id
AND s.score <> ‘缺考’
AND s.score <> ‘不需要考’
AND s.score <> ‘其他’
AND s.score <> ‘违纪’
)
select * from c where to_number(count_num)<60;
执行计划
————————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————————————-
| 0 | SELECT STATEMENT | | 12 | 1224 | 84 (2)| 00:00:02 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D95AD_72C40D1 | | | | |
|* 3 | HASH JOIN | | 12 | 216 | 82 (2)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 6 | 54 | 4 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 6 | | 2 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 22305 | 196K| 78 (2)| 00:00:01 |
|* 7 | VIEW | | 12 | 1224 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D95AD_72C40D1 | 12 | 48 | 2 (0)| 00:00:01 |
————————————————————————————————————-
Predicate Information (identified by operation id):
—————————————————
3 – access(“E”.”ID”=”S”.”EXAMINEE_ID”)
5 – access(“EXAM_ID”=444)
6 – filter(“S”.”SCORE”<>’缺考’ AND “S”.”SCORE”<>’不需要考’ AND “S”.”SCORE”<>’其他’ AND “S”.”SCORE”<>’违纪’)
7 – filter(TO_NUMBER(“COUNT_NUM”)<60)
解决方法2 “_optimizer_filter_pushdown” parameter
SQL> explain plan for SELECT /*+ NO_MERGE(a) no_push_pred(a) */*
2 FROM (SELECT s.score AS count_num
3 FROM em_examinee e, em_exam_course_score s
4 WHERE exam_id = 444
5 AND e.id = s.examinee_id
6 AND s.score <> ‘缺考’ AND s.score <> ‘不需要考’
7 AND s.score <> ‘其他’ AND s.score <> ‘违纪’) a
8 WHERE TO_NUMBER (a.count_num) < 60;
SQL> select * from table(dbms_xplan.display);
——————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————————
| 0 | SELECT STATEMENT | | 3 | 306 | 81 (2)| 00:00:01 |
| 1 | VIEW | | 3 | 306 | 81 (2)| 00:00:01 |
|* 2 | HASH JOIN | | 3 | 54 | 81 (2)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 3 | 27 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 3 | | 2 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 968 | 8712 | 78 (2)| 00:00:01 |
Predicate Information (identified by operation id):
—————————————————
2 – access(“E”.”ID”=”S”.”EXAMINEE_ID”)
4 – access(“EXAM_ID”=444)
5 – filter(TO_NUMBER(“S”.”SCORE”)<60 AND “S”.”SCORE”<>’缺考’ AND
“S”.”SCORE”<>’不需要考’ AND “S”.”SCORE”<>’其他’ AND “S”.”SCORE”<>’违纪’)
SQL> explain plan for SELECT /*+ NO_MERGE(a) no_push_subq(@subq1) */*
2 FROM (SELECT /*+ qb_name(subq1) */ s.score AS count_num
3 FROM em_examinee e, em_exam_course_score s
4 WHERE exam_id = 444
5 AND e.id = s.examinee_id
6 AND s.score <> ‘缺考’ AND s.score <> ‘不需要考’
7 AND s.score <> ‘其他’ AND s.score <> ‘违纪’) a
8 WHERE TO_NUMBER (a.count_num) < 60;
SQL> select * from table(dbms_xplan.display);
——————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————————
| 0 | SELECT STATEMENT | | 3 | 306 | 81 (2)| 00:00:01 |
| 1 | VIEW | | 3 | 306 | 81 (2)| 00:00:01 |
|* 2 | HASH JOIN | | 3 | 54 | 81 (2)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 3 | 27 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 3 | | 2 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 968 | 8712 | 77 (0)| 00:00:01 |
Predicate Information (identified by operation id):
—————————————————
2 – access(“E”.”ID”=”S”.”EXAMINEE_ID”)
4 – access(“EXAM_ID”=444)
5 – filter(TO_NUMBER(“S”.”SCORE”)<60 AND “S”.”SCORE”<>’缺考’ AND “S”.”SCORE”<>’不需要考’ AND
“S”.”SCORE”<>’其他’ AND “S”.”SCORE”<>’违纪’)
#注意上面的Hint 居然都未启作用, 下在在会话级turn off “_optimizer_filter_pushdown”
SQL> alter session set “_optimizer_filter_pushdown”=false;
SQL> explain plan for SELECT /*+ NO_MERGE(t) NO_PUSH_PRED(t)*/*
2 FROM (SELECT s.score
3 FROM em_examinee e, em_exam_course_score s
4 WHERE exam_id = 444
5 AND e.id = s.examinee_id
6 AND s.score <> ‘缺考’ AND s.score <> ‘不需要考’
7 AND s.score <> ‘其他’ AND s.score <> ‘违纪’) t
8 WHERE to_number(score) < 60;
SQL> select * from table(dbms_xplan.display);
——————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————————
| 0 | SELECT STATEMENT | | 5 | 510 | 81 (2)| 00:00:01 |
|* 1 | VIEW | | 5 | 510 | 81 (2)| 00:00:01 |
|* 2 | HASH JOIN | | 5 | 90 | 81 (2)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 3 | 27 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 3 | | 2 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 19369 | 170K| 77 (0)| 00:00:01 |
Predicate Information (identified by operation id):
—————————————————
1 – filter(TO_NUMBER(“SCORE”)<60)
2 – access(“E”.”ID”=”S”.”EXAMINEE_ID”)
4 – access(“EXAM_ID”=444)
5 – filter(“S”.”SCORE”<>’缺考’ AND “S”.”SCORE”<>’不需要考’ AND “S”.”SCORE”<>’其他’ AND
“S”.”SCORE”<>’违纪’)
或者在SQL级
SELECT /*+ no_merge(t) OPT_PARAM(‘_OPTIMIZER_FILTER_PUSHDOWN’ ‘FALSE’) */*
FROM (SELECT s.score
FROM em_examinee e, em_exam_course_score s
WHERE exam_id = 444
AND e.id = s.examinee_id
AND s.score <> ‘缺考’ AND s.score <> ‘不需要考’
AND s.score <> ‘其他’ AND s.score <> ‘违纪’) t
WHERE to_number(score) < 60;
解决方法3 put rownum in subquery
SQL> explain plan for SELECT * 2 FROM (SELECT s.score ,rownum rn 3 FROM em_examinee e, em_exam_course_score s 4 WHERE exam_id = 444 5 AND e.id = s.examinee_id 6 AND s.score <> '缺考' AND s.score <> '不需要考' 7 AND s.score <> '其他' AND s.score <> '违纪') t 8 WHERE to_number(score) < 60; 已解释。 SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------------- Plan hash value: 129358830 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 575 | 81 (2)| 00:00:01 | |* 1 | VIEW | | 5 | 575 | 81 (2)| 00:00:01 | | 2 | COUNT | | | | | | |* 3 | HASH JOIN | | 5 | 90 | 81 (2)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID| EM_EXAMINEE | 3 | 27 | 3 (0)| 00:00:01 | |* 5 | INDEX RANGE SCAN | EM_EXAMINEE_UK1 | 3 | | 2 (0)| 00:00:01 | |* 6 | TABLE ACCESS FULL | EM_EXAM_COURSE_SCORE | 19369 | 170K| 77 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(TO_NUMBER("SCORE")<60) 3 - access("E"."ID"="S"."EXAMINEE_ID") 5 - access("EXAM_ID"=444) 6 - filter("S"."SCORE"<>'缺考' AND "S"."SCORE"<>'不需要考' AND "S"."SCORE"<>'其他' AND "S"."SCORE"<>'违纪')
ORACLE在引入一些新特性的同时, 往往会加入一些隐藏参数来关掉这些特性, 我们可以通过下面的V$ 或才X$的视图查询相关的优化器参数
v$ses_optimizer_env;
v$sys_optimizer_env;
或对应的X$(比v$更加全面)
x$qksceses, x$qkscesys
set linesize 180 set trimspool on set pagesize 1000 column name format a40 column feature format a24 column value format a15 column def_value format a15 break on feature skip 1 select -- FID_QKSCESYROW feature, -- vc64 PNAME_QKSCESYROW name, -- vc40 PVALUE_QKSCESYROW value, -- vc25 DEFPVALUE_QKSCESYROW def_value, -- vc25 KSPNUM_QKSCESYROW parameter, -- number PNUM_QKSCESYROW opt_param -- number -- ,FLAGS_QKSCESYROW flags -- number from X$QKSCESYS order by -- feature, name;
Summary :
阻止filter推进subquery这里提到了三种方法
with + materialize hint;
rownum column in subquery;
Setting the parameter _optimizer_filter_pushdown to FALSE.
关于”_optimizer_filter_pushdown” 参数,我建议在11G 版本中设为false, 因为相关的bug 实在太多。
Bug 17397506 SORT MERGE JOIN incorrectly chosen over HASH JOIN IN 11.2.0.3
Bug 17645326 Wrong Results with outer joins of subquery factoring having subqueries in 11.2.0.3
Bug 10100244 Wrong results from view with RESULT_CACHE, UNION and filter predicates version>11.1
Bug 17033499 ORA-7445 [kkqfppRelFilter] from SQL with PLSQL function
Bug 16220085 ORA-22905 using UNION in 11.2 with TABLE() function
Bug 17893931 ORA-7445 [kkobok] during filter pushdown in CONNECT BY .. PRIOR SQL
Bug 13245379 Hang/spin during query parse / optimization processing transitive predicates
Bug 12695062 Wrong results doing join factorization on OUTER joins with constants inside views – superceded
Wrong Results from Queries Selecting from USER|ALL|DBA_OBJECTS Views with NVL(with bind) On the Left Side of the Join Condition (文档 ID 1624690.1)
Insert as Select Fails with ORA-22905 in PL/SQL Block (文档 ID 1528251.1)
---------------当然,处理字符串中的数字最好用oracle正则。
下面是修改后的语句:
SELECT s.score AS count_num
FROM em_examinee e, em_exam_course_score s
WHERE exam_id = 444
--and s.em_exam_course_id=482 --or s.em_exam_course_id=4841
AND e.id = s.examinee_id
and regexp_like(s.score,'^([0-5][0-9])$')
多么简洁。
oracle的正则表达式(regular expression)简单介绍
目前,正则表达式已经在很多软件中得到广泛的应用,包括*nix(Linux, Unix等),HP等操作系统,PHP,C#,Java等开发环境。
Oracle 10g正则表达式提高了SQL灵活性。有效的解决了数据有效性,重复词的辨认, 无关的空白检测,或者分解多个正则组成的字符串等问题。
Oracle 10g支持正则表达式的四个新函数分别是:REGEXP_LIKE、REGEXP_INSTR、REGEXP_SUBSTR、和REGEXP_REPLACE。
它们使用POSIX 正则表达式代替了老的百分号(%)和通配符(_)字符。
特殊字符:
'^' 匹配输入字符串的开始位置,在方括号表达式中使用,此时它表示不接受该字符集合。
'$' 匹配输入字符串的结尾位置。如果设置了 RegExp 对象的 Multiline 属性,则 $ 也匹配 '\n' 或'\r'。
'.' 匹配除换行符 \n之外的任何单字符。
'?' 匹配前面的子表达式零次或一次。
'*' 匹配前面的子表达式零次或多次。
'+' 匹配前面的子表达式一次或多次。
'( )' 标记一个子表达式的开始和结束位置。
'[]' 标记一个中括号表达式。
'{m,n}' 一个精确地出现次数范围,m=<出现次数<=n,'{m}'表示出现m次,'{m,}'表示至少出现m次。
'|' 指明两项之间的一个选择。例子'^([a-z]+|[0-9]+)$'表示所有小写字母或数字组合成的字符串。
\num 匹配 num,其中 num 是一个正整数。对所获取的匹配的引用。
正则表达式的一个很有用的特点是可以保存子表达式以后使用,被称为Backreferencing. 允许复杂的替换能力
如调整一个模式到新的位置或者指示被代替的字符或者单词的位置. 被匹配的子表达式存储在临时缓冲区
中,缓冲区从左到右编号, 通过\数字符号访问。 下面的例子列出了把名字 aa bb cc 变成cc, bb, aa.
Select REGEXP_REPLACE('aa bb cc','(.*) (.*) (.*)', '\3, \2, \1') FROM dual;
REGEXP_REPLACE('ELLENHILDISMIT
cc, bb, aa
'\' 转义符。
字符簇:
[[:alpha:]] 任何字母。
[[:digit:]] 任何数字。
[[:alnum:]] 任何字母和数字。
[[:space:]] 任何白字符。
[[:upper:]] 任何大写字母。
[[:lower:]] 任何小写字母。
[[unct:]] 任何标点符号。
[[:xdigit:]] 任何16进制的数字,相当于[0-9a-fA-F]。
各种操作符的运算优先级
\ 转义符
(), (?, (?=), [] 圆括号和方括号
*, +, ?, {n}, {n,}, {n,m} 限定符
^, $, \anymetacharacter 位置和顺序
| “或”操作
--测试数据
create table test(mc varchar2(60));
insert into test values('112233445566778899');
insert into test values('22113344 5566778899');
insert into test values('33112244 5566778899');
insert into test values('44112233 5566 778899');
insert into test values('5511 2233 4466778899');
insert into test values('661122334455778899');
insert into test values('771122334455668899');
insert into test values('881122334455667799');
insert into test values('991122334455667788');
insert into test values('aabbccddee');
insert into test values('bbaaaccddee');
insert into test values('ccabbddee');
insert into test values('ddaabbccee');
insert into test values('eeaabbccdd');
insert into test values('ab123');
insert into test values('123xy');
insert into test values('007ab');
insert into test values('abcxy');
insert into test values('The final test is is is how to find duplicate words.');
commit;
一、REGEXP_LIKE
select * from test where regexp_like(mc,'^a{1,3}');
select * from test where regexp_like(mc,'a{1,3}');
select * from test where regexp_like(mc,'^a.*e$');
select * from test where regexp_like(mc,'^[[:lower:]]|[[:digit:]]');
select * from test where regexp_like(mc,'^[[:lower:]]');
Select mc FROM test Where REGEXP_LIKE(mc,'[^[:digit:]]');
Select mc FROM test Where REGEXP_LIKE(mc,'^[^[:digit:]]');
二、REGEXP_INSTR
Select REGEXP_INSTR(mc,'[[:digit:]]$') from test;
Select REGEXP_INSTR(mc,'[[:digit:]]+$') from test;
Select REGEXP_INSTR('The price is $400.','\$[[:digit:]]+') FROM DUAL;
Select REGEXP_INSTR('onetwothree','[^[[:lower:]]]') FROM DUAL;
Select REGEXP_INSTR(',,,,,','[^,]*') FROM DUAL;
Select REGEXP_INSTR(',,,,,','[^,]') FROM DUAL;
三、REGEXP_SUBSTR
SELECT REGEXP_SUBSTR(mc,'[a-z]+') FROM test;
SELECT REGEXP_SUBSTR(mc,'[0-9]+') FROM test;
SELECT REGEXP_SUBSTR('aababcde','^a.*b') FROM DUAL;
四、REGEXP_REPLACE
Select REGEXP_REPLACE('Joe Smith','( ){2,}', ',') AS RX_REPLACE FROM dual;
Select REGEXP_REPLACE('aa bb cc','(.*) (.*) (.*)', '\3, \2, \1') FROM dual
四个函数是:regexp_like.regexp_instr.regexp_substr.与 regexp_replace.它们在用法上与oracle sql 函数like.instr.substr 与replace 用法,但是它们使用posix 正则表达式代替了老的百分号(%)与通配符(_)字符. 【推荐阅读:My SQL数据库引擎快速指导】
regexp_like 与like 操作符相似.如果第一个参数匹配正则表达式它就解析为true.例如where regexp_like(ename,^j[ao],i) 将在ename 以ja 或jo 开始的情况下返回一行数据.i 参数指定正则表达式是大小写敏感的.另外还可以在check 约束与函数索引中指定regexp_like.例如: 【扩展信息:MYSQL基本数据库术语】
^ 表示字符串的开始 $ 表示字符串的结束 . 表示任何字符 字符的范围,比如说[a-z],表示任何ascii 小写字母,与字符类"[[:lower:]]"" 等价 ? 允许一个后继字符匹配零次或一次 + 允许一个后继字符匹配一次或多次 * 表示零次或多次
可以使用"{m,n}" 指定一个精确地出现范围,其意思是“出现从m 次到n 次”;"{m}" 表示“正好m次”;而"{m,}" 表示“至少m次”.还可以使用圆括号组合字符的集合,使用"|"(竖线)表示可替换.例如,字符串^([a-z]+|[0-9]+)$将匹配所有由小写字母或数字组合成的字符串.