Reposted from: http://www.itpub.net/thread-1840767-1-1.html
Right at the top, the takeaway that mattered most to me personally.
For Oracle tuning, understanding the CBO matters, and reading execution plans matters;
but tuning does not stop at the CBO, and the reflex that any tuning talk must be CBO talk is, I think, a bit of a blind spot.
Do we actually know which step of a given SQL statement or PL/SQL block costs the most time, and why?
--******************************************************
On to the main topic.
Plenty of forum members have asked a question like this:
my table has a column text whose value contains a delimiter ','; I need to split each value into n rows, each row taking, in order, the piece between two delimiters.
For example, given the data
id text
1 a, b, c, d
2 e, f, g
the result I need is
1 a
1 b
1 c
1 d
2 e
2 f
2 g
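Put another way, the transformation being asked for is a simple split-to-rows. A minimal Python sketch of the target behavior (the function name and structure here are illustrative, not from any post in the thread):

```python
# Split each (id, text) pair on the delimiter and emit one
# (id, item) row per piece, preserving order within each id.
def split_to_rows(rows, delim=","):
    out = []
    for rid, text in rows:
        for item in text.split(delim):
            out.append((rid, item.strip()))
    return out

data = [(1, "a, b, c, d"), (2, "e, f, g")]
# split_to_rows(data) -> [(1,'a'), (1,'b'), (1,'c'), (1,'d'), (2,'e'), (2,'f'), (2,'g')]
```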
At this point, many members reply with the classic itpub SQL (the prior dbms_random.value>0 predicate is there to defeat connect-by cycle detection, so each row can join back to itself):
- select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
- from t1
- connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
Of course, there may also be the "regex skeptics", who would do it with substr nested around instr.
And of course there is the Cartesian-product version too.
In short, this has become a stock recipe.
But is it actually a good way to do it?
I had never given it any thought until the problem turned up in a real project of mine, and then I discovered that these SQL formulations have little more than theoretical value.
Let's measure.
- create table t1 (id int, text varchar2(4000));
- insert into t1
- select 1, listagg(lpad('a', 60), chr(10)) within group (order by 1)||chr(10)
- from dual
- connect by rownum<=64;
-
- insert into t1
- select n+1, text
- from t1, (select rownum n from dual connect by rownum<=199);
- --sql connect by version
- bill@ORCL> select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
- 2 from t1
- 3 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:16.24
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 4 recursive calls
- 0 db block gets
- 313 consistent gets
- 0 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
16 seconds rather took me by surprise. I did not expect it to be this slow; let's try the substr-plus-instr version.
- bill@ORCL> select id, substr(text,
- 2 decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)),
- 3 instr(text, chr(10), 1, level)-decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)
- 4 from t1
- 5 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:04.89
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 248 consistent gets
- 0 physical reads
- 0 redo size
- 25757310 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Much better. It seems Oracle regular expressions are best avoided whenever possible.
Still far from satisfying, though; this is hardly any data at all.
Let's try the Cartesian-product version.
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t1,
- 5 (select rownum lv from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.30
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 363K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 363K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 48 recursive calls
- 0 db block gets
- 435 consistent gets
- 2 physical reads
- 0 redo size
- 359032 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 8 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
A bit faster again, but only a bit; the order of magnitude is not going to change.
Let's try a physical table.
- create table t2 (id int, lv int, text varchar2(4000));
- insert into t2
- select (n-1)*200+id, n, text
- from t1, (select rownum n from dual connect by rownum<=64);
-
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t2;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.53
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1513984157
-
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 15663 | 30M| 3567 (1)| 00:00:01 |
- | 1 | TABLE ACCESS FULL| T2 | 15663 | 30M| 3567 (1)| 00:00:01 |
- --------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 13005 consistent gets
- 12997 physical reads
- 0 redo size
- 3526140 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Judging from this experiment, it looks as if Oracle, no matter how little of the string the select's substr actually needs, copies the full table 64 times and only then takes the substr. If so, the in-memory work here is dreadfully inefficient.
Let's run two more experiments.
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t1,
- 5 (
- 6 select id, lv
- 7 from (select id from t1),
- 8 (select rownum lv from dual connect by rownum<=64)
- 9 where rownum>0
- 10 ) b
- 11 where t1.id=b.id;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.56
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 553474792
-
- ----------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 187 | 372K| 71 (0)| 00:00:01 |
- |* 1 | HASH JOIN | | 187 | 372K| 71 (0)| 00:00:01 |
- | 2 | VIEW | | 186 | 4836 | 3 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | FILTER | | | | | |
- | 5 | MERGE JOIN CARTESIAN | | 186 | 4836 | 3 (0)| 00:00:01 |
- | 6 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
- | 7 | COUNT | | | | | |
- |* 8 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 10 | BUFFER SORT | | 186 | 2418 | 3 (0)| 00:00:01 |
- | 11 | INDEX FULL SCAN | SYS_C0010642 | 186 | 2418 | 1 (0)| 00:00:01 |
- | 12 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("T1"."ID"="B"."ID")
- 4 - filter(ROWNUM>0)
- 8 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 15 recursive calls
- 0 db block gets
- 515 consistent gets
- 9 physical reads
- 0 redo size
- 25712495 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
As you can see, the rewrite runs at the same speed; if anything, slightly worse.
Now let's see how Oracle does when we tell it explicitly which piece of the text to select:
- bill@ORCL> select substr(t1.id, instr(text, chr(10), 1, 58), instr(text, chr(10), 1, 59)-instr(text, chr(10), 1, 58)-1) item_txt
- 2 from t1,
- 3 (select rownum lv from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:01.26
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 7 recursive calls
- 0 db block gets
- 380 consistent gets
- 2 physical reads
- 0 redo size
- 227513 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Faster again.
After these tests, the conclusion seems to be that CPU work and memory operations are the factors that now need attention (they always were; we just rarely seem able to do much about them).
The obvious corollary: substr and instr over long strings are naturally slower, and assigning those long values to variables is less efficient as well.
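There is a concrete reason worth spelling out: instr(text, delim, 1, k) with a growing occurrence number k rescans the string from the start on every call, so splitting n items that way does quadratic character work, while remembering the previous match position needs only one pass. A rough Python sketch of the two access patterns, with str.find standing in for instr (an illustration of the scan cost, not Oracle internals):

```python
def split_rescan(text, delim="\n"):
    """Locate the k-th delimiter by scanning from position 0 every time,
    the way instr(text, delim, 1, k) does -- total work grows quadratically."""
    items, prev, k = [], 0, 1
    while True:
        p = -1
        for _ in range(k):               # re-find occurrences 1..k from scratch
            p = text.find(delim, p + 1)
            if p < 0:
                break
        if p < 0:
            break
        items.append(text[prev:p])
        prev, k = p + 1, k + 1
    return items

def split_single_pass(text, delim="\n"):
    """Remember where the last match ended and continue from there -- one pass."""
    items, prev = [], 0
    while True:
        p = text.find(delim, prev)
        if p < 0:
            break
        items.append(text[prev:p])
        prev = p + 1
    return items

# split_rescan("a\nb\nc\n") == split_single_pass("a\nb\nc\n") == ['a', 'b', 'c']
```

Both return the same rows; only the amount of scanning differs, which is exactly the difference the PL/SQL versions below exploit.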
Having thought this far, it seemed time to turn to PL/SQL.
Since in my requirement the exploded result has to be joined on to other tables, the natural thing to reach for was a pipelined table function.
The first version was not hard to write, apart from PL/SQL's clunky syntax (clunky relative to SQL, that is; users of other high-level languages are welcome to smirk).
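For readers who don't write PL/SQL: a pipelined function PIPE ROWs each result to the consumer as it is produced, instead of materializing the whole result set first. A loose Python analogy using a generator (an analogy only, not Oracle's API; the name split_piped is made up here):

```python
def split_piped(rows, delim="\n"):
    """Yield (id, item) pairs one at a time, in the spirit of PIPE ROW:
    the consumer can start working before the producer has finished."""
    for rid, text in rows:
        prev = 0
        while True:
            pos = text.find(delim, prev)   # roughly instr(text, delim, prev + 1)
            if pos < 0:
                break
            yield rid, text[prev:pos]      # roughly PIPE ROW(out_rec)
            prev = pos + 1

# list(split_piped([(1, "a\nb\n"), (2, "c\n")])) -> [(1, 'a'), (1, 'b'), (2, 'c')]
```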
- create or replace package refcur_pkg_v1
- authid current_user
- as
- type inrec is record (
- id number(38),
- text varchar2(4000));
- type refcur_t is ref cursor return inrec;
- type outrec_typ is record (
- id number(38),
- item_txt varchar2(4000));
- type outrecset is table of outrec_typ;
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- end;
- /
-
- create or replace PACKAGE BODY refcur_pkg_v1 IS
- FUNCTION f_cartesian (p refcur_t) RETURN outrecset PIPELINED
- parallel_enable (partition p by any)
- IS
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- C_NL_TERM varchar2(2) := chr(10); --unix style
- v_nl_pos int:=0;
- v_tmp_pos int:=0;
- BEGIN
- LOOP
- FETCH p INTO in_rec; -- input row
- EXIT WHEN p%NOTFOUND;
-
- v_nl_pos :=0;
- v_tmp_pos :=0;
- out_rec.id :=in_rec.id;
- FOR i IN 1..100000 LOOP
- v_tmp_pos:=instr(in_rec.text, C_NL_TERM, v_nl_pos+1);
- exit when v_tmp_pos=0;
- out_rec.item_txt :=substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1);
- v_nl_pos :=v_tmp_pos;
- PIPE ROW(out_rec);
- END LOOP;
-
- END LOOP;
- CLOSE p;
- RETURN;
- END f_cartesian;
- END refcur_pkg_v1;
- /
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg_v1.f_cartesian(cursor(
- 3 select id, text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.70
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4049074522
-
- --------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- --------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 217 recursive calls
- 0 db block gets
- 264 consistent gets
- 0 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
0.7 seconds! The direction is clearly right.
But the data volume in my real requirement is huge, so I wanted to optimize further.
The current version carries the whole text through every loop iteration, so the natural next idea was to try a binary split.
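The idea of the binary split: locate the middle delimiter, cut the string in two, and recurse, so every subsequent instr/substr works on a string half as long. A recursive Python sketch of that idea (the PL/SQL that follows unrolls the recursion into six hard-coded levels for exactly 64 lines; this sketch handles any line count):

```python
def split_dichotomy(text, lines, delim="\n"):
    """Split `text`, known to contain `lines` delimiter-terminated items,
    by cutting at the middle delimiter and recursing on each half."""
    if lines == 1:
        return [text[:text.find(delim)]]
    mid = lines // 2
    # position of the mid-th delimiter, like instr(text, delim, 1, mid)
    p = -1
    for _ in range(mid):
        p = text.find(delim, p + 1)
    # recurse on each half; later scans only ever see strings half as long
    return (split_dichotomy(text[:p + 1], mid, delim)
            + split_dichotomy(text[p + 1:], lines - mid, delim))
```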
- create or replace package refcur_pkg
- authid current_user
- as
- type inrec is record (
- id number(38),
- lines number(38),
- text varchar2(4000));
- type refcur_t is ref cursor return inrec;
- type outrec_typ is record (
- id number(38),
- item_txt varchar2(4000));
- type outrecset is table of outrec_typ;
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- function f_cartesian2 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- function f_cartesian3 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- end;
- /
-
- create or replace package body refcur_pkg IS
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- C_NL_TERM varchar2(2) := chr(10); --unix style
- C_NL_LENG int := 1; --unix new line length
- C_SIZE int:=64; --split clob to varchar by every C_SIZE-th newline character
- C_MAX_LEN INT:=1024; --max length for item_txt; if change, also change item_txt varchar2(1024)
- v_div_pos int; --end position of each sub clob
- v1_div_pos int; --end position of each sub clob
- v2_div_pos int; --end position of each sub clob
- v3_div_pos int; --end position of each sub clob
- v4_div_pos int; --end position of each sub clob
- v5_div_pos int; --end position of each sub clob
- v1_substr varchar2(4000);
- v2_substr varchar2(4000);
- v3_substr varchar2(4000);
- v4_substr varchar2(4000);
- v5_substr varchar2(4000);
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
-
- --if lines=C_SIZE, then dichotomy
- if in_rec.lines=C_SIZE then
- v_div_pos :=instr(in_rec.text, C_NL_TERM, 1, C_SIZE/2);
- for a in 1..2 loop
- v1_substr :=substr(in_rec.text, 1+v_div_pos*(a-1), v_div_pos*(2-a)+1e6*(a-1));
- v1_div_pos :=instr(v1_substr, C_NL_TERM, 1, C_SIZE/4);
- for b in 1..2 loop
- v2_substr :=substr(v1_substr, 1+v1_div_pos*(b-1), v1_div_pos*(2-b)+1e6*(b-1));
- v2_div_pos :=instr(v2_substr, C_NL_TERM, 1, C_SIZE/8);
- for c in 1..2 loop
- v3_substr :=substr(v2_substr, 1+v2_div_pos*(c-1), v2_div_pos*(2-c)+1e6*(c-1));
- v3_div_pos :=instr(v3_substr, C_NL_TERM, 1, C_SIZE/16);
- for d in 1..2 loop
- v4_substr :=substr(v3_substr, 1+v3_div_pos*(d-1), v3_div_pos*(2-d)+1e6*(d-1));
- v4_div_pos :=instr(v4_substr, C_NL_TERM, 1, C_SIZE/32);
- for e in 1..2 loop
- v5_substr :=substr(v4_substr, 1+v4_div_pos*(e-1), v4_div_pos*(2-e)+1e6*(e-1));
- v5_div_pos :=instr(v5_substr, C_NL_TERM, 1, 1);
- for f in 1..2 loop
- out_rec.item_txt :=substr(v5_substr, 1+v5_div_pos*(f-1), (v5_div_pos-1)*(2-f)+(v4_div_pos-v5_div_pos-1)*(f-1));
- exit when out_rec.item_txt is null;
- pipe row(out_rec);
- end loop;
- end loop;
- end loop;
- end loop;
- end loop;
- end loop;
- --otherwise (lines != C_SIZE): plain single-pass loop
- else
- v_div_pos:=0;
- v1_div_pos:=0;
- for i in 1..1000000000 loop
- v1_div_pos:=instr(in_rec.text, C_NL_TERM, v_div_pos+1);
- exit when v1_div_pos=0;
- out_rec.item_txt :=substr(in_rec.text, v_div_pos+1, v1_div_pos-v_div_pos-1);
- v_div_pos :=v1_div_pos;
- pipe row(out_rec);
- end loop;
- end if;
- end loop;
- close p;
- return;
- end f_cartesian;
-
- function f_cartesian2 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
- out_rec.item_txt :=in_rec.text;
- for i in 1..in_rec.lines loop
- pipe row(out_rec);
- end loop;
- end loop;
- close p;
- return;
- end f_cartesian2;
-
- function f_cartesian3 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
- out_rec.item_txt :=substr(in_rec.text, 1, 64);
- for i in 1..in_rec.lines loop
- pipe row(out_rec);
- end loop;
- end loop;
- close p;
- return;
- end f_cartesian3;
-
- end refcur_pkg;
- /
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.40
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4049074522
-
- --------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- --------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 258 recursive calls
- 0 db block gets
- 469 consistent gets
- 1 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
I would rather not explain the binary-split code in detail, because it is ugly, even though the logic itself is quite simple.
Because my real processing splits a CLOB, I first cut the intermediate result into chunks of 64 lines, each one large varchar, and only then apply the hard-coded six-level binary split.
If the number of lines varies in your case, you can tweak the if condition and the number of split levels a little, so there is no need to agonize over the depth.
In any case, the performance gain is clearly significant. Also, in my testing I found that saving the instr result into a variable first,
that is, writing something like substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1), is more efficient still, so the code above could be tightened further.
Purely for testing, I also wrote f_cartesian2 and f_cartesian3:
the former simply duplicates the rows, the latter takes a fixed substr of each row; in these two cases PL/SQL no longer has any real advantage.
- --sql connect by
- bill@ORCL> select id, text
- 2 from t1
- 3 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:06.81
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 4 recursive calls
- 0 db block gets
- 313 consistent gets
- 0 physical reads
- 0 redo size
- 236099 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
- --sql Cartesian product
- bill@ORCL> select id, text
- 2 from t1,
- 3 (select rownum n from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:05.00
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 7 recursive calls
- 0 db block gets
- 378 consistent gets
- 0 physical reads
- 0 redo size
- 280133 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
- --pipelined table function
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian2(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:05.16
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 2991304848
-
- ---------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN2 | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ---------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 266 recursive calls
- 0 db block gets
- 529 consistent gets
- 0 physical reads
- 0 redo size
- 236103 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
-
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian3(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.29
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 2831580648
-
- ---------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN3 | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ---------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 278 recursive calls
- 0 db block gets
- 555 consistent gets
- 0 physical reads
- 124 redo size
- 228407 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
These tests and comparisons, taken together, were a real shock to me. Building a Cartesian product in SQL and then picking a different piece from each copy looks like exactly the kind of job SQL was born for; in the end I still cannot tell whether the task genuinely does not suit SQL, or whether Oracle simply has not implemented it well. Readers may want to run the same test on other databases.
Either way, from now on, whenever this requirement comes up in Oracle, consider borrowing the PL/SQL approach shown here.