Reposted from: http://www.itpub.net/thread-1840767-1-1.html
Right at the top, the takeaway that mattered most to me personally.
For Oracle tuning, understanding the CBO matters, and reading execution plans matters;
but tuning does not stop at the CBO, and the reflex that any tuning talk must be CBO talk is, I think, a bit of a blind spot.
Do we actually know which step of a given SQL statement or PL/SQL block costs the most time, and why?
--******************************************************
On to the main topic.
Plenty of forum members have asked a question like this:
my table has a column text whose value contains a delimiter ','; I need to split each value into n rows, each row taking, in order, the piece between two delimiters.
For example, given the data
id text
1 a, b, c, d
2 e, f, g
the result I need is
1 a
1 b
1 c
1 d
2 e
2 f
2 g
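Put another way, the transformation being asked for is a simple split-to-rows. A minimal Python sketch of the target behavior (the function name and structure here are illustrative, not from any post in the thread):

```python
# Split each (id, text) pair on the delimiter and emit one
# (id, item) row per piece, preserving order within each id.
def split_to_rows(rows, delim=","):
    out = []
    for rid, text in rows:
        for item in text.split(delim):
            out.append((rid, item.strip()))
    return out

data = [(1, "a, b, c, d"), (2, "e, f, g")]
# split_to_rows(data) -> [(1,'a'), (1,'b'), (1,'c'), (1,'d'), (2,'e'), (2,'f'), (2,'g')]
```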
At this point, many members reply with the classic itpub SQL (the prior dbms_random.value>0 predicate is there to defeat connect-by cycle detection, so each row can join back to itself):
- select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
- from t1
- connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
Of course, there may also be the "regex skeptics", who would do it with substr nested around instr.
And of course there is the Cartesian-product version too.
In short, this has become a stock recipe.
But is it actually a good way to do it?
I had never given it any thought until the problem turned up in a real project of mine, and then I discovered that these SQL formulations have little more than theoretical value.
Let's measure.
- create table t1 (id int, text varchar2(4000));
- insert into t1
- select 1, listagg(lpad('a', 60), chr(10)) within group (order by 1)||chr(10)
- from dual
- connect by rownum<=64;
-
- insert into t1
- select n+1, text
- from t1, (select rownum n from dual connect by rownum<=199);
- --sql connect by version
- bill@ORCL> select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
- 2 from t1
- 3 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:16.24
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 4 recursive calls
- 0 db block gets
- 313 consistent gets
- 0 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
16 seconds rather took me by surprise. I did not expect it to be this slow; let's try the substr-plus-instr version.
- bill@ORCL> select id, substr(text,
- 2 decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)),
- 3 instr(text, chr(10), 1, level)-decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)
- 4 from t1
- 5 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:04.89
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 248 consistent gets
- 0 physical reads
- 0 redo size
- 25757310 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Much better. It seems Oracle regular expressions are best avoided whenever possible.
Still far from satisfying, though; this is hardly any data at all.
Let's try the Cartesian-product version.
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t1,
- 5 (select rownum lv from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.30
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 363K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 363K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 48 recursive calls
- 0 db block gets
- 435 consistent gets
- 2 physical reads
- 0 redo size
- 359032 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 8 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
A bit faster again, but only a bit; the order of magnitude is not going to change.
Let's try a physical table.
- create table t2 (id int, lv int, text varchar2(4000));
- insert into t2
- select (n-1)*200+id, n, text
- from t1, (select rownum n from dual connect by rownum<=64);
-
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t2;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.53
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 1513984157
-
- --------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 15663 | 30M| 3567 (1)| 00:00:01 |
- | 1 | TABLE ACCESS FULL| T2 | 15663 | 30M| 3567 (1)| 00:00:01 |
- --------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 13005 consistent gets
- 12997 physical reads
- 0 redo size
- 3526140 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Judging from this experiment, it looks as if Oracle, no matter how little of the string the select's substr actually needs, copies the full table 64 times and only then takes the substr. If so, the in-memory work here is dreadfully inefficient.
Let's run two more experiments.
- bill@ORCL> select substr(text,
- 2 decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
- 3 instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
- 4 from t1,
- 5 (
- 6 select id, lv
- 7 from (select id from t1),
- 8 (select rownum lv from dual connect by rownum<=64)
- 9 where rownum>0
- 10 ) b
- 11 where t1.id=b.id;
-
- 12800 rows selected.
-
- Elapsed: 00:00:03.56
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 553474792
-
- ----------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 187 | 372K| 71 (0)| 00:00:01 |
- |* 1 | HASH JOIN | | 187 | 372K| 71 (0)| 00:00:01 |
- | 2 | VIEW | | 186 | 4836 | 3 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | FILTER | | | | | |
- | 5 | MERGE JOIN CARTESIAN | | 186 | 4836 | 3 (0)| 00:00:01 |
- | 6 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
- | 7 | COUNT | | | | | |
- |* 8 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 10 | BUFFER SORT | | 186 | 2418 | 3 (0)| 00:00:01 |
- | 11 | INDEX FULL SCAN | SYS_C0010642 | 186 | 2418 | 1 (0)| 00:00:01 |
- | 12 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("T1"."ID"="B"."ID")
- 4 - filter(ROWNUM>0)
- 8 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 15 recursive calls
- 0 db block gets
- 515 consistent gets
- 9 physical reads
- 0 redo size
- 25712495 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
As you can see, the rewrite runs at the same speed; if anything, slightly worse.
Now let's see how Oracle does when we tell it explicitly which piece of the text to select:
- bill@ORCL> select substr(t1.id, instr(text, chr(10), 1, 58), instr(text, chr(10), 1, 59)-instr(text, chr(10), 1, 58)-1) item_txt
- 2 from t1,
- 3 (select rownum lv from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:01.26
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 7 recursive calls
- 0 db block gets
- 380 consistent gets
- 2 physical reads
- 0 redo size
- 227513 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
Faster again.
After these tests, the conclusion seems to be that CPU work and memory operations are the factors that now need attention (they always were; we just rarely seem able to do much about them).
The obvious corollary: substr and instr over long strings are naturally slower, and assigning those long values to variables is less efficient as well.
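There is a concrete reason worth spelling out: instr(text, delim, 1, k) with a growing occurrence number k rescans the string from the start on every call, so splitting n items that way does quadratic character work, while remembering the previous match position needs only one pass. A rough Python sketch of the two access patterns, with str.find standing in for instr (an illustration of the scan cost, not Oracle internals):

```python
def split_rescan(text, delim="\n"):
    """Locate the k-th delimiter by scanning from position 0 every time,
    the way instr(text, delim, 1, k) does -- total work grows quadratically."""
    items, prev, k = [], 0, 1
    while True:
        p = -1
        for _ in range(k):               # re-find occurrences 1..k from scratch
            p = text.find(delim, p + 1)
            if p < 0:
                break
        if p < 0:
            break
        items.append(text[prev:p])
        prev, k = p + 1, k + 1
    return items

def split_single_pass(text, delim="\n"):
    """Remember where the last match ended and continue from there -- one pass."""
    items, prev = [], 0
    while True:
        p = text.find(delim, prev)
        if p < 0:
            break
        items.append(text[prev:p])
        prev = p + 1
    return items

# split_rescan("a\nb\nc\n") == split_single_pass("a\nb\nc\n") == ['a', 'b', 'c']
```

Both return the same rows; only the amount of scanning differs, which is exactly the difference the PL/SQL versions below exploit.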
Having thought this far, it seemed time to turn to PL/SQL.
Since in my requirement the exploded result has to be joined on to other tables, the natural thing to reach for was a pipelined table function.
The first version was not hard to write, apart from PL/SQL's clunky syntax (clunky relative to SQL, that is; users of other high-level languages are welcome to smirk).
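For readers who don't write PL/SQL: a pipelined function PIPE ROWs each result to the consumer as it is produced, instead of materializing the whole result set first. A loose Python analogy using a generator (an analogy only, not Oracle's API; the name split_piped is made up here):

```python
def split_piped(rows, delim="\n"):
    """Yield (id, item) pairs one at a time, in the spirit of PIPE ROW:
    the consumer can start working before the producer has finished."""
    for rid, text in rows:
        prev = 0
        while True:
            pos = text.find(delim, prev)   # roughly instr(text, delim, prev + 1)
            if pos < 0:
                break
            yield rid, text[prev:pos]      # roughly PIPE ROW(out_rec)
            prev = pos + 1

# list(split_piped([(1, "a\nb\n"), (2, "c\n")])) -> [(1, 'a'), (1, 'b'), (2, 'c')]
```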
- create or replace package refcur_pkg_v1
- authid current_user
- as
- type inrec is record (
- id number(38),
- text varchar2(4000));
- type refcur_t is ref cursor return inrec;
- type outrec_typ is record (
- id number(38),
- item_txt varchar2(4000));
- type outrecset is table of outrec_typ;
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- end;
- /
-
- create or replace PACKAGE BODY refcur_pkg_v1 IS
- FUNCTION f_cartesian (p refcur_t) RETURN outrecset PIPELINED
- parallel_enable (partition p by any)
- IS
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- C_NL_TERM varchar2(2) := chr(10); --unix style
- v_nl_pos int:=0;
- v_tmp_pos int:=0;
- BEGIN
- LOOP
- FETCH p INTO in_rec; -- input row
- EXIT WHEN p%NOTFOUND;
-
- v_nl_pos :=0;
- v_tmp_pos :=0;
- out_rec.id :=in_rec.id;
- FOR i IN 1..100000 LOOP
- v_tmp_pos:=instr(in_rec.text, C_NL_TERM, v_nl_pos+1);
- exit when v_tmp_pos=0;
- out_rec.item_txt :=substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1);
- v_nl_pos :=v_tmp_pos;
- PIPE ROW(out_rec);
- END LOOP;
-
- END LOOP;
- CLOSE p;
- RETURN;
- END f_cartesian;
- END refcur_pkg_v1;
- /
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg_v1.f_cartesian(cursor(
- 3 select id, text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.70
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4049074522
-
- --------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- --------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 217 recursive calls
- 0 db block gets
- 264 consistent gets
- 0 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
0.7 seconds! The direction is clearly right.
But the data volume in my real requirement is huge, so I wanted to optimize further.
The current version carries the whole text through every loop iteration, so the natural next idea was to try a binary split.
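The idea of the binary split: locate the middle delimiter, cut the string in two, and recurse, so every subsequent instr/substr works on a string half as long. A recursive Python sketch of that idea (the PL/SQL that follows unrolls the recursion into six hard-coded levels for exactly 64 lines; this sketch handles any line count):

```python
def split_dichotomy(text, lines, delim="\n"):
    """Split `text`, known to contain `lines` delimiter-terminated items,
    by cutting at the middle delimiter and recursing on each half."""
    if lines == 1:
        return [text[:text.find(delim)]]
    mid = lines // 2
    # position of the mid-th delimiter, like instr(text, delim, 1, mid)
    p = -1
    for _ in range(mid):
        p = text.find(delim, p + 1)
    # recurse on each half; later scans only ever see strings half as long
    return (split_dichotomy(text[:p + 1], mid, delim)
            + split_dichotomy(text[p + 1:], lines - mid, delim))
```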
- create or replace package refcur_pkg
- authid current_user
- as
- type inrec is record (
- id number(38),
- lines number(38),
- text varchar2(4000));
- type refcur_t is ref cursor return inrec;
- type outrec_typ is record (
- id number(38),
- item_txt varchar2(4000));
- type outrecset is table of outrec_typ;
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- function f_cartesian2 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- function f_cartesian3 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any);
- end;
- /
-
- create or replace package body refcur_pkg IS
- function f_cartesian (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- C_NL_TERM varchar2(2) := chr(10); --unix style
- C_NL_LENG int := 1; --unix new line length
- C_SIZE int:=64; --split clob to varchar by every C_SIZE-th newline character
- C_MAX_LEN INT:=1024; --max length for item_txt; if change, also change item_txt varchar2(1024)
- v_div_pos int; --end position of each sub clob
- v1_div_pos int; --end position of each sub clob
- v2_div_pos int; --end position of each sub clob
- v3_div_pos int; --end position of each sub clob
- v4_div_pos int; --end position of each sub clob
- v5_div_pos int; --end position of each sub clob
- v1_substr varchar2(4000);
- v2_substr varchar2(4000);
- v3_substr varchar2(4000);
- v4_substr varchar2(4000);
- v5_substr varchar2(4000);
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
-
- --if lines=C_SIZE, then dichotomy
- if in_rec.lines=C_SIZE then
- v_div_pos :=instr(in_rec.text, C_NL_TERM, 1, C_SIZE/2);
- for a in 1..2 loop
- v1_substr :=substr(in_rec.text, 1+v_div_pos*(a-1), v_div_pos*(2-a)+1e6*(a-1));
- v1_div_pos :=instr(v1_substr, C_NL_TERM, 1, C_SIZE/4);
- for b in 1..2 loop
- v2_substr :=substr(v1_substr, 1+v1_div_pos*(b-1), v1_div_pos*(2-b)+1e6*(b-1));
- v2_div_pos :=instr(v2_substr, C_NL_TERM, 1, C_SIZE/8);
- for c in 1..2 loop
- v3_substr :=substr(v2_substr, 1+v2_div_pos*(c-1), v2_div_pos*(2-c)+1e6*(c-1));
- v3_div_pos :=instr(v3_substr, C_NL_TERM, 1, C_SIZE/16);
- for d in 1..2 loop
- v4_substr :=substr(v3_substr, 1+v3_div_pos*(d-1), v3_div_pos*(2-d)+1e6*(d-1));
- v4_div_pos :=instr(v4_substr, C_NL_TERM, 1, C_SIZE/32);
- for e in 1..2 loop
- v5_substr :=substr(v4_substr, 1+v4_div_pos*(e-1), v4_div_pos*(2-e)+1e6*(e-1));
- v5_div_pos :=instr(v5_substr, C_NL_TERM, 1, 1);
- for f in 1..2 loop
- out_rec.item_txt :=substr(v5_substr, 1+v5_div_pos*(f-1), (v5_div_pos-1)*(2-f)+(v4_div_pos-v5_div_pos-1)*(f-1));
- exit when out_rec.item_txt is null;
- pipe row(out_rec);
- end loop;
- end loop;
- end loop;
- end loop;
- end loop;
- end loop;
- --otherwise (lines != C_SIZE): plain single-pass loop
- else
- v_div_pos:=0;
- v1_div_pos:=0;
- for i in 1..1000000000 loop
- v1_div_pos:=instr(in_rec.text, C_NL_TERM, v_div_pos+1);
- exit when v1_div_pos=0;
- out_rec.item_txt :=substr(in_rec.text, v_div_pos+1, v1_div_pos-v_div_pos-1);
- v_div_pos :=v1_div_pos;
- pipe row(out_rec);
- end loop;
- end if;
- end loop;
- close p;
- return;
- end f_cartesian;
-
- function f_cartesian2 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
- out_rec.item_txt :=in_rec.text;
- for i in 1..in_rec.lines loop
- pipe row(out_rec);
- end loop;
- end loop;
- close p;
- return;
- end f_cartesian2;
-
- function f_cartesian3 (p refcur_t) return outrecset pipelined
- parallel_enable (partition p by any)
- is
- in_rec p%ROWTYPE;
- out_rec outrec_typ;
- begin
- loop
- fetch p into in_rec; -- input row
- exit when p%NOTFOUND;
- out_rec.id :=in_rec.id;
- out_rec.item_txt :=substr(in_rec.text, 1, 64);
- for i in 1..in_rec.lines loop
- pipe row(out_rec);
- end loop;
- end loop;
- close p;
- return;
- end f_cartesian3;
-
- end refcur_pkg;
- /
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.40
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 4049074522
-
- --------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- --------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- --------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 258 recursive calls
- 0 db block gets
- 469 consistent gets
- 1 physical reads
- 0 redo size
- 228399 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
I would rather not explain the binary-split code in detail, because it is ugly, even though the logic itself is quite simple.
Because my real processing splits a CLOB, I first cut the intermediate result into chunks of 64 lines, each one large varchar, and only then apply the hard-coded six-level binary split.
If the number of lines varies in your case, you can tweak the if condition and the number of split levels a little, so there is no need to agonize over the depth.
In any case, the performance gain is clearly significant. Also, in my testing I found that saving the instr result into a variable first,
that is, writing something like substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1), is more efficient still, so the code above could be tightened further.
Purely for testing, I also wrote f_cartesian2 and f_cartesian3:
the former simply duplicates the rows, the latter takes a fixed substr of each row; in these two cases PL/SQL no longer has any real advantage.
- --sql connect by
- bill@ORCL> select id, text
- 2 from t1
- 3 connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
-
- 12800 rows selected.
-
- Elapsed: 00:00:06.81
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 3874795171
-
- -------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- -------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
- |* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- -------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 1 - access("ID"=PRIOR "ID")
- filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
- "DBMS_RANDOM"."VALUE"()>0)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 4 recursive calls
- 0 db block gets
- 313 consistent gets
- 0 physical reads
- 0 redo size
- 236099 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 1 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
- --sql Cartesian product
- bill@ORCL> select id, text
- 2 from t1,
- 3 (select rownum n from dual connect by rownum<=64) b;
-
- 12800 rows selected.
-
- Elapsed: 00:00:05.00
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 894562235
-
- ----------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ----------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
- | 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
- | 3 | COUNT | | | | | |
- |* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
- | 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
- | 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
- | 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ----------------------------------------------------------------------------------------
-
- Predicate Information (identified by operation id):
- ---------------------------------------------------
-
- 4 - filter(ROWNUM<=64)
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 7 recursive calls
- 0 db block gets
- 378 consistent gets
- 0 physical reads
- 0 redo size
- 280133 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 2 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
- --pipelined table function
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian2(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:05.16
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 2991304848
-
- ---------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN2 | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ---------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 266 recursive calls
- 0 db block gets
- 529 consistent gets
- 0 physical reads
- 0 redo size
- 236103 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
-
-
- bill@ORCL> select id, item_txt
- 2 from table(refcur_pkg.f_cartesian3(cursor(
- 3 select id, length(text)-length(replace(text, chr(10))), text
- 4 from t1
- 5 )));
-
- 12800 rows selected.
-
- Elapsed: 00:00:00.29
-
- Execution Plan
- ----------------------------------------------------------
- Plan hash value: 2831580648
-
- ---------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
- | 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN3 | 8168 | | 29 (0)| 00:00:01 |
- | 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
- ---------------------------------------------------------------------------------------------------
-
- Note
- -----
- - dynamic statistics used: dynamic sampling (level=2)
-
-
- Statistics
- ----------------------------------------------------------
- 278 recursive calls
- 0 db block gets
- 555 consistent gets
- 0 physical reads
- 124 redo size
- 228407 bytes sent via SQL*Net to client
- 9927 bytes received via SQL*Net from client
- 855 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 12800 rows processed
These tests and comparisons, taken together, were a real shock to me. Building a Cartesian product in SQL and then picking a different piece from each copy looks like exactly the kind of job SQL was born for; in the end I still cannot tell whether the task genuinely does not suit SQL, or whether Oracle simply has not implemented it well. Readers may want to run the same test on other databases.
Either way, from now on, whenever this requirement comes up in Oracle, consider borrowing the PL/SQL approach shown here.