语法格式:
REGEXP_SUBSTR(String, pattern, position, occurrence, modifier)
样例1:
select regexp_substr('ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD',
'[^,]+', 1, 1, 'c')
from dual;
-- String 处理的字符串是: 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD'
-- pattern 匹配的正则是:'[^,]+'
-- 意思是"匹配非逗号的多个字符",相当于指定逗号为分隔符,
-- 将被处理的字符串分隔成多个组
-- position 起始字符是第一个,即从被处理字符串的第一个字符匹配正则,进行拆分,默认1
-- occurrence 取出拆分后的第1组,默认1
-- modifier 分隔符是否忽略大小写,'c'代表不忽略,'i'代表忽略,默认'c'
样例2:
select regexp_substr('ALLEN X JAMES x TURNER X BLAKE x MARTIN X WARD',
'[^x]+', 3, 1, 'i')
from dual;
-- 以小写x做为分隔符,忽略大小写,即分隔符实际上是大写X和小写x,
-- 从被处理串的第三个字符开始匹配,取出分隔后的第一组,
-- 取出的数据为"LEN ",尾部是有一个空格的,因为原串有空格
样例3:
取出所有的组,需要使用到”connect by level”和”regexp_replace”,具体推演如下:
原字符串:
处理后的样式:
推演一:
-- 因为'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD'以逗号分隔有6组,
-- 因此需要做6次union all 操纵才能分别将所有组取出:
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD' as strs, 1 as lvl
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 2
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 3
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 4
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 5
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 6
from dual;
with tmp as
(select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD' as strs, 1 as lvl
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 2
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 3
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 4
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 5
from dual union all
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', 6
from dual)
select regexp_substr(strs, '[^,]+', 1, lvl, 'c') as str from tmp;
-- 根据lvl来确定是第几行,取出来第几组,比如第一行取第一组,第二行取第二组...
推演二:
以上6次的union all 操作可以使用 dual connect by level技巧来改写:
select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD' as strs, lvl
from dual, (select level lvl from dual connect by level <= 6);
with tmp as
(select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD' as strs, lvl
from dual, (select level lvl from dual connect by level <= 6))
select regexp_substr(strs, '[^,]+', 1, lvl, 'c') as str from tmp;
推演三:
因为被处理串分隔后有6组,因此需要做6次union all 操作,那么做多少次的union all 操作,也可以使用函数自动判断出,需要使用regexp_replace函数和length函数来确定,具体是将分隔符之外的字符抹掉,然后求出分隔符的数量,加1后就是union all 的次数:
select length('1,2,3,4'),length(',,,')+1 from dual;
select regexp_replace('ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD', '[^,]', '')
from dual;
select length(regexp_replace('ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD',
'[^,]','')) + 1
from dual;
这个地方有一个要注意的地方:
要注意被处理的字符串最后一个字符是不是分隔符,然后要做相应的调整(最后要不要加1)
最终SQL为:
with tmp as
(select 'ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD' as strs, lvl
from dual,
(select level lvl
from dual
connect by level <= (select length(regexp_replace('ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD',
'[^,]')) + 1
from dual)))
select regexp_substr(strs, '[^,]+', 1, lvl, 'c') as str from tmp;
[TOC]