Oracle 10g数据库内建了符合IEEE POSIX (Portable Operating System for Unix)标准的正则表达式。熟练使用正则表达式,可以写出简洁,强大的SQL语句。
正则表达式有几个优点优于常见的LIKE操作符和INSTR、SUBSTR及REPLACE 函数的。这些传统的SQL 函数不便于进行模式匹配。只有LIKE 操作符通过使用%和_字符匹配,但LIKE不支持表达式的重复、复杂的更替、字符范围、字符列表和POSIX 字符类等等。
元字符(Meta Character):
- ^ 使表达式定位至一行的开头
- $ 使表达式定位至一行的末尾
- * 匹配 0 次或更多次
- ? 匹配 0 次或 1 次
- + 匹配 1 次或更多次
- {m} 正好匹配 m 次
- {m,} 至少匹配 m 次
- {m, n} 至少匹配 m 次但不超过 n 次
- [:alpha:] 字母字符
- [:lower:] 小写字母字符
- [:upper:] 大写字母字符
- [:digit:] 数字
- [:alnum:] 字母数字字符
- [:space:] 空白字符(禁止打印),如回车符、换行符、竖直制表符和换页符[:punct:] 标点字符
- [:cntrl:] 控制字符(禁止打印)
- [:print:] 可打印字符 | 分隔替换选项,通常与分组操作符 () 一起使用
- ( ) 将子表达式分组为一个替换单元、量词单元或后向引用单元
- [char] 字符列表
Oracle 10g提供了四个regexp function: REGEXP_LIKE, REGEXP_REPLACE,REGEXP_INSTR,REGEXP_SUBSTR。
- REGEXP_LIKE:比较一个字符串是否与正则表达式匹配
- (srcstr, pattern [, match_option])
- REGEXP_INSTR:在字符串中查找正则表达式,并且返回匹配的位置
- (srcstr, pattern [, position [, occurrence [, return_option [, match_option]]]])
- REGEXP_SUBSTR:返回与正则表达式匹配的子字符串
- (srcstr, pattern [, position [, occurrence [, match_option]]])
- REGEXP_REPLACE:搜索并且替换匹配的正则表达式
- (srcstr, pattern [, replacestr [, position [, occurrence [, match_option]]]])
其中各参数的含义为:
- srcstr: 被查找的字符数据。
- pattern: 正则表达式。
- occurrence: 出现的次数。默认为1。
- position: 开始位置
- return_option: 默认值为0,返回该模式的起始位置;值为1则返回符合匹配条件的下一个字符的起始位置。
- replacestr: 用来替换匹配模式的字符串。
- match_option: 匹配方式选项。缺省为c。
- c:case sensitive
- I:case insensitive
- n:(.)匹配任何字符(包括newline)
- m:字符串存在换行的时候被作为多行处理
下面通过一些具体的例子来说明如何使用这四个函数。首先创建一个测试数据表,
- SQL> create table person (
- 2 first_name varchar(20),
- 3 last_name varchar(20),
- 4 email varchar(100),
- 5 zip varchar(6));
- Table created.
- SQL> insert into person values ('Steven', 'Chen', 'steven@hp.com', '123456');
- 1 row created.
- SQL> insert into person values ('James', 'Li', 'jamesli@sun.com' || chr(10) || 'lijames@oracle.com', '1b3d5f');
- 1 row created.
- SQL> commit;
- Commit complete.
- SQL> select * from person;
- FIRST_NAME LAST_NAME EMAIL ZIP
- ---------- ---------- -------------------- ------
- Steven Chen steven@hp.com 123456
- James Li jamesli@sun.com 1b3d5f
- lijames@oracle.com
1。REGEXP_LIKE
- SQL> select zip as invalid_zip from person where regexp_like(zip, '[^[:digit:]]');
- INVALID_ZIP
- --------------------
- 1b3d5f
- SQL> select first_name from person where regexp_like(first_name, '^S.*n$');
- FIRST_NAME
- ----------
- Steven
- SQL> select first_name from person where regexp_like(first_name, '^s.*n$');
- no rows selected
- SQL> select first_name from person where regexp_like(first_name, '^s.*n$', 'c');
- no rows selected
- SQL> select first_name from person where regexp_like(first_name, '^s.*n$', 'i');
- FIRST_NAME
- ----------
- Steven
- SQL> select email from person where regexp_like(email, '^james.*com$');
- no rows selected
- SQL> select email from person where regexp_like(email, '^james.*com$', 'n');
- --------------------
- jamesli@sun.com
- lijames@oracle.com
- SQL> select email from person where regexp_like(email, '^li.*com$');
- no rows selected
- SQL> select email from person where regexp_like(email, '^li.*com$', 'm');
- --------------------
- jamesli@sun.com
- lijames@oracle.com
注意上面分别测试了不同的match_option对结果的影响。
2。REGEXP_INSTR
- 查找zip中第一个非数字字符的位置
- SQL> select regexp_instr(zip, '[^[:digit:]]') as position from person;
- POSITION
- ----------
- 0
- 2
- 从第三个字符开始,查找zip中第二个非数字字符的位置
- SQL> select regexp_instr(zip, '[^[:digit:]]', 3, 2) as position from person;
- POSITION
- ----------
- 0
- 6
- 从第三个字符开始,查找zip中第二个非数字字符的下一个字符位置
- SQL> select regexp_instr(zip, '[^[:digit:]]', 3, 2, 1) as position from person;
- POSITION
- ----------
- 0
- 7
3。REGEXP_SUBSTR
- SQL> select regexp_substr(zip, '[^[:digit:]]') as zip from person;
- ZIP
- ------------------
- b
- SQL> select regexp_substr(zip, '[^[:digit:]]', 3, 2) as zip from person;
- ZIP
- ------------
- f
4。REGEXP_REPLACE
- 把zip中所有非数字字符替换为0
- SQL> update person set zip=regexp_replace(zip, '[^[:digit:]]', '0')
- 2 where regexp_like(zip, '[^[:digit:]]');
- 1 row updated.
- SQL> select zip from person;
- ZIP
- ------
- 123456
- 103050
后向引用(backreference):
后向引用是一个很有用的特性。它能够把子表达式的匹配部分保存在临时缓冲区中,供以后重用。缓冲区从左至右进行编号,并利用 \digit 符号进行访问。子表达式用一组圆括号来显示。利用后向引用可以实现较复杂的替换功能。
- SQL> select regexp_replace('Steven Chen', '(.*) (.*)', '\2, \1') as reversed_name from dual;
- REVERSED_NAME
- --------------------
- Chen, Steven
在DDL中也可以正则表达式,比如Constraint, index, view
- SQL> alter table person add constraint constraint_zip check (regexp_like(zip, '^[[:digit:]]+$'));
- SQL> create index person_idx on person(regexp_substr(last_name, '^[[:upper:]]'));
ORACLE终于在10G中提供了对正则表达式的支持,以前那些需要通过LIKE来进行的复杂的匹配就可以通过使用正则表达式更简单的实现。
Oracle10g正则表达式提高了SQL灵活性。有效的解决了数据有效性,重复词的辨认,无关的空白检测,或者分解多个正则组成的字符串等问题。
Oracle10g支持正则表达式的四个新函数分别,它们使用POSIX正则表达式代替了老的百分号(%)和通配符(_)字符。
1,REGEXP_LIKE:与LIKE的功能相似
2,REGEXP_INSTR:与INSTR的功能相似:返回字符的个数
3,REGEXP_SUBSTR:与SUBSTR的功能相似:截取字符串
4,REGEXP_REPLACE:与REPLACE的功能相似
在新的函数中使用正则表达式来代替通配符‘%’和‘_’。
正则表达式由标准的元字符(metacharacters)所构成:
'^'匹配输入字符串的开始位置,在方括号表达式中使用,此时它表示不接受该字符集合。
'$'匹配输入字符串的结尾位置。如果设置了RegExp对象的Multiline属性,则$也匹配'n'或'r'。
'.'匹配除换行符n之外的任何单字符。
'?'匹配前面的子表达式零次或一次。
'+'匹配前面的子表达式一次或多次。
'*'匹配前面的子表达式零次或多次。
'|'指明两项之间的一个选择。例子'^([a-z]+|[0-9]+)$'表示所有小写字母或数字组合成的字符串。
'()'标记一个子表达式的开始和结束位置。
'[]'标记一个中括号表达式。
'{m,n}'一个精确地出现次数范围,m=<出现次数<=n,'{m}'表示出现m次,'{m,}'表示至少出现m次。
num匹配num,其中num是一个正整数。对所获取的匹配的引用。
字符簇:
[[:alpha:]]任何字母。
[[:digit:]]任何数字。
[[:alnum:]]任何字母和数字。
[[:space:]]任何白字符。
[[:upper:]]任何大写字母。
[[:lower:]]任何小写字母。
[[:punct:]]任何标点符号。
[[:xdigit:]]任何16进制的数字,相当于[0-9a-fA-F]。
各种操作符的运算优先级
1.转义符
2.(),(?:),(?=),[]圆括号和方括号
3.*,+,?,{n},{n,},{n,m}限定符
4.^,$,anymetacharacter位置和顺序
4.|“或”操作
示例:
一、REGEXP_LIKE
/*Formattedon2009/11/1119:03(FormatterPlusv4.8.8)*/
SELECT*
FROMscott.emp
WHEREREGEXP_LIKE(ename,'^a{1,3}');
SELECT*
FROMscott.emp
WHEREREGEXP_LIKE(ename,'a{1,2}');
SELECT*
FROMscott.emp
WHEREREGEXP_LIKE(ename,'^a.*e$');
SELECT*
FROMscott.emp
WHEREREGEXP_LIKE(ename,'^[[:lower:]]|[[:digit:]]');
SELECT*
FROMscott.emp
WHEREREGEXP_LIKE(ename,'^[[:lower:]]');
SELECTsal
FROMscott.emp
WHEREREGEXP_LIKE(sal,'[^[:digit:]]');
SELECTsal
FROMscott.emp
WHEREREGEXP_LIKE(sal,'^[^[:digit:]]');
二、REGEXP_INSTR
/*Formattedon2009/11/1119:03(FormatterPlusv4.8.8)*/
SELECTREGEXP_INSTR(ename,'[[:digit:]]$')
FROMscott.emp;
SELECTREGEXP_INSTR(mc,'[[:digit:]]+$')
FROMscott.emp;
SELECTREGEXP_INSTR('Thepriceis$400.','$[[:digit:]]+')
FROMDUAL;
SELECTREGEXP_INSTR('onetwothree','[^[[:lower:]]]')
FROMDUAL;
SELECTREGEXP_INSTR(',,,,,','[^,]*')
FROMDUAL;
SELECTREGEXP_INSTR(',,,,,','[^,]')
FROMDUAL;
三、REGEXP_SUBSTR
/*Formattedon2009/11/1119:04(FormatterPlusv4.8.8)*/
SELECTREGEXP_SUBSTR(ename,'[a-z]+')
FROMscott.emp;
SELECTREGEXP_SUBSTR(sal,'[0-9]+')
FROMscott.emp;
SELECTREGEXP_SUBSTR('aababcde','^a.*b')
FROMDUAL;
四、REGEXP_REPLACE
/*Formattedon2009/11/1119:06(FormatterPlusv4.8.8)*/
SELECTREGEXP_REPLACE('JoeSmith','(){2,}',',')ASrx_replace
FROMDUAL;
SELECTREGEXP_REPLACE('aabbcc','(.*)(.*)(.*)','3,2,1')
FROM DUAL ;
Oracle SQL 语句中关于正则表达式的函数有
REGEXP_LIKE(匹配)
REGEXP_INSTR (包含)
REGEXP_REPLACE(替换)
REGEXP_SUBSTR(提取)
如 手机号码的表达式: ^[1]{1}[35]{1}[[:digit:]]{9}$
查询客户信息表(tKHXX)中有手机号码(SJHM)的可以这样查询
- SELECT * FORM tKHXX where REGEXP_LIKE(SJHM, '^[1]{1}[35]{1}[[:digit:]]{9}$' )
- SELECT * FORM tKHXX where REGEXP_LIKE(SJHM, '^[1]{1}[35]{1}[[:digit:]]{9}$' )
针对这个表达式解释一下
^ 表示开始
$ 表示结束
[]内部为匹配范围
{}里的内容表时个数
手机号码的特点是以 1开头接着是3或5再加9位的数字 所以这么理解
1开头 表达式为 ^[1]{1} 意为 开始1位里包含1
3或5 表达式为 [35]{1}
9位数字结束 为: [[:digit:]]{9}$ 这里[:digit:]为特殊写法,代表为数字 再加个结束符$
用则表达式很简单,更高效
下面列一些参考,来自网络 :)
Anchoring Characters
^ Anchoring Characters
$ Anchor the expression to the end of a line
Equivalence Classes
= =
Oracle supports the equivalence classes through the POSIX '[==]' syntax. A base letter and all of its accented versions constitute an equivalence class. For example, the equivalence class '[=a=]' matches ?and ? The equivalence classes are valid only inside the bracketed expression
Match Options
c Case sensitive matching
i Case insensitive matching
m Treat source string as multi-line activating Anchor chars
n Allow the period (.) to match any newline character
Posix Characters
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Blank Space Characters
[:cntrl:] Control characters (nonprinting)
[:digit:] Numeric digits
[:graph:] Any [:punct:], [:upper:], [:lower:], and [:digit:] chars
[:lower:] Lowercase alphabetic characters
[:print:] Printable characters
[:punct:] Punctuation characters
[:space:] Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:upper:] Uppercase alphabetic characters
[:xdigit:] Hexidecimal characters
Quantifier Characters
* Match 0 or more times
? Match 0 or 1 time
+ Match 1 or more times
{m} Match exactly m times
{m,} Match at least m times
{m, n} Match at least m times but no more than n times
\n Cause the previous expression to be repeated n times
Alternative Matching And Grouping Characters
| Separates alternates, often used with grouping operator ()
( ) Groups subexpression into a unit for alternations, for quantifiers, or for backreferencing (see "Backreferences" section)
[char] Indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes, and the ^ and - metacharacters
下面是个测试例子及环境
测试表
- CREATE TABLE test (
- testcol VARCHAR2(50));
- INSERT INTO test VALUES ('abcde');
- INSERT INTO test VALUES ('12345');
- INSERT INTO test VALUES ('1a4A5');
- INSERT INTO test VALUES ('12a45');
- INSERT INTO test VALUES ('12aBC');
- INSERT INTO test VALUES ('12abc');
- INSERT INTO test VALUES ('12ab5');
- INSERT INTO test VALUES ('12aa5');
- INSERT INTO test VALUES ('12AB5');
- INSERT INTO test VALUES ('ABCDE');
- INSERT INTO test VALUES ('123-5');
- INSERT INTO test VALUES ('12.45');
- INSERT INTO test VALUES ('1a4b5');
- INSERT INTO test VALUES ('1 3 5');
- INSERT INTO test VALUES ('1 45');
- INSERT INTO test VALUES ('1 5');
- INSERT INTO test VALUES ('a b c d');
- INSERT INTO test VALUES ('a b c d e');
- INSERT INTO test VALUES ('a e');
- INSERT INTO test VALUES ('Steven');
- INSERT INTO test VALUES ('Stephen');
- INSERT INTO test VALUES ('111.222.3333');
- INSERT INTO test VALUES ('222.333.4444');
- INSERT INTO test VALUES ('333.444.5555');
- COMMIT;
- CREATE TABLE test (
- testcol VARCHAR2(50));
- INSERT INTO test VALUES ('abcde');
- INSERT INTO test VALUES ('12345');
- INSERT INTO test VALUES ('1a4A5');
- INSERT INTO test VALUES ('12a45');
- INSERT INTO test VALUES ('12aBC');
- INSERT INTO test VALUES ('12abc');
- INSERT INTO test VALUES ('12ab5');
- INSERT INTO test VALUES ('12aa5');
- INSERT INTO test VALUES ('12AB5');
- INSERT INTO test VALUES ('ABCDE');
- INSERT INTO test VALUES ('123-5');
- INSERT INTO test VALUES ('12.45');
- INSERT INTO test VALUES ('1a4b5');
- INSERT INTO test VALUES ('1 3 5');
- INSERT INTO test VALUES ('1 45');
- INSERT INTO test VALUES ('1 5');
- INSERT INTO test VALUES ('a b c d');
- INSERT INTO test VALUES ('a b c d e');
- INSERT INTO test VALUES ('a e');
- INSERT INTO test VALUES ('Steven');
- INSERT INTO test VALUES ('Stephen');
- INSERT INTO test VALUES ('111.222.3333');
- INSERT INTO test VALUES ('222.333.4444');
- INSERT INTO test VALUES ('333.444.5555');
- COMMIT;
REGEXP_INSTR
REGEXP_INSTR(<source_string>, <pattern>, <start_position>, <occurrence>, <return_option>, <match_parameter>)
Find words beginning with 's' or 'r' or 'p' followed by any 4 alphabetic characters: case insensitive
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') RESULT
- FROM dual;
- SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') RESULT
- FROM dual;
Find the postiion of try, trying, tried or tries
- SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') RESULTNUM
- FROM dual;
- SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') RESULTNUM
- FROM dual;
REGEXP_LIKE
REGEXP_LIKE(<source_string>, <pattern>, <match_parameter>)
AlphaNumeric Characters
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');
Alphabetic Characters:
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{5}');
Control Characters
- INSERT INTO test VALUES ('zyx' || CHR(13) || 'wvu');
- COMMIT;
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]{1}');
- INSERT INTO test VALUES ('zyx' || CHR(13) || 'wvu');
- COMMIT;
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]{1}');
Digits
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:digit:]]{5}');
Lower Case
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:lower:]]{5}');
Printable Characters
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{6}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{7}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{6}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:print:]]{7}');
Punctuation
- TRUNCATE TABLE test;
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:punct:]]');
- TRUNCATE TABLE test;
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:punct:]]');
Spaces
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{5}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:space:]]{5}');
Upper Case
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]{3}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2}');
- SELECT *
- FROM test
- WHERE REGEXP_LIKE(testcol, '[[:upper:]]{3}');
Values Starting with 'a%b'
- SELECT testcol
- FROM test
- WHERE REGEXP_LIKE(testcol, '^ab*');
- SELECT testcol
- FROM test
- WHERE REGEXP_LIKE(testcol, '^ab*');
'a' is the third value
- SELECT testcol
- ROM test WHERE REGEXP_LIKE(testcol, '^..a.');
- SELECT testcol
- ROM test WHERE REGEXP_LIKE(testcol, '^..a.');
Contains two consecutive occurances of the letter 'a' or 'z'
- SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');
- SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');
Begins with 'Ste' ends with 'en' and contains either 'v' or 'ph' in the center
- SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');
- SELECT testcol FROM test WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');
Use a regular expression in a check constraint
- CREATE TABLE mytest (c1 VARCHAR2(20),
- CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
- Identify SSN
- Thanks: Byron Bush HIOUG
- CREATE TABLE ssn_test (
- ssn_col VARCHAR2(20));
- INSERT INTO ssn_test VALUES ('111-22-3333');
- INSERT INTO ssn_test VALUES ('111=22-3333');
- INSERT INTO ssn_test VALUES ('111-A2-3333');
- INSERT INTO ssn_test VALUES ('111-22-33339');
- INSERT INTO ssn_test VALUES ('111-2-23333');
- INSERT INTO ssn_test VALUES ('987-65-4321');
- COMMIT;
- SELECT ssn_col
- from ssn_test
- WHERE regexp_like(ssn_col,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$');
- CREATE TABLE mytest (c1 VARCHAR2(20),
- CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
- Identify SSN
- Thanks: Byron Bush HIOUG
- CREATE TABLE ssn_test (
- ssn_col VARCHAR2(20));
- INSERT INTO ssn_test VALUES ('111-22-3333');
- INSERT INTO ssn_test VALUES ('111=22-3333');
- INSERT INTO ssn_test VALUES ('111-A2-3333');
- INSERT INTO ssn_test VALUES ('111-22-33339');
- INSERT INTO ssn_test VALUES ('111-2-23333');
- INSERT INTO ssn_test VALUES ('987-65-4321');
- COMMIT;
- SELECT ssn_col
- from ssn_test
- WHERE regexp_like(ssn_col,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$');
REGEXP_REPLACE
Syntax REGEXP_REPLACE(<source_string>, <pattern>,<replace_string>, <position>, <occurrence>, <match_parameter>)
Looks for the pattern xxx.xxx.xxxx and reformats pattern to (xxx) xxx-xxxx col testcol format a15
col result format a15
- SELECT testcol, REGEXP_REPLACE(testcol,'([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
- '(\1) \2-\3') RESULT
- FROM test
- WHERE LENGTH(testcol) = 12;
- SELECT testcol, REGEXP_REPLACE(testcol,'([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
- '(\1) \2-\3') RESULT
- FROM test
- WHERE LENGTH(testcol) = 12;
Put a space after every character
- SELECT testcol, REGEXP_REPLACE(testcol, '(.)', '\1 ') RESULT
- FROM test WHERE testcol like 'S%';
- SELECT testcol, REGEXP_REPLACE(testcol, '(.)', '\1 ') RESULT
- FROM test WHERE testcol like 'S%';
Replace multiple spaces with a single space
- SELECT REGEXP_REPLACE('500 Oracle Parkway, Redwood Shores, CA', '( ){2,}', ' ') RESULT
- FROM dual;
- SELECT REGEXP_REPLACE('500 Oracle Parkway, Redwood Shores, CA', '( ){2,}', ' ') RESULT
- FROM dual;
Insert a space between a lower case character followed by an upper case character
- SELECT REGEXP_REPLACE('George McGovern', '([[:lower:]])([[:upper:]])', '\1 \2') CITY
- FROM dual;
- SELECT REGEXP_REPLACE('George McGovern', '([[:lower:]])([[:upper:]])', '\1 \2') CITY
- FROM dual;
Replace the period with a string (note use of '\')
- SELECT REGEXP_REPLACE('We are trying to make the subject easier.','\.',' for you.') REGEXT_SAMPLE
- FROM dual;
- SELECT REGEXP_REPLACE('We are trying to make the subject easier.','\.',' for you.') REGEXT_SAMPLE
- FROM dual;
REGEXP_SUBSTR
Syntax REGEXP_SUBSTR(source_string, pattern[, position [, occurrence[, match_parameter]]])
Searches for a comma followed by one or more occurrences of non-comma characters followed by a comma
- SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA', ',[^,]+,') RESULT
- FROM dual;
- SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA', ',[^,]+,') RESULT
- FROM dual;
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.) col result format a50
- SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database',
- 'http://([[:alnum:]]+\.?){3,4}/?') RESULT
- FROM dual;
- SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database',
- 'http://([[:alnum:]]+\.?){3,4}/?') RESULT
- FROM dual;
Extracts try, trying, tried or tries
SELECT REGEXP_SUBSTR('We are trying to make the subject easier.','tr(y(ing)?|(ied)|(ies))')
FROM dual;
Extract the 3rd field treating ':' as a delimiter SELECT REGEXP_SUBSTR('system/pwd@orabase:1521:sidval',
'[^:]+', 1, 3) RESULT
FROM dual;
Extract from string with vertical bar delimiter
- CREATE TABLE regexp (
- testcol VARCHAR2(50));
- INSERT INTO regexp
- (testcol)
- VALUES
- ('One|Two|Three|Four|Five');
- SELECT * FROM regexp;
- SELECT REGEXP_SUBSTR(testcol,'[^|]+', 1, 3)
- FROM regexp;
- CREATE TABLE regexp (
- testcol VARCHAR2(50));
- INSERT INTO regexp
- (testcol)
- VALUES
- ('One|Two|Three|Four|Five');
- SELECT * FROM regexp;
- SELECT REGEXP_SUBSTR(testcol,'[^|]+', 1, 3)
- FROM regexp;
环境: oracle 10g r2
更多内容可以参考:
<<Oracle? Database Application Developer's Guide - Fundamentals 10g Release 2 (10.2)>>
必须明白的是,oracle的正则表达式和其它的语言的并不是完全一样。
oracle支持以下标准的正则表达式:
- IEEE Portable Operating System Interface (POSIX) standard draft 1003.2/D11.2
- Unicode Regular Expression Guidelines of the Unicode Consortium
- Extends the matching capabilities for multilingual data beyond what is specified in the POSIX standard(即posix的多语言支持).
- Adds support for the common Perl regular expression extensions that are not
expression operators, for example, character class shortcuts, the non-greedy
modifier, and so on(即部分perl标准,不包含在posix标准,但是又不与之冲突的)
---2008-09-08 做了个实验,关键的要点是三个,重复次数{},+,*,?
---以及匹配模式[...] ,[^...],(...),(...)
--以及转义符号 \d,\D,\\等。
------------------------------------------------------------------------------------------------------------------
ID VALUE
--------------------------------------- ----------------------------------------
9 lzf is a chinese
10 1231231231lzf
11 perfect
1 13625061300
2 059183383335
3 13959029697
4 13950323157
5 who is your daddy?
6 who are you?
6 what is your name?
6 I am 20 years old!
7 股市下挫到2100点左右
8 lzf ' is very happy 136ok
8 do you know lzf,ok?
8 ok,lzf is my brother
------------------------------------------------------------------------------------------------------------------
select * from test_reg where regexp_like(value,'^1(.+)7$') --1开头,7结尾的内容
select * from test_reg where regexp_like(value,'\d+') --一个或者多个数字
select * from test_reg where regexp_like(value,'^[a-zA-Z]{1,}[a-zA-Z]+$') --全部字符的.[:digit:]
select * from test_reg where regexp_like(value,'^[0-9]{1,}[0-9]+$') --全部数字的.
select * from test_reg where regexp_like(value,'[y][o][u]') --包含you
select * from test_reg where regexp_like(value,'(you)') --包含you
select * from test_reg where regexp_like(value,'[you]') --包含y o u的,不分顺序
select * from test_reg where regexp_like(value,'(lzf){1,}(.){0,}(ok)') --包含lzf 和ok的. 但是也仅仅限制与lzf 在ok前
以上试验是在看了下文以及参考了OReilly.Oracle.PL.SQL.Programming.4th.Edition.Aug.2005.chs后作出的.
--------------
2010/08/04 增加以下例子
---
--以'POSIX('开头的串,以下几个表达式的结果都是一样的。
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d{1,}?\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]+),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
----------
--位于字符串中的串
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('P_OSI_X(JYL_GP_R,N,13)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)')
--测试regexp_substr,基本的还是regexp_like
SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)') FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ P_OSI_X(JYL_GP_R,N,13) SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)',1,2) FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ AVG(JYL_JJ_R,Y,3) |
2010/08/26 新增例子
-- delete the first () select regexp_replace('dad(disc_id=50003118)add','(^([A-Z]{1,}\()|(\)[A-Z]{1,})$)','',1,0,'i') FROM DUAL -- match pattern using classes select regexp_substr(upper('dad(ssdsd89\9 dd== (sddf(d)),sdsd)add'),'^([A-Z]{1,}\()[[:space:][:alnum:]\,\_\(\)\=]{1,}\)[A-Z]{1,}$',1,1,'i') FROM DUAL |
OReilly. 的参考内容
Section A.1. Metacharacters
A.1. MetacharactersThe R2 column in Tables A-1 through A-3 indicates which metacharacters were introduced in Oracle Database 10g Release 2. All others were introduced in Release 1.
|
对于类的使用,要特别注意点,因为基本上要包含在一对中括号中。
例一:检验非汉字的函数
例如可以定义一个检验非汉字的函数,不考虑其它UNICODE
CREATE OR REPLACE Function func_existNormalChar(chars In Varchar2) Return Pls_Integer
Is
vResult Pls_Integer;
Begin
Select 1
Into vresult From dual
Where regexp_like(chars,'[!-~ ,。、?’“;:『』【】+=-——~!◎#¥%……※×()]+')
Or regexp_like(chars,'[[:space:]+]')
Or regexp_like(chars,'[[:blank:]+]')
Or regexp_like(chars,'[[:cntrl:]+]');
Return vresult;
Exception
When Others Then
Return 0;
End;
例二:出现次数的检验
序号 | 例子 | 结果 | 说明 |
1 | regexp_replace('a12c','[[:digit:]]+','*') | a*c | 测试"+",即出现一次或者更多次的.由于采取的是连续的匹配,所以结果上是把多个数字合并为一个替代字符串,为了理解连续匹配的意思,看序号9更能够明白.这是总体偏少的模式 |
2 | regexp_replace('a12c','[[:digit:]]?','*') | *a***c* | 测试"?",即0或者一次.这个符号,会在任意一个其它字符边上插入一个假设出现的匹配类型符号,如例子,a的左边本来是没有,但是表达式认为数字字符在a的左边出现了0次(这个是成立的).然后还要逐个替换过去,这是总体偏多的模式,过于匹配和逐个匹配结合 |
3 | regexp_replace('a12c','[[:digit:]]*','*') | *a**c* | 测试"*",即0或者更多次数,这个模式类似与?类似,采取连续匹配+多余匹配 |
4 | regexp_replace('a12c','[[:digit:]]+?','*') | a**c | 测试"+?",匹配在前面一次或者多次的串,采取的是逐个匹配的策略(如果有连续的情况出现),这个是通常最希望使用的模式. |
5 | regexp_replace('a12c','[[:digit:]]*?','*') | *a*1*2*c* | 测试"*?",匹配0或者更多次,难于理解! 但可以简单地理解为为在各个字符中以及两边插入替代串. |
6 | regexp_replace('a12c','[[:digit:]]??','*') | *a*1*2*c* | 测试"??",匹配0或者1次,难于理解啊!但可以简单地理解为为在各个字符中以及两边插入替代串. |
7 | regexp_replace('a12c','[[:digit:]]{1,}','*') | a*c | 测试{1,},同+,结果已经说明 |
8 | |||
9 | regexp_replace('a1x2c','[[:digit:]]+','*') | a*x*c | 测试"+",这个例子说明,如果被检测的字符串中特定的匹配串如果不是连续的(多次),那么替代的还是属于正常的范围. |
其它的次数模式暂时没有测试,对于 *,?,+ ,*?,+?,??,基本有其对应的模式 {n,m}? 或者是{n,}? ,除了*,*?之外,因为此二者表示的都是0或者多次(大于1),无法用{n,m}的方式表示.
例三:多个类的并用
很多时候使用类还是很方便的,例如[:digit:]表示数字.[:space:]表示空格,换行,回车等等.
如果希望把特定串中的数字和回车空格等都替换为'-',则可以使用 "|"(替代操作符,其实就是类似一般高级语言中的或操作符号).
select regexp_replace('adfa jkjk jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') from dual;
结果如下:
adfa-jkjk-jk-jk
已经把空格和回车符号都替换为'-'了.
又例如:
Select regexp_replace('a1b c1d jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') From dual
结果如下:
a-b-c-d-jk-jk
例四:迭代替换
例如希望把以下串替换掉
'<xxxx <often> <common> <fat> >'
关于范围操作符号 -
发音:横杆或者减号
在列表匹配操作情况下,如果没有特别的设置,就是表示匹配一个范围:
例子:把字符串中非数字+-()的符号替换为空
这是一个不成功的匹配
SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+\-\(\)]','',1) from dual;
REGEXP_REPLACE('1+[2C-3([AD]KM
------------------------------
1+23(4
这也是一个错误的匹配,因为减号这个时候表示一个范围
SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+-\(\)]','',1) from dual;
REGEXP_REPLACE('1+[2C-3([AD]KM
------------------------------
1+[2-3([,4
二者的区别在于单独的一个减号,如果前面没有跟上斜杠\,那么就是表示范围,否则就是减号本身。
但即使如此,依然没有达到目的,第一个把减号还是给去除了,虽然排除了其它的需要排除的符号,
第二个,则是当作范围符号,大部分的没有排除掉。
但是不太明白的是,按照oracle文档的约定,在范围匹配中形如[\-]表示匹配一个减号,为何依然匹配补上了,尤其是当夹着其它需要匹配的模式的时候。
现在只能尝试着把这个模式放在最后:
SQL> select regexp_replace('1+[2c- 3([ad4','[^0-9\(+\)\-]','',1) from dual
2 /
REGEXP_REPLACE('1+[2C-3([AD4',
------------------------------
1+2-3(4
结果是可以,因为模式"\-"被放到了范围中的尾巴。
那是文档不全,还是理解不透彻了,
看来以后遇到这样的问题还是先把减号放到尾巴去吧。
转自:http://www.iteye.com/topic/398717
http://qjoycn.iteye.com/blog/1280595
http://mayuqi827.iteye.com/blog/1008395
http://lzfhope.blog.163.com/blog/static/6363992200812112025877/