正则表达式是用来匹配文本特殊的串(字符集合).
基本字符匹配
语法:
select 字段1,字段2... from 表名 where 检索列 regexp 文本;
#检索列包含文本的所有行
select id,name from product where id regexp 'c005';#检索id字段包含文本'c005'的所有行
regexp后所跟的东西作为正则表达式,如c005
mysql> select * from product;
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select id,name from product where id regexp '.005';
# .代表匹配任意一个字符,检索id字段包含文本'.005'(四个字符)的所有行
+---------+------+
| id | name |
+---------+------+
| hyjc005 | 口红 |
+---------+------+
1 row in set (0.00 sec)
mysql> select id,name from product where id regexp '.05';
+---------+------+
| id | name |
+---------+------+
| hyjc005 | 口红 |
+---------+------+
1 row in set (0.00 sec)
mysql> select id,name from product where id regexp 'c005';
+---------+------+
| id | name |
+---------+------+
| hyjc005 | 口红 |
+---------+------+
1 row in set (0.00 sec)
mysql> select id,name from product where id like 'c005';
Empty set (0.00 sec)
上面使用了正则表达式.005
.
是正则表达式语言中一个特殊字符,它代表匹配任意一个字符
上面的例子也可以用like和通配符(% _)来完成.
但是like和regexp之间有一个重要的区别:like匹配整个列,如果匹配的文本在列值中出现,like将不会找到它,相应的行也不会被返回(除非使用通配符).而regexp在列值内进行匹配,如果被匹配的文本在列值中出现,regexp将会找到它,相应的行将会被返回.
MySQL中的正则表达式匹配不区分大小写,为区分大小写,可使用binary,如where name regexp binary
‘HYJ’;
进行or匹配
为搜索两个串之一(或者为这个串,或者为另一个串),使用 |
mysql> select * from product;
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
9 rows in set (0.01 sec)
mysql> select * from product where id regexp 'hyj|07';
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| hyjc005 | 口红 | 59 |
| c007 | 粉底液 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
5 rows in set (0.00 sec)
语句中使用了正则表达式hyj|07
,|
为正则表达式的OR操作符,它代表匹配其中之一.|
从功能上类似于在select语句中使用or语句(或者
的意思)
匹配几个字符
[ ]
是另一种形式的or语句
mysql> select * from product;
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp 'c00[1,4,6]';
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c004 | 粉饼 | 58 |
| c006 | 水乳 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
5 rows in set (0.00 sec)
上面使用了正则表达式c00[1,4,6]
,也可以这样子写c00[146]
,效果一样.[146]
定义一组字符,它的意思是匹配1或4或6.(即最后文本是’c001’或’c004’或’c006)
正则表达式c00[146]
为c00[1|4|6]
的缩写,也可以使用后者
字符集合也可以被否定,即,它们将匹配除指定字符外的任何东西,为否定一个字符集,在集合的开始处放一个^
即可,因此,尽管[146]匹配字符1或4或6,但是[^146]
却匹配除这些字符外的任何东西
mysql> select * from product;
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp 'c00[^146]';
+---------+--------+-------+
| id | name | price |
+---------+--------+-------+
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| hyjc005 | 口红 | 59 |
| c007 | 粉底液 | 68 |
+---------+--------+-------+
4 rows in set (0.00 sec)
匹配范围
集合可用来定义要匹配的一个或多个字符.
如[123456789]这个集合将匹配数字0到9
[123456789]
简化版为[1-9]
[a-z]
将匹配任意字母字符
mysql> select * from product where id regexp 'c00[1-5]';
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| hyj c001 | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| hyj c001 | 卸妆油 | NULL |
| hyj c001 | NULL | NULL |
+----------+--------+-------+
7 rows in set (0.00 sec)
上面使用了正则表达式c00[1-5]
,[1-5]
定义了一个范围,这个表达式的意思是匹配1到5
匹配特殊字符
正则表达式语言有具有特定含义的特殊字符构成.通过上面的知识,我们已经了解到.
,[]
,|
,-
等,还有其他一些字符,如果我们要找出包含这些字符的值,我们该怎样搜索呢?
为了匹配特殊字符,必须使用\\为前导,\\-
表示查找-
,\\.
表示查找.
,\\\
表示查找\
\\
(转义)也用来引用元字符(具有特殊含义的字符)
元字符 | 说明 |
---|---|
\\f | 换页 |
\\n | 换行 |
\\r | 回车 |
\\t | 制表 |
\\v | 纵向制表 |
mysql> select * from product;
+---------+--------+-------+
| id | name | price |
+---------+--------+-------+
| h.c001- | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| h.c001- | 卸妆油 | NULL |
| h.c001- | NULL | NULL |
+---------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp '\\.|\\-';
+---------+--------+-------+
| id | name | price |
+---------+--------+-------+
| h.c001- | 防晒霜 | 68 |
| h.c001- | 卸妆油 | NULL |
| h.c001- | NULL | NULL |
+---------+--------+-------+
3 rows in set (0.00 sec)
匹配字符类
类 | 说明 |
---|---|
[:alnum:] | 任意字母或数字(同[a-zA-Z0-9] ) |
[:alpha:] | 任意字符(同[a-zA-Z] ) |
[:blank:] | 空格和制表(同[\\t ]) |
[:cntrl:] | ASCII控制字符(ASCII0到31和127) |
[:digit:] | 任意数字(同[0-9] ) |
[:print:] | 任意可打印字符 |
[:graph:] | 与[:print:]相同,但不包括空格 |
[:lower:] | 任意小写字母(同[a-z] ) |
[:upper:] | 任意大写字母(同[A-Z] ) |
[:punct:] | 既不在[:alnum:]又不在[:cntrl:]中的任意字符 |
[:space:] | 包括空格在内的任意空白字符(同[\\f\\n\\r\\t\\v ]) |
[:xdigit:] | 任意十六进制数字(同[a-fA-F0-9] ) |
mysql> select * from product where id regexp '[:alnum:]';
+---------+--------+-------+
| id | name | price |
+---------+--------+-------+
| h.c001- | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| h.c001- | 卸妆油 | NULL |
| h.c001- | NULL | NULL |
+---------+--------+-------+
9 rows in set (0.01 sec)
mysql> select * from product where id regexp '[:blank:]';
Empty set (0.00 sec)
匹配多个实例
重复元字符
元字符 | 说明 | 通俗点地说 |
---|---|---|
* | 0个或多个匹配 | *前面的一个字符可出现0次或多次 |
+ | 1个或多个匹配(等于{1,}) | +前面的一个字符可出现1次或多次 |
? | 0个或1个匹配(等于{0,1} | ?前面的一个字符可出现0次或1次 |
{n} | 指定数目的匹配 | |
{n,} | 不少于指定数目的匹配 | |
{n,m} | 匹配指定数目的范围(m不超过255) |
mysql> select * from product;
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hy (c001) | 卸妆油 | NULL |
| hy (c001) | NULL | NULL |
+-----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp '(c[:digit:]{3})';
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| hy (c001) | 卸妆油 | NULL |
| hy (c001) | NULL | NULL |
+-----------+--------+-------+
9 rows in set (0.01 sec)
mysql> select * from product where id regexp '\\(c[:digit:]{3}\\)';
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| hy (c001) | 卸妆油 | NULL |
| hy (c001) | NULL | NULL |
+-----------+--------+-------+
3 rows in set (0.01 sec)
正则表达式\\(c[:digit:]{3}\\)
,\\(
匹配(
,\\)
匹配)
,[:digit:]
匹配任意数字(同[0-9]
),{3}
要求它前面的字符(任意数字)出现3次,所以[:digit:]{3}
匹配连接在一起的任意3位数字,[:digit:]{3}
也可以替换成[0-9][0-9][0-9]
mysql> select * from product;
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| abc003 | 卸妆油 | NULL |
| abc003 | NULL | NULL |
+-----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp 'hyh?';
#?前面的h可出现0次或1次,因为?匹配它前面的字符0次或1次出现(?使h可选)
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 | #?前面的h出现0次(也就是没有出现)
| hyhhc004 | 粉饼 | 58 | #?前面的h出现1次
| hyjc005 | 口红 | 59 | #?前面的h出现0次(也就是没有出现)
+-----------+--------+-------+
3 rows in set (0.00 sec)
mysql> select * from product where id regexp 'hyu*';
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 | # *匹配它前面的字符u 0次出现
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
+-----------+--------+-------+
3 rows in set (0.00 sec)
mysql> select * from product where id regexp 'hyu+';
Empty set (0.00 sec) # +使它前面的字符u 1次或多次出现,所以为空Empty
mysql> select * from product where id regexp 'hyu?';
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
+-----------+--------+-------+
3 rows in set (0.00 sec)
定位符
定位元字符
元字符 | 说明 |
---|---|
^ | 文本的开始 |
$ | 文本的结尾 |
[[:<:]] | 词的开始 |
[[:>:]] | 词的结尾 |
mysql> select * from product;
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| abc003 | 卸妆油 | NULL |
| abc003 | NULL | NULL |
+-----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp 'c[:digit:]{3}';
#检索id字段中包含文本'c[:digit:]{3}'的所有行信息
+-----------+--------+-------+
| id | name | price |
+-----------+--------+-------+
| hy (c001) | 防晒霜 | 68 |
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| abc003 | 卸妆油 | NULL |
| abc003 | NULL | NULL |
+-----------+--------+-------+
9 rows in set (0.00 sec)
mysql> select * from product where id regexp '^c[:digit:]{3}';
# ^匹配串的开始,因此,^c[:digit:]{3}只在c和3个数字为串的开始时才匹配它们
+------+--------+-------+
| id | name | price |
+------+--------+-------+
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
+------+--------+-------+
4 rows in set (0.00 sec)
mysql> select * from product where id regexp 'c[:digit:]{3}$';
# $匹配串的结尾,因此,c[:digit:]{3}$只在c和3个数字为串的结尾时才匹配它们
+----------+--------+-------+
| id | name | price |
+----------+--------+-------+
| c002 | 护手霜 | 23 |
| c003 | 面膜 | 79 |
| hyhhc004 | 粉饼 | 58 |
| hyjc005 | 口红 | 59 |
| c006 | 水乳 | 68 |
| c007 | 粉底液 | 68 |
| abc003 | 卸妆油 | NULL |
| abc003 | NULL | NULL |
+----------+--------+-------+
8 rows in set (0.00 sec)
^
的双重用途:^
有两种用法,在集合中(用[和]定义),用它来否定该集合;否则,用来值串的开始处.
简单的正则表达式测试:
#regexp检查总是返回0(没有匹配)或1(匹配)
mysql> select 'hello' regexp '[0-9]';#因为文本'hello'中没有数字,所以返回0
+------------------------+
| 'hello' regexp '[0-9]' |
+------------------------+
| 0 |
+------------------------+
1 row in set (0.00 sec)
mysql> select 'hello' regexp '[:lower:]';#因为文本'hello'中有小写字母,所以返回1
+----------------------------+
| 'hello' regexp '[:lower:]' |
+----------------------------+
| 1 |
+----------------------------+
1 row in set (0.00 sec)