Using Regular Expressions in Python

Article link: https://blog.csdn.net/qq_39249347/article/details/104264793

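The match function

import re
# Match a string
text = 'hello'
# match takes two arguments (the pattern, the string to search) and matches from the start of the string; if the first character does not match, nothing is found
res = re.match('he', text)
# group() returns the matched text
print(res.group())
Output: he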
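
Because match only looks at the start of the string, it helps to compare it with re.search, which scans the whole string. The snippet below is a small sketch of that difference; the sample text is made up for illustration.

```python
import re

text = 'say hello'

# re.match only tries position 0, so 'hello' is not found here.
at_start = re.match(r'hello', text)

# re.search scans forward and returns the first occurrence anywhere.
anywhere = re.search(r'hello', text)

print(at_start)           # None
print(anywhere.group())   # hello
print(anywhere.start())   # 4
```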

.: matches any single character except the newline \n; to match newlines as well, pass the re.DOTALL flag

import re
text = 'hello'
res = re.match('.', text)
print(res.group())
Output: h
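
A short sketch of the re.DOTALL flag mentioned above, using a made-up two-line string:

```python
import re

text = 'a\nb'

# Without flags, '.' refuses to match the newline, so 'a.b' fails.
plain = re.match(r'a.b', text)

# With re.DOTALL, '.' matches '\n' as well.
dotall = re.match(r'a.b', text, re.DOTALL)

print(plain)                  # None
print(repr(dotall.group()))   # 'a\nb'
```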

\d: matches any digit 0-9

import re
text = '1'
res = re.match(r'\d', text)
print(res.group())
Output: 1

\D: matches any non-digit character

import re
text = '+'
res = re.match(r'\D', text)
print(res.group())
Output: +

\s: matches whitespace characters, including \n, \t, \r and the space

import re
text = ' '
res = re.match(r'\s', text)
print(res.group())
Output: (a single space character)

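\w: matches a-z, A-Z, digits and the underscore (for str patterns in Python 3 it also matches other Unicode word characters unless re.ASCII is set)

import re
text = '_'
res = re.match(r'\w', text)
print(res.group())
Output: _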

\W: the exact opposite of lowercase \w; it matches every character that \w does not

import re
text = '+'
res = re.match(r'\W', text)
print(res.group())
Output: +

[]: a character class; any one of the characters listed inside the brackets is accepted

import re
text = '0376-888888888adads'
res = re.match(r'[\d\-]+', text)
print(res.group())
Output: 0376-888888888

The shorthand classes above can also be written with square brackets (for ASCII text):

\d : [0-9]
\D : [^0-9]
\w : [0-9a-zA-Z_]
\W : [^0-9a-zA-Z_]
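
One caveat on these equivalences, sketched below: for str patterns in Python 3, \d also accepts non-ASCII decimal digits, so it only coincides with [0-9] when the re.ASCII flag is used. The Arabic-Indic digit is just an illustrative sample.

```python
import re

arabic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

# By default, \d matches any Unicode decimal digit.
unicode_digit = re.match(r'\d', arabic_three) is not None

# With re.ASCII, \d behaves exactly like [0-9].
ascii_digit = re.match(r'\d', arabic_three, re.ASCII) is not None
bracket_digit = re.match(r'[0-9]', arabic_three) is not None

print(unicode_digit, ascii_digit, bracket_digit)  # True False False

# On plain ASCII samples the shorthand and bracket forms agree.
for ch in ['7', 'x', '_', '+', ' ']:
    assert bool(re.match(r'\d', ch)) == bool(re.match(r'[0-9]', ch))
    assert bool(re.match(r'\w', ch)) == bool(re.match(r'[0-9a-zA-Z_]', ch))
```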

Matching multiple characters

*: matches zero or more repetitions of the preceding item

import re
text = '0376'
res = re.match(r'\d*', text)
print(res.group())
Output: 0376

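+: matches one or more repetitions of the preceding item

import re
text = 'ab+cd'
res = re.match(r'\w+', text)
print(res.group())
Output: ab
Note: when nothing matches, re.match returns None, so calling res.group() raises an AttributeError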
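
The error on a failed match comes from calling group() on None. A minimal sketch of guarding against that; first_word is a hypothetical helper name used only for this illustration.

```python
import re

def first_word(text):
    """Return the leading run of word characters, or None if there is none."""
    res = re.match(r'\w+', text)
    if res is None:
        # No match at the start of the string: do not call group().
        return None
    return res.group()

print(first_word('ab+cd'))   # ab
print(first_word('+abcd'))   # None
```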

?: matches zero or one occurrence of the preceding item

import re
text = 'ab+cd'
res = re.match(r'\w?', text)
print(res.group())
Output: a

{m}: matches exactly m repetitions of the preceding item

import re
text = 'abcd'
res = re.match(r'\w{2}', text)
print(res.group())
Output: ab

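{m,n}: matches between m and n repetitions of the preceding item

import re
text = 'abcd'
res = re.match(r'\w{1,3}', text)
print(res.group())
Output: abc
It matches as many repetitions as possible, which is known as greedy mode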

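Examples

Validating a mobile phone number: the first digit must be 1, the second digit must be one of 3, 4, 5, 7, 8, and the remaining 9 digits can be any digits

import re

text = '15517672121'
res = re.match(r'1[34578]\d{9}', text)
print(res.group())
Output: 15517672121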
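
Note that re.match only anchors the start of the string, so a longer string that merely begins with a valid number also produces a match. For strict validation, one option is re.fullmatch, sketched below; is_phone and the PHONE constant are hypothetical names for this illustration.

```python
import re

PHONE = r'1[34578]\d{9}'

def is_phone(text):
    """True only if the whole string is an 11-digit number of the form above."""
    return re.fullmatch(PHONE, text) is not None

too_long = '155176721219999'

# re.match accepts the prefix and ignores the trailing digits.
print(re.match(PHONE, too_long).group())   # 15517672121

# re.fullmatch requires the entire string to fit the pattern.
print(is_phone(too_long))        # False
print(is_phone('15517672121'))   # True
```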

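Matching an email address: the mailbox name consists of digits, letters and underscores, followed by the @ sign, followed by the domain name

import re

text = '155176@qq.com'
res = re.match(r'\w+@[a-zA-Z0-9]+\.[a-z]+', text)
print(res.group())
Output: 155176@qq.com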

Validating a URL: it starts with http, https or ftp, then a colon, then two slashes, and after that any run of non-whitespace characters.

import re

url = 'https://www.baidu.com/'
res = re.match(r'(http|https|ftp)://[^\s]+', url)
print(res.group())
Output: https://www.baidu.com/
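
"Any non-whitespace character" can also be written with the shorthand \S. The sketch below uses it together with re.search to pull a URL out of a longer sentence; the sentence itself and the non-capturing group (?:...) are choices made for this illustration.

```python
import re

# https? covers both http and https; (?:...) groups without capturing.
URL = r'(?:https?|ftp)://\S+'

text = 'visit https://www.baidu.com/ today'
res = re.search(URL, text)

# \S+ stops at the first whitespace character after the URL.
print(res.group())   # https://www.baidu.com/
```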

Validating an ID card number: the first 17 characters are digits, and the 18th can be a digit, x or X.

import re

text = '53012119760427732X'
res = re.match(r'\d{17}[\dxX]', text)
print(res.group())
Output: 53012119760427732X

Some commonly used symbols

^ (caret): means "starts with ..."

import re

text = 'hello'
res = re.match('^h', text)
print(res.group())
Output: h

Inside square brackets, the caret means negation instead.
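
A small sketch of the two meanings of the caret side by side, using the same sample string:

```python
import re

text = 'hello'

# Outside brackets: anchor, the string must start with 'h'.
starts_with_h = re.search(r'^h', text)

# Inside brackets: negation, the first character that is NOT 'h'.
not_h = re.search(r'[^h]', text)

print(starts_with_h.group())   # h
print(not_h.group())           # e
```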

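$: means "ends with ..."

import re
text = 'hello@qq.com'
# The dot is escaped so that it matches a literal '.' rather than any character
res = re.match(r'\w+@qq\.com$', text)
print(res.group())
Output: hello@qq.com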

|: matches any one of several expressions or strings

import re
text = 'ftp'
res = re.match('http|https|ftp', text)
print(res.group())
Output: ftp
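
Alternatives are tried from left to right and the first one that succeeds wins, so the order matters when one alternative is a prefix of another. A brief sketch:

```python
import re

text = 'https'

# 'http' is tried first and succeeds, so the trailing 's' is left unmatched.
first_wins = re.match(r'http|https|ftp', text).group()

# Listing the longer alternative first gives the full word.
longer_first = re.match(r'https|http|ftp', text).group()

print(first_wins)     # http
print(longer_first)   # https
```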

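Greedy and non-greedy mode

Greedy mode matches as many characters as possible
import re
text = '2121212'
res = re.match(r'\d+', text)
print(res.group())
Output: 2121212
Adding ? after a quantifier switches to non-greedy mode, which matches only the shortest result that satisfies the pattern
import re
text = '2121212'
res = re.match(r'\d+?', text)
print(res.group())
Output: 2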
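
The difference shows up most clearly when a delimiter follows the quantifier. A sketch with a made-up HTML-like string:

```python
import re

text = '<h1>title</h1>'

# Greedy: .+ runs to the end and backs off only to the LAST '>'.
greedy = re.match(r'<.+>', text).group()

# Non-greedy: .+? stops at the FIRST '>' that lets the pattern succeed.
lazy = re.match(r'<.+?>', text).group()

print(greedy)   # <h1>title</h1>
print(lazy)     # <h1>
```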

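Escape characters and raw strings

In regular expressions some characters have a special meaning. To match such a character literally, it must be escaped with a backslash.

import re

text = 'macbookpro price is $3000'
res = re.search(r'\$\d+', text)
print(res.group())
Output: $3000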

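In both Python string literals and regular expressions, the backslash \ is the escape character; with a raw string, Python no longer processes the escapes itself

import re

text = r'\c'
res = re.search(r'\\c', text)
print(res.group())
Output: \c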
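
To make the double layer of escaping visible, the sketch below writes the same pattern with and without a raw string, and also shows re.escape, which adds the regex-level backslashes automatically.

```python
import re

text = r'\c'                # two characters: a backslash and 'c'

plain_pattern = '\\\\c'     # Python collapses this to \\c before re sees it
raw_pattern = r'\\c'        # the same three characters, written directly

print(len(text))                              # 2
print(plain_pattern == raw_pattern)           # True
print(re.search(raw_pattern, text).group())   # \c

# re.escape builds a pattern that matches the given text literally.
escaped = re.escape('$3000')
print(escaped)                                # \$3000
```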

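Common regular expression functions

group()

In a regular expression, the matched string can be divided into groups by wrapping parts of the pattern in parentheses.
group(): equivalent to group(0); returns the entire matched string
groups(): returns a tuple of all the subgroups, which are numbered from 1
group(1): returns the first subgroup; several indices can be passed at once, in which case a tuple is returned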

import re
text = "apple's price $100,orange's price is $30"
res = re.search(r'.*(\$\d+).*(\$\d+)', text)
print(res.group(0))
print(res.group(1))
print(res.group(2))
print(res.group(1,2))
print(res.groups())
Output:
apple's price $100,orange's price is $30
$100
$30
('$100', '$30')
('$100', '$30')
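
Groups can also be given names with the (?P<name>...) syntax, which is often easier to read than numeric indices. A sketch on the same text; the group names apple and orange are chosen here purely for illustration.

```python
import re

text = "apple's price $100,orange's price is $30"
res = re.search(r'.*(?P<apple>\$\d+).*(?P<orange>\$\d+)', text)

# Named groups are available by name as well as by number.
print(res.group('apple'))    # $100
print(res.group('orange'))   # $30
print(res.group(1))          # $100
print(res.groupdict())       # {'apple': '$100', 'orange': '$30'}
```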

findall

Finds every non-overlapping match and returns them as a list

import re

text = "apple's price $100,orange's price is $30"
res = re.findall(r'\$\d+', text)
print(res)
Output: ['$100', '$30']
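
When the pattern contains groups, findall returns the group contents instead of the whole match: a list of strings for one group, a list of tuples for several. A sketch on the same text:

```python
import re

text = "apple's price $100,orange's price is $30"

# One group: only the digits inside the parentheses are returned.
amounts = re.findall(r'\$(\d+)', text)

# Two groups: each match becomes a (name, amount) tuple.
pairs = re.findall(r"(\w+)'s price (?:is )?\$(\d+)", text)

print(amounts)   # ['100', '30']
print(pairs)     # [('apple', '100'), ('orange', '30')]
```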

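sub replaces text: every substring that matches the pattern is replaced with another string

import re
text = "apple's price $100,orange's price is $30"
# The first argument is the pattern, the second is the replacement string, the third is the source string
res = re.sub(r'\$\d+', '$200', text)
print(res)
Output: apple's price $200,orange's price is $200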
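
sub also accepts a count argument to limit the number of replacements, and the replacement can be a function that receives each match object. A sketch; double is a hypothetical helper written for this example.

```python
import re

text = "apple's price $100,orange's price is $30"

# count=1 replaces only the first match.
first_only = re.sub(r'\$\d+', '$200', text, count=1)

def double(match):
    """Return the matched price doubled, keeping the dollar sign."""
    amount = int(match.group(1))
    return '$' + str(amount * 2)

# The function is called once per match and its return value is substituted.
doubled = re.sub(r'\$(\d+)', double, text)

print(first_only)   # apple's price $200,orange's price is $30
print(doubled)      # apple's price $200,orange's price is $60
```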

split splits a string at every match of the pattern and returns a list

import re

text = "hello python"
res = re.split(' ', text)
print(res)
Output: ['hello', 'python']
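
Splitting on a single space could be done with str.split as well; re.split becomes useful when the separator is itself a pattern. A sketch with a made-up string that mixes several delimiters:

```python
import re

text = 'hello,python;  regex world'

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r'[,;\s]+', text)

# maxsplit limits the number of splits; the rest stays in the last item.
limited = re.split(r'[,;\s]+', text, maxsplit=1)

print(parts)     # ['hello', 'python', 'regex', 'world']
print(limited)   # ['hello', 'python;  regex world']
```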

compile

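A regular expression that is used frequently can be compiled once with compile and then reused directly, which saves the per-call pattern lookup and can make execution faster.

import re

text = "the number is 20.50"
r = re.compile(r'\d+\.?\d*')
# A compiled pattern has its own search/match/findall methods
res = r.search(text)
print(res.group())
Output: 20.50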
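
The benefit of compile shows when one pattern is applied to many strings. A minimal sketch, assuming a small list of sample lines invented for this example:

```python
import re

# Compile once, outside the loop.
NUMBER = re.compile(r'\d+\.?\d*')

lines = ['the number is 20.50', 'no digits here', 'total: 7']

found = []
for line in lines:
    res = NUMBER.search(line)
    # search returns None when the line contains no number.
    found.append(res.group() if res else None)

print(found)   # ['20.50', None, '7']
```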