The Python re module

weixin_39888018

Published 2020-12-11 05:23:43


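The re module

Regular expressions

Regular expressions in Python are handled by the re module, which provides the functions for compiling patterns and matching them against strings.

Syntax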

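import re  # import the module

# Build the pattern object. ^ anchors the match at the start, and [0-9] matches
# any single digit from 0 to 9, so this pattern matches a string whose first
# character is a digit.
p = re.compile("^[0-9]")

# Match a string against the compiled pattern. On success m is a match object,
# otherwise m is None.
m = p.match('14534Abc')

if m:  # not None means the pattern matched
    # m.group() returns the matched text, here '1', because only the
    # character 1 was matched
    print(m.group())
else:
    print("doesn't match.")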

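The compile step and the match step above can also be combined into a single call:

m = re.match("^[0-9]", '14534Abc')

The result is the same. The difference is:

The first form compiles the pattern (parses the expression) ahead of time, so every later match reuses the compiled object directly.

The shorthand form passes the pattern string on every call. The re module keeps a cache of recently compiled patterns, so the string is not necessarily recompiled each time, but every call still pays for the cache lookup.

So if you need to find every line that starts with a digit in a file of 50,000 lines, compile the pattern first and then match; it will be somewhat faster.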
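As a minimal sketch of the compile-once advice, the snippet below filters a small list that stands in for the lines of a large file. The sample data and the names `lines`, `starts_with_digit`, and `digit_lines` are made up for illustration.

```python
import re

# Hypothetical sample data standing in for the lines of a large file.
lines = ["14534Abc", "abc123", "9 lives", "", "x1"]

# Compile once, outside the loop, then reuse the pattern object.
starts_with_digit = re.compile(r"^[0-9]")

# Keep only the lines whose first character is a digit.
digit_lines = [line for line in lines if starts_with_digit.match(line)]
print(digit_lines)  # ['14534Abc', '9 lives']
```

With a real file, the list would be replaced by iterating over the open file object; the pattern object is still created only once.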

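Regular expression metacharacters:

Character matching:

. : any single character except a newline

[] : any one character in the given set

[^] : any one character not in the given set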

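Repetition:

* : any number of times (0, 1, or more)

.* : any character, any number of times

? : 0 or 1 time

+ : 1 or more times

{m} : the preceding item exactly m times

{m,n} : the preceding item at least m and at most n times

{m,} : the preceding item at least m times

{,n} : the preceding item at most n times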
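The quantifiers listed above can be checked against a few strings of repeated "a". This is only an illustrative sketch; `samples` and `accepted` are hypothetical names, and `re.fullmatch` is used so that the whole string has to satisfy the pattern.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(accepted(r"a*"))      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(accepted(r"a?"))      # ['', 'a']
print(accepted(r"a+"))      # ['a', 'aa', 'aaa', 'aaaa']
print(accepted(r"a{2}"))    # ['aa']
print(accepted(r"a{2,3}"))  # ['aa', 'aaa']
print(accepted(r"a{3,}"))   # ['aaa', 'aaaa']
print(accepted(r"a{,2}"))   # ['', 'a', 'aa']
```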

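Anchors:

^ : matches the start of the string

$ : matches the end of the string (or just before a newline at the very end of the string)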

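Grouping and references:

() : grouping; the text matched by the pattern inside the parentheses is captured by the regex engine

Backreferences: \1 \2 \3 ...

Alternation:

a|b : a or b

C|cat : C or cat

(C|c)at : Cat or cat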
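A short sketch of grouping, alternation, and a backreference follows. The test strings and variable names are invented for illustration; `(?:...)` is a non-capturing group, used here so that `findall` returns the whole match.

```python
import re

# (C|c)at style alternation inside a group: matches "Cat" or "cat".
cat_words = re.findall(r"(?:C|c)at", "Cat cat bat")
print(cat_words)  # ['Cat', 'cat']

# Without the group, C|cat means "C" alone or "cat".
only_c = re.fullmatch(r"C|cat", "C") is not None
whole_cat = re.fullmatch(r"C|cat", "Cat") is not None
print(only_c, whole_cat)  # True False

# \1 refers back to whatever the first group matched,
# which makes it possible to find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was was late")
print(doubled.group(), doubled.group(1))  # was was was
```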

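Escape sequences:

\w : matches a word character (letter, digit, or underscore)

\W : matches a non-word character

\s : matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v]

\S : matches any non-whitespace character

\d : matches any digit; for ASCII text this is equivalent to [0-9]

\D : matches any non-digit

\A : matches only at the start of the string

\Z : matches only at the end of the string (in Python's re module it does not stop before a trailing newline, unlike in Perl-style engines)

\z : end-of-string anchor in Perl-style engines; older Python versions do not accept it, so use \Z in Python

\G : matches at the position where the previous match ended; this comes from Perl-style engines and is not supported by Python's re module

\b : matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".

\B : matches a position that is not a word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".

\n : matches a newline

\t : matches a tab

\1...\9 : matches the text captured by the n-th group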
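The "never" and "verb" examples for \b and \B, plus a few of the character classes, can be verified directly. A minimal sketch with made-up variable names:

```python
import re

# \b requires a word boundary right after "er".
er_b_never = re.search(r"er\b", "never") is not None  # True
er_b_verb = re.search(r"er\b", "verb") is not None    # False

# \B requires that there is no word boundary right after "er".
er_B_verb = re.search(r"er\B", "verb") is not None    # True
er_B_never = re.search(r"er\B", "never") is not None  # False

digits = re.findall(r"\d+", "a1 b22 c333")       # ['1', '22', '333']
words = re.findall(r"\w+", "snake_case, 42!")    # ['snake_case', '42']
fields = re.split(r"\s+", "a \t b\nc")           # ['a', 'b', 'c']

print(er_b_never, er_b_verb, er_B_verb, er_B_never)
print(digits, words, fields)
```

Raw strings (the `r"..."` prefix) keep Python from interpreting the backslashes before the regex engine sees them.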

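Common regular expression operations:

1. re.match(pattern, string, flags=0)

Matches the pattern only at the start of the string and returns a single match object, or None if the start of the string does not match.

pattern: the regular expression

string: the string to match against

flags: flags that control how the pattern is matched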

import re

obj = re.match(r'\d+', '957evescn')
if obj:
    print(obj.group())

# Output
957

Flags

# flags

I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case

L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale

U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale

M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline

S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline

X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
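The flag definitions above are easier to read next to their effect. The following sketch uses the public names `re.I`, `re.M`, `re.S`, and `re.X`; the sample text and variable names are illustrative only.

```python
import re

text = "Hello\nworld"

# re.I: ignore case.
ignore_case = re.findall(r"hello", text, re.I)    # ['Hello']

# re.M: ^ and $ also match at the start and end of each line.
first_word = re.findall(r"^\w+", text)            # ['Hello']
line_words = re.findall(r"^\w+", text, re.M)      # ['Hello', 'world']

# re.S: the dot also matches a newline.
dot_plain = re.search(r"Hello.world", text)         # None
dot_all = re.search(r"Hello.world", text, re.S)     # match object

# re.X: whitespace and comments inside the pattern are ignored.
number = re.compile(r"""
    \d+        # integer part
    (\.\d+)?   # optional fractional part
""", re.X)
is_number = number.fullmatch("3.14") is not None    # True

print(ignore_case, first_word, line_words, dot_plain is None, dot_all is not None, is_number)
```

Several flags can be combined with the `|` operator, for example `re.I | re.M`.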

2. re.search(pattern, string, flags=0)

Scans through the whole string and returns the first match, or None if the pattern does not match anywhere.

import re

obj = re.search(r'\d+', 'gmkk957evescn')
if obj:
    print(obj.group())

# Output
957
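To make the difference between matching at the start and searching the whole string concrete, here is a small comparison on the same input. `re.fullmatch` is included as a third option that requires the entire string to match; the variable names are illustrative.

```python
import re

s = 'gmkk957evescn'

at_start = re.match(r'\d+', s)     # None: the string does not start with a digit
anywhere = re.search(r'\d+', s)    # finds '957' in the middle of the string
entire = re.fullmatch(r'\d+', s)   # None: the whole string is not digits

print(at_start)           # None
print(anywhere.group())   # 957
print(anywhere.span())    # (4, 7)
print(entire)             # None
```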

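3. group and groups

group() or group(0) returns the whole match, group(n) returns the text captured by the n-th parenthesized group, and groups() returns all captured groups as a tuple.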

import re

a = "123abc456"

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group())

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).groups())

# Output

123abc456

123abc456

123

abc

456

('123', 'abc', '456')
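One detail the example above does not show is what happens when an optional group takes no part in the match. A minimal sketch, with an invented pattern and input:

```python
import re

# The second group is optional and does not participate in this match.
m = re.search(r"(\d+)(-\d+)?", "123abc")

all_groups = m.groups()               # ('123', None)
with_default = m.groups(default="")   # ('123', '')
pair = m.group(1, 2)                  # several groups at once: ('123', None)
first_span = (m.start(1), m.end(1))   # position of group 1: (0, 3)

print(all_groups, with_default, pair, first_span)
```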

4. re.findall(pattern, string, flags=0)

Finds all non-overlapping matches of the pattern and returns them as a list.

import re

obj = re.findall(r'\D+', 'evescn666gmkk')
print(obj)

# Output
['evescn', 'gmkk']
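When the pattern contains groups, `findall` returns the groups rather than the whole match, and `re.finditer` yields match objects when positions are needed as well. A brief sketch with made-up input:

```python
import re

text = "x=1, y=22, z=333"

# No groups: the list contains the whole matches.
whole = re.findall(r"\w=\d+", text)
print(whole)  # ['x=1', 'y=22', 'z=333']

# With groups: the list contains one tuple of groups per match.
pairs = re.findall(r"(\w)=(\d+)", text)
print(pairs)  # [('x', '1'), ('y', '22'), ('z', '333')]

# finditer yields match objects, so positions are available too.
starts = [(m.group(1), m.start()) for m in re.finditer(r"(\w)=(\d+)", text)]
print(starts)  # [('x', 0), ('y', 5), ('z', 11)]
```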

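5. re.sub(pattern, repl, string, count=0, flags=0)

Replaces the text matched by the pattern.

import re

content = "123abc456"
new_content = re.sub(r'\d+', 'sb', content)
# With count=1 only the first match is replaced, giving 'sbabc456':
# new_content = re.sub(r'\d+', 'sb', content, count=1)
print(new_content)

# Output
sbabcsb

re.sub is more powerful than str.replace: the text to replace is described by a pattern rather than a fixed string, and the replacement can refer to captured groups or be computed by a function.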
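Two things `re.sub` can do that `str.replace` cannot are sketched below: referring to captured groups in the replacement, and computing the replacement with a function. The date string and the helper `double` are illustrative assumptions.

```python
import re

# Backreferences in the replacement reorder the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2020-12-11")
print(reordered)  # 11/12/2020

def double(match):
    """Hypothetical helper: replace each number with twice its value."""
    return str(int(match.group()) * 2)

# A callable receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", double, "123abc456")
print(doubled)  # 246abc912

# re.subn also reports how many replacements were made.
result, count = re.subn(r"\d+", "N", "123abc456")
print(result, count)  # NabcN 2
```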

6. re.split(pattern, string, maxsplit=0, flags=0)

Splits the string into a list, using every match of the pattern as a separator.

import re

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
# With maxsplit=1 the string is split at the first match only:
# new_content = re.split(r'\*', content, maxsplit=1)
print(new_content)

# Output (first without maxsplit, then with maxsplit=1)

["'1 - 2 ", ' ((60-30+1', '(9-2', '5/3+7/3', '99/4', '2998+10', '568/14))-(-4', '3)/(16-3', "2) )'"]

["'1 - 2 ", " ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"]

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
# Split on any run of the operators + - * /
new_content = re.split(r'[\+\-\*\/]+', content)
# new_content = re.split(r'[\+\-\*\/]+', content, maxsplit=1)
print(new_content)

# Output (first without maxsplit, then with maxsplit=1)

["'1 ", ' 2 ', ' ((60', '30', '1', '(9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14))', '(', '4', '3)', '(16', '3', "2) )'"]

["'1 ", " 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"]

inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'

# Remove all whitespace
inpp = re.sub(r'\s+', '', inpp)
print(inpp)

# Split once at the first parenthesized "number operator number" expression.
# The capture group keeps the expression itself (without the parentheses)
# in the result.
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, maxsplit=1)
print(new_content)

# Output

1-2*((60-30+(-40-5)*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))

['1-2*((60-30+', '-40-5', '*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))']
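The role of the capture group in `re.split` is easier to see on a short expression. In this sketch (the input `expr` is invented), the only difference between the two calls is the pair of parentheses around the separator pattern.

```python
import re

expr = "60-30+1*9"

# Without a capture group the separators are dropped.
operands = re.split(r"[-+*/]", expr)
print(operands)  # ['60', '30', '1', '9']

# With a capture group the separators are kept in the result list.
tokens = re.split(r"([-+*/])", expr)
print(tokens)  # ['60', '-', '30', '+', '1', '*', '9']
```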

Some common regex examples:

Matching a mobile phone number

import re

phone_str = "my name is evescn, and my phone number is 18111555666"
m = re.search(r"(1)([358]\d{9})", phone_str)
if m:
    print(m.group())

# Output
18111555666

Matching an IPv4 address

ip_addr = "inet 172.19.133.212 brd 172.19.143.255"
m = re.search(r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}", ip_addr)
print(m.group())

# Output
172.19.133.212
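`re.search` finds an address anywhere in the text. To check that an entire string is an address, one possible approach is to reuse the same octet pattern with `fullmatch`. The names `OCTET`, `IPV4`, and `is_ipv4` below are hypothetical and not part of the original example.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (same alternatives as above).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"

# Four octets separated by dots; {{3}} is a literal {3} inside an f-string.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(text):
    """Return True if the whole string looks like a dotted IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("172.19.133.212"))  # True
print(is_ipv4("256.1.1.1"))       # False
print(is_ipv4("1.2.3"))           # False
```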

Matching contact details with groups

contactInfo = 'Evescn, ChengDu: 028-8888888'

# Numbered groups
match = re.search(r'(\w+), (\w+): (\S+)', contactInfo)

"""
>>> match.group(1)
'Evescn'
>>> match.group(2)
'ChengDu'
>>> match.group(3)
'028-8888888'
"""

# Named groups
match = re.search(r'(?P<name>\w+), (?P<addr>\w+): (?P<phone>\S+)', contactInfo)

"""
>>> print(match.group('name'))
Evescn
>>> print(match.group('addr'))
ChengDu
>>> print(match.group('phone'))
028-8888888
"""

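Matching an email address

email = "evescn.gmkk@163.com http://blog.evescn.com"
# The dot before the last part must be escaped; an unescaped . matches any
# character, including the space after the address.
m = re.search(r"[0-9.a-z]{0,26}@[0-9.a-z]{0,20}\.[0-9a-z]{0,8}", email)
print(m.group())

# Output
evescn.gmkk@163.com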
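Because an unescaped dot matches any character, literal dots in a domain need care. The sketch below shows the difference and uses `re.escape` to escape a literal string automatically; the strings and variable names are illustrative.

```python
import re

# An unescaped dot accepts any character in that position.
loose = re.fullmatch(r"163.com", "163xcom") is not None    # True
strict = re.fullmatch(r"163\.com", "163xcom") is not None  # False
exact = re.fullmatch(r"163\.com", "163.com") is not None   # True

# re.escape escapes the special characters of a literal string.
escaped = re.escape("163.com")
print(escaped)  # 163\.com

# Build a pattern around a literal domain.
pattern = re.compile(r"[0-9a-z.]+@" + escaped)
found = pattern.search("evescn.gmkk@163.com http://blog.evescn.com").group()
print(loose, strict, exact, found)
```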

Reposted from

http://www.cnblogs.com/alex3714/articles/5143440.html
