Python 3 Regular Expressions

@天道酬勤@

Article link: https://blog.csdn.net/su_mingyang/article/details/118461365

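1. Basic concepts of regular expressions

A regular expression is a rule for picking out strings. It is written in a dedicated pattern syntax and matches every string that satisfies the rule.

Code: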

import re
# Compile a matching rule (pattern)
pattern = re.compile('apple')
res1 = pattern.match('apple')
res2 = pattern.match('word')
print(res1)
print(res2)
print(type(res1))
print(type(res2))
# The first argument is the pattern, the second is the string to match; re.match only matches at the start of the string
res3 = re.match('a','abcderaaa')
print(type(res3))
print(res3)

Output:

<re.Match object; span=(0, 5), match='apple'>
None
<class 're.Match'>
<class 'NoneType'>
<class 're.Match'>
<re.Match object; span=(0, 1), match='a'>
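
Since `re.match` returns `None` when the pattern does not match at the beginning of the string, it is usually worth checking the result before calling methods on it. A minimal sketch of that check; the helper name `starts_with_apple` is made up for illustration and is not part of the original example:

```python
import re

pattern = re.compile('apple')

def starts_with_apple(text):
    """Return the matched text, or None when text does not start with 'apple'."""
    m = pattern.match(text)
    # A failed match is None, so test it before calling .group()
    return m.group() if m else None

print(starts_with_apple('apple pie'))   # apple
print(starts_with_apple('pineapple'))   # None, because match() only looks at the start
```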

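2. Python's re module

2.1 Special characters in re

Special character	Description
.	Matches any single character except \n
[]	Matches any one of the characters listed inside the brackets
\d	Matches one digit
\D	Matches one non-digit
\s	Matches one whitespace character (space, tab, newline, etc.)
\S	Matches one non-whitespace character
\	Escape character
\w	Matches one word character (letter, digit, or underscore; for str patterns in Python 3 this includes Unicode letters such as Chinese characters)
\W	Matches one non-word character
re.I	Flag: ignore case
re.S	Flag: let . also match \n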
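
The sections below cover ".", "[]" and "\d", but not the whitespace and word classes or the two flags. The following is a small sketch of how those could be tried out; the sample strings and variable names are invented for illustration:

```python
import re

# \s matches one whitespace character (here a tab); \S matches one non-whitespace character
ws = re.match(r'\s', '\tabc')
non_ws = re.match(r'\S', '\tabc')
print(ws is not None)   # True
print(non_ws)           # None

# \w matches a word character, \W a non-word character
word_char = re.match(r'\w', '_name').group()
non_word_char = re.match(r'\W', '#name').group()
print(word_char)        # _
print(non_word_char)    # #

# re.I ignores case
ignore_case = re.match(r'apple', 'APPLE', re.I).group()
print(ignore_case)      # APPLE

# Without re.S "." stops at \n; with re.S it matches \n as well
no_flag = re.match(r'a.b', 'a\nb')
with_flag = re.match(r'a.b', 'a\nb', re.S)
print(no_flag)                       # None
print(with_flag.group() == 'a\nb')   # True
```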

2.1.1 Basic use of "."

Example code:

import re
# "." matches any one character; group() returns the matched text
result = re.match('...', "a23")
if result:
    print(result.group())
# "." does not match \n
result2 = re.match('.', '\n')
print(result2)

Output:

a23
None

2.1.2 Escaping "." with a backslash

Example code:

import re
# \. matches a literal dot only; a bare . matches any character
result1 = re.match(r'1\.2', '1#2')
result2 = re.match(r'1.2', '1#2')
print(result1)
print(result2)

Output:

None
<re.Match object; span=(0, 3), match='1#2'>

2.1.3 Using []

Example code:

import re

result1 = re.match(r'[1234]', '12341235512344')
# Inside brackets "." is a literal dot, not a wildcard
result2 = re.match(r'[.]', '12341235512344')
# Ranges such as a-z and 0-9
result3 = re.match(r'[0-9]', '12341235512344')
result4 = re.match(r'[a-z]', 'afawvawegfawegasedgasdf')
result5 = re.match(r'[a-z0-9]', 'a1231fawvawegfawegasedgasdf')
# A "-" placed last inside the brackets is a literal hyphen
result6 = re.match(r'[a-z0-9-]', '-a1231fawvawegfawegasedgasdf')
result7 = re.match(r'[\[]', '[][]')
# ^ as the first character inside brackets negates the set
result8 = re.match(r'[^0-9]', '-a1231fawvawegfawegasedgasdf')
print(result1)
print(result2)
print(result3)
print(result4)
print(result5)
print(result6)
print(result7)
print(result8)

Output:

<re.Match object; span=(0, 1), match='1'>
None
<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(0, 1), match='a'>
<re.Match object; span=(0, 1), match='a'>
<re.Match object; span=(0, 1), match='-'>
<re.Match object; span=(0, 1), match='['>
<re.Match object; span=(0, 1), match='-'>

2.1.4 Matching digits

Example code:

import re

# \d matches exactly one digit; \D matches exactly one non-digit
result1 = re.match(r'\d', '12341235512344')
result2 = re.match(r'\D', '123asdf41235512344')
result3 = re.match(r'\D', 'd123asdf41235512344')
print(result1)
print(result2)
print(result3)

Output:

<re.Match object; span=(0, 1), match='1'>
None
<re.Match object; span=(0, 1), match='d'>

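3. Raw strings (the r prefix)

Example code:

import re

# r'...' is a raw string: Python keeps every backslash as a literal character,
# so \n stays as two characters and regex escapes such as \w and \. reach re unchanged.
# The regex engine still interprets them: \. is a literal dot, a bare . matches any character.
str1 = 'hello \n word!!!'
str2 = r'hello \n word'
text = 'abcd邮箱123_@163#com'  # 邮箱 means "mailbox"; \w also matches Chinese characters
result1 = re.match(r'\w{4,20}@163\.com', text)
result2 = re.match(r'\w{4,20}@163.com', text)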
print(result1)
print(result2)
print(str1)
print(str2)

Output:

None
<re.Match object; span=(0, 18), match='abcd邮箱123_@163#com'>
hello 
 word!!!
hello \n word
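
To see why patterns are normally written as raw strings, it may help to compare the two spellings directly. This is a small sketch with made-up variable names; the `\b` case is the one where leaving out the r prefix silently changes the pattern:

```python
import re

plain = '\\d+'   # normal string: the backslash has to be doubled
raw = r'\d+'     # raw string: written exactly as the regex reads

same_pattern = (plain == raw)
print(same_pattern)               # True
print(len(r'\n'), len('\n'))      # 2 1

# In a normal string '\b' is a backspace character, not a regex word boundary
without_r = re.search('\bcat\b', 'a cat')
with_r = re.search(r'\bcat\b', 'a cat')
print(without_r)                  # None
print(with_r.group())             # cat
```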

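4. Quantifiers: how many times a rule matches

Quantifier	Meaning
*	Zero or more times
+	One or more times
?	Zero or one time
{n}	Exactly n times
{n,}	At least n times
{n,m}	Between n and m times

Example code: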

import re

result1 = re.match(r'.*', 'abc')
result2 = re.match(r'\d*', 'abc')  # *: zero or more times, so an empty match succeeds
result3 = re.match(r'\d+', 'abc')  # +: one or more times
result4 = re.match(r'\d?', '123abc')  # ?: zero or one time
result5 = re.match(r'\d{2}', '123abc')  # {2}: exactly two digits, None if fewer
result6 = re.match(r'\d{9}', '123abc')
result7 = re.match(r'\d{2,}', '123abc')  # {2,}: at least two digits, otherwise None
result8 = re.match(r'\d{2,5}', '1231231414124abc')  # {2,5}: two to five digits, otherwise None

print(result1)
print(result2)
print(result3)
print(result4)
print(result5)
print(result6)
print(result7)
print(result8)

Output:

<re.Match object; span=(0, 3), match='abc'>
<re.Match object; span=(0, 0), match=''>
None
<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(0, 2), match='12'>
None
<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(0, 5), match='12312'>
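
As the last result shows, `re.match` is satisfied once the start of the string fits, even if more digits follow. When a quantifier is meant to describe the whole string, `re.fullmatch` is one option. It is not used in the original examples, and the helper name below is hypothetical:

```python
import re

def is_six_digit_code(text):
    """Hypothetical helper: True only when text consists of exactly six digits."""
    return re.fullmatch(r'\d{6}', text) is not None

print(is_six_digit_code('100086'))    # True
print(is_six_digit_code('1000861'))   # False

# match() accepts the longer string because only its first six characters are checked
prefix_only = re.match(r'\d{6}', '1000861')
print(prefix_only.group())            # 100086
```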

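5. Boundaries, alternation, and groups

Special character	Meaning
^	Matches at the start of the string
$	Matches at the end of the string
\b	Matches a word boundary
\B	Matches a position that is not a word boundary
|	Or: matches either of the expressions on its two sides
(ab)	Treats the bracketed expression as a group
\num	Refers back to the text matched by group number num
(?P<name>...)	Gives a group a name
(?P=name)	Refers back to the text matched by the group called name

Code example 1: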

import re

text = 'abcd邮箱123_@163.com'
text1 = 'abcd邮箱126_@126.com'
text2 = 'abcd邮箱qq_@qq.com'
text3 = 'abcd邮箱qq_@qq.com1234'
result1 = re.match(r'\w{4,20}@163.com', text)
result2 = re.match(r'\w{4,20}@163\.com', text)
# (?:163|126|qq) accepts any one of the three alternatives without capturing it
result3 = re.match(r'\w{4,20}@(?:163|126|qq)\.com', text1)
result4 = re.match(r'\w{4,20}@(?:163|126|qq)\.com', text2)
# $ requires the string to end right after "com", so the trailing 1234 makes this fail
result5 = re.match(r'\w{4,20}@(?:163|126|qq)\.com$', text3)
result6 = re.match(r'\w{4,20}@(?:163|126|qq)\.com', text3)
print(result1)
print(result2)
print(result3)
print(result4)
print(result5)
print(result6)

Output:

<re.Match object; span=(0, 18), match='abcd邮箱123_@163.com'>
<re.Match object; span=(0, 18), match='abcd邮箱123_@163.com'>
<re.Match object; span=(0, 18), match='abcd邮箱126_@126.com'>
<re.Match object; span=(0, 16), match='abcd邮箱qq_@qq.com'>
None
<re.Match object; span=(0, 16), match='abcd邮箱qq_@qq.com'>

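Code example 2:

import re

text = 'word'
# \b needs a word boundary after "or"; in 'word' the next character is 'd', so there is none
result = re.match(r'\w+or\b', text)
result1 = re.match(r'\w+or\b', "wor d")
# \B needs the opposite: "or" must be followed by another word character
result2 = re.match(r'\w+or\B', text)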
print(result)
print(result1)
print(result2)

Output:

None
<re.Match object; span=(0, 3), match='wor'>
<re.Match object; span=(0, 3), match='wor'>
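
A common use of `\b` is picking out whole words only. The following sketch, using an invented sample sentence, contrasts it with `\B`:

```python
import re

sentence = 'cat concat category cat.'

# \bcat\b: "cat" with a word boundary on both sides
whole_words = re.findall(r'\bcat\b', sentence)
print(whole_words)   # ['cat', 'cat']

# \Bcat: "cat" that directly follows another word character
inside = re.findall(r'\Bcat', sentence)
print(inside)        # ['cat'], the one inside "concat"
```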

Code example 3:

import re

text = '<h1>haha</h1>'
text1 = '<h1>haha</h2>'
result = re.match(r'<\w+>.+</\w+>', text)
result1 = re.match(r'<\w+>.+</\w+>', text1)
# Groups
result3 = re.match(r'<(\w+)>.+</(\1)>', text)  # \1 refers back to group 1
result5 = re.match(r'<(\w+)>.+</(?:\1)>', text)  # (?:...) does not capture, so group(2) would raise an error
result4 = re.match(r'<(\w+)>.+</(\1)>', text1)
print(result)
print(result1)
print("="*10+"Using groups"+"="*10)
print(result3)
print(result3.group())  # group 0 is the default: the whole match
print(result3.group(1))
print(result3.group(2))
print(result4)
print(result5.group())

Output:

<re.Match object; span=(0, 13), match='<h1>haha</h1>'>
<re.Match object; span=(0, 13), match='<h1>haha</h2>'>
==========Using groups==========
<re.Match object; span=(0, 13), match='<h1>haha</h1>'>
<h1>haha</h1>
h1
h1
None
<h1>haha</h1>

Code example 4:

import re

text = '<h1><li>haha</li></h1>'
# (?P<g1>...) names a group; (?P=g1) must repeat exactly what that group matched
result = re.match(r'<(?P<g1>\w+)><(?P<g2>\w+)>.+</(?P=g2)></(?P=g1)>', text)
print(result)

Output:

<re.Match object; span=(0, 22), match='<h1><li>haha</li></h1>'>
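
Named groups can also be read back by name once the match succeeds. This is a sketch along the same lines; the group names `outer`, `inner` and `body` are chosen here for illustration:

```python
import re

m = re.match(
    r'<(?P<outer>\w+)><(?P<inner>\w+)>(?P<body>.+)</(?P=inner)></(?P=outer)>',
    '<h1><li>haha</li></h1>',
)
print(m.group('outer'))   # h1
print(m.group('inner'))   # li
print(m.groupdict())      # {'outer': 'h1', 'inner': 'li', 'body': 'haha'}
```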

6. Greedy and non-greedy matching

Example code:

import re

text = 'weaaaaaaa'
result = re.match(r'wea+', text)  # greedy: takes as many a's as possible
result1 = re.match(r'wea+?', text)  # non-greedy: takes as few as possible
print(result)
print(result1)
text2 = '<div>1</div><div>2</div>'
res1 = re.match(r'<div>.*</div>', text2)  # greedy .* runs to the last </div>
res2 = re.match(r'<div>.*?</div>', text2)  # non-greedy .*? stops at the first </div>
print(res1)
print(res2)

Output:

<re.Match object; span=(0, 9), match='weaaaaaaa'>
<re.Match object; span=(0, 3), match='wea'>
<re.Match object; span=(0, 24), match='<div>1</div><div>2</div>'>
<re.Match object; span=(0, 12), match='<div>1</div>'>
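
The difference matters most when pulling out several pieces at once. A short sketch with `re.findall` and a capture group, using the same kind of sample string:

```python
import re

html = '<div>1</div><div>2</div>'

# Greedy: one long capture that swallows the tags in the middle
greedy = re.findall(r'<div>(.*)</div>', html)
# Non-greedy: one capture per <div>...</div> pair
lazy = re.findall(r'<div>(.*?)</div>', html)

print(greedy)   # ['1</div><div>2']
print(lazy)     # ['1', '2']
```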

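7. Other re functions

Code example 1:

import re

# The sentence reads "China has 9.6 million square kilometers and a population of 1.5 billion"
text = '中国有960万平方公里,拥有15亿人口'
res1 = re.match(r'\w+?(\d+).*?(\d+).*', text)
print(res1)
print(res1.group(1))
print(res1.group(2))
res2 = re.search(r'\d+', text)  # search scans the whole string and returns only the first match
print(res2)
res3 = re.search(r'(\d+).*?(\d+)', text)
print(res3.group(1))
print(res3.group(2))
res4 = re.findall(r'\d+', text)  # findall returns every match as a list of strings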
print(res4)

Output:

<re.Match object; span=(0, 19), match='中国有960万平方公里,拥有15亿人口'>
960
15
<re.Match object; span=(3, 6), match='960'>
960
15
['960', '15']

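Code example 2:

import re

# The sentence reads "I am 18 this year, my height is 188"
text = '我今年18岁,我的身高188'
res = re.split(r'\d+', text)  # split the string wherever the pattern matches
print(res)
res3 = re.sub(r"\d+", '28', text)  # replace every match
res4 = re.sub(r"\d+", '28', text, count=1)  # replace only the first match
print(res3)
print(res4)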

Output:

['我今年', '岁,我的身高', '']
我今年28岁,我的身高28
我今年28岁,我的身高188
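
Two related tools that the examples above do not use may be worth knowing: `re.finditer`, which yields match objects instead of plain strings, and passing a function to `re.sub` as the replacement. A minimal sketch, with an English sample string and a helper name (`add_ten`) invented for illustration:

```python
import re

text = 'age 18, height 188'

# finditer yields one match object per hit, so positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(spans)     # [('18', (4, 6)), ('188', (15, 18))]

# sub also accepts a function; here every number is increased by 10
def add_ten(m):
    return str(int(m.group()) + 10)

replaced = re.sub(r'\d+', add_ten, text)
print(replaced)  # age 28, height 198
```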
