Learning Python regular expressions

Latest recommended article published on 2022-11-28 15:43:49

一枚努力的程序猿

Notes on how to use regular expressions in Python.

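In Python, strings are matched against regular expressions with the re module, in three steps:

1. Import the module: import re
2. Run the match: result = re.match(pattern, string)
3. If the previous step found a match, extract the matched text with result.group().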

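re.match checks whether the regular expression matches at the beginning of the string. If it does, re.match returns a match object (Match object); otherwise it returns None (not an empty string "").

A Match object has a group method that returns the matched part of the string.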
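A minimal sketch of the three steps, with the None check that step 3 implies: calling group() on None raises AttributeError, as several transcripts below show. The strings "python", "py" and "java" are illustrative only.

```python
import re

# Step 2: try to match the pattern at the start of the string.
result = re.match(r"py", "python")

# Step 3: call group() only when a Match object came back.
if result is not None:
    print(result.group())  # prints: py

# A failed match returns None, not an empty string.
missing = re.match(r"java", "python")
print(missing is None)  # prints: True
```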

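Note how ^ is used

[^a] means "match any character except a". [^a-zA-Z0-9] means "find a character that is neither a letter nor a digit". [\^abc] means "find a caret, or a, or b, or c". [^\^] means "find any character except a caret". (Ugh!)

When ^ is the first character directly inside square brackets [], it negates the character class; outside brackets, it anchors the match to the start of the string. (This refers to ^ used directly inside [], not to nested constructs.)
In other words, [] denotes a character set, and ^ means "negated set" only at the start of a set; elsewhere in a set, as in [a^], it is just a literal caret.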
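A short sketch of the ^ rules above, using re.findall to list every character a class accepts. The sample strings are made up for illustration.

```python
import re

# ^ as the first character inside [] negates the class.
not_a = re.findall(r"[^a]", "abca")            # ['b', 'c']
# An escaped \^ inside [] is a literal caret.
caret_or_abc = re.findall(r"[\^abc]", "x^ay")  # ['^', 'a']
# ^ that is not first inside [] is also a literal caret.
a_or_caret = re.findall(r"[a^]", "b^a")        # ['^', 'a']
# Outside [], ^ anchors the match at the start of the string.
anchored = re.search(r"^b", "abc")             # None
```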

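Parentheses treat their contents as a single unit (a group), square brackets match exactly one of the characters they contain, and curly braces set how many times the preceding item repeats.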
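A small sketch of the three bracket types; the patterns and strings are illustrative, not taken from the original examples.

```python
import re

# () groups: (ab)+ repeats the whole "ab" unit.
grouped = re.match(r"(ab)+", "ababx").group()    # 'abab'
# [] matches exactly one character from the set.
one_of = re.match(r"[xyz]", "yes").group()       # 'y'
# {} sets the repetition count: {2,3} means 2 to 3 times (greedy, so 3 here).
counted = re.match(r"\d{2,3}", "12345").group()  # '123'
```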

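re.match("xxx","xxxxxx") matches "xxx" at the beginning of the string: re.match only succeeds when the pattern matches starting at the first character.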
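A minimal check of the "starts with" behavior; the second string is a hypothetical counterexample.

```python
import re

# re.match only tries position 0, so the pattern must match a prefix.
prefix = re.match(r"xxx", "xxxxxx").group()  # 'xxx'
# The text does not start with "xxx", so there is no match.
no_prefix = re.match(r"xxx", "axxx")         # None
```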

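Example:

Python raw strings: the r prefix makes backslashes easier to read

Regular expressions use "\" as the escape character. To match a literal "\" in the text, a regex written as an ordinary (non-raw) string literal needs four backslashes, "\\\\": the first two and the last two are each turned into one backslash by the language, and the resulting two backslashes are then turned into one literal backslash by the regex engine. A raw string skips the first step, so r"\\" is enough.

The REPL echo below shows the repr, in which the backslash is doubled; print shows the actual text.

>>> ret=re.match(r"c:\\a","c:\\a")
>>> ret.group()
'c:\\a'
>>> print(ret.group())
c:\a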
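A sketch comparing the two ways of writing the same backslash-matching pattern; both should produce an identical match.

```python
import re

text = "c:\\a"  # four characters: c, :, one backslash, a
# Ordinary literal: "\\\\" becomes \\ in the string, which the regex reads as one backslash.
plain = re.match("c:\\\\a", text)
# Raw literal: r"\\" is already \\, so only two backslashes are typed.
raw = re.match(r"c:\\a", text)
print(plain.group() == raw.group() == text)  # True
```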

Usage:

1. Use $ to anchor the end of the string

>>> ret=re.match(r"[\w]3$","h3")
>>> ret.group()
'h3'
>>> ret=re.match(r"[\w]{3,8}3$","h3llo3")
>>> ret.group()
'h3llo3'
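A few more $ cases, as a sketch with made-up strings. One detail worth knowing: without re.MULTILINE, Python's $ also matches just before a single trailing newline.

```python
import re

# $ requires the match to end at the end of the string.
ends_with_3 = re.match(r"\w3$", "h3")          # matches 'h3'
extra_tail = re.match(r"\w3$", "h3x")          # None: 'x' follows the 3
# Without $, the same text still matches its prefix.
no_anchor = re.match(r"\w3", "h3x").group()    # 'h3'
# $ also accepts a position right before a final "\n".
trailing_newline = re.match(r"\w3$", "h3\n")   # matches 'h3'
```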

2. ^ matches the start of the string

>>> ret=re.match("^hello","hello")
>>> ret.group()
'hello'

>>> ret=re.match("^hello","hello122")
>>> ret.group()
'hello'
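Since re.match already starts at position 0, the ^ in the examples above does not change the result. A sketch showing where ^ does matter, with re.search (covered later in this post); strings are illustrative:

```python
import re

# With re.match, ^ changes nothing: matching already starts at position 0.
same = re.match(r"^hello", "hello122").group() == re.match(r"hello", "hello122").group()
# With re.search, which scans the whole string, ^ restricts it to the start.
anywhere = re.search(r"hello", "say hello").group()  # 'hello'
at_start = re.search(r"^hello", "say hello")         # None
```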

3. Match both the start and the end of the string: ^ and $

>>> ret=re.match("^hello","hello")
>>> ret.group()
'hello'
>>>
>>> ret=re.match("^hello111$","hello111")
>>> ret.group()
'hello111'
>>>
>>> ret=re.match("^hello1$","hello1")
>>> ret.group()
'hello1'
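A sketch of what ^...$ adds, plus re.fullmatch (available since Python 3.4), which requires the whole string to match without writing anchors. Strings are illustrative.

```python
import re

# ^...$ forces the pattern to cover the entire string.
whole = re.match(r"^hello1$", "hello1")            # match
longer = re.match(r"^hello1$", "hello12")          # None
# re.fullmatch expresses the same intent without anchors.
full = re.fullmatch(r"hello1", "hello1")           # match
full_longer = re.fullmatch(r"hello1", "hello12")   # None
```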

4. \b matches a word boundary

>>> re.match(r".*\bver\b","how ver you").group()
'how ver'
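Here . matches any character except a newline, and * matches the preceding item zero or more times, so .* can consume any prefix of the string, including an empty one.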
>>> re.match(r".*\bver\b"," ver you").group()
' ver'
>>> re.match(r".*\b hello world \b","I print : hello world , when i learn python").group()
Traceback (most recent call last):
File "<pyshell#70>", line 1, in <module>
re.match(r".*\b hello world \b","I print : hello world , when i learn python").group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r".*\b hello world \b","I print : hello world ").group()
Traceback (most recent call last):
File "<pyshell#71>", line 1, in <module>
re.match(r".*\b hello world \b","I print : hello world ").group()
AttributeError: 'NoneType' object has no attribute 'group'

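These fail because \b is a zero-width position between a word character (\w) and a non-word character (or the start/end of the string). In this pattern each \b sits next to a space, so the boundary would have to fall between that space and the neighboring text character; here the neighbors are ':' and ',' (or the end of the string), which are not word characters either, so there is no boundary. Putting \b directly against the words works: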
>>> re.match(r".*\bhello world\b","I print hello world").group()
'I print hello world'
>>> re.match(r".*\bhello world\b","I print hello world ").group()
'I print hello world'
>>> re.match(r".*\bhello world\b","I print hello world when").group()
'I print hello world'
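A sketch of \b with re.findall, plus the space-next-to-\b failure explained above in a shorter form. The sample strings are made up.

```python
import re

# \b is zero-width: it sits between a \w character and a non-\w character
# (or the start/end of the string). Only the standalone "ver" words match.
words = re.findall(r"\bver\b", "ver very over ver")  # ['ver', 'ver']
# A \b next to a space in the pattern needs a word character on the other side.
spaced = re.search(r"\b hello", "I print : hello")   # None (':' is not \w)
ok = re.search(r"\b hello", "I print hello")         # matches ' hello'
```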

5. \B matches a non-word boundary: at that position the pattern must not touch the edge of a word

>>> re.match(r".*\Bver\B","iamsovery").group()
'iamsover'
>>> re.match(r".*\Bver\B","iamso very").group()
Traceback (most recent call last):
File "<pyshell#84>", line 1, in <module>
re.match(r".*\Bver\B","iamso very").group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r".*\Bver\B","iamsover y").group()
Traceback (most recent call last):
File "<pyshell#85>", line 1, in <module>
re.match(r".*\Bver\B","iamsover y").group()
AttributeError: 'NoneType' object has no attribute 'group'
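A sketch contrasting \B positions; "overt" is an illustrative word in which "ver" has word characters on both sides.

```python
import re

# \B matches only where there is NO word boundary, e.g. between two \w characters.
inner = re.findall(r"\Bver\B", "ver very over overt")  # ['ver'], from "overt"
inside = re.search(r"\Bver\B", "iamsovery")             # matches 'ver'
outside = re.search(r"\Bver\B", "iamso very")           # None: a space precedes "v"
```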

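Examples:

1. | matches either the expression on its left or the one on its right

| succeeds if either the left side or the right side matches. The pattern [1-9]?\d$|100 below is meant to accept the integers 0 to 100 without leading zeros.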

>>> re.match(r"[1-9]?\d$|100","07").group()
Traceback (most recent call last):
File "<pyshell#103>", line 1, in <module>
re.match(r"[1-9]?\d$|100","07").group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r"[1-9]?\d$|100","100").group()
'100'
>>> re.match(r"[1-9]?","07").group()
''
>>> re.match(r"[1-9]?\d","07").group()
'0'
>>> re.match(r"[1-9]?\d$","07").group()
Traceback (most recent call last):
File "<pyshell#109>", line 1, in <module>
re.match(r"[1-9]?\d$","07").group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r"[1-9]?\d$","7").group()
'7'
>>> re.match(r"[1-9]?\d$","0").group()
'0'
>>> re.match(r"[1-9]?\d$","70").group()
'70'
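Because | has the lowest precedence, the $ in [1-9]?\d$|100 belongs only to the left branch, so re.match also accepts "1000" by matching its prefix "100". A sketch of one fix using re.fullmatch; the helper name is_0_to_100 is hypothetical.

```python
import re

# | has the lowest precedence, so the $ only belongs to the left branch.
loose = re.match(r"[1-9]?\d$|100", "1000").group()  # '100' (prefix accepted)

def is_0_to_100(s):
    """Return True if s is an integer 0..100 written without leading zeros."""
    # fullmatch applies the whole-string requirement to both branches.
    return re.fullmatch(r"[1-9]?\d|100", s) is not None

results = {s: is_0_to_100(s) for s in ["0", "7", "70", "100", "07", "1000", "101"]}
```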

Inside [ ], ^ works as a negation:

>>> re.match(r"([^-]*)-(\d+)","010-9387573").group()
'010-9387573'
>>> re.match(r"([^-]*)-(\d+)","abdc-9387573").group()
'abdc-9387573'
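The two groups in this pattern can also be read back individually. A minimal sketch using groups() and group(n) on the first example:

```python
import re

m = re.match(r"([^-]*)-(\d+)", "010-9387573")
# [^-]* matches any run of characters other than "-".
area, number = m.groups()      # ('010', '9387573')
print(m.group(1), m.group(2))  # prints: 010 9387573
```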

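Matching <html>hh</html>

>>> re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>","<html>hh</html>").group()
'<html>hh</html>'
>>> re.match(r"<[a-zA-Z]*>\w*</[a-zA-Z]*>","<html>hh</htmld>").group()
'<html>hh</htmld>'

The second result shows that the closing tag inside </...> is not required to match the opening tag, so this pattern is wrong. Use a group with a backreference instead: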

>>> re.match(r"<([a-zA-Z]*)>\w*</\1>","<html>hh</html>").group()
'<html>hh</html>'

>>> re.match(r"<([a-zA-Z]*)>\w*</\1>","<html>hh</htmlgg>").group()
Traceback (most recent call last):
File "<pyshell#152>", line 1, in <module>
re.match(r"<([a-zA-Z]*)>\w*</\1>","<html>hh</htmlgg>").group()
AttributeError: 'NoneType' object has no attribute 'group'

>>> re.match(r"<([a-zA-Z]*)>\w*</\1>\d<\1>\d</\1>","<html>hh</html>9<html>8</html>").group()
'<html>hh</html>9<html>8</html>'

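\num refers to the text matched by group number num. In the first attempt below, the closing tags are referenced in the wrong order (the text closes h1 before html), so the match fails; swapping \1 and \2 fixes it: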

>>> re.match(r"<(\w*)><(\w*)>.*<(/\1)><(/\2)>","<html><h1>www.itcast<></h1></html>").group()
Traceback (most recent call last):
File "<pyshell#166>", line 1, in <module>
re.match(r"<(\w*)><(\w*)>.*<(/\1)><(/\2)>","<html><h1>www.itcast<></h1></html>").group()
AttributeError: 'NoneType' object has no attribute 'group'
>>>
>>> re.match(r"<(\w*)><(\w*)>.*<(/\2)><(/\1)>","<html><h1>www.itcast<></h1></html>").group()
'<html><h1>www.itcast<></h1></html>'
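\1 reuses the exact text that group 1 captured, not its pattern. A sketch using a doubled-word search; the strings are illustrative:

```python
import re

# (\w+) captures a word; \1 must then repeat exactly the captured text.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
whole = doubled.group()   # 'is is'
word = doubled.group(1)   # 'is'

# "ab" followed by "cd" is not a repeat of the captured "ab".
not_repeat = re.fullmatch(r"(\w\w)\1", "abcd")  # None
repeat = re.fullmatch(r"(\w\w)\1", "abab")      # matches 'abab'
```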

(?P<name>...) gives a group a name, and (?P=name) matches the same text as that named group:

>>> re.match(r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>","<html><h1>www.itcast<></h1></html>").group()
'<html><h1>www.itcast<></h1></html>'
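Named groups can also be read back by name. A sketch using group(name) and groupdict(); the group names tag and body are hypothetical.

```python
import re

m = re.match(r"<(?P<tag>\w+)>(?P<body>\w*)</(?P=tag)>", "<h1>title</h1>")
tag = m.group("tag")    # 'h1'
parts = m.groupdict()   # {'tag': 'h1', 'body': 'title'}
# Named groups are still numbered too, in order of their opening parenthesis.
same = m.group(1) == m.group("tag")  # True
```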

Other functions in the re module

search: find a number anywhere in the string

>>> re.search(r"\d+","read count: 99").group()
'99'
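The difference between search and match in one sketch (illustrative strings): match is anchored at position 0, while search scans the string and stops at the first match.

```python
import re

text = "read count: 99"
# match only looks at position 0, where there is no digit.
from_start = re.match(r"\d+", text)       # None
# search scans forward and returns the first match anywhere.
first = re.search(r"\d+", text).group()   # '99'
# search stops at the first match; later numbers are ignored.
only_first = re.search(r"\d+", "3 apples, 12 pears").group()  # '3'
```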

findall: find all matches

>>> ret=re.findall(r"\d+","python=993,c++=8673,c#=934")
>>> print(ret)
['993', '8673', '934']
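When the pattern contains capturing groups, findall returns the groups (as tuples when there are several) instead of the whole match. A sketch on the same string, plus re.finditer, which yields Match objects with positions; the pattern for the language names is an assumption that happens to fit this data:

```python
import re

text = "python=993,c++=8673,c#=934"
# With two groups, findall returns a list of (name, number) tuples.
pairs = re.findall(r"(\w[\w+#]*)=(\d+)", text)
# [('python', '993'), ('c++', '8673'), ('c#', '934')]
counts = {name: int(n) for name, n in pairs}
# finditer yields Match objects, which also expose start/end positions.
spans = [m.span() for m in re.finditer(r"\d+", text)]  # [(7, 10), (15, 19), (23, 26)]
```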

sub: replace the matched data

>>> ret=re.sub(r"\d+",'8888',"python read count: 8567")
>>> print(ret)
python read count: 8888
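Two more sub options, sketched with illustrative inputs: backreferences in the replacement string, and the count argument.

```python
import re

# Groups captured by the pattern can be reused in the replacement as \1, \2, ...
# (a raw string keeps "\1" from being read as a control character).
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2022-11-28")  # '28/11/2022'
# count limits how many replacements are made (0, the default, means all).
limited = re.sub(r"\d+", "N", "a1b22c333", count=2)  # 'aNbNc333'
```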

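Another approach: pass a function as the replacement. re.sub calls it with each Match object and uses the string it returns:

#coding=utf-8
import re

def add(temp):
    # temp is the Match object for one run of digits
    strNum = temp.group()
    num = int(strNum) + 100
    return str(num)

ret = re.sub(r"\d+", add, "python read count: 10000")
rett = re.sub(r"\d+", add, "C read count: 100000")
print(ret)
print(rett)

Output:

python read count: 10100
C read count: 100100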

split: split the string at every match and return a list

>>> ret=re.split(r":| ","hello:world I know some things")
>>> print(ret)
['hello', 'world', 'I', 'know', 'some', 'things']
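A sketch of a few split details, with illustrative inputs: maxsplit, a capturing group that keeps the separators, and empty strings from adjacent separators.

```python
import re

text = "hello:world I know some things"
# maxsplit limits the number of splits; the rest stays in the last item.
first_only = re.split(r":| ", text, maxsplit=1)  # ['hello', 'world I know some things']
# A capturing group keeps the separators in the result.
with_seps = re.split(r"(:| )", "a:b c")          # ['a', ':', 'b', ' ', 'c']
# Adjacent separators produce empty strings unless the pattern uses +.
empty = re.split(r" ", "a  b")                   # ['a', '', 'b']
collapsed = re.split(r" +", "a  b")              # ['a', 'b']
```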

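Greedy and non-greedy matching in Python

Greedy (the default for Python quantifiers): try to match as many characters as possible

Non-greedy: try to match as few characters as possible

.+ matches one or more arbitrary characters (any character except a newline)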

>>> s="the num is 345-143-1523-53"
>>> r=re.match(".+(\d+-\d+-\d+-\d+)",s)
>>> r.group()
'the num is 345-143-1523-53'
>>> r.group(1)
'5-143-1523-53'
>>> r.group(0)
'the num is 345-143-1523-53'

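.+ is greedy: it takes as many characters as it can, so it consumes everything from the start of the string up to the 5 in 345 and leaves only that one digit for the group's first \d+, which needs just one digit to succeed.

On its own, ? means "0 or 1 occurrences", but placed after a quantifier such as + or *, it makes that quantifier non-greedy, so it matches as few characters as possible: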

>>> rr=re.match(".+?(\d+-\d+-\d+-\d+)",s)
>>> rr.group(1)
'345-143-1523-53'
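The classic place where this matters is tag-like text. A sketch with an illustrative string:

```python
import re

html = "<b>bold</b>"
# Greedy .+ runs to the last ">" it can reach.
greedy = re.findall(r"<.+>", html)   # ['<b>bold</b>']
# Lazy .+? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)    # ['<b>', '</b>']
# The same ? suffix works on *, ? and {m,n}.
lazy_braces = re.match(r"\d{2,4}?", "12345").group()  # '12'
```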

...
