[Python] Regex: Regular Expressions

Dxcxcdcdx

Last modified: 2024-05-28 22:22:49

Category: Python. Tags: python, regular expressions

First published: 2024-05-28 22:16:49

Article link: https://blog.csdn.net/d516715596/article/details/139279176

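+	Appended to a character class, the plus sign matches one or more such characters instead of exactly one (for example, \d+).
\d	Any digit from 0 to 9
\D	Any character that is not a digit from 0 to 9
\w	Any letter, digit, or underscore (think of it as matching "word" characters)
\W	Any character that is not a letter, digit, or underscore
\s	A space, tab, or newline (think of it as matching "whitespace" characters)
\S	Any character that is not a space, tab, or newline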
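
The shorthand classes above can be tried out with a short sketch like the one below. The sample string and the variable names are made up for illustration only.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'Room 42, floor_3!'

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, underscores
spaces = re.findall(r'\s', text)       # single whitespace characters
non_words = re.findall(r'\W+', text)   # runs of characters that \w rejects

print(digits)     # ['42', '3']
print(words)      # ['Room', '42', 'floor_3']
print(spaces)     # [' ', ' ']
print(non_words)  # [' ', ', ', '!']
```

Note that the underscore counts as a "word" character, so floor_3 comes back as a single token.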

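[] Square brackets define your own character class. (Regex metacharacters such as . * ? ( ) are taken literally inside the brackets and need no escaping; only \, ], ^ and - keep a special meaning there.)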

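import re

re.compile(r'[a-zA-Z0-9]')      # matches any lowercase letter, uppercase letter, or digit
re.compile(r'[aeiouAEIOU]')     # matches any one of the listed characters (here, vowels)
re.compile(r'[^aeiouAEIOU]')    # a leading caret (^) negates the class: matches any character NOT listed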
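
To see a custom class and its negation side by side, here is a minimal sketch. The sample string and variable names are assumptions made for the example.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')       # any single vowel
non_vowel_regex = re.compile(r'[^aeiouAEIOU]')  # any single character that is not a vowel

sample = 'RoboCop eats'  # hypothetical input

vowels = vowel_regex.findall(sample)
others = non_vowel_regex.findall(sample)

print(vowels)  # ['o', 'o', 'o', 'e', 'a']
print(others)  # ['R', 'b', 'C', 'p', ' ', 't', 's']
```

The negated class matches anything that is not a vowel, which includes the space, not just consonants.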

^ $ The caret and the dollar sign anchor a match to the start and the end of the string.

re.compile(r'^Hello')    # matches strings that begin with Hello
re.compile(r'\d$')       # matches strings that end with a digit 0-9
re.compile(r'^\d+$')     # matches strings that consist of digits from start to end
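
A small sketch of how the three anchored patterns behave with search(). The input strings and variable names are invented for the example.

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_digits = re.compile(r'^\d+$')

# search() returns a Match object on success and None on failure.
starts_ok = begins_with_hello.search('Hello world!') is not None
starts_bad = begins_with_hello.search('He said Hello.') is not None
last_digit = ends_with_digit.search('Your number is 42').group()
all_digits = whole_string_digits.search('1234567890') is not None
mixed = whole_string_digits.search('12345xyz67890') is not None

print(starts_ok, starts_bad, last_digit, all_digits, mixed)
# True False 2 True False
```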

. The period is the wildcard character; it matches any character except a newline.

>>> atRegex = re.compile(r'.at')        # the period matches exactly one character, so 'flat' yields 'lat'
>>> atRegex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']

.* Dot-star matches any run of characters (zero or more of anything except a newline).

>>> nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
>>> mo = nameRegex.search('First Name: Al Last Name: Sweigart')
>>> mo.group(1)
'Al'
>>> mo.group(2)
'Sweigart'

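Making the period match newlines too: the re.DOTALL flag.

re.compile('.*')             # by default the period stops at a newline
re.compile('.*', re.DOTALL)  # with re.DOTALL the period matches every character, newlines included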
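
The difference is easiest to see on a multi-line string. The following is a minimal sketch; the text and variable names are assumptions for the example.

```python
import re

# Hypothetical three-line input.
text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

no_newline = re.compile('.*').search(text).group()
with_newline = re.compile('.*', re.DOTALL).search(text).group()

print(repr(no_newline))    # 'Serve the public trust.'
print(with_newline == text)  # True
```

Without the flag the match stops at the first newline; with re.DOTALL it runs to the end of the string.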

() Parentheses create groups; call the Match object's group() method to get the text matched by a particular group.

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'

>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.group()
'415-555-4242'

To retrieve all the groups at once, use the groups() method; note the plural form of the name.

>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415

| The pipe matches one of several alternatives.

iRegex = re.compile(r'Batman|Tina')  # matches Batman or Tina; if both occur, search() returns whichever appears first
mo = iRegex.search('Batman and Tina Fey.')
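
As a sketch of the "first occurrence wins" behaviour, the snippet below runs the same pattern against two orderings of the names. The variable names and the second sentence are made up for illustration.

```python
import re

hero_regex = re.compile(r'Batman|Tina')

# search() returns the leftmost match in the string,
# regardless of the order of the alternatives in the pattern.
first = hero_regex.search('Batman and Tina Fey.').group()
second = hero_regex.search('Tina Fey and Batman.').group()

# findall() returns every non-overlapping match.
all_hits = hero_regex.findall('Batman and Tina Fey.')

print(first)     # Batman
print(second)    # Tina
print(all_hits)  # ['Batman', 'Tina']
```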

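>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')  # matches Bat followed by man, mobile, copter, or bat
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()   # returns the whole matched text
'Batmobile'
>>> mo.group(1)  # returns only the text matched inside the first parenthesized group
'mobile'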

* The star matches zero or more repetitions.

>>> batRegex = re.compile(r'Bat(wo)*man')  # matches whether wo is absent or repeated any number of times (e.g. wowowo)
>>> mo1 = batRegex.search('The Adventures of Batman') 
>>> mo1.group() 
'Batman'

+ The plus matches one or more repetitions.

>>> batRegex = re.compile(r'Bat(wo)+man')
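
The only difference between the star and the plus is whether zero repetitions are allowed. A minimal sketch, with variable names chosen for the example:

```python
import re

star_regex = re.compile(r'Bat(wo)*man')  # (wo) may appear zero or more times
plus_regex = re.compile(r'Bat(wo)+man')  # (wo) must appear at least once

star_hit = star_regex.search('The Adventures of Batman').group()
plus_miss = plus_regex.search('The Adventures of Batman')
plus_hit = plus_regex.search('The Adventures of Batwowowoman').group()

print(star_hit)   # Batman
print(plus_miss)  # None
print(plus_hit)   # Batwowowoman
```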

{} Braces match a specific number of repetitions.

>>> haRegex = re.compile(r'(Ha){3}')    # exactly 3 repetitions
>>> haRegex = re.compile(r'(Ha){3,5}')  # a range: 3 to 5 repetitions
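
A brief sketch of the brace forms in action. It also shows the open-ended form {3,}, which is standard regex syntax but does not appear in the text above; the inputs and names are illustrative.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_or_more = re.compile(r'(Ha){3,}')  # no upper bound (form not shown in the article)

too_short = exactly_three.search('HaHa')               # only two repetitions
three_of_four = exactly_three.search('HaHaHaHa').group()
open_ended = three_or_more.search('HaHaHaHa').group()

print(too_short)      # None
print(three_of_four)  # HaHaHa
print(open_ended)     # HaHaHaHa
```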

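Greedy and non-greedy matching

>>> greedyHaRegex = re.compile(r'(Ha){3,5}')      # Python regexes are greedy by default: they match the longest string possible
>>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')  # a question mark after the braces makes the match non-greedy: the shortest string possible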
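
Running both variants against the same input makes the difference concrete. This is a minimal sketch; the variable names are assumptions for the example. Note that the question mark must be the ASCII character ?, not a full-width one.

```python
import re

greedy_regex = re.compile(r'(Ha){3,5}')
lazy_regex = re.compile(r'(Ha){3,5}?')

greedy_match = greedy_regex.search('HaHaHaHaHa').group()
lazy_match = lazy_regex.search('HaHaHaHaHa').group()

print(greedy_match)  # HaHaHaHaHa (takes as many repetitions as allowed)
print(lazy_match)    # HaHaHa     (stops at the minimum of three)
```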

re.compile() takes a string containing the regular expression and returns a Regex (pattern) object.

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d\d-\d\d\d\d')

search() scans the string passed to it; it returns None when nothing matches and a Match object when something does.

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My name is dxc,my number is 15386366629,153-8636-6645')
print(mo.group())
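
Because search() can return None, calling group() directly raises AttributeError when there is no match. One way to guard against that is sketched below; the helper name find_phone is hypothetical and not part of the re module.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone-like match in text, or None if there is none.

    Hypothetical helper written for this example.
    """
    mo = phone_regex.search(text)
    # A Match object is truthy; None is falsy.
    return mo.group() if mo else None

found = find_phone('My number is 153-8636-6645')
missing = find_phone('No number in this sentence')

print(found)    # 153-8636-6645
print(missing)  # None
```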

findall() returns a list of every match in the string.

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')    # no groups: returns a list of strings
['415-555-9999', '212-555-0000']

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')       # with groups in the regex, findall() returns a list of tuples of strings
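
For comparison, here is a sketch of what the grouped version returns on the same input. The parentheses in the pattern must be ASCII ( and ); the variable names are chosen for the example.

```python
import re

grouped_regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

# With groups, each list element is a tuple holding one string per group.
parts = grouped_regex.findall('Cell: 415-555-9999 Work: 212-555-0000')

print(parts)
# [('415', '555', '9999'), ('212', '555', '0000')]
```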

Case-insensitive matching: the re.IGNORECASE (or re.I) flag

re.compile(r'robocop', re.I)
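
A quick sketch of the flag in use; the sentences are made-up sample input.

```python
import re

robocop_regex = re.compile(r'robocop', re.I)

# The matched text keeps the casing found in the input string.
mixed_case = robocop_regex.search('RoboCop is part man, part machine.').group()
every_case = robocop_regex.findall('ROBOCOP protects. robocop serves.')

print(mixed_case)  # RoboCop
print(every_case)  # ['ROBOCOP', 'robocop']
```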

sub() replaces matched text

>>> namesRegex = re.compile(r'Agent \w+')
# sub() takes the replacement string as its first argument and the string to search as its second; it returns the string with the substitutions applied.
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'

>>> agentNamesRegex = re.compile(r'Agent (\w)\w*')
# In the replacement string, \1 stands for the text matched by group 1.
>>> agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
'A**** told C**** that E**** knew B**** was a double agent.'

re.VERBOSE ignores whitespace and comments inside the regex string

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))? # extension
)''', re.VERBOSE)
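
To show a verbose pattern at work, the sketch below uses a simplified variant of the phone regex (no extension part, and the separators written as a character class). The simplification, the sample text, and the variable names are assumptions made for the example.

```python
import re

# Simplified, hypothetical variant of the phone pattern above.
simple_phone_regex = re.compile(r'''
    (\d{3}|\(\d{3}\))?   # optional area code, with or without parentheses
    [\s.-]?              # optional separator
    (\d{3})              # first 3 digits
    [\s.-]               # separator
    (\d{4})              # last 4 digits
''', re.VERBOSE)

text = 'Call 415-555-1011 or (415) 555-9999 tomorrow.'

# finditer() yields Match objects, so group() gives the whole matched text
# even though the pattern contains groups.
numbers = [mo.group() for mo in simple_phone_regex.finditer(text)]

print(numbers)
# ['415-555-1011', '(415) 555-9999']
```

finditer() is used here instead of findall() because, with groups in the pattern, findall() would return tuples of the group contents instead of the full phone numbers.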