Python Regular Expressions

pirlo-san

Article link: https://blog.csdn.net/m0_37554486/article/details/104048678

1 Steps for using regular expressions

Import the regex module with import re.
Create a Regex object with the re.compile() function. (Remember to use a raw string.)
Pass the string you want to search into the Regex object’s search() method. This returns a Match object, or None if the pattern is not found.
Call the Match object’s group() method to return a string of the actual matched text.

import re
phoneNumberRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phoneNumberRegex.search('My phone number is 415-555-4242.')
print(mo.group())
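A search that finds nothing returns None rather than a Match object, and calling group() on None raises an AttributeError. A minimal sketch of guarding against that, reusing the same pattern with a made-up input string:

```python
import re

phoneNumberRegex = re.compile(r'\d{3}-\d{3}-\d{4}')

# search() returns None when the pattern is not found,
# so check the result before calling group() on it.
mo = phoneNumberRegex.search('My number is unlisted.')
if mo is None:
    print('No phone number found.')
else:
    print(mo.group())
```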

2 Regex symbol list

See: a full explanation of all regular expression symbols.
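The full symbol list is not reproduced here. As a small illustration of a few of the most common shorthand character classes (the sample string is invented for this example):

```python
import re

text = 'Order 66 shipped at 10:30, id=A_7'

# \d = digit, \w = letter, digit or underscore, \s = whitespace.
# The uppercase forms \D, \W and \S match everything else.
digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)

print(digits)       # ['66', '10', '30', '7']
print(words)        # ['Order', '66', 'shipped', 'at', '10', '30', 'id', 'A_7']
print(len(spaces))  # 5
```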

3 Grouping matched substrings

>>> regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
>>> regex.search('123-456-7890')
<re.Match object; span=(0, 12), match='123-456-7890'>
>>> mo = regex.search('123-456-7890')
>>> mo.group()
'123-456-7890'
>>> mo.group(0)
'123-456-7890'
>>> mo.group(1)
'123'
>>> mo.group(2)
'456-7890'
>>> mo.groups()
('123', '456-7890')

The first set of parentheses in a regex string will be group 1. The second set will be group 2. By passing the integer 1 or 2 to the group() match object method, you can grab different parts of the matched text. Passing 0 or nothing to the group() method will return the entire matched text. If you would like to retrieve all the groups at once, use the groups() method—note the plural form for the name.
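Because groups() returns a tuple, it can be unpacked directly into variables. Parentheses that should be matched literally have to be escaped with a backslash. A short sketch of both, with made-up input text:

```python
import re

regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = regex.search('123-456-7890')

# groups() returns a tuple, so it can be unpacked into named variables.
areaCode, mainNumber = mo.groups()
print(areaCode)    # 123
print(mainNumber)  # 456-7890

# \( and \) match literal parentheses; the unescaped ones still form group 1.
mo2 = re.search(r'(\(\d{3}\)) (\d{3}-\d{4})', 'Call (123) 456-7890 now')
print(mo2.group(1))  # (123)
```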

4 Matching zero or one time: ?

>>> regex = re.compile(r'Bat(wo)?man')
>>> mo = regex.search('Batman')
>>> mo.group()
'Batman'
>>> mo = regex.search('Batwoman')
>>> mo.group()
'Batwoman'
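One detail worth knowing: when the optional group is skipped, the group still exists but its value is None. A quick check, plus ? applied to a single character (the spelling example is illustrative):

```python
import re

regex = re.compile(r'Bat(wo)?man')

# If the optional group does not take part in the match, group(1) is None.
print(regex.search('Batman').group(1))    # None
print(regex.search('Batwoman').group(1))  # wo

# Without parentheses, ? applies only to the single preceding character.
spellings = re.findall(r'colou?r', 'color or colour')
print(spellings)  # ['color', 'colour']
```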

5 Matching zero or more times: *

>>> regex = re.compile(r'Bat(wo)*man')
>>> mo1 = regex.search('Batman')
>>> mo2 = regex.search('Batwoman')
>>> mo3 = regex.search('Batwowowowoman')
>>> mo1.group()
'Batman'
>>> mo2.group()
'Batwoman'
>>> mo3.group()
'Batwowowowoman'
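The * in the example above applies to the whole group (wo). Without parentheses it applies only to the single character before it, as this small sketch with invented input shows:

```python
import re

# 'ab*' means an 'a' followed by zero or more 'b's.
matches = re.findall(r'ab*', 'a ab abbb b')
print(matches)  # ['a', 'ab', 'abbb']
```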

6 Matching one or more times: +

>>> regex = re.compile(r'Bat(wo)+man')
>>> mo1 = regex.search('Batman')
>>> mo2 = regex.search('Batwoman')
>>> mo3 = regex.search('Batwowowowoman')
>>> mo1 is None
True
>>> mo2.group()
'Batwoman'
>>> mo3.group()
'Batwowowowoman'
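For comparison with *, the same kind of input with + no longer matches a bare 'a'. Since + is a special character, a literal plus sign must be escaped. A minimal sketch with invented input:

```python
import re

# + requires at least one 'b', so the lone 'a' no longer matches.
matches = re.findall(r'ab+', 'a ab abbb b')
print(matches)  # ['ab', 'abbb']

# To match a literal plus sign, escape it as \+.
sums = re.findall(r'\d+\+\d+', '1+2 and 30+4')
print(sums)  # ['1+2', '30+4']
```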

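7 Matching a specific number of repetitions: {m,n}

Here m is the minimum and n the maximum number of repetitions, and either one may be omitted: {m,} means at least m times and {,n} means zero to n times. {m} on its own means exactly m times.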

>>> re.compile(r'(ha){3}').search('hahaha')
<re.Match object; span=(0, 6), match='hahaha'>
>>> re.compile(r'(ha){3,5}').search('hahahahaha')
<re.Match object; span=(0, 10), match='hahahahaha'>
>>> re.compile(r'(ha){3,}').search('hahahahahahahahahaha')
<re.Match object; span=(0, 20), match='hahahahahahahahahaha'>
>>> re.compile(r'(ha){,3}').search('')
<re.Match object; span=(0, 0), match=''>
>>>
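Two points the examples above do not show: a count that cannot be reached gives no match at all, and search() is happy to match part of a longer string. The sketch below uses re.fullmatch() (available since Python 3.4) to require the whole string to match:

```python
import re

# {3} means exactly three repetitions, so 'haha' (two) does not match.
tooShort = re.search(r'(ha){3}', 'haha')
print(tooShort)  # None

# search() matches 5 'ha's inside a run of 6 ...
partial = re.search(r'(ha){3,5}', 'ha' * 6)
print(partial.group())  # hahahahaha

# ... while fullmatch() requires the entire string to fit {3,5}.
whole = re.fullmatch(r'(ha){3,5}', 'ha' * 6)
print(whole)  # None
```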

8 Greedy and non-greedy matching

Python’s regular expressions are greedy by default, which means that in ambiguous situations they will match the longest string possible. The non-greedy (also called lazy) version of the braces, which matches the shortest string possible, has the closing brace followed by a question mark.

>>> re.compile(r'(ha){3,5}?').search('hahahahaha')
<re.Match object; span=(0, 6), match='hahaha'>
>>> re.compile(r'(ha){3,5}').search('hahahahaha')
<re.Match object; span=(0, 10), match='hahahahaha'>
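The trailing ? works the same way after * and +. A common case is .* versus .*? (the dot is covered in section 12), sketched here on a made-up string:

```python
import re

text = '<To serve man> for dinner.>'

# Greedy .* runs to the last '>'; non-greedy .*? stops at the first one.
greedy = re.search(r'<.*>', text).group()
lazy = re.search(r'<.*?>', text).group()
print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```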

9 Getting all matches: findall

When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, the method findall() returns a list of string matches, such as ['415-555-9999', '212-555-0000'].

>>> regex = re.compile(r'\d{3}-\d{3}-\d{4}')
>>> phoneNumbers = regex.findall('cell: 111-222-3333, work: 444-555-6666')
>>> phoneNumbers[0]
'111-222-3333'
>>> phoneNumbers[1]
'444-555-6666'
>>> phoneNumbers
['111-222-3333', '444-555-6666']

When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), the method findall() returns a list of tuples of strings (one string for each group), such as [('415', '555', '9999'), ('212', '555', '0000')].

>>> regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
>>> phoneNumbers = regex.findall('cell: 111-222-3333, work: 444-555-6666')
>>> phoneNumbers
[('111', '222', '3333'), ('444', '555', '6666')]
>>> phoneNumbers[0]
('111', '222', '3333')
>>> phoneNumbers[1]
('444', '555', '6666')
>>> phoneNumbers[1][1]
'555'
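A third case is a regex with exactly one group: findall() then returns a list of strings holding only that group. If parentheses are needed for grouping but not for capturing, a non-capturing group (?:...) keeps findall() returning whole matches. A sketch using the same input text as above:

```python
import re

text = 'cell: 111-222-3333, work: 444-555-6666'

# Exactly one group: findall() returns a list of that group's text.
oneGroup = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
print(oneGroup)  # ['111', '444']

# (?:...) groups without capturing, so whole matches come back again.
nonCapturing = re.findall(r'(?:\d{3}-){2}\d{4}', text)
print(nonCapturing)  # ['111-222-3333', '444-555-6666']
```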

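10 Negated character classes: [^xxx]

Match every character that is not a vowel (despite the name consonantRegex, this also matches digits, spaces and punctuation):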

>>> consonantRegex = re.compile(r'[^aeiouAEIOU]')
>>> consonantRegex.findall('abcdefghijklmnopqrstUVWXYZ')
['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'V', 'W', 'X', 'Y', 'Z']
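A negated class matches anything outside the listed characters, which a short invented string makes visible. If only consonant letters are wanted, one option (an assumption about what "consonant" should mean here, treating y as a consonant) is to list letter ranges that skip the vowels:

```python
import re

negated = re.compile(r'[^aeiouAEIOU]')
notVowels = negated.findall('a b!')
print(notVowels)  # [' ', 'b', '!']

# Letter ranges that leave out a, e, i, o, u in both cases.
consonants = re.compile(r'[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]')
onlyConsonants = consonants.findall('a b!')
print(onlyConsonants)  # ['b']
```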

11 Matching the start and end: ^ and $

Match Hello only at the start of the string (the second Hello is not matched):

>>> helloRegex = re.compile('^Hello')
>>> helloRegex.findall('Hello, world and Hello milan')
['Hello']

Match the digits at the end of a string:

>>> endWithNumericRegex = re.compile(r'\d+$')
>>> endWithNumericRegex.findall('1234 and 5678')
['5678']

Match a string made up entirely of lowercase letters:

>>> alphaRegex = re.compile(r'^[a-z]+$')
>>> alphaRegex.findall('ckjohdciqhdcui')
['ckjohdciqhdcui']
>>> alphaRegex.findall('aaa111333bbb')
[]
>>> alphaRegex.findall('aaaBBBccc')
[]
>>>
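Using ^ and $ together is a common way to check that an entire string has a given form. A minimal sketch with a hypothetical helper, is_all_digits():

```python
import re

def is_all_digits(s):
    # Hypothetical helper: ^ and $ anchor the pattern to both ends,
    # so the whole string must consist of digits.
    # Note: $ also matches just before a trailing newline, so '123\n'
    # passes; re.fullmatch(r'\d+', s) would reject it.
    return re.search(r'^\d+$', s) is not None

print(is_all_digits('1234567890'))     # True
print(is_all_digits('12345xyz67890'))  # False
print(is_all_digits('12  34567890'))   # False
```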

12 Matching any character: . and .*

The dot (.) matches any single character except the newline \n:

import re

regex = re.compile(r'.*')
regexAll = re.compile(r'.*', re.DOTALL)
text = '''aaaa
bbbb
cccc
dddd'''

print(regex.search(text).group())       # aaaa
print(regexAll.search(text).group())    # aaaa\nbbbb\ncccc\ndddd
print(regex.findall(text))              # ['aaaa', '', 'bbbb', '', 'cccc', '', 'dddd', '']
print(regexAll.findall(text))           # ['aaaa\nbbbb\ncccc\ndddd', '']

Passing the re.DOTALL flag makes the dot match every character, including \n.
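A single dot stands for exactly one character, and a literal dot must be escaped with a backslash. A small illustration with invented sentences:

```python
import re

# '.at' = any one character followed by 'at' ('flat' yields 'lat').
atWords = re.findall(r'.at', 'The cat in the hat sat on the flat mat.')
print(atWords)  # ['cat', 'hat', 'sat', 'lat', 'mat']

# \. matches only a real dot, so '3,14' is not matched.
decimals = re.findall(r'\d+\.\d+', 'pi is 3.14, not 3,14')
print(decimals)  # ['3.14']
```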

13 Ignoring case: re.IGNORECASE or re.I

>>> regex = re.compile(r'abcd', re.IGNORECASE)
>>> regex.findall('abcdABCDAbCd')
['abcd', 'ABCD', 'AbCd']
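Several flags can be combined with the bitwise OR operator |. A minimal sketch combining re.IGNORECASE with re.DOTALL on an invented string:

```python
import re

# IGNORECASE lets 'a'/'b' match 'A'/'B'; DOTALL lets '.' match '\n'.
regex = re.compile(r'a.b', re.IGNORECASE | re.DOTALL)
found = regex.findall('A\nB a-b')
print(found)  # ['A\nB', 'a-b']
```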

14 Replacing text: sub

Replace the passwords with asterisks:

import re

passwordRegex = re.compile(r'(password:)\s*([a-zA-Z0-9_]+)')

text = '''
username: pirlo
password: pirlo1234
username: kaka
password:1234kaka
username: maldini
password:   abcd_89023
'''

print(passwordRegex.sub(r"\1 ****", text))

In the replacement string, \1, \2, and so on refer to the text matched by group 1, group 2, and so on.
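A group reference can also keep part of the original text. The sketch below masks names while keeping their first letter; the pattern and sentence are illustrative only:

```python
import re

agentRegex = re.compile(r'Agent (\w)\w*')
text = 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.'

# \1 keeps the first letter captured by group 1; the rest of the match is replaced.
masked = agentRegex.sub(r'\1****', text)
print(masked)
# A**** told C**** that E**** knew B**** was a double agent.
```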

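15 Adding comments to a regex: re.VERBOSE

With re.VERBOSE, whitespace and # comments inside the pattern are ignored, so a long regex can be spread over several lines and annotated. The script below extracts phone numbers and email addresses from the text on the clipboard and copies the results back to the clipboard.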

#! python3


import pyperclip    # third-party module: pip install pyperclip
import re
import sys


phoneNumberRegex = re.compile(r'''
    (\d{3}|\(\d{3}\))?      # area code, optional
    (-|\.|\s)?              # separator, optional (there may be no area code)
    (\d{3})                 # first 3 digits
    (-|\.|\s)               # separator
    (\d{4})                 # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension, optional
    ''', re.VERBOSE)


emailAddressRegex = re.compile(r'''(
    [a-zA-Z0-9_.-]+     # username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name
    \.[A-Za-z]{2,4}     # top-level domain, e.g. .com
    )''', re.VERBOSE)


text = str(pyperclip.paste())
matches = []
for group in phoneNumberRegex.findall(text):
    areaCode, firstDigits, lastDigits, ext = group[0], group[2], group[4], group[7]
    phoneNumber = ""
    if areaCode != "":
        phoneNumber = areaCode + "-"
    phoneNumber += firstDigits + "-" + lastDigits
    if ext != "":
        phoneNumber += " ext " + ext
    matches.append(phoneNumber)

for group in emailAddressRegex.findall(text):
    matches.append(group)

if len(matches) == 0:
    print("no matched phone number or email address found")
    sys.exit()
pyperclip.copy('\n'.join(matches))
print("copied to clipboard:")
print('\n'.join(matches))
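Running the script above requires the third-party pyperclip module and some text on the clipboard. A smaller self-contained sketch of re.VERBOSE, using a hypothetical date pattern that is not part of the original script:

```python
import re

# Hypothetical pattern for dates such as 2020-01-19. In VERBOSE mode the
# newlines, indentation and # comments inside the pattern are ignored.
dateRegex = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
''', re.VERBOSE)

mo = dateRegex.search('Posted on 2020-01-19 by pirlo-san')
print(mo.groups())  # ('2020', '01', '19')
```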
