Python Regular Expressions

1 Steps for using regular expressions

  • Import the regex module with import re.
  • Create a Regex object with the re.compile() function. (Remember to use a raw string.)
  • Pass the string you want to search into the Regex object’s search() method. This returns a Match object, or None if the pattern is not found.
  • Call the Match object’s group() method to return a string of the actual matched text.
import re
phoneNumberRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phoneNumberRegex.search('My phone number is 415-555-4242.')
print(mo.group())
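Because search() returns None when nothing matches, calling group() on the result directly raises AttributeError for text without a phone number. A minimal sketch that checks for None first; the helper name find_phone is hypothetical:

```python
import re

# Compile the pattern once and reuse it for every search.
phone_num_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone(text):
    """Hypothetical helper: return the first phone number in text, or None."""
    mo = phone_num_regex.search(text)  # Match object, or None if nothing matched
    if mo is None:
        return None
    return mo.group()

print(find_phone('My phone number is 415-555-4242.'))  # 415-555-4242
print(find_phone('No number here.'))                   # None
```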

2 List of regex symbols

A full explanation of all regular expression symbols
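Some of the most frequently used symbols are the shorthand character classes. A small sketch that prints what each one matches in an arbitrary sample string (the sample text is made up for illustration):

```python
import re

text = 'a1 b_2\tC'  # arbitrary sample: letters, digits, underscore, space, tab

# \d digit, \w letter/digit/underscore, \s whitespace;
# the uppercase forms \D, \W, \S match everything the lowercase form does not.
for pattern in [r'\d', r'\D', r'\w', r'\W', r'\s', r'\S', r'[a-z]']:
    print(pattern, re.findall(pattern, text))
```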

3 对匹配的子串分组

>>> regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
>>> regex.search('123-456-7890')
<re.Match object; span=(0, 12), match='123-456-7890'>
>>> mo = regex.search('123-456-7890')
>>> mo.group()
'123-456-7890'
>>> mo.group(0)
'123-456-7890'
>>> mo.group(1)
'123'
>>> mo.group(2)
'456-7890'
>>> mo.groups()
('123', '456-7890')

The first set of parentheses in a regex string will be group 1. The second set will be group 2. By passing the integer 1 or 2 to the group() match object method, you can grab different parts of the matched text. Passing 0 or nothing to the group() method will return the entire matched text. If you would like to retrieve all the groups at once, use the groups() method (note the plural form of the name).
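Since groups() returns a tuple, its values can be unpacked straight into variables. The sketch below also shows that parentheses meant to be matched literally must be escaped as \( and \), otherwise they create a group; the phone format "(415) 555-4242" is just an example input:

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) around them form group 1.
phone_regex = re.compile(r'(\(\d{3}\)) (\d{3}-\d{4})')
mo = phone_regex.search('My phone number is (415) 555-4242.')

# groups() returns a tuple, so multiple assignment unpacks it.
area_code, main_number = mo.groups()
print(area_code)    # (415)
print(main_number)  # 555-4242
```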

4 Matching zero or one time: ?

>>> regex = re.compile(r'Bat(wo)?man')
>>> mo = regex.search('Batman')
>>> mo.group()
'Batman'
>>> mo = regex.search('Batwoman')
>>> mo.group()
'Batwoman'
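The same ? can make a larger part of a pattern optional. A sketch, assuming phone numbers that may or may not start with a three-digit area code:

```python
import re

# (\d{3}-)? makes the whole area-code group optional (0 or 1 occurrence).
phone_regex = re.compile(r'(\d{3}-)?\d{3}-\d{4}')

mo1 = phone_regex.search('My number is 415-555-4242')
mo2 = phone_regex.search('My number is 555-4242')
print(mo1.group())   # 415-555-4242
print(mo2.group())   # 555-4242
print(mo2.group(1))  # None, because the optional group did not take part in the match
```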

5 Matching zero or more times: *

>>> regex = re.compile(r'Bat(wo)*man')
>>> mo1 = regex.search('Batman')
>>> mo2 = regex.search('Batwoman')
>>> mo3 = regex.search('Batwowowowoman')
>>> mo1.group()
'Batman'
>>> mo2.group()
'Batwoman'
>>> mo3.group()
'Batwowowowoman'
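When a group is repeated, group(1) only holds the text of its last repetition, and it is None when the group repeated zero times. A short check:

```python
import re

regex = re.compile(r'Bat(wo)*man')

# A repeated group keeps only its LAST repetition.
print(regex.search('Batwowowowoman').group(1))  # wo
# Zero repetitions: the group did not participate, so group(1) is None.
print(regex.search('Batman').group(1))          # None
```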

6 Matching one or more times: +

>>> regex = re.compile(r'Bat(wo)+man')
>>> mo1 = regex.search('Batman')
>>> mo2 = regex.search('Batwoman')
>>> mo3 = regex.search('Batwowowowoman')
>>> mo1 is None
True
>>> mo2.group()
'Batwoman'
>>> mo3.group()
'Batwowowowoman'
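To compare the three quantifiers from the last three sections side by side, this sketch runs each pattern against the same three inputs and records whether search() found a match:

```python
import re

patterns = [r'Bat(wo)?man', r'Bat(wo)*man', r'Bat(wo)+man']
inputs = ['Batman', 'Batwoman', 'Batwowowowoman']

results = {}
for pattern in patterns:
    # True if search() returned a Match object, False if it returned None.
    results[pattern] = [re.search(pattern, s) is not None for s in inputs]
    print(pattern, results[pattern])
# Bat(wo)?man [True, True, False]
# Bat(wo)*man [True, True, True]
# Bat(wo)+man [False, True, True]
```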

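7 Matching a specific number of repetitions: {m,n}

Here m and n are the minimum and maximum number of repetitions, and either one can be omitted: {m,} means at least m times, {,n} means 0 to n times, and {n} means exactly n times.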

>>> re.compile(r'(ha){3}').search('hahaha')
<re.Match object; span=(0, 6), match='hahaha'>
>>> re.compile(r'(ha){3,5}').search('hahahahaha')
<re.Match object; span=(0, 10), match='hahahahaha'>
>>> re.compile(r'(ha){3,}').search('hahahahahahahahahaha')
<re.Match object; span=(0, 20), match='hahahahahahahahahaha'>
>>> re.compile(r'(ha){,3}').search('')
<re.Match object; span=(0, 0), match=''>
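Note that search() looks for the first matching substring, so (ha){3} also "matches" a string with four ha's. To require the whole string to fit the pattern, re.fullmatch() can be used instead:

```python
import re

# search() finds the first substring that matches: 3 of the 4 "ha"s.
print(re.search(r'(ha){3}', 'hahahaha').group())        # hahaha
# fullmatch() succeeds only if the ENTIRE string matches the pattern.
print(re.fullmatch(r'(ha){3}', 'hahahaha'))             # None
print(re.fullmatch(r'(ha){3}', 'hahaha') is not None)   # True
```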

8 Greedy and non-greedy matching

Python’s regular expressions are greedy by default, which means that in ambiguous situations they match the longest string possible. The non-greedy (also called lazy) version of the braces, which matches the shortest string possible, is written by putting a question mark after the closing brace. The same trailing question mark also makes *, + and ? non-greedy: *?, +? and ??.

>>> re.compile(r'(ha){3,5}?').search('hahahahaha')
<re.Match object; span=(0, 6), match='hahaha'>
>>> re.compile(r'(ha){3,5}').search('hahahahaha')
<re.Match object; span=(0, 10), match='hahahahaha'>
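The difference is most visible with .*, which otherwise tends to swallow too much. A sketch using an arbitrary string containing two > characters:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text)       # .* grabs as much as possible
non_greedy = re.search(r'<.*?>', text)  # .*? stops at the first possible '>'

print(greedy.group())      # <To serve man> for dinner.>
print(non_greedy.group())  # <To serve man>
```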

9 Getting all matches: findall

When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, the method findall() returns a list of string matches, such as ['415-555-9999', '212-555-0000'].

>>> regex = re.compile(r'\d{3}-\d{3}-\d{4}')
>>> phoneNumbers = regex.findall('cell: 111-222-3333, work: 444-555-6666')
>>> phoneNumbers[0]
'111-222-3333'
>>> phoneNumbers[1]
'444-555-6666'
>>> phoneNumbers
['111-222-3333', '444-555-6666']

When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), the method findall() returns a list of tuples of strings (one string for each group), such as [('415', '555', '9999'), ('212', '555', '0000')].

>>> regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
>>> phoneNumbers = regex.findall('cell: 111-222-3333, work: 444-555-6666')
>>> phoneNumbers
[('111', '222', '3333'), ('444', '555', '6666')]
>>> phoneNumbers[0]
('111', '222', '3333')
>>> phoneNumbers[1]
('444', '555', '6666')
>>> phoneNumbers[1][1]
'555'
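Two related cases: with exactly one group, findall() returns a flat list of that group's strings rather than tuples, and a non-capturing group (?:...) groups without capturing, so findall() returns the full matches again. A short sketch on the same sample text:

```python
import re

text = 'cell: 111-222-3333, work: 444-555-6666'

# Exactly one capturing group: a list of strings for that group only.
print(re.findall(r'(\d{3})-\d{3}-\d{4}', text))  # ['111', '444']

# (?:...) groups "\d{3}-" for the {2} repetition without capturing it.
print(re.findall(r'(?:\d{3}-){2}\d{4}', text))   # ['111-222-3333', '444-555-6666']
```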

10 Negated character classes: [^...]

Match every character that is not a vowel:

>>> consonantRegex = re.compile(r'[^aeiouAEIOU]')
>>> consonantRegex.findall('abcdefghijklmnopqrstUVWXYZ')
['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'V', 'W', 'X', 'Y', 'Z']
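Note that [^aeiouAEIOU] matches any non-vowel character, not only consonants: spaces, digits and punctuation match too. If only consonant letters are wanted, they can be listed with ranges instead. A sketch (the sample string is arbitrary, and this version treats y as a consonant):

```python
import re

# [^...] matches ANY character not listed, including spaces and punctuation.
print(re.findall(r'[^aeiouAEIOU]', 'Hi, Bob!'))
# ['H', ',', ' ', 'B', 'b', '!']

# Consonant letters only, via ranges that skip the vowels (case-insensitive).
print(re.findall(r'[b-df-hj-np-tv-z]', 'Hi, Bob!', re.IGNORECASE))
# ['H', 'B', 'b']
```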

11 Matching the beginning and end: ^ and $

Match Hello at the beginning of the string:

>>> helloRegex = re.compile(r'^Hello')
>>> helloRegex.findall('Hello, world and Hello milan')
['Hello']

Match the digits at the end of the string:

>>> endWithNumericRegex = re.compile(r'\d+$')
>>> endWithNumericRegex.findall('1234 and 5678')
['5678']

Match strings that consist entirely of lowercase letters:

>>> alphaRegex = re.compile(r'^[a-z]+$')
>>> alphaRegex.findall('ckjohdciqhdcui')
['ckjohdciqhdcui']
>>> alphaRegex.findall('aaa111333bbb')
[]
>>> alphaRegex.findall('aaaBBBccc')
[]
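By default ^ and $ refer to the start and end of the whole string. With the re.MULTILINE flag they also match at the start and end of every line, which is useful for multi-line text. A sketch on a made-up two-line string:

```python
import re

text = 'Hello world\nHello milan'

# Without flags, ^ only matches at the very start of the string.
print(re.findall(r'^Hello', text))                # ['Hello']
# With re.MULTILINE, ^ and $ also match at each line boundary.
print(re.findall(r'^Hello', text, re.MULTILINE))  # ['Hello', 'Hello']
print(re.findall(r'\w+$', text, re.MULTILINE))    # ['world', 'milan']
```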

12 Matching any character: .*

The dot (.) matches any single character except the newline \n:

import re

regex = re.compile(r'.*')                   # . does not match \n
regexAll = re.compile(r'.*', re.DOTALL)     # with re.DOTALL, . also matches \n
text = '''aaaa
bbbb
cccc
dddd'''

print(regex.search(text).group())       # aaaa
print(regexAll.search(text).group())    # all four lines, i.e. 'aaaa\nbbbb\ncccc\ndddd'
print(regex.findall(text))              # ['aaaa', '', 'bbbb', '', 'cccc', '', 'dddd', '']
print(regexAll.findall(text))           # ['aaaa\nbbbb\ncccc\ndddd', '']

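Passing re.DOTALL to re.compile() makes the dot match every character, including \n. The empty strings in the findall() results appear because .* can also match zero characters: just before each \n and at the very end of the text.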

13 Case-insensitive matching: re.IGNORECASE or re.I

>>> regex = re.compile(r'abcd', re.IGNORECASE)
>>> regex.findall('abcdABCDAbCd')
['abcd', 'ABCD', 'AbCd']
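Flags such as re.IGNORECASE and re.DOTALL can be combined with the bitwise OR operator |. A sketch on a made-up two-line string where the match has to ignore case and cross a newline:

```python
import re

text = 'Line one: ABC\nline two: abc'

# Both flags at once: case-insensitive, and . may match \n.
regex = re.compile(r'abc.*line', re.IGNORECASE | re.DOTALL)
print(repr(regex.search(text).group()))  # 'ABC\nline'

# Without re.DOTALL, .* cannot cross the newline, so there is no match.
print(re.search(r'abc.*line', text, re.IGNORECASE))  # None
```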

14 Replacing text: sub

Replace passwords with asterisks:

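import re

# Group 1 captures the "password:" label, group 2 captures the password itself.
passwordRegex = re.compile(r'(password:)\s*([a-zA-Z0-9_]+)')

text = '''
username: pirlo
password: pirlo1234
username: kaka
password:1234kaka
username: maldini
password:   abcd_89023
'''

# Keep group 1 and replace the rest of the match, so each
# "password: ..." line becomes "password: ****".
print(passwordRegex.sub(r"\1 ****", text))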

In the replacement string, \1, \2, and so on refer to the text matched by group 1, group 2, and so on.
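sub() also accepts a function instead of a replacement string; it is called with the Match object for each match and its return value is inserted. A sketch that keeps the label but masks each password character individually, so the mask length follows the password length; the function name mask is hypothetical:

```python
import re

password_regex = re.compile(r'(password:)\s*([a-zA-Z0-9_]+)')

def mask(mo):
    # Hypothetical helper: keep the label (group 1) and replace every
    # character of the password (group 2) with an asterisk.
    return mo.group(1) + ' ' + '*' * len(mo.group(2))

print(password_regex.sub(mask, 'password: pirlo1234'))  # "password: " followed by 9 asterisks
print(password_regex.sub(mask, 'password:1234kaka'))    # "password: " followed by 8 asterisks
```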

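15 Adding comments to a regex: re.VERBOSE

With re.VERBOSE, whitespace and # comments inside the pattern are ignored (except inside a character class or when escaped), so a long regex can be spread over several lines and annotated. The script below uses it to pull phone numbers and email addresses out of the clipboard text: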

#! python3
# Extract phone numbers and email addresses from the clipboard text.
# Requires the third-party pyperclip module (pip install pyperclip).

import re
import sys

import pyperclip


phoneNumberRegex = re.compile(r'''
    (\d{3}|\(\d{3}\))?                # area code, optional
    (-|\.|\s)?                        # separator, optional
    (\d{3})                           # first 3 digits
    (-|\.|\s)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension, optional
    ''', re.VERBOSE)


emailAddressRegex = re.compile(r'''(
    [a-zA-Z0-9_.-]+     # username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name
    \.[A-Za-z]{2,4}     # dot-something (top-level domain)
    )''', re.VERBOSE)


text = str(pyperclip.paste())
matches = []
# With several groups, findall() returns one tuple per match:
# index 0 = area code, 2 = first 3 digits, 4 = last 4 digits, 7 = extension digits.
for group in phoneNumberRegex.findall(text):
    areaCode, firstDigits, lastDigits, ext = group[0], group[2], group[4], group[7]
    phoneNumber = ""
    if areaCode != "":
        phoneNumber = areaCode + "-"
    phoneNumber += firstDigits + "-" + lastDigits
    if ext != "":
        phoneNumber += " ext " + ext
    matches.append(phoneNumber)

# The email regex has a single (outer) group, so findall() returns plain strings.
matches.extend(emailAddressRegex.findall(text))

if not matches:
    print("no matched phone number or email address found")
    sys.exit()
pyperclip.copy('\n'.join(matches))
print("copied to clipboard:")
print('\n'.join(matches))
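Because the script above depends on the clipboard, here is a smaller self-contained sketch of the same re.VERBOSE idea that can be run directly. The simplified phone pattern and the sample text are assumptions for illustration only; note that in verbose mode a literal space outside a character class would have to be written as "\ " or matched with \s:

```python
import re

# Hypothetical simplified phone pattern, written in verbose mode.
phone_regex = re.compile(r'''
    (\d{3})      # area code
    [-.\s]       # separator: dash, dot or whitespace
    (\d{3})      # first 3 digits
    [-.\s]       # separator
    (\d{4})      # last 4 digits
''', re.VERBOSE)

text = 'Call 415-555-4242 or 212.555.0000.'

# Normalize every match to the dash-separated form.
results = [f'{area}-{first}-{last}' for area, first, last in phone_regex.findall(text)]
print(results)  # ['415-555-4242', '212-555-0000']
```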
