Python Regular Expressions (Google Python Course)

Type: translation

Tags: #python #regular-expressions


This article covers regular expressions in Python: the basic functions search and findall, what the pattern symbols mean, grouping, and substitution.


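I have used regular expressions for a long time, but always by copy-pasting or asking on forums. I worked through tutorials along the way, yet it never clicked. After reading the Python Course on Google for Education, I finally understand the basics.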

Python functions

There are two main functions:

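import re
pattern = r'\d+'               # example pattern
string = 'room 101, floor 3'   # example text to search
match = re.search(pattern, string, flags=0)     # first match: a Match object, or None
matches = re.findall(pattern, string, flags=0)  # every non-overlapping match, as a list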

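The pattern is usually written as a raw string with an r prefix (r'...'), so that backslashes reach the regex engine unchanged.
flags defaults to 0 (no options), so matching is case-sensitive; pass re.IGNORECASE (re.I) for case-insensitive matching.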
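A small check of both points above, using a made-up string: the default flags=0 is case-sensitive, re.IGNORECASE relaxes that, and dropping the r prefix turns \b into a backspace character instead of a word boundary.

```python
import re

text = 'Hello World'  # example input

# flags=0 (the default): case-sensitive, so 'hello' is not found.
strict = re.search(r'hello', text)
# re.IGNORECASE (alias re.I): case-insensitive matching.
loose = re.search(r'hello', text, flags=re.IGNORECASE)

print(strict)         # None
print(loose.group())  # Hello

# With r'...', \b reaches the regex engine and means "word boundary".
print(re.search(r'\bWorld', text) is not None)  # True
# Without r, Python turns '\b' into a backspace character (\x08), which is not in text.
print(re.search('\bWorld', text))               # None
```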

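As for the results: search returns a Match object for the first match, or None if nothing matches. match.group() (equivalently match.group(0)) is the whole matched text, and match.group(1) is the first group. findall returns a list of all matches: plain matched strings if the pattern has no groups, that group's text if it has exactly one group, and tuples of group texts if it has two or more.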
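To make the difference concrete, here is a sketch on an invented phone-number string showing what search and findall return for patterns with zero, one, and two groups.

```python
import re

s = 'tel: 555-1234, fax: 555-9876'  # example input

# search(): Match object for the FIRST match only.
m = re.search(r'(\d+)-(\d+)', s)
if m:
    print(m.group())   # 555-1234  (whole match, same as m.group(0))
    print(m.group(1))  # 555       (first group)
    print(m.group(2))  # 1234      (second group)

# findall(): a list of ALL matches; item type depends on the number of groups.
print(re.findall(r'\d+-\d+', s))      # ['555-1234', '555-9876']  no groups
print(re.findall(r'(\d+)-\d+', s))    # ['555', '555']            one group
print(re.findall(r'(\d+)-(\d+)', s))  # [('555', '1234'), ('555', '9876')]  two groups

# No match: search() returns None, findall() returns an empty list.
print(re.search(r'xyz', s), re.findall(r'xyz', s))  # None []
```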

Pattern symbols

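Symbol	Meaning
a, X, 9	Ordinary characters simply match themselves
.	(dot) Any single character except the newline \n
\w	A single "word" character: letter, digit, or underscore, i.e. [a-zA-Z0-9_] for ASCII text (in Python 3, str patterns also match other Unicode letters and digits). It matches one character that can appear in a word, not a whole word
\W	Uppercase means the opposite of lowercase: any single non-word character
\b	The boundary between a word character and a non-word character; it matches a position, not a character
\s	(space) A single whitespace character, such as space, \n, \r, \t, \f
\S	Uppercase: any single non-whitespace character
\d	(decimal) A single digit (0-9 in ASCII text)
^, $	Start and end of the string
\	Escape: removes the special meaning of the next character, e.g. \. matches a literal dot
/	No special meaning in Python regex; it matches a literal slash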
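A quick tour of the table on an example sentence (the strings are invented for illustration); each line exercises one or two of the symbols above.

```python
import re

s = 'cat: 3 dogs, 12 birds'  # example input

print(re.search(r'c.t', s).group())        # cat   (. matches any one character)
print(re.findall(r'\d', s))                # ['3', '1', '2']  (one digit per match)
print(re.findall(r'\s', s))                # [' ', ' ', ' ', ' ']
print(re.search(r'\bd\w\w\w', s).group())  # dogs  (\b: must start at a word boundary)
print(bool(re.search(r'^cat', s)))         # True  (^ anchors at the start)
print(bool(re.search(r'^dog', s)))         # False
print(bool(re.search(r'birds$', s)))       # True  ($ anchors at the end)

# \ removes special meaning: \. matches only a literal dot.
print(bool(re.search(r'a.b', 'axb')), bool(re.search(r'a\.b', 'axb')))  # True False
# / is an ordinary character.
print(re.search(r'\d/\d', 'ratio 3/4').group())  # 3/4
```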

Repetition

Symbol	Meaning
+	The pattern to its left occurs $\geq 1$ times
*	The pattern to its left occurs $\geq 0$ times
?	The pattern to its left occurs 0 or 1 times
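A short sketch of the three repetition operators on example strings. Note that search stops at the leftmost place a match is possible, which is why ab* already succeeds on the very first a.

```python
import re

# + : one or more. The 'a' at index 0 has no b after it, so the match is at index 2.
print(re.search(r'ab+', 'a abbb').group())      # abbb
# * : zero or more, so the very first 'a' already matches.
print(re.search(r'ab*', 'a abbb').group())      # a
# ? : zero or one.
print(re.search(r'colou?r', 'color').group())   # color
print(re.search(r'colou?r', 'colour').group())  # colour
# \d+ : a run of one or more digits.
print(re.findall(r'\d+', 'room 101, floor 3'))  # ['101', '3']
```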

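Repetition is greedy: the search starts at the leftmost position where a match is possible and then extends the match as far as it can. For example: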

import re
s = "<b>foo</b> and <i>so on</i>"  # named s rather than str, which would shadow the built-in
match = re.search(r'<.*>', s)
if match:
    print(match.group())

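This does not print <b>; it prints the whole string, because .* runs as far as possible and only stops at the last > in the string, swallowing "b>foo</b> and <i>so on</i" in between. Appending ? to a repetition (.*?, +?) makes it non-greedy, so it stops as soon as it can: the pattern <.*?> matches just <b>.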
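Side by side, the greedy and non-greedy versions of the pattern above behave like this; with findall, the non-greedy version picks out each tag in turn.

```python
import re

s = '<b>foo</b> and <i>so on</i>'

print(re.search(r'<.*>', s).group())   # <b>foo</b> and <i>so on</i>  (greedy)
print(re.search(r'<.*?>', s).group())  # <b>                          (non-greedy)
print(re.findall(r'<.*?>', s))         # ['<b>', '</b>', '<i>', '</i>']
```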

Square brackets

Square brackets define a set of characters and match any one character from the set, for example:

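Symbol	Meaning
[abc]	a, b, or c
[\w.-]	A word character, a dot, or a hyphen (inside brackets . is literal, and a - at the end is literal)
[^ab]	Any single character except a and b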
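Each bracket form from the table applied to a made-up string; the second line combines [\w.-] with + to grab whole email-like chunks.

```python
import re

print(re.findall(r'[abc]', 'a big cab'))  # ['a', 'b', 'c', 'a', 'b']
# Inside brackets . is a literal dot; the trailing - is a literal hyphen. '@' is not in the set.
print(re.findall(r'[\w.-]+', 'mail alice-b.smith@x.org now'))
# ['mail', 'alice-b.smith', 'x.org', 'now']
print(re.findall(r'[^ab]', 'abcab d'))    # ['c', ' ', 'd']
```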

Grouping

Parentheses group part of a pattern so that the text it matched can be retrieved afterwards:

import re
s = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group())   ## 'alice-b@google.com' (the whole match)
    print(match.group(1))  ## 'alice-b' (the username, group 1)
    print(match.group(2))  ## 'google.com' (the host, group 2)

s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

## Here re.findall() returns a list of all the found email strings
emails = re.findall(r'[\w\.-]+@[\w\.-]+', s)  ## ['alice@google.com', 'bob@abc.com']
for email in emails:
    # do something with each found email string
    print(email)

s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', s)
print(tuples)  ## [('alice', 'google.com'), ('bob', 'abc.com')]
for username, host in tuples:  # avoid naming a variable tuple, which shadows the built-in
    print(username)  ## group 1
    print(host)      ## group 2
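Groups are numbered by the position of their opening parenthesis, counting from the left, so parentheses can be nested. A small sketch on an invented string, using match.groups() to get all groups at once:

```python
import re

# Group 1 wraps the whole address; groups 2 and 3 are the user and host inside it.
m = re.search(r'(([\w.-]+)@([\w.-]+))', 'contact: alice@google.com')

print(m.groups())                # ('alice@google.com', 'alice', 'google.com')
print(m.group(1) == m.group(0))  # True here, because group 1 spans the whole pattern
```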

Substitution

In the replacement string, \1, \2, ... stand for the text matched by group 1, group 2, and so on:

s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
## re.sub(pat, replacement, s) returns a new string with all matches replaced;
## in the replacement, \1 is group(1) and \2 is group(2)
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yo-yo-dyne.com', s))
## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
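Two more uses of group references in re.sub, on the same example text plus a made-up one: groups can be reordered, and \g<1> is an unambiguous spelling of \1 for when a digit follows the reference (r'\10' would be read as group 10).

```python
import re

s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# Swap the order of the groups in the replacement.
print(re.sub(r'([\w.-]+)@([\w.-]+)', r'\2: \1', s))
# purple google.com: alice, blah monkey abc.com: bob blah dishwasher

# \g<1> followed by a literal 0: appends a zero to every number.
print(re.sub(r'(\d+)', r'\g<1>0', 'x5 y7'))  # x50 y70
```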