Single-line (s) and multi-line (m) modes in Python regular expressions

Latest recommended article published 2025-03-11 14:57:18

susu_xi

Original link: https://www.lfhacks.com/tech/python-re-single-multiline

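This article takes a close look at Python's re module and explains how single-line mode and multi-line mode change the matching behavior of regular expressions, especially for multi-line text that contains newlines. It includes code examples and an online testing tool to aid understanding.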

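Almost every function in Python's re module takes a flags parameter that controls the matching strategy. Two of these flags are single-line mode (re.DOTALL, or re.S) and multi-line mode (re.MULTILINE, or re.M).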

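TL;DR:

Single-line mode and multi-line mode both improve the handling of multi-line text, that is, strings that contain \n.
Single-line mode removes the \n barrier: the dot also matches newlines, so a match can extend across the whole string.
Multi-line mode treats \n as a line separator: ^ and $ also match at the start and end of each line, so the pattern is matched line by line.

Both modes change how the newline \n is handled.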

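Online testing tool

First, we use an online tool to get an intuitive feel for the behavior.

(The tool is available at the original link: https://www.lfhacks.com/tech/python-re-single-multiline.) Take the following text as input: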

空山新雨后，
天气晚来秋。
明月松间照，
清泉石上流。

In default mode with the pattern ".+", matching stops at the end of the first line. The result is:

空山新雨后，

In single-line mode, with the pattern ".+" and re.DOTALL, every character is matched, including the newlines. The result is:

空山新雨后，\n天气晚来秋。\n明月松间照，\n清泉石上流。

In multi-line mode, with the pattern "^.+$" and re.MULTILINE, each line is matched separately, so there are several results:

空山新雨后，
天气晚来秋。
明月松间照，
清泉石上流。
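
The three results above can be reproduced locally without the online tool. The following is a minimal sketch; the variable names are chosen here for illustration, and re.search and re.findall are used on the assumption that they mirror what the tool displays.

```python
import re

# Sample text from the article: four lines separated by \n.
poem = '空山新雨后，\n天气晚来秋。\n明月松间照，\n清泉石上流。'

# Default mode: the dot stops at the first newline.
default_match = re.search(r'.+', poem).group(0)

# Single-line mode: the dot also matches \n, so the whole string is matched.
dotall_match = re.search(r'.+', poem, flags=re.DOTALL).group(0)

# Multi-line mode: ^ and $ anchor at every line, giving one match per line.
multiline_matches = re.findall(r'^.+$', poem, flags=re.MULTILINE)

print(repr(default_match))   # first line only
print(repr(dotall_match))    # all four lines, with \n in between
print(multiline_matches)     # a list with one entry per line
```

Printing with repr makes the embedded \n characters visible, which matches how the single-line result is displayed above.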

Default mode

In regular expressions (Python 2, Python 3), the dot (.) matches any character except a newline. The Python documentation puts it this way:

In the default mode, this matches any character except a newline.

In other words, a pattern such as .* stops matching just before a newline. Consider the following string:

This is the first line.
This is the second line.
This is the third line.

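Matching with the dot alone stops at the newline (\n). The original article shows an animation of the matching process here.

The session below runs this in code. The result contains only the first line, and it does not include the newline.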

>>> a = 'This is the first line.\nThis is the second line.\nThis is the third line.'
>>> print(a)
This is the first line.
This is the second line.
This is the third line.
>>> import re
>>> p = re.match(r'This.*line\.', a)
>>> p.group(0)
'This is the first line.'

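Single-line mode: re.DOTALL

In the example above, matching stopped at the end of the first line even though quantifiers are greedy by default. To match the entire string, you need single-line mode. The Python documentation describes it as follows: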

If the DOTALL flag has been specified, this matches any character including a newline.

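The original article shows an animation of the matching process in single-line mode here.

With re.DOTALL, the dot also matches the newline, which makes cross-line matching possible. The session below shows this: the result contains the \n characters and all three lines.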

>>> q = re.match(r'This.*line\.', a, flags=re.DOTALL)
>>> q.group(0)
'This is the first line.\nThis is the second line.\nThis is the third line.'

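Conclusion:

In default mode, matching with the dot (.) stops at a newline.
In single-line mode, the dot (.) also matches the newline, so the string is matched as a whole. Single-line mode changes the matching strategy of the dot (.).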
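
As a small illustration of the conclusion, the sketch below turns on single-line mode in two ways: with the short alias re.S and with the inline flag (?s) written into the pattern. It also shows that a non-greedy quantifier still stops early even when the dot may cross newlines. The variable names are made up for this example.

```python
import re

text = 'This is the first line.\nThis is the second line.\nThis is the third line.'

# re.S is the short alias of re.DOTALL.
with_flag = re.match(r'This.*line\.', text, flags=re.S)

# The inline flag (?s) enables the same behavior from inside the pattern.
with_inline = re.match(r'(?s)This.*line\.', text)

# Non-greedy .*? stops at the first possible 'line.', even under re.S.
lazy = re.match(r'This.*?line\.', text, flags=re.S)

print(with_flag.group(0) == text)    # True: all three lines are matched
print(with_inline.group(0) == text)  # True: same result as the flag argument
print(lazy.group(0))                 # This is the first line.
```

The inline form is handy when the pattern is passed to code that does not expose a flags argument.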

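Multi-line mode: re.MULTILINE

Sometimes we want to find out how many lines in a piece of text satisfy a given condition. In the example below, we want to find the lines that start with This and end with line.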

>>> a = 'This is the first line.\nThis is the second line.\nThis is the third line.'
>>> print(a)
This is the first line.
This is the second line.
This is the third line.
>>> import re
>>> re.findall(r'^This.*line\.$', a)
[]

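The result is empty. From the previous section we know that the dot does not match newlines by default, so as a first attempt we switch to single-line mode by setting re.DOTALL.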

>>> re.findall(r'^This.*line\.$', a, flags=re.DOTALL)
['This is the first line.\nThis is the second line.\nThis is the third line.']

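This matches the whole string, which is not what we want: all three lines of the original string satisfy the condition, so there should be three results. Let's try switching to non-greedy matching with a question mark (?):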

>>> re.findall(r'^This.*?line\.$', a, flags=re.DOTALL)
['This is the first line.\nThis is the second line.\nThis is the third line.']

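The result is still the whole string. This is because, by default, the anchors ^ and $ match only at the start and end of the entire string, so even a non-greedy .*? has to extend to the last line before $ can match. The Python documentation describes it as follows: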

By default, ‘^’ matches only at the beginning of the string, and ‘$’ only at the end of the string and immediately before the newline (if any) at the end of the string.
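
The quoted rule can be checked with a short sketch. The sample string below is hypothetical (it is not from the article) and ends with a trailing newline on purpose, to show the "immediately before the newline at the end of the string" case.

```python
import re

# Hypothetical two-line sample; note the trailing newline.
text = 'first line.\nsecond line.\n'

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts = re.findall(r'^\w+', text)

# Without re.MULTILINE, $ matches at the very end of the string and just
# before a trailing newline, but not before the inner newline.
ends = re.findall(r'\w+ line\.$', text)

print(starts)  # ['first']
print(ends)    # ['second line.']
```

Only the first word and the last line are found, because the inner newline is not an anchor position in default mode.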

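This is why multi-line mode exists. In multi-line mode, each line is treated as a separate string for the purposes of ^ and $. The original article shows an animation of the matching process in multi-line mode here.

Put another way, ^ matches not only at the start of the whole string but also at the position right after each newline, and $ matches not only at the end of the whole string but also at the position right before each newline. The Python documentation describes it as follows: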

When specified, the pattern character ‘^’ matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character ‘$’ matches at the end of the string and at the end of each line (immediately preceding each newline).

Going back to the earlier example, multi-line mode gives the following result:

>>> re.findall(r'^This.*line\.$', a, flags=re.MULTILINE)
['This is the first line.', 'This is the second line.', 'This is the third line.']

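Conclusion:

Multi-line mode changes the matching strategy of ^ and $. When the string contains \n characters, each line of the string is matched separately.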
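
The task that opened this section, counting the lines that satisfy a condition, can be sketched as follows. The sample text is a hypothetical variation of the article's string in which the middle line does not match, and the compiled pattern and variable names are chosen for illustration.

```python
import re

# Hypothetical sample: the second line does not start with 'This'.
text = 'This is the first line.\nNot this one.\nThis is the third line.'

pattern = re.compile(r'^This.*line\.$', flags=re.MULTILINE)

# One list entry per matching line.
matching_lines = pattern.findall(text)
print(len(matching_lines))  # 2
print(matching_lines)       # ['This is the first line.', 'This is the third line.']

# finditer also reports where each matching line starts in the string.
offsets = [m.start() for m in pattern.finditer(text)]
print(offsets)              # [0, 38]
```

Because the dot still does not cross newlines here, each match is guaranteed to stay within a single line.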

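Single-line and multi-line modes are especially useful when you need to match across lines in a text file.

The two do not conflict

Judging by their names, single-line mode and multi-line mode look mutually exclusive, but in fact the two can be used together: single-line mode affects only the dot, while multi-line mode affects only ^ and $.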
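
To illustrate how the two modes coexist, here is a minimal sketch that combines the flags with the bitwise OR operator and reuses the article's three-line string. The names both, greedy and lazy are chosen for this example.

```python
import re

text = 'This is the first line.\nThis is the second line.\nThis is the third line.'

# Combine both flags with the bitwise OR operator.
both = re.DOTALL | re.MULTILINE

# Greedy: the dot crosses newlines, so one match spans the whole string.
greedy = re.findall(r'^This.*line\.$', text, flags=both)

# Non-greedy: $ can now stop at each line end, so every line matches on its own.
lazy = re.findall(r'^This.*?line\.$', text, flags=both)

print(len(greedy))  # 1
print(lazy)         # three separate lines
```

With both flags set, whether the result is one long match or one match per line depends on the quantifier, since the anchors and the dot are now both allowed to interact with \n.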

Reposted from https://www.lfhacks.com/tech/python-re-single-multiline