1. re.match and re.search; re.compile (see eg2)
import re
re.match(pattern, string, flags=0)
re.search(pattern, string, flags=0)
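pattern: the regular expression to match
string: the string to be searched
flags: optional modifiers; re.I ignores case, re.M enables multi-line mode (affects ^ and $), re.S makes . match every character including newline
Difference:
match only tries to match at the start of the string; if the start does not match, it returns None.
search scans through the whole string and returns the first match it finds.
Return value: a match object on success, otherwise None.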
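The difference between match and search, and the effect of the three flags, can be checked with a small script. This is only a sketch; the sample string `text` and the variable names are made up for illustration.

```python
import re

text = "say Hello\nhello world"

# match only tries at the very start of the string, so this fails
m_start = re.match(r"hello", text)

# search scans forward and returns the first hit (the lowercase one on line 2)
m_any = re.search(r"hello", text)

# re.I ignores case, so the first hit is now "Hello" on line 1
m_icase = re.search(r"hello", text, flags=re.I)

# re.M lets ^ match right after each newline as well
m_multi = re.search(r"^hello", text, flags=re.M)

# without re.S the dot cannot cross the newline; with re.S it can
m_plain = re.search(r"Hello.hello", text)
m_dotall = re.search(r"Hello.hello", text, flags=re.S)

print(m_start)          # None
print(m_any.span())     # (10, 15)
print(m_icase.span())   # (4, 9)
print(m_multi.span())   # (10, 15)
print(m_plain)          # None
print(m_dotall.span())  # (4, 15)
```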
eg1:
In [1]: import re
In [2]: pattern = 'hello'
In [3]: aim = 'hello world hello world'
In [4]: obj1 = re.match(pattern, aim)
In [5]: obj1
Out[5]: <_sre.SRE_Match object; span=(0, 5), match='hello'>
In [6]: obj1.span()
Out[6]: (0, 5)
In [10]: aim[0:5]
Out[10]: 'hello'
In [11]: obj1.groups()
Out[11]: ()
In [12]: obj1.group()
Out[12]: 'hello'
In [13]: obj1.group(0)
Out[13]: 'hello'
eg2.
In [22]: str0 = 'aa is bb and cc'
In [23]: pattern0 = re.compile(r'(.*) is (.*) and (.*)')
In [24]: res0 = pattern0.match(str0)
In [25]: res0
Out[25]: <_sre.SRE_Match object; span=(0, 15), match='aa is bb and cc'>
In [26]: res0.group()
Out[26]: 'aa is bb and cc'
In [27]: res0.groups()
Out[27]: ('aa', 'bb', 'cc')
In [28]: res0.group(0,1)
Out[28]: ('aa is bb and cc', 'aa')
In [29]: res0.group(0,1,2)
Out[29]: ('aa is bb and cc', 'aa', 'bb')
In [30]: res1 = pattern0.search(str0)
In [31]: res1
Out[31]: <_sre.SRE_Match object; span=(0, 15), match='aa is bb and cc'>
In [32]: res1.groups()
Out[32]: ('aa', 'bb', 'cc')
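2. re.sub(pattern, repl, string, count=0, flags=0)
pattern: the regular expression to match
repl: the replacement string; it can also be a function that takes a match object and returns a string
string: the string to be searched
count: maximum number of replacements; the default 0 replaces every match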
eg1:
In [34]: str1 = 'a0b1c2d3e4f5'
In [35]: def triple(matched):
    ...:     v = int(matched.group('v'))
    ...:     return str(v*3)
    ...:
In [36]: res1 = re.sub(r'(?P<v>\d+)', triple, str1)
In [37]: res1
Out[37]: 'a0b3c6d9e12f15'
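A few more variations on re.sub, reusing the same input string: limiting the number of replacements with count, referring to a group inside a replacement string, and passing a lambda instead of a named function. The variable names are illustrative only.

```python
import re

s = "a0b1c2d3e4f5"

# count=2 stops after the first two matches
first_two = re.sub(r"\d", "#", s, count=2)

# a replacement string can refer to a named group with \g<name>
wrapped = re.sub(r"(?P<v>\d)", r"<\g<v>>", s)

# a lambda works anywhere a named function would
doubled = re.sub(r"\d", lambda m: str(int(m.group()) * 2), s)

print(first_two)  # a#b#c2d3e4f5
print(wrapped)    # a<0>b<1>c<2>d<3>e<4>f<5>
print(doubled)    # a0b2c4d6e8f10
```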
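3. re.compile
pattern = re.compile('pattern')
pattern.match(string, pos, endpos)
Only compiled patterns accept pos and endpos (the start and end index of the region to search); the third positional argument of the module-level re.match is flags, so passing four arguments raises a TypeError.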
eg0:
In [39]: aim
Out[39]: 'hello world hello world'
In [40]: re.match('ello', aim, 0, 20)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-85a2ae02b17c> in <module>()
----> 1 re.match('ello', aim, 0, 20)
TypeError: match() takes from 2 to 3 positional arguments but 4 were given
In [41]: p1 = re.compile('ello')
In [42]: p1.match(aim, 0, 20)
In [43]: r1 = p1.match(aim, 0, 20)
In [44]: r1
In [45]: r2 = p1.match(aim, 2, 20)
In [46]: r2
In [47]: r3 = p1.match(aim, 1, 20)
In [48]: r3
Out[48]: <_sre.SRE_Match object; span=(1, 5), match='ello'>
In [49]: r4 = p1.search(aim, 0, 20)
In [50]: r4
Out[50]: <_sre.SRE_Match object; span=(1, 5), match='ello'>
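The transcript above shows that match with a start index only succeeds when the pattern lines up exactly at that index. The sketch below repeats that and adds one point that is easy to miss: passing a start index is not quite the same as slicing the string, because ^ still refers to the real start of the string. The names `anchored`, `with_pos` and `with_slice` are made up for this example.

```python
import re

aim = "hello world hello world"
p = re.compile(r"ello")

# match() anchors at the given start index, so only index 1 lines up with 'ello'
at_0 = p.match(aim, 0)
at_1 = p.match(aim, 1)

# the end index cuts the searched region: aim[0:4] is 'hell', so nothing is found
cut = p.search(aim, 0, 4)

# ^ refers to the real start of the string, not to the start index
anchored = re.compile(r"^ello")
with_pos = anchored.match(aim, 1)     # no match
with_slice = anchored.match(aim[1:])  # matches, because the slice starts with 'ello'

print(at_0, at_1.span(), cut)        # None (1, 5) None
print(with_pos, with_slice.span())   # None (0, 4)
```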
4. re.findall(pattern, string, flags=0); for a compiled pattern: pattern.findall(string, pos, endpos)
In [51]: r5 = re.findall('ello', aim)
In [52]: r5
Out[52]: ['ello', 'ello']
In [53]: r6 = p1.findall(aim, 0, 20)
In [54]: r6
Out[54]: ['ello', 'ello']
In [55]: aim
Out[55]: 'hello world hello world'
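What findall returns depends on how many groups the pattern has: whole matches with no group, the group text with one group, and tuples with several groups. A short sketch on the same string (variable names are illustrative):

```python
import re

aim = "hello world hello world"

# no group: a list of whole matches
whole = re.findall(r"[a-z]+", aim)

# one group: a list of that group's text
one_group = re.findall(r"(h)ello", aim)

# several groups: a list of tuples, one tuple per match
pairs = re.findall(r"(\w+) (\w+)", aim)

# a compiled pattern can start the scan at a given index
from_2 = re.compile(r"ello").findall(aim, 2)

print(whole)      # ['hello', 'world', 'hello', 'world']
print(one_group)  # ['h', 'h']
print(pairs)      # [('hello', 'world'), ('hello', 'world')]
print(from_2)     # ['ello']
```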
5. re.finditer(pattern, string, flags=0)
In [57]: r7 = p1.finditer(aim)
In [58]: for i in r7:
    ...:     print(i)
    ...:
<_sre.SRE_Match object; span=(1, 5), match='ello'>
<_sre.SRE_Match object; span=(13, 17), match='ello'>
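Since finditer yields match objects, positions can be collected along with the matched text. The sketch below also shows that the result is a one-shot iterator, so a second pass over it yields nothing. Variable names are illustrative.

```python
import re

aim = "hello world hello world"

# each item is a match object, so start(), end() and group() are available
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"world", aim)]

# the iterator is consumed by the first pass
it = re.finditer(r"world", aim)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]

print(spans)        # [(6, 11, 'world'), (18, 23, 'world')]
print(first_pass)   # ['world', 'world']
print(second_pass)  # []
```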
6. re.split(pattern, string, maxsplit=0, flags=0)
In [59]: aim
Out[59]: 'hello world hello world'
In [60]: re.split(r' ', aim)
Out[60]: ['hello', 'world', 'hello', 'world']
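Two more behaviours of re.split that the signature hints at: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result. A minimal sketch with made-up variable names:

```python
import re

# \s+ treats any run of whitespace as a single separator
parts = re.split(r"\s+", "hello   world\thello world")

# maxsplit=1 splits once and leaves the rest untouched
limited = re.split(r" ", "hello world hello world", maxsplit=1)

# a capturing group keeps the separator in the output list
kept = re.split(r"( )", "hello world")

print(parts)    # ['hello', 'world', 'hello', 'world']
print(limited)  # ['hello', 'world hello world']
print(kept)     # ['hello', ' ', 'world']
```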
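7. Pattern syntax
Pattern | Meaning |
---|---|
^ | start of string (also the start of each line with re.M) |
$ | end of string or just before a trailing newline (also the end of each line with re.M) |
. | any character except newline (any character at all with re.S) |
[...] | any one character listed inside the brackets |
[^...] | any one character not listed inside the brackets |
* | 0 or more repetitions |
+ | 1 or more repetitions |
? | 0 or 1 repetition |
{n} | exactly n repetitions |
{n,} | n or more repetitions |
{n,m} | between n and m repetitions |
a\|b | a or b |
(?P<name>pattern) | named group: matches pattern and stores the text under name |
\w | letter, digit, or underscore |
\W | anything other than a letter, digit, or underscore |
\s | any whitespace, [ \t\n\r\f\v] |
\S | any non-whitespace character |
\d | any decimal digit; [0-9] in ASCII mode |
\D | any non-digit |
\A | start of string |
\Z | end of string only; in Python it does not match before a trailing newline |
\z | end of string in some other regex flavors; many Python versions reject it, so use \Z |
\b | word boundary |
\B | not a word boundary |
\n | newline |
\t | tab |
\1..\9 | backreference to the text matched by group n |
[0-9], [a-z], [A-Z], [a-zA-Z0-9], [^0-9] | digit, lowercase letter, uppercase letter, letter or digit, non-digit |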
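A few rows of the table in action: named groups, a backreference, word boundaries, a bounded repetition, and the difference between $ and \Z in Python. The sample strings are invented for this sketch.

```python
import re

# named groups can be read by name, by number, or as a dict
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2024-05-17")
year, month = m.group("year"), m.group(2)
as_dict = m.groupdict()

# \1 must repeat exactly what group 1 matched: this finds a doubled word
dup = re.search(r"\b(\w+) \1\b", "this is is a test")

# \b keeps 'cat' from matching inside 'concat' or 'cats'
cats = re.findall(r"\bcat\b", "cat concat cats cat.")

# {2,3} takes between two and three lowercase letters per match
chunks = re.findall(r"[a-z]{2,3}", "ab abcd e")

# $ also matches just before a trailing newline; \Z only at the very end
dollar = re.search(r"world$", "hello world\n")
big_z = re.search(r"world\Z", "hello world\n")

print(year, month, as_dict)  # 2024 05 {'year': '2024', 'month': '05'}
print(dup.group())           # is is
print(cats)                  # ['cat', 'cat']
print(chunks)                # ['ab', 'abc']
print(dollar is not None, big_z)  # True None
```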