Python: Regular Expressions Explained (Part 1)

活动的笑脸

Article link: https://blog.csdn.net/weixin_43215588/article/details/122074141

Python's regular expressions are widely used, especially for data processing. The sections below walk through them one by one.

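Metacharacters:

Character	Purpose
.	Matches any single character except a newline (\n)
*	Matches the preceding element 0 or more times
+	Matches the preceding element 1 or more times
?	Matches the preceding element 0 or 1 times
{}	{n} matches the preceding element exactly n times; {n,m} matches it n to m times
[]	Defines a character set; for example, [0-9a-zA-Z] matches one digit, lowercase letter, or uppercase letter
^	Matches at the start of the string; as the first character inside [], as in [^0-2], it negates the set (any character other than 0, 1, or 2)
$	Matches at the end of the string
\	Escapes a special character, or introduces a special sequence
()	Groups the enclosed pattern, so (xyz) is matched as a single unit
|	Alternation: x|y matches x or y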

The following code demonstrates each of them:

import re

# . matches any single character except a newline (\n)
test_str = "hello python hello jython"
pattern = re.compile(r".ython")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python', 'jython']

# * matches the preceding element 0 or more times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca*t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ct', 'cat', 'caat', 'caaat', 'caaaat']

# + matches the preceding element 1 or more times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca+t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['cat', 'caat', 'caaat', 'caaaat']

# ? matches the preceding element 0 or 1 times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca?t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ct', 'cat']

# {n} matches the preceding element exactly n times; {n,m} matches it n to m times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca{2,4}t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['caat', 'caaat', 'caaaat']

# [] defines a character set; [0-9a-zA-Z] matches one digit, lowercase letter, or uppercase letter
test_str = "hello python hello jython"
pattern = re.compile(r"[pj]ython")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python', 'jython']

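# ^ matches at the start of the string; as the first character inside [],
# it negates the set, so [^0-2] matches any character other than 0, 1, or 2.
# Here [^b] accepts any first character other than "b", so "look" matches.
test_str = "look"
pattern = re.compile(r"[^b].+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['look']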

# $ matches at the end of the string
test_str = "python"
pattern = re.compile(r".+n$")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python']

# \ escapes a special character, or introduces a special sequence
test_str = "how are you ?"
pattern = re.compile(r"\?")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['?']

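# () groups the enclosed pattern, so (123) is matched as a single unit;
# with one group in the pattern, findall returns that group's text
test_str = "123ABC123ABC"
pattern = re.compile(r"(123)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123', '123']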

# | means alternation: x|y matches x or y
test_str = "AAABBBCCCDDD"
pattern = re.compile(r"A+|C+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['AAA', 'CCC']
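
Two rows of the table are easy to misread: the newline exception for `.` and the two roles of `^`. The sketch below is only an illustration with made-up sample strings (they are not from the examples above); it also uses the standard `re.DOTALL` flag, which the table does not cover, to show what changes when the newline exception is lifted.

```python
import re

# "." stops at a newline unless the re.DOTALL flag is passed.
text = "ab\ncd"
dot_default = re.findall(r".+", text)
dot_all = re.findall(r".+", text, flags=re.DOTALL)
print(dot_default)  # ['ab', 'cd']
print(dot_all)      # ['ab\ncd']

# Outside brackets, "^" anchors the match at the start of the string.
anchored = re.findall(r"^hello", "hello python hello jython")
print(anchored)  # ['hello'] (only the first "hello" is at the start)

# As the first character inside brackets, "^" negates the set.
negated = re.findall(r"[^0-2]", "0123456")
print(negated)  # ['3', '4', '5', '6']
```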

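Escape sequences

Sequence	Purpose
\d	Matches a digit; equivalent to [0-9] for ASCII text
\D	Matches a non-digit; equivalent to [^0-9] for ASCII text
\w	Matches a digit, letter, or underscore; equivalent to [0-9a-zA-Z_] for ASCII text
\W	Matches anything other than a digit, letter, or underscore; equivalent to [^0-9a-zA-Z_] for ASCII text
\s	Matches any whitespace character, such as [ \f\n\r\t\v]
\S	Matches any non-whitespace character, such as [^ \f\n\r\t\v]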

The following code demonstrates each of them:

import re

# \d matches a digit; equivalent to [0-9] for ASCII text
test_str = "123ABC123ABC"
pattern = re.compile(r"\d+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123', '123']

# \D matches a non-digit; equivalent to [^0-9] for ASCII text
test_str = "123ABC123ABC"
pattern = re.compile(r"\D+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ABC', 'ABC']

# \w matches a digit, letter, or underscore; equivalent to [0-9a-zA-Z_] for ASCII text
test_str = "123ABC——123ABC"
pattern = re.compile(r"\w+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123ABC', '123ABC']

# \W matches anything other than a digit, letter, or underscore; equivalent to [^0-9a-zA-Z_] for ASCII text
test_str = "123ABC——123ABC"
pattern = re.compile(r"\W+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: a single match, the run of two dash characters in the middle

# \s matches any whitespace character, such as [ \f\n\r\t\v]
test_str = "123ABC\n123ABC"
pattern = re.compile(r"\s+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['\n']

# \S matches any non-whitespace character, such as [^ \f\n\r\t\v]
test_str = "123ABC\n123ABC"
pattern = re.compile(r"\S+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123ABC', '123ABC']
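
The bracket equivalents in the table hold for ASCII text only. As a rough sketch with a made-up sample string, the code below shows that `\w` includes the underscore, that on a Python 3 `str` it also accepts non-ASCII letters unless the standard `re.ASCII` flag is passed, and that `\s` covers the vertical tab as well.

```python
import re

# Hypothetical sample; "\u00e9" is the letter e with an acute accent.
sample = "user_name-42 caf\u00e9"

# By default \w matches the underscore and Unicode letters.
unicode_words = re.findall(r"\w+", sample)
# With re.ASCII, \w is limited to [0-9a-zA-Z_].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)
print(unicode_words)  # ['user_name', '42', 'caf\u00e9'] printed with the accented letter
print(ascii_words)    # ['user_name', '42', 'caf']

# \s matches tab and vertical tab (\x0b) among other whitespace.
whitespace = re.findall(r"\s", "a\tb\x0bc")
print(whitespace)  # ['\t', '\x0b']
```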

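Lookahead: exp1(?=exp2) matches exp1 only where exp2 follows it; exp2 is not part of the result.
Lookbehind: (?<=exp2)exp1 matches exp1 only where exp2 precedes it; exp2 is not part of the result.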

import re

# Lookahead:  exp1(?=exp2)   matches exp1 only where exp2 follows it
# Lookbehind: (?<=exp2)exp1  matches exp1 only where exp2 precedes it
test_str = "<body><h1><p>hello python</p></h1></body>"
# pattern = re.compile(r"<p>[\w ]+</p>")  # Result: ['<p>hello python</p>']
# pattern = re.compile(r"<p>[\w ]+(?=</p>)")  # Result: ['<p>hello python']
# pattern = re.compile(r"(?<=<p>)[\w ]+</p>")  # Result: ['hello python</p>']
pattern = re.compile(r"(?<=<p>).+(?=</p>)")  # Result: ['hello python']
matcher = pattern.findall(test_str)
print(matcher)
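
Lookarounds are not limited to HTML tags. Here is a minimal sketch on two made-up strings (the units and prices are illustrative, not from the article): each assertion checks the neighbouring text without adding it to the match.

```python
import re

# Lookahead: take digits only when "px" follows; "px" is not returned.
widths = re.findall(r"\d+(?=px)", "10px 20em 30px")
print(widths)  # ['10', '30']

# Lookbehind: take digits only when "$" comes right before them.
prices = re.findall(r"(?<=\$)\d+", "cost $30 and $45, code 99")
print(prices)  # ['30', '45']
```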

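Greedy mode: a quantifier matches as much text as possible, so the result can extend further than intended.
Lazy mode: adding ? after a quantifier makes it match as little text as possible, which keeps the result tight.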

import re

# Greedy mode: a quantifier matches as much text as possible
# Lazy mode: adding ? after a quantifier makes it match as little as possible

# Quantifiers are greedy by default
test_str = "<body><h1><p>hello python</p></h1></body></p></h1></body></p></h1></body>"
pattern = re.compile(r"(?<=<p>).+(?=</p>)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['hello python</p></h1></body></p></h1></body>']

# Lazy mode: .+? stops at the first </p>
test_str = "<body><h1><p>hello python</p></h1></body></p></h1></body></p></h1></body>"
pattern = re.compile(r"(?<=<p>).+?(?=</p>)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['hello python']
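
The same difference shows up on a much shorter input. In this small sketch (the string is made up for illustration), the greedy pattern runs to the last `>` while the lazy one stops at the first.

```python
import re

html = "<a><b>"

# Greedy: .+ grabs as much as it can, so both tags end up in one match.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<a><b>']

# Lazy: .+? grabs as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<a>', '<b>']
```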

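Capturing group: () groups a sub-pattern and captures its text; findall returns the captured text.
Non-capturing group: (?:) groups a sub-pattern without capturing it; findall returns the whole match.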

import re

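# Capturing group: () groups a sub-pattern and captures its text
# Non-capturing group: (?:) groups a sub-pattern without capturing it
# The dot before "com" is escaped so that it matches a literal "."
test_str = "123@qq.com123@163.com123@126.com"
pattern = re.compile(r"\w+@(qq|163|126)\.com")  # Capturing group, result: ['qq', '163', '126']
# pattern = re.compile(r"\w+@(?:qq|163|126)\.com")  # Non-capturing group, result: ['123@qq.com', '123@163.com', '123@126.com']
matcher = pattern.findall(test_str)
print(matcher)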
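
When a pattern has more than one capturing group, `findall` returns a tuple per match, and a match object from `re.search` exposes both the full match and each group. The sketch below uses made-up addresses (the names and domains are illustrative) to show both behaviours.

```python
import re

# Hypothetical sample addresses, used only for illustration.
text = "alice@example.com bob@test.com"

# Two capturing groups: findall returns one (user, domain) tuple per match.
pairs = re.findall(r"(\w+)@(\w+)\.com", text)
print(pairs)  # [('alice', 'example'), ('bob', 'test')]

# One capturing group plus one non-capturing group: only the user is captured.
match = re.search(r"(\w+)@(?:example|test)\.com", text)
whole = match.group(0)  # the entire matched text
user = match.group(1)   # the text of the first capturing group
print(whole)  # alice@example.com
print(user)   # alice
```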