Python: Regular Expressions

Kiraxqc

Last modified 2022-07-27 10:47:01

Tags: regular expressions, python

First published 2022-07-27 10:13:27

Article link: https://blog.csdn.net/Kiraxqc/article/details/125989415

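This article introduces the concepts and usage of regular expressions in Python, including importing the re module, matching single characters, matching multiple characters, matching the start and end of a string, and matching groups. It walks through each kind of regular expression with example code and applies them to concrete cases such as matching email addresses, QQ numbers, and HTML tags.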

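I. Regular Expression Concepts

A regular expression is code that describes a rule (pattern) for matching text.

\d: a digit, 0-9

{2}: the preceding item exactly 2 times, so \d{2} matches 2 digits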

II. The re Module

1. Import the re module

import re

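2. Match data with match

match(pattern, string to match)

result = re.match(pattern, string to match)

re.match only looks for a match at the start of the string; it returns a match object on success and None otherwise.

3. Extract data with group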

result.group()
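
The three steps above can be put together in a minimal sketch. The sample strings and the variable names matched and missing are illustrative assumptions, not part of the original notes. Because re.match returns None when the start of the string does not fit the pattern, the result is checked before group() is called.

```python
import re

# re.match tries to match the pattern at the START of the string only.
result = re.match(r"\d{2}", "42abc")   # two digits at the start
matched = result.group() if result else None
print(matched)

# The digits are not at the start here, so re.match returns None
# instead of a match object.
missing = re.match(r"\d{2}", "abc42")
print(missing)
```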

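III. Matching a Single Character

. : matches any single character (except \n)

[ ] : matches any one of the characters listed inside [ ]

\d : matches a digit, 0-9

\D : matches any character that is not a digit

\s : matches whitespace, such as a space, a tab, or a newline

\S : matches any non-whitespace character

\w : matches a non-special character (a letter, digit, or underscore; for str patterns in Python 3 this also covers Chinese characters)

\W : matches a special character (anything that is not a letter, digit, underscore, or Chinese character)

\t : the tab character

Example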

import re

# itc[ca123]: the string matches if it starts with itcc, itca, itc1, itc2, or itc3
result = re.match("itc[ca123]", "itca")  # (pattern, string to match)

info = result.group()
print(info)

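Code

1) result = re.match("itc[0-9]", "itc3") # any digit from 0 to 9

2) result = re.match(r"itc\d", "itc1") # any digit

\d matches exactly one character; to match a two-digit number, write \d\d. Writing the pattern as a raw string (r"...") passes the backslash to re unchanged and avoids Python's invalid escape sequence warning for "\d".

3) result = re.match(r"itc\D", "itc*") # any non-digit

4) result = re.match(r"itc\s", "itc\t") # whitespace (\t is the tab character)

5) result = re.match(r"itc\w", "itcy") # a non-special character (digit, Chinese character, letter, or underscore)

6) result = re.match(r"itc\W", "itc!") # a special character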
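
To run the single-character examples in one go, a small loop like the following could be used. It is only a sketch: the list name cases and the result list matches are hypothetical, and the patterns are written as raw strings so the backslashes reach re unchanged.

```python
import re

# (pattern, text) pairs taken from the numbered examples above.
cases = [
    (r"itc[0-9]", "itc3"),
    (r"itc\d", "itc1"),
    (r"itc\D", "itc*"),
    (r"itc\s", "itc\t"),   # the text really contains a tab character
    (r"itc\w", "itcy"),
    (r"itc\W", "itc!"),
]

matches = []
for pattern, text in cases:
    result = re.match(pattern, text)
    # Store the matched text, or None when the pattern does not fit.
    matches.append(result.group() if result else None)
    print(pattern, repr(text), "->", repr(matches[-1]))
```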

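IV. Matching Multiple Characters

* : the preceding character appears 0 or more times

+ : the preceding character appears 1 or more times

? : the preceding character appears 0 or 1 times (it is optional)

{m} : the preceding character appears exactly m times

{m,n} : the preceding character appears m to n times

Example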

import re
result = re.match("itc*", "itccc")   # c may repeat any number of times

# Check the result first: calling group() on None raises AttributeError
if result:
    info = result.group()
    print(info)
else:
    print("no match")

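Code

1) result = re.match("it1*", "it1111") # zero or more 1s after it; only the character 1 is repeated

2) result = re.match(r"itc\d*", "itc1111") # zero or more digits

3) result = re.match(r"itc\d+", "itc1111") # at least one digit; itca does not match

4) result = re.match(r"itc\d?", "itc1") # zero or one digit; for itc12 only itc1 is matched

5) result = re.match(r"itc\d{2}", "itc16") # exactly 2 digits required; itc and itc3 do not match

6) result = re.match(r"itc\d{2,5}", "itc16") # at least 2 and at most 5 digits; itc1 does not match, and for itc123112 only itc12311 is matched

7) result = re.match(r"itc\d{2,}", "itc16") # 2 or more digits

re.match does not require the pattern to reach the end of the string, so extra trailing characters are simply left out of the match.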
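
The quantifier examples say which strings should be rejected, but re.match only anchors the start of the string, so a longer string can still match on its prefix. The sketch below contrasts re.match with re.fullmatch, which requires the whole string to fit; the helper name fits is a hypothetical one introduced for illustration.

```python
import re

# re.match only anchors at the start, so a shorter prefix can still match.
prefix = re.match(r"itc\d?", "itc12")
print(prefix.group())   # the match stops after one digit

# re.fullmatch requires the entire string to fit the pattern.
whole = re.fullmatch(r"itc\d?", "itc12")
print(whole)

def fits(pattern, text):
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r"itc\d{2,5}", "itc16"))       # 2 digits: within 2 to 5
print(fits(r"itc\d{2,5}", "itc1"))        # 1 digit: too few
print(fits(r"itc\d{2,5}", "itc123112"))   # 6 digits: too many
```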

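V. Matching the Start and End

^ : matches the start of the string

[^chars] : matches any character except the listed ones; [^4] matches any character other than 4

$ : matches the end of the string

Example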

import re
result = re.match(r"^\dit1", "1it133")  # must start with a digit; only 1it1 is matched

# Check the result first: calling group() on None raises AttributeError
if result:
    info = result.group()
    print(info)
else:
    print("no match")

Code

1) result = re.match(r"^\d.*", "1it133") # ^\d.*: starts with a digit, followed by any characters any number of times

2) result = re.match(r"^\d.*\d$", "1it133") # \d$: ends with a digit

3) result = re.match(r"^\d.*[^4]$", "1it133") # [^4]$: the last character is not 4
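
The two anchored patterns can be wrapped in small checks to see which strings pass. This is a sketch only: the function names and the extra sample strings are assumptions made for illustration.

```python
import re

def starts_and_ends_with_digit(text):
    """True when text starts with a digit and ends with a digit."""
    return re.match(r"^\d.*\d$", text) is not None

def ends_without_4(text):
    """True when text starts with a digit and its last character is not 4."""
    return re.match(r"^\d.*[^4]$", text) is not None

for sample in ["1it133", "1it13a", "1it134"]:
    print(sample, starts_and_ends_with_digit(sample), ends_without_4(sample))
```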

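VI. Matching Groups

| : matches either the expression on its left or the one on its right

(ab) : treats the characters in the parentheses as one group

\num : refers back to the string matched by group number num

(?P<name>) : gives the group an alias (a named group)

(?P=name) : refers back to the string matched by the group named name

Example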

import re

fruit = ["apple", "banana", "orange", "pear"]
for value in fruit:
    result = re.match("apple|pear", value)
    # Check whether the match succeeded
    if result:
        info = result.group()
        print("fruit:", value)
    else:
        print("I don't want to eat this")

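Code

1) Match email addresses from 163, 126, qq, and so on

result = re.match(r"[a-zA-Z0-9_]{4,20}@(163|126|qq)\.com", "hello@qq.com") # the part before @ needs 4 to 20 characters, so a shorter name such as he would not match

\. : the backslash is the escape character, so . no longer matches any character, only a literal .

[a-zA-Z0-9_] : letters, digits, and underscore

{4,20} : 4 to 20 characters

(163|126|qq) : matches 163, 126, or qq, treated as a single unit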
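
As a rough sketch of how this pattern might be reused, the check can be wrapped in a function. The names EMAIL_PATTERN and check_email are hypothetical, and using fullmatch (so that nothing may follow .com) is an assumption that goes slightly beyond the re.match call in the example.

```python
import re

# Same pattern as in the example: 4 to 20 name characters, then one of three providers.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_]{4,20}@(163|126|qq)\.com")

def check_email(address):
    """Return the provider (163, 126, or qq) if the address fits, else None."""
    result = EMAIL_PATTERN.fullmatch(address)
    return result.group(1) if result else None

for address in ["hello@qq.com", "hello@163.com", "he@qq.com", "hello@qqxcom"]:
    print(address, "->", check_email(address))
```

The name he is rejected because it has fewer than 4 characters, and hello@qqxcom is rejected because the escaped \. only accepts a literal dot.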

2) Match qq:10223

result = re.match(r"qq:[1-9]\d{4,11}", "qq:10223")

[1-9] : the first digit, which cannot be 0

\d : a digit

{4,11} : the preceding \d repeats 4 to 11 times

3) Match qq:10223 and extract both the text qq and the QQ number

result = re.match(r"(qq):([1-9]\d{4,11})", "qq:10223")
if result:
    # group(0) is the whole match, group(1) is the first group, group(2) is the second group
    info = result.group(2)  # 10223
    print(info)
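
To make the group numbering concrete, the sketch below prints every group of one match. It assumes that a QQ number is a digit from 1 to 9 followed by 4 to 11 more digits; the variable names are illustrative.

```python
import re

# Assumption: a QQ number is [1-9] followed by 4 to 11 further digits.
result = re.match(r"(qq):([1-9]\d{4,11})", "qq:10223")
if result:
    whole = result.group(0)    # the entire match
    label = result.group(1)    # first parenthesized group
    number = result.group(2)   # second parenthesized group
    both = result.groups()     # all groups as a tuple
    print(whole, label, number, both)
```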

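4) Match <html>hh</html>

result = re.match("<[a-zA-Z1-6]{4}>.*</[a-zA-Z1-6]{4}>", "<html>hh</html>")

.* : any characters, any number of times

[a-zA-Z1-6]{4} : 4 characters, each a letter (a-z or A-Z) or a digit from 1 to 6

result = re.match("<([a-zA-Z1-6]{4})>.*</\\1>", "<html>hh</html>")

\1 : in a regular expression, refers back to the first parenthesized group, so the closing tag must repeat the opening tag. Inside an ordinary Python string, however, "\1" is read as an escape for the character with code 1

\\1 : escapes the backslash so that the regular expression receives the two characters \1 (the raw string r"\1" has the same effect)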
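
The difference between "\1", "\\1", and a raw string can be checked directly. The following is a small sketch; the second sample string with a mismatched closing tag is an assumption added to show the back-reference rejecting it.

```python
import re

# In an ordinary string literal "\1" is an octal escape: ONE character with code 1.
# "\\1" and r"\1" are both TWO characters: a backslash followed by 1.
print(len("\1"), len("\\1"), len(r"\1"))
same = "\\1" == r"\1"

# With the back-reference, the closing tag must repeat the opening tag.
pattern = r"<([a-zA-Z1-6]{4})>.*</\1>"
ok = re.match(pattern, "<html>hh</html>")
bad = re.match(pattern, "<html>hh</body>")
print(ok.group(1) if ok else None, bad)
```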

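5) Match <html><h1>hh</h1></html>

result = re.match("<([a-zA-Z1-6]{4})><([a-zA-Z1-6]{2})>.*</\\2></\\1>", "<html><h1>hh</h1></html>")

([a-zA-Z1-6]{2}) : matches h1

</\\2> : refers to the second parenthesized group, so it matches </h1>

</\\1> : refers to the first parenthesized group, so it matches </html>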

6) Match <html><h1>www.123.com</h1></html>

result = re.match("<(?P<name1>[a-zA-Z1-6]{4})><(?P<name2>[a-zA-Z1-6]{2})>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.123.com</h1></html>")

(?P<name2>...) : names the group name2

(?P=name2) : matches the same text that the group name2 matched
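
Named groups can also be read back by name after a match. In this sketch the extra group body is a hypothetical addition (it is not in the original pattern) that captures the text between the tags.

```python
import re

pattern = (
    r"<(?P<name1>[a-zA-Z1-6]{4})><(?P<name2>[a-zA-Z1-6]{2})>"
    r"(?P<body>.*)"                      # hypothetical extra group for the inner text
    r"</(?P=name2)></(?P=name1)>"
)
result = re.match(pattern, "<html><h1>www.123.com</h1></html>")
if result:
    print(result.group("name1"))   # text captured by the group name1
    print(result.group("name2"))   # text captured by the group name2
    print(result.groupdict())      # every named group as a dict
```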