A 30-Minute Introduction to Python Regular Expressions

Latest recommended article published on 2021-01-30 13:26:00

缘缘媛

Category: Python. Tags: regular expressions, python

Article link: https://blog.csdn.net/lb_fly0505/article/details/104946335

Use cases:

	*  Matching characters
	*  Web crawlers
	*  Matching phone numbers and email addresses

Using regular expressions in Python

Using the re module

The re module is Python's standard-library module for regular expressions.

import re
# result = re.match(<pattern>, <string to match>)
# re.match only matches at the start of the string, so this call returns None.
result = re.match('python', 'ipython is best')
if not result:
    print("no")
else:
    print(result.group(), result.groupdict(), result.start())
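
Because re.match returns None when nothing matches at the start of the string, calling .group() on the result directly raises AttributeError. The following is a minimal sketch of guarding against that; first_match is a hypothetical helper name used only for illustration, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match.

    Hypothetical helper for illustration only.
    """
    result = re.match(pattern, text)
    # re.match returns None when the pattern does not match at position 0,
    # so check the result before calling .group().
    return result.group() if result else None

print(first_match('python', 'ipython is best'))   # None: 'python' is not at the start
print(first_match('ipython', 'ipython is best'))  # ipython
```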

Matching a single character

A regular expression can match a single character: a digit 0-9, a letter a-z or A-Z, or a special character.

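# . matches any single character (except \n)
print(re.match('.', "C").group())
print(re.match('t.o', 'two').group())
# [ ] matches any one of the characters listed inside the brackets
print(re.match('[hH]', 'Hello python').group())
print(re.match('[0123456789]Hello', '9Hello').group(), re.match('[0-9]Hello', '9Hello').group(), end=' ')
print('\n', re.match('[0-9a-zA-Z]', '0Aasddsa').group())
# \d matches a digit, i.e. 0-9 (for str patterns it also matches other Unicode decimal digits unless re.ASCII is set)
# Use a raw string so that Python does not treat \d as a string escape.
print(re.match(r'\dHello', '9Hello').group())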
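
The note that '.' does not match \n is easy to overlook. As a small illustration, the sketch below shows the default behavior next to the re.DOTALL flag, which makes '.' match newlines as well; the sample string is made up for this example.

```python
import re

# By default, '.' does not match a newline.
print(re.match('a.b', 'a\nb'))  # None

# With the re.DOTALL flag, '.' matches any character, including '\n'.
dotall_result = re.match('a.b', 'a\nb', re.DOTALL)
print(repr(dotall_result.group()))  # 'a\nb'
```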

Matching multiple characters

Quantifiers match a repeated character: *, +, ?, {m}, {m,n}

# * matches the preceding character 0 or more times, i.e. it is optional
print(re.match("[a-z]*", "hehe").group())
print(re.match('[A-Z][a-z]*', 'MnnnM').group())
# + matches the preceding character 1 or more times, i.e. at least once
print(re.match('[a-zA-Z_]+', 'hh').group())
# ? matches the preceding character 0 or 1 times, i.e. it appears once or not at all
print(re.match("[1-9]?[0-9]", "7").group(), re.match(r'[0-9]?\d', '999').group())
# {m} matches the preceding character exactly m times
print(re.match("[a-zA-Z0-9_]{6}", "12a3g45678").group())
# {m,n} matches the preceding character between m and n times
print(re.match("[a-zA-Z0-9_]{2,9}", "12a3g45678").group())
# Match a 163 email address with 4 to 20 characters before the @, e.g. hello@163.com
print(re.match(r'[0-9A-Za-z]{4,20}@163\.com', 'hello@163.com').group())  # to match a literal '.', escape it as \.
print(re.match(r'[0-9A-Za-z]{4,20}@163\.com', 'hedasdsadasdllo@163.com').group())
print(re.match(r'[\w]{4,20}@163\.com', 'hedasdsadasdllo@163.com').group())
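
To see the {m,n} bounds actually reject input, the sketch below reuses the 163 email pattern from above with local parts that are too short or too long. The sample addresses are made up for illustration.

```python
import re

pattern = r'[0-9A-Za-z]{4,20}@163\.com'

# Only 3 characters before '@': fewer than the minimum of 4.
print(re.match(pattern, 'abc@163.com'))            # None

# Exactly 4 characters: the minimum is satisfied.
print(re.match(pattern, 'abcd@163.com').group())   # abcd@163.com

# 21 characters before '@': re.match starts at position 0 and
# cannot reach '@' within 20 characters, so there is no match.
print(re.match(pattern, 'a' * 21 + '@163.com'))    # None
```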

Matching the start and end

^: matches the start of the string
$: matches the end of the string

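# re.match already anchors at the start, so ^ is optional here; $ is what rejects trailing characters.
print(re.match(r'[\w]{4,20}@163\.com$', 'hedasdsadasdllo@163.com').group())
print(re.match(r'^[\w]{4,20}@163\.com$', 'hedasdsadasdllo@163.com').group())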
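
The use cases at the top mention phone numbers, which make a handy test for the end anchor. The sketch below assumes a simplified format that is not from the original article: 11 digits, first digit 1, second digit 3-9. Real numbering plans are more detailed, so treat the pattern as illustrative only. It also shows re.fullmatch, which requires the whole string to match.

```python
import re

# Assumed format (not from the original article): 11 digits, first digit 1,
# second digit 3-9. Real numbering plans are more detailed.
PHONE = r'1[3-9]\d{9}'

too_long = '138123456789'  # 12 digits

# Without '$', re.match accepts the first 11 digits and ignores the rest.
print(re.match(PHONE, too_long).group())           # 13812345678

# Adding '$' rejects the trailing digit.
print(re.match(PHONE + '$', too_long))             # None

# re.fullmatch requires the whole string to match, with no explicit anchors.
print(re.fullmatch(PHONE, '13812345678').group())  # 13812345678
print(re.fullmatch(PHONE, too_long))               # None
```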

Matching groups

# | matches either the expression on its left or the one on its right, similar to "or"
print(re.match(r"[1-9]?\d$|100", "100").group())
print(re.match(r"[1-9]?\d$|100", "99").group())
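
One caveat with the 0-100 pattern above: | has the lowest precedence, so the $ belongs only to the left branch, and the right branch 100 will also accept a longer string such as "1000". A minimal sketch of the difference, using parentheses to apply $ to both branches (the names loose and grouped are chosen here for illustration):

```python
import re

loose = r"[1-9]?\d$|100"      # '$' belongs to the left branch only
grouped = r"([1-9]?\d|100)$"  # '$' applies to both branches

# The unanchored right branch matches the first three characters of "1000".
print(re.match(loose, "1000").group())    # 100

# With grouping, "1000" is rejected while 0-100 still match.
print(re.match(grouped, "1000"))          # None
print(re.match(grouped, "100").group())   # 100
print(re.match(grouped, "7").group())     # 7
```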

# (ab) treats the characters inside the parentheses as a group
print(re.match(r"[\w]{4,20}@(163|126|qq)\.com$", 'test@qq.com').group())
print(re.match(r"[\w]{4,20}@(163|126|qq)\.com$", 'test@163.com').group())
print(re.match(r"[\w]{4,20}@(163|126|qq)\.com$", 'test@126.com').group())
# group(1) and group(2) return the text captured by the first and second group
print(re.match(r"([\w]{4,20})@(163|126|qq)\.com$", 'test@126.com').group(1))
print(re.match(r"([\w]{4,20})@(163|126|qq)\.com$", 'test@126.com').group(2))

# \num refers back to the text matched by group number num
html1 = '<h1>sdasdas</h1>'
html2 = '<h1>sdasdas</h2>'
print(re.match(r'<(\w+)>.*</(\w+)>', html1).group())
print(re.match(r'<(\w*)>.*</\1>', html1).group())  # raw string: backslashes reach the regex engine unchanged
print(re.match('<(\\w*)>.*</\\1>', html1).group())  # without a raw string, each backslash must be doubled
html3 = '<body><h1>sdasdas</h1></body>'
print(re.match(r"<(\w*)><(\w*)>.*</\2></\1>", html3).group())
html4 = '<body><h1>sdasdas</h1></body>'
# (?P<name>...)  gives the group a name
# (?P=name)      refers back to the text matched by the group called name
print(re.match(r"<(?P<p1>\w*)><(?P<p2>\w*)>.*</(?P=p2)></(?P=p1)>", html4).group())
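
The html2 string defined above has mismatched tags, which is exactly the case a backreference is meant to catch. As a small sketch, the following compares two independent groups with a \1 backreference; the variable names loose and strict are chosen here for illustration.

```python
import re

html1 = '<h1>sdasdas</h1>'
html2 = '<h1>sdasdas</h2>'  # closing tag does not match the opening tag

loose = r'<(\w+)>.*</(\w+)>'  # two independent groups
strict = r'<(\w+)>.*</\1>'    # closing tag must repeat what group 1 matched

print(re.match(loose, html2).group())   # <h1>sdasdas</h2>
print(re.match(strict, html1).group())  # <h1>sdasdas</h1>
print(re.match(strict, html2))          # None
```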

Advanced usage of the re module

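# compile: build a reusable pattern object
print(re.compile(r"\d+").match("31231").group())

# search: find the first match anywhere in the string, not only at the start
print(re.search(r"\d+", "阅读次数111").group())
print(re.search(r"\d{4}", 'Microsoft Windows Server 2012 R2 Datacenter').group())

# findall: return all non-overlapping matches as a list of strings
print(re.findall(r"\d+", "python=999, c=7899, Go=999"))
# finditer: iterate over match objects instead of strings
print([i.group() for i in re.finditer(r"\d+", "阅读次数111")])

# sub: replace every match with the given string
print(re.sub(r"\d+", "999", "python=996"))

# split: split the string at each match and return a list
print(re.split(r"\d", "a9a"))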
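
These functions are also available as methods on a compiled pattern, which is convenient when one pattern is used several times. The sketch below reuses a single compiled pattern and also shows two standard options: passing a function as the replacement to sub, and limiting split with maxsplit. The names number and add_one are hypothetical, chosen for this example.

```python
import re

# Compile once, then reuse the pattern object for several operations.
number = re.compile(r"\d+")
text = "python=999, c=7899, Go=999"

print(number.findall(text))   # ['999', '7899', '999']
print(number.sub("0", text))  # python=0, c=0, Go=0

# The replacement may also be a function that receives each match object.
def add_one(match):
    return str(int(match.group()) + 1)

print(number.sub(add_one, "python=996"))  # python=997

# maxsplit limits the number of cuts; the remainder is returned unsplit.
print(number.split("a1b22c333d", maxsplit=2))  # ['a', 'b', 'c333d']
```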

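Greedy and non-greedy

Quantifiers in Python are greedy by default (a few languages may default to non-greedy): they always try to match as many characters as possible.
Non-greedy quantifiers do the opposite and try to match as few characters as possible.
Appending ? to "*", "?", "+", or "{m,n}" turns a greedy quantifier into a non-greedy one.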
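
The difference is easiest to see side by side. The following is a small sketch with a made-up HTML string, showing each greedy quantifier next to its non-greedy form.

```python
import re

text = '<h1>title</h1><h2>subtitle</h2>'

# Greedy: .+ runs to the last '>' in the string.
print(re.match(r'<.+>', text).group())   # <h1>title</h1><h2>subtitle</h2>

# Non-greedy: .+? stops at the first '>'.
print(re.match(r'<.+?>', text).group())  # <h1>

# The same applies to {m,n}: greedy takes up to n, non-greedy takes m.
print(re.match(r'\d{2,4}', '12345').group())   # 1234
print(re.match(r'\d{2,4}?', '12345').group())  # 12
```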
