Regular expressions in Python with the re module: a detailed guide


weixin_39860732


Article link: https://blog.csdn.net/weixin_39860732/article/details/113652367


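Regular expressions are a powerful tool for working with strings. They have their own syntax, and with them, searching, replacing, and validating strings becomes straightforward.

For web crawlers in particular, regular expressions make it easy to extract the information you want from HTML.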

First, here are the commonly used matching rules:

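\w: matches a letter, digit, or underscore

\W: matches any character that is not a letter, digit, or underscore

\s: matches any whitespace character; in ASCII terms, [ \t\n\r\f\v]

\S: matches any non-whitespace character

\d: matches any digit; in ASCII terms, [0-9] (by default, str patterns also match other Unicode decimal digits)

\D: matches any non-digit character

\A: matches the start of the string

\Z: matches the end of the string. In Perl-style engines \Z also matches just before a final newline, but in Python's re it matches only at the very end of the string

\z: in Perl-style engines, matches only at the absolute end of the string, after any final newline; older versions of Python's re reject it, and Python 3.14 added it as a synonym for \Z

\G: in Perl-style engines, matches the position where the previous match ended; Python's re does not support it

\n: matches a newline character

\t: matches a tab character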

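^: matches the start of the string; with the re.M flag, also the start of every line

$: matches the end of the string; with the re.M flag, also the end of every line

.: matches any character except a newline (with the re.S flag, it matches newlines too)

[...]: a set of characters listed individually; for example, [amk] matches a, m, or k

[^...]: any character not listed in the brackets; for example, [^abc] matches any character other than a, b, or c

*: matches 0 or more repetitions of the preceding expression

+: matches 1 or more repetitions of the preceding expression

?: matches 0 or 1 repetition of the preceding expression; placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy

{n}: matches exactly n repetitions of the preceding expression

{n,m}: matches n to m repetitions of the preceding expression, greedily

a|b: matches a or b

(): matches the expression inside the parentheses and also defines a group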
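The rules above are easiest to remember by trying them. The following is a minimal sketch; the sample strings are made up for illustration, and each comment notes which rule the line exercises.

```python
import re

text = "Order 66 shipped\tat 09:30, item_id=A7"

digits = re.findall(r"\d+", text)    # \d+: runs of digits
words = re.findall(r"\w+", text)     # \w+: runs of letters, digits, underscore
spaces = re.findall(r"\s", text)     # \s: spaces and the tab both count

vowels = re.findall(r"[aeiou]", "regex")                        # [...]: character set
optional = re.findall(r"colou?r", "color colour")               # ?: zero or one 'u'
runs = re.findall(r"a{2,3}", "aaaaaaa")                         # {n,m}: greedy, takes 3 when it can
pets = re.findall(r"(cat|dog)s?", "cats and dogs and a dog")    # | inside a group

greedy = re.match(r"<.*>", "<b>x</b>").group()   # .* runs as far as possible
lazy = re.match(r"<.*?>", "<b>x</b>").group()    # .*? stops at the first '>'

line_starts = re.findall(r"^\w+", "first line\nsecond line", re.M)  # ^ at every line with re.M
dot_plain = re.match(r"a.b", "a\nb")             # . does not match a newline
dot_all = re.match(r"a.b", "a\nb", re.S)         # ... unless re.S is given

end_strict = re.search(r"\Ademo\Z", "demo\n")    # Python's \Z: only the very end
end_dollar = re.search(r"^demo$", "demo\n")      # $ also matches before a final newline

print(digits)        # ['66', '09', '30', '7']
print(words)         # ['Order', '66', 'shipped', 'at', '09', '30', 'item_id', 'A7']
print(spaces)
print(vowels, optional, runs, pets)
print(greedy, lazy)  # <b>x</b> <b>
print(line_starts)   # ['first', 'second']
print(dot_plain, dot_all is not None)      # None True
print(end_strict, end_dollar is not None)  # None True
```

The last pair is worth noting: Python's \Z does not behave like Perl's \Z, so code ported from other regex flavors can behave differently around a trailing newline.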

Python's re module mainly offers five functions: re.match(), re.search(), re.findall(), re.sub(), and re.compile().

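re.match(): tries to match the regular expression at the start of the string; if it matches, it returns a match object, otherwise None

re.search(): scans the whole string and returns a match object for the first location that matches (or None)

re.findall(): returns a list of all non-overlapping matches of the regular expression

re.sub(): replaces the matched text in a string and returns the new string

re.compile(): compiles a regular expression string into a reusable pattern object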
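The five functions differ mainly in where they look and what they return. Here is a minimal side-by-side sketch on a made-up string. Pattern.search() with a start position (index 10 here, chosen arbitrarily) is one small reason to compile a pattern you reuse.

```python
import re

s = "Hello 123 World 456"

# match() only tries position 0, and s starts with a letter
at_start = re.match(r"\d+", s)

# search() scans forward and returns the first match object
first = re.search(r"\d+", s)

# findall() returns plain strings, not match objects
all_numbers = re.findall(r"\d+", s)

# sub() returns a new string; s itself is unchanged
masked = re.sub(r"\d+", "#", s)

# compile() once, then reuse; Pattern.search() also accepts a start index
pat = re.compile(r"\d+")
from_index_10 = pat.search(s, 10)

print(at_start)                     # None
print(first.group(), first.span())  # 123 (6, 9)
print(all_numbers)                  # ['123', '456']
print(masked)                       # Hello # World #
print(from_index_10.group())        # 456
```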

Now let's look at some concrete examples.

Detailed usage of re.match():

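import re

con = 'Hello 123 4567 World_This is a Regex Demo'
print(len(con))  # 41

# \s, \d, \d{4} and \w{10} spelled out piece by piece
result = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{10}', con)
print(result)
print(result.group())  # Hello 123 4567 World_This
print(result.span())   # (0, 25)

# parentheses capture groups, retrieved with group(1), group(2)
result = re.match(r'^Hello\s(\d+)\s(\d+)\sWorld', con)
print(result)
print(result.group())
print(result.group(1), result.group(2))  # 123 4567
print(result.span())   # (0, 20)

# .* matches everything between Hello and Demo
result = re.match(r'^Hello.*Demo', con)
print(result)
print(result.group())
print(result.span())   # (0, 41)

# non-greedy .*? stops as soon as (\d+) can match, so the group gets 123
result = re.match(r'^He.*?(\d+).*Demo', con)
print(result)
print(result.group(1))  # 123

# at the end of a pattern, non-greedy (.*?) matches as little as possible: nothing
con1 = 'http://weibo.com/comment/ltf'
result1 = re.match(r'^http.*?comment/(.*?)', con1)
result2 = re.match(r'^http.*?comment/(.*)', con1)
print(result1.group(1))  # empty string, prints a blank line
print(result2.group(1))  # ltf

# . does not match a newline by default; re.S lets it cross the line break
con2 = '''Hello 123 4567 World_This
is a Regex Demo'''
result = re.match(r'Hell.*?(\d+).*?Demo', con2, re.S)
print(result)
print(result.group(1))  # 123

# special characters such as ( ) and . must be escaped with a backslash
con3 = '(百度)www.baidu.com'
result = re.match(r'\(百度\)www\..*?\..*', con3)
print(result)
print(result.group())  # (百度)www.baidu.com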

The output is as follows:

Usage of re.search(), re.findall(), re.sub(), and re.compile():

The code is as follows:

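import re

# search() scans the whole string, so the match need not start at position 0
con = 'EXO hero Hello 123 4567 World_This is a Regex Demo'
result = re.search(r'Hell.*?Demo', con)
print(result)
print(result.group())  # Hello 123 4567 World_This is a Regex Demo

# hypothetical sample HTML (a song list): each <li> holds a songNum span and a link
html = '''<ul>
<li><span class="songNum ">24.</span><a href="#">Song A</a></li>
<li><span class="songNum ">25.</span><a href="#">Song B</a></li>
</ul>'''
result = re.search(r'<li.*?songNum ">(.*?)</span>.*?>(.*?)</a>', html, re.S)
print(result.group(1))  # 24.
print(result.group(2))  # Song A

# findall() returns every match; with two groups, each item is a tuple
results = re.findall(r'<li.*?songNum ">(.*?)</span>.*?>(.*?)</a>', html, re.S)
print(results)
print(results[0])  # ('24.', 'Song A')

# sub() replaces every run of digits
conte = 'ahfgi123ahfuo358bjhif134'
conten = re.sub(r'\d+', 'afanti', conte)
print(conten)  # ahfgiafantiahfuoafantibjhifafanti

# compile() builds a reusable pattern object; here it strips the HH:MM time
content1 = '2015-9-12 12:00'
content2 = '2016-12-22 13:55'
content3 = '2017-10-1 11:40'
pattern = re.compile(r'\d{2}:\d{2}')
print(pattern)
result1 = re.sub(pattern, '', content1)
result2 = re.sub(pattern, '', content2)
result3 = re.sub(pattern, '', content3)
print(result1, result2, result3)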

Output:

That covers the detailed usage of regular expressions in Python.
