Crawler Locating 2: Regular Expressions <1>

Latest recommended article published on 2023-12-17 21:58:44

chenkan0214


Tags: crawler, python

Original link: https://my.oschina.net/u/3771014/blog/1629983


# -*- coding:utf-8 -*-
import re
# re is Python's built-in module for regular expression support.

# Regular expressions
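string = "hello word"
# 1. Prepare the pattern
pattern = re.compile("hello")
# 2. Use the pattern to look for a matching substring in the larger string
# match() takes two arguments: (1) the pattern, (2) the string to search.
# match() returns a match object on success and None otherwise.
# The match must begin at the very start of the string; if the text
# appears anywhere else, matching fails and None is returned.
res = re.match(pattern, string)
# print(res)
if res:
    # group() with no arguments returns the whole matched text; numbered
    # groups come from parentheses in the pattern passed to compile().
    print(res.group())
else:
    print("No match found")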
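# search() takes two arguments: (1) the pattern, (2) the string to search.
# search() returns a match object on success and None otherwise.
# The match may begin at any position in the string; None is returned
# only if the pattern matches nowhere.

res = re.search(pattern, string)
print(res)
if res:
    print(res.group())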
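string2 = "bacccccsbbafwerewdgfddef"
# . matches any character except a newline; * repeats the preceding item zero or more times.
# By default .* is greedy (it matches as much as possible).
pattern = re.compile("a.*b")
res = re.search(pattern, string2)
print("3", res.group())
# Non-greedy mode (match as little as possible) is the one generally used.
# .*? is the non-greedy form of .*
pattern = re.compile("a.*?b")
res = re.search(pattern, string2)
print("4", res.group())
# if res:
#     print(res.group())
# + repeats the preceding item one or more times; .+? is its non-greedy form.
pattern = re.compile("a.+?b")
res = re.search(pattern, string2)
print("5", res.group())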
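# | means "or": the match succeeds if either alternative matches. At a given
# position the alternatives are tried left to right, so the left one wins
# when both could match.
pattern = re.compile("a.*b|c.*?b")
res = re.search(pattern, string2)
print(res.group())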
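# Output of the script above. On Python 3.7 and later the match object
# prints as <re.Match object; span=(0, 5), match='hello'> instead.
"""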
hello
<_sre.SRE_Match object; span=(0, 5), match='hello'>
hello
3 acccccsbb
4 acccccsb
5 acccccsb
acccccsbb
"""
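
The difference between match() and search() only shows up when the target text is not at the start of the string, which the script above never tests. The sketch below is an illustration, not part of the original post: the sample string "say hello word" and the parenthesised groups are assumptions added to show what group() returns when the pattern defines groups.

```python
import re

# Assumed sample text: "hello" is deliberately NOT at the start.
text = "say hello word"

# Parentheses define two numbered groups inside the pattern.
pattern = re.compile(r"(hel)(lo)")

# match() only succeeds at the start of the string, so this is None.
at_start = pattern.match(text)

# search() scans forward and returns the first occurrence anywhere.
anywhere = pattern.search(text)

print(at_start)            # None
print(anywhere.group())    # whole match: hello
print(anywhere.group(1))   # first group: hel
print(anywhere.groups())   # all groups: ('hel', 'lo')
print(anywhere.span())     # start and end index: (4, 9)
```

Calling pattern.match(text) on the compiled object is equivalent to re.match(pattern, text) as used in the script; either form works.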

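Concept:

A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, are assembled into a "rule string", and that rule string expresses a filtering logic to apply to other strings.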
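
As a rough illustration of a "rule string" acting as a filter in a crawler, the sketch below pulls link targets out of a page fragment. The HTML snippet and the names html and link_rule are hypothetical, made up for this example; a real crawler would apply the rule to downloaded page text.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
html = '<a href="/page/1">first</a><a href="/page/2">second</a>'

# The rule: capture whatever sits between href=" and the next double quote.
# The non-greedy .*? stops at the first closing quote it meets.
link_rule = re.compile(r'href="(.*?)"')

# findall() returns the captured group for every non-overlapping match.
links = link_rule.findall(html)
print(links)  # ['/page/1', '/page/2']

# For contrast, the greedy form runs on to the LAST double quote,
# swallowing both links into a single result.
greedy = re.compile(r'href="(.*)"').findall(html)
print(len(greedy))  # 1
```

The non-greedy rule filters out exactly the two link targets, while the greedy one returns a single over-long string, which is why non-greedy matching is the usual choice for this kind of extraction.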
