A few simple uses of regular expressions in Python

sin_404

Categories: web crawler, python

Article link: https://blog.csdn.net/sin_404/article/details/106098787

Extracting a date

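import re

reg = r'\d{4}年\d{1,2}月\d{1,2}日'  # raw string so \d reaches the regex engine unchanged
# reg = r'\d{4}年\d{2}月\d{2}日'  # stricter variant: exactly two digits for month and day
string = '2019年10月17日 - 论坛引起强烈反响,中国人民大学中国普惠金融研究院(CAFI)理事会联席主席兼院长贝多广...www.licai18.com/article/ArticleDetail.jsp?d...-快照-理财18'
x = re.search(reg, string)
if x:  # re.search returns None when nothing matches
    print(x.group(0))  # 2019年10月17日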
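To pull every date out of a longer text rather than only the first one, the same pattern can be combined with named groups and re.finditer. This is a minimal sketch, assuming the dates use the same 年/月/日 format as above; the helper name extract_dates and the sample text are hypothetical.

```python
import re
from datetime import date

# Named groups capture year, month and day separately
DATE_RE = re.compile(r'(?P<year>\d{4})年(?P<month>\d{1,2})月(?P<day>\d{1,2})日')

def extract_dates(text):
    """Return every 年/月/日 date in text as a datetime.date (hypothetical helper)."""
    dates = []
    for m in DATE_RE.finditer(text):
        # int() drops leading zeros such as '07'; date() raises ValueError for impossible dates
        dates.append(date(int(m.group('year')), int(m.group('month')), int(m.group('day'))))
    return dates

sample = 'Published 2019年10月17日, updated 2020年3月5日'
print(extract_dates(sample))  # [datetime.date(2019, 10, 17), datetime.date(2020, 3, 5)]
```

Compiling the pattern once with re.compile is optional (the re module caches recent patterns), but it keeps the pattern and its group names in one place.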

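Replacing several characters in one pass, instead of chaining multiple replace() calls. str.replace() only matches literal substrings and does not interpret regular expressions, so use re.sub for this.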

import re

x = 'http://www.baidu.com/li\tnk?url=NDoHHS0eqT\n\rb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjOviLO9wloiUX2zfCg2BJjJttGw5Fvvx1qXkUALc2tGmDdai_cxWVLuKfIsOh_2p_&ck=4212.2.0.0.0.360.563.0&shh=www.baidu.com'
# [\t\n\r] matches a tab, newline or carriage return; every occurrence is replaced with ''
s = re.sub(r'[\t\n\r]', '', x)
print(s)
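For comparison, here is a small sketch of three ways to strip tab, newline and carriage-return characters: chained str.replace calls, a single re.sub with a character class, and str.translate with a deletion table. The function names and the shortened sample URL are illustrative only.

```python
import re

def strip_chained(s):
    # Three separate passes over the string, one per character
    return s.replace('\t', '').replace('\n', '').replace('\r', '')

def strip_regex(s):
    # One call: the character class matches any of the three characters
    return re.sub(r'[\t\n\r]', '', s)

# With a third argument, str.maketrans builds a table that deletes those characters
_DELETE_CHARS = str.maketrans('', '', '\t\n\r')

def strip_translate(s):
    return s.translate(_DELETE_CHARS)

url = 'http://www.baidu.com/li\tnk?url=abc\n\rdef'
print(strip_chained(url))    # http://www.baidu.com/link?url=abcdef
print(strip_regex(url))      # http://www.baidu.com/link?url=abcdef
print(strip_translate(url))  # http://www.baidu.com/link?url=abcdef
```

re.sub pays off when the thing to remove is a pattern rather than a fixed set of characters (for example r'\s+' for any run of whitespace); for a fixed set of single characters, str.translate does the job without the regex engine.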

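The difference between re.findall and re.search: parentheses (capturing groups) behave differently in the two. When the pattern contains a group, re.findall returns only the text the group captured, while re.search returns a match object whose group(0) is the entire matched text, including the parts outside the parentheses; the captured part alone is available as group(1). Without a group, re.findall returns the whole matches.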

import re

x = 'http://www.baidu.com/link?url=NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO&ck=4212.2.0.0.0.360.563.0&shh=www.baidu.com'
s = re.findall('url=(.*?)&', x)  # .*? is non-greedy: stops at the first & (on its own, ? means 0 or 1 of the preceding item)
print(s)  # ['NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO']
s = re.findall('url=(.*)&', x)  # .* is greedy: runs to the last &
print(s)  # ['NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO&ck=4212.2.0.0.0.360.563.0']
s = re.search('url=(.*?)&', x)
print(s.group(0))  # url=NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO&
print(s.group(1))  # NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO
s = re.findall('url=.*?&', x)  # no group: findall returns the whole matches
print(s)  # ['url=NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO&']
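When a pattern has more than one capturing group, re.findall returns a list of tuples, one tuple per match. The sketch below uses that to pull every key=value pair out of the same query string, and compares it with urllib.parse.parse_qs from the standard library, which is usually the more robust choice for real URLs because it also decodes percent-escapes. The pattern assumes that keys and values never contain a literal & or =.

```python
import re
from urllib.parse import urlsplit, parse_qs

x = 'http://www.baidu.com/link?url=NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO&ck=4212.2.0.0.0.360.563.0&shh=www.baidu.com'

# Two groups, so findall returns (key, value) tuples
pairs = re.findall(r'[?&]([^=&]+)=([^&]*)', x)
print(pairs)
# [('url', 'NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO'), ('ck', '4212.2.0.0.0.360.563.0'), ('shh', 'www.baidu.com')]

# re.finditer yields match objects, so named groups and positions stay available
for m in re.finditer(r'[?&](?P<key>[^=&]+)=(?P<value>[^&]*)', x):
    print(m.group('key'), m.start())

# Standard-library alternative: parse the query string instead of using a regex
query = parse_qs(urlsplit(x).query)
print(query['url'])  # ['NDoHHS0eqTb5aRbCL8g4LG1KiliQUoEfHseKCjd6fvjO']
```

parse_qs maps each key to a list because a key may appear more than once in a query string.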