BeautifulSoup Example

Latest recommended article published 2024-04-15 21:58:50

dwanwan16


Article link: https://blog.csdn.net/wanerding/article/details/104610169


Web scraping: a travel website

Example

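import requests
from bs4 import BeautifulSoup

# Fetch the home page of the travel site
url = 'http://www.cntour.cn/'
strhtml = requests.get(url)
# print(strhtml.text)

# Parse the HTML with the lxml parser
soup = BeautifulSoup(strhtml.text, 'lxml')

# Select the headline links with a CSS selector
data = soup.select("#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li.top > a")
print(data)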

The output is as follows:
[Image: screenshot of the output]
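
To try the same selector without hitting the network, here is a minimal sketch that runs it against a small inline HTML snippet. The markup, headlines, and links below are made up for illustration; they only mimic the nesting that the selector expects and are not the real page. The built-in `html.parser` is used here instead of `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the nesting the selector walks through.
html = """
<div id="main">
  <div>
    <div class="mtop firstMod clearfix">
      <div class="centerBox">
        <ul class="newsList">
          <li class="top"><a href="/news/101.html">First headline</a></li>
          <li class="top"><a href="/news/102.html">Second headline</a></li>
          <li><a href="/news/103.html">Not a top item</a></li>
        </ul>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ">" means "direct child", and "li.top" keeps only <li> elements with class "top".
selector = "#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li.top > a"
links = soup.select(selector)

titles = [a.get_text() for a in links]
hrefs = [a.get("href") for a in links]
print(titles)
print(hrefs)
```

The third link is skipped because its `<li>` has no `top` class, which is why `select` returns only two tags here.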

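# Clean and organize the data
# In a regular expression, \d matches a single digit and \d+ matches one or more digits
import re
for item in data:
    result = {
        'title': item.get_text(),
        'link': item.get('href'),
        # findall returns a list of every digit run found in the link
        'ID': re.findall(r'\d+', item.get('href'))
    }
    print(result)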

The output is as follows:
[Image: screenshot of the output]
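
One thing to keep in mind with the cleaning step: `re.findall` always returns a list, so `ID` is a list of strings and may hold more than one number if the link contains several digit runs. Below is a rough sketch of one way to keep a single ID. The helper name `build_record`, the sample URLs, and the choice of taking the last digit run are all assumptions for illustration, not something the site guarantees.

```python
import re

def build_record(title, href):
    """Build a cleaned record; hypothetical helper, not part of the original code."""
    ids = re.findall(r"\d+", href)  # every run of digits in the link, as strings
    return {
        "title": title.strip(),
        "link": href,
        # Assumption: the last digit run is the article ID; None if there are no digits.
        "ID": ids[-1] if ids else None,
    }

# Made-up links, used only to show the behaviour.
record = build_record("  Sample headline ", "http://example.com/news/2020/4721/")
no_id = build_record("About", "http://example.com/about/")
print(record)
print(no_id)
```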

Differences between match, search, and findall

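# match succeeds only when the pattern matches at the very start of the string
import re
print(re.match("c", "abcde"))   # None: the string does not start with "c"
print(re.match("a", "abcde"))   # a match object: the string starts with "a"
pattern = re.compile('c')
print(pattern.match('abcdef', 2))   # matching starts at index 2, where "c" is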

The output is as follows:
[Image: screenshot of the output]
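
As a small sketch of what those three calls return, the snippet below stores the results and inspects them with `group()` and `span()`. The variable names are only for illustration. Note that the optional start position is available on the compiled pattern's `match` method, not on the module-level `re.match`.

```python
import re

no_match = re.match("c", "abcde")   # "abcde" does not start with "c"
at_start = re.match("a", "abcde")   # "abcde" starts with "a"

pattern = re.compile("c")
from_pos = pattern.match("abcdef", 2)   # begin matching at index 2

print(no_match)                            # None
print(at_start.group(), at_start.span())   # a (0, 1)
print(from_pos.group(), from_pos.span())   # c (2, 3)
```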

# search scans the whole string and returns the first match it finds
print(re.search("c","abcde"))

The output is as follows:
[Image: screenshot of the output]

The findall result is as follows:
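
A minimal sketch that puts the three functions side by side on the same input. The sample string is made up for illustration and this is not the author's original findall example.

```python
import re

text = "id 12, id 345"   # hypothetical sample string

m = re.match(r"\d+", text)          # only tries at the start of the string
s = re.search(r"\d+", text)         # first match anywhere in the string
all_numbers = re.findall(r"\d+", text)   # every non-overlapping match, as a list of strings

print(m)                      # None
print(s.group(), s.span())    # 12 (3, 5)
print(all_numbers)            # ['12', '345']
```

In short, `match` and `search` return a single match object (or `None`), while `findall` returns a plain list, which is why the `ID` field in the scraper above is a list.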
