Scraping <record> content with BeautifulSoup

crowsfeather1

Category: python. Tags: web crawler

Article link: https://blog.csdn.net/crowsfeather1/article/details/124229946

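I wrote a crawler to scrape titles and content, but when I used bs4 to read the href and text of the a tags, nothing came back. The page source is as follows.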

Solution: use re on the raw markup instead.

import re
from bs4 import BeautifulSoup

# resp (the list-page response) and domain (the site's base URL) are defined earlier
main_page = BeautifulSoup(resp.text, "html.parser")
# audit_div = main_page.find("div", attrs={"class": "ny-list"})
audit_div = main_page.find("div", attrs={"id": "4009681"})
# Named groups capture each link's href and title from the div's raw markup
obj = re.compile(r"<a  href='(?P<href>.*?)' class='bt_link' title='(?P<title>.*?)'>", re.S)
result = obj.finditer(str(audit_div))
child_href_list = []
for it in result:
    child_href = domain + it.group('href')
    title = it.group('title')
    child_href_list.append(child_href)
    print(child_href, title)

The result is as follows.
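
For anyone who wants to try the regex step without a live page, here is a minimal self-contained sketch. The sample markup, the `https://example.com` domain, and the helper name `extract_links` are all assumptions made up for illustration, not taken from the original site. It also swaps in `urljoin` for plain string concatenation, so that relative and absolute hrefs both resolve.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the post: named groups capture the href and the title.
LINK_RE = re.compile(
    r"<a  href='(?P<href>.*?)' class='bt_link' title='(?P<title>.*?)'>",
    re.S,
)


def extract_links(markup, domain):
    """Return (absolute_url, title) pairs for every matching <a> tag."""
    return [
        (urljoin(domain, m.group("href")), m.group("title"))
        for m in LINK_RE.finditer(markup)
    ]


# Hypothetical markup standing in for str(audit_div); not the real page source.
sample = """
<div id='4009681'>
<a  href='/art/2022/1.html' class='bt_link' title='First notice'>First notice</a>
<a  href='/art/2022/2.html' class='bt_link' title='Second notice'>Second notice</a>
</div>
"""

for url, title in extract_links(sample, "https://example.com"):
    print(url, title)
```

The pattern is tied to the exact spacing, quoting, and attribute order of the source markup (note the two spaces after `<a`), so it has to be adjusted to match whatever the target page actually emits.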
