Python crawler example: scraping the href attribute and text of the a tags inside a website's h2 headings

Latest recommended article published on 2024-07-30 17:23:09

八饱粥



Category: Python学习笔记 (Python Study Notes). Tags: python, crawler, programming language

Article link: https://blog.csdn.net/qq_64016761/article/details/128025966


Scrape the href attribute and the text of the a tag inside each h2 heading on http://www.crazyant.net/

The workflow has four steps: specify the URL, download the content at that URL, parse the content, and extract the data.

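import requests
from bs4 import BeautifulSoup

# Step 1: specify the URL
url = "http://www.crazyant.net/"

# Step 2: download the page (the 10-second timeout is an added safeguard)
r = requests.get(url, timeout=10)
if r.status_code != 200:
    raise Exception(f"Request failed with status code {r.status_code}")
html_doc = r.text

# Step 3: parse the HTML with the built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# Step 4: extract the link inside each <h2 class="entry-title">
h2_nodes = soup.find_all("h2", class_="entry-title")
for h2_node in h2_nodes:
    link = h2_node.find("a")
    if link is None:  # skip headings that contain no link
        continue
    print(link.get("href"), link.get_text())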
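
The parse and extract steps can also be tried without touching the network. The sketch below wraps the same find_all / find logic in a function and runs it on a small HTML string. The function name extract_title_links and the sample markup (including the example.com URLs) are made up for illustration and are not taken from the real site; it only assumes that titles sit in h2 tags with the class entry-title, as in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example (not copied from the real site).
SAMPLE_HTML = """
<html><body>
  <h2 class="entry-title"><a href="http://example.com/post-1">First post</a></h2>
  <h2 class="entry-title"><a href="http://example.com/post-2">Second post</a></h2>
  <h2 class="entry-title">Heading without a link</h2>
  <h2 class="other"><a href="http://example.com/ignored">Ignored</a></h2>
</body></html>
"""


def extract_title_links(html_doc):
    """Return (href, text) pairs for the first <a> in each h2.entry-title."""
    soup = BeautifulSoup(html_doc, "html.parser")
    results = []
    for h2_node in soup.find_all("h2", class_="entry-title"):
        link = h2_node.find("a")
        if link is None:
            # A heading with no link has nothing to extract; skip it.
            continue
        results.append((link.get("href"), link.get_text(strip=True)))
    return results


if __name__ == "__main__":
    for href, text in extract_title_links(SAMPLE_HTML):
        print(href, text)
```

Returning a list of tuples instead of printing inside the loop makes the result easy to check or save later; the same function can be called with r.text once the page has been downloaded.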