A First Simple Crawler: Extracting the Data You Need

FrankHuang888

Published 2024-04-23 20:19:52


Tags: web crawler

Article link: https://blog.csdn.net/qq_33323974/article/details/138137392


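This post shows how to send an HTTP request with Python's requests library and then parse the returned HTML with BeautifulSoup to read the text of the h1 tag that has a given class, which holds the title displayed on the page.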


import requests
from bs4 import BeautifulSoup

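link = "http://www.santostang.com"
# Send a browser-like User-Agent so the server treats the request as a normal visit
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 SE 2.X MetaSr 1.0"}
response = requests.get(link, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
# print(response.text)  # uncomment to inspect the raw HTML
soup = BeautifulSoup(response.text, "html.parser")
# The keyword is class_ (trailing underscore) because class is reserved in Python
title = soup.find("h1", class_="post-title").a.text.strip()
print(title)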
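
One weak spot in the script above: soup.find returns None when no matching tag exists, so the chained .a.text raises AttributeError if the page layout changes. The following is a minimal sketch of a more defensive version, not part of the original script. The helper name extract_title and the sample_html string are hypothetical and made up for illustration; the sample markup is not copied from the real site, so the example runs without any network access.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the link text inside the first <h1 class="post-title">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1", class_="post-title")
    # find() returns None when nothing matches; heading.a is None without an <a>
    if heading is None or heading.a is None:
        return None
    return heading.a.get_text(strip=True)


# Hypothetical sample markup, assumed for illustration only
sample_html = """
<html><body>
  <h1 class="post-title"><a href="/post/1">  Hello Crawler  </a></h1>
</body></html>
"""

print(extract_title(sample_html))                    # Hello Crawler
print(extract_title("<h1>No matching class</h1>"))   # None
```

With a live page, the same helper could be called as extract_title(response.text), and the caller can then decide what to do when the result is None.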