Python web scraping: basic usage of requests and bs4

雨天560

Published 2023-11-18 20:57:43

Tags: python, web scraping, programming language

Article link: https://blog.csdn.net/weixin_62654120/article/details/134483025

Using the requests library

Installing the requests library

pip3 install requests

Main methods of requests

1. We mainly use the following methods to request a web page and retrieve its content.

import requests

requests.post('http://httpbin.org/post')

requests.get('http://httpbin.org/get')
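
To make the difference between the two methods concrete, here is a minimal sketch that builds a GET and a POST request without sending them, so we can look at what would go over the wire. The helper names prepare_get, prepare_post and send, and the sample parameters, are assumptions made for illustration; they are not part of the original article.

```python
import requests


def prepare_get(url, params=None):
    """Build a GET request without sending it; params end up in the query string."""
    return requests.Request("GET", url, params=params).prepare()


def prepare_post(url, data=None):
    """Build a POST request without sending it; data becomes a form-encoded body."""
    return requests.Request("POST", url, data=data).prepare()


def send(prepared, timeout=10):
    """Send a prepared request and return the response (needs network access)."""
    with requests.Session() as session:
        return session.send(prepared, timeout=timeout)


# Hypothetical sample values, used only to show where the data goes
get_request = prepare_get("http://httpbin.org/get", params={"name": "demo", "page": "1"})
post_request = prepare_post("http://httpbin.org/post", data={"name": "demo"})
```

Here get_request.url carries the parameters in the query string, while post_request.body carries them in the request body. In everyday code, requests.get(url, params=...) and requests.post(url, data=...) build and send the request in one step.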

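2. Some sites only respond properly when the request carries browser information such as a User-Agent. If headers is not passed, the server may reject the request (for example, by returning an error status code instead of the page).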

headers = {

    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'

}

response = requests.get("https://www.zhihu.com/explore", headers=headers)

print(response.text)
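
When several pages need the same headers, one option is to set them once on a requests.Session. The sketch below is only one way to do this; the names DEFAULT_HEADERS, make_session and fetch_html, and the 10-second timeout, are assumptions made for illustration.

```python
import requests

# Same kind of browser User-Agent string as in the example above
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
    )
}


def make_session(headers=None):
    """Create a session whose requests all carry the given headers."""
    session = requests.Session()
    session.headers.update(headers or DEFAULT_HEADERS)
    return session


def fetch_html(session, url, timeout=10):
    """Fetch a page and raise requests.HTTPError on a 4xx/5xx status (needs network)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling fetch_html(make_session(), url) then sends the User-Agent automatically, and raise_for_status() turns a rejected request into an exception instead of silently returning an error page.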

3. Attributes of the response object returned by these methods

response = requests.get('https://www.jianshu.com')

# Status code

print(type(response.status_code), response.status_code)

# Response headers

print(type(response.headers), response.headers)

# Cookies

print(type(response.cookies), response.cookies)

# Final URL of the page

print(type(response.url), response.url)

# Redirect history

print(type(response.history), response.history)
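
The attributes printed above can also be collected into a single dictionary, which is convenient for logging. This is a minimal sketch; the function name summarize_response and the chosen keys are assumptions made for illustration.

```python
import requests


def summarize_response(response):
    """Collect the most commonly used attributes of a requests.Response."""
    return {
        "status_code": response.status_code,          # int, e.g. 200
        "ok": response.ok,                            # False for 4xx/5xx statuses
        "final_url": response.url,                    # URL after any redirects
        "content_type": response.headers.get("Content-Type"),
        "cookies": response.cookies.get_dict(),       # cookies as a plain dict
        "redirects": [r.status_code for r in response.history],
    }
```

For a live page we would call summarize_response(requests.get(url, timeout=10)). An empty "redirects" list means the request was not redirected.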

Using the bs4 library

Installing the bs4 library

pip install bs4

How to use bs4

bs4 parses the page content returned by the requests methods into a tree-like data structure that is easy to access and traverse.

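import requests

from bs4 import BeautifulSoup

# Fetch the page with the get method of requests

response = requests.get('https://www.52bqg.org/')

# Pass in the fetched page content and the parser to use

soup = BeautifulSoup(response.text, 'html.parser')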

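BeautifulSoup accepts the name of a parser as its second argument when it is initialized: 'html.parser' (built into Python), 'lxml', 'xml' (also provided by lxml), or 'html5lib'.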
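
Third-party parsers such as lxml and html5lib have to be installed separately, and BeautifulSoup raises FeatureNotFound when the requested parser is missing. The sketch below tries a few parsers in order and falls back to the built-in one. The function name make_soup and the preference order are assumptions made for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns a (soup, parser_name) tuple.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser is not installed; try the next one
            continue
    raise RuntimeError("none of the requested parsers is available")
```

Since 'html.parser' ships with Python, keeping it last in the list means the function still works on a machine where nothing extra is installed.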

Once the page is parsed, we can easily look up the information we want. The tree that BeautifulSoup builds mirrors the structure of the HTML text.
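
To show how the tree mirrors the HTML, here is a small sketch that parses an inline document and walks it. The HTML string and the variable names are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical page used only for this demonstration
HTML = """<html><head><title>Demo page</title></head>
<body>
<ul id="menu">
<li><a href="/a">First</a></li>
<li><a href="/b">Second</a></li>
</ul>
</body></html>"""

soup = BeautifulSoup(HTML, "html.parser")

# A tag name used as an attribute returns the first matching tag
title_text = soup.title.string

# find_all returns every matching tag in document order
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

# .parent moves up the tree, .children iterates over direct children
first_link_parent = soup.a.parent.name
menu_items = [child for child in soup.ul.children if isinstance(child, Tag)]
```

The isinstance(child, Tag) check is there because .children also yields the whitespace text between tags, not only the li elements.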

For example, suppose we need to find the following information in the page we fetched:

<div class="cl3">

<a href='http://firefox.com.cn/'></a>



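Oil tea is a favorite traditional food of the Yao, Dong, Miao, and other ethnic minorities living in the mountainous regions of Guangxi, Hunan, and Guizhou. It is made mainly from aged-leaf black tea, stir-fried in oil until slightly charred and fragrant, then boiled with salt and water, usually together with ginger. The taste is strong and astringent with a spicy note, and it can be found in the small alleys of the old town. The scenic area preserves traditional tea workshops and the steamed green tea process of the Tang and Song dynasties, and visitors can also taste food made according to...Price

10 yuan per bowl

Recommendation level: A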

</div>
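
A block like this can be located by its class name. The following is a minimal sketch, assuming the text inside the div sits in p tags; the inner tags of the sample are not visible above, so SAMPLE_HTML, the p tags, and the function name extract_block are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the block above; the <p> tags are an assumption
SAMPLE_HTML = """
<div class="cl3">
<a href='http://firefox.com.cn/'></a>
<p>Price: 10 yuan per bowl</p>
<p>Recommendation level: A</p>
</div>
"""


def extract_block(html):
    """Return the link and paragraph texts inside <div class="cl3">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # class is a Python keyword, so BeautifulSoup uses class_ instead
    block = soup.find("div", class_="cl3")
    if block is None:
        return None
    link = block.find("a")
    return {
        "href": link.get("href") if link else None,
        "paragraphs": [p.get_text(strip=True) for p in block.find_all("p")],
    }


result = extract_block(SAMPLE_HTML)
```

Searching inside block rather than soup keeps the lookup limited to this one div, which matters when the same tags appear elsewhere on the page.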