Python Web Scraping, Part 3: Using the requests Library


晴朗_不积跬步无以至千里

Category: Python web scraping and data extraction
Tags: python, post, cookie, web scraping, programming languages

Article link: https://blog.csdn.net/qq_35092730/article/details/113614843


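I. Installing requests

1. From a command prompt (DOS window), run: pip install requests
2. In PyCharm, open Settings, search for requests, and install it from there

Chinese documentation for the requests library: https://requests.readthedocs.io/zh_CN/latest/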

II. Sending a GET request

Search Baidu for 中国 ("China") and fetch the resulting page:

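import requests

# Baidu search endpoint; the query string is supplied through params below
url = 'https://www.baidu.com/s'

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}

# Query parameters; requests URL-encodes them automatically
kw = {'wd': '中国'}

response = requests.get(url, headers=headers, params=kw)
print(response.url)      # final URL, including the encoded query string
print(response.text)     # response body decoded to str
print(response.content)  # response body as raw bytes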

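Notes:
1. params: accepts a dict of query parameters. requests URL-encodes them and appends them to the URL, so no manual encoding is needed.
2. response.text: the page source as a str
3. response.content: the page source as bytes
4. response.url: the final URL of the request, including the encoded query string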
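
To make the params behaviour concrete, the sketch below uses the standard library's urllib.parse.urlencode to show the kind of percent-encoding applied to the dictionary. It only illustrates the resulting URL and is not requests' internal code path; build_url is a hypothetical helper name introduced for this example.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict) -> str:
    """Append percent-encoded query parameters to base_url (hypothetical helper)."""
    # urlencode encodes non-ASCII values as UTF-8 and percent-escapes the bytes
    return f"{base_url}?{urlencode(params)}"


full_url = build_url('https://www.baidu.com/s', {'wd': '中国'})
print(full_url)
```

For this input the printed URL is https://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD, which is the same form of query string that shows up in response.url.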

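Difference between response.text and response.content:

response.text: requests guesses the character encoding and uses it to decode response.content into a str. The guess is sometimes wrong, and in that case the returned text is garbled.

response.content: the data as fetched from the network, with no character decoding applied. It is of type bytes.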
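
When response.text comes back garbled, one workaround is to decode response.content yourself. The sketch below is a minimal, standard-library-only illustration of that idea. The helper decode_body and its list of candidate encodings are assumptions made for this example and are not part of requests; the byte strings stand in for a real response.content.

```python
def decode_body(raw: bytes, candidates=('utf-8', 'gbk')) -> str:
    """Decode raw bytes by trying each candidate encoding in order (hypothetical helper)."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # Last resort: decode with the first candidate and mark bad bytes with U+FFFD
    return raw.decode(candidates[0], errors='replace')


# Simulated response bodies; in practice raw would be response.content
utf8_body = '中国'.encode('utf-8')
gbk_body = '中国'.encode('gbk')

print(decode_body(utf8_body))
print(decode_body(gbk_body))
```

With a real response this would be called as decode_body(response.content). Alternatively, setting response.encoding = 'utf-8' before reading response.text tells requests which encoding to decode with.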

III. Sending a POST request

Target URL: http://42824.com/member/index.php?mod=login

1. Find the URL that is requested after logging in
