Requests Module Learning

BuKaiXIu

Categories: web crawler, python. Tags: web crawler, python

Article link: https://blog.csdn.net/weixin_41539580/article/details/87824585

Requests

1. Preparation

Install Requests
- pip install requests

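2. Sending GET and POST requests and getting the response

requests.get(url): sends a GET request and returns the response for that URL
requests.post(url, data={dictionary holding the request body}): sends a POST request and returns the response for that URL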

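import requests

url = "https://www.baidu.com"  # the scheme (http:// or https://) is required
query_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.20 Safari/537.36"}
response = requests.get(url)
# data carries the request body; the field below is only a placeholder
response_post = requests.post(url, data={"key": "value"}, headers=query_headers)
print(response)  # the Response object, shown with its status code
print(response.content.decode())  # the page source as a str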
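
To see what requests.post(url, data=...) actually puts into the request body, the request can be built and prepared locally without contacting any server. This is only a minimal sketch: the URL and the form field `kw` are placeholders and do not come from the notes above.

```python
import requests

# Hypothetical endpoint and form field, used only for illustration.
url = "https://example.com/login"
form = {"kw": "hello"}

# Build the request and prepare it; nothing is sent over the network.
prepared = requests.Request("POST", url, data=form).prepare()

print(prepared.method)                   # POST
print(prepared.body)                     # kw=hello
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
```

The dictionary passed as data is form-encoded into the body, which is why it should hold the form fields rather than the request headers.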

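3. Attributes and methods of response

response.text
- Returns the page's HTML as a string
- Often comes out garbled; if it does, set response.encoding = "utf-8" and read response.text again
response.content.decode()
- Converts the raw bytes of the response into a str (decodes as UTF-8 by default)
response.request.url
- The URL the request was sent to
response.url
- The URL of the response
response.request.headers
- The request headers
response.headers
- The response headers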
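
The attributes listed above can be gathered in one place for quick inspection. The helper below is a sketch; `summarize_response` is a made-up name, and `status_code` is an extra attribute of the response that the list above does not mention.

```python
import requests


def summarize_response(response: requests.Response) -> dict:
    """Collect the attributes discussed above into a plain dict."""
    return {
        "status_code": response.status_code,
        "request_url": response.request.url,   # URL the request was sent to
        "response_url": response.url,          # URL of the response
        "request_headers": dict(response.request.headers),
        "response_headers": dict(response.headers),
        "encoding": response.encoding,         # encoding response.text will use
    }
```

A typical call would be summarize_response(requests.get("https://www.baidu.com")), which needs network access.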

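4. The right way to get the page source (in practice, one of the three approaches below gives the correctly decoded string)

response.content.decode()  # UTF-8 by default
response.content.decode('gbk')  # for pages encoded in GBK
response.text  # uses the encoding requests guessed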
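
The first two attempts can be wrapped in a small helper that works on the raw bytes. This is a sketch under assumptions: `decode_body` is a hypothetical name, and the sample bytes stand in for response.content so the code runs without a network connection.

```python
def decode_body(content: bytes, encodings=("utf-8", "gbk")) -> str:
    """Return the first successful decoding of the raw response bytes."""
    for encoding in encodings:
        try:
            return content.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD.
    return content.decode(encodings[0], errors="replace")


gbk_page = "中文网页".encode("gbk")  # stand-in for response.content
print(decode_body(gbk_page))         # 中文网页
```

UTF-8 is tried first because it is strict: GBK text rarely passes as valid UTF-8, while UTF-8 bytes may decode as GBK without error and produce the wrong characters.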

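5. Using headers

Headers make the request look like it comes from a browser, so the server returns the same content a browser would get

headers = {
"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
"Referer":"https://fanyi.baidu.com/translate?aldtype=16047"}
# pass the dictionary by keyword; the second positional argument of requests.get is params
response = requests.get(url, headers=headers)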
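
The headers dictionary has to be passed by keyword, because the second positional argument of requests.get is params (the query string). The sketch below shows the difference offline by preparing the requests without sending them; the shortened User-Agent value is a placeholder.

```python
import requests

url = "https://fanyi.baidu.com/"
headers = {"User-Agent": "Mozilla/5.0 (demo)"}  # shortened placeholder value

# Keyword argument: the dictionary becomes request headers.
as_headers = requests.Request("GET", url, headers=headers).prepare()
print(as_headers.headers["User-Agent"])  # Mozilla/5.0 (demo)
print(as_headers.url)                    # https://fanyi.baidu.com/

# Equivalent of requests.get(url, headers): the dictionary is treated as
# params and ends up in the query string instead of the headers.
as_params = requests.Request("GET", url, params=headers).prepare()
print("User-Agent" in as_params.headers)  # False
print(as_params.url)                      # URL now carries ?User-Agent=...
```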

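6. Using the timeout parameter

requests.get(url, headers=headers, timeout=3)
- If the server does not respond within 3 seconds, an exception (requests.exceptions.Timeout) is raised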
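
Since a timeout raises an exception, the call is usually wrapped in try/except. A minimal sketch, assuming that returning None is an acceptable way to signal a slow server; `fetch_text` is a hypothetical helper name.

```python
import requests


def fetch_text(url: str, timeout: float = 3):
    """Return the page source, or None if the server is too slow to respond."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both connect timeouts and read timeouts.
        return None
    return response.content.decode()
```

Calling fetch_text("https://www.baidu.com") would perform a real request, so it is left out of the block itself.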

7. Retrying

Install
- pip install retrying

from retrying import retry

@retry(stop_max_attempt_number=3)  # give up after 3 attempts
def func1():
    print('this is func1')
    raise ValueError('this is test error')
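
With stop_max_attempt_number=3 the function body runs up to three times, and the last exception is raised once the attempts are used up. To make that behaviour visible without installing anything, here is a simplified standard-library stand-in. Note that `retry_up_to` and `always_fails` are hypothetical names, this is not how the retrying library is implemented, and the real decorator supports more options.

```python
import functools


def retry_up_to(max_attempts):
    """Simplified stand-in for @retry(stop_max_attempt_number=max_attempts)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # attempts exhausted: re-raise the last error
        return wrapper
    return decorator


calls = []


@retry_up_to(3)
def always_fails():
    calls.append(1)
    raise ValueError("this is test error")


try:
    always_fails()
except ValueError:
    print(f"gave up after {len(calls)} attempts")  # gave up after 3 attempts
```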

8. Handling cookie-related requests

Send the cookie directly with the request
- 1. Put the cookie in headers
```
headers = {"User-Agent": "...","Cookie": "cookie str"}
```
- 2. Pass a cookie dictionary to the cookies parameter
  - requests.get(url, cookies = cookie_dict)
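
A cookie string copied from the browser can be turned into the dictionary that the cookies parameter expects. The sketch below uses made-up cookie values and a hypothetical helper name, `cookie_str_to_dict`; the request is only prepared, not sent.

```python
import requests


def cookie_str_to_dict(cookie_str: str) -> dict:
    """Turn 'a=1; b=2' (as copied from the browser) into {'a': '1', 'b': '2'}."""
    pairs = (
        item.strip().split("=", 1)
        for item in cookie_str.split(";")
        if "=" in item
    )
    return {name: value for name, value in pairs}


cookie_dict = cookie_str_to_dict("sessionid=abc123; theme=dark")  # made-up values
print(cookie_dict)  # {'sessionid': 'abc123', 'theme': 'dark'}

# Prepare (but do not send) a request to inspect the resulting Cookie header.
prepared = requests.Request("GET", "https://example.com/", cookies=cookie_dict).prepare()
print(prepared.headers["Cookie"])  # a Cookie header containing both pairs
```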
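Send a POST request first to obtain the cookie, then request the logged-in page with that cookie
- 1. session = requests.session()
  - session has the same methods as requests
- 2. session.post(url, data=data, headers=headers)
  - Cookies set by the server are saved in the session
- 3. session.get(url)
  - Sends the cookies saved in the session, so the request succeeds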
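
The three steps above can be sketched as one function, followed by an offline check that a session really attaches its stored cookies. Everything named here (`fetch_after_login`, the URLs, the cookie value) is a placeholder for illustration, not something taken from a real site.

```python
import requests


def fetch_after_login(login_url, page_url, form, headers):
    """Log in with a POST, then request a page with the cookies the server set."""
    session = requests.session()
    session.post(login_url, data=form, headers=headers)  # cookies land in session
    return session.get(page_url, headers=headers)        # cookies are sent back


# Offline check: normally the server's Set-Cookie header fills the jar.
# Here a made-up cookie is added by hand so the sketch runs without a network.
session = requests.session()
session.cookies.set("sessionid", "abc123")

request = requests.Request("GET", "https://example.com/profile")  # hypothetical URL
prepared = session.prepare_request(request)
print(prepared.headers["Cookie"])  # sessionid=abc123
```

Using one session object for both calls is the whole point: a plain requests.get after requests.post would start with an empty cookie jar.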
