Web Scraping Basics: the requests Library (Fetching Web Pages)

Latest recommended article published on 2024-07-21 02:54:02

BullGod

Article tags: spider

Article link: https://blog.csdn.net/BullGod/article/details/79665619

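This article introduces the basic usage of the requests library for Python web scraping: sending GET and POST requests; handling responses with text, content, and json; setting timeouts, headers, and cookies; and keeping a session alive with session. It also touches on file uploads, certificate verification, and proxy settings.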

Official documentation: http://docs.python-requests.org/zh_CN/latest/user/quickstart.html

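Sending GET and POST requests

res = requests.get(url)  # send a GET request and receive the response for url
res = requests.post(url, data=form_data)  # send a POST request; form_data is a dict of form fields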

# POST request
import requests

url = "http://fanyi.baidu.com/sug"
# The key/value pairs to send can be read from the "Form Data" panel
# in the browser's developer tools. '早上好' means "good morning".
data = {'kw': '早上好'}
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Mobile Safari/537.36"
}
res = requests.post(url, data=data, headers=headers)
print(res.text)
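To check what the POST above would put on the wire without contacting the server, the request can be built and inspected locally. The sketch below is only an illustration: it reuses the URL and form field from the example, shortens the User-Agent to a placeholder value, and uses `requests.Request(...).prepare()`, which builds the request but does not send it.

```python
import requests

url = "http://fanyi.baidu.com/sug"
data = {"kw": "早上好"}
headers = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder, not the full UA string

# Build the request object and prepare it; nothing is sent over the network.
prepared = requests.Request("POST", url, data=data, headers=headers).prepare()

method = prepared.method                         # HTTP verb
content_type = prepared.headers["Content-Type"]  # set automatically for dict data
body = prepared.body                             # form-encoded, percent-escaped UTF-8

print(method)
print(content_type)
print(body)
```

Passing a dict as `data` makes requests form-encode it, which is why the Chinese text appears percent-escaped in the body rather than as raw characters.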

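Response attributes and methods

res.text  # body decoded to str; often garbled when the encoding is guessed wrongly. If so, set res.encoding='utf-8' or res.encoding=res.apparent_encoding before reading res.text
res.content.decode('utf-8')  # decode the raw bytes yourself; use 'gbk' if that is the page's encoding
res.json()  # parse a JSON response body into Python objects (typically a dict)
res.request.url  # URL of the request that produced this response (after a redirect, the final request)
res.url  # URL of the response. After a redirect this is the final URL, which differs from the URL originally requested; the earlier responses are kept in res.history
res.request.headers  # request headers
res.headers  # response headers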
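The attributes above can be tried out without depending on an external site. The following is a minimal sketch, assuming a throwaway local server built with the standard library's `http.server`; the paths `/old`, `/page`, and `/api`, the handler name `DemoHandler`, and the `demo-client` User-Agent are all made up for the demonstration. The page is deliberately served as `text/html` without a charset so that the garbled-text case shows up.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "早上好".encode("utf-8")
API = json.dumps({"kw": "早上好"}, ensure_ascii=False).encode("utf-8")


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old redirects to /page, /api returns JSON."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/page")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        if self.path == "/api":
            ctype, body = "application/json", API
        else:
            ctype, body = "text/html", PAGE  # no charset, on purpose
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

res = requests.get(base + "/old", headers={"User-Agent": "demo-client"})

garbled = res.text                        # decoded with the encoding requests assumed
res.encoding = "utf-8"                    # override the assumed encoding
fixed = res.text                          # res.text is decoded again with the new encoding
from_bytes = res.content.decode("utf-8")  # or decode the raw bytes directly

first_url = res.history[0].url            # URL that was originally requested
final_url = res.url                       # URL the redirect led to
sent_ua = res.request.headers["User-Agent"]
resp_type = res.headers["Content-Type"]

parsed = requests.get(base + "/api").json()  # JSON body -> Python dict

server.shutdown()
server.server_close()

print(repr(garbled), fixed, from_bytes)
print(first_url, "->", final_url)
print(sent_ua, resp_type, parsed)
```

In this run `res.request.url` equals `res.url`, because requests attaches the final request to the final response; the originally requested URL is the one recorded in `res.history`.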