# Common methods of the requests library

陈年辣鸡

Category: Python web scraping

Article link: https://blog.csdn.net/Dallan/article/details/103373928

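# The requests library
The modules in the standard urllib library already cover most of what we need day to day, but their API is not very friendly. requests wraps this functionality in a much simpler interface.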

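Installation
pip install requests
The Chinese documentation and the GitHub source are easy to find with a search engine:
Chinese documentation: https://2.python-requests.org//zh_CN/latest/index.html
GitHub source: https://github.com/psf/requests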

1. Sending a GET request
response = requests.get('http://www.baidu.com')
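2. Adding parameters
A GET request carries its data in the URL. Pass params to supply the query-string parameters and headers to set the request headers (cs and head are dicts, defined in the full example below):
response = requests.get('http://www.baidu.com', params=cs, headers=head)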
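To see what params and headers actually do, the request can be built and prepared without sending it. The following is a minimal sketch; the head value is a made-up placeholder, and cs mirrors the dict used in this article.

```python
import requests

# Stand-ins for the cs and head dicts used in this article (head is a placeholder)
cs = {'wd': '中国'}
head = {'User-Agent': 'example-agent/1.0'}

# Build the request and prepare it; nothing is sent over the network
request = requests.Request('GET', 'http://www.baidu.com', params=cs, headers=head)
prepared = request.prepare()

print(prepared.url)                    # params are percent-encoded into the query string
print(prepared.headers['User-Agent'])  # headers travel separately from the URL
```

Preparing a request this way is handy for checking how a URL will be encoded before hitting a real server.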
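3. Inspecting the response
The response object exposes many attributes:
.text the body decoded to a Unicode string (str) using an encoding that requests guesses; a wrong guess produces garbled text
.content the raw bytes as received over the network; data sent over the network or stored on disk is always bytes
.url the URL that was actually requested
.encoding the character encoding taken from the response headers, which .text uses for decoding
.status_code the HTTP status code of the response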
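The difference between .content and .text comes down to when and how the bytes are decoded. The sketch below uses plain byte strings as a stand-in for response.content, so it runs without a network connection; the choice of ISO-8859-1 as the "wrong guess" is only an illustration.

```python
# Stand-in for response.content: the UTF-8 bytes a server might send
raw = '中国'.encode('utf-8')

# Decoding with the correct charset recovers the text
right = raw.decode('utf-8')

# Decoding with a wrong guess (ISO-8859-1 here) yields garbled characters
wrong = raw.decode('iso-8859-1')

print(right)
print(repr(wrong))
print(len(raw), len(right), len(wrong))
```

With a real response, the same fix is either response.content.decode('utf-8') or setting response.encoding = 'utf-8' before reading response.text.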

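import requests
from urllib import parse

# Browser-like User-Agent; some sites reject the default one sent by requests
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

# Query-string parameters; requests percent-encodes them into the URL
cs = {'wd': '中国'}

response = requests.get('https://www.baidu.com/s', headers=head, params=cs)

# Save the page, decoding the raw bytes explicitly as UTF-8
with open('../laji/w.html', 'w', encoding='utf-8') as fp:
    fp.write(response.content.decode('utf-8'))

# Parse only the query string of the final URL back into a dict
print(parse.parse_qs(parse.urlsplit(response.url).query))
print(response.encoding)
print(response.status_code)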

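2. Sending a POST request
The only difference from GET is that the payload goes in the request body, passed through the data parameter.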

import requests

url = 'https://www.lagou.com/jobs/positionAjax.json?city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false'
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'Cookie': '',  # left blank here
}
# Form fields sent in the request body
data = {
    'first': 'true',
    'pn': '1',
    'kd': 'python',
}

response = requests.post(url, headers=head, data=data)

# .json() parses the JSON response body into Python objects
result = response.json()
print(type(result))
print(result)
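To make the "data goes in the body" point concrete, a POST request can be prepared without sending it. This is a sketch only: the example.com URL is a hypothetical endpoint, and the form fields mirror the ones used above.

```python
import requests

# Hypothetical endpoint, used only for illustration; nothing is sent
url = 'https://example.com/search'
data = {'first': 'true', 'pn': '1', 'kd': 'python'}

prepared = requests.Request('POST', url, data=data).prepare()

print(prepared.url)                      # the URL is left unchanged
print(prepared.body)                     # the fields are form-encoded into the body
print(prepared.headers['Content-Type'])  # set automatically for a dict payload
```

Unlike params, which ends up in the URL, a dict passed as data is form-encoded and travels in the body.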

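If the response body is JSON, you can parse it either with Python's built-in json module or by calling the response's .json() method. JSON is, at bottom, just a string; .json() parses it straight into Python objects (usually a dict).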
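The two approaches are equivalent in effect. The sketch below shows the json-module route on a made-up JSON string standing in for response.text; the field names are hypothetical and not taken from any real API.

```python
import json

# Hypothetical JSON text, standing in for response.text
body_text = '{"success": true, "content": {"pageNo": 1, "result": ["python"]}}'

# json.loads turns the string into Python objects; response.json()
# does the equivalent for a real response
parsed = json.loads(body_text)

print(type(parsed))
print(parsed['content']['pageNo'])

# json.dumps goes the other way: Python objects back to a JSON string
print(json.dumps(parsed, ensure_ascii=False))
```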

3. Using a proxy
Pass the proxies parameter when calling the request method.

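import requests

url = 'http://httpbin.org/ip'
# Map each URL scheme to the proxy that should handle it
proxy = {'http': 'http://49.51.193.134:1080'}
response = requests.get(url, proxies=proxy)

# httpbin echoes back the IP address the request appears to come from
print(response.text)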

Compare the printed IP with your own: if they differ, the request went through the proxy.
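One detail worth knowing is that the keys of the proxies dict are matched against the scheme of the target URL, so a dict with only an 'http' key does not cover https:// URLs. The sketch below inspects this with select_proxy, a helper in requests.utils; the proxy address is a hypothetical placeholder, and no request is sent.

```python
from requests.utils import select_proxy

# Hypothetical proxy address; substitute a proxy you are allowed to use
proxy_url = 'http://10.10.1.10:3128'

http_only = {'http': proxy_url}
both = {'http': proxy_url, 'https': proxy_url}

# select_proxy returns the proxy that matches a URL in the given dict, or None
print(select_proxy('http://httpbin.org/ip', http_only))
print(select_proxy('https://httpbin.org/ip', http_only))
print(select_proxy('https://httpbin.org/ip', both))
```

When the target site uses HTTPS, include an 'https' key as well.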

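4. Getting cookies
response.cookies.get_dict()
response.cookies holds the cookies set by the server; call get_dict() on it to get them as a plain dict, for example for printing.

On the web, a session usually means a mechanism for keeping data on the server. Here, session is used in its original sense of a conversation: a requests.Session object keeps cookies across requests, so a login carries over to the requests that follow.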

import requests

head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

# Login form fields; replace the placeholders with real credentials
data = {
    'email': 'ID',
    'password': 'PASSWORD',
}

login_url = "http://www.renren.com/ajaxLogin/login?"
url = "http://www.renren.com/972992926/newsfeed/photo"

# A Session stores the cookies set at login and sends them on later requests
session = requests.Session()
session.post(login_url, data=data, headers=head)

# This request reuses the login cookies automatically
response = session.get(url, headers=head)
with open('../laji/w.html', 'w', encoding='utf-8') as fp:
    fp.write(response.text)
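The cookie jar behind a Session can be explored offline. In this sketch the cookies are set by hand to simulate what a server might send at login; the names, values, and example.com domain are all made up.

```python
import requests

session = requests.Session()

# Simulate cookies a server might set at login; names and values are made up
session.cookies.set('sessionid', 'abc123', domain='example.com', path='/')
session.cookies.set('theme', 'dark', domain='example.com', path='/')

# get_dict() flattens the cookie jar into a plain name -> value dict
cookies = session.cookies.get_dict()
print(cookies)

# Individual cookies can also be read by name
print(session.cookies.get('sessionid'))
```

response.cookies is the same kind of jar, so get_dict() works on it in the same way.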

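5. Handling invalid SSL certificates
Some sites use HTTPS, but the browser marks the protocol with an X in the address bar. This means the certificate was not issued by a trusted certificate authority, for example because it is self-signed.
In that case, pass verify=False when making the request so that certificate verification is skipped.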

response = session.get(url, headers = head, verify = False)
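When every request in a session targets such a site, the setting can be applied once on the Session instead of per call. The sketch below also silences the warning that urllib3 emits for unverified requests; whether to silence it is a judgment call, since skipping verification removes the protection HTTPS normally gives against impersonated servers.

```python
import requests
import urllib3

# Silence the InsecureRequestWarning that urllib3 emits for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
# Applies verify=False to every request made through this session
session.verify = False

print(session.verify)
```

If the site's CA certificate is available, passing its file path as verify (instead of False) keeps verification on while trusting that certificate.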
