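Installation
requests is a third-party library commonly used for writing web crawlers in Python. It must be installed before use, and installing it with pip is recommended: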
pip install requests
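A quick way to confirm the installation worked is to import the library and print its version. This is only a minimal sanity check; the version printed depends on what pip installed on your machine.

```python
import requests

# If the import succeeds, the package is installed in this interpreter.
# The version string varies from one environment to another.
print(requests.__version__)
```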
GET requests
GET is the most common request type. Calling the get() method of requests sends a GET request, as shown below:
import requests
r = requests.get('http://httpbin.org/get')
print(r.text)
The output is:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.25.1",
"X-Amzn-Trace-Id": "Root=1-60be2033-2526bd6d3b0ab8cd7449dfc8"
},
"origin": "222.212.17.222",
"url": "http://httpbin.org/get"
}
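Sometimes parameters need to be added to the URL, for example a name of tom and an age of 22. They can be appended directly to the original URL, like this:
http://httpbin.org/get?name=tom&age=22
They can also be passed through the params argument, as in the following code: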
import requests
params = {
'name':'tom',
'age':22
}
r = requests.get('http://httpbin.org/get', params=params)
print(r.text)
The output is as follows:
{
"args": {
"age": "22",
"name": "tom"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.25.1",
"X-Amzn-Trace-Id": "Root=1-60be1fc1-7fef016e2effbf5e01400c7b"
},
"origin": "222.212.17.222",
"url": "http://httpbin.org/get?name=tom&age=22"
}
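To see how params ends up in the URL without sending anything over the network, one option is to build the request and inspect it before it is sent. The sketch below uses requests.Request(...).prepare(), which the original text does not use. The 'tom lee' value is a made-up example to show that unsafe characters get encoded.

```python
import requests

params = {'name': 'tom', 'age': 22}

# Build the request but do not send it, so no network access is needed.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()
print(prepared.url)  # http://httpbin.org/get?name=tom&age=22

# Hypothetical value containing a space: requests encodes it for us,
# which is the main advantage over concatenating the URL by hand.
spaced = requests.Request('GET', 'http://httpbin.org/get',
                          params={'name': 'tom lee'}).prepare()
print(spaced.url)
```

The first printed URL matches the "url" field that httpbin echoes back in the output above.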
POST requests
To send a POST request with requests, call the post() method and pass the data to be submitted through the data argument. Example code:
import requests
data = {'name':'tom', 'age':20}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
The output is as follows:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "20",
"name": "tom"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "15",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.25.1",
"X-Amzn-Trace-Id": "Root=1-60be23ba-1d8f06ca1373e6460c328b8b"
},
"json": null,
"origin": "222.212.17.222",
"url": "http://httpbin.org/post"
}
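Response
The response returned by requests is an object of type requests.models.Response. Through this object we can read detailed information about the response, such as the status code, headers, and cookies. Example code: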
import requests
r = requests.get('http://httpbin.org/get')
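print(r.status_code)  # HTTP status code of the response
print(r.headers)  # response headers
print(r.cookies)  # cookies set by the server
print(r.url)  # final URL of the response
print(r.history)  # list of redirect responses that led to this one
print(r.text)  # response body decoded as text
print(r.content)  # response body as raw bytes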
The output is as follows:
200
{'Date': 'Mon, 07 Jun 2021 13:56:04 GMT', 'Content-Type': 'application/json', 'Content-Length': '307', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
<RequestsCookieJar[]>
http://httpbin.org/get
[]
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.25.1",
"X-Amzn-Trace-Id": "Root=1-60be2574-57a9a2522fb0b5552e470e1e"
},
"origin": "222.212.17.222",
"url": "http://httpbin.org/get"
}
b'{\n "args": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.25.1", \n "X-Amzn-Trace-Id": "Root=1-60be2574-57a9a2522fb0b5552e470e1e"\n }, \n "origin": "222.212.17.222", \n "url": "http://httpbin.org/get"\n}\n'
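Since httpbin returns JSON, it is often more convenient to parse the body than to print it as text. The following is a minimal sketch, not part of the original walkthrough: describe_response is a hypothetical helper name, and it relies on the Response methods raise_for_status() and json().

```python
import requests


def describe_response(r):
    """Summarize a requests.Response as a plain dict (hypothetical helper)."""
    # Raise requests.HTTPError for 4xx/5xx responses instead of continuing silently.
    r.raise_for_status()
    content_type = r.headers.get('Content-Type', '')
    # Parse the body as JSON only when the server says it is JSON.
    body = r.json() if 'application/json' in content_type else r.text
    return {'status': r.status_code, 'content_type': content_type, 'body': body}


# Example usage (needs network access):
# r = requests.get('http://httpbin.org/get', timeout=10)
# print(describe_response(r)['body']['url'])
```

Passing timeout= in the usage example is a precaution so that a slow server does not block the script indefinitely.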
Headers
The headers argument can be used to pass request headers such as User-Agent, Host, and Cookie. The code is as follows:
import requests
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.41'
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)
The output is:
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react-helmet="true">发现 - 知乎</title>
Remainder omitted ......
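When many requests need the same headers, one possible approach is to set them once on a requests.Session instead of repeating headers= on every call. The original text does not use sessions, so treat the sketch below as an illustration under that assumption. It prepares a request without sending it so that the merged headers can be inspected offline.

```python
import requests

user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.41')

session = requests.Session()
# Headers set here are merged into every request made through this session.
session.headers.update({'User-Agent': user_agent})

# Prepare (but do not send) a request to see the headers that would go out.
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore'))
for name, value in prepared.headers.items():
    print(name, ':', value)
```

The custom User-Agent replaces the default python-requests one, while the other default headers such as Accept are kept.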