- Interacting with HTTP services as a client
// Problem
You need to access various services over HTTP as a client, for example to download
data or to interact with a REST-based API.
// Solution: urllib.request
Send a simple HTTP GET request to a remote service:
from urllib import request, parse
# Base URL being accessed
url = 'http://httpbin.org/get'
# Dictionary of query parameters (if any)
parms = {
    'name1': 'value1',
    'name2': 'value2',
}
# Encode the query string
querystring = parse.urlencode(parms)
# Make a GET request and read the response
u = request.urlopen(url + '?' + querystring)
resp = u.read()
>>>
{
  "args": {
    "name1": "value1",
    "name2": "value2"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.5"
  },
  "origin": "117.136.38.38",
  "url": "http://httpbin.org/get?name2=value2&name1=value1"
}
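The query-string step above relies on parse.urlencode(). The following is a minimal sketch of how it treats values that need escaping, and how parse.parse_qs() reverses the encoding. The parameter names and values are made up for illustration and are not part of the original example.

```python
from urllib import parse

# Hypothetical parameters, chosen to include characters that need escaping
params = {'q': 'spam & eggs', 'page': 2}

# urlencode() converts each value to a string, escapes it, and joins the pairs with '&'
querystring = parse.urlencode(params)
print(querystring)

# parse_qs() reverses the encoding; every value comes back as a list of strings
decoded = parse.parse_qs(querystring)
print(decoded)
```

Because urlencode() handles the escaping, it is usually safer than building the query string by hand with string concatenation.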
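Send a simple HTTP POST request to a remote service:
# To send the query parameters in the request body with the POST method,
# encode them and pass them to urlopen() as its optional second argument.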
from urllib import request, parse
# Base URL being accessed
url = 'http://httpbin.org/post'
# Dictionary of query parameters (if any)
parms = {
    'name1': 'value1',
    'name2': 'value2'
}
# Encode the query string
querystring = parse.urlencode(parms)
# Make a POST request and read the response
u = request.urlopen(url, querystring.encode('ascii'))
resp = u.read()
with open('1.txt', 'wb') as f:
    f.write(resp)
>>>
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name1": "value1",
    "name2": "value2"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "25",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.5"
  },
  "json": null,
  "origin": "117.136.38.38",
  "url": "http://httpbin.org/post"
}
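Supplying a body is what turns the request into a POST. The sketch below illustrates this without contacting the server, by building Request objects and asking which method they would use. It reuses the URL and parameters from the example above; the variable names get_req and post_req are only illustrative.

```python
from urllib import parse, request

url = 'http://httpbin.org/post'
parms = {'name1': 'value1', 'name2': 'value2'}

# The request body must be bytes, so the encoded string is converted explicitly
body = parse.urlencode(parms).encode('ascii')

# Building a Request does not send anything; it only describes the request
get_req = request.Request(url)
post_req = request.Request(url, data=body)

print(get_req.get_method())   # no body was supplied
print(post_req.get_method())  # a body was supplied
print(len(body))              # size of the body in bytes
```

The body length printed here should correspond to the Content-Length value that httpbin echoes back in the output above.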
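Custom HTTP headers
If you need to supply custom HTTP headers in the outgoing request, for example to change the user-agent field, create a dictionary containing the header values, build a Request instance from it, and pass that instance to urlopen():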
from urllib import request, parse
url = "http://httpbin.org/post"
parms = {
    "name1": 'value1',
    "name2": 'value2'
}
# Extra headers
headers = {
    'User-agent': 'none/ofyourbusiness',
    'Spam': 'Eggs'
}
querystring = parse.urlencode(parms)
req = request.Request(url,
                      querystring.encode('ascii'), headers=headers)
# Make a request and read the response
u = request.urlopen(req)
resp = u.read()
with open("1.txt", 'wb') as file:
    file.write(resp)
>>>
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name1": "value1",
    "name2": "value2"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "25",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "Spam": "Eggs",
    "User-Agent": "none/ofyourbusiness"
  },
  "json": null,
  "origin": "117.136.38.38",
  "url": "http://httpbin.org/post"
}
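Before sending a request with custom headers, it can help to check what the Request object actually holds. The following is a small sketch using the same header values as above; nothing is sent over the network. The PUT request at the end and its URL are added purely as an illustration and do not come from the original example.

```python
from urllib import request

headers = {'User-agent': 'none/ofyourbusiness', 'Spam': 'Eggs'}
req = request.Request('http://httpbin.org/post',
                      data=b'name1=value1', headers=headers)

# Header names are normalized with str.capitalize() when they are stored
print(req.header_items())
print(req.get_header('User-agent'))

# has_header() confirms that a field has been set on the request
print(req.has_header('Spam'))

# The HTTP method can also be set explicitly (PUT here, as an illustration only)
put_req = request.Request('http://httpbin.org/put',
                          data=b'name1=value1', method='PUT')
print(put_req.get_method())
```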
If the service you need to interact with is more complex than the examples above, consider the requests library:
import requests
url = "http://httpbin.org/post"
parms = {
    "name1": 'value1',
    "name2": 'value2'
}
# Extra headers
headers = {
    'User-agent': 'none/ofyourbusiness',
    'Spam': 'Eggs'
}
resp = requests.post(url, data=parms, headers=headers)
# Decoded text returned by the request
# resp.text gives the response body decoded to Unicode text
# resp.content gives the raw bytes instead
# resp.json() parses the response body as JSON
text = resp.text
with open("1.txt", 'wt') as file:
    file.write(text)
>>>
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name1": "value1",
    "name2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "25",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "Spam": "Eggs",
    "User-Agent": "none/ofyourbusiness"
  },
  "json": null,
  "origin": "117.136.38.38",
  "url": "http://httpbin.org/post"
}
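The three views of a response body mentioned in the comments above (raw bytes, decoded text, parsed JSON) relate to each other roughly as sketched below. This uses only the standard library and a hand-written payload rather than a real response, and it assumes the body is UTF-8 encoded; requests itself picks the encoding from the response.

```python
import json

# Hand-written stand-in for resp.content: the raw bytes of a response body
content = '{"form": {"name1": "value1", "name2": "value2"}}'.encode('utf-8')

# Roughly what resp.text provides: the bytes decoded to str (UTF-8 assumed here)
text = content.decode('utf-8')

# Roughly what resp.json() provides: the text parsed into Python objects
data = json.loads(text)

print(type(content).__name__, type(text).__name__, type(data).__name__)
print(data['form']['name1'])
```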
Use the requests library to issue a HEAD request and extract a few HTTP header fields from the response:
import requests
resp = requests.head('https://www.baidu.com/index.php')
status = resp.status_code
last_modified = resp.headers['last-modified']
content_type = resp.headers['content-type']
# content_length = resp.headers['content-length']  # may be absent, for example on chunked responses over a persistent connection
>>>
<Response [200]>
200
Mon, 13 Jun 2016 02:50:08 GMT
text/html
An example of using requests to log in to PyPI with basic authentication:
import requests
resp = requests.get('https://pypi.python.org/pypi?:action=login',
                    auth=('user', 'password'))
print(resp)
print(resp.content)
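For reference, HTTP Basic authentication amounts to sending an Authorization header containing the base64-encoded "username:password" pair. The sketch below shows that construction with the standard library; basic_auth_header is a hypothetical helper name, and the credentials are placeholders.

```python
import base64

def basic_auth_header(username, password):
    """Return the Authorization header value used by HTTP Basic authentication."""
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')

# Placeholder credentials, not a real account
print(basic_auth_header('user', 'password'))
```

Base64 is an encoding, not encryption, so credentials sent this way are only protected when the connection itself uses HTTPS.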
An example of using requests to pass HTTP cookies from one request to the next:
import requests
url = 'https://pypi.python.org'
# First request
resp1 = requests.get(url)
# Second request, sending the cookies received on the first request
resp2 = requests.get(url, cookies=resp1.cookies)
print(resp2.content)
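For comparison, the same idea can be sketched with the standard library alone, using http.cookiejar together with urllib.request. This is a different technique from the requests example above, shown only as an assumption about how one might avoid the third-party dependency; fetch_twice is a hypothetical helper and is not called here, so no network traffic occurs.

```python
from http.cookiejar import CookieJar
from urllib import request

# The jar stores cookies received in responses
jar = CookieJar()

# An opener built with HTTPCookieProcessor sends the jar's cookies on later requests
opener = request.build_opener(request.HTTPCookieProcessor(jar))

def fetch_twice(url):
    """Open url twice with the same opener; the second request carries stored cookies."""
    with opener.open(url) as first:
        first.read()
    with opener.open(url) as second:
        return second.read()

# No request is made until fetch_twice() is called, so the jar starts out empty
print(len(jar))
```

Within requests, a requests.Session object plays a similar role by keeping cookies across the requests made through it.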
Uploading content with requests:
import requests
url = 'https://httpbin.org/post'
# Open the file in a with block so it is closed after the upload
with open('data.csv', 'rb') as f:
    files = {'file': ('data.csv', f)}
    r = requests.post(url, files=files)
http.client
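If you decide to stick with the standard library rather than a third-party library such as
requests, you may have to fall back on the low-level http.client module and write more of
the code yourself. For example, the following code shows how to perform a HEAD request: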
from http.client import HTTPConnection
c = HTTPConnection('www.python.org', 80)
c.request('HEAD', '/index.html')
resp = c.getresponse()
print('Status', resp.status)
# Print every header returned by the server
for name, value in resp.getheaders():
    print(name, value)
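Authentication on the Python Package Index
If you have to write code involving proxies, authentication, cookies, and similar details,
urllib becomes awkward and verbose. For example, the following sample authenticates against the Python Package Index: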
import urllib.request
auth = urllib.request.HTTPBasicAuthHandler()
auth.add_password('pypi', 'https://pypi.python.org', 'username', 'password')
opener = urllib.request.build_opener(auth)
r = urllib.request.Request('http://pypi.python.org/pypi?:action=login')
u = opener.open(r)
resp = u.read()
print(resp)
# From here, you can access more pages using opener
....
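HTTP client
When testing an HTTP client, consider using the httpbin service (http://httpbin.org). The site receives the requests you send and echoes information about them (cookies, authentication, HTTP headers, encodings, and so on) back as JSON.
Here is an interactive example: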
import requests
r = requests.get('http://httpbin.org/get?name=Dave&n=37', headers={
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/69.0.3497.81 Chrome/69.0.3497.81 Safari/537.36'})
resp = r.json()
print(resp['headers'])
print(resp['args'])
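Before interacting with a real site, it is often wise to experiment against a site such as
httpbin.org first. This is especially useful when there is a risk such as an account being
locked after three failed login attempts (do not try to write your own HTTP authentication
client to log in to your bank account).
Although it is not covered in this section, the requests library also supports many more
advanced HTTP client protocols, such as OAuth. The documentation for the requests module (http://docs.python-requests.org) is of high quality.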