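Networking and Web Programming (Part 1)

  1. Interacting with HTTP services as a client
    // Problem
    You need to access various services over HTTP as a client, for example to
    download data or to interact with a REST-based API.
    // Solution: urllib.request

Sending a simple HTTP GET request to a remote service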

from urllib import request, parse

# Base URL being accessed
url = 'http://httpbin.org/get'

# Dictionary of query parameters (if any)
parms = {
    'name1': 'value1',
    'name2': 'value2',
}

# Encode the query string
querystring = parse.urlencode(parms)

# Make a GET request and read the response
u = request.urlopen(url + '?' + querystring)
resp = u.read()
>>>
{
  "args": {
    "name1": "value1", 
    "name2": "value2"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.5"
  }, 
  "origin": "117.136.38.38", 
  "url": "http://httpbin.org/get?name2=value2&name1=value1"
}
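The query string step can be tried on its own, without touching the network. The following is a minimal sketch; the value containing a space and an ampersand is an illustrative assumption, chosen to show what urlencode() does with reserved characters.

```python
from urllib import parse

# Illustrative parameters; the second value is an assumption chosen to
# show how reserved characters are handled
params = {'name1': 'value1', 'name2': 'value 2&more'}

# urlencode() percent-encodes reserved characters, turns spaces into '+',
# and joins the name=value pairs with '&'
querystring = parse.urlencode(params)
print(querystring)

# parse_qs() reverses the encoding; each value comes back as a list
decoded = parse.parse_qs(querystring)
print(decoded)
```

Because the encoding is reversible, a server receives exactly the values that were placed in the dictionary.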

Sending a simple HTTP POST request to a remote service

# To send the query parameters in the request body with POST, encode them
# and pass them as the optional data argument to urlopen().
from urllib import request, parse

# Base URL being accessed
url = 'http://httpbin.org/post'
# Dictionary of query parameters (if any)
parms = {
    'name1': 'value1',
    'name2': 'value2'
}
# Encode the query string
querystring = parse.urlencode(parms)
# Make a POST request and read the response
u = request.urlopen(url, querystring.encode('ascii'))
resp = u.read()

with open('1.txt', 'wb') as f:
    f.write(resp)
>>>
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name1": "value1", 
    "name2": "value2"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "25", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.5"
  }, 
  "json": null, 
  "origin": "117.136.38.38", 
  "url": "http://httpbin.org/post"
}
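The switch from GET to POST is driven by whether a data argument is supplied. A minimal sketch that inspects Request objects without sending anything (the variable names are illustrative):

```python
from urllib import parse, request

params = {'name1': 'value1', 'name2': 'value2'}
body = parse.urlencode(params).encode('ascii')

# Nothing is sent here; the Request objects are only inspected
get_req = request.Request('http://httpbin.org/get')
post_req = request.Request('http://httpbin.org/post', data=body)

get_method = get_req.get_method()    # no data, so the method is GET
post_method = post_req.get_method()  # data present, so the method is POST
print(get_method, post_method, len(body))
```

The body length printed here matches the Content-Length of 25 reported in the response above.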

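Custom HTTP headers

If you need to supply custom HTTP headers in the outgoing request, for example to change the user-agent field, build a dictionary containing the header values, create a Request instance with it, and pass that instance to urlopen().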

from urllib import request, parse

url = "http://httpbin.org/post"

parms = {
    "name1": 'value1',
    "name2": 'value2'
}

# Extra headers
headers = {
    'User-agent': 'none/ofyourbusiness',
    'Spam': 'Eggs'
}

querystring = parse.urlencode(parms)

req = request.Request(url,
                      querystring.encode('ascii'), headers=headers)

# Make a request and read the response
u = request.urlopen(req)
resp = u.read()

with open("1.txt", 'wb') as file:
    file.write(resp)
>>>
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name1": "value1", 
    "name2": "value2"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "25", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "Spam": "Eggs", 
    "User-Agent": "none/ofyourbusiness"
  }, 
  "json": null, 
  "origin": "117.136.38.38", 
  "url": "http://httpbin.org/post"
}
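To check in code that the custom headers arrived, rather than reading the saved file by eye, the bytes returned by u.read() can be decoded and parsed as JSON. A minimal sketch, using an abbreviated copy of the response body shown above in place of a live request:

```python
import json

# Abbreviated copy of the response body shown above, as the bytes
# that u.read() would return (an assumption made to avoid a live request)
resp = (b'{"form": {"name1": "value1", "name2": "value2"}, '
        b'"headers": {"Spam": "Eggs", "User-Agent": "none/ofyourbusiness"}}')

# Decode the bytes to text, then parse the JSON document
data = json.loads(resp.decode('utf-8'))

print(data['form']['name1'])
print(data['headers']['User-Agent'])
```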

If the service you need to interact with is more complex than the examples above, consider the requests library.

import requests

url = "http://httpbin.org/post"

parms = {
    "name1": 'value1',
    "name2": 'value2'
}

# Extra headers
headers = {
    'User-agent': 'none/ofyourbusiness',
    'Spam': 'Eggs'
}

resp = requests.post(url, data=parms, headers=headers)

# Decoded text returned by the request
# resp.text is the response body decoded to Unicode text.
# resp.content is the raw bytes of the response body.
# resp.json() parses the body as JSON and returns the resulting object.
text = resp.text
with open("1.txt", 'wt') as file:
    file.write(text)
>>>
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name1": "value1", 
    "name2": "value2"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "25", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "Spam": "Eggs", 
    "User-Agent": "none/ofyourbusiness"
  }, 
  "json": null, 
  "origin": "117.136.38.38", 
  "url": "http://httpbin.org/post"
}

Using requests to make a HEAD request and extract a few HTTP header fields from the response:

import requests

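resp = requests.head('https://www.baidu.com/index.php')
status = resp.status_code
last_modified = resp.headers['last-modified']
content_type = resp.headers['content-type']
# content_length = resp.headers['content-length']  # often absent, e.g. when the response uses chunked transfer encoding

# Print the values that appear in the output below
print(resp)
print(status)
print(last_modified)
print(content_type)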

>>>
<Response [200]>
200
Mon, 13 Jun 2016 02:50:08 GMT
text/html

An example of using requests to log in to PyPI with basic authentication:

import requests

resp = requests.get('https://pypi.python.org/pypi?:action=login',
                    auth=('user', 'password'))

print(resp)
print(resp.content)

An example of using requests to pass HTTP cookies from one request to the next:

import requests

url = 'https://pypi.python.org'
# First request
resp1 = requests.get(url)

# Second request, sending the cookies received from the first request
resp2 = requests.get(url, cookies=resp1.cookies)
print(resp2.content)

Uploading content with requests:

import requests

url = 'https://httpbin.org/post'

# Open the file in a with block so it is closed once the upload finishes
with open('data.csv', 'rb') as f:
    files = {'file': ('data.csv', f)}
    r = requests.post(url, files=files)

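http.client

If you decide to stick with the standard library rather than a third-party library such as requests,
you may have to fall back on the lower-level http.client module to implement your code. For example,
the following code shows how to perform a HEAD request: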

from http.client import HTTPConnection

c = HTTPConnection('www.python.org', 80)
c.request('HEAD', '/index.html')
resp = c.getresponse()

print('Status', resp.status)
for name, value in resp.getheaders():
    print(name, value)
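The same HEAD exchange can be rehearsed offline. The sketch below is an illustration under assumptions: HeadHandler is a hypothetical handler written for this example, served by the standard library's http.server on a local port, so that http.client has something to talk to without reaching www.python.org.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeadHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that answers HEAD with headers only."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the operating system pick a free port
server = HTTPServer(('127.0.0.1', 0), HeadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

c = HTTPConnection('127.0.0.1', server.server_port, timeout=5)
c.request('HEAD', '/index.html')
resp = c.getresponse()

status = resp.status
content_type = resp.getheader('Content-Type')
body = resp.read()  # a HEAD response carries headers but no body
c.close()

server.shutdown()
server.server_close()

print('Status', status)
print('Content-Type', content_type)
```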


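Authenticating to the Python Package Index

If you have to write code involving proxies, authentication, cookies, and other such details,
urllib becomes awkward and verbose. For example, the following sample authenticates to the Python Package Index: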

import urllib.request

auth = urllib.request.HTTPBasicAuthHandler()
auth.add_password('pypi', 'https://pypi.python.org', 'username', 'password')
opener = urllib.request.build_opener(auth)

r = urllib.request.Request('https://pypi.python.org/pypi?:action=login')
u = opener.open(r)
resp = u.read()
print(resp)

# From here, you can access more pages using opener
....
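For context on what basic authentication puts on the wire: the credentials travel in an Authorization header as the base64 encoding of "username:password". The sketch below builds that header by hand on a Request object without sending it. It is an illustration, not a replacement for the handler-based code above, and the credentials are placeholders.

```python
import base64
from urllib import request

# Placeholder credentials, as in the example above
username, password = 'username', 'password'

# Basic authentication sends base64("username:password")
raw = '{}:{}'.format(username, password).encode('utf-8')
token = base64.b64encode(raw).decode('ascii')

# Attach the header to a request; nothing is sent in this sketch
req = request.Request('https://pypi.python.org/pypi?:action=login')
req.add_header('Authorization', 'Basic ' + token)

print(req.get_header('Authorization'))
```

Base64 is an encoding, not encryption, so basic authentication should only be sent over HTTPS.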

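HTTP client

Consider using the httpbin service (http://httpbin.org). This site receives the requests you send and echoes information about them back as JSON (cookies, authentication, HTTP headers, encodings, and so on).
Here is an example: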

import requests

r = requests.get('http://httpbin.org/get?name=Dave&n=37', headers={
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/69.0.3497.81 Chrome/69.0.3497.81 Safari/537.36'})
resp = r.json()

print(resp['headers'])
print(resp['args'])

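Before interacting with a real site, it is often a good idea to experiment on a site such as
httpbin.org first. This is especially useful when you face risks such as an account being locked
after three failed login attempts (do not try to write your own HTTP authentication client to log in to your bank account).
Although this section does not cover it, the requests library also supports many more advanced
HTTP client protocols, such as OAuth. The documentation for the requests module (http://docs.python-requests.org) is of high quality.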