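- Most basic HTTP requests can be made with the requests module.
Start with POST and GET requests.
The main differences between POST and GET are:
- Security: a POST request carries its data in the request body (for example as form fields), whereas a GET request carries it in the URL, so GET parameters are exposed in the URL itself. Some browsers also limit the length of a GET URL.
- Idempotence: a POST request may change the state of a resource, whereas a GET request is only meant to read it, so GET is idempotent.
A simple example using requests: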
import requests

ID_USERNAME = 'id_username'
ID_PASSWORD = 'id_password'
USERNAME = 'hexiaodouaipiqiu'
PASSWORD = '*******'
LOGIN_URL = 'https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn'


def submit_form():
    """Submit a form"""
    # requests form-encodes a dict passed as data=, so no manual urlencode is needed
    login_data = {ID_USERNAME: USERNAME, ID_PASSWORD: PASSWORD}
    # Make a GET request
    resp = requests.get(LOGIN_URL)
    print("Response to GET request: %s" % resp.content)
    # Send a POST request with the form data in the body
    resp = requests.post(LOGIN_URL, data=login_data)
    print("Headers from a POST request response: %s" % resp.headers)
    print("Response: %s" % resp)


if __name__ == '__main__':
    submit_form()
As the example shows, a GET request needs only a URL, while a POST request can also carry a body.
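To make the difference concrete, here is a minimal sketch that builds a GET and a POST request without sending anything over the network. It uses the standard library's urllib instead of requests, purely so that it runs offline. The URL and the field values are placeholders assumed for illustration, not real credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and form fields, assumed for illustration only
BASE_URL = 'http://example.com/login'
FIELDS = {'id_username': 'alice', 'id_password': 'secret'}


def build_get(url, fields):
    """GET: the parameters travel in the query string of the URL."""
    return Request(url + '?' + urlencode(fields))


def build_post(url, fields):
    """POST: the parameters travel in the request body as form data."""
    return Request(url, data=urlencode(fields).encode('ascii'))


get_req = build_get(BASE_URL, FIELDS)
post_req = build_post(BASE_URL, FIELDS)

# Nothing is sent; we only inspect what each request would contain
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

In the GET request the password ends up in the URL, where it can land in logs and browser history. In the POST request the URL stays clean and the same data sits in the body.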
- Send a web request through a proxy server. The proxy used here is 165.24.10.8, port 8000.
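import urllib.request

URL = 'https://www.github.com'
PROXY_ADDRESS = '165.24.10.8:8000'

if __name__ == '__main__':
    # URL is https, so the proxy has to be registered for the https scheme too
    proxy_handler = urllib.request.ProxyHandler({"http": PROXY_ADDRESS, "https": PROXY_ADDRESS})
    opener = urllib.request.build_opener(proxy_handler)
    resp = opener.open(URL)
    print("Proxy server returns headers: %s" % resp.headers)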
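Proxy mappings of this kind are keyed by URL scheme, which is easy to overlook when the target URL is https. The helper `proxy_for` below is hypothetical and not part of urllib. It is only a simplified sketch of the lookup, and it ignores environment variables and no-proxy lists.

```python
from urllib.parse import urlparse

PROXY_ADDRESS = '165.24.10.8:8000'


def proxy_for(url, proxies):
    """Return the proxy configured for the URL's scheme, or None for a direct connection."""
    scheme = urlparse(url).scheme
    return proxies.get(scheme)


http_only = {'http': PROXY_ADDRESS}
both = {'http': PROXY_ADDRESS, 'https': PROXY_ADDRESS}

print(proxy_for('http://www.github.com', http_only))
print(proxy_for('https://www.github.com', http_only))
print(proxy_for('https://www.github.com', both))
```

With only an "http" entry, the https URL finds no proxy in this sketch and would be fetched directly, so a mapping meant to cover both kinds of URL needs both keys.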
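- Use a HEAD request to check whether a web page exists
The HEAD method is essentially a lightweight GET: the server answers with the same status line and headers, but sends no response body.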
import http.client
import urllib.parse

DEFAULT_URL = 'http://www.python.org'
HTTP_GOOD_CODES = [http.client.OK, http.client.FOUND, http.client.MOVED_PERMANENTLY]


def get_server_status_code(url):
    """Return the status code of a HEAD request, or None if the request fails"""
    host, path = urllib.parse.urlparse(url)[1:3]
    try:
        conn = http.client.HTTPConnection(host)
        # An empty path is not a valid request target, so fall back to '/'
        conn.request('HEAD', path or '/')
        return conn.getresponse().status
    except (http.client.HTTPException, OSError):
        return None


if __name__ == '__main__':
    url = DEFAULT_URL
    if get_server_status_code(url) in HTTP_GOOD_CODES:
        print("Server is ok, url: %s" % url)
    else:
        print("Server is bad, url: %s" % url)
As the example shows, HEAD can be used to check whether a page is reachable.
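To see side by side that HEAD returns the same headers as GET but no body, here is a small self-contained sketch that runs against a throwaway local server. The handler, the `fetch` and `run_demo` helpers, and the `127.0.0.1` address are assumptions made for this demo, not part of the example above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b'hello'


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same status line and headers as GET, but no body is written
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method, port):
    """Send one request to the local demo server; return (status, Content-Length, body)."""
    conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
    try:
        conn.request(method, '/')
        resp = conn.getresponse()
        return resp.status, resp.getheader('Content-Length'), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the server on a free port, issue GET and HEAD, then shut it down."""
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: let the OS choose
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch('GET', port), fetch('HEAD', port)
    finally:
        server.shutdown()
        server.server_close()


get_result, head_result = run_demo()
print('GET :', get_result)
print('HEAD:', head_result)
```

Both responses report the same status and Content-Length, but only the GET response carries the bytes. That is why HEAD is a cheap way to probe a URL.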
- Compressing resources over HTTP
Write a simple web server that compresses its content in gzip format.
import gzip
import io
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_HOST = '192.168.199.110'
DEFAULT_PORT = 8888
HTML_CONTENT = b"""Compressed string from server!"""


class RequestHandler(BaseHTTPRequestHandler):
    """ Custom request handler"""

    def do_GET(self):
        """ Handler for GET requests"""
        # Compress once, before the headers, so Content-Length matches the body
        zbuf = self.compress_buffer(HTML_CONTENT)
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.send_header('Content-Encoding', 'gzip')
        self.send_header('Content-Length', str(len(zbuf)))
        self.end_headers()
        # Send the compressed body to the browser
        self.wfile.write(zbuf)

    def compress_buffer(self, buf):
        """Return buf (bytes) compressed in gzip format"""
        zbuf = io.BytesIO()
        zfile = gzip.GzipFile(mode='wb', fileobj=zbuf, compresslevel=6)
        zfile.write(buf)
        zfile.close()
        return zbuf.getvalue()


if __name__ == '__main__':
    server = HTTPServer((DEFAULT_HOST, DEFAULT_PORT), RequestHandler)
    server.serve_forever()
When a browser visits the server, it receives the gzip-compressed content and can decompress and read it.
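What the browser does on receiving `Content-Encoding: gzip` can be sketched as a simple round trip. The helper names below are assumptions made for illustration, and the payload is repeated only so that the size reduction is visible on such a short string.

```python
import gzip

# The server's message, repeated so that compression has something to work with
ORIGINAL = b'Compressed string from server!' * 20


def compress_body(data):
    """What the server does before sending: gzip the response body."""
    return gzip.compress(data, compresslevel=6)


def decompress_body(data):
    """What the client does on seeing Content-Encoding: gzip."""
    return gzip.decompress(data)


compressed = compress_body(ORIGINAL)
restored = decompress_body(compressed)

# Sizes before and after compression, and whether the round trip is lossless
print(len(ORIGINAL), len(compressed), restored == ORIGINAL)
```

For very short bodies the gzip header and trailer can make the compressed output larger than the input, so compression pays off mainly on larger or repetitive content.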