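HTTP request methods in Python

An HTTP request goes through the following steps:
1. Domain name resolution
2. The TCP three-way handshake
3. Once the TCP connection is established, the HTTP request is sent
4. The server responds to the HTTP request and the browser receives the HTML
5. The browser parses the HTML and requests the resources it references
6. The browser renders the page and presents it to the user

Simplified: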
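DNS resolution (browser) -> TCP connection (three-way handshake) -> HTTP request (browser) -> Response (server) -> Parse (browser) -> Render (browser) -> TCP teardown (four-way handshake)

Accessing the web over HTTP with a socket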
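    import socket

    host = 'www.sina.com.cn'
    port = 80

    # connect() resolves the host name and performs the TCP three-way handshake
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))

    # Minimal GET request; "Connection: close" asks the server to close the socket when done
    request = 'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n'
    sock.sendall(request.encode())

    # Read until the server closes the connection (recv() then returns b'')
    response = b''
    rec = sock.recv(1024)
    while rec:
        response += rec
        rec = sock.recv(1024)
    sock.close()

    # The page may not be UTF-8, so replace undecodable bytes instead of failing
    print(response.decode(errors='replace'))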
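The socket example above prints the response as one undivided blob. As a rough sketch of what a client does next, the helpers below build the same kind of GET request and split a raw response into status code, headers, and body. The names `build_get_request` and `parse_response` are made up for this illustration, and the parser assumes a plain response: it does not handle chunked transfer encoding or repeated header names.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")

    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(header_lines[0].split(" ", 2)[1])

    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

With these, the bytes collected in the receive loop can be passed to `parse_response` to read the status code or the `content-type` header before decoding the body.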
Accessing HTTPS with a socket
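    import socket
    import ssl

    host = 'dps-precheck-h.camcard.com'
    port = 443

    # ssl.wrap_socket() was removed in Python 3.12; wrap the socket through an SSLContext instead
    context = ssl.create_default_context()
    sock = context.wrap_socket(socket.socket(), server_hostname=host)
    sock.connect((host, port))

    request = 'GET /api/v1/block/block_info?id=dpsv45_9aeb8b0e953711e7af605254003cf65b HTTP/1.1\r\nHost: dps-precheck-h.camcard.com\r\nConnection: close\r\n\r\n'
    sock.sendall(request.encode())

    # Read until the server closes the connection (recv() then returns b'')
    response = b''
    rec = sock.recv(1024)
    while rec:
        response += rec
        rec = sock.recv(1024)
    sock.close()

    print(response.decode(errors='replace'))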
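The HTTPS example differs from the plain HTTP one only in that the TCP socket is wrapped in TLS before the request is sent. Here is one possible way to package that as a reusable function, assuming a modern Python 3 where `ssl.create_default_context()` is the recommended entry point. The function names `make_tls_context` and `https_get` and the timeout value are choices made for this sketch, not part of any library.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Return a client-side TLS context with certificate and hostname checks on."""
    # create_default_context() loads the system CA certificates and enables
    # CERT_REQUIRED plus hostname checking for client connections.
    return ssl.create_default_context()


def https_get(host: str, path: str = "/", port: int = 443, timeout: float = 10.0) -> bytes:
    """Send one GET request over TLS and return the raw response bytes."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    context = make_tls_context()
    # create_connection() handles DNS resolution and the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # server_hostname enables SNI and is the name the certificate is checked against.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:  # the server closed the connection
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `https_get(host, path)` with any reachable HTTPS host returns the same kind of raw bytes as the loop above. Because the default context verifies certificates, a host with an invalid certificate raises `ssl.SSLCertVerificationError` instead of connecting silently.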
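Module examples
Python wraps these steps in ready-made modules that can be called directly.
Taking Python 3 as the example, such modules include urllib, urllib3, httplib (http.client in Python 3), requests, and others.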
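requests
    GET:  Data = requests.get(url)
    POST: Data = requests.post(url, data=data)

urllib
    GET:  f = urllib.request.urlopen(url)
          f.read().decode('utf-8')
    POST: same as GET, with the data to submit passed to urlopen as the data argument

urllib3
    GET:  http = urllib3.PoolManager()
          r = http.request('GET', url, fields={' ': ' '}, headers={})
    POST: the same call with 'GET' changed to 'POST'

httplib2
    GET:  h = httplib2.Http()
          head, content = h.request(url)
    POST: set the request method to POST in the request() call and pass the form to submit

pycurl
    GET:  c = pycurl.Curl()
          c.setopt(c.URL, url)
          b = io.BytesIO()
          c.setopt(c.WRITEFUNCTION, b.write)
          c.perform()
          print(b.getvalue())
    POST: c.setopt(pycurl.POSTFIELDS, urllib.parse.urlencode(data))

Differences between the modules:

Python 3's urllib merges Python 2's urllib and urllib2 modules. This resolves the two main problems of the old pair: Python 2's urllib could not set a custom User-Agent, and urllib2 had no urlencode. The merged module is fully featured.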
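urllib3 is a third-party library, separate from the standard-library urllib, that adds features such as thread safety and connection pooling.
requests is the most widely used of these modules. It is the simplest to use and is full-featured.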
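httplib (named http.client in Python 3) implements the client side of the HTTP and HTTPS protocols. It is usually not used directly; Python's higher-level modules (urllib, urllib2) build on its HTTP implementation. The httplib2 shown in the examples above is a separate third-party library.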
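To make the urllib point in the comparison above concrete, here is a small sketch of a POST request that uses both features mentioned there: a custom User-Agent header and `urlencode` for the form data. The helper name `build_post_request`, the URL, and the User-Agent string are placeholders invented for this illustration. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_post_request(url: str, form: dict, user_agent: str) -> urllib.request.Request:
    """Build a form-encoded POST request with a custom User-Agent."""
    # urlencode() turns the dict into "key=value&key=value"; urlopen() needs bytes.
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"User-Agent": user_agent},
        method="POST",
    )


# Placeholder values, chosen only for the example.
req = build_post_request(
    "http://example.com/submit",
    {"q": "python http", "page": 1},
    "demo-client/0.1",
)
```

Passing `req` to `urllib.request.urlopen(req, timeout=10)` would actually send it. With requests, the equivalent is a single call, `requests.post(url, data=form, headers={"User-Agent": user_agent})`, which is a large part of why it is usually preferred.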