1. Request & Response
Importing
Import the urllib.request library (urllib2 in Python 2).
- Python 2
import urllib2
response = urllib2.urlopen("http://www.baidu.com")
- Python 3
import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")
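If the same script has to run under both interpreters, a common idiom is to try the Python 3 import first and fall back to the Python 2 module. This is only a minimal sketch of that idiom, not something taken from the video:

```python
# Try the Python 3 location first, then fall back to Python 2.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Either way, urlopen is now the same callable name.
print(callable(urlopen))
```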
Reading a URL directly
- 1.0
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
response.read()
At this point the return value is bytes, so it still needs to be decoded.
- 2.0
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
response.read().decode('utf-8')
Note: the URL passed to urlopen must include the scheme (http://); otherwise it raises an error.
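The scheme requirement can be checked without any network access, because urllib rejects the URL while building the request. In the sketch below, ensure_scheme is a hypothetical helper (not part of urllib) that prepends http:// when the scheme is missing:

```python
import urllib.request

def ensure_scheme(url, default="http"):
    """Hypothetical helper: prepend a scheme when the URL has none."""
    if "://" not in url:
        return "{}://{}".format(default, url)
    return url

# A URL without a scheme fails before any connection is attempted.
try:
    urllib.request.Request("www.baidu.com")
    scheme_error = None
except ValueError as exc:
    scheme_error = str(exc)

print(scheme_error)
fixed = ensure_scheme("www.baidu.com")
print(fixed)
```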
- About decode & encode
bytes.decode
Converts bytes to a string using the specified encoding.
str.encode
Converts a string to bytes using the specified encoding.
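A small round trip makes the two directions concrete. The sample string is only an illustration; any text would do:

```python
text = "百度"

# str -> bytes
data = text.encode("utf-8")
print(data)  # b'\xe7\x99\xbe\xe5\xba\xa6'

# bytes -> str, using the same encoding
restored = data.decode("utf-8")
print(restored == text)  # True

# Decoding with the wrong encoding fails (or yields garbage).
try:
    data.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

The encoding passed to decode has to match the one the bytes were produced with, which is why a page that is not UTF-8 cannot be read with decode('utf-8').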
Accessing a URL with a Request object
request = urllib.request.Request('http://www.baidu.com')
response = urllib.request.urlopen(request)
response.read().decode('utf-8')
# Get the response status code
response.status
# Get the response headers
response.getheaders()
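Hard-coding 'utf-8' works for many pages, but the response headers often state the charset. The sketch below reads it from the Content-Type header and falls back to UTF-8. decode_body and fetch_text are hypothetical helper names, and the final lines exercise only the decoding step with a hand-built header object so they run without a network connection:

```python
import urllib.request
from email.message import Message

def decode_body(body, headers, fallback="utf-8"):
    """Decode response bytes using the charset from Content-Type, if any."""
    charset = headers.get_content_charset() or fallback
    return body.decode(charset, errors="replace")

def fetch_text(url, timeout=10):
    """Return (status, text) for a URL. This performs a real request."""
    request = urllib.request.Request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, decode_body(response.read(), response.headers)

# Offline check of the decoding step with a hand-built header object.
sample_headers = Message()
sample_headers["Content-Type"] = "text/html; charset=gbk"
sample_text = decode_body("百度".encode("gbk"), sample_headers)
print(sample_text)
```

With a connection available, fetch_text('http://www.baidu.com') would return the status code together with the decoded page.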
2. Get & Post
# Build the headers: set host, user-agent, accept, connection
header = {
    'host': 'www.baidu.com',
    # …
}
url = 'http://www.baidu.com'
# The keyword argument is named headers
request = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(request)
response.read().decode('utf-8')
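The difference between GET and POST in urllib comes down to whether the Request carries a data argument. The following sketch builds both kinds of request without sending them. The path /s, the parameter name wd, and the User-Agent value are assumptions chosen for illustration only:

```python
import urllib.parse
import urllib.request

url = "http://www.baidu.com"
headers = {"User-Agent": "Mozilla/5.0"}  # illustrative value
params = {"wd": "python"}                # hypothetical parameter

# GET: parameters are urlencoded into the query string.
query = urllib.parse.urlencode(params)
get_request = urllib.request.Request(url + "/s?" + query, headers=headers)

# POST: parameters are urlencoded, converted to bytes, and passed as data.
post_data = urllib.parse.urlencode(params).encode("utf-8")
post_request = urllib.request.Request(url, data=post_data, headers=headers)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Either request object can then be passed to urllib.request.urlopen in the same way as before.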
3. Handler
Free proxies
http://www.xicidaili.com
# Set up a proxy so the request appears to come from another IP
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://219.141.153.3:80'
})
# Build an opener that uses the proxy
opener = urllib.request.build_opener(proxy_handler)
# Visit the site through the proxy
response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))
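Free proxies tend to be short-lived, so it may help to wrap the call with a timeout and error handling. This is a minimal sketch under that assumption. make_proxy_opener and open_via_proxy are hypothetical helper names, and the proxy address is the one from the notes, which may no longer work:

```python
import urllib.request

def make_proxy_opener(proxy_url):
    """Build an opener that sends http:// requests through proxy_url."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)

def open_via_proxy(url, proxy_url, timeout=5):
    """Return the decoded page, or None if the request fails."""
    opener = make_proxy_opener(proxy_url)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:
        # URLError and socket timeouts are both subclasses of OSError.
        print("proxy failed:", exc)
        return None

# Building the opener does not touch the network.
opener = make_proxy_opener("http://219.141.153.3:80")
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(proxy_handlers[0].proxies)
```

Calling open_via_proxy('http://www.baidu.com', 'http://219.141.153.3:80') would then either return the page text or print the failure and return None.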
4. Cookie
import http.cookiejar, urllib.request
# Create the cookie jar
cj = http.cookiejar.CookieJar()
# Create a handler that processes cookies
handler = urllib.request.HTTPCookieProcessor(cj)
opener = urllib.request.build_opener(handler)
# Visit the site
response = opener.open("http://www.baidu.com")
for item in cj:
    print(item.name, ':', item.value)
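To see how the jar gets filled without contacting a server, the sketch below feeds it a hand-built Set-Cookie header through CookieJar.extract_cookies, which is roughly the step HTTPCookieProcessor performs after each response. FakeResponse is a stand-in class written for this example, and the cookie name and value are made up:

```python
import http.cookiejar
import urllib.request
from email.message import Message

class FakeResponse:
    """Stand-in for an HTTP response; only info() is needed by the jar."""
    def __init__(self, set_cookie_values):
        self._headers = Message()
        for value in set_cookie_values:
            self._headers["Set-Cookie"] = value  # appends, does not replace

    def info(self):
        return self._headers

cj = http.cookiejar.CookieJar()
request = urllib.request.Request("http://www.baidu.com/")

# Hypothetical cookie, as if the server had sent it in its response.
cj.extract_cookies(FakeResponse(["sessionid=abc123; Path=/"]), request)

cookies = {item.name: item.value for item in cj}
print(cookies)
```

Collecting the cookies into a dict, as in the last lines, works the same way on a jar that was filled by a real opener.open call.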
END
The above comes from notes taken on this video: https://www.bilibili.com/video/av28236630?from=search&seid=1114191555597217960
See also: http://python.jobbole.com/81332/