add_header(): adds a header to the request
Example:
from urllib import request as sa

url = 'https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/84302896'
r = sa.Request(url)
# Send a browser User-Agent instead of the default Python-urllib one
r.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.6776.400 QQBrowser/10.3.2601.400')
d = sa.urlopen(r).read()  # response body as bytes
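As a variation on the example above, headers can also be passed through the headers argument of Request() instead of calling add_header() afterwards. The sketch below only builds the requests and reads the header back; nothing is sent. The URL and the shortened User-Agent string are placeholders, not values from these notes. urllib stores header names in capitalized form, so the lookup key is 'User-agent'.

```python
from urllib import request

# Placeholder values for illustration only
URL = 'http://example.com/'
UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64)'

# Option 1: add the header after creating the request
r1 = request.Request(URL)
r1.add_header('User-Agent', UA)

# Option 2: pass all headers to the constructor
r2 = request.Request(URL, headers={'User-Agent': UA})

# urllib normalizes the name with str.capitalize(), giving 'User-agent'
print(r1.get_header('User-agent'))
print(r2.header_items())
```

Both forms produce the same request; the constructor form is convenient when several headers are set at once.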
Sending POST data: urlencode() builds the form-encoded string, and encode() converts it to bytes
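Example:
from urllib import request as sa
from urllib import parse as sp

url = 'http://www.iqianyue.com/mypost/'
# Form fields, URL-encoded and then converted to bytes
p = sp.urlencode({
    'name': 111,
    'pass': 222,
}).encode('utf-8')
r = sa.Request(url, p)  # supplying data makes this a POST request
d = sa.urlopen(r).read()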
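To make the two steps visible, the sketch below prints what urlencode() and encode() each produce and confirms that a Request carrying data defaults to POST. It sends nothing over the network. The example.com URL and the extra 'q' field are placeholders added for illustration.

```python
from urllib import parse, request

fields = {'name': 111, 'pass': 222}

# urlencode() joins the fields as key=value pairs separated by '&'
query = parse.urlencode(fields)
# Request() expects the body as bytes, hence encode()
body = query.encode('utf-8')

# Non-ASCII values are percent-encoded as UTF-8 bytes
encoded_cn = parse.urlencode({'q': '中文'})

# Placeholder URL; nothing is sent until urlopen() is called
req = request.Request('http://example.com/mypost/', data=body)

print(query)             # name=111&pass=222
print(body)              # b'name=111&pass=222'
print(encoded_cn)        # q=%E4%B8%AD%E6%96%87
print(req.get_method())  # POST
```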
http://yum.iqianyue.com/proxy: proxy server addresses
Crawling a website through a proxy server
ProxyHandler(): sets the proxy server to use
build_opener(): creates an opener
install_opener(): installs the opener as the global default
Example:
from urllib import request as sa

def up(p, url):
    # Route HTTP requests through the proxy at p (host:port)
    pr = sa.ProxyHandler({'http': p})
    op = sa.build_opener(pr, sa.HTTPHandler)
    sa.install_opener(op)  # urlopen() now uses this opener
    da = sa.urlopen(url).read().decode('utf-8')
    return da

p = '219.234.5.128:3128'
url = 'http://www.baidu.com'
da = up(p, url)
print(da)
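install_opener() affects every later urlopen() call in the process. When that is not wanted, the opener can be used directly through its open() method. The following is a minimal sketch of that approach, with a timeout and error handling added because public proxies are often slow or dead. The function names make_proxy_opener and fetch and the 10-second timeout are assumptions for illustration; the proxy address is the one from the example above and may no longer work.

```python
from urllib import request

def make_proxy_opener(proxy):
    """Return an opener that sends HTTP requests through proxy (host:port)."""
    handler = request.ProxyHandler({'http': proxy})
    return request.build_opener(handler)

def fetch(opener, url, timeout=10):
    """Fetch url with the given opener; return the text, or None on failure."""
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read().decode('utf-8')
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses
        print('request failed:', exc)
        return None

# Building the opener does not contact the proxy yet
opener = make_proxy_opener('219.234.5.128:3128')
```

A call such as fetch(opener, 'http://www.baidu.com') would then go through the proxy, while plain urlopen() calls elsewhere in the program stay unaffected.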
DebugLog settings
HTTPHandler(): pass debuglevel=1
HTTPSHandler(): pass debuglevel=1
build_opener(): creates an opener that uses the settings given to HTTPHandler and HTTPSHandler
install_opener(): installs the opener as the global default
Example:
from urllib import request as sa

ht = sa.HTTPHandler(debuglevel=1)
hs = sa.HTTPSHandler(debuglevel=1)
op = sa.build_opener(ht, hs)
sa.install_opener(op)
da = sa.urlopen("http://edu.51cto.com")
print(da)  # prints the response object; use da.read() for the body
Debug log written to stdout:
send: b'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: edu.51cto.com\r\nUser-Agent: Python-urllib/3.7\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 27 Nov 2018 03:30:42 GMT
header: Content-Type: text/html; charset=UTF-8
header: Transfer-Encoding: chunked
header: Connection: close
header: Set-Cookie: acw_tc=276aedef15432894423986507e64d81a2f2aba60d34ca9de13e960bac343d2;path=/;HttpOnly;Max-Age=2678401
header: Server: nginx
header: Vary: Accept-Encoding
header: Vary: Accept-Encoding
header: X-Powered-By: PHP/7.1.9
header: Set-Cookie: acw_tc=276aedef15432894423986507e64d81a2f2aba60d34ca9de13e960bac343d2;path=/;HttpOnly;Max-Age=2678401
header: Set-Cookie: acw_tc=276aedef15432894423986507e64d81a2f2aba60d34ca9de13e960bac343d2;path=/;HttpOnly;Max-Age=2678401
header: Set-Cookie: acw_tc=276aedef15432894423986507e64d81a2f2aba60d34ca9de13e960bac343d2;path=/;HttpOnly;Max-Age=2678401
header: Load-Balancing: web01
header: Load-Balancing: web01
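The debug log is printed to standard output, so it can be captured as a string when it needs to be inspected or saved. The sketch below is one possible way to do that, under several assumptions: it uses a throwaway local HTTP server instead of the site above so that it runs offline, the names HelloHandler and fetch_with_debug_log are hypothetical, and ProxyHandler({}) is added so that proxy settings in the environment do not interfere with the local request.

```python
import contextlib
import io
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny local server used only to have something to fetch."""
    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the server's own access log quiet

def fetch_with_debug_log(url):
    """Fetch url with debuglevel=1 and return (body, captured debug log)."""
    opener = request.build_opener(
        request.ProxyHandler({}),           # ignore proxies from the environment
        request.HTTPHandler(debuglevel=1),  # print send/reply/header lines
    )
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        with opener.open(url, timeout=5) as resp:
            body = resp.read()
    return body, buf.getvalue()

# Port 0 lets the OS pick a free port
server = HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    body, log = fetch_with_debug_log('http://127.0.0.1:%d/' % server.server_port)
finally:
    server.shutdown()
    server.server_close()

print(log)
```

The captured text has the same send:, reply:, and header: lines as the log above, and the opener is never installed globally, so other requests stay silent.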