# -*- coding: utf-8 -*-
# @Time : 2021/11/25 15:44
# @Author : Harken
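# Why learn handlers?
# urllib.request.urlopen(url) cannot customize request headers.
# urllib.request.Request(url, data=None, headers={}) can customize request headers.
# Handlers allow more advanced customization: dynamic cookies and proxies
# cannot be configured through a Request object alone.
# Example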
import urllib.request
url = 'http://www.baidu.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
request = urllib.request.Request(url=url, headers=headers)
# The three pieces involved: handler, build_opener, open
# 1. Create a handler object
handler = urllib.request.HTTPHandler()
# 2. Build an opener object from the handler
opener = urllib.request.build_opener(handler)
# 3. Call the opener's open method
response = opener.open(request)
content = response.read().decode('utf-8')
print(content)
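
The notes above mention dynamic cookies as something a plain Request object cannot handle. A minimal sketch of how a handler could cover that case, assuming the standard-library `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor`. The helper name `build_cookie_opener` is hypothetical and not part of the original notes.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies from responses and resends them."""
    # The CookieJar keeps cookies in memory for as long as the opener is used.
    cookie_jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor is the handler that reads Set-Cookie headers
    # and attaches matching cookies to later requests.
    handler = urllib.request.HTTPCookieProcessor(cookie_jar)
    opener = urllib.request.build_opener(handler)
    return opener, cookie_jar


opener, cookie_jar = build_cookie_opener()
# No request has been sent yet, so the jar holds no cookies at this point.
print(len(cookie_jar))
```

Calling `opener.open(request)` on this opener works the same way as in the three steps above. Cookies set by one response are sent along with later requests made through the same opener.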
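# Proxies (free proxy list from Kuaidaili: https://www.kuaidaili.com/free/)
# Configuring a proxy in code:
# 1. Create a Request object
# 2. Create a ProxyHandler object
# 3. Build an opener object from the handler
# 4. Send the request with opener.open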
import urllib.request
url = 'http://www.baidu.com/s?wd=ip'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
request = urllib.request.Request(url=url, headers=headers)
proxies = {'http': '113.125.156.47:8888'}
handler = urllib.request.ProxyHandler(proxies=proxies)
opener = urllib.request.build_opener(handler)
response = opener.open(request)
content = response.read().decode('utf-8')
with open('daili.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
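
The proxy steps above can be wrapped in a small helper so that an unreachable proxy does not crash the script. This is only a sketch: the function names `build_proxy_opener` and `fetch_via_proxy` and the 5-second default timeout are assumptions, not part of the original notes.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Build an opener that sends HTTP requests through proxy_address."""
    handler = urllib.request.ProxyHandler({'http': proxy_address})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy_address, timeout=5):
    """Return the decoded page, or None if the request through the proxy fails."""
    opener = build_proxy_opener(proxy_address)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are subclasses of OSError.
        print(f'proxy {proxy_address} failed: {exc}')
        return None
```

A call such as `fetch_via_proxy('http://www.baidu.com/s?wd=ip', '113.125.156.47:8888')` would then return either the page text or `None`, depending on whether that proxy is reachable at the time.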
# Proxy pool
# A simple hand-rolled proxy pool
import random
import urllib.request

# Candidate proxies; one is picked at random on each run
proxies_pool = [
    {'http': '113.125.156.47:8888'},
    {'http': '113.125.156.47:9999'},
]
proxies = random.choice(proxies_pool)
url = 'http://www.baidu.com/s?wd=ip'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
request = urllib.request.Request(url=url, headers=headers)
handler = urllib.request.ProxyHandler(proxies=proxies)
opener = urllib.request.build_opener(handler)
response = opener.open(request)
content = response.read().decode('utf-8')
with open('daili.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
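
Picking a single random proxy means one dead entry fails the whole run. A possible extension of the pool idea is to try the proxies in random order until one works. The sketch below assumes hypothetical helper names (`open_with_proxy`, `fetch_with_pool`) and an injectable `fetch` parameter so the retry logic can be exercised without a network connection.

```python
import random
import urllib.request


def open_with_proxy(url, proxies, timeout=5):
    """Fetch url through one proxy mapping such as {'http': 'host:port'}."""
    handler = urllib.request.ProxyHandler(proxies)
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode('utf-8')


def fetch_with_pool(url, proxies_pool, fetch=open_with_proxy):
    """Try the proxies in random order and return the first successful page."""
    # random.sample returns a shuffled copy, so the caller's list is untouched.
    for proxies in random.sample(proxies_pool, len(proxies_pool)):
        try:
            return fetch(url, proxies)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print(f'{proxies} failed: {exc}')
    raise RuntimeError('every proxy in the pool failed')
```

With the pool defined earlier, `fetch_with_pool(url, proxies_pool)` would return the page from the first proxy that responds and raise `RuntimeError` only when none of them do.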