Proxy Setup
Using a proxy with the urllib library looks like this:
from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Address of the proxy server (host:port)
proxy = "113.116.50.182:808"
# Route both HTTP and HTTPS requests through the proxy
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unavailable")
Output like the following means the proxy was set up successfully:
{"origin": "113.116.50.182, 113.116.50.182"}
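Checking that output by eye can also be done in code. The following is a minimal sketch, assuming the response body has the JSON shape shown above; the helper name proxy_in_effect is hypothetical and is not part of urllib or httpbin.

```python
import json


def proxy_in_effect(body: str, proxy_host: str) -> bool:
    """Return True if every address in the "origin" field is the proxy's IP."""
    origin = json.loads(body)["origin"]
    # The field may hold several comma-separated addresses
    addresses = [part.strip() for part in origin.split(",")]
    return all(addr == proxy_host for addr in addresses)


sample = '{"origin": "113.116.50.182, 113.116.50.182"}'
print(proxy_in_effect(sample, "113.116.50.182"))  # True
```

If any other address shows up in "origin", the helper returns False, which is a hint that the request did not go through the proxy alone.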
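For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, as in "username:password@113.116.50.182:808":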
from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go before the host, separated from it by "@"
proxy = "username:password@113.116.50.182:808"
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unavailable")
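One thing the "username:password@host:port" form does not cover is a password that itself contains characters such as "@" or ":". Percent-encoding is the usual way to carry such characters in the user-info part of a URL. Here is a small sketch of that idea; build_proxy_url is a hypothetical helper name, and the sample password is made up for illustration.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding any credentials."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # safe="" makes quote() escape "/" as well, so "@", ":" and "/"
    # cannot be mistaken for URL delimiters
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


print(build_proxy_url("113.116.50.182", 808, "username", "p@ss:word"))
# http://username:p%40ss%3Aword@113.116.50.182:808
```

The returned string can be used as the value in the dictionary passed to ProxyHandler.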
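If you run into a SOCKS proxy server:
A proxy server that speaks the SOCKS protocol is called a SOCKS server, and it is a general-purpose kind of proxy. SOCKS is a low-level, circuit-level gateway developed by David Koblas in 1990, and it has since been an open standard published through the Internet RFC process. SOCKS does not tie applications to a particular operating system platform. Unlike application-layer and HTTP-layer proxies, a SOCKS proxy simply relays packets without caring which application protocol they carry (for example FTP, HTTP, or NNTP requests). Because it does no protocol-level processing, a SOCKS proxy is usually faster than application-layer proxies.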
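To make "simply relays packets" a little more concrete, here is a sketch of the first bytes a SOCKS5 client sends, following the message layout described in RFC 1928. The function names are hypothetical, and only the no-authentication, domain-name case is covered. Once the proxy has accepted the CONNECT request, whatever the application writes next (HTTP, FTP, or anything else) is passed along unchanged.

```python
import struct


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one auth method offered, 0x00 = no auth."""
    return bytes([0x05, 0x01, 0x00])


def socks5_connect_request(host: str, port: int) -> bytes:
    """CONNECT request addressed by domain name (address type 0x03)."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("domain name too long for SOCKS5")
    # Fields: version, command (1 = CONNECT), reserved, address type, name length
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    # The port is sent as two bytes in network (big-endian) order
    return header + name + struct.pack("!H", port)


print(socks5_greeting().hex())
print(socks5_connect_request("httpbin.org", 80).hex())
```

Nothing in these messages names the application protocol; the proxy only learns the destination host and port.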
The code is set up as follows:
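# The socks module is provided by the PySocks package (pip install PySocks)
import socks
import socket
from urllib import request
from urllib.error import URLError

# Make every new socket connect through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket
try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unavailable")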
Proxy setup with the requests library
import requests

proxy = "113.116.50.182:808"
# Map each URL scheme to the proxy that should handle it
proxies = {
    "http": "http://" + proxy,
    "https": "https://" + proxy,
}
try:
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)
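This is much simpler than configuring a proxy in urllib. For a proxy that requires authentication, proxy = "username:password@113.116.50.182:808" works the same way, so it is not demonstrated again here.
To use a SOCKS5 proxy with the requests library, set it up as follows: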
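import requests
# The socks module is provided by the PySocks package (pip install PySocks)
import socks
import socket

# Make every new socket connect through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket
try:
    response = requests.get("http://httpbin.org/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)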
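Replacing socket.socket affects every connection the process makes. As an alternative, requests can take a SOCKS proxy through the same proxies dictionary used for HTTP proxies, provided the optional SOCKS support is installed (pip install requests[socks]). Below is a sketch of building that dictionary; socks5_proxies is a hypothetical helper name, and the request itself is left as a comment.

```python
def socks5_proxies(host, port, remote_dns=False):
    """Build a requests-style proxies mapping for a SOCKS5 proxy."""
    # "socks5h" asks the proxy to resolve hostnames; "socks5" resolves locally
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


proxies = socks5_proxies("113.116.50.182", 807)
print(proxies)
# Intended use (needs the requests[socks] extra installed):
#     requests.get("http://httpbin.org/ip", proxies=proxies)
```

With this approach only the requests that receive the proxies argument go through the SOCKS server.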
Setting up a proxy in Selenium
Since the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a GUI, is demonstrated here.
from selenium import webdriver

proxy = "113.116.50.182:808"
chromeOptions = webdriver.ChromeOptions()
# Tell Chrome to send its traffic through the HTTP proxy
chromeOptions.add_argument('--proxy-server=http://' + proxy)
# Note: executable_path is the Selenium 3 style; Selenium 4 expects a Service object instead
driver = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe", options=chromeOptions)
driver.get("http://httpbin.org/ip")
print(driver.page_source)
The scraped result is as follows:
{"origin": "113.116.50.182, 113.116.50.182"}
Note: the options object must now be passed through the options keyword argument; the older chrome_options keyword is deprecated.
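The --proxy-server flag also accepts other schemes such as socks5://, and Chrome has a companion flag, --proxy-bypass-list, for hosts that should skip the proxy. A small sketch of assembling these arguments follows; chrome_proxy_args is a hypothetical helper name, and the bypass hosts are only examples.

```python
def chrome_proxy_args(proxy, scheme="http", bypass=()):
    """Return Chrome command-line arguments for routing traffic via a proxy."""
    args = [f"--proxy-server={scheme}://{proxy}"]
    if bypass:
        # Hosts listed here are contacted directly, skipping the proxy
        args.append("--proxy-bypass-list=" + ";".join(bypass))
    return args


for arg in chrome_proxy_args("113.116.50.182:808", bypass=["localhost", "127.0.0.1"]):
    print(arg)  # pass each one to the options object's add_argument(arg)
```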
Using an authenticated proxy in Selenium is a little more involved; the following code can simply be adapted whenever it is needed:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

# Manifest of a small Chrome extension that is allowed to control the proxy
manifest_json = """{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}"""

# Background script: sets the fixed proxy and answers its authentication prompt
background_js = """var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

# Package the two files as an extension and load it into Chrome
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)
browser = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe", options=chrome_options)
browser.get('http://httpbin.org/ip')
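Since this code is meant to be reused, it may help to wrap the extension-building part in a function. The sketch below is one possible way to do that, not the only one: write_proxy_extension is a hypothetical name, the "<all_urls>" match pattern is assumed for the permission and listener filter, and json.dumps is used instead of plain string formatting so that a password containing quotes or backslashes still produces valid JavaScript. Only the zip file is built here; loading it into Chrome works as in the code above.

```python
import json
import os
import tempfile
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
}

BACKGROUND_TEMPLATE = """var config = %(config)s;
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) { return {authCredentials: %(credentials)s}; },
    {urls: ["<all_urls>"]},
    ['blocking']
);"""


def write_proxy_extension(path, ip, port, username, password):
    """Write the proxy-auth extension zip to path and return path."""
    config = {"mode": "fixed_servers",
              "rules": {"singleProxy": {"scheme": "http", "host": ip,
                                        "port": int(port)}}}
    credentials = {"username": username, "password": password}
    # json.dumps escapes quotes and backslashes, so the values stay valid JS
    background_js = BACKGROUND_TEMPLATE % {
        "config": json.dumps(config),
        "credentials": json.dumps(credentials),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=2))
        zp.writestr("background.js", background_js)
    return path


# The password below is a made-up value containing a quote character
plugin = write_proxy_extension(
    os.path.join(tempfile.mkdtemp(), "proxy_auth_plugin.zip"),
    "113.116.50.182", 808, "xxxx", 'pa"ss')
with zipfile.ZipFile(plugin) as zp:
    print(zp.namelist())  # ['manifest.json', 'background.js']
```

The returned path can then be handed to add_extension on the Chrome options object.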