Python crawlers: using proxies

Setting up a proxy

To use a proxy with the urllib library, the code is as follows:

from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

proxy = "113.116.50.182:808"
# Map each URL scheme to the proxy that should handle it
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable")

If the output looks like the following, the proxy has been set up successfully:

{"origin": "113.116.50.182, 113.116.50.182"}
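The urllib setup above can be wrapped in a small helper so that the same opener is easy to rebuild for a different address. This is only a minimal sketch: the name make_proxy_opener is hypothetical (it is not part of urllib), and the address is the same example address used above, which may not be reachable.

```python
from urllib.request import ProxyHandler, build_opener, install_opener


def make_proxy_opener(proxy):
    """Return an opener that sends HTTP and HTTPS requests through `proxy`.

    `proxy` is a "host:port" string. make_proxy_opener is a hypothetical
    helper name used only for this illustration.
    """
    handler = ProxyHandler({
        "http": "http://" + proxy,
        "https": "https://" + proxy,
    })
    return build_opener(handler)


opener = make_proxy_opener("113.116.50.182:808")

# Optional: make this opener the default used by urllib.request.urlopen().
# install_opener(opener)
```

Calling install_opener is left commented out because it changes behaviour for the whole process; opener.open(url) keeps the proxy local to this one opener.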

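For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, as in "username:password@113.116.50.182:808".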

from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go in front of the host, separated by "@"
proxy = "username:password@113.116.50.182:808"
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable")
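One practical detail with the "username:password@host:port" form: a password that itself contains "@" or ":" makes the string ambiguous. A common workaround is to percent-encode the credentials first. The sketch below assumes the HTTP client decodes them again before sending (urllib and requests are expected to do so); proxy_url is a hypothetical helper name, not a library function.

```python
from urllib.parse import quote


def proxy_url(host, port, username, password, scheme="http"):
    """Build "scheme://user:pass@host:port" with percent-encoded credentials.

    proxy_url is a hypothetical helper used only for illustration.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# A password containing "@" and ":" no longer confuses the URL parser.
print(proxy_url("113.116.50.182", 808, "username", "p@ss:word"))
```

The returned string can be used directly as a value in the dictionary passed to ProxyHandler.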

If you are dealing with a SOCKS proxy server:

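A proxy server that speaks the SOCKS protocol is called a SOCKS server; it is a general-purpose proxy server. SOCKS is a low-level, circuit-level gateway developed by David Koblas in 1990, and it has since been an open standard published as Internet RFCs. SOCKS does not tie applications to a particular operating system platform. Unlike application-layer proxies such as HTTP proxies, a SOCKS proxy simply relays packets and does not care which application protocol they carry (for example FTP, HTTP, or NNTP requests). For this reason, a SOCKS proxy is usually faster than an application-layer proxy.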
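To make "does not care which application protocol" concrete, the sketch below builds the first two messages a SOCKS5 client sends, following the layout in RFC 1928. It is an illustration only, not how any particular library is implemented; the function names are hypothetical. Note that the request names a host and a port and nothing else, so the proxy never learns whether HTTP, FTP, or something else follows.

```python
import struct


def socks5_greeting():
    """Client greeting: version 5, one auth method offered, 0x00 = no auth."""
    return bytes([0x05, 0x01, 0x00])


def socks5_connect_request(host, port):
    """CONNECT request for a domain-name target (address type 0x03)."""
    name = host.encode("idna")
    return (
        # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), name length
        bytes([0x05, 0x01, 0x00, 0x03, len(name)])
        + name
        + struct.pack("!H", port)  # destination port, big-endian
    )


print(socks5_connect_request("httpbin.org", 80).hex())
```

After the proxy accepts this request, every later byte on the connection is relayed unchanged in both directions.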

The setup code is as follows:

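import socks  # provided by the PySocks package
import socket
from urllib import request
from urllib.error import URLError

socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
# Replace the standard socket class so every new connection goes through the proxy
socket.socket = socks.socksocket
try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable")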
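Assigning to socket.socket changes every connection the process opens afterwards, not just the one request. Where that is too broad, the replacement can be limited to a block of code. The following is a minimal sketch of that idea; patched_socket is a hypothetical helper, not part of PySocks or the standard library.

```python
import socket
from contextlib import contextmanager


@contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket, restoring the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        socket.socket = original


# Usage sketch (requires PySocks; not executed here):
#
#   import socks
#   from urllib import request
#   socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
#   with patched_socket(socks.socksocket):
#       response = request.urlopen("http://httpbin.org/ip")
```

The try/finally ensures the original class is restored even if the request raises.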

Proxy settings in the requests library

import requests

proxy = "113.116.50.182:808"
proxies = {
    "http": "http://" + proxy,
    "https": "https://" + proxy,
}
try:
    # Pass the proxy mapping per request
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)

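This is much simpler than setting up a proxy in urllib. For a proxy that requires authentication, likewise use proxy = "username:password@113.116.50.182:808"; it is not demonstrated again here.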

To use a SOCKS5 proxy with the requests library, set it up as follows:

import requests
import socks  # provided by the PySocks package
import socket

socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
# Replace the standard socket class so every new connection goes through the proxy
socket.socket = socks.socksocket
try:
    response = requests.get("http://httpbin.org/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)
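As an alternative to patching socket.socket, requests also accepts SOCKS URLs in the proxies dictionary when its optional SOCKS support is installed (pip install "requests[socks]"). The sketch below is not from the original post; socks5_proxies and fetch_origin are hypothetical helper names, and the address is the example one used above.

```python
def socks5_proxies(host, port, remote_dns=False):
    """Build a requests-style proxies dict for a SOCKS5 proxy.

    With the "socks5h" scheme the proxy resolves host names;
    with "socks5" they are resolved locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


def fetch_origin(proxies):
    """Fetch httpbin.org/ip through the given proxies (needs requests[socks])."""
    import requests  # imported lazily so the helper above has no dependency

    return requests.get("http://httpbin.org/ip", proxies=proxies, timeout=10).text


proxies = socks5_proxies("113.116.50.182", 807)
# print(fetch_origin(proxies))  # uncomment to try against a live proxy
```

This keeps the proxy scoped to the requests that receive the dictionary, rather than to the whole process.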

Setting a proxy in Selenium

Since the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a graphical interface, is demonstrated here.

from selenium import webdriver

proxy = "113.116.50.182:808"
chromeOptions = webdriver.ChromeOptions()
# Tell Chrome to send its traffic through the proxy
chromeOptions.add_argument('--proxy-server=http://' + proxy)
# executable_path is the Selenium 3 way of pointing at chromedriver
driver = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe", options=chromeOptions)
driver.get("http://httpbin.org/ip")
print(driver.page_source)

The result is as follows:

{"origin": "113.116.50.182, 113.116.50.182"}

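Note: the older chrome_options keyword argument of webdriver.Chrome has been replaced; pass the options object through options= instead, as in the code above.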
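On Selenium 4 the executable_path argument was likewise replaced, this time by a Service object. The following sketch shows one way the same proxy setup might look there; it assumes Selenium 4 is installed, and chrome_proxy_argument and make_chrome are hypothetical helper names introduced only for this illustration.

```python
def chrome_proxy_argument(proxy, scheme="http"):
    """Return the Chrome command-line switch for a "host:port" proxy."""
    return f"--proxy-server={scheme}://{proxy}"


def make_chrome(proxy, driver_path):
    """Start Chrome through `proxy` using the Selenium 4 Service API."""
    # Imported lazily so chrome_proxy_argument works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(proxy))
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


print(chrome_proxy_argument("113.116.50.182:808"))
```

Passing scheme="socks5" produces the corresponding switch for a SOCKS5 proxy.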

Using an authenticated proxy in Selenium is somewhat more involved; the following code can simply be adapted whenever it is needed.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

# Manifest of a small Chrome extension that configures the proxy.
# Note: Manifest V2 extensions are being phased out in recent Chrome versions.
manifest_json = """{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}"""

# Background script: set a fixed proxy and answer its authentication challenge
background_js = """var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

# Package the two files as an extension archive
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)
browser = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe", options=chrome_options)
browser.get('http://httpbin.org/ip')
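The % formatting used for background.js works for simple credentials but produces broken JavaScript if the username or password contains a double quote or a backslash. One possible refinement, shown here only as a sketch, is to let json.dumps do the quoting, since a JSON object is also a valid JavaScript object literal. render_auth_listener is a hypothetical helper name, not part of Selenium or Chrome.

```python
import json


def render_auth_listener(username, password):
    """Return the onAuthRequired listener with credentials safely quoted."""
    # json.dumps escapes quotes and backslashes inside the values.
    credentials = json.dumps({"username": username, "password": password})
    return (
        "chrome.webRequest.onAuthRequired.addListener(\n"
        "  function(details) { return {authCredentials: %s}; },\n"
        '  {urls: ["<all_urls>"]},\n'
        "  ['blocking']\n"
        ");" % credentials
    )


print(render_auth_listener("xxxx", 'pa"ss'))
```

The returned text could stand in for the listener portion of background.js before the archive is written.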
