Python: Using Proxy IPs for Web Scraping

这般女子

Category: Web scraping. Tags: python, web scraping, selenium

Article link: https://blog.csdn.net/xiaoxiaojie521/article/details/111559804

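When scraping, a site may stop responding after many requests, most likely because it has banned your IP address. Routing requests through a proxy IP works around this. Many sites offer free proxy IPs, such as 西拉免费代理IP and 快代理, but free proxies are unstable: one that works now may fail on the next test, so for production work it is worth buying stable proxy IPs. This article shows how to scrape a site through a proxy IP with requests and with selenium. Each tool takes the proxy setting through a different parameter; all examples below were tested by the author.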
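
Because free proxies fail so often, it can help to test a list of candidates and keep the first one that responds. The following is a minimal sketch of that idea, not a definitive implementation: the names `proxy_works` and `first_working_proxy` are hypothetical helpers introduced here, the candidates are assumed to be plain HTTP proxies in 'host:port' form, and the 5-second timeout is an arbitrary choice.

```python
TEST_URL = 'http://httpbin.org/get'  # the same test endpoint used in this article


def proxy_works(proxy, timeout=5):
    """Return True if a GET routed through `proxy` ('host:port') succeeds."""
    # Imported lazily so the selection logic below can be used without requests.
    import requests

    # Assumption: a plain HTTP proxy, so both entries use the http:// scheme.
    proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}
    try:
        response = requests.get(TEST_URL, proxies=proxies, timeout=timeout)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Covers refused connections, proxy errors, and timeouts alike.
        return False


def first_working_proxy(candidates, check=proxy_works):
    """Return the first candidate that passes `check`, or None if all fail."""
    for proxy in candidates:
        if check(proxy):
            return proxy
    return None
```

Calling `first_working_proxy(['186.226.174.193:80', '20.80.89.20:80'])` would try each address in order. The `check` parameter exists only so the selection logic can be exercised without network access.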

1 Using requests

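import requests

url = 'http://httpbin.org/get'
proxy = '186.226.174.193:80'

# Proxy settings: each key is the scheme of the target URL, each value is the
# proxy's own URL. A plain HTTP proxy uses http:// for both entries.
proxies = {
    'http': 'http://' + proxy,
    'https': 'http://' + proxy,
}

try:
    # The timeout keeps a dead proxy from hanging the request indefinitely
    response = requests.get(url, proxies=proxies, timeout=10)
    print(response.text)
except requests.exceptions.RequestException as e:
    print('Error', e.args)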
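
Paid proxies usually require a username and password, and a crawler normally sends many requests through the same proxy. The sketch below shows one way to handle both, assuming the provider accepts credentials embedded in the proxy URL (requests supports the `http://user:password@host:port` form). `build_proxies` and `make_session` are hypothetical helper names, and the credentials in the usage note are placeholders.

```python
from urllib.parse import quote


def build_proxies(address, username=None, password=None):
    """Build a requests-style proxies dict for an HTTP proxy at 'host:port'."""
    if username is not None and password is not None:
        # Percent-encode the credentials so characters such as '@' or ':'
        # cannot be mistaken for URL delimiters.
        auth = quote(username, safe='') + ':' + quote(password, safe='') + '@'
    else:
        auth = ''
    proxy_url = 'http://' + auth + address
    return {'http': proxy_url, 'https': proxy_url}


def make_session(address, username=None, password=None):
    """Return a requests.Session that routes its requests through the proxy."""
    import requests  # imported lazily; build_proxies itself needs only the stdlib

    session = requests.Session()
    session.proxies.update(build_proxies(address, username, password))
    return session
```

For example, `make_session('186.226.174.193:80', 'user', 'secret').get('http://httpbin.org/get', timeout=10)` reuses one connection pool and one proxy configuration across calls.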

2 Using selenium

2.1 Chromedriver

from selenium import webdriver

url = 'http://httpbin.org/get'
# Proxy settings
ip = '20.80.89.20:80'
option = webdriver.ChromeOptions()
option.add_argument('--proxy-server=' + ip)
browser = webdriver.Chrome(options=option)

# Fetch the page through the proxy
browser.get(url)
html = browser.page_source
print(html)
# Close the browser and stop the driver process
browser.quit()
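
To confirm that the browser really went through the proxy, the address that httpbin saw can be read back from the page source. httpbin.org/get returns JSON containing an `origin` field; the sketch below assumes the browser embeds that JSON verbatim in the HTML it renders. `extract_origin` is a hypothetical helper name.

```python
import re


def extract_origin(page_source):
    """Return the value of httpbin's "origin" field, or None if it is absent."""
    match = re.search(r'"origin"\s*:\s*"([^"]+)"', page_source)
    return match.group(1) if match else None
```

With the Chrome example, `extract_origin(html)` should report the proxy's address rather than your own if the proxy is in effect. Depending on the proxy, the field may hold more than one comma-separated address.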

2.2 PhantomJS

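# Note: the PhantomJS driver was removed in Selenium 4, so this example needs
# a Selenium 3 release and a PhantomJS binary on the PATH.
from selenium.webdriver import PhantomJS

url = 'http://httpbin.org/get'
# Proxy settings are passed to the PhantomJS process as command-line arguments
service_args = [
    '--proxy=20.80.89.20:80',
    '--proxy-type=http',
]
browser = PhantomJS(service_args=service_args)
# Fetch the page through the proxy
browser.get(url)
html = browser.page_source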
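
Selenium 4 no longer ships a PhantomJS driver, so on a current installation a headless Chrome can play the same role. This is a swapped-in technique rather than the PhantomJS method shown above. The sketch assumes Chrome and a matching driver are installed; `chrome_arguments` and `make_headless_chrome` are hypothetical helper names.

```python
def chrome_arguments(proxy, headless=True):
    """Return the Chrome command-line arguments for a 'host:port' proxy."""
    args = ['--proxy-server=' + proxy]
    if headless:
        # Run without a visible window, as PhantomJS did.
        args.append('--headless')
    return args


def make_headless_chrome(proxy):
    """Start a headless Chrome whose traffic goes through `proxy`."""
    from selenium import webdriver  # imported lazily; requires selenium

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(proxy):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Usage mirrors the other examples: `browser = make_headless_chrome('20.80.89.20:80')`, then `browser.get(url)` and `browser.page_source`, followed by `browser.quit()`.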

2.3 Firefox

I have not yet found a simpler way to configure a proxy for Firefox; if you know one, please leave a comment.

from selenium import webdriver
from selenium.webdriver import FirefoxOptions

# Preferences are set on FirefoxOptions; passing a FirefoxProfile as the first
# positional argument of webdriver.Firefox no longer works in Selenium 4.
options = FirefoxOptions()

# Enable manual proxy configuration
options.set_preference("network.proxy.type", 1)
# Use this IP and port as the proxy for the http protocol
options.set_preference("network.proxy.http", "157.7.199.56")
options.set_preference("network.proxy.http_port", 1080)

# Share one IP and port across all protocols. Not needed when each protocol
# is configured separately, because the default is False.
# options.set_preference("network.proxy.share_proxy_settings", True)

driver = webdriver.Firefox(options=options)
url = 'http://httpbin.org/get'
driver.get(url=url)
print(driver.page_source)
# Close the browser and stop the driver process
driver.quit()
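
The Firefox example sets only the http proxy preferences, so requests to https:// pages would bypass the proxy. The sketch below collects the preferences for both protocols from a single 'host:port' string, assuming the same proxy serves both. `firefox_proxy_preferences` and `make_firefox` are hypothetical helper names; the preference keys are Firefox's own.

```python
def firefox_proxy_preferences(proxy):
    """Map a 'host:port' proxy to Firefox manual-proxy preferences."""
    host, _, port = proxy.rpartition(':')
    port = int(port)  # Firefox expects the port preferences to be integers
    return {
        'network.proxy.type': 1,  # 1 = manual proxy configuration
        'network.proxy.http': host,
        'network.proxy.http_port': port,
        'network.proxy.ssl': host,  # proxy used for https:// pages
        'network.proxy.ssl_port': port,
    }


def make_firefox(proxy):
    """Start Firefox with the proxy preferences applied."""
    from selenium import webdriver  # imported lazily; requires selenium
    from selenium.webdriver import FirefoxOptions

    options = FirefoxOptions()
    for name, value in firefox_proxy_preferences(proxy).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

For example, `make_firefox('157.7.199.56:1080')` returns a driver configured like the one above, with https traffic proxied as well.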

References

《Python3 网络爬虫开发实战》
Python | Firefox IP proxy
