Scrapy proxy IPs

Latest recommended article published on 2024-08-16 19:12:04

不愿透露姓名的菜鸟



Category: Crawler Development Study

Article link: https://blog.csdn.net/Homewm/article/details/77477399


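This article describes two ways to set proxy IPs in Scrapy. The first fetches proxy IPs from the Xici (西刺) website, stores them in a database, and reads them from a middleware. The second adds proxy IPs directly in the downloader, for example through a proxy_ips.py file. Both approaches can help improve the crawler's anonymity and reduce the risk of being banned by the target site.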

Summary generated by CSDN using intelligent technology.

Method 1:

First, fetch proxy IPs from a site such as Xici (西刺) and store them in a database.
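The article does not show the storage step, so the following is only a minimal sketch of what it might look like. It assumes SQLite as the database and uses a hypothetical `ProxyStore` class. It is not the author's `proxy` module or its `GetIp` class. The returned shape (a dict keyed by scheme, holding `(host, port)` pairs) is inferred from how the middleware indexes `ips`. The addresses are placeholders from the documentation range, not working proxies.

```python
import sqlite3


class ProxyStore:
    """Hypothetical helper: keeps proxies in SQLite, grouped by scheme."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS proxies ("
            "scheme TEXT NOT NULL, host TEXT NOT NULL, port INTEGER NOT NULL, "
            "UNIQUE (scheme, host, port))"
        )

    def add(self, scheme, host, port):
        # INSERT OR IGNORE skips rows that are already stored.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO proxies (scheme, host, port) "
                "VALUES (?, ?, ?)",
                (scheme.lower(), host, int(port)),
            )

    def get_ips(self):
        # Assumed shape: {'http': [(host, port), ...], 'https': [...]}
        ips = {"http": [], "https": []}
        rows = self.conn.execute(
            "SELECT scheme, host, port FROM proxies ORDER BY rowid"
        )
        for scheme, host, port in rows:
            ips.setdefault(scheme, []).append((host, port))
        return ips


store = ProxyStore()
store.add("HTTP", "192.0.2.10", 8080)     # placeholder address
store.add("https", "192.0.2.11", "3128")  # port given as a string
store.add("http", "192.0.2.10", 8080)     # duplicate, ignored
proxies = store.get_ips()
```

Loading the list once and keeping it in memory, as the middleware does with `ips`, avoids a database query on every request. The trade-off is that proxies added later are not picked up until the list is reloaded.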

Then set the proxy IP in the spider project's middlewares.py:

# -*- coding: utf-8 -*-
# base64 is needed ONLY if the proxy we are going to use
# requires authentication
import base64
import logging

from proxy import GetIp, counter

# Load the proxy list once, when the module is imported
ips = GetIp().get_ips()

class ProxyMiddleware(object):
    http_n = 0   # counter for http requests
    https_n = 0  # counter for https requests

    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        if request.url.startswith("http://"):
            n = ProxyMiddleware.http_n
            # Wrap around to the first proxy when the counter runs past the list
            n = n if n < len(ips['http']) else 0
            request.meta['proxy'] = "http://%s:%d" % (
                ips['http'][n][0], int(ips['http'][n][1]))
            logging.info(