[Scrapy crawler] 07. Using a proxy in Scrapy (with detailed step-by-step comments)

发现你走远了

Published 2024-03-18 10:30:25

Article link: https://blog.csdn.net/u011027547/article/details/136561713

Adding a proxy in start_requests

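Set the proxy in the start_requests method of the corresponding spider. Note that start_requests is a method of the spider class (in the spider file under spiders/), not of pipelines.py. The proxy is passed through the request's meta: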

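    # Requires `from scrapy import Request` at the top of the spider file.
    def start_requests(self):
        for page in range(10):  # 10 pages, 25 entries per page
            yield Request(
                url=f'https://movie.douban.com/top250?start={page*25}&filter=',
                # SOCKS5 proxy. The standard scheme name is "socks5", not "socket5".
                # Scrapy's built-in proxy support is documented for HTTP proxies,
                # so verify that your SOCKS endpoint actually works.
                meta={'proxy': "socks5://127.0.0.1:1086"},
                # Purchased commercial proxies are usually HTTP and come from an API endpoint:
                # meta={'proxy': "http://127.0.0.1:1086"},
            )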
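Since a mistyped scheme such as "socket5" is easy to miss, a small check before building the request can help. The following is a minimal sketch, not part of Scrapy: the helper names `proxy_meta` and `top250_urls` and the `ALLOWED_SCHEMES` whitelist are assumptions made up for this example, and it uses only the standard library.

```python
from urllib.parse import urlsplit

# Assumed whitelist for this sketch; adjust it to the proxy types you actually use.
ALLOWED_SCHEMES = {"http", "https", "socks5"}


def proxy_meta(proxy_url):
    """Hypothetical helper: return {'proxy': proxy_url} after a basic sanity check."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    return {"proxy": proxy_url}


def top250_urls(pages=10, page_size=25):
    """Yield the paginated Douban Top 250 URLs used in start_requests."""
    for page in range(pages):
        yield f"https://movie.douban.com/top250?start={page * page_size}&filter="
```

Inside start_requests this could be used as `yield Request(url=url, meta=proxy_meta("http://127.0.0.1:1086"))` for each `url` in `top250_urls()`, so that a malformed proxy address fails immediately instead of at download time.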

Adding a proxy in a middleware

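In middlewares.py, edit the process_request method of MyscrapyDownloaderMiddleware. The middleware only takes effect if it is enabled under DOWNLOADER_MIDDLEWARES in settings.py.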

class MyscrapyDownloaderMiddleware:
    # ------ other methods omitted ------
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called

        # Intercept the request before it is sent and attach the proxy.
        # Set the key rather than reassigning request.meta, so that other
        # meta entries already on the request are preserved.
        request.meta['proxy'] = "socks5://127.0.0.1:1086"
        return None
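When the provider hands out several addresses, the same hook can pick one per request. Below is a minimal sketch under stated assumptions: the class name `RandomProxyMiddleware`, the `PROXY_POOL` list, and the second address on port 1087 are hypothetical and exist only for illustration. A downloader middleware is a plain class, so the sketch needs nothing beyond the standard library.

```python
import random

# Hypothetical proxy pool; replace it with the addresses returned by your provider.
PROXY_POOL = [
    "http://127.0.0.1:1086",
    "http://127.0.0.1:1087",
]


class RandomProxyMiddleware:
    """Downloader middleware sketch: assign a random proxy to each request."""

    def __init__(self, proxies=None, rng=None):
        self.proxies = list(proxies or PROXY_POOL)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Respect a proxy that the spider already set on this request.
        if "proxy" not in request.meta:
            request.meta["proxy"] = self.rng.choice(self.proxies)
        return None  # let Scrapy continue processing the request
```

To try it, the class would need an entry in `DOWNLOADER_MIDDLEWARES` in settings.py, for example `{'myscrapy.middlewares.RandomProxyMiddleware': 543}`, where the module path `myscrapy` is an assumption about the project name. Requests that already carry a proxy from start_requests are left untouched.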