Installation:
pip install scrapy_proxies
GitHub: https://github.com/aivarsk/scrapy-proxies

In your Scrapy project's configuration file settings.py, add:
# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Path to the file holding the proxy list, with entries like
# http://host1:port
# http://username:password@host2:port
# http://host3:port
PROXY_LIST = '/path/to/proxy/list.txt'

# Proxy mode:
# 0 = Every request uses a different (randomly chosen) proxy
# 1 = Take only one proxy from the list and assign it to every request
# 2 = Use a custom proxy specified in the settings
PROXY_MODE = 0

# If you use mode 2, uncomment the line below:
#CUSTOM_PROXY = "http://host1:port"

Usage:
Save the proxy IP list you scraped earlier with Python to a location where PROXY_LIST can find it. Of the PROXY_MODE options, 0 is probably the most commonly used; if one particular IP is especially stable, use mode 2.
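To make the three PROXY_MODE values concrete, here is a minimal sketch of their semantics. This is a hypothetical helper written for illustration, not the library's actual middleware code; the class name `ProxyPicker` and its methods are assumptions, and the real `scrapy_proxies.RandomProxy` reads its configuration from Scrapy settings and sets `request.meta['proxy']` for you.

```python
import random

class ProxyPicker:
    """Illustrative sketch of PROXY_MODE semantics (not the library's API)."""

    def __init__(self, proxies, mode=0, custom_proxy=None):
        self.proxies = list(proxies)
        self.mode = mode
        self.custom_proxy = custom_proxy
        # Mode 1: choose one proxy up front and reuse it for every request.
        self.fixed = random.choice(self.proxies) if mode == 1 else None

    def proxy_for_request(self):
        if self.mode == 0:
            # Mode 0: a (potentially) different random proxy per request.
            return random.choice(self.proxies)
        if self.mode == 1:
            # Mode 1: the same randomly chosen proxy every time.
            return self.fixed
        # Mode 2: always use the custom proxy from the settings.
        return self.custom_proxy
```

For example, with mode 2 every call returns the configured custom proxy, mirroring the CUSTOM_PROXY setting above.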