Dynamically setting the User-Agent in Scrapy
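1. Add the following to middlewares.py:

```python
import random


class RandomUserAgent(object):
    """Downloader middleware that picks a random User-Agent for each request."""

    def __init__(self, agents):
        self.agents = agents

    @classmethod
    def from_crawler(cls, crawler):
        # Read the candidate User-Agent strings from the USER_AGENTS setting
        return cls(crawler.settings.getlist('USER_AGENTS'))

    def process_request(self, request, spider):
        # Assign directly: setdefault would keep the default User-Agent that
        # Scrapy's built-in UserAgentMiddleware (order 500) has already set
        request.headers['User-Agent'] = random.choice(self.agents)
```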
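2. In settings.py, modify DOWNLOADER_MIDDLEWARES:

```python
DOWNLOADER_MIDDLEWARES = {
    # Replace xxxxx with your project's package name
    'xxxxx.middlewares.RandomUserAgent': 544,
    # Disable the built-in middleware so its default User-Agent is not set first
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```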
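Scrapy merges DOWNLOADER_MIDDLEWARES with its built-in DOWNLOADER_MIDDLEWARES_BASE, drops entries set to None, and calls process_request in ascending order, so the built-in UserAgentMiddleware (500) runs before a custom middleware at 544. The sketch below is a simplified, stdlib-only model of that ordering, not Scrapy's own code. It assumes both middlewares use `headers.setdefault(...)` and shows which User-Agent survives. `BASE` is a deliberately partial excerpt, and `ordered_middlewares`, `final_user_agent`, and the `DEFAULT_UA` placeholder are hypothetical names for illustration.

```python
import random

# Partial excerpt of Scrapy's DOWNLOADER_MIDDLEWARES_BASE: only the built-in
# User-Agent middleware and its order (500) are modeled here.
BUILTIN_UA = "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware"
BASE = {BUILTIN_UA: 500}
CUSTOM_UA = "xxxxx.middlewares.RandomUserAgent"
DEFAULT_UA = "Scrapy/VERSION (+https://scrapy.org)"  # placeholder default USER_AGENT


def ordered_middlewares(custom):
    """Merge project entries over BASE, drop those set to None, sort ascending."""
    merged = {**BASE, **custom}
    enabled = {name: order for name, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


def final_user_agent(custom, agents, seed=0):
    """Run a setdefault-based process_request chain; return the surviving header."""
    rng = random.Random(seed)
    headers = {}
    for name in ordered_middlewares(custom):
        if name == BUILTIN_UA:
            headers.setdefault("User-Agent", DEFAULT_UA)
        elif name == CUSTOM_UA:
            headers.setdefault("User-Agent", rng.choice(agents))
    return headers.get("User-Agent")


if __name__ == "__main__":
    agents = ["UA-A", "UA-B"]
    only_custom = {CUSTOM_UA: 544}
    print(final_user_agent(only_custom, agents))  # Scrapy/VERSION (+https://scrapy.org)
    print(final_user_agent({**only_custom, BUILTIN_UA: None}, agents))  # UA-A or UA-B
```

With the built-in entry left enabled, the random choice never takes effect. There are two ways to fix this. You can set the built-in entry to None, or you can assign the header directly instead of calling setdefault.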
3. Also add the following to settings.py:

```python
USER_AGENTS = [
    "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; Nexus S Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Avant Browser/1.2.789rel1 (http://www.avantbrowser.com)",
    # ... more User-Agent strings
]
```
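The middleware reads this setting with `settings.getlist('USER_AGENTS')`. Scrapy documents that getlist returns a copy when the value is a list but splits it on "," when the value is a string. User-Agent strings often contain commas (for example "KHTML, like Gecko"), so USER_AGENTS should stay a Python list. A string passed with `scrapy crawl ... -s USER_AGENTS=...` would be split apart. Below is a stdlib-only sketch of that documented behavior. `getlist_model` is a hypothetical simplification, not Scrapy's implementation.

```python
def getlist_model(value):
    """Simplified model of Scrapy's Settings.getlist: strings are split on ','."""
    if isinstance(value, str):
        return value.split(",")
    return list(value)


UA = ("Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; Nexus S Build/GRK39F) "
      "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1")

if __name__ == "__main__":
    # Defined as a list in settings.py: the User-Agent string stays intact
    print(len(getlist_model([UA])))  # 1
    # Supplied as a plain string: the comma in "(KHTML, like Gecko)" splits it
    print(len(getlist_model(UA)))  # 2
```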

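This article shows how to set the User-Agent dynamically in Scrapy. Create a RandomUserAgent class in middlewares.py and register it under DOWNLOADER_MIDDLEWARES in settings.py. Then supply a list of User-Agent strings in settings.py for the middleware to choose from at random.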