I'm playing with Scrapy, and now I'm trying to search for different keywords by passing an argument from the command-line tool. Basically, I want to define a keyword and have the spider crawl only the URLs that contain it. This is my command line:
scrapy crawl myfirst -a nombre="Vermont"
And this is my spider:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number
                  for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        super(myfirstSpider, self).__init__(*args, **kwargs)
        rules = (
            Rule(LinkExtractor(allow=r'/fp/\*nombre*'), callback='parse_item'),
        )

    def parse_item(self, response):
        items = myfirstItem()  # item class defined in the project's items module
        product_name = response.css('.titleTechniqueSheet::text').extract()
        items['product_name'] = product_name
        yield items
Unfortunately, it doesn't work… Any help is welcome, thanks!

I found the solution! This works for me:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class myfirstSpider(CrawlSpider):
    name = 'myfirst'
    allowed_domains = ["leroymerlin.es"]
    start_urls = ["https://www.leroymerlin.es/decoracion-navidena/arboles-navidad?index=%s" % page_number
                  for page_number in range(2)]

    def __init__(self, nombre=None, *args, **kwargs):
        # self.rules must be set BEFORE calling super().__init__(),
        # because CrawlSpider compiles its rules during initialization
        self.rules = (
            Rule(LinkExtractor(allow=nombre), callback='parse_item'),
        )
        super(myfirstSpider, self).__init__(*args, **kwargs)

    def parse_item(self, response):
        items = myfirstItem()  # item class defined in the project's items module
        product_name = response.css('.titleTechniqueSheet::text').extract()
        items['product_name'] = product_name
        yield items
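Two things changed in the working version, and both matter: self.rules is now assigned before super().__init__() runs (CrawlSpider compiles its rules during initialization, so rules defined afterwards, or stored in a local variable, are never registered), and allow=nombre passes the argument itself as the pattern instead of the literal string r'/fp/\*nombre*'. One caveat: the raw argument is treated as a regular expression, so a term containing regex metacharacters would misbehave. A minimal sketch of a safer pattern builder (build_allow_pattern is a hypothetical helper, not part of the original spider; the /fp/ prefix is taken from the original allow rule):

```python
import re

def build_allow_pattern(nombre):
    """Build a LinkExtractor 'allow' regex matching product URLs
    (under the /fp/ path used in the original rule) that contain
    the user-supplied search term."""
    # re.escape neutralises any regex metacharacters in the term
    return r"/fp/.*%s" % re.escape(nombre)
```

In the spider, this would replace allow=nombre with allow=build_allow_pattern(nombre).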
And the command:
scrapy crawl myfirst -a nombre="vermont"
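Since the question title asks about multiple search terms: LinkExtractor's allow parameter also accepts a list of patterns, so a single comma-separated -a argument can cover several keywords. A sketch, where split_terms is a hypothetical helper (not from the original post):

```python
def split_terms(nombre):
    """Split a comma-separated -a nombre=... value into a list of
    patterns for LinkExtractor(allow=...), which accepts a list."""
    return [term.strip() for term in nombre.split(",") if term.strip()]

# scrapy crawl myfirst -a nombre="vermont,abeto" would then give
# Rule(LinkExtractor(allow=split_terms(nombre)), callback='parse_item')
```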
Thanks, everyone!

Source: Stack Overflow, /questions/59383208/scrapy-how-to-use-arguments-for-multiple-search-terms