I want to pass an argument on the scrapy crawl … command line and use it to extend
the rule definitions in a CrawlSpider, such as the following:
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']
    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(SgmlLinkExtractor(allow=('category\.php', ), deny=('subsection\.php', ))),
        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        Rule(SgmlLinkExtractor(allow=('item\.php', )), callback='parse_item'),
    )
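I would like the allow attribute of SgmlLinkExtractor to be specified by a command-line argument.
I searched Google and found that I can read argument values in the spider's __init__ method, but how do I get a command-line argument so that it can be used in the Rule definitions?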