python – How do I access command-line arguments in a CrawlSpider in Scrapy?

Latest recommended article published on 2021-01-14 04:32:12

weixin_39849387



Article link: https://blog.csdn.net/weixin_39849387/article/details/111839953


I want to pass an argument on the scrapy crawl ... command line and use it to extend the rule definitions in a CrawlSpider, as shown below:

name = 'example.com'
allowed_domains = ['example.com']
start_urls = ['http://www.example.com']

rules = (
    # Extract links matching 'category.php' (but not matching 'subsection.php')
    # and follow links from them (since no callback means follow=True by default).
    Rule(SgmlLinkExtractor(allow=(r'category\.php', ), deny=(r'subsection\.php', ))),

    # Extract links matching 'item.php' and parse them with the spider's method parse_item
    Rule(SgmlLinkExtractor(allow=(r'item\.php', )), callback='parse_item'),
)

I would like the allow attribute of SgmlLinkExtractor to be specified through a command-line argument.

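I searched Google and found that I can read the argument value in the spider's __init__ method, but how do I get the command-line argument so that it can be used in the Rule definition?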
