Incorrect code:
class XXSpider(scrapy.Spider):
    name = 'xxspider'
    # Wrong: this entry is a URL (it includes the scheme), not a domain
    allowed_domains = ['https://www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']
Correct code:
class XXSpider(scrapy.Spider):
    name = 'xxspider'
    # Right: a bare domain name, with no scheme and no path
    allowed_domains = ['www.xx.com']
    start_urls = ['https://www.xx.com/ask/highlight/']
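The problem here is the allowed_domains setting: it expects a list of domain names, not a list of URLs. When an entry is a URL, the host of each follow-up request matches no allowed domain, so the request is filtered out as offsite and its callback never runs.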
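To see why a URL in allowed_domains matches nothing, here is a minimal sketch of a host check. It is a simplified approximation written for illustration, not Scrapy's actual offsite filtering code; the helper name is_allowed and the page=2 URL are assumptions made up for this example.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Hypothetical helper: True if the URL's host equals an allowed
    domain or is a subdomain of one (simplified offsite check)."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical follow-up URL on the same site
next_url = "https://www.xx.com/ask/highlight/?page=2"

# The host is 'www.xx.com', which never equals the string 'https://www.xx.com'
print(is_allowed(next_url, ["https://www.xx.com"]))  # False
print(is_allowed(next_url, ["www.xx.com"]))          # True
```

The comparison is made against the host part of the request URL only, so any entry that carries a scheme or a path can never match.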
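There is another situation that can also make yield scrapy.Request() appear to do nothing:
Scrapy's duplicate filter has dropped the URL because it was already requested. Requests go through this filter unless dont_filter is set to True.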
Solution:
yield scrapy.Request(next_url, callback=self.parse, dont_filter=True)
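The effect of dont_filter can be pictured with a toy model of a scheduler that remembers which URLs it has seen. This is only a sketch under stated assumptions: the class ToyScheduler and its methods are invented for illustration and are not Scrapy's real scheduler or duplicate filter, which work on request fingerprints rather than raw URL strings.

```python
class ToyScheduler:
    """Hypothetical model of duplicate filtering, for illustration only."""

    def __init__(self):
        self.seen = set()   # URLs that have passed through the filter
        self.queue = []     # requests accepted for download

    def _request_seen(self, url):
        # Report whether the URL was seen before, recording it if not
        if url in self.seen:
            return True
        self.seen.add(url)
        return False

    def enqueue(self, url, dont_filter=False):
        # With dont_filter=True the duplicate check is skipped entirely
        if not dont_filter and self._request_seen(url):
            return False  # dropped as a duplicate
        self.queue.append(url)
        return True


scheduler = ToyScheduler()
url = "https://www.xx.com/ask/highlight/"
print(scheduler.enqueue(url))                    # True
print(scheduler.enqueue(url))                    # False
print(scheduler.enqueue(url, dont_filter=True))  # True
```

Because dont_filter=True bypasses the check on every call, pages that link back to each other can be requested again and again, so it is probably best reserved for URLs that really do need to be fetched more than once.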