The problem showed up while running the spider: the database-writing code in the pipeline never received the data carried by the item. After some debugging, I worked out the mechanism by which the pipeline gets called.
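For context, a Scrapy item pipeline is just a class with a `process_item(self, item, spider)` method; Scrapy invokes it once for each item the spider *yields*. A minimal sketch of that mechanism (the class name `Work1Pipeline` and the in-memory list standing in for a database are my own illustration, not from the project):

```python
class Work1Pipeline:
    """Hypothetical pipeline: process_item runs once per yielded item.

    If parse() builds an item but never yields it, this method is
    never called, and the database write never sees any data.
    """

    def __init__(self):
        self.seen = []  # stand-in for a real database connection

    def process_item(self, item, spider):
        # item arrives here only after the spider yields it
        self.seen.append(dict(item))
        return item  # must return the item so later pipelines receive it
```

The key point for the bug below: `process_item` is driven entirely by what the spider yields, so an item that is constructed but never yielded never reaches the pipeline.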
Here is a small test case.

spider.py:
import scrapy
from items import Work1Item  # custom Item class used to structure the scraped data

class Work1Spider(scrapy.Spider):
    name = 'work1'
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            item = Work1Item()
            item['author'] = quote.xpath('.//small[@class="author"]/text()').extract_first()
            item['tags'] = quote.xpath('.//div[@class="tags"]/a[@class="tag"]/text()').extract()
            item['quote'] = quote.xpath('./span[@class="text"]/text()').extract_first()
        next_page_url = response.xpath('