Workflow
- Store the parsed data in the item objects defined in items.py
- Use yield to hand each item over to the pipeline for processing
- Write the storage code in the pipeline file pipelines.py
- Enable the pipeline in the settings.py configuration file
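The steps above can be sketched in plain Python, without Scrapy itself: `parse()` yields item dicts, and an engine-like loop hands each one to the pipeline's `process_item()`. All names here are illustrative, not part of the real project.

```python
# Minimal sketch of the item flow: yield from parse() -> pipeline.process_item()

def parse(response_rows):
    # Steps 1-2: pack each parsed row into an item and yield it
    for author, content in response_rows:
        item = {'author': author, 'content': content}
        yield item

class Pipeline:
    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        # Step 3: the pipeline receives every yielded item
        self.seen.append(item)
        return item

rows = [('AuthorA', 'joke one'), ('AuthorB', 'joke two')]
pipe = Pipeline()
for item in parse(rows):  # in a real project the Scrapy engine runs this loop
    pipe.process_item(item, spider=None)
print(len(pipe.seen))  # 2
```

In real Scrapy the engine performs the `for` loop for you; the spider only has to `yield` each item.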
Example
settings.py configuration file
```python
ITEM_PIPELINES = {
    'qiubaiPro.pipelines.QiubaiproPipeline': 300,
}
```
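The number (here 300) is the pipeline's priority, an integer typically in the 0-1000 range; when several pipelines are registered, lower numbers run first. A sketch with a second, hypothetical pipeline class:

```python
# Hypothetical example: DuplicatesPipeline is not part of this project,
# it only illustrates how multiple pipelines are ordered by priority.
ITEM_PIPELINES = {
    'qiubaiPro.pipelines.DuplicatesPipeline': 200,   # runs first (lower value)
    'qiubaiPro.pipelines.QiubaiproPipeline': 300,    # runs second
}
```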
Spider file
- Must import the item class from items.py
- Fill the item with the parsed data
- Submit the item to the pipeline with yield item
```python
import scrapy
from qiubaiPro.items import QiubaiproItem

class QiubaiSpider(scrapy.Spider):
    name = 'qiubai'
    start_urls = ['https://www.qiushibaike.com/text/']

    def parse(self, response):
        div_list = response.xpath("//div[@id='content-left']/div")
        for div in div_list:
            author = div.xpath("./div/a[2]/h2/text()").extract_first()
            content = div.xpath(".//div[@class='content']/span/text()").extract_first()
            # Pack the parsed fields into an item and hand it to the pipeline
            item = QiubaiproItem()
            item['author'] = author
            item['content'] = content
            yield item
```
items.py
```python
import scrapy

class QiubaiproItem(scrapy.Item):
    author = scrapy.Field()
    content = scrapy.Field()
```
Pipeline file pipelines.py
- open_spider runs once when the spider starts
- close_spider runs once when the spider finishes
```python
class QiubaiproPipeline(object):
    fp = None

    def open_spider(self, spider):
        print('spider started')
        self.fp = open('./qiubai_pipe.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        print('spider finished')
        self.fp.close()

    def process_item(self, item, spider):
        author = item['author']
        content = item['content']
        self.fp.write(author + content + '\n\n\n')
        return item
```
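Because the pipeline class itself does not depend on Scrapy, a variant can be tried standalone. The sketch below, with an illustrative class and file name not from the original project, stores each item as one JSON line instead of plain text and drives the three methods by hand, the way the engine would:

```python
import json

# Hypothetical JSON-lines variant of the pipeline above; names are illustrative.
class JsonLinesPipeline(object):
    fp = None

    def open_spider(self, spider):
        self.fp = open('./qiubai_pipe.jsonl', 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.fp.close()

    def process_item(self, item, spider):
        # dict(item) also works for a real scrapy.Item
        self.fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

# Exercised directly, without the Scrapy engine:
pipe = JsonLinesPipeline()
pipe.open_spider(None)
pipe.process_item({'author': 'AuthorA', 'content': 'joke one'}, None)
pipe.close_spider(None)
```

One JSON object per line keeps the output easy to reload later, e.g. with json.loads on each line.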