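I. Define the data structure in items.py
title: the title of the poem
writer: the author of the poem
dynasty: the dynasty in which the poem was written
content: the body text of the poem
content_url: the URL of the poem's full-text page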
II. Parse the crawled content in shici.py
III. Configure the relevant options in settings.py
IV. Write the items to MongoDB in pipelines.py
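The last step, writing to MongoDB, can be sketched as a Scrapy item pipeline. This is a minimal sketch, not the original author's code. It assumes pymongo is installed. The class name `ShiciMongoPipeline`, the collection name `poems`, the custom settings `MONGO_URI`/`MONGO_DB` and the module path in the settings comment are all hypothetical. `from_crawler`, `open_spider`, `close_spider` and `process_item` are the standard hooks Scrapy calls on a pipeline.

```python
# pipelines.py (sketch): store each ShiciItem as a MongoDB document.
# Assumptions: pymongo is installed; MONGO_URI and MONGO_DB are custom
# setting names chosen for this example, not Scrapy built-ins.


class ShiciMongoPipeline:
    """Hypothetical pipeline that upserts poems keyed by content_url."""

    collection_name = "poems"  # hypothetical collection name

    def __init__(self, mongo_uri="mongodb://localhost:27017", mongo_db="shici"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy uses this hook to build the pipeline; read our custom settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DB", "shici"),
        )

    def open_spider(self, spider):
        import pymongo  # imported here so the module loads without pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)  # a scrapy.Item converts to a plain dict of its fields
        # Upsert on content_url so re-running the crawl does not duplicate poems.
        self.collection.update_one(
            {"content_url": doc.get("content_url")}, {"$set": doc}, upsert=True
        )
        return item


# Register it in settings.py (the "shici_project" module path is hypothetical):
# ITEM_PIPELINES = {"shici_project.pipelines.ShiciMongoPipeline": 300}
# MONGO_URI = "mongodb://localhost:27017"
# MONGO_DB = "shici"
```

Upserting on `content_url` is a design choice: a plain `insert_one` would store a duplicate document every time the crawl is re-run.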
1. items.py
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class ShiciItem(scrapy.Item):
    title = scrapy.Field()
    writer = scrapy.Field()
    dynasty = scrapy.Field()
    content = scrapy.Field()
    content_url = scrapy.Field()
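Once the item is defined, the spider still has to fill its five fields from the page. Extracted text usually arrives as a list of fragments with stray whitespace and blank entries, so a small cleaning step helps before the values go into the item. This is a hypothetical sketch: `FIELDS`, `clean_text` and `build_record` are illustrative names, not part of Scrapy or of the original project, and no page selectors are assumed.

```python
# Hypothetical helpers: turn raw text fragments (for example, the list a
# selector's .getall() returns) into a clean record with the ShiciItem fields.

FIELDS = ("title", "writer", "dynasty", "content", "content_url")


def clean_text(fragments):
    """Strip each fragment, drop empty ones, and join the rest with newlines."""
    if isinstance(fragments, str):
        fragments = [fragments]
    return "\n".join(f.strip() for f in fragments if f and f.strip())


def build_record(title, writer, dynasty, content_fragments, content_url):
    """Return a dict whose keys match the five fields declared in ShiciItem."""
    return {
        "title": clean_text(title),
        "writer": clean_text(writer),
        "dynasty": clean_text(dynasty),
        "content": clean_text(content_fragments),
        "content_url": content_url.strip(),
    }
```

Because the keys match the declared fields, the result can be passed straight to the item inside the spider, e.g. `yield ShiciItem(**build_record(...))`; a `scrapy.Item` raises `KeyError` for any key that is not a declared field.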
2. shici.py
import scrapy
from ..items import ShiciItem


class ShiciSpider(scrapy.Spider):
    name = 'shici'
    allowed_domains = [