Python Scrapy: Crawling 古诗文网 (a classical Chinese poetry site) and Storing the Data in MongoDB

Latest recommended article published on 2024-06-13 11:18:37

@懒羊羊

Category: Web crawlers

Article link: https://blog.csdn.net/qq_46659912/article/details/109677398


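I. Define the data structure in items.py
title: the poem's title
writer: the poem's author
dynasty: the dynasty in which the poem was written
content: the poem's full text
content_url: the link to the full text
II. Parse the pages to be crawled in shici.py
III. Configure the relevant settings in settings.py
IV. Write the items to MongoDB in pipelines.py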
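
Steps III and IV are only named in the outline, so here is a minimal sketch of what they usually look like in a Scrapy project. It is not the author's code. The class name `MongoPipeline`, the setting names `MONGO_URI` and `MONGO_DB`, the database name `gushiwen`, the collection name `shici`, and the module path in the `ITEM_PIPELINES` comment are all assumptions made for illustration.

```python
class MongoPipeline:
    """Sketch of an item pipeline that stores each item as one MongoDB document."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="gushiwen", collection_name="shici"):
        # All three defaults are hypothetical; adjust them to your deployment.
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DB are assumed custom settings, not Scrapy built-ins.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            db_name=crawler.settings.get("MONGO_DB", "gushiwen"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        # Release the connection when the crawl ends.
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # dict(item) works for both scrapy.Item instances and plain dicts.
        self.collection.insert_one(dict(item))
        return item


# In settings.py the pipeline would then be enabled roughly like this
# (the module path "shici_project.pipelines" is hypothetical):
# ITEM_PIPELINES = {"shici_project.pipelines.MongoPipeline": 300}
```

Returning the item from `process_item` keeps it available to any later pipeline stage. The number in `ITEM_PIPELINES` sets the order in which enabled pipelines run, lower values first.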

1. The items.py file

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class ShiciItem(scrapy.Item):
    title = scrapy.Field()
    writer = scrapy.Field()
    dynasty = scrapy.Field()
    content = scrapy.Field()
    content_url = scrapy.Field()

2. The shici.py file

import scrapy
from ..items import ShiciItem


class ShiciSpider(scrapy.Spider):
    name = 'shici'
    allowed_domains = [
