redis语法,python使用redis_郑*杰的博客-CSDN博客
python-pymongo模块_郑*杰的博客-CSDN博客
python操作mysql数据库_郑*杰的博客-CSDN博客
基本步骤:python—scrapy数据解析、存储_郑*杰的博客-CSDN博客
正文:
当前文件:D:\python_test\scrapyProject\scrapyProject\settings.py
ITEM_PIPELINES = {
# Lower number = higher priority; items flow through pipelines in this order.
# NOTE(review): the keys reference package 'xiaoshuoPro' but the file path shown
# is scrapyProject\settings.py — confirm the dotted paths match the real package.
'xiaoshuoPro.pipelines.MysqlPipeline': 300,
'xiaoshuoPro.pipelines.RedisPipeLine': 301,
'xiaoshuoPro.pipelines.MongoPipeline': 302,
}
当前文件:D:\python_test\scrapyProject\scrapyProject\pipelines.py
# Standard library
import json

# Third-party
import pymongo
import pymysql
import redis
from itemadapter import ItemAdapter
# 数据存储到mysql
class MysqlPipeline:
    """Persist scraped items into the MySQL table `test.xiaoshuo`."""

    def open_spider(self, spider):
        """Open one shared connection and cursor when the spider starts."""
        self.conn = pymysql.Connect(
            host='127.0.0.1',
            port=3306,
            user='root',
            password='root',
            db='test',
            charset='utf8'
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        """Insert the item's title, then hand the item to the next pipeline.

        The original built the SQL with an f-string, which is vulnerable to
        SQL injection and breaks on any title containing a double quote.
        A parameterized query lets the driver escape the value safely.
        """
        sql = 'insert into xiaoshuo (title) values (%s)'
        try:
            self.cursor.execute(sql, (item['title'],))
            self.conn.commit()
            print('成功写入一条数据!')
        except Exception:
            # Don't leave a broken transaction open for the next item.
            self.conn.rollback()
            raise
        # 爬虫文件只会将item提交给优先级最高的管道类。优先级最高的管道类的process_item中需要写return item操作,该操作表示将item对象传递给下一个管道类
        # (Highest-priority pipeline must `return item` so the lower-priority
        # Redis and Mongo pipelines still receive it.)
        return item

    def close_spider(self, spider):
        """Release the cursor and connection when the spider closes."""
        self.cursor.close()
        self.conn.close()
# 数据存储到redis中
class RedisPipeLine:
    """Push scraped items onto the Redis list `xiaoshuo`."""

    def open_spider(self, spider):
        """Connect to the local Redis server when the spider starts."""
        self.conn = redis.Redis(
            host='127.0.0.1',
            port=6379
        )

    def process_item(self, item, spider):
        """LPUSH the item as a JSON string, then pass it on.

        redis-py only accepts bytes/str/int/float values; pushing a Scrapy
        Item (dict-like) directly raises redis.exceptions.DataError on
        current redis-py versions, so serialize to JSON first.
        """
        self.conn.lpush('xiaoshuo', json.dumps(dict(item), ensure_ascii=False))
        print('数据存储redis成功!')
        return item

    def close_spider(self, spider):
        """Close the Redis connection when the spider closes."""
        self.conn.close()
# 数据存储到Mongo中
class MongoPipeline:
    """Insert scraped items into MongoDB collection `test.xiaoshuo`."""

    def open_spider(self, spider):
        """Connect to local MongoDB and select the `test` database."""
        self.conn = pymongo.MongoClient(host='127.0.0.1', port=27017)
        self.db_test = self.conn['test']

    def process_item(self, item, spider):
        """Insert the item's title as one document, then pass the item on.

        The original read item['item_title'], but the sibling MysqlPipeline
        (which sees the same items first) reads item['title'] — so
        'item_title' would raise KeyError. Use the same field name.
        """
        self.db_test['xiaoshuo'].insert_one({'title': item['title']})
        print('插入成功!')
        return item

    def close_spider(self, spider):
        """Close the MongoDB client when the spider closes."""
        self.conn.close()