This article builds on the earlier Scrapy spider that crawls a complete novel.
1. Storing in MySQL
1.1 Install the pymysql package
Run the command: pip3 install pymysql
1.2 Create the MysqlPipeline class
Add a MysqlPipeline class to pipelines.py to write the novel content into MySQL (the t_xiaoshuo table must be created first, and the connect() parameters should be adjusted to your environment).
from pymysql import connect

# Store items in MySQL
class MysqlPipeline(object):
    def open_spider(self, spider):
        # Adjust the connection parameters to your environment
        self.client = connect(host='localhost', port=3306, user='root',
                              password='root', db='testdb', charset='utf8')
        self.cursor = self.client.cursor()

    def process_item(self, item, spider):
        # The leading 0 lets MySQL generate the AUTO_INCREMENT id
        sql = 'insert into t_xiaoshuo values(0,%s,%s)'
        self.cursor.execute(sql, [item['title'], item['content']])
        self.client.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.client.close()
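The t_xiaoshuo table mentioned above is not shown in the original post; a minimal DDL sketch is below. The column names id/title/content are an assumption inferred from the insert statement 'insert into t_xiaoshuo values(0,%s,%s)', and the helper create_table() is illustrative, not part of the pipeline.

```python
# Assumed schema for t_xiaoshuo, inferred from the pipeline's insert
# statement: an auto-increment id plus a title and content column.
DDL = """
CREATE TABLE IF NOT EXISTS t_xiaoshuo (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255) NOT NULL,
    content TEXT
) DEFAULT CHARSET = utf8;
"""

def create_table(cursor):
    """Run the DDL on any DB-API cursor (e.g. one from pymysql.connect)."""
    cursor.execute(DDL)
```

Run it once against the testdb database before starting the crawl.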
1.3 Update the configuration
In settings.py, point ITEM_PIPELINES at MysqlPipeline:
ITEM_PIPELINES = {
    'xiaoshuospider.pipelines.MysqlPipeline': 301,
}
1.4 Verify the results
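One way to verify the result is to count the rows the pipeline wrote. This is a sketch, not from the original post; the helper name count_chapters is an assumption, and it accepts any DB-API cursor, e.g. one from the same pymysql connect() call used in the pipeline.

```python
# Quick sanity check after the crawl: how many chapters were stored?
# Usage with pymysql (same parameters as the pipeline above):
#   client = connect(host='localhost', port=3306, user='root',
#                    password='root', db='testdb', charset='utf8')
#   print(count_chapters(client.cursor()))
def count_chapters(cursor):
    """Return the number of rows in t_xiaoshuo via any DB-API cursor."""
    cursor.execute('SELECT COUNT(*) FROM t_xiaoshuo')
    return cursor.fetchone()[0]
```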
2. Storing in MongoDB
2.1 Install the pymongo package
Run the command: pip3 install pymongo
2.2 Create the MongoPipeline class
Add a MongoPipeline class to pipelines.py to write the novel content into MongoDB.
from pymongo import MongoClient

# Store items in MongoDB
class MongoPipeline(object):
    def open_spider(self, spider):
        self.client = MongoClient()  # defaults to localhost:27017
        self.db = self.client.testdb
        self.xiaoshuo = self.db.xiaoshuo

    def process_item(self, item, spider):
        # Collection.insert() is deprecated in pymongo 3.x; use insert_one()
        # and convert the Scrapy item to a plain dict first
        self.xiaoshuo.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
2.3 Update the configuration
In settings.py, point ITEM_PIPELINES at MongoPipeline:
ITEM_PIPELINES = {
    'xiaoshuospider.pipelines.MongoPipeline': 302,
}