This article builds on the earlier Scrapy spider that crawls a complete novel.
1. Storing in MySQL
1.1 Install the pymysql package
Run the command: pip3 install pymysql
1.2 Create the MysqlPipeline class
Add a MysqlPipeline class to pipelines.py to write the novel content into MySQL (the t_xiaoshuo table must be created first, and the connect() parameters should be adjusted to your environment).
from pymysql import connect

# Store items in MySQL
class MysqlPipeline(object):
    def open_spider(self, spider):
        # Adjust the connection parameters to your environment
        self.client = connect(host='localhost', port=3306, user='root',
                              password='root', db='testdb', charset='utf8')
        self.cursor = self.client.cursor()

    def process_item(self, item, spider):
        # The leading 0 lets MySQL generate the AUTO_INCREMENT id
        sql = 'insert into t_xiaoshuo values(0,%s,%s)'
        self.cursor.execute(sql, [item['title'], item['content']])
        self.client.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.client.close()
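The t_xiaoshuo table mentioned above is not shown in the original post; a minimal DDL sketch is below. The column names id/title/content are an assumption inferred from the insert statement 'insert into t_xiaoshuo values(0,%s,%s)', and the helper create_table() is illustrative, not part of the pipeline.

```python
# Assumed schema for t_xiaoshuo, inferred from the pipeline's insert
# statement: an auto-increment id plus a title and content column.
DDL = """
CREATE TABLE IF NOT EXISTS t_xiaoshuo (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    title   VARCHAR(255) NOT NULL,
    content TEXT
) DEFAULT CHARSET = utf8;
"""

def create_table(cursor):
    """Run the DDL on any DB-API cursor (e.g. one from pymysql.connect)."""
    cursor.execute(DDL)
```

Run it once against the testdb database before starting the crawl.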
1.3 Update the configuration
In settings.py, point ITEM_PIPELINES at MysqlPipeline:
ITEM_PIPELINES = {
    'xiaoshuospider.pipelines.MysqlPipeline': 301,
}
1.4 Verify the results
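One way to verify the result is to count the rows the pipeline wrote. This is a sketch, not from the original post; the helper name count_chapters is an assumption, and it accepts any DB-API cursor, e.g. one from the same pymysql connect() call used in the pipeline.

```python
# Quick sanity check after the crawl: how many chapters were stored?
# Usage with pymysql (same parameters as the pipeline above):
#   client = connect(host='localhost', port=3306, user='root',
#                    password='root', db='testdb', charset='utf8')
#   print(count_chapters(client.cursor()))
def count_chapters(cursor):
    """Return the number of rows in t_xiaoshuo via any DB-API cursor."""
    cursor.execute('SELECT COUNT(*) FROM t_xiaoshuo')
    return cursor.fetchone()[0]
```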
2. Storing in MongoDB
2.1 Install the pymongo package
Run the command: pip3 install pymongo
2.2 Create the MongoPipeline class
Add a MongoPipeline class to pipelines.py to write the novel content into MongoDB.
from pymongo import MongoClient

# Store items in MongoDB
class MongoPipeline(object):
    def open_spider(self, spider):
        self.client = MongoClient()  # defaults to localhost:27017
        self.db = self.client.testdb
        self.xiaoshuo = self.db.xiaoshuo

    def process_item(self, item, spider):
        # Collection.insert() is deprecated in pymongo 3.x; use insert_one()
        # and convert the Scrapy item to a plain dict first
        self.xiaoshuo.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
2.3 Update the configuration
In settings.py, point ITEM_PIPELINES at MongoPipeline:
ITEM_PIPELINES = {
    'xiaoshuospider.pipelines.MongoPipeline': 302,
}