1. Overview
The data scraped with Scrapy so far has always been written to JSON files; now it needs to be written to MySQL instead.
items.py defines two main fields:
import scrapy

class CityItem(scrapy.Item):
    name = scrapy.Field()
    url = scrapy.Field()
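For context, the spider callback that fills these two fields is not shown here. A minimal, Scrapy-free sketch of the kind of (name, url) extraction involved, using a hypothetical HTML fragment (illustrative only, not the actual page markup):

```python
import re

# Hypothetical HTML fragment standing in for a city-list page (illustrative only)
html = '<a href="/city/beijing">Beijing</a><a href="/city/shanghai">Shanghai</a>'

# Pull out (url, name) pairs -- the same two fields CityItem defines
pairs = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
items = [{"name": name, "url": url} for url, name in pairs]
print(items[0])  # {'name': 'Beijing', 'url': '/city/beijing'}
```

In the real spider, the equivalent dicts would be yielded as CityItem instances and then handed to the pipeline below.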
Environment:
MySQL server IP: 192.168.0.3
Username: root
Password: abcd@1234
Create the database:
CREATE DATABASE qunar CHARACTER SET utf8 COLLATE utf8_general_ci;
Create the table test:
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
2. Writing to the database in pipelines.py
Edit settings.py and add the connection parameters:
MYSQL_HOST = "192.168.0.3"
MYSQL_PORT = 3306
MYSQL_DBNAME = "qunar"
MYSQL_USER = "root"
MYSQL_PASSWORD = "abcd@1234"
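The pipeline also has to be enabled in settings.py, otherwise Scrapy never calls it. Assuming the project module is named lvyou (hypothetical; substitute your project's actual module name):

```python
ITEM_PIPELINES = {
    "lvyou.pipelines.LvyouPipeline": 300,  # lower number = runs earlier
}
```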
Edit pipelines.py as follows:
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
# from itemadapter import ItemAdapter
import pymysql
from twisted.enterprise import adbapi

# Asynchronous inserts into MySQL
class LvyouPipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        # This method name is fixed: Scrapy calls it and passes in the
        # project settings, so the MYSQL_* values are available directly.
        """
        Set up the database connection pool.
        :param settings: Scrapy settings
        :return: pipeline instance
        """
        adbparams = dict(
            host=settings['MYSQL_HOST'],
            port=settings['MYSQL_PORT'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            password=settings['MYSQL_PASSWORD'],
            cursorclass=pymysql.cursors.DictCursor  # specify the cursor type
        )
        # Create the ConnectionPool, using pymysql (or MySQLdb) as the driver
        dbpool = adbapi.ConnectionPool('pymysql', **adbparams)
        # Return an instance holding the pool
        return cls(dbpool)

    def process_item(self, item, spider):
        """
        Use Twisted to make the MySQL insert asynchronous: the connection
        pool runs the actual SQL and returns a Deferred.
        """
        query = self.dbpool.runInteraction(self.do_insert, item)  # method to run and the data to pass it
        query.addErrback(self.handle_error)  # register the error handler on the errback chain
        return item

    def do_insert(self, cursor, item):
        # Run the insert; no explicit commit is needed, runInteraction
        # commits the transaction automatically.
        insert_sql = """
        insert into test(name, url) VALUES (%s, %s)
        """
        cursor.execute(insert_sql, (item['name'], item['url']))

    def handle_error(self, failure):
        # Log the failure
        print(failure)
Note: adjust the INSERT statement to match your own table and fields.
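If the table or the item gains columns, the statement can also be generated from the item's fields instead of hard-coding them. A sketch (build_insert_sql is a hypothetical helper, not part of Scrapy or pymysql):

```python
def build_insert_sql(table, item):
    # Build "INSERT INTO <table> (c1, c2) VALUES (%s, %s)" plus the
    # parameter tuple; pymysql does the actual escaping via the %s placeholders.
    columns = ", ".join(item.keys())
    placeholders = ", ".join(["%s"] * len(item))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(item.values())

sql, params = build_insert_sql("test", {"name": "Beijing", "url": "/city/beijing"})
print(sql)     # INSERT INTO test (name, url) VALUES (%s, %s)
print(params)  # ('Beijing', '/city/beijing')
```

Inside do_insert this would become cursor.execute(*build_insert_sql('test', item)). Note the table name and column names are interpolated directly, so they must come from trusted code, never from scraped data.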
Finally, run the spider, and the scraped data will be written to the database.