After a successful crawl, Scrapy can save the scraped items locally or to a database, in a variety of formats. See the official documentation:
https://docs.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
This post summarizes how to save the results to MySQL.
First, register the pipeline in settings.py:
ITEM_PIPELINES = {
    xxxxx
    'ArticleSpider.pipelines.MysqlPipeline': 20,
    xxxxx
}
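As a side note on the number 20 above: the value assigned to each pipeline class is its priority, and Scrapy runs the enabled pipelines in ascending order (valid values are 0-1000, lower runs first). A minimal sketch, where the image pipeline entry is a hypothetical second pipeline added for illustration:

```python
# Hypothetical ITEM_PIPELINES showing how Scrapy orders pipelines:
# lower numbers run first (valid range is 0-1000).
ITEM_PIPELINES = {
    'ArticleSpider.pipelines.ArticleImagePipeline': 1,   # hypothetical, runs first
    'ArticleSpider.pipelines.MysqlPipeline': 20,         # runs second
}

# Scrapy passes each item through the pipelines in ascending priority order:
execution_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```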
The actual database-saving logic goes in pipelines.py:
MysqlPipeline (the simplest implementation, using a synchronous connection and no deduplication logic):
import MySQLdb

class MysqlPipeline(object):
    def __init__(self):
        # positional arguments: host, user, passwd, db
        self.conn = MySQLdb.connect('xxxx', 'mysql', 'xxxx', 'xxxx', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # insert the newly crawled record
        insert_novelinfo_sql = """
            insert into novel_info(novel_id, novel_url, title, author, introduction,
                category, picture_url, picture_path, update_time)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
        """
        insert_noveldetail_sql = """
            insert into novel_content(novel_id, chapter_url, chapter_id, chapter_name, novel_detail)
            VALUES (%s, %s, %s, %s, %s)
        """
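To make the flow of process_item concrete without needing a MySQL server, here is a runnable sketch of the same logic using Python's built-in sqlite3 module in place of MySQLdb. The table schema is a simplified version of novel_info, the class name and item fields are assumptions for illustration, and note that sqlite3 uses '?' placeholders where MySQLdb uses '%s':

```python
import sqlite3

# Sketch of the pipeline above, with sqlite3 standing in for MySQLdb so it
# runs without a database server; schema is a simplified novel_info.
class SqlitePipelineSketch(object):
    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.cursor = self.conn.cursor()
        self.cursor.execute(
            "CREATE TABLE IF NOT EXISTS novel_info "
            "(novel_id INTEGER, novel_url TEXT, title TEXT)"
        )

    def process_item(self, item, spider):
        # sqlite3 uses '?' placeholders; MySQLdb uses '%s'
        self.cursor.execute(
            "INSERT INTO novel_info (novel_id, novel_url, title) VALUES (?, ?, ?)",
            (item["novel_id"], item["novel_url"], item["title"]),
        )
        self.conn.commit()
        # always return the item so later pipelines receive it
        return item

pipeline = SqlitePipelineSketch()
item = {"novel_id": 1, "novel_url": "http://example.com/1", "title": "demo"}
pipeline.process_item(item, spider=None)
count = pipeline.cursor.execute("SELECT COUNT(*) FROM novel_info").fetchone()[0]
```

Because commit() runs inside process_item, every item triggers a blocking database write; that is what "synchronous" means here, and it is fine for small crawls but becomes a bottleneck at scale.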