Scrapy (6): Saving Scraped Data to a MySQL Database

Sophia$

Original link: https://zhuanlan.zhihu.com/p/133507215

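In earlier articles in this series we saw how to scrape the data we want into Items and how to work with a MySQL database. Here we continue building out the spider and save the scraped items to MySQL.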

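Scrapy project structure
To keep things organized, we create our own mysqlpipelines folder and write our own pipelines.py file in it to process and save the items. In the same folder we create sql.py, which holds the SQL statements that write to the database.
Writing the SQL statements
Open sql.py and start by connecting to the database: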
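import pymysql.cursors

# The MYSQL_* constants are assumed to be defined elsewhere (e.g. in the project settings).
# Connect to the database
connect = pymysql.Connect(
    host=MYSQL_HOSTS, port=MYSQL_PORT, user=MYSQL_USER,
    passwd=MYSQL_PASSWORD, db=MYSQL_DB, charset='utf8',
)
cursor = connect.cursor()  # get a cursor
print('Database connection OK')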

Once the connection succeeds we print a message, which makes it easy to test.
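
The connection code refers to MYSQL_HOSTS, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD and MYSQL_DB without showing where they are defined. A minimal sketch, assuming they live in a settings-style module. Every value below is a placeholder, and connection_kwargs is a hypothetical helper that does not appear in the original article:

```python
# Hypothetical connection settings; every value here is a placeholder.
MYSQL_HOSTS = '127.0.0.1'
MYSQL_PORT = 3306
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'change-me'
MYSQL_DB = 'novels'


def connection_kwargs():
    """Collect the keyword arguments used for pymysql.Connect in sql.py."""
    return {
        'host': MYSQL_HOSTS,
        'port': MYSQL_PORT,
        'user': MYSQL_USER,
        'passwd': MYSQL_PASSWORD,
        'db': MYSQL_DB,
        'charset': 'utf8',
    }
```

With this in place, sql.py could open the connection with pymysql.Connect(**connection_kwargs()).
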
Next, create a class that holds the SQL statements:
class my_sql:
    @classmethod  # insert a row
    def insert_data(cls, novelname, author, category, nameid, status, num, url):
        ...  # body omitted in the original; see the earlier articles

    @classmethod  # check whether a row already exists
    def select_name(cls, novelname):
        ...  # body omitted in the original; see the earlier articles

    @classmethod  # update an existing row
    def update_data(cls, author, category, nameid, status, num, url, novelname):
        ...  # body omitted in the original; see the earlier articles

    @classmethod  # close the database connection
    def close_sql(cls):
        cursor.close()
        connect.close()
        print('Database connection closed OK')

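The class defines the basic SQL operations (insert, update, and query) plus a final method that closes the database connection. For the detailed database code inside each method, see the earlier articles in this series.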
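
The method bodies are elided in the listing above. As a rough illustration of what they might contain, here is a sketch built on parameterized queries. The table name novels, its column names, and the NovelStore class are all hypothetical, since the original article does not show them. The sketch also takes the connection as an argument instead of using module-level globals, so it can be tried without a live database:

```python
class NovelStore:
    """Hypothetical stand-in for my_sql; table and column names are assumptions."""

    INSERT_SQL = (
        "INSERT INTO novels (novelname, author, category, nameid, status, num, url) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s)"
    )
    EXISTS_SQL = "SELECT EXISTS(SELECT 1 FROM novels WHERE novelname = %s)"
    UPDATE_SQL = (
        "UPDATE novels SET author = %s, category = %s, nameid = %s, "
        "status = %s, num = %s, url = %s WHERE novelname = %s"
    )

    def __init__(self, connection):
        # connection is expected to behave like a pymysql connection
        self.connection = connection
        self.cursor = connection.cursor()

    def insert_data(self, novelname, author, category, nameid, status, num, url):
        # Values are passed separately so the driver escapes them
        args = (novelname, author, category, nameid, status, num, url)
        self.cursor.execute(self.INSERT_SQL, args)
        self.connection.commit()

    def select_name(self, novelname):
        # Returns the first result row, e.g. (1,) when the name exists
        self.cursor.execute(self.EXISTS_SQL, (novelname,))
        return self.cursor.fetchall()[0]

    def update_data(self, author, category, nameid, status, num, url, novelname):
        args = (author, category, nameid, status, num, url, novelname)
        self.cursor.execute(self.UPDATE_SQL, args)
        self.connection.commit()
```

Because select_name returns the first result row, a caller can test ret[0] == 1 to decide between an update and an insert.
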
Writing the pipeline
# import the SQL helper class and the item definition
from myproject.mysqlpipelines.sql import my_sql
from myproject.items import PowersItem

Define the pipeline:
class Powerspipeline(object):
    def process_item(self, item, spider):
        if isinstance(item, PowersItem):  # only handle PowersItem instances
            # read the scraped fields
            novelname = item['novelname']
            author = item['author']
            category = item['category']
            nameid = item['nameid']
            status = item['status']
            num = item['num']
            url = item['novelurl']
            ret = my_sql.select_name(novelname)  # check whether the novel is already in the database
            if ret[0] == 1:  # already exists
                print('Already exists, updating')
                # the database already holds this novel, so update the row
                my_sql.update_data(author, category, nameid, status, num, url, novelname)
            else:
                # not in the database yet, so save the item
                print('Saving')
                my_sql.insert_data(novelname, author, category, nameid, status, num, url)
        else:
            print('not a PowersItem, skipping')
        return item
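
The pipeline above never calls close_sql, so the connection stays open until the process exits. One possible arrangement, sketched under assumptions: Scrapy calls a pipeline's close_spider(spider) method when the spider finishes, which is a natural place to close the connection. UpsertPipeline and its store argument are hypothetical names, where store stands for any object exposing the same four methods as my_sql:

```python
class UpsertPipeline:
    """Sketch: update rows that exist, insert the rest, close on shutdown."""

    FIELDS = ('author', 'category', 'nameid', 'status', 'num', 'novelurl')

    def __init__(self, store):
        # store is a hypothetical object with select_name, update_data,
        # insert_data and close_sql, mirroring the my_sql class
        self.store = store

    def process_item(self, item, spider):
        name = item['novelname']
        values = [item[field] for field in self.FIELDS]
        if self.store.select_name(name)[0] == 1:
            # Row already exists: update_data takes novelname last
            self.store.update_data(*values, name)
        else:
            # New row: insert_data takes novelname first
            self.store.insert_data(name, *values)
        return item

    def close_spider(self, spider):
        # Called by Scrapy once the spider has finished
        self.store.close_sql()
```

Scrapy builds pipelines without constructor arguments by default, so in a real project the store would be created inside the pipeline (for example in open_spider) rather than passed in. Passing it in here only keeps the sketch easy to try on its own.
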

With that, all of the spider's code is complete. Run it and the saved data will show up in the database.
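
Scrapy only runs pipelines that are listed in the ITEM_PIPELINES setting. A sketch of the entry for settings.py, assuming the package layout implied by the imports in this article (myproject/mysqlpipelines/pipelines.py). The priority 300 is an arbitrary choice within the conventional 0 to 1000 range:

```python
# settings.py (sketch): enable the custom pipeline.
# The dotted path assumes pipelines.py sits in myproject/mysqlpipelines/.
ITEM_PIPELINES = {
    'myproject.mysqlpipelines.pipelines.Powerspipeline': 300,
}
```

Pipelines with lower numbers run first, which matters only when more than one pipeline is enabled.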