Web Scraping: Storing Scrapy Data in a Database

Latest recommended article published on 2024-07-21 21:11:33

某得感情

Article tags: web scraping, scrapy

Article link: https://blog.csdn.net/weixin_53984419/article/details/137052150

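Before covering database storage, a quick note on what items.py is for. items.py is the data container that Scrapy provides. An Item is similar to a dictionary in that it consists of keys and values, but the keys must be declared in items.py in advance.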

import scrapy


class NewItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    id = scrapy.Field()
    name = scrapy.Field()
    category = scrapy.Field()
    game_times = scrapy.Field()

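Scrapy presumably does this to keep wrong keys from slipping through when plain dictionaries are passed around (this is just my guess). Scrapy's project template already shows how to declare the keys in items.py, so just follow that pattern. Before using an item, instantiate it: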

# Instantiate the item (inside a spider callback)
new_key = NewItem()
# Assign the values
new_key["id"] = id
new_key["name"] = name
new_key["category"] = category
new_key["game_times"] = game_time
yield new_key
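To make the point about key errors concrete, here is a minimal sketch of the idea using only the standard library. `FixedKeyItem` is a hypothetical stand-in written for illustration, not Scrapy's implementation; it only mimics the behavior that assigning to an undeclared field raises a KeyError, which is also what scrapy.Item does.

```python
class FixedKeyItem(dict):
    """Hypothetical dict-like container that only accepts predeclared keys."""

    # Keys that were "declared in advance", as in items.py
    fields = ("id", "name", "category", "game_times")

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


item = FixedKeyItem()
item["name"] = "example"  # declared key: accepted

try:
    item["geme_times"] = "2024"  # misspelled key: rejected immediately
except KeyError as exc:
    rejected = str(exc)
```

With a plain dict the misspelled key would be stored silently and only cause trouble later, for example when a pipeline reads `item["game_times"]`.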

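To use the item in a spider you need to import it from items. There is a pitfall here: many people find that the import still raises an error even though it looks correct. This comes from how PyCharm resolves imports; a relative import like the following works: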

from ..items import NewItem

Now on to the main topic.

1. Storing to Excel (CSV file)

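As you know, storage happens in pipelines. There can be several pipelines; once the data-processing pipeline finishes, it passes the item on to the storage pipeline. In the storage pipeline, put the code that opens the file in open_spider and the code that closes it in close_spider. open_spider runs when the spider is opened, before crawling starts, and close_spider runs when the spider is closed.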

The code is as follows:

class NewPipeline:
    def process_item(self, item, spider):
        print(str(item['id']) + item['name'] + item['category'] + item['game_times'])
        return item


# First, create a new pipeline
class data_csv_Pipeline:
    # Open the file before the spider starts
    def open_spider(self, spider):
        self.f = open('data.csv', 'w', encoding='utf-8')
        print('Starting to write')

    # Close the file after the spider finishes
    def close_spider(self, spider):
        self.f.close()
        print('Finished writing')

    # Write one comma-separated line per item
    def process_item(self, item, spider):
        fields = [str(item['id']), item['name'], item['category'], item['game_times']]
        self.f.write(','.join(fields) + '\n')
        return item
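The open/process/close lifecycle can be tried out without running a crawl. The sketch below is a variant of the storage pipeline that uses the standard csv module, so that values containing commas are quoted properly. `CsvPipeline`, `run_once`, the temporary file path, and the sample item are all assumptions made for illustration; the hooks are called by hand in the order Scrapy would call them.

```python
import csv
import os
import tempfile


class CsvPipeline:
    """Storage pipeline variant that writes rows through csv.writer."""

    def __init__(self, path):
        self.path = path

    def open_spider(self, spider):
        # newline="" lets the csv module control line endings itself
        self.f = open(self.path, "w", encoding="utf-8", newline="")
        self.writer = csv.writer(self.f)

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        self.writer.writerow(
            [item["id"], item["name"], item["category"], item["game_times"]]
        )
        return item


def run_once(items):
    """Call the hooks in lifecycle order and return the rows that were written."""
    path = os.path.join(tempfile.mkdtemp(), "data.csv")
    pipeline = CsvPipeline(path)
    pipeline.open_spider(spider=None)
    for item in items:
        pipeline.process_item(item, spider=None)
    pipeline.close_spider(spider=None)
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


rows = run_once(
    [{"id": 1, "name": "Chess, Deluxe", "category": "board", "game_times": "2024"}]
)
```

The name "Chess, Deluxe" contains a comma, yet it comes back as a single column because csv.writer quotes it on the way out.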

Finally, don't forget to enable the newly created pipeline in settings.py:

ITEM_PIPELINES = {
   "new.pipelines.NewPipeline": 300,
   "new.pipelines.data_csv_Pipeline": 301,
}
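The numbers in ITEM_PIPELINES are priorities: Scrapy runs the pipelines from the lowest value to the highest (values are customarily kept in the 0 to 1000 range), so here NewPipeline sees each item before data_csv_Pipeline. The following sketch only reproduces that ordering rule with plain Python; `run_order` is a name introduced for illustration, and the dictionary is deliberately written in the "wrong" order to show that the values, not the position, decide.

```python
ITEM_PIPELINES = {
    "new.pipelines.data_csv_Pipeline": 301,
    "new.pipelines.NewPipeline": 300,
}

# Sort by priority value: lower numbers run earlier
run_order = [
    path for path, priority in sorted(ITEM_PIPELINES.items(), key=lambda kv: kv[1])
]
```

This is also why each process_item must return the item: whatever it returns is what the next pipeline in this order receives.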

2. Saving to a database

The flow is much the same as above: put the code that connects to the database in open_spider and the code that closes the connection in close_spider.

The code is as follows:

import pymysql


# Create a new pipeline
class data_mysql_Pipeline:
    # Open the database connection before the spider starts
    def open_spider(self, spider):
        self.con = pymysql.connect(host='localhost',
                                   user='root',
                                   password='root',
                                   database='day01',
                                   port=3306)

    # Close the connection after the spider finishes
    def close_spider(self, spider):
        if self.con:
            self.con.close()

    # Store the item in the database
    def process_item(self, item, spider):
        cur = None  # keeps the finally block safe even if cursor() fails
        try:
            # Create a cursor
            cur = self.con.cursor()
            sql = "insert into game (id, name, category, time) values (%s, %s, %s, %s)"
            cur.execute(sql, (item['id'], item['name'], item['category'], item['game_times']))
            self.con.commit()
        except Exception as e:
            print("Something went wrong:", e)
            self.con.rollback()
        finally:
            if cur:
                cur.close()
        return item

Likewise, the pipeline has to be enabled in settings.py.
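To try the connect, insert, commit-or-rollback, close pattern without a MySQL server, the sketch below swaps in the standard library's sqlite3 with an in-memory database. This is a substitution made for illustration, not the setup used in this post: `SqlitePipeline`, the table definition, and the sample item are assumptions, and sqlite3 uses `?` placeholders where pymysql uses `%s`.

```python
import sqlite3


class SqlitePipeline:
    """Hypothetical stand-in: same hooks, but sqlite3 instead of pymysql."""

    def open_spider(self, spider):
        self.con = sqlite3.connect(":memory:")
        self.con.execute(
            "CREATE TABLE game (id INTEGER PRIMARY KEY, name TEXT, category TEXT, time TEXT)"
        )
        self.con.commit()

    def close_spider(self, spider):
        self.con.close()

    def process_item(self, item, spider):
        try:
            # Parameters are passed separately, never formatted into the SQL string
            self.con.execute(
                "INSERT INTO game (id, name, category, time) VALUES (?, ?, ?, ?)",
                (item["id"], item["name"], item["category"], item["game_times"]),
            )
            self.con.commit()
        except sqlite3.Error:
            # A failed insert (e.g. duplicate id) is rolled back
            self.con.rollback()
        return item


pipeline = SqlitePipeline()
pipeline.open_spider(spider=None)
item = {"id": 1, "name": "Chess", "category": "board", "game_times": "2024"}
pipeline.process_item(item, spider=None)
pipeline.process_item(item, spider=None)  # same id again: insert fails and is rolled back
stored = pipeline.con.execute("SELECT id, name, category, time FROM game").fetchall()
pipeline.close_spider(spider=None)
```

The second call hits the primary-key constraint, so only one row ends up in the table and the crawl would carry on with the next item.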

3. Optimizing storage

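When storing to a database, hardcoding the connection details in open_spider certainly works, but it is better to keep them in settings.py, where they are easy to change. The changes are as follows: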

# In settings.py
MYSQL = {
    'host': 'localhost',
    'user': 'root',
    'password': 'root',
    'db': 'day01',
    'port': 3306,
}

# In pipelines.py
import pymysql

from .settings import MYSQL


class data_mysql_Pipeline:
    # Connect using the values from settings.py
    def open_spider(self, spider):
        self.con = pymysql.connect(host=MYSQL['host'],
                                   user=MYSQL['user'],
                                   password=MYSQL['password'],
                                   database=MYSQL['db'],
                                   port=MYSQL['port'])

    def close_spider(self, spider):
        if self.con:
            self.con.close()

    def process_item(self, item, spider):
        cur = None  # keeps the finally block safe even if cursor() fails
        try:
            cur = self.con.cursor()
            sql = "insert into game (id, name, category, time) values (%s, %s, %s, %s)"
            cur.execute(sql, (item['id'], item['name'], item['category'], item['game_times']))
            self.con.commit()
        except Exception as e:
            print("Something went wrong:", e)
            self.con.rollback()
        finally:
            if cur:
                cur.close()
        return item
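An alternative to importing the settings module directly is Scrapy's from_crawler hook, a classmethod that Scrapy calls (when a pipeline defines it) to build the pipeline and that gives access to the project settings through crawler.settings. The sketch below is one possible shape, not the approach used in this post: `MysqlSettingsPipeline` and `fake_crawler` are names made up for illustration, and the fake crawler exists only so the wiring can be shown without running a crawl.

```python
from types import SimpleNamespace


class MysqlSettingsPipeline:
    """Sketch: receive the MYSQL dict through from_crawler instead of an import."""

    def __init__(self, mysql_conf):
        self.mysql_conf = mysql_conf
        self.con = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the MYSQL dict defined in settings.py
        return cls(mysql_conf=crawler.settings.get("MYSQL"))

    def open_spider(self, spider):
        # Imported here so the class can be constructed without the driver installed
        import pymysql

        conf = self.mysql_conf
        self.con = pymysql.connect(
            host=conf["host"],
            user=conf["user"],
            password=conf["password"],
            database=conf["db"],
            port=conf["port"],
        )

    def close_spider(self, spider):
        if self.con:
            self.con.close()


# Stand-in for Scrapy's crawler object, only to show the wiring (an assumption)
fake_crawler = SimpleNamespace(
    settings={
        "MYSQL": {
            "host": "localhost",
            "user": "root",
            "password": "root",
            "db": "day01",
            "port": 3306,
        }
    }
)
pipeline = MysqlSettingsPipeline.from_crawler(fake_crawler)
```

One benefit of this shape is that values overridden on the command line or in a spider's custom settings are picked up too, since they all end up in crawler.settings.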
