Importing the Crawled Data into a Database

Some database knowledge you will need

Install MySQL in a virtual machine; this walkthrough uses CentOS 7 as the example:

MySQL basic syntax reference: http://t.csdnimg.cn/8IEQU

  • Start MySQL (depending on how it was installed, the service may be named mysqld instead):
  • service mysql start
  • Log in to MySQL:
  • mysql -u root -p

  • Create a database:
  • create database books;


Detail: the keyword is database, not datebase.

        Error message: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'datebase books' at line 1

        Also note that every statement must end with a ;

  • Enter the database and create the table (a quick verification follows below):
  • > use books;
    > create table book(id int primary key auto_increment, name char(16), src char(20));
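
To confirm the database and table were set up correctly, the following can be run in the same mysql shell; desc shows the column definitions, and an empty result set from select just means no rows have been inserted yet:

    > show databases;
    > desc book;
    > select * from book;

Note that a char(20) src column will truncate or reject URLs longer than 20 characters; if your src values are full URLs, a wider column such as varchar(200) may be safer.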

Connecting the crawler to the database

1. Edit the project's settings.py and add the following six settings anywhere in the file; the comments point out the details to watch. A standalone connection test with these values follows the block.

DB_HOST = '198.168.100.100'
# the virtual machine's IP, found with the ifconfig command on the VM
DB_PORT = 3306
# MySQL port, usually 3306
DB_NAME = 'books'
# database name
DB_USER = 'root'
# user name, root by default
DB_PASSWORD = '123456'
DB_CHARSET = 'utf8'
# must be utf8, not utf-8: the '-' is not recognized
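
Before wiring these values into Scrapy, it can help to confirm they actually work. Below is a minimal standalone sketch, assuming pymysql is installed on the machine running the spider and that the MySQL server on the VM accepts remote connections for this user:

import pymysql

# Standalone connection test using the same values as settings.py;
# replace host/password with your own environment's values.
conn = pymysql.connect(
    host='198.168.100.100',  # the VM's IP from ifconfig
    port=3306,
    user='root',
    password='123456',
    db='books',
    charset='utf8'
)
with conn.cursor() as cursor:
    cursor.execute('select version()')
    print(cursor.fetchone())  # prints the MySQL server version tuple
conn.close()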

2. Connect the crawler to the database with pymysql:

Use a custom pipeline to do the saving: first enable the pipeline in settings.py (open the ITEM_PIPELINES setting, see the sketch just below), then write the following in pipelines.py:
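
Enabling the pipeline is a one-line registration in settings.py. In the sketch below, yourproject is a placeholder for the actual Scrapy project name; only the class name ReadingDBLoad comes from the code that follows.

# settings.py: register the pipeline so Scrapy passes every item through it.
# The number is the pipeline's order; lower values run earlier.
ITEM_PIPELINES = {
    'yourproject.pipelines.ReadingDBLoad': 300,
}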

  • Open the connection and set up the cursor
  • from scrapy.utils.project import get_project_settings
    import pymysql

    # pipeline that saves items to the database
    class ReadingDBLoad:
        def open_spider(self, spider):
            # read the connection parameters defined in settings.py
            settings = get_project_settings()
            self.host = settings['DB_HOST']
            self.port = settings['DB_PORT']
            self.user = settings['DB_USER']
            self.password = settings['DB_PASSWORD']
            self.name = settings['DB_NAME']
            self.charset = settings['DB_CHARSET']
            self.connect()

        def connect(self):
            self.conn = pymysql.connect(
                host=self.host,
                port=self.port,
                user=self.user,
                password=self.password,
                db=self.name,
                charset=self.charset
            )
            self.cursor = self.conn.cursor()
    
  • Write the data
  •     def process_item(self, item, spider):
            # parameterized query: lets pymysql handle quoting of the values
            sql = 'insert into book(name, src) values(%s, %s)'
            self.cursor.execute(sql, (item['name'], item['src']))
            self.conn.commit()

            return item
  • Close the connection
  •     def close_spider(self, spider):
            self.cursor.close()
            self.conn.close()

    Full code:

from scrapy.utils.project import get_project_settings
import pymysql

# pipeline that saves items to the database
class ReadingDBLoad:
    def open_spider(self, spider):
        # read the connection parameters defined in settings.py
        settings = get_project_settings()
        self.host = settings['DB_HOST']
        self.port = settings['DB_PORT']
        self.user = settings['DB_USER']
        self.password = settings['DB_PASSWORD']
        self.name = settings['DB_NAME']
        self.charset = settings['DB_CHARSET']
        self.connect()

    def connect(self):
        self.conn = pymysql.connect(
            host=self.host,
            port=self.port,
            user=self.user,
            password=self.password,
            db=self.name,
            charset=self.charset
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # parameterized query: lets pymysql handle quoting of the values
        sql = 'insert into book(name, src) values(%s, %s)'
        self.cursor.execute(sql, (item['name'], item['src']))
        self.conn.commit()

        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
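
The pipeline assumes the spider yields items with name and src fields. Below is a minimal items.py sketch under that assumption; the class name ReadingItem is hypothetical, not taken from the original project.

import scrapy

class ReadingItem(scrapy.Item):
    # book title, stored in the name column
    name = scrapy.Field()
    # source/image URL, stored in the src column
    src = scrapy.Field()

After the spider finishes, running select * from book; in the MySQL shell should show the inserted rows.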