一些数据库要使用到的知识
虚拟机中安装MySQL,这里以CentOs7为例:
mysql基础语法 :http://t.csdnimg.cn/8IEQU
- 启动 mysql:
-
service mysql start
- 登录mysql:
-
mysql -u root -p
- 创建一个数据库:
-
create database books;
细节:database 不是datebase
报错:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'datebase books' at line 1
一句写完后面有 ;
- 进入数据库并建表
-
> use books; > create table book(id int primary key auto_increment,name char(16),src char(20));
爬虫连接数据库
1.编辑setting文件随便找一个位置写(6个),即注意事项:
DB_HOST = '198.168.100.100'
# 虚拟机上通过 ifconfig 指令查看
DB_PORT = 3306
#MySQL端口,一般为3306
DB_NAME = 'books'
#数据库名字
DB_USER = 'root'
#用户名,默认为root
DB_PASSWORD = '123456'
DB_CHARSET = 'utf8'
#不能写utf-8,因为识别不了-
2.爬虫连接数据库pymysql:
使用自定义pipelines下载,首先设置setting文件的管道(打开管道),随后在pipelines中写入:
- 打开连接,设置游标
-
from scrapy.utils.project import get_project_settings import pymysql # 保存到数据库中 class ReadingDBLoad: def open_spider(self,spider): settings = get_project_settings() self.host = settings['DB_HOST'] self.port = settings['DB_PORT'] self.user = settings['DB_USER'] self.password = settings['DB_PASSWD'] self.name = settings['DB_NAME'] self.charset = settings['DB_CHARSET'] self.connect() def connect(self): self.conn = pymysql.connect( host=self.host, port=self.port, user=self.user, password=self.password, db = self.name, charset=self.charset ) self.cursor = self.conn.cursor()
- 写入内容
-
def process_item(self, item, spider): sql = "insert into book(name,src) values('{}','{}')".format(item['name'],item['src']) self.cursor.execute(sql) self.conn.commit() return item
- 关闭连接
-
def close_spider(self): self.cursor.close() self.conn.close()
总代码:
from scrapy.utils.project import get_project_settings
import pymysql
# 保存到数据库中
class ReadingDBLoad:
def open_spider(self,spider):
settings = get_project_settings()
self.host = settings['DB_HOST']
self.port = settings['DB_PORT']
self.user = settings['DB_USER']
self.password = settings['DB_PASSWD']
self.name = settings['DB_NAME']
self.charset = settings['DB_CHARSET']
self.connect()
def connect(self):
self.conn = pymysql.connect(
host=self.host,
port=self.port,
user=self.user,
password=self.password,
db = self.name,
charset=self.charset
)
self.cursor = self.conn.cursor()
def process_item(self, item, spider):
sql = "insert into book(name,src) values('{}','{}')".format(item['name'],item['src'])
self.cursor.execute(sql)
self.conn.commit()
return item
def close_spider(self):
self.cursor.close()
self.conn.close()