简介
MySQL 是一种关系型数据库管理系统,关系数据库将数据保存在不同的表中,而不是将所有数据放在一个大仓库内,这样就增加了速度并提高了灵活性。
GUI
有时候使用 cmd 进行数据库操作不太方便,因此会借助于 GUI 来进行数据库的编辑和修改,常见用于 MySQL 的 GUI 工具有 workbench 和 navicat,前者是官方出的 GUI。
python 库
如果我们想要将爬取到的数据以数据库的形式存储时,就需要 python 和 MySQL 的连接。常见的驱动 MySQL 的 python 库有 mysqldb,mysqlclient,pymysql 等。按照自己 python 的安装版本自行选择合适的即可。
数据库连接
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='password',
database='pymysql',
port=3306
)
cursor = db.cursor()
for i in range(cursor.execute('show databases')):
print(cursor.fetchone())
db.close()
结果为:
('information_schema',)
('mysql',)
('performance_schema',)
('pymysql',)
('sys',)
数据操作
数据插入
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='password',
database='pymysql',
port=3306
)
command = """
insert into sheet1(id,name) value(105,'qianduo')
"""
cursor = db.cursor()
for i in range(cursor.execute('select * from sheet1')):
print(cursor.fetchone())
cursor.execute(command)
db.commit()
for i in range(cursor.execute('select * from sheet1')):
print(cursor.fetchone())
db.close()
结果为:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
除了上面提到的执行命令的方式,也可以将参数放到命令外部执行:
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='password',
database='pymysql',
port=3306
)
# 此时 value 中的参数类型应该都是 %s 而不管是否真的是 str,内部会进行处理
command = """
insert into sheet1(id,name) value(%s,%s)
"""
cursor = db.cursor()
cursor.execute(command,(106,'sunjie'))
db.commit()
for i in range(cursor.execute('select * from sheet1')):
print(cursor.fetchone())
db.close()
结果为:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
数据获取
- fetchone:每次获取一条数据
- fetchall:接受全部的返回结果
- fetchmany:获取指定条数的数据
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='password',
database='pymysql',
port=3306
)
command = """
select * from sheet1
"""
cursor = db.cursor()
for i in range(cursor.execute(command)):
print(cursor.fetchone())
print("***************")
cursor.execute(command)
print(cursor.fetchall())
print("***************")
cursor.execute(command)
print(cursor.fetchmany(3))
db.close()
结果为:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
***************
((101, 'zhangsan'), (102, 'lisi'), (103, 'wangwu'), (104, 'zhaoliu'), (105, 'qianduo'), (106, 'sunjie'))
***************
((101, 'zhangsan'), (102, 'lisi'), (103, 'wangwu'))
数据删除
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='2602388671',
database='pymysql',
port=3306
)
delete_command = """
delete from sheet1 where id=101
"""
select_command = """
select * from sheet1
"""
cursor = db.cursor()
for i in range(cursor.execute(select_command)):
print(cursor.fetchone())
print('*********************')
cursor.execute(delete_command)
for i in range(cursor.execute(select_command)):
print(cursor.fetchone())
db.commit()
db.close()
结果为:
(101, 'zhangsan')
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
*********************
(102, 'lisi')
(103, 'wangwu')
(104, 'zhaoliu')
(105, 'qianduo')
(106, 'sunjie')
数据更新
import pymysql
db = pymysql.connect(
host='localhost',
user='root',
password='2602388671',
database='pymysql',
port=3306
)
command = """
update sheet1 set name='gaojie' where id=102
"""
select_command = """
select * from sheet1
"""
cursor = db.cursor()
cursor.execute(command)
for i in range(cursor.execute(select_command)):
print(cursor.fetchone())
db.commit()
db.close()
操作步骤
- 首先利用 connect 创建一个 Connection 类对象,进行数据库连接:
class Connection(object):
"""
Representation of a socket with a mysql server.
The proper way to get an instance of this class is to call
connect().
Establish a connection to the MySQL database. Accepts several
arguments:
:param host: Host where the database server is located
:param user: Username to log in as
:param password: Password to use.
:param database: Database to use, None to not use a particular one.
:param port: MySQL port to use, default is usually OK. (default: 3306)
:param bind_address: When the client has multiple network interfaces, specify
the interface from which to connect to the host. Argument can be
a hostname or an IP address.
:param unix_socket: Optionally, you can use a unix socket rather than TCP/IP.
:param read_timeout: The timeout for reading from the connection in seconds (default: None - no timeout)
:param write_timeout: The timeout for writing to the connection in seconds (default: None - no timeout)
:param charset: Charset you want to use.
:param sql_mode: Default SQL_MODE to use.
:param read_default_file:
Specifies my.cnf file to read these parameters from under the [client] section.
:param conv:
Conversion dictionary to use instead of the default one.
This is used to provide custom marshalling and unmarshaling of types.
See converters.
:param use_unicode:
Whether or not to default to unicode strings.
This option defaults to true for Py3k.
:param client_flag: Custom flags to send to MySQL. Find potential values in constants.CLIENT.
:param cursorclass: Custom cursor class to use.
:param init_command: Initial SQL statement to run when connection is established.
:param connect_timeout: Timeout before throwing an exception when connecting.
(default: 10, min: 1, max: 31536000)
:param ssl:
A dict of arguments similar to mysql_ssl_set()'s parameters.
:param read_default_group: Group to read from in the configuration file.
:param compress: Not supported
:param named_pipe: Not supported
:param autocommit: Autocommit mode. None means use server default. (default: False)
:param local_infile: Boolean to enable the use of LOAD DATA LOCAL command. (default: False)
:param max_allowed_packet: Max size of packet sent to server in bytes. (default: 16MB)
Only used to limit size of "LOAD LOCAL INFILE" data packet smaller than default (16KB).
:param defer_connect: Don't explicitly connect on contruction - wait for connect call.
(default: False)
:param auth_plugin_map: A dict of plugin names to a class that processes that plugin.
The class will take the Connection object as the argument to the constructor.
The class needs an authenticate method taking an authentication packet as
an argument. For the dialog plugin, a prompt(echo, prompt) method can be used
(if no authenticate method) for returning a string from the user. (experimental)
:param server_public_key: SHA256 authenticaiton plugin public key value. (default: None)
:param db: Alias for database. (for compatibility to MySQLdb)
:param passwd: Alias for password. (for compatibility to MySQLdb)
:param binary_prefix: Add _binary prefix on bytes and bytearray. (default: False)
See `Connection <https://www.python.org/dev/peps/pep-0249/#connection-objects>`_ in the
specification.
"""
_sock = None
_auth_plugin_name = ''
_closed = False
_secure = False
def __init__(self, host=None, user=None, password="",
database=None, port=0, unix_socket=None,
charset='', sql_mode=None,
read_default_file=None, conv=None, use_unicode=None,
client_flag=0, cursorclass=Cursor, init_command=None,
connect_timeout=10, ssl=None, read_default_group=None,
compress=None, named_pipe=None,
autocommit=False, db=None, passwd=None, local_infile=False,
max_allowed_packet=16*1024*1024, defer_connect=False,
auth_plugin_map=None, read_timeout=None, write_timeout=None,
bind_address=None, binary_prefix=False, program_name=None,
server_public_key=None):
if use_unicode is None and sys.version_info[0] > 2:
use_unicode = True
if db is not None and database is None:
database = db
if passwd is not None and not password:
password = passwd
if compress or named_pipe:
raise NotImplementedError("compress and named_pipe arguments are not supported")
self._local_infile = bool(local_infile)
if self._local_infile:
client_flag |= CLIENT.LOCAL_FILES
if read_default_group and not read_default_file:
if sys.platform.startswith("win"):
read_default_file = "c:\\my.ini"
else:
read_default_file = "/etc/my.cnf"
if read_default_file:
if not read_default_group:
read_default_group = "client"
cfg = Parser()
cfg.read(os.path.expanduser(read_default_file))
def _config(key, arg):
if arg:
return arg
try:
return cfg.get(read_default_group, key)
except Exception:
return arg
user = _config("user", user)
password = _config("password", password)
host = _config("host", host)
database = _config("database", database)
unix_socket = _config("socket", unix_socket)
port = int(_config("port", port))
bind_address = _config("bind-address", bind_address)
charset = _config("default-character-set", charset)
if not ssl:
ssl = {}
if isinstance(ssl, dict):
for key in ["ca", "capath", "cert", "key", "cipher"]:
value = _config("ssl-" + key, ssl.get(key))
if value:
ssl[key] = value
self.ssl = False
if ssl:
if not SSL_ENABLED:
raise NotImplementedError("ssl module not found")
self.ssl = True
client_flag |= CLIENT.SSL
self.ctx = self._create_ssl_ctx(ssl)
self.host = host or "localhost"
self.port = port or 3306
self.user = user or DEFAULT_USER
self.password = password or b""
if isinstance(self.password, text_type):
self.password = self.password.encode('latin1')
self.db = database
self.unix_socket = unix_socket
self.bind_address = bind_address
if not (0 < connect_timeout <= 31536000):
raise ValueError("connect_timeout should be >0 and <=31536000")
self.connect_timeout = connect_timeout or None
if read_timeout is not None and read_timeout <= 0:
raise ValueError("read_timeout should be >= 0")
self._read_timeout = read_timeout
if write_timeout is not None and write_timeout <= 0:
raise ValueError("write_timeout should be >= 0")
self._write_timeout = write_timeout
if charset:
self.charset = charset
self.use_unicode = True
else:
self.charset = DEFAULT_CHARSET
self.use_unicode = False
if use_unicode is not None:
self.use_unicode = use_unicode
self.encoding = charset_by_name(self.charset).encoding
client_flag |= CLIENT.CAPABILITIES
if self.db:
client_flag |= CLIENT.CONNECT_WITH_DB
self.client_flag = client_flag
self.cursorclass = cursorclass
self._result = None
self._affected_rows = 0
self.host_info = "Not connected"
# specified autocommit mode. None means use server default.
self.autocommit_mode = autocommit
if conv is None:
conv = converters.conversions
# Need for MySQLdb compatibility.
self.encoders = {k: v for (k, v) in conv.items() if type(k) is not int}
self.decoders = {k: v for (k, v) in conv.items() if type(k) is int}
self.sql_mode = sql_mode
self.init_command = init_command
self.max_allowed_packet = max_allowed_packet
self._auth_plugin_map = auth_plugin_map or {}
self._binary_prefix = binary_prefix
self.server_public_key = server_public_key
self._connect_attrs = {
'_client_name': 'pymysql',
'_pid': str(os.getpid()),
'_client_version': VERSION_STRING,
}
if program_name:
self._connect_attrs["program_name"] = program_name
if defer_connect:
self._sock = None
else:
self.connect()
- 构建一个 Cursor 类对象,以与数据库对象进行交互:
class Cursor(object):
"""
This is the object you use to interact with the database.
Do not create an instance of a Cursor yourself. Call
connections.Connection.cursor().
See `Cursor <https://www.python.org/dev/peps/pep-0249/#cursor-objects>`_ in
the specification.
"""
#: Max statement size which :meth:`executemany` generates.
#:
#: Max size of allowed statement is max_allowed_packet - packet_header_size.
#: Default value of max_allowed_packet is 1048576.
max_stmt_length = 1024000
_defer_warnings = False
def __init__(self, connection):
self.connection = connection
self.description = None
self.rownumber = 0
self.rowcount = -1
self.arraysize = 1
self._executed = None
self._result = None
self._rows = None
self._warnings_handled = False
- 利用 execute 函数执行 MySQL 命令
- 利用 fetch 获取数据
- 使用 commit 提交更改
- 关闭数据库
需要注意的是:
- 与数据库交互是通过 cursor 实现的
- 数据库打开、关闭和更新则是通过 db 实现的
- 因为执行一些操作后,cursor 所指向的当前位置会发生变化,因此要注意 cursor 的当前位置
- 所有修改数据库的行为,如果要使修改生效,需要使用 commit 提交更改