目录
4. 从数据库读写数据-read_sql/to_sql
4.1 .read_sql() 语法
.read_sql 的用法比较简单,大概因为数据库的数据基本上都是比较规整的。参数具体用法请见范例。
Help on function read_sql in module pandas.io.sql: read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize: Union[int, NoneType] = None) -> Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame]] Read SQL query or database table into a DataFrame. This function is a convenience wrapper around ``read_sql_table`` and ``read_sql_query`` (for backward compatibility). It will delegate to the specific function depending on the provided input. A SQL query will be routed to ``read_sql_query``, while a database table name will be routed to ``read_sql_table``. Note that the delegated function might have more specific notes about their functionality not listed here. Parameters ---------- sql : str or SQLAlchemy Selectable (select or text object) SQL query to be executed or a table name. con : SQLAlchemy connectable, str, or sqlite3 connection Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; str connections are closed automatically. See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_. index_col : str or list of str, optional, default: None Column(s) to set as index(MultiIndex). coerce_float : bool, default True Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets. params : list, tuple or dict, optional, default: None List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249's paramstyle, is supported. Eg. for psycopg2, uses %(name)s so use params={'name' : 'value'}. parse_dates : list or dict, default: None - List of column names to parse as dates. - Dict of ``{column_name: format string}`` where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps. - Dict of ``{column_name: arg dict}``, where the arg dict corresponds to the keyword arguments of :func:`pandas.to_datetime` Especially useful with databases without native Datetime support, such as SQLite. columns : list, default: None List of column names to select from SQL table (only used when reading a table). chunksize : int, default None If specified, return an iterator where `chunksize` is the number of rows to include in each chunk.
4.2 .read_sql() 范例
4.2.1 必不可少的 sql 和 con
如果大家直接之前曾经用 Python 操作过数据库,应该比较熟悉各种 Python 数据库接口。
Python 标准数据库接口为 Python DB-API,Python DB-API为开发人员提供了数据库应用编程接口。我比较喜欢MySQL 数据库,在 Python 2.* 阶段可以用第三方库 MySQLdb,在 3.* 阶段推荐使用第三方库 pymysql。我也使用的pymysql,后面所有代码都是如此。
sql : str or SQLAlchemy Selectable (select or text object) SQL query to be executed or a table name.
con : SQLAlchemy connectable, str, or sqlite3 connection Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; str
参看我的本地数据库数据
代码很简单
conn=pymysql.connect(host="127.0.0.1",# IP 地址指向本地数据库,如果是远程数据库,
那么数据库配置应该允许远程访问
user="root", # 用户名
password="wxf123",# 密码
database="ivydb") # 数据库名
data=pd.read_sql('''SELECT * FROM human;''', # 简单明了的 sql 语句
con = conn) # 把数据库的入口传给它
data
运行结果
4.2.2 相似的 index_col
index_col : str or list of str, optional, default: None Column(s) to set as index(MultiIndex).
此处的 index_col 作用和 read_csv/read_excel等函数是一致的,但是需要注意,此处不接受数值,只接受字符串或者字符串的列表。大家想想数据库的特质就明白了,不管是 csv 还是 excel表或者 txt 文档都可能有很多不规范的数据,数据库基本上就是整整齐齐的了,每一列都有列名。
代码如下
data=pd.read_sql('''SELECT * FROM human;''', con = conn,index_col="id")
data
运行结果
4.2.3 很有用的 coerce_float
python从3.0开始去掉了coerce()等函数,但有些场合还是需要类似的函数。coerce_float是一个布尔值,默认为True,它将会非字符串,非数值型的对象,类似于decimal,转化为浮点数。
coerce_float : bool, default True Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
当前 Mysql 库数据如下:
大家可以看到 coerce_float 为 True 和为 False 结果是不一样的
4.2.4 特别的 params
params : list, tuple or dict, optional, default: None List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249's paramstyle, is supported. Eg. for psycopg2, uses %(name)s so use params={'name' : 'value'}.
params 参数可以是一个列表,元组或字典,默认是 None。这些被传递的参数用于执行某些策略。但是并非所有数据库驱动程序都支持所有这些可能性,支持哪种语法取决于您使用的驱动程序
当前 Mysql 数据库数据如下
代码如下
conn=pymysql.connect(host="127.0.0.1",
user="root",
password="wxf123",
database="ivydb")
# %(low)s and %(high)s 配合 params
data=pd.read_sql('''SELECT * FROM test_order WHERE cost BETWEEN %(low)s and %(high)s;''',
con = conn,
index_col="id",
params={"low":20,"high":600}
)
data
运行结果
4.2.5 有用的的 parse_dates
parse_dates : list or dict, default: None,列表或者字典,默认为 None
将某一列日期型String转换为datetime类型,与pd.to_datetime()函数功能类似。
- List of column names to parse as dates. 希望解析成dates的列名列表,表示以默认format转换这些列
- Dict of ``{column_name: format string}`` where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
``{column_name: format string}``格式的字典,将指定的列c转换为对应的格式f,一个format例子为"%Y:%m:%H:%M:%S"
- Dict of ``{column_name: arg dict}``, where the arg dict corresponds to the keyword arguments of :func:`pandas.to_datetime` Especially useful with databases without native Datetime support, such as SQLite.
4.2.6 相似的 columns
columns : list, default: None
List of column names to select from SQL table (only used when reading a table).
从SQL表中选择的列名列表(仅在读取表时使用)
一般没用,因为SQL中一般选定了要选择的列了
4.2.7 相似的 chunksize
chunksize : int, default None int
If specified, return an iterator where `chunksize` is the number of rows to include in each chunk.
类型,默认为 None,一旦指定,返回一个 迭代器,这个迭代器每个 chunk 内迭代 chunksize 个 行。
当前数据表
代码如下
conn=pymysql.connect(host="127.0.0.1",
user="root",
password="wxf123",
database="ivydb")
data=pd.read_sql('''SELECT * FROM human;''',
con = conn,
index_col="id",
chunksize=4
)
data
运行结果
让我们详细看看数据
代码
for tmp in data:
print(tmp)
运行结果
4.3 .to_sql() 语法
Help on function to_sql in module pandas.core.generic: to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype=None, method=None) -> 'None' Write records stored in a DataFrame to a SQL database. Databases supported by SQLAlchemy [1]_ are supported. Tables can be newly created, appended to, or overwritten. Parameters ---------- name : str Name of SQL table. con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_. schema : str, optional Specify the schema (if database flavor supports this). If None, use default schema. if_exists : {'fail', 'replace', 'append'}, default 'fail' How to behave if the table already exists. * fail: Raise a ValueError. * replace: Drop the table before inserting new values. * append: Insert new values to the existing table. index : bool, default True Write DataFrame index as a column. Uses `index_label` as the column name in the table. index_label : str or sequence, default None Column label for index column(s). If None is given (default) and `index` is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. chunksize : int, optional Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once. dtype : dict or scalar, optional Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns. method : {None, 'multi', callable}, optional Controls the SQL insertion clause used: * None : Uses standard SQL ``INSERT`` clause (one per row). * 'multi': Pass multiple values in a single ``INSERT`` clause. * callable with signature ``(pd_table, conn, keys, data_iter)``.
4.4 .to_sql() 范例
代码很简单
from sqlalchemy import create_engine
#engine = create_engine("mysql+pymysql://【此处填用户名】:【此处填密码】@【此处填host】:【此处填port】/【此处填数据库的名称】?charset=utf8")
engine = create_engine("mysql+pymysql://root:wxf123@localhost:3306/ivydb?charset=utf8", echo=True)
result=data.to_sql("human2", con = engine)
result
运行结果如下
2021-11-21 13:42:19,617 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode' 2021-11-21 13:42:19,618 INFO sqlalchemy.engine.Engine [raw sql] {} 2021-11-21 13:42:19,626 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names' 2021-11-21 13:42:19,627 INFO sqlalchemy.engine.Engine [generated in 0.00130s] {} 2021-11-21 13:42:19,636 INFO sqlalchemy.engine.Engine SELECT DATABASE() 2021-11-21 13:42:19,638 INFO sqlalchemy.engine.Engine [raw sql] {} 2021-11-21 13:42:19,647 INFO sqlalchemy.engine.Engine SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = %(table_schema)s AND table_name = %(table_name)s 2021-11-21 13:42:19,648 INFO sqlalchemy.engine.Engine [generated in 0.00135s] {'table_schema': 'ivydb', 'table_name': 'human2'} 2021-11-21 13:42:19,657 INFO sqlalchemy.engine.Engine BEGIN (implicit) 2021-11-21 13:42:19,661 INFO sqlalchemy.engine.Engine CREATE TABLE human2 ( id BIGINT, title TEXT, age BIGINT, location TEXT, comment DATE ) 2021-11-21 13:42:19,662 INFO sqlalchemy.engine.Engine [no key 0.00123s] {} 2021-11-21 13:42:19,713 INFO sqlalchemy.engine.Engine CREATE INDEX ix_human2_id ON human2 (id) 2021-11-21 13:42:19,714 INFO sqlalchemy.engine.Engine [no key 0.00092s] {} 2021-11-21 13:42:19,756 INFO sqlalchemy.engine.Engine COMMIT 2021-11-21 13:42:19,760 INFO sqlalchemy.engine.Engine BEGIN (implicit) 2021-11-21 13:42:19,761 INFO sqlalchemy.engine.Engine INSERT INTO human2 (id, title, age, location, comment) VALUES (%(id)s, %(title)s, %(age)s, %(location)s, %(comment)s) 2021-11-21 13:42:19,763 INFO sqlalchemy.engine.Engine [generated in 0.00113s] ({'id': 1, 'title': 'Teacher', 'age': 36, 'location': 'Beijing', 'comment': datetime.date(1982, 1, 1)}, {'id': 2, 'title': 'NewMan', 'age': 3, 'location': 'Shanghai', 'comment': datetime.date(1983, 2, 1)}, {'id': 3, 'title': 'Policeman', 'age': 33, 'location': 'Beijing', 'comment': datetime.date(1984, 5, 9)}, {'id': 4, 'title': 'CodingMan', 'age': 32, 'location': 'Nanjing', 'comment': datetime.date(1982, 1, 1)}, {'id': 5, 'title': 'Officer', 'age': 28, 'location': 'GuangZhou', 'comment': datetime.date(1982, 1, 1)}, {'id': 6, 'title': 'Farmer', 'age': 54, 'location': 'ShangDong', 'comment': datetime.date(1982, 1, 1)}, {'id': 7, 'title': 'Trader', 'age': 31, 'location': 'GuangZhou', 'comment': datetime.date(2001, 8, 19)}, {'id': 8, 'title': 'Thief', 'age': 20, 'location': 'ShangDong', 'comment': datetime.date(1982, 1, 1)}, {'id': 9, 'title': 'Builder', 'age': 27, 'location': 'Beijing', 'comment': datetime.date(2007, 8, 19)}, {'id': 10, 'title': 'Singer', 'age': 22, 'location': 'Nanjing', 'comment': datetime.date(1982, 1, 1)}) 2021-11-21 13:42:19,766 INFO sqlalchemy.engine.Engine COMMIT
'''
要是大家觉得写得还行,麻烦点个赞或者收藏吧,想给博客涨涨人气,非常感谢!
'''