Pandas 模块 - 读写(4)-从数据库读写数据-read_sql/to_sql

江南野栀子

已于 2024-04-17 15:55:46 修改

阅读量7.5k

点赞数 8

分类专栏： # Python 数据分析文章标签： python 数据分析 pandas

于 2021-11-21 13:48:23 首次发布

本文链接：https://blog.csdn.net/u010701274/article/details/121435488

版权

Python 数据分析专栏收录该内容

22 篇文章 28 订阅

订阅专栏

4. 从数据库读写数据-read_sql/to_sql

4.1 .read_sql() 语法

4.2 .read_sql() 范例

4.2.1 必不可少的 sql 和 con

4.2.2 相似的 index_col

4.2.3 很有用的 coerce_float

4.2.4 特别的 params

4.2.5 有用的的 parse_dates

4. 从数据库读写数据-read_sql/to_sql

4.1 .read_sql() 语法

.read_sql 的用法比较简单，大概因为数据库的数据基本上都是比较规整的。参数具体用法请见范例。

Help on function read_sql in module pandas.io.sql:

read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize: Union[int, NoneType] = None) -> Union[pandas.core.frame.DataFrame, Iterator[pandas.core.frame.DataFrame]]
    Read SQL query or database table into a DataFrame.
    
    This function is a convenience wrapper around ``read_sql_table`` and
    ``read_sql_query`` (for backward compatibility). It will delegate
    to the specific function depending on the provided input. A SQL query
    will be routed to ``read_sql_query``, while a database table name will
    be routed to ``read_sql_table``. Note that the delegated function might
    have more specific notes about their functionality not listed here.
    
    Parameters
    ----------
    sql : str or SQLAlchemy Selectable (select or text object)
        SQL query to be executed or a table name.
    con : SQLAlchemy connectable, str, or sqlite3 connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
        for engine disposal and connection closure for the SQLAlchemy connectable; str
        connections are closed automatically. See
        `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.
    index_col : str or list of str, optional, default: None
        Column(s) to set as index(MultiIndex).
    coerce_float : bool, default True
        Attempts to convert values of non-string, non-numeric objects (like
        decimal.Decimal) to floating point, useful for SQL result sets.
    params : list, tuple or dict, optional, default: None
        List of parameters to pass to execute method.  The syntax used
        to pass parameters is database driver dependent. Check your
        database driver documentation for which of the five syntax styles,
        described in PEP 249's paramstyle, is supported.
        Eg. for psycopg2, uses %(name)s so use params={'name' : 'value'}.
    parse_dates : list or dict, default: None
        - List of column names to parse as dates.
        - Dict of ``{column_name: format string}`` where format string is
          strftime compatible in case of parsing string times, or is one of
          (D, s, ns, ms, us) in case of parsing integer timestamps.
        - Dict of ``{column_name: arg dict}``, where the arg dict corresponds
          to the keyword arguments of :func:`pandas.to_datetime`
          Especially useful with databases without native Datetime support,
          such as SQLite.
    columns : list, default: None
        List of column names to select from SQL table (only used when reading
        a table).
    chunksize : int, default None
        If specified, return an iterator where `chunksize` is the
        number of rows to include in each chunk.

4.2 .read_sql() 范例

4.2.1 必不可少的 sql 和 con

如果大家直接之前曾经用 Python 操作过数据库，应该比较熟悉各种 Python 数据库接口。

Python 标准数据库接口为 Python DB-API，Python DB-API为开发人员提供了数据库应用编程接口。我比较喜欢MySQL 数据库，在 Python 2.* 阶段可以用第三方库 MySQLdb，在 3.* 阶段推荐使用第三方库 pymysql。我也使用的pymysql，后面所有代码都是如此。

sql : str or SQLAlchemy Selectable (select or text object) SQL query to be executed or a table name.

con : SQLAlchemy connectable, str, or sqlite3 connection Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; str

参看我的本地数据库数据

代码很简单

conn=pymysql.connect(host="127.0.0.1",# IP 地址指向本地数据库，如果是远程数据库，
                                        那么数据库配置应该允许远程访问
                    user="root", # 用户名
                    password="wxf123",# 密码
                    database="ivydb") # 数据库名
data=pd.read_sql('''SELECT * FROM  human;''', # 简单明了的 sql 语句
                con = conn) # 把数据库的入口传给它
data

运行结果

4.2.2 相似的 index_col

index_col : str or list of str, optional, default: None Column(s) to set as index(MultiIndex).

此处的 index_col 作用和 read_csv/read_excel等函数是一致的，但是需要注意，此处不接受数值，只接受字符串或者字符串的列表。大家想想数据库的特质就明白了，不管是 csv 还是 excel表或者 txt 文档都可能有很多不规范的数据，数据库基本上就是整整齐齐的了，每一列都有列名。

代码如下

data=pd.read_sql('''SELECT * FROM  human;''', con = conn,index_col="id")
data

运行结果

4.2.3 很有用的 coerce_float

python从3.0开始去掉了coerce()等函数，但有些场合还是需要类似的函数。coerce_float是一个布尔值，默认为True，它将会非字符串，非数值型的对象，类似于decimal，转化为浮点数。

coerce_float : bool, default True Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

当前 Mysql 库数据如下：

大家可以看到 coerce_float 为 True 和为 False 结果是不一样的

4.2.4 特别的 params

params : list, tuple or dict, optional, default: None
        List of parameters to pass to execute method.  The syntax used
        to pass parameters is database driver dependent. Check your
        database driver documentation for which of the five syntax styles,
        described in PEP 249's paramstyle, is supported.
        Eg. for psycopg2, uses %(name)s so use params={'name' : 'value'}.

params 参数可以是一个列表，元组或字典，默认是 None。这些被传递的参数用于执行某些策略。但是并非所有数据库驱动程序都支持所有这些可能性，支持哪种语法取决于您使用的驱动程序

当前 Mysql 数据库数据如下

代码如下

conn=pymysql.connect(host="127.0.0.1",
                     user="root",
                     password="wxf123",
                     database="ivydb")
#  %(low)s and %(high)s 配合 params 
data=pd.read_sql('''SELECT * FROM  test_order WHERE cost BETWEEN %(low)s and %(high)s;''', 
                 con = conn,
                 index_col="id",
                 params={"low":20,"high":600}
                )
data

运行结果

4.2.5 有用的的 parse_dates

parse_dates : list or dict, default: None，列表或者字典，默认为 None

将某一列日期型String转换为datetime类型，与pd.to_datetime()函数功能类似。

- List of column names to parse as dates. 希望解析成dates的列名列表，表示以默认format转换这些列

- Dict of ``{column_name: format string}`` where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

``{column_name: format string}``格式的字典，将指定的列c转换为对应的格式f，一个format例子为"%Y:%m:%H:%M:%S"

- Dict of ``{column_name: arg dict}``, where the arg dict corresponds to the keyword arguments of :func:`pandas.to_datetime` Especially useful with databases without native Datetime support, such as SQLite.

4.2.6 相似的 columns

columns : list, default: None

List of column names to select from SQL table (only used when reading a table).

从SQL表中选择的列名列表（仅在读取表时使用）

一般没用，因为SQL中一般选定了要选择的列了

4.2.7 相似的 chunksize

chunksize : int, default None int

If specified, return an iterator where `chunksize` is the number of rows to include in each chunk.

类型，默认为 None，一旦指定，返回一个迭代器，这个迭代器每个 chunk 内迭代 chunksize 个行。

当前数据表

代码如下

conn=pymysql.connect(host="127.0.0.1",
                     user="root",
                     password="wxf123",
                     database="ivydb")

data=pd.read_sql('''SELECT * FROM  human;''', 
                 con = conn,
                 index_col="id",
                chunksize=4
                )
data

运行结果

让我们详细看看数据

代码

for tmp in data:
    print(tmp)

运行结果

4.3 .to_sql() 语法

Help on function to_sql in module pandas.core.generic:

to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype=None, method=None) -> 'None'
Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1]_ are supported. Tables can be
newly created, appended to, or overwritten.

Parameters
----------
name : str
Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.

schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
How to behave if the table already exists.

* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.

index : bool, default True
Write DataFrame index as a column. Uses `index_label` as the column
name in the table.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
`index` is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
Controls the SQL insertion clause used:

* None : Uses standard SQL ``INSERT`` clause (one per row).
* 'multi': Pass multiple values in a single ``INSERT`` clause.
* callable with signature ``(pd_table, conn, keys, data_iter)``.

4.4 .to_sql() 范例

代码很简单

from sqlalchemy import create_engine
#engine = create_engine("mysql+pymysql://【此处填用户名】:【此处填密码】@【此处填host】:【此处填port】/【此处填数据库的名称】?charset=utf8")
engine = create_engine("mysql+pymysql://root:wxf123@localhost:3306/ivydb?charset=utf8", echo=True)
result=data.to_sql("human2", con = engine)
result

运行结果如下

2021-11-21 13:42:19,617 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2021-11-21 13:42:19,618 INFO sqlalchemy.engine.Engine [raw sql] {}
2021-11-21 13:42:19,626 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2021-11-21 13:42:19,627 INFO sqlalchemy.engine.Engine [generated in 0.00130s] {}
2021-11-21 13:42:19,636 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2021-11-21 13:42:19,638 INFO sqlalchemy.engine.Engine [raw sql] {}
2021-11-21 13:42:19,647 INFO sqlalchemy.engine.Engine SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = %(table_schema)s AND table_name = %(table_name)s
2021-11-21 13:42:19,648 INFO sqlalchemy.engine.Engine [generated in 0.00135s] {'table_schema': 'ivydb', 'table_name': 'human2'}
2021-11-21 13:42:19,657 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2021-11-21 13:42:19,661 INFO sqlalchemy.engine.Engine 
CREATE TABLE human2 (
	id BIGINT, 
	title TEXT, 
	age BIGINT, 
	location TEXT, 
	comment DATE
)


2021-11-21 13:42:19,662 INFO sqlalchemy.engine.Engine [no key 0.00123s] {}
2021-11-21 13:42:19,713 INFO sqlalchemy.engine.Engine CREATE INDEX ix_human2_id ON human2 (id)
2021-11-21 13:42:19,714 INFO sqlalchemy.engine.Engine [no key 0.00092s] {}
2021-11-21 13:42:19,756 INFO sqlalchemy.engine.Engine COMMIT
2021-11-21 13:42:19,760 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2021-11-21 13:42:19,761 INFO sqlalchemy.engine.Engine INSERT INTO human2 (id, title, age, location, comment) VALUES (%(id)s, %(title)s, %(age)s, %(location)s, %(comment)s)
2021-11-21 13:42:19,763 INFO sqlalchemy.engine.Engine [generated in 0.00113s] ({'id': 1, 'title': 'Teacher', 'age': 36, 'location': 'Beijing', 'comment': datetime.date(1982, 1, 1)}, {'id': 2, 'title': 'NewMan', 'age': 3, 'location': 'Shanghai', 'comment': datetime.date(1983, 2, 1)}, {'id': 3, 'title': 'Policeman', 'age': 33, 'location': 'Beijing', 'comment': datetime.date(1984, 5, 9)}, {'id': 4, 'title': 'CodingMan', 'age': 32, 'location': 'Nanjing', 'comment': datetime.date(1982, 1, 1)}, {'id': 5, 'title': 'Officer', 'age': 28, 'location': 'GuangZhou', 'comment': datetime.date(1982, 1, 1)}, {'id': 6, 'title': 'Farmer', 'age': 54, 'location': 'ShangDong', 'comment': datetime.date(1982, 1, 1)}, {'id': 7, 'title': 'Trader', 'age': 31, 'location': 'GuangZhou', 'comment': datetime.date(2001, 8, 19)}, {'id': 8, 'title': 'Thief', 'age': 20, 'location': 'ShangDong', 'comment': datetime.date(1982, 1, 1)}, {'id': 9, 'title': 'Builder', 'age': 27, 'location': 'Beijing', 'comment': datetime.date(2007, 8, 19)}, {'id': 10, 'title': 'Singer', 'age': 22, 'location': 'Nanjing', 'comment': datetime.date(1982, 1, 1)})
2021-11-21 13:42:19,766 INFO sqlalchemy.engine.Engine COMMIT

'''

要是大家觉得写得还行，麻烦点个赞或者收藏吧，想给博客涨涨人气，非常感谢！

'''

江南野栀子

关注

8
点赞
踩
18

收藏

觉得还不错? 一键收藏
打赏
0
评论
Pandas 模块 - 读写(4)-从数据库读写数据-read_sql/to_sql

4.从数据库读写数据-read_sql/to_sql4.1 .read_sql() 语法.read_sql 的用法比较简单，大概因为数据库的数据基本上都是比较规整的。参数具体用法请见范例。Help on function read_sql in module pandas.io.sql:read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, ...
复制链接

扫一扫