Python酷库之旅-第三方库Pandas(015)

最新推荐文章于 2024-11-14 08:53:59 发布

神奇夜光杯

最新推荐文章于 2024-11-14 08:53:59 发布

阅读量945

点赞数 43

分类专栏： Myelsa的Python酷库之旅文章标签： python pandas 开发语言标准库及第三方库基础知识学习与成长

本文链接：https://blog.csdn.net/ygb_1024/article/details/140332564

版权

Myelsa的Python酷库之旅专栏收录该内容

266 篇文章 79 订阅

订阅专栏

一、用法精讲

37、pandas.read_sql函数

37-1、语法

37-2、参数

37-3、功能

37-4、返回值

37-5、说明

37-6、用法

37-6-1、数据准备

37-6-2、代码示例

37-6-3、结果输出

38、pandas.DataFrame.to_sql函数

38-1、语法

38-2、参数

38-3、功能

38-4、返回值

38-5、说明

38-6、用法

一、用法精讲

37、pandas.read_sql函数

37-1、语法

# 37、pandas.read_sql函数
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, dtype_backend=_NoDefault.no_default, dtype=None)
Read SQL query or database table into a DataFrame.

This function is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility). It will delegate to the specific function depending on the provided input. A SQL query will be routed to read_sql_query, while a database table name will be routed to read_sql_table. Note that the delegated function might have more specific notes about their functionality not listed here.

Parameters:
sqlstr or SQLAlchemy Selectable (select or text object)
SQL query to be executed or a table name.

conADBC Connection, SQLAlchemy connectable, str, or sqlite3 connection
ADBC provides high performance I/O with native type support, where available. Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the ADBC connection and SQLAlchemy connectable; str connections are closed automatically. See here.

index_colstr or list of str, optional, default: None
Column(s) to set as index(MultiIndex).

coerce_floatbool, default True
Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

paramslist, tuple or dict, optional, default: None
List of parameters to pass to execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249’s paramstyle, is supported. Eg. for psycopg2, uses %(name)s so use params={‘name’ : ‘value’}.

parse_dateslist or dict, default: None
List of column names to parse as dates.

Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime() Especially useful with databases without native Datetime support, such as SQLite.

columnslist, default: None
List of column names to select from SQL table (only used when reading a table).

chunksizeint, default None
If specified, return an iterator where chunksize is the number of rows to include in each chunk.

dtype_backend{‘numpy_nullable’, ‘pyarrow’}, default ‘numpy_nullable’
Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:

"numpy_nullable": returns nullable-dtype-backed DataFrame (default).

"pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.

New in version 2.0.

dtypeType name or dict of columns
Data type for data or columns. E.g. np.float64 or {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}. The argument is ignored if a table is passed instead of a query.

New in version 2.0.0.

Returns:
DataFrame or Iterator[DataFrame].

37-2、参数

37-2-1、sql(必须)：要执行的SQL查询或SQLAlchemy可选对象(如表名或查询)。如果是表名，则通常与read_sql_table函数一起使用，但也可以在某些数据库引擎中通过read_sql执行，具体取决于引擎的实现。

37-2-2、con(必须)：数据库连接对象，通常是SQLAlchemy的engine或DB-API 2.0的连接对象，对于SQLite，还可以直接传递连接字符串。

37-2-3、index_col(可选，默认值为None)：将一列或多列作为返回的DataFrame的索引。默认为None，即不设置索引。

37-2-4、coerce_float(可选，默认值为True)：尝试将非字符串、非数字对象转换为浮点数。

37-2-5、params(可选，默认值为None)：用于SQL查询中的参数替换的列表、元组或字典，这有助于防止SQL注入攻击，并允许安全地传递查询参数。

37-2-6、parse_dates(可选，默认值为None)：指定要解析为日期时间类型的列，可以是一个列名的列表，也可以是一个字典，其中键是列名，值是要用于解析日期的格式字符串。

37-2-7、columns(可选，默认值为None)：要从查询结果中选择的列名列表，这允许你仅加载查询结果中的一部分列。

37-2-8、chunksize(可选，默认值为None)：如果指定了，则返回一个迭代器，该迭代器每次产生指定大小的DataFrame块，这对于处理大型数据集非常有用；如果为None(默认值)，则一次性返回整个DataFrame。

37-2-9、dtype_backend(可选)：内部参数，通常不需要用户指定。

37-2-10、dtype(可选，默认值为None)：一个字典，用于指定列的数据类型，键是列名，值是你希望该列具有的数据类型(如np.float64、str等)。

37-3、功能

用于从数据库中读取数据并将其转换为pandas DataFrame的函数。

37-4、返回值

返回值是一个pandas DataFrame对象。

37-5、说明

37-5-1、在使用pandas.read_sql时，请确保你已经安装了pandas库以及适用于目标数据库的接口库。

37-5-2、对于复杂的SQL查询，建议使用pandas.read_sql_query，因为它更直接地支持SQL查询字符串，并且通常更清楚地表达了你的意图。

37-5-3、如果sql参数是一个表名而不是SQL查询，那么你要考虑使用pandas.read_sql_table函数，它专门为读取整个表而设计。然而，请注意，并非所有数据库引擎都支持通过read_sql_table读取表，而且某些情况下你可以使用read_sql来达到相同的目的。

37-6、用法

37-6-1、数据准备

# 确保已经安装了sqlalchemy库
# 1、创建数据库表
from sqlalchemy import create_engine, Column, Integer, String, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# 数据库连接配置（使用 SQLAlchemy 的 URI 格式）
DATABASE_URI = 'mysql+pymysql://root:123456@127.0.0.1/test_database?charset=utf8mb4'
# 创建一个引擎实例
engine = create_engine(DATABASE_URI, echo=True)  # echo=True 用于显示生成的 SQL 语句，调试时可以打开
# 创建基类
Base = declarative_base()
# 定义模型类
class MyElsaTable(Base):
    __tablename__ = 'myelsa_table'
    name = Column(String(255), nullable=False)
    ID_Card = Column(String(255), primary_key=True)  # 设置为主键
    age = Column(Integer, nullable=False)
    city = Column(String(255), nullable=False)
# 创建表（如果表不存在）
Base.metadata.create_all(engine)
# 如果你想要使用 ORM 来进行操作，可以创建一个 session 类
Session = sessionmaker(bind=engine)
session = Session()
# 这里不需要执行 SQL 语句或提交更改，因为 create_all 方法会自动处理
# 关闭 session（如果需要的话，但在这种情况下我们并没有进行任何 ORM 操作）
# session.close()
print("Table myelsa_table created successfully!")

# 2、在数据库表中新增记录
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.exc import SQLAlchemyError
# 定义基类
Base = declarative_base()
# 定义数据库模型类
class MyElsaTable(Base):
    __tablename__ = 'myelsa_table'
    ID_Card = Column(String, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    city = Column(String)
    def __repr__(self):
        return f"<MyElsaTable(ID_Card={self.ID_Card}, name={self.name}, age={self.age}, city={self.city})>"
# 数据库连接配置
config = {
    'username': 'root',  # 替换为你的MySQL用户名
    'password': '123456',  # 替换为你的MySQL密码
    'host': '127.0.0.1',  # 如果数据库在远程服务器上，请替换为相应的主机名或IP地址
    'database': 'test_database',  # 数据库名
}
# 创建数据库引擎
engine = create_engine(
    f'mysql+pymysql://{config["username"]}:{config["password"]}@{config["host"]}/{config["database"]}')
# 确保所有表都已创建（可选）
Base.metadata.create_all(engine)
# 创建会话类
Session = sessionmaker(bind=engine)
# 定义要插入的数据
new_record = {
    'name': 'Myelsa',
    'ID_Card': '443689564710526448',
    'age': 18,
    'city': 'Guangzhou'
}
try:
    # 使用上下文管理器自动管理会话
    with Session() as session:
        # 创建新的模型实例
        new_entry = MyElsaTable(**new_record)
        # 将新实例添加到会话中
        session.add(new_entry)
        # 提交更改
        session.commit()
        print("Record inserted successfully!")
except SQLAlchemyError as e:
    print(f"Error: '{e}'")
    # 在使用上下文管理器时，无需显式回滚，因为上下文管理器会在退出时处理它

37-6-2、代码示例

# 37、pandas.read_sql函数
import pandas as pd
from sqlalchemy import create_engine
connection_string = 'mysql+pymysql://root:123456@127.0.0.1/test_database'
# 创建SQLAlchemy Engine
engine = create_engine(connection_string, echo=False)  # echo=True 可以打印出所有执行的 SQL 语句
# SQL查询
sql_query = "SELECT * FROM myelsa_table"
# 使用pandas.read_sql读取数据
# 注意：这里我们将engine传递给read_sql而不是连接对象
df = pd.read_sql(sql_query, engine)
# 显示DataFrame
print(df)
# 通常不需要手动关闭 Engine，因为SQLAlchemy会管理连接池
# 但是，如果你想要显式地关闭它（尽管这通常不是必需的），可以这样做：
# engine.dispose()

37-6-3、结果输出

# 37、pandas.read_sql函数
#      name             ID_Card  age       city
# 0  Myelsa  443689564710526448   18  Guangzhou

38、pandas.DataFrame.to_sql函数

38-1、语法

# 38、pandas.DataFrame.to_sql函数
DataFrame.to_sql(name, con, *, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.

Parameters:
namestr
Name of SQL table.

consqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable. See here. If passing a sqlalchemy.engine.Connection which is already in a transaction, the transaction will not be committed. If passing a sqlite3.Connection, it will not be possible to roll back the record insertion.

schemastr, optional
Specify the schema (if database flavor supports this). If None, use default schema.

if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the table already exists.

fail: Raise a ValueError.

replace: Drop the table before inserting new values.

append: Insert new values to the existing table.

indexbool, default True
Write DataFrame index as a column. Uses index_label as the column name in the table. Creates a table index for this column.

index_labelstr or sequence, default None
Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

chunksizeint, optional
Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.

dtypedict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.

method{None, ‘multi’, callable}, optional
Controls the SQL insertion clause used:

None : Uses standard SQL INSERT clause (one per row).

‘multi’: Pass multiple values in a single INSERT clause.

callable with signature (pd_table, conn, keys, data_iter).

Details and a sample callable implementation can be found in the section insert method.

Returns:
None or int
Number of rows affected by to_sql. None is returned if the callable passed into method does not return an integer number of rows.

The number of returned rows affected is the sum of the rowcount attribute of sqlite3.Cursor or SQLAlchemy connectable which may not reflect the exact number of written rows as stipulated in the sqlite3 or SQLAlchemy.

New in version 1.4.0.

Raises:
ValueError
When the table already exists and if_exists is ‘fail’ (the default).

38-2、参数

38-2-1、name(必须)：字符串，表示要写入数据的SQL表名，如果表不存在且if_exists参数允许，则会创建该表。

38-2-2、con(必须)：连接到数据库的对象，这可以是SQLAlchemy的Engine对象，也可以是DB-API 2.0连接对象(但请注意，pandas官方推荐使用SQLAlchemy)，对于sqlite3，可以直接传递sqlite3的连接对象。

38-2-3、schema(可选，默认值为None)：指定表所在的数据库模式(schema)名称，对于不支持模式的数据库(如SQLite)，此参数将被忽略。

38-2-4、if_exists(可选，默认值为'fail')：当表已存在时的行为：
38-2-4-1、'fail'：如果表存在，则引发ValueError。
38-2-4-2、'replace'：如果表存在，则替换它(即删除旧表并创建新表)。
38-2-4-3、'append'：如果表存在，则将数据追加到表中。

38-2-5、index(可选，默认值为True)：是否将DataFrame的索引作为一列写入到SQL表中。如果为True，则DataFrame的索引会被写入到表中，通常作为名为index或由index_label指定的列名。

38-2-6、index_label(可选，默认值为None)：如果index=True，则指定用作索引列名称的字符串或字符串序列。如果DataFrame的索引是多级索引，并且index_label是单个字符串，则该名称将应用于所有索引级别；如果index=False，则忽略此参数。

38-2-7、chunksize(可选，默认值为None)：如果不是None，则分批写入数据，这个参数指定了每批写入的行数，这对于处理大量数据或内存受限的环境非常有用。

38-2-8、dtype(可选，默认值为None)：用于指定列的数据类型，如果提供，则必须是一个字典，其键是列名，值是SQLAlchemy类型或SQL字符串(如'VARCHAR(50)')；如果未提供，则pandas将尝试从DataFrame的dtype中推断数据类型。

38-2-9、method(可选，默认值为None)：自定义写入数据到SQL表的函数。如果提供，则忽略chunksize参数，此方法应接受一个连接对象和一个DataFrame(可选地，还有表名和if_exists参数)，并自行处理数据写入，这提供了对数据写入过程的完全控制。

38-3、功能

将pandas的DataFrame对象中的数据写入到SQL数据库中指定的表中。

38-4、返回值

没有返回值(即返回None)，因为它的主要作用是执行写入操作，而不是返回数据。

38-5、说明

无

38-6、用法

38-6-1、数据准备

无

38-6-2、代码示例

# 38、pandas.DataFrame.to_sql函数
import pandas as pd
from sqlalchemy import create_engine
# 创建示例DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
engine = create_engine('mysql+pymysql://root:123456@127.0.0.1/test_database')
# 将DataFrame写入MySQL表，如果表不存在则创建
# 注意：这里的表名是'my_table'，你可以根据需要修改它
df.to_sql('my_table', con=engine, if_exists='replace', index=False)
# 注意：if_exists='replace'会删除旧表并创建新表，这可能会导致数据丢失。
# 如果你只想在表不存在时创建表，请使用if_exists='fail'（默认），
# 或者在表已存在时追加数据，请使用if_exists='append'。
print("数据已成功写入MySQL数据库中的'my_table'表！")

38-6-3、结果输出

# 使用pandas.read_sql函数查询
import pandas as pd
from sqlalchemy import create_engine
connection_string = 'mysql+pymysql://root:123456@127.0.0.1/test_database'
# 创建SQLAlchemy Engine
engine = create_engine(connection_string, echo=False)  # echo=True 可以打印出所有执行的 SQL 语句
# SQL查询
sql_query = "SELECT * FROM my_table"
# 使用pandas.read_sql读取数据
# 注意：这里我们将engine传递给read_sql而不是连接对象
df = pd.read_sql(sql_query, engine)
# 显示DataFrame
print(df)
# 通常不需要手动关闭 Engine，因为SQLAlchemy会管理连接池
# 但是，如果你想要显式地关闭它（尽管这通常不是必需的），可以这样做：
# engine.dispose()

#     Name  Age      City
# 0   John   28  New York
# 1   Anna   34     Paris
# 2  Peter   29    Berlin
# 3  Linda   32    London