Fast ways to write to a PostgreSQL database from Python

SQLAlchemy and Psycopg2 in practice

Licensed under CC 4.0 BY-SA

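There are two ways to do this: one uses the sqlalchemy package, the other uses the psycopg2 package.
Usage is shown below (PostgreSQL is used as the example here).
The first approach: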

# Imports
from string import Template

import pandas as pd
from sqlalchemy import create_engine

# Initialize the engine. pg_username, pg_password, pg_host, pg_port and pg_database are your own connection settings.
# For another database only the URL changes, e.g. "oracle://user:pwd@***:***/racdb".
engine = create_engine(
    'postgresql+psycopg2://' + pg_username + ':' + pg_password + '@' + pg_host + ':' + str(pg_port) + '/' + pg_database,
    echo=False)
query_sql = Template("""
      select * from $arg1
      """)  # string.Template fills in the table name
# Read the query result into a DataFrame with pandas
df = pd.read_sql_query(query_sql.substitute(arg1=tablename), engine)
# pandas' to_sql is very convenient here: the DataFrame goes straight into the database
df.to_sql(table, engine, if_exists='replace', index=False)  # overwrite the table
df.to_sql(table, engine, if_exists='append', index=False)   # append to the table
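
Concatenating the connection URL by hand, as above, breaks when the password contains characters such as `@`, `:` or `/`. A minimal sketch of one way around this, assuming the same five connection settings; the helper name `build_pg_url` is hypothetical and not part of SQLAlchemy:

```python
from urllib.parse import quote_plus


def build_pg_url(username, password, host, port, database):
    """Build a postgresql+psycopg2 URL, percent-encoding the credentials.

    Hypothetical helper for illustration: only the user name and password
    are escaped, the host, port and database name are inserted as given.
    """
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, database
    )
```

The result can be passed to `create_engine` in place of the concatenated string, for example `create_engine(build_pg_url(pg_username, pg_password, pg_host, pg_port, pg_database))`.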

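Note: the df.to_sql approach above is painfully slow. With about ten million rows and chunksize set to 10,000, the upload took 5 hours. After some research I found the method below, which goes through psycopg2's COPY support and is extremely fast: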

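def write_to_table(df, table_name, if_exists='fail'):
    import io
    import pandas as pd
    from sqlalchemy import create_engine
    db_engine = create_engine('postgresql://***:***@***:***/***')  # initialize the engine
    # Dump the DataFrame to an in-memory CSV buffer (header row included)
    string_data_io = io.StringIO()
    df.to_csv(string_data_io, sep='|', index=False)
    # Create the target table from the DataFrame's columns, honoring if_exists.
    # Note: pandasSQL_builder and SQLTable are pandas internals and may change between versions.
    pd_sql_engine = pd.io.sql.pandasSQL_builder(db_engine)
    table = pd.io.sql.SQLTable(table_name, pd_sql_engine, frame=df,
                               index=False, if_exists=if_exists, schema='goods_code')
    table.create()
    string_data_io.seek(0)
    # Do not also call string_data_io.readline() here: the HEADER option below already
    # skips the header row, so doing both would drop the first data row.
    with db_engine.connect() as connection:
        with connection.connection.cursor() as cursor:
            # goods_code is the author's own schema; PostgreSQL's default schema is public
            copy_cmd = "COPY goods_code.%s FROM STDIN DELIMITER '|' CSV HEADER" % table_name
            cursor.copy_expert(copy_cmd, string_data_io)
        connection.connection.commit()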
9 comments

m0_68540599 2023.04.17
When the table already exists in the database, are the columns aligned automatically?

haishigecainiao 2021.10.21
I just tested the second method. Used as written, the first row (a data row, not the header) goes missing on import. [code=python] string_data_io.readline() copy_cmd = "COPY family_mart.%s FROM STDIN DELIMITER '|' CSV HEADER" %table_name [/code] The string_data_io.readline() call can be commented out.

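Kandw 2021.05.07
[code=python] pd_sql_engine = pd.io.sql.pandasSQL_builder(db_engine) table = pd.io.sql.SQLTable(table_name, pd_sql_engine, frame=df, index=False, if_exists=if_exists,schema = 'goods_code') table.create() [/code] This part doesn't seem to do anything? Also, [code=python] string_data_io.readline() # remove header [/code] seems unnecessary too; including it skips the first data row outright rather than the header portion of the variable.
- Kandw replying to AI仙人掌 2021.05.07
  I verified it again. The table.create() part really is needed: it creates a new table when none exists in the database. string_data_io.readline() can be left out; the header is not actually written when the data is loaded.
- AI仙人掌 replying to Kandw 2021.05.07
  It creates the table and skips the header row, not a variable named header.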
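温庭筠 2019.10.25
It really is fast, but the 'goods_code' in schema = 'goods_code' and in goods_code.%s needs to be changed; Postgres uses the public schema by default.
- AI仙人掌 replying to 你送赠的非积蓄买得到 2021.05.07
  Correct, it is the name of the logical database.
- 你送赠的非积蓄买得到 replying to 温庭筠 2020.07.16
  [reply]qq_42758746[/reply] That goods_code is a schema the author specified himself, in other words a logical database.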