A task required writing data to Oracle via pandas.
At first I did not pass dtype, and to_sql was very slow: a few hundred rows took ten-plus seconds. It also occasionally failed with the baffling error below, even though careful inspection showed the data itself was fine:
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1752, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/oracle/cx_oracle.py", line 1347, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00932: inconsistent datatypes: expected DATE got CLOB
After adding the dtype parameter to to_sql, writes became very fast: tens of thousands of rows in under a second, and the same data no longer raised the error.
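The fix can be sketched like this; the table and column names are made up, and I use an in-memory SQLite engine here so the snippet runs without an Oracle instance (against Oracle you would swap in a cx_Oracle URL):

```python
import pandas as pd
from sqlalchemy import create_engine, text, types

df = pd.DataFrame({
    "name": ["alice", "bob"],
    "created_at": pd.to_datetime(["2023-01-01", "2023-01-02"]),
})

# Without dtype, object columns are handed to Oracle as CLOB binds, which is
# slow and can raise ORA-00932 when the target column is DATE or VARCHAR2.
# An explicit mapping keeps the bind types consistent with the table:
dtype = {
    "name": types.VARCHAR(64),   # VARCHAR2(64) on Oracle, not CLOB
    "created_at": types.DATE(),  # DATE, not CLOB
}

# In-memory SQLite so the sketch is self-contained; a real Oracle URL would
# look like "oracle+cx_oracle://user:pass@host:1521/?service_name=..."
engine = create_engine("sqlite://")
df.to_sql("my_table", engine, if_exists="replace", index=False, dtype=dtype)

with engine.connect() as conn:
    rows = conn.execute(text("SELECT COUNT(*) FROM my_table")).scalar()
print(rows)  # 2
```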
While troubleshooting I came across a few related articles, noted here:

- Pandas to_sql explained
  https://blog.csdn.net/weixin_44278512/article/details/102959370
- Think pandas' to_sql is too slow? A trick to fix it in a minute
  https://blog.csdn.net/chenKFKevin/article/details/72911525
import io  # cStringIO is Python 2 only; use io.StringIO on Python 3

output = io.StringIO()
# dump the frame as tab-separated text, ignoring the index and header
df_a.to_csv(output, sep='\t', index=False, header=False)
# jump back to the start of the stream before reading from it
output.seek(0)
connection = engine.raw_connection()  # engine comes from sqlalchemy's create_engine
cursor = connection.cursor()
# empty fields in the stream are loaded as NULL
cursor.copy_from(output, table_name, null='')
connection.commit()
cursor.close()
This one seems to be specific to PostgreSQL (copy_from is a psycopg2 cursor method that wraps COPY FROM STDIN):
https://gist.github.com/catawbasam/3164289
- On optimizing pandas' slow to_sql performance
  https://blog.csdn.net/weixin_44313745/article/details/126994182
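Besides dtype, to_sql also accepts a chunksize parameter that batches large writes into smaller executemany calls. A rough sketch, again on in-memory SQLite so it runs stand-alone; the batch size of 1,000 is arbitrary:

```python
import pandas as pd
from sqlalchemy import create_engine, text, types

engine = create_engine("sqlite://")
df = pd.DataFrame({"id": range(10_000), "name": ["x"] * 10_000})

df.to_sql(
    "big_table",
    engine,
    if_exists="replace",
    index=False,
    chunksize=1_000,                      # write 1,000 rows per batch
    dtype={"name": types.VARCHAR(16)},    # explicit type, as above
)

with engine.connect() as conn:
    n = conn.execute(text("SELECT COUNT(*) FROM big_table")).scalar()
print(n)  # 10000
```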