1. pandas.DataFrame.to_sql to MySQL
Until pandas include the option of INSERT IGNORE in the default pd.DataFrame.to_sql class, here is the best solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
# adds the word IGNORE after INSERT in sqlalchemy
@compiles(Insert)
def _prefix_insert_with_ignore(insert, compiler, **kw):
return compiler.visit_insert(insert.prefix_with('IGNORE'), **kw)
Now you can run your normal df.to_sql commands!
from sqlalchemy import create_engine
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def _prefix_insert_with_ignore(insert, compiler, **kw):
return compiler.visit_insert(insert.prefix_with('IGNORE'), **kw)
df.index.rename('index_column', inplace=True)
sqlEngine = create_engine('mysql+pymysql://'+dbUser+':'+dbPass+'@'+dbHost+'/'+dbDbse, pool_recycle=3600)
dbConnection = sqlEngine.connect()
try: df.to_sql(dbTble, con=dbConnection, if_exists='append')
except ValueError as vx: print(vx)
except Exception as ex: print(ex)
else: print("Table %s created successfully."%dbTble)
2. SQL Server
If you really need to run multiple threads at the same time you can enable the ignore_dup_key
option on the primary key.
This will just give a warning instead of an error when an insert would result in a duplicate key violation. But instead of failing it will discard the row(s) that if inserted would cause the uniqueness violation.
CREATE TABLE [dbo].[DeleteMe](
[id] [uniqueidentifier] NOT NULL,
[Value] [varchar](max) NULL,
CONSTRAINT [PK_DeleteMe]
PRIMARY KEY ([id] ASC)
WITH (IGNORE_DUP_KEY = ON));
Paul White's detailed explanation on the IGNORE_DUP_KEY option. Thanks Paul.