本例背景为: 用PDI(Kettle) 向Mysql数据库导入大量的日志分析数据,开始导入的速度300+r/s,

通过设置如下JDBC的连接参数,明显提升了写入的速度。

useServerPrepStmts=false

rewriteBatchedStatements=true

useCompression=true


wKioL1UeXS6zrz_EAAE4JnWdwIo906.jpg


原理参考 :http://forums.pentaho.com/showthread.php?142217-Table-Output-Performance-MySQL#9


To remedy this, in PDI I create a separate, specialized Database Connection I use for batch inserts. Set these two MySQL-specific options on your Database Connection:

useServerPrepStmts false
rewriteBatchedStatements true

Used together, these "fake" batch inserts on the client. Specificially, the insert statements:

INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:

INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);

So that the batched rows will be inserted with one statement (and one network round-trip). With this simple change, Table Output is very fast and close to performance of the bulk loader steps.