MySQL data stream processing: optimizing MySQL inserts to handle a data stream

I am consuming a high-rate data stream and storing the data in a MySQL database. For each newly arriving item, I do the following:

(1) Parse incoming item.

(2) Execute several "INSERT ... ON DUPLICATE KEY UPDATE" statements.

I have used INSERT ... ON DUPLICATE KEY UPDATE to eliminate one additional round-trip to the database.

While trying to improve the overall performance, I have considered doing bulk updates in the following way:

(1) Parse incoming item.

(2) Generate SQL statement with "INSERT ... ON DUPLICATE KEY UPDATE" and append to a file.

(3) Periodically flush the SQL statements in the file to the database.

Two questions:

(1) Will this have a positive impact on the database load?

(2) How should I flush the statements to the database so that the indices are rebuilt only once, after the complete flush? (Using transactions?)

UPDATE: I am using Perl DBI + MySQL MyISAM.

Thanks in advance for any comments.

Solution

You don't say what kind of database access environment (Perl DBI? JDBC? ODBC?) you're running in, or what kind of table storage engine (MyISAM? InnoDB?) you're using.

First of all, you're right to pick INSERT ... ON DUPLICATE KEY UPDATE. The only case where you wouldn't need it is if you can guarantee that incoming keys are unique, in which case a plain INSERT is enough.

Secondly, if your database access environment allows it, you should use prepared statements. You definitely won't get good performance if you write a batch of statements to a file and then have a database client read and parse that file again. Do the INSERT operations directly from the program that consumes the incoming data stream.
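As a rough illustration of that advice, here is a minimal sketch in Python against the generic DB-API cursor interface (not Perl DBI, though the prepare-once, execute-many shape is the same). The table `item_counts`, its columns, the comma-separated input format, and the `%s` placeholder style are all assumptions made for the example, not details from the question. The point is only that the SQL text stays fixed and the parsed values travel separately as parameters. Whether the statement is prepared on the server or only bound on the client depends on the driver and its settings.

```python
# Hypothetical table: item_counts(item_key PRIMARY KEY, hits, last_seen).
# The %s placeholder style is an assumption; check your driver's paramstyle.
UPSERT_SQL = (
    "INSERT INTO item_counts (item_key, hits, last_seen) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits), "
    "last_seen = VALUES(last_seen)"
)


def parse_item(raw_line):
    """Parse one 'key,hits,timestamp' line (an assumed input format)."""
    key, hits, seen = raw_line.strip().split(",")
    return key, int(hits), seen


def store_item(cursor, raw_line):
    """Parse an incoming item and upsert it using the fixed statement."""
    params = parse_item(raw_line)
    # The SQL string never changes; only the bound parameters do.
    cursor.execute(UPSERT_SQL, params)
    return params
```

Calling `store_item(cursor, line)` for each arriving line goes straight from the stream consumer to the database, with no intermediate file of generated SQL.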

Thirdly, pick the right kind of table storage engine. MyISAM inserts are generally faster than InnoDB inserts, so if you're logging data and retrieving it later, MyISAM will be a win. InnoDB, on the other hand, has better transactional integrity. If you're handling very large volumes of data and you don't need to read it very often, consider the ARCHIVE storage engine.

Finally, consider issuing a START TRANSACTION at the beginning of a batch of INSERT ... statements, then a COMMIT followed by another START TRANSACTION after a fixed number of rows, say 100. If you're using InnoDB, this will speed things up a lot. If you're using MyISAM or ARCHIVE, which are not transactional, it won't matter.
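The batching pattern can be sketched as follows, again in Python against a generic DB-API connection rather than Perl DBI. The function name `insert_in_batches` and its parameters are hypothetical, and the sketch assumes autocommit is off, so the driver opens a transaction implicitly and `commit()` closes it. On a transactional engine such as InnoDB this groups the rows into one transaction per batch.

```python
def insert_in_batches(conn, sql, rows, batch_size=100):
    """Execute sql once per row, committing after every batch_size rows.

    Assumes a DB-API connection with autocommit off, so a transaction is
    opened implicitly and conn.commit() ends it. Returns the commit count.
    """
    cursor = conn.cursor()
    pending = 0   # rows executed since the last commit
    commits = 0
    for params in rows:
        cursor.execute(sql, params)
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits
```

With 250 rows and a batch size of 100, this commits three times: after rows 100 and 200, and once more for the remaining 50.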

Your biggest wins will come from using prepared statements and choosing the right storage engine.
