MySQL data stream processing: optimizing MySQL inserts to handle a data stream

Latest recommended article published on 2024-06-21 09:49:33

大明少女花木槿

Article tags: MySQL data stream processing

I am consuming a high-rate data stream and, for each newly arriving item, performing the following steps to store the data in a MySQL database:

(1) Parse incoming item.

(2) Execute several "INSERT ... ON DUPLICATE KEY UPDATE" statements.

I have used INSERT ... ON DUPLICATE KEY UPDATE to eliminate one additional round-trip to the database.

While trying to improve the overall performance, I have considered doing bulk updates in the following way:

(1) Parse incoming item.

(2) Generate SQL statement with "INSERT ... ON DUPLICATE KEY UPDATE" and append to a file.

(3) Periodically flush the SQL statements in the file to the database.

Two questions:

(1) Will this have a positive impact on the database load?

(2) How should I flush the statements to the database so that the indexes are rebuilt only once, after the complete flush? (Using transactions?)

UPDATE: I am using Perl DBI + MySQL MyISAM.

Thanks in advance for any comments.

Solution

You don't say what kind of database access environment (Perl DBI? JDBC? ODBC?) you're running in, or what kind of table storage engine (MyISAM? InnoDB?) you're using.

First of all, you're right to pick INSERT ... ON DUPLICATE KEY UPDATE. It's a good move unless you can guarantee that incoming keys are always unique, in which case a plain INSERT is enough.

Secondly, if your database access environment allows it, you should use prepared statements. You definitely won't get good performance if you write a bunch of statements into a file, and then make a database client read the file once again. Do the INSERT operations directly from the software package that consumes the incoming data stream.
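As a rough illustration of doing the upsert directly from the consuming program with bound parameters, here is a minimal Python sketch against a generic DB-API cursor. It is not the asker's Perl DBI code: the table `item_counts`, its columns, the `key,hits,timestamp` line format, and the helper names `parse_item` and `store_item` are all hypothetical. The `%s` placeholder style is the one used by common MySQL drivers for Python; whether the statement is prepared on the server or interpolated on the client depends on the driver and its settings.

```python
# Hypothetical table:
#   item_counts(item_key VARCHAR(64) PRIMARY KEY, hits INT, last_seen DATETIME)
# VALUES(col) refers to the value that would have been inserted; it is
# deprecated in this position as of MySQL 8.0.20 but still accepted.
UPSERT_SQL = (
    "INSERT INTO item_counts (item_key, hits, last_seen) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE "
    "hits = hits + VALUES(hits), last_seen = VALUES(last_seen)"
)


def parse_item(raw_line):
    """Parse one 'key,hits,timestamp' line (hypothetical format) into parameters."""
    key, hits, ts = raw_line.strip().split(",")
    return (key, int(hits), ts)


def store_item(cursor, raw_line):
    """Parse an incoming item and upsert it through a DB-API cursor.

    The SQL text stays constant; only the bound parameters change per item,
    so no SQL is written to a file and re-read by a second client.
    """
    params = parse_item(raw_line)
    cursor.execute(UPSERT_SQL, params)
    return params
```

Keeping the statement text fixed and passing values separately is what lets a driver reuse a prepared statement, and it also avoids quoting problems in the incoming data.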

Thirdly, pick the right kind of table storage engine. MyISAM inserts are generally faster than InnoDB inserts, so if you're logging data and retrieving it later, that will be a win. But InnoDB has better transactional integrity. If you're handling a very large volume of data and you don't need to read it very often, consider the ARCHIVE storage engine, keeping in mind that ARCHIVE tables don't support UPDATE, so it only fits if plain INSERTs are enough.

Finally, consider doing a START TRANSACTION at the beginning of a batch of INSERT ... commands, then doing a COMMIT and another START TRANSACTION after a fixed number of rows, like 100 or so. If you're using InnoDB, this will speed things up a lot. If you're using MyISAM or ARCHIVE, it won't matter, because those engines are not transactional.
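The batching idea can be sketched as follows, again in Python against a generic DB-API connection rather than the asker's Perl DBI setup. The function name `store_in_batches` and the default `batch_size` are hypothetical. The sketch assumes the driver runs with autocommit off, so a transaction opens implicitly with the first statement after each commit, which stands in for the explicit START TRANSACTION.

```python
def store_in_batches(conn, sql, rows, batch_size=100):
    """Execute `sql` once per parameter tuple in `rows`, committing every
    `batch_size` rows. Returns the number of commits issued.

    Assumes `conn` is a DB-API connection with autocommit disabled.
    `rows` may be any iterable, including a generator fed by the stream.
    """
    cursor = conn.cursor()
    commits = 0
    pending = 0  # rows executed since the last commit
    for params in rows:
        cursor.execute(sql, params)
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        conn.commit()
        commits += 1
    cursor.close()
    return commits
```

A larger batch means fewer commits but more rows at risk if the process dies mid-batch, so the batch size is a trade-off to tune rather than a fixed rule.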

Your big wins will come from using prepared statements and choosing the right storage engine.