Let me set up the situation. We are trying to insert a modestly high number of rows (roughly 10-20M a day) into a MyISAM table that is modestly wide:
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| blah1 | varchar(255) | NO | PRI | | |
| blah2 | varchar(255) | NO | PRI | | |
| blah3 | varchar(5) | NO | PRI | | |
| blah4 | varchar(5) | NO | PRI | | |
| blah5 | varchar(2) | NO | PRI | | |
| blah6 | varchar(2) | NO | PRI | | |
| blah7 | date | NO | PRI | | |
| blah8 | smallint(6) | NO | PRI | | |
| blah9 | varchar(255) | NO | PRI | | |
| blah10 | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+-------+
The only index besides that whopping primary key is on the blah7, the date field. We are using LOAD DATA INFILE and seeing what strikes me as pretty awful performance, around 2 hours to load the data. I was led to believe that LOAD DATA INFILE was orders of magnitude faster than that.
Interestingly, we have some less fat tables (5-6 fields) that we also use LOAD DATA INFILE to batch data into and we see much better performance on those. The number of records is quite a bit smaller, which leads me to think that we are running up against a buffer size limit when we load the large table, and are having to go to disk (and really, what else but going to disk would explain such slow load times?).
...which brings me to my question. What my.cnf settings are most important when dealing with LOAD DATA INFILE commands?
解决方案
Inserting into indexes in general kills performance. You may be better off removing the index before inserting data and re-indexing after insertion.
Normally MySQL is rather fast loading
data in MyISAM table, but there is
exception, which is when it can't
rebuild indexes by sort but builds
them row by row instead. It can be
happening due to wrong configuration
(ie too small
myisam_max_sort_file_size or
myisam_max_extra_sort_file_size) or it
could be just lack of optimization, if
you're having large (does not fit in
memory) PRIMARY or UNIQUE indexes.