I am trying to insert a lot of users into a MySQL database with two tables:
The first table contains the user data. An example INSERT looks like this (id is the primary key, mail is a unique key):
INSERT INTO users (id, mail, name)
VALUES (NULL, "foo@bar.tld", "John Smith")
ON DUPLICATE KEY UPDATE name = VALUE(name)
The second table contains the group the user belongs to. It only stores two foreign keys users_id and groups_id. An example query looks like this:
INSERT INTO users_groups (users_id, groups_id)
VALUES (LAST_INSERT_ID(), 1)
This setup works perfectly fine for small data sets. When I import large amounts of data (>1M rows) the INSERTs get slow. Obviously, it would be much better to do a batch insert:
INSERT INTO users (id, mail, name)
VALUES (NULL, "foo@bar.tld", "John Smith"), (NULL, "baz@qux.tld", "Anna Smith")
ON DUPLICATE KEY UPDATE name = VALUE(name)
and:
INSERT INTO users_groups (users_id, groups_id)
VALUES (LAST_INSERT_ID(), 1), (LAST_INSERT_ID(), 4)
The problem of course is, that LAST_INSERT_ID() only returns one (the first) id of a batch INSERT.
So, what I would need is a "nested" batch INSERT, which IMO does not exist in MySQL.
What can I do to make my INSERTs faster?
解决方案
Bulk inserts by default provide sequential auto increments, with this knowledge you can do your inserts like;
INSERT INTO users (id, mail, name)
VALUES (NULL, "foo@bar.tld", "John Smith"),
(NULL, "baz@qux.tld", "Anna Smith"),
(...) # repeat n-times
;
SET @LASTID=LAST_INSERT_ID()
;
INSERT INTO users_groups (users_id, groups_id)
VALUES (@LASTID - n , 1), # Note n in descending sequence
(@LASTID - n-1, 1),
...
(@LASTID - 1 , 1),
(@LASTID - 0 , 4)
;
Importantly, make sure that innodb_autoinc_lock_mode=1
show global variables like 'innodb_autoinc_lock_mode'
Otherwise consider wrapping your inserts in LOCK TABLES
LOCK TABLES tbl_name WRITE
... sqls ...
UNLOCK TABLES