mysql合并表去重
目标:
现有表a和b,把两个表中的数据合并去重到c表中。其中a和b表中数据量大概在2千万左右。
基本情况
操作系统版本:CentOS release 5.6 64位
操作系统内存:8G
数据库版本:5.1.56-community 64位
数据库初始化参数:默认
数据库表和数据量
表a:
mysql> desc a2kw; +-------+-------------+------+-----+---------+-------+ | Field | Type +-------+-------------+------+-----+---------+-------+ | c1 | c2 | c3 | c4 +-------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec) |
表b
mysql> desc b2kw; +-------+-------------+------+-----+---------+-------+ | Field | Type +-------+-------------+------+-----+---------+-------+ | c1 | c2 | c3 | c4 +-------+-------------+------+-----+---------+-------+ 4 rows in set (0.00 sec) |
a和b表的数据概况如下
mysql> select * from a2kw limit 10; +-----------+-----------+------+----------+ | c1 +-----------+-----------+------+----------+ | 662164461 | 131545534 | TOM0 | 20120520 | | 226662142 | 605685564 | TOM0 | 20120516 | | 527008225 | 172557633 | TOM0 | 20120514 | | 574408183 | 350897450 | TOM0 | 20120510 | | 781619324 | 583989494 | TOM0 | 20120510 | | 158872754 | 775676430 | TOM0 | 20120512 | | 815875622 | 631631832 | TOM0 | 20120514 | | 905943640 | 477433083 | TOM0 | 20120514 | | 660790641 | 616774715 | TOM0 | 20120512 | | 999083595 | 953186525 | TOM0 | 20120513 | +-----------+-----------+------+----------+ 10 rows in set (0.01 sec) |
基本步骤
1、在B表上创建索引
mysql> select count(*) from b2kw; +----------+ | count(*) | +----------+ | 20000002 | +----------+ 1 row in set (0.00 sec) mysql> create index ind_b2kw_c1 on Query OK, 20000002 rows affected (1 min 2.94 sec) Records: 20000002 |
数据量为:20000002 ,时间为:1 min 2.94 sec
2、把a、b分别插入中间表temp表中
创建中间表 mysql> create table temp Query OK, 0 rows affected (0.00 sec) Records: 0 插入数据 mysql> insert into temp Query OK, 20000002 rows affected (13.23 sec) Records: 20000002 mysql> insert into temp Query OK, 20000002 rows affected (13.27 sec) Records: 20000002 mysql> select count(*) from temp; +----------+ | count(*) | +----------+ | 40000004 | +----------+ 1 row in set (0.00 sec) |
数据量为:40000004 ,时间为:26.50 sec
3、temp建立联合索引,强制索引去掉重复数据
mysql> create index ind_temp_c123 on temp(c1,c2,c3); Query OK, 40000004 rows affected (3 min 43.87 sec) Records: 40000004 查看执行计划 mysql> explain select c1,c2,c3,max(c4) from temp FORCE INDEX (ind_temp_c123) group by c1,c2,c3 ; +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+ | id | select_type | table | type +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+ | +----+-------------+-------+-------+---------------+---------------+---------+------+----------+-------+ 1 row in set (0.05 sec) mysql> insert into c2kw select c1,c2,c3,max(c4) from temp FORCE INDEX (ind_temp_c123) group by c1,c2,c3 ; Query OK, 20000004 rows affected (2 min 0.85 sec) Records: 20000004 |
实际大约花费实际为:6 min
4、删除中间表
mysql> drop table temp; Query OK, 0 rows affected (0.99 sec) |
实际大约花费实际为:1 sec
5、建立c索引
mysql> create index ind_c2kw_c1 on c2kw(c1); Query OK, 20000004 rows affected (49.74 sec) Records: 20000004 mysql> create index ind_c2kw_c2 on c2kw(c2); Query OK, 20000004 rows affected (1 min 47.20 sec) Records: 20000004 mysql> create index ind_c2kw_c3 on c2kw(c3); Query OK, 20000004 rows affected (2 min 42.02 sec) Records: 20000004 |
实际大约花费实际为:5分钟
6、清空a、b表
mysql> truncate table a2kw; Query OK, 0 rows affected (1.15 sec) mysql> truncate table b2kw; Query OK, 0 rows affected (1.34 sec) |
实际大约花费实际为:3sec
一共花费的时间大概在15分钟左右