- 查询
1 查询结果唯一化---distinct
->select distinct * from person
2 统计查询结果中互不相同的结果数量--count(distinct)
->select count(distinct last_name,first_name) from person
3 统计重复的列
->select count(*) - count(distinct last_name,first_name) as 'duplicate names' frm person
4显示不唯一的值
->select count(*) as repetitions ,last_name,first_name from person group by last_name,first_name having repetitions > 1
适用having 子句,限制输出出现次数多于一次的名字,如果省略则只显示出现一次的名字,这个操作需要:
--确定可能包含重复的列
--count(*)在选择列中列出那些列
--group by在该子句列出这些列
--增加having限制数目以消除唯一值
5显示重复行是所有信息
摘要写入临时表,然后与原表连接
->creat table tmp select count(*) as count,last_name,first_name from person group by last_name,first_name having count > 1;
select person.* from tmp inner jion person using(last_name,first_name)
- 插入更新数据操作时的重复错误
1 当重复出现,保持原有行,应使用insert ignore取代insert
->insert ignore into person (last_name,first_name) value ('x2','y2');
->insert ignore into person(last_name,first_name) value ('x2','y2');Query OK,1 rows affected
Query OK,0 rows affected
如果待插入行与已存在未重复,Mysql执行插入,如果是重复的,那么ignore会忽略错误提示,不执行插入
2 当重复出现,代替原有行,应使用replace
->relace into person (last_name,first_name) value ('x3','y3');Query OK,1 rows affected
->relace into person (last_name,first_name) value ('x3','y3');发生重复时,新行会代替旧行,包括删除旧行,插入新行Query OK,2 rows affected
3当重复发生时,你希望修改已存在的相关列,使用insert...on duplicate key update
如果未重复,正常插入,重复则按on duplicate key update 子句修改存在行mysql> desc test; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | uid | int(11) | NO | PRI | | | | uname | varchar(20) | YES | | NULL | | +-------+-------------+------+-----+---------+-------+ 2 rows in set (0.00 sec) mysql> select * from test; +-----+--------+ | uid | uname | +-----+--------+ | 1 | uname1 | | 2 | uname2 | | 3 | me | +-----+--------+ 3 rows in set (0.00 sec) mysql> INSERT INTO test values ( 3,'insertName' ) -> ON DUPLICATE KEY UPDATE uname='updateName'; Query OK, 2 rows affected (0.03 sec) mysql> select * from test; +-----+------------+ | uid | uname | +-----+------------+ | 1 | uname1 | | 2 | uname2 | | 3 | updateName | +-----+------------+ 3 rows in set (0.00 sec) mysql> create index i_test_uname on test(uname); Query OK, 3 rows affected (0.20 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> INSERT INTO test VALUES ( 1 , 'uname2') -> ON DUPLICATE KEY UPDATE uname='update2records'; Query OK, 2 rows affected (0.00 sec) mysql> select * from test; +-----+----------------+ | uid | uname | +-----+----------------+ | 2 | uname2 | | 1 | update2records | | 3 | updateName | +-----+----------------+ 3 rows in set (0.00 sec)
插入时会与两条记录发生冲突,分别由主键和唯一索引引起。但最终只UPDATE了其中一条。这在手册中也说明了,有多个唯一索引(或者有键也有唯一索引)的情况下,不建议使用该语句。
insert ignore 比replace效率高,因为它不实际插入重复行,replace适用于表中其他非键列需要被代替,当你需要在发生重复时,使用insert...on duplicate key update
- 删除重复行
1使用表替换来删除重复
如果是一个行完成和另一个重复,可以使用该方法,将唯一行移入另一相同结构的表,新表代替旧的
a 创建于原表结构相同的新表
b 插入原表的唯一行->create table tmp like person
c 新表代替旧表-> insert into tmp select distinct * from person
如果重复项只与表中列的子集相关,可以为其创建具有唯一索引的新表,并使用insert ignore插入,然后代替旧表->drop table person; ->rename tbale tmp to person;
这样的缺点是,如果索引列可以为空,必须建立unique而不是primary key,此时索引不会删除多余null键值->creat table tmp like person; ->alter table tmp add primary key (last_name,first_name); ->insert ignore into tmp select * from person; ->drop table person; ->rename table tmp to person;
2通过增加索引删除重复
->select * from person order by last_name,first_name
增加索引
->alter ignore table person add primary key (last_name,first_name); ->select * from person order by last_name,first_name
如果索引列可以为空,必须使用unique索引,这时,索引不会删除重复的空值,3删除特定行的重复项
->delete from t where color ='blue' limit 2