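I. Deduplicating during database cleanup
When cleaning data inside a database you work with DELETE statements, and a very common task is removing duplicate records while keeping exactly one copy of each. Several statements I found by searching Baidu raised errors, until I came across the one below, which runs without errors in real use and is also fast: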
DELETE consum_record
FROM
    consum_record,
    (
        SELECT
            min(id) id,
            user_id,
            monetary,
            consume_time
        FROM
            consum_record
        GROUP BY
            user_id,
            monetary,
            consume_time
        HAVING
            count(*) > 1
    ) t2
WHERE
    consum_record.user_id = t2.user_id
    AND consum_record.monetary = t2.monetary
    AND consum_record.consume_time = t2.consume_time
    AND consum_record.id > t2.id;
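1. The subquery (SELECT min(id) id, user_id, monetary, consume_time FROM consum_record GROUP BY user_id, monetary, consume_time HAVING count(*) > 1) t2 collects the duplicated data into a temporary table t2. Each row of t2 describes one group of duplicate records and holds the smallest id in that group.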
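2. Join the two tables and delete every row of the original table that matches a t2 row on user_id, monetary, and consume_time and whose id is greater than the id stored in t2. This removes the duplicates and keeps one record per group, the one with the smallest id.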
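The two steps above can be tried out on a throwaway table. The following is a minimal sketch, not the statement from the post: it assumes Python's built-in sqlite3 module as a stand-in database, and because the multi-table `DELETE ... FROM a, b` form is MySQL syntax that SQLite does not accept, the same condition is expressed with `WHERE EXISTS`. The sample rows, the column types, and the function name `dedupe_demo` are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: ids 2, 3 and 5 duplicate earlier rows.
ROWS = [
    (1, 101, 50.0, "2020-01-01 10:00:00"),
    (2, 101, 50.0, "2020-01-01 10:00:00"),
    (3, 101, 50.0, "2020-01-01 10:00:00"),
    (4, 102, 30.0, "2020-01-02 09:00:00"),
    (5, 102, 30.0, "2020-01-02 09:00:00"),
    (6, 103, 20.0, "2020-01-03 08:00:00"),
]


def dedupe_demo():
    """Delete duplicates, keeping the smallest id per group; return the ids left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE consum_record ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, "
        "monetary REAL, consume_time TEXT)"
    )
    conn.executemany("INSERT INTO consum_record VALUES (?, ?, ?, ?)", ROWS)

    # Step 1: t2 holds the smallest id of every duplicated group.
    # Step 2: delete rows that match a t2 group but have a larger id.
    conn.execute(
        """
        DELETE FROM consum_record
        WHERE EXISTS (
            SELECT 1
            FROM (
                SELECT MIN(id) AS id, user_id, monetary, consume_time
                FROM consum_record
                GROUP BY user_id, monetary, consume_time
                HAVING COUNT(*) > 1
            ) AS t2
            WHERE consum_record.user_id = t2.user_id
              AND consum_record.monetary = t2.monetary
              AND consum_record.consume_time = t2.consume_time
              AND consum_record.id > t2.id
        )
        """
    )
    remaining = [row[0] for row in conn.execute(
        "SELECT id FROM consum_record ORDER BY id")]
    conn.close()
    return remaining


if __name__ == "__main__":
    print(dedupe_demo())
```

Running the same subquery as a plain SELECT before the DELETE is a cheap way to check which groups would be affected.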
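II. Deduplicating in queries
There are two ways to deduplicate query results: DISTINCT and GROUP BY. DISTINCT is written in the SELECT statement. GROUP BY is used more often; its real purpose is aggregate statistics, but it can also deduplicate, although it may run slower than DISTINCT.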
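To make the comparison concrete, here is a small sketch that runs both forms against the same data and returns the results side by side. It again assumes Python's built-in sqlite3 module as a stand-in database; the sample rows and the function name `distinct_vs_group_by` are hypothetical. It only shows that the two queries return the same rows and says nothing about their relative speed, which depends on the database and its indexes.

```python
import sqlite3

# Hypothetical sample data with repeated (user_id, monetary, consume_time) values.
ROWS = [
    (1, 101, 50.0, "2020-01-01 10:00:00"),
    (2, 101, 50.0, "2020-01-01 10:00:00"),
    (3, 101, 50.0, "2020-01-01 10:00:00"),
    (4, 102, 30.0, "2020-01-02 09:00:00"),
    (5, 102, 30.0, "2020-01-02 09:00:00"),
    (6, 103, 20.0, "2020-01-03 08:00:00"),
]


def distinct_vs_group_by():
    """Return (DISTINCT rows, GROUP BY rows, GROUP BY rows with counts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE consum_record ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, "
        "monetary REAL, consume_time TEXT)"
    )
    conn.executemany("INSERT INTO consum_record VALUES (?, ?, ?, ?)", ROWS)

    # DISTINCT removes repeated rows from the selected columns.
    distinct_rows = conn.execute(
        "SELECT DISTINCT user_id, monetary, consume_time "
        "FROM consum_record ORDER BY user_id"
    ).fetchall()

    # GROUP BY on the same columns yields one row per group.
    grouped_rows = conn.execute(
        "SELECT user_id, monetary, consume_time "
        "FROM consum_record "
        "GROUP BY user_id, monetary, consume_time ORDER BY user_id"
    ).fetchall()

    # GROUP BY can additionally aggregate, here counting rows per group.
    counted_rows = conn.execute(
        "SELECT user_id, COUNT(*) "
        "FROM consum_record "
        "GROUP BY user_id, monetary, consume_time ORDER BY user_id"
    ).fetchall()

    conn.close()
    return distinct_rows, grouped_rows, counted_rows


if __name__ == "__main__":
    for result in distinct_vs_group_by():
        print(result)
```

The third query is the reason to reach for GROUP BY: it deduplicates and reports how many rows each group contained, which DISTINCT alone cannot do.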