Finding duplicate values in a SQL table

Latest recommended article published on 2024-07-25 23:12:39

CHCH998

Tags: sql duplicates

Original link: https://oldbug.net/q/At25/Finding-duplicate-values-in-a-SQL-table

Translated from: Finding duplicate values in a SQL table

It's easy to find duplicates with one field:

SELECT name, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table:

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove or change the duplicates, so I need to find them first.

Post #1

Reference: https://stackoom.com/question/At25/在SQL表中查找重复值

Post #2

Try the following:

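SELECT * FROM
(
    -- Number the rows inside each (Name, Age) group; ordering by Id keeps the numbering deterministic
    SELECT Id, Name, Age, Comments,
           ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Id) AS rn
    FROM Customers
) AS B
WHERE rn > 1  -- every row after the first in its group is a repeat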
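To see what the row-numbering approach returns, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. The in-memory SQLite database, the sample rows, and the alias names rn and numbered are all assumptions made for illustration; the answer itself does not name a database engine, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Invented sample rows (Id, Name, Age, Comments); not taken from the original answer.
CUSTOMERS = [
    (1, "Ann", 30, "first entry"),
    (2, "Ann", 30, "second entry"),
    (3, "Bob", 41, "only entry"),
    (4, "Ann", 31, "same name, different age"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Comments TEXT)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", CUSTOMERS)

# Number the rows inside each (Name, Age) group and keep everything after the first.
repeated_rows = conn.execute("""
    SELECT Id, Name, Age, Comments
    FROM (
        SELECT Id, Name, Age, Comments,
               ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Id) AS rn
        FROM Customers
    ) AS numbered
    WHERE rn > 1
    ORDER BY Id
""").fetchall()

print(repeated_rows)  # [(2, 'Ann', 30, 'second entry')]
```

Only the second (Ann, 30) row comes back: the first row of each group is treated as the one to keep, which is convenient when the goal is to delete or fix the extras.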

Post #3

If you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');

commit;

select *
  from my_users
 where rowid not in (select min(rowid) from my_users group by name, email);
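The same idea can be tried without an Oracle instance. The sketch below is an approximation under a stated assumption: it uses SQLite (through Python's built-in sqlite3 module), whose implicit rowid column plays a similar role here but is not the same thing as Oracle's ROWID. The constant name EXTRA_COPIES is hypothetical. It first lists the extra copies, then deletes them, which is what the question ultimately wants to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No INTEGER PRIMARY KEY here, so SQLite keeps its own implicit rowid per row.
conn.execute("CREATE TABLE my_users (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Rows whose rowid is not the smallest one in their (name, email) group are extra copies.
EXTRA_COPIES = "rowid NOT IN (SELECT MIN(rowid) FROM my_users GROUP BY name, email)"

extra_rows = conn.execute(
    f"SELECT * FROM my_users WHERE {EXTRA_COPIES} ORDER BY id"
).fetchall()
print(extra_rows)  # [(5, 'Tom', 'asd@asd.com')]

# The same predicate can drive the clean-up.
conn.execute(f"DELETE FROM my_users WHERE {EXTRA_COPIES}")
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM my_users ORDER BY id")]
print(remaining_ids)  # [1, 2, 3, 4]
```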

Post #4

If you want to check whether your table contains any duplicate rows, I used the queries below:

create table my_table(id int, name varchar(100), email varchar(100));

insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');


-- If Total_Rows is larger than Distinct_Rows, the table contains fully duplicated rows
Select COUNT(1) As Total_Rows from my_table;
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc;
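The two counts only signal a problem once they are compared. As a hedged sketch, the snippet below runs both queries against the same sample data in an in-memory SQLite database (an assumption; the answer does not say which engine it targets) and derives a yes/no result. The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@rms.com"),
        (1, "shekh", "shekh@rms.com"),  # exact copy of the row above
        (2, "Aman", "aman@rms.com"),
        (3, "Tom", "tom@rms.com"),
        (4, "Raj", "raj@rms.com"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM my_table) AS abc"
).fetchone()[0]

# A gap between the two counts means at least one row is repeated in full.
has_duplicates = total_rows > distinct_rows
print(total_rows, distinct_rows, has_duplicates)  # 5 4 True
```

Note that this check only detects rows that match in every column; it says that duplicates exist, not which rows they are.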

Post #5

Try this code:

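WITH CTE AS
(
    -- RN numbers the rows inside each (Name, Age) group
    SELECT Id, Name, Age, Comments,
           ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY ccn) AS RN
    FROM ccnmaster
)
SELECT * FROM CTE
WHERE RN > 1  -- keep only the repeated rows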

Post #6

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.
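For a quick check against the sample table from the question, the sketch below runs the two-column grouping through Python's built-in sqlite3 module. The in-memory SQLite database is an assumption chosen for convenience, and the helper names find_duplicate_pairs and find_duplicate_rows are hypothetical. The second helper joins the grouped result back to the table so the individual rows (with their IDs) can be inspected before removing or changing them.

```python
import sqlite3

ROWS = [
    (1, "John", "asd@asd.com"),
    (2, "Sam", "asd@asd.com"),
    (3, "Tom", "asd@asd.com"),
    (4, "Bob", "bob@asd.com"),
    (5, "Tom", "asd@asd.com"),
]


def find_duplicate_pairs(conn):
    """Return (name, email, count) for every pair that occurs more than once."""
    return conn.execute(
        "SELECT name, email, COUNT(*) FROM users "
        "GROUP BY name, email HAVING COUNT(*) > 1"
    ).fetchall()


def find_duplicate_rows(conn):
    """Return the full rows that belong to a repeated (name, email) pair."""
    return conn.execute(
        "SELECT u.id, u.name, u.email FROM users AS u "
        "JOIN (SELECT name, email FROM users "
        "      GROUP BY name, email HAVING COUNT(*) > 1) AS d "
        "  ON u.name = d.name AND u.email = d.email "
        "ORDER BY u.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)

duplicate_pairs = find_duplicate_pairs(conn)
duplicate_rows = find_duplicate_rows(conn)
print(duplicate_pairs)  # [('Tom', 'asd@asd.com', 2)]
print(duplicate_rows)   # [(3, 'Tom', 'asd@asd.com'), (5, 'Tom', 'asd@asd.com')]
```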

Note: the older ANSI standard requires every non-aggregated column to appear in the GROUP BY, but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

In practice this means a column that is uniquely determined by the grouped columns may be selected without being aggregated or listed in the GROUP BY.

Support is not consistent:

- Recent PostgreSQL supports it.
- SQL Server (as of SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by:
  - "GROUP BY lname ORDER BY showing wrong results";
  - "Which is the least expensive aggregate function in the absence of ANY()" (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).