使用一个字段很容易找到重复项:

SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

所以如果我们有一张桌子

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

这个查询将告诉我们John、Sam、Tom和Tom,因为他们都有相同的电子邮件。

然而,我想要的是获得相同电子邮件和名称的副本。

也就是说,我想得到“汤姆”,“汤姆”。

我需要这个的原因是:我犯了一个错误,允许插入重复的名称和电子邮件值。现在我需要删除/更改重复项,所以我需要先找到它们。


当前回答

与其他答案不同,您可以查看包含所有列(如果有的话)的整个记录。在row_number函数的PARTITION BY部分中,选择所需的唯一/双工列。

SELECT  *
FROM    (
 SELECT a.*
 ,      Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
 FROM   Customers AS a
)       AS b
WHERE   r > 1;

当您想选择所有字段中的所有重复记录时,可以这样写

CREATE TABLE test (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
,       c1      integer
,       c2      text
,       d       date DEFAULT now()
,       v       text
);

INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;

SELECT  *
FROM    test
WHERE   (c1, c2) IN (
 SELECT c1, c2
 FROM   test
 GROUP  BY 1,2
 HAVING count(*) > 1
)
ORDER   BY 1;

在PostgreSQL中测试。

其他回答

尝试此代码

WITH CTE AS

( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE 

试试看:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

输出:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

如果您想要重复数据集的ID,请使用以下命令:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

输出:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

要删除重复项,请尝试:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

输出:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

与其他答案不同,您可以查看包含所有列(如果有的话)的整个记录。在row_number函数的PARTITION BY部分中,选择所需的唯一/双工列。

SELECT  *
FROM    (
 SELECT a.*
 ,      Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
 FROM   Customers AS a
)       AS b
WHERE   r > 1;

当您想选择所有字段中的所有重复记录时,可以这样写

CREATE TABLE test (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
,       c1      integer
,       c2      text
,       d       date DEFAULT now()
,       v       text
);

INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;

SELECT  *
FROM    test
WHERE   (c1, c2) IN (
 SELECT c1, c2
 FROM   test
 GROUP  BY 1,2
 HAVING count(*) > 1
)
ORDER   BY 1;

在PostgreSQL中测试。

这个问题在上面的所有答案中都得到了很好的回答。但我想列出所有可能的方式,我们可以通过各种方式来做到这一点,这可能会让我们了解如何做到,寻求者可以选择最适合他/她的需求的解决方案,因为这是SQL开发人员遇到不同业务用例或在访谈中遇到的最常见的查询之一。

创建示例数据

我将仅从这个问题中设置一些示例数据开始。

Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT  NewTable VALUES (1,'John','asd@asd.com')
INSERT  NewTable VALUES (2,'Sam','asd@asd.com')
INSERT  NewTable VALUES (3,'Tom','asd@asd.com')
INSERT  NewTable VALUES (4,'Bob','bob@asd.com')
INSERT  NewTable VALUES (5,'Tom','asd@asd.com')

1.使用groupby子句

SELECT
    name,email, COUNT(*) AS Occurence
    FROM NewTable
    GROUP BY name,email
    HAVING COUNT(*)>1

工作原理:

GROUP BY子句按中的值将行分组姓名和电子邮件栏。然后,COUNT()函数返回数字每个组的出现次数(姓名、电子邮件)。然后,HAVING子句保持仅重复组,这些组包含多个发生

2.使用CTE:

要返回每个重复行的整个行,请使用公共表表达式(CTE)将上述查询的结果与NewTable表连接:

WITH cte AS (
    SELECT
        name, 
        email, 
        COUNT(*) occurrences
    FROM NewTable
    GROUP BY 
        name, 
        email
    HAVING COUNT(*) > 1
)
SELECT 
    t1.Id,
    t1.name, 
    t1.email
FROM  NewTable t1
    INNER JOIN cte ON 
        cte.name = t1.name AND 
        cte.email = t1.email
ORDER BY 
    t1.name, 
    t1.email;

3.使用ROW_NUMBER()函数

WITH cte AS (
    SELECT 
        name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY name,email
            ORDER BY name,email) rownum
    FROM 
        NewTable t1
) 
SELECT 
  * 
FROM 
    cte 
WHERE 
    rownum > 1;

工作原理:

ROW_NUMBER()将NewTable表的行按名称和电子邮件列中的值分配到分区中。重复的行在名称和电子邮件列中具有重复的值,但行号不同外部查询删除每个组中的第一行。

好吧,现在我相信,你可以有正确的想法,如何找到重复,并应用逻辑在所有可能的场景中找到重复。谢谢

另一种简单的方法是使用解析函数:

SELECT * from 

(SELECT name, email,

COUNT(name) OVER (PARTITION BY name, email) cnt 

FROM users)

WHERE cnt >1;