如何删除没有唯一行id存在的重复行?

我的座位是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

我想留下以下重复删除后:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

我尝试了一些查询,但我认为他们取决于有一个行id,因为我没有得到想要的结果。例如:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

当前回答

要在SQL Server中删除表中的重复行,请执行以下步骤:

使用GROUP BY子句或ROW_NUMBER()函数查找重复的行。 使用DELETE语句删除重复的行。

设置一个示例表

DROP TABLE IF EXISTS contacts;

CREATE TABLE contacts(
    contact_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name NVARCHAR(100) NOT NULL,
    last_name NVARCHAR(100) NOT NULL,
    email NVARCHAR(255) NOT NULL,
);

插入的值

INSERT INTO contacts
    (first_name,last_name,email) 
VALUES
    ('Syed','Abbas','syed.abbas@example.com'),
    ('Catherine','Abel','catherine.abel@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Pilar','Ackerman','pilar.ackerman@example.com');

查询

    SELECT 
   contact_id, 
   first_name, 
   last_name, 
   email
FROM 
   contacts;

从表中删除重复的行

   WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        contacts
)
DELETE FROM cte
WHERE row_num > 1;

现在要删除记录吗

其他回答

DELETE p1 FROM Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id

在尝试了上面建议的解决方案后,它适用于小型中型表。 我可以为非常大的表提出这个解决方案。因为它在迭代中运行。

Drop all dependency views of the LargeSourceTable you can find the dependecies by using sql managment studio, right click on the table and click "View Dependencies" Rename the table: sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO Create the LargeSourceTable again, but now, add a primary key with all the columns that define the duplications add WITH (IGNORE_DUP_KEY = ON) For example: CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO Create again the views that you dropped in the first place for the new created table Now, Run the following sql script, you will see the results in 1,000,000 rows per page, you can change the row number per page to see the results more often. Note, that I set the IDENTITY_INSERT on and off because one the columns contains auto incremental id, which I'm also copying

设置IDENTITY_INSERT LargeSourceTable ON 声明@PageNumber为INT, @RowspPage为INT 声明@TotalRows为INT 声明@dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000 select @TotalRows = count (*) from LargeSourceTable_TEMP

While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
    begin transaction tran_inner
        ; with cte as
        (
            SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
            OFFSET ((@PageNumber) * @RowspPage) ROWS
            FETCH NEXT @RowspPage ROWS ONLY
        )

        INSERT INTO LargeSourceTable 
        (
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        )       
        select 
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        from cte

    commit transaction tran_inner

    PRINT 'Page: ' + convert(varchar(10), @PageNumber)
    PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
    PRINT 'Of: ' + convert(varchar(20), @TotalRows)

    SELECT @dt = convert(varchar(19), getdate(), 121)
    RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
    SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF

如果你没有引用,比如外键,你可以这样做。在测试概念证明和测试数据重复时,我经常这样做。

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]

INTO [newTable]

FROM [oldTable]

进入对象资源管理器并删除旧表。

用旧表的名称重命名新表。

在mysql中有两个解决方案:

A)使用Delete JOIN语句删除重复的行

DELETE t1 FROM contacts t1
INNER JOIN contacts t2 
WHERE 
    t1.id < t2.id AND 
    t1.email = t2.email;

该查询两次引用联系人表,因此,它使用表别名t1和t2。

输出结果为:

1 查询确定,影响4行(0.10秒)

如果你想删除重复的行并保留最低的id,你可以使用下面的语句:

DELETE c1 FROM contacts c1
INNER JOIN contacts c2 
WHERE
    c1.id > c2.id AND 
    c1.email = c2.email;

   

B)使用中间表删除重复的行

下面是使用中间表删除重复行的步骤:

1。创建一个新表,其结构与要删除重复行的原始表相同。

2。将原始表中的不同行插入到直接表中。

3所示。将原始表中的不同行插入到直接表中。

 

步骤1。创建一个与原表结构相同的新表:

CREATE TABLE source_copy LIKE source;

步骤2。从原表中插入不同的行到新表中:

INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values

步骤3。删除原始表并将直接表重命名为原始表

DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;

来源:http://www.mysqltutorial.org/mysql-delete-duplicate-rows/

哦,哇,我觉得准备这些答案太愚蠢了,他们就像专家的答案,包括所有的CTE和临时表等。

为了让它工作,我所做的只是使用MAX聚合ID列。

DELETE FROM table WHERE col1 IN (
    SELECT MAX(id) FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)

注意:您可能需要多次运行它来删除重复,因为这一次只会删除一组重复的行。