如何删除没有唯一行id存在的重复行?

我的座位是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

我想留下以下重复删除后:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

我尝试了一些查询,但我认为他们取决于有一个行id,因为我没有得到想要的结果。例如:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

当前回答

微软有一个关于如何删除重复文件的非常简洁的指南。查看http://support.microsoft.com/kb/139444

简而言之,当你只有几行要删除时,下面是删除重复项的最简单方法:

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

Myprimarykey是行标识符。

我将rowcount设置为1,因为我只有两行复制。如果我复制了3行,那么我就会将rowcount设置为2,这样它就会删除它看到的前两行,只在表t1中留下一行。

其他回答

我更喜欢CTE从sql server表中删除重复的行

强烈推荐阅读本文::http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

保持原创性

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)

DELETE FROM CTE WHERE RN<>1

不保留原创

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

在尝试了上面建议的解决方案后,它适用于小型中型表。 我可以为非常大的表提出这个解决方案。因为它在迭代中运行。

Drop all dependency views of the LargeSourceTable you can find the dependecies by using sql managment studio, right click on the table and click "View Dependencies" Rename the table: sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO Create the LargeSourceTable again, but now, add a primary key with all the columns that define the duplications add WITH (IGNORE_DUP_KEY = ON) For example: CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO Create again the views that you dropped in the first place for the new created table Now, Run the following sql script, you will see the results in 1,000,000 rows per page, you can change the row number per page to see the results more often. Note, that I set the IDENTITY_INSERT on and off because one the columns contains auto incremental id, which I'm also copying

设置IDENTITY_INSERT LargeSourceTable ON 声明@PageNumber为INT, @RowspPage为INT 声明@TotalRows为INT 声明@dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000 select @TotalRows = count (*) from LargeSourceTable_TEMP

While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
    begin transaction tran_inner
        ; with cte as
        (
            SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
            OFFSET ((@PageNumber) * @RowspPage) ROWS
            FETCH NEXT @RowspPage ROWS ONLY
        )

        INSERT INTO LargeSourceTable 
        (
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        )       
        select 
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        from cte

    commit transaction tran_inner

    PRINT 'Page: ' + convert(varchar(10), @PageNumber)
    PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
    PRINT 'Of: ' + convert(varchar(20), @TotalRows)

    SELECT @dt = convert(varchar(19), getdate(), 121)
    RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
    SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF

DELETE FROM TBL1  WHERE ID  IN
(SELECT ID FROM TBL1  a WHERE ID!=
(select MAX(ID) from TBL1  where DUPVAL=a.DUPVAL 
group by DUPVAL
having count(DUPVAL)>1))

从一个巨大的(几百万条记录)表中删除重复项可能需要很长时间。我建议将所选行的批量插入到临时表中,而不是删除。

--REWRITING YOUR CODE(TAKE NOTE OF THE 3RD LINE) WITH CTE AS(SELECT NAME,ROW_NUMBER() 
OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB) SELECT * INTO #unique_records FROM 
CTE WHERE ID =1;

删除所有重复项,但删除第一个重复项(具有最小ID)

应该同样适用于其他SQL服务器,如Postgres:

DELETE FROM table
WHERE id NOT IN (
   select min(id) from table
   group by col1, col2, col3, col4, col5, col6, col7
)