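I need to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows).

The rows, of course, will not be perfect duplicates because of the RowID identity field.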

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

How can I do this?


Current answer

Use this:

WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
   As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1
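
To make that concrete for the table in the question, here is a rough sketch. It assumes a row counts as a duplicate when Col1, Col2 and Col3 all match, and that the copy with the lowest RowID is the one to keep; neither assumption is stated by the original poster, so adjust both to taste. The CTE name Numbered is just a placeholder.

```sql
-- Sketch only: assumes the MyTable definition from the question and
-- treats rows as duplicates when Col1, Col2 and Col3 are all equal.
WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Col1, Col2, Col3  -- one group per set of duplicates
               ORDER BY RowID                 -- lowest RowID is numbered 1
           ) AS RowNumber
    FROM MyTable
)
DELETE FROM Numbered
WHERE RowNumber > 1;  -- everything except the first row of each group
```

Ordering by RowID rather than by one of the partition columns makes it predictable which copy survives.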

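Other answers

There's a good article on removing duplicates on the Microsoft Support site. It's pretty conservative (they have you do everything in separate steps), but it should work well against large tables.

I've used self-joins to do this in the past, although it could probably be prettied up with a HAVING clause: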

DELETE dupes
FROM MyTable dupes, MyTable fullTable
WHERE dupes.dupField = fullTable.dupField 
AND dupes.secondDupField = fullTable.secondDupField 
AND dupes.uniqueField > fullTable.uniqueField
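
One possible reading of the HAVING idea, sketched against the table from the question: group on the duplicate columns, keep only groups that occur more than once, and delete every row in those groups except the one with the smallest RowID. This is a guess at what was meant, not the answerer's own query, and the aliases t, d and KeepRowID are made up for the example.

```sql
-- Sketch only: assumes MyTable from the question, with Col1, Col2, Col3
-- defining a duplicate and the lowest RowID being the row to keep.
DELETE t
FROM MyTable AS t
JOIN (
    SELECT Col1, Col2, Col3, MIN(RowID) AS KeepRowID
    FROM MyTable
    GROUP BY Col1, Col2, Col3
    HAVING COUNT(*) > 1          -- only groups that actually contain duplicates
) AS d
  ON  t.Col1 = d.Col1
  AND t.Col2 = d.Col2
  AND t.Col3 = d.Col3
WHERE t.RowID > d.KeepRowID;     -- keep the first row of each group
```

The equality joins are safe here because all three columns are declared NOT NULL; with nullable columns, rows containing NULL would not match themselves.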

By using the query below, we can delete duplicate records based on a single column or on multiple columns. The query below deletes based on two columns. The table name is testing and the column names are empno and empname.

-- Number the rows within each (empno, empname) pair, then delete
-- every row after the first one in its group.
WITH numbered AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY empno, empname ORDER BY empno) AS [ItemNumber]
FROM testing
)
DELETE FROM numbered WHERE ItemNumber > 1

I prefer a CTE for deleting duplicate rows from a SQL Server table.

I strongly recommend reading this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

Keeping the original row

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)

DELETE FROM CTE WHERE RN<>1

Without keeping the original row

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

I thought this would be helpful. Here, ROW_NUMBER() OVER(PARTITION BY res1.Title ORDER BY res1.Id) AS num is used to tell the duplicate rows apart.

-- Rows are numbered within each Title; the lowest Id gets num = 1 and is kept
DELETE res2
FROM
(SELECT res1.*, ROW_NUMBER() OVER(PARTITION BY res1.Title ORDER BY res1.Id) AS num
 FROM [dbo].[tbl_countries] AS res1
) AS res2
WHERE res2.num > 1

If you want to preview the rows you are about to remove and keep control over which of the duplicate rows to keep, see http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

with MYCTE as (
  SELECT ROW_NUMBER() OVER (
    PARTITION BY DuplicateKey1
                ,DuplicateKey2 -- optional
    ORDER BY CreatedAt -- the first row among duplicates will be kept, other rows will be removed
  ) RN
  FROM MyTable
)
DELETE FROM MYCTE
WHERE RN > 1
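
For the preview itself, a simple option is to run the same CTE with a SELECT in place of the DELETE. This is only a sketch: it reuses the placeholder column names DuplicateKey1, DuplicateKey2 and CreatedAt from the query above, which are not columns of the table in the question.

```sql
-- Sketch only: lists the rows the DELETE above would remove, without removing them.
WITH MYCTE AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY DuplicateKey1
                       ,DuplicateKey2
           ORDER BY CreatedAt       -- same ordering as the DELETE version
         ) AS RN
  FROM MyTable
)
SELECT *
FROM MYCTE
WHERE RN > 1                        -- the rows that would be deleted
ORDER BY DuplicateKey1, DuplicateKey2, CreatedAt;
```

Changing the filter to RN = 1 shows the rows that would be kept instead.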