我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。

当然,由于RowID标识字段的存在,这些行不会完全重复。

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

我该怎么做呢?


当前回答

我想我应该分享我的解决方案,因为它在特殊情况下是有效的。 在我的例子中,有重复值的表没有外键(因为这些值是从另一个db复制的)。

begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp 
select distinct * 
from  tableName

-- delete from source
delete from tableName 

-- insert into source from temp
insert into tableName 
select * 
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!

PS:在处理这样的事情时,我总是使用事务,这不仅确保了所有事情都作为一个整体执行,而且还允许我在没有任何风险的情况下进行测试。但是当然你应该做个备份,以防万一……

其他回答

另一种表达方式:——

DELETE A
FROM   TABLE A,
       TABLE B
WHERE  A.COL1 = B.COL1
       AND A.COL2 = B.COL2
       AND A.UNIQUEFIELD > B.UNIQUEFIELD 

使用CTE。这个想法是连接一个或多个列,形成一个重复的记录,然后删除你喜欢的:

;with cte as (
    select 
        min(PrimaryKey) as PrimaryKey
        UniqueColumn1,
        UniqueColumn2
    from dbo.DuplicatesTable 
    group by
        UniqueColumn1, UniqueColumn1
    having count(*) > 1
)
delete d
from dbo.DuplicatesTable d 
inner join cte on 
    d.PrimaryKey > cte.PrimaryKey and
    d.UniqueColumn1 = cte.UniqueColumn1 and 
    d.UniqueColumn2 = cte.UniqueColumn2;
CREATE TABLE car(Id int identity(1,1), PersonId int, CarId int)

INSERT INTO car(PersonId,CarId)
VALUES(1,2),(1,3),(1,2),(2,4)

--SELECT * FROM car

;WITH CTE as(
SELECT ROW_NUMBER() over (PARTITION BY personid,carid order by personid,carid) as rn,Id,PersonID,CarId from car)

DELETE FROM car where Id in(SELECT Id FROM CTE WHERE rn>1)

现在让我们看看elasticalsearch表,这个表有重复的行,Id是相同的uniq字段。我们知道如果某个id存在于某个组条件下,那么我们可以删除该组作用域之外的其他行。我的举止表明了这一标准。

很多情况下,这个线程是在类似的状态,我。只需根据删除重复(重复)行的情况更改目标组条件。

DELETE 
FROM elasticalsearch
WHERE Id NOT IN 
               (SELECT min(Id)
                     FROM elasticalsearch
                     GROUP BY FirmId,FilterSearchString
                     ) 

干杯

这将删除重复的行,除了第一行

DELETE
FROM
    Mytable
WHERE
    RowID NOT IN (
        SELECT
            MIN(RowID)
        FROM
            Mytable
        GROUP BY
            Col1,
            Col2,
            Col3
    )

引用(http://www.codeproject.com/Articles/157977/Remove-Duplicate-Rows-from-a-Table-in-SQL-Server)