我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。

当然,由于RowID标识字段的存在,这些行不会完全重复。

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

我该怎么做呢?


当前回答

我想我应该分享我的解决方案,因为它在特殊情况下是有效的。 在我的例子中,有重复值的表没有外键(因为这些值是从另一个db复制的)。

begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp 
select distinct * 
from  tableName

-- delete from source
delete from tableName 

-- insert into source from temp
insert into tableName 
select * 
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!

PS:在处理这样的事情时,我总是使用事务,这不仅确保了所有事情都作为一个整体执行,而且还允许我在没有任何风险的情况下进行测试。但是当然你应该做个备份,以防万一……

其他回答

通过使用下面的查询,我们可以基于单列或多列删除重复的记录。下面的查询是基于两列进行删除。表名为:testing,列名为empno,empname

DELETE FROM testing WHERE empno not IN (SELECT empno FROM (SELECT empno, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno) 
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
or empname not in
(select empname from (select empname,row_number() over(PARTITION BY empno ORDER BY empno) 
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)

使用这个

WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
   As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1

获取重复的行:

SELECT
name, email, COUNT(*)
FROM 
users
GROUP BY
name, email
HAVING COUNT(*) > 1

删除重复的行。

DELETE users 
WHERE rowid NOT IN 
(SELECT MIN(rowid)
FROM users
GROUP BY name, email);      
CREATE TABLE car(Id int identity(1,1), PersonId int, CarId int)

INSERT INTO car(PersonId,CarId)
VALUES(1,2),(1,3),(1,2),(2,4)

--SELECT * FROM car

;WITH CTE as(
SELECT ROW_NUMBER() over (PARTITION BY personid,carid order by personid,carid) as rn,Id,PersonID,CarId from car)

DELETE FROM car where Id in(SELECT Id FROM CTE WHERE rn>1)

我想我应该分享我的解决方案,因为它在特殊情况下是有效的。 在我的例子中,有重复值的表没有外键(因为这些值是从另一个db复制的)。

begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp 
select distinct * 
from  tableName

-- delete from source
delete from tableName 

-- insert into source from temp
insert into tableName 
select * 
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!

PS:在处理这样的事情时,我总是使用事务,这不仅确保了所有事情都作为一个整体执行,而且还允许我在没有任何风险的情况下进行测试。但是当然你应该做个备份,以防万一……