我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。
当然,由于RowID标识字段的存在,这些行不会完全重复。
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
我该怎么做呢?
我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。
当然,由于RowID标识字段的存在,这些行不会完全重复。
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
我该怎么做呢?
当前回答
首先,您可以使用MIN()和Group By选择最小RowId。我们将保留这些行。
SELECT MIN(RowId) as RowId
FROM MyTable
GROUP BY Col1, Col2, Col3
删除那些不在最小RowId使用范围内的RowId
DELETE FROM MyTable WHERE RowId Not IN()
最后的查询:
DELETE FROM MyTable WHERE RowId Not IN(
SELECT MIN(RowId) as RowId
FROM MyTable
GROUP BY Col1, Col2, Col3
)
你也可以在SQL Fiddle中检查我的答案
其他回答
对于表结构
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
删除重复项的查询:
DELETE t1
FROM MyTable t1
INNER JOIN MyTable t2
WHERE t1.RowID > t2.RowID
AND t1.Col1 = t2.Col1
AND t1.Col2=t2.Col2
AND t1.Col3=t2.Col3;
我假设RowID是一种自动递增,其余列有重复的值。
delete t1
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid>t2.rowid
邮政:
delete
from table t1
using table t2
where t1.columnA = t2.columnA
and t1.rowid > t2.rowid
通过使用下面的查询,我们可以基于单列或多列删除重复的记录。下面的查询是基于两列进行删除。表名为:testing,列名为empno,empname
DELETE FROM testing WHERE empno not IN (SELECT empno FROM (SELECT empno, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
or empname not in
(select empname from (select empname,row_number() over(PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
另一种基于两列删除重复项的方法
我发现这个查询更容易阅读和替换。
DELETE
FROM
TABLE_NAME
WHERE FIRST_COLUMNS
IN(
SELECT * FROM
( SELECT MIN(FIRST_COLUMNS)
FROM TABLE_NAME
GROUP BY
FIRST_COLUMNS,
SECOND_COLUMNS
HAVING COUNT(FIRST_COLUMNS) > 1
) temp
)
注意:在运行查询之前最好模拟查询。
使用这个
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1