如何删除没有唯一行id存在的重复行?

我的座位是

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

我想留下以下重复删除后:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

我尝试了一些查询,但我认为他们取决于有一个行id,因为我没有得到想要的结果。例如:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

当前回答

在尝试了上面建议的解决方案后,它适用于小型中型表。 我可以为非常大的表提出这个解决方案。因为它在迭代中运行。

Drop all dependency views of the LargeSourceTable you can find the dependecies by using sql managment studio, right click on the table and click "View Dependencies" Rename the table: sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO Create the LargeSourceTable again, but now, add a primary key with all the columns that define the duplications add WITH (IGNORE_DUP_KEY = ON) For example: CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO Create again the views that you dropped in the first place for the new created table Now, Run the following sql script, you will see the results in 1,000,000 rows per page, you can change the row number per page to see the results more often. Note, that I set the IDENTITY_INSERT on and off because one the columns contains auto incremental id, which I'm also copying

设置IDENTITY_INSERT LargeSourceTable ON 声明@PageNumber为INT, @RowspPage为INT 声明@TotalRows为INT 声明@dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000 select @TotalRows = count (*) from LargeSourceTable_TEMP

While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
    begin transaction tran_inner
        ; with cte as
        (
            SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
            OFFSET ((@PageNumber) * @RowspPage) ROWS
            FETCH NEXT @RowspPage ROWS ONLY
        )

        INSERT INTO LargeSourceTable 
        (
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        )       
        select 
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        from cte

    commit transaction tran_inner

    PRINT 'Page: ' + convert(varchar(10), @PageNumber)
    PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
    PRINT 'Of: ' + convert(varchar(20), @TotalRows)

    SELECT @dt = convert(varchar(19), getdate(), 121)
    RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
    SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF

其他回答

我更喜欢CTE从sql server表中删除重复的行

强烈推荐阅读本文::http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

保持原创性

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)

DELETE FROM CTE WHERE RN<>1

不保留原创

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

请参见下面的删除方式。

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values 
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

创建一个名为@table的样例表,并用给定的数据加载它。

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From    @table) aliasName 
Where   rowNumber > 1

Select * from @table

注意:如果您按部分给出分区中的所有列,那么按顺序没有多大意义。

我知道,这个问题是三年前问的,我的答案是蒂姆发布的另一个版本,但发布只是为了对任何人有帮助。

如果你有能力临时添加一个列到表中,这是一个适合我的解决方案:

ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)

然后使用MIN和GROUP BY的组合执行DELETE

DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
                     SELECT MIN(RowID) AS RowID
                     FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
                     GROUP BY a.ITEM_NUMBER,
                              a.CHARACTERISTIC,
                              a.INTVALUE,
                              a.FLOATVALUE,
                              a.STRINGVALUE
                 );

验证DELETE执行正确:

SELECT a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE
ORDER BY COUNT(*) DESC 

结果中不应有计数大于1的行。最后,删除rowid列:

ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;
DELETE p1 FROM Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id

试着使用:

SELECT linkorder
    ,Row_Number() OVER (
        PARTITION BY linkorder ORDER BY linkorder DESC
        ) AS RowNum
FROM u_links