我有一个表与以下字段:

id (Unique)
url (Unique)
title
company
site_id

现在,我需要删除具有相同标题、company和site_id的行。一种方法是使用下面的SQL和脚本(PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

运行此查询后,可以使用服务器端脚本删除重复项。

但是,我想知道这是否只能使用SQL查询。


当前回答

我有这个查询片段的SQLServer,但我认为它可以用在其他DBMS与小的变化:

DELETE
FROM Table
WHERE Table.idTable IN  (  
    SELECT MAX(idTable)
    FROM idTable
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1)

我忘了告诉您,这个查询不会删除重复行中id最低的行。如果这对你有用,试试这个查询:

DELETE
FROM jobs
WHERE jobs.id IN  (  
    SELECT MAX(id)
    FROM jobs
    GROUP BY site_id, company, title, location
    HAVING COUNT(*) > 1)

其他回答

如果你有一个有大量记录的大表,那么上述解决方案将不起作用或花费太多时间。然后我们有一个不同的解

-- Create temporary table

CREATE TABLE temp_table LIKE table1;

-- Add constraint
ALTER TABLE temp_table ADD UNIQUE(title, company,site_id);

-- Copy data
INSERT IGNORE INTO temp_table SELECT * FROM table1;

-- Rename and drop
RENAME TABLE table1 TO old_table1, temp_table TO table1;
DROP TABLE old_table1;

下面是我用过的方法,它很有效:

create table temp_table like my_table;

T_id是唯一的列

insert into temp_table (id) select id from my_table GROUP by t_id;
delete from my_table where id not in (select id from temp_table);
drop table temp_table;

为了复制具有唯一列的记录,例如COL1,COL2, COL3不应该被复制(假设我们在表结构中遗漏了3个唯一列,并且在表中添加了多个重复项)

DROP TABLE TABLE_NAME_copy;
CREATE TABLE TABLE_NAME_copy LIKE TABLE_NAME;
INSERT INTO TABLE_NAME_copy
SELECT * FROM TABLE_NAME
GROUP BY COLUMN1, COLUMN2, COLUMN3; 
DROP TABLE TABLE_NAME;
ALTER TABLE TABLE_NAME_copy RENAME TO TABLE_NAME;

希望将有助于开发。

MySQL对引用要删除的表有限制。你可以用一个临时表来解决这个问题,比如:

create temporary table tmpTable (id int);

insert  into tmpTable
        (id)
select  id
from    YourTable yt
where   exists
        (
        select  *
        from    YourTabe yt2
        where   yt2.title = yt.title
                and yt2.company = yt.company
                and yt2.site_id = yt.site_id
                and yt2.id > yt.id
        );

delete  
from    YourTable
where   ID in (select id from tmpTable);

以下是Kostanos在评论中的建议: 上面唯一缓慢的查询是DELETE,适用于数据库非常大的情况。这个查询可以更快:

DELETE FROM YourTable USING YourTable, tmpTable WHERE YourTable.id=tmpTable.id

使用Delete JOIN语句删除重复的行:

DELETE t1 FROM table_name t1
JOIN table_name t2
WHERE
    t1.id < t2.id AND
    t1.title = t2.title AND t1.company = t2.company AND t1.site_id = t2.site_id;