我有一个表与以下字段:

id (Unique)
url (Unique)
title
company
site_id

现在,我需要删除具有相同标题、company和site_id的行。一种方法是使用下面的SQL和脚本(PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

运行此查询后,可以使用服务器端脚本删除重复项。

但是,我想知道这是否只能使用SQL查询。


当前回答

如果不想更改列属性,那么可以使用下面的查询。

因为你有一个列,它有唯一的id(例如,auto_increment列),你可以用它来删除重复项:

DELETE `a`
FROM
    `jobs` AS `a`,
    `jobs` AS `b`
WHERE
    -- IMPORTANT: Ensures one version remains
    -- Change "ID" to your unique column's name
    `a`.`ID` < `b`.`ID`

    -- Any duplicates you want to check for
    AND (`a`.`title` = `b`.`title` OR `a`.`title` IS NULL AND `b`.`title` IS NULL)
    AND (`a`.`company` = `b`.`company` OR `a`.`company` IS NULL AND `b`.`company` IS NULL)
    AND (`a`.`site_id` = `b`.`site_id` OR `a`.`site_id` IS NULL AND `b`.`site_id` IS NULL);

在MySQL中,你可以用空安全的相等操作符(又名“宇宙飞船操作符”)来简化它:

DELETE `a`
FROM
    `jobs` AS `a`,
    `jobs` AS `b`
WHERE
    -- IMPORTANT: Ensures one version remains
    -- Change "ID" to your unique column's name
    `a`.`ID` < `b`.`ID`

    -- Any duplicates you want to check for
    AND `a`.`title` <=> `b`.`title`
    AND `a`.`company` <=> `b`.`company`
    AND `a`.`site_id` <=> `b`.`site_id`;

其他回答

我必须对文本字段执行此操作,并且遇到了索引上100字节的限制。

我通过添加一个列来解决这个问题,对字段进行md5哈希,并进行更改。

ALTER TABLE table ADD `merged` VARCHAR( 40 ) NOT NULL ;
UPDATE TABLE SET merged` = MD5(CONCAT(`col1`, `col2`, `col3`))
ALTER IGNORE TABLE table ADD UNIQUE INDEX idx_name (`merged`);

还有另一种解决方案:

DELETE t1 FROM my_table t1, my_table t2 WHERE t1.id < t2.id AND t1.my_field = t2.my_field AND t1.my_field_2 = t2.my_field_2 AND ...

我有一个忘记在id行添加主键的表。虽然它在id上有auto_increment。但是有一天,一个东西在数据库中重放了mysql bin日志,插入了一些重复的行。

我删除重复的行

选择唯一的重复行并导出它们 选择T1。(select count(*) as c,id from table_name group by id) T2 on T1。id = T2。T2.c > 1 group by T1.id; 按id删除重复的行 从导出的数据中插入该行。 然后在id上添加主键

我想更具体地说明我删除了哪些记录,下面是我的解决方案:

delete
from jobs c1
where not c1.location = 'Paris'
and  c1.site_id > 64218
and exists 
(  
select * from jobs c2 
where c2.site_id = c1.site_id
and   c2.company = c1.company
and   c2.location = c1.location
and   c2.title = c1.title
and   c2.site_id > 63412
and   c2.site_id < 64219
)

这将删除标题、公司和网站值相同的重复行。最后一个重复项将被保留,其余重复项将被删除(如果你想保留第一个重复项并删除其他重复项,将id的比较项更改为大于t1。Id > t2.id)

DELETE t1 FROM tablename t1
INNER JOIN tablename t2 
WHERE 
    t1.id < t2.id AND
    t1.title = t2.title AND
    t1.company=t2.company AND
    t1.site_ID=t2.site_ID;