I have a table with the following fields:

id (Unique)
url (Unique)
title
company
site_id

Now I need to remove rows that have the same title, company, and site_id. One way to do it is with the following SQL plus a script (PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

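After running this query, I can remove the duplicates with a server-side script.

But I'd like to know whether this can be done with an SQL query alone.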


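Current answer

If you don't want to alter the column properties, you can use the query below.

Since you have a column with unique IDs (e.g., an auto_increment column), you can use it to remove the duplicates. In each group of duplicates, the row with the highest ID is the one that is kept: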

DELETE `a`
FROM
    `jobs` AS `a`,
    `jobs` AS `b`
WHERE
    -- IMPORTANT: Ensures one version remains
    -- Change "ID" to your unique column's name
    `a`.`ID` < `b`.`ID`

    -- Any duplicates you want to check for
    AND (`a`.`title` = `b`.`title` OR `a`.`title` IS NULL AND `b`.`title` IS NULL)
    AND (`a`.`company` = `b`.`company` OR `a`.`company` IS NULL AND `b`.`company` IS NULL)
    AND (`a`.`site_id` = `b`.`site_id` OR `a`.`site_id` IS NULL AND `b`.`site_id` IS NULL);

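In MySQL, you can simplify this with the NULL-safe equal operator `<=>` (informally known as the "spaceship operator"), which treats two NULL values as equal: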

DELETE `a`
FROM
    `jobs` AS `a`,
    `jobs` AS `b`
WHERE
    -- IMPORTANT: Ensures one version remains
    -- Change "ID" to your unique column's name
    `a`.`ID` < `b`.`ID`

    -- Any duplicates you want to check for
    AND `a`.`title` <=> `b`.`title`
    AND `a`.`company` <=> `b`.`company`
    AND `a`.`site_id` <=> `b`.`site_id`;
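
Before running either DELETE above, it may help to preview the rows it would remove. The following is a minimal sketch using the same self-join and the same `jobs` columns from the question. It only reads data, and it assumes the unique column is named `id`:

```sql
-- Dry run: list every row that the DELETE above would remove.
-- DISTINCT is needed because a row with several higher-id duplicates
-- matches the join more than once.
SELECT DISTINCT a.*
FROM jobs AS a
JOIN jobs AS b
  ON a.id < b.id                 -- same condition that protects the kept row
 AND a.title   <=> b.title
 AND a.company <=> b.company
 AND a.site_id <=> b.site_id
ORDER BY a.title, a.company, a.site_id, a.id;
```

If the result contains only rows you expect to lose, the matching DELETE should affect the same number of rows.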

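Other answers

This works well if you want to delete one of the duplicates and keep the other. Note that without the extra nested subquery you will get error #1093 ("You can't specify target table ... for update in FROM clause").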

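DELETE FROM table_name
WHERE id IN (
    -- The extra derived table (x) avoids error #1093
    SELECT * FROM (
        -- MIN() keeps the query valid under ONLY_FULL_GROUP_BY;
        -- one row per duplicate group is deleted on each run
        SELECT MIN(n.id) FROM table_name n
        WHERE n.column2 != 'value'
        GROUP BY n.column HAVING COUNT(n.column) > 1
    ) x
)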
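
Applied to the `jobs` table from the question, the same derived-table trick could look roughly like the sketch below. It inverts the logic (keep one id per group, delete everything else), so a single run removes all extra copies. The aliases `keep_id` and `keepers` are names made up for this illustration, and the sketch assumes `id` is never NULL:

```sql
-- Keep the highest id in each (title, company, site_id) group and
-- delete every other row. GROUP BY places NULLs in the same group,
-- so rows with NULL in these columns are deduplicated too.
DELETE FROM jobs
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MAX(id) AS keep_id
        FROM jobs
        GROUP BY title, company, site_id
    ) AS keepers   -- derived table wrapper avoids error #1093
);
```

Swapping `MAX(id)` for `MIN(id)` would keep the oldest row of each group instead.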

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword, like so:

ALTER IGNORE TABLE jobs
ADD UNIQUE INDEX idx_name (site_id, title, company);

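This drops all the duplicate rows. As an added benefit, future INSERTs that are duplicates will fail with an error. As always, you may want to take a backup before running something like this...

Edit: no longer works in MySQL 5.7+

This feature was deprecated in MySQL 5.6 and removed in MySQL 5.7, so it no longer works.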
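
On MySQL 5.7 and later, a similar effect can be had by copying the rows into a new table that already carries the unique index. This is a sketch of that substitute technique, not the ALTER IGNORE behavior itself. The table names `jobs_dedup` and `jobs_old` are hypothetical, and it assumes the three columns are short enough to be indexed:

```sql
-- 1. Empty copy of the table structure (columns and indexes;
--    foreign key definitions are not copied by LIKE).
CREATE TABLE jobs_dedup LIKE jobs;

-- 2. Add the unique index that defines what counts as a duplicate.
ALTER TABLE jobs_dedup
    ADD UNIQUE INDEX idx_name (site_id, title, company);

-- 3. Copy the rows; INSERT IGNORE silently skips any row that would
--    violate the unique index, so the lowest id of each group survives.
INSERT IGNORE INTO jobs_dedup
SELECT * FROM jobs ORDER BY id;

-- 4. Swap the tables, keeping the original as a backup.
RENAME TABLE jobs TO jobs_old, jobs_dedup TO jobs;
```

Two caveats: a UNIQUE index allows multiple NULL values, so rows with NULL in any of the three columns are not deduplicated this way, and INSERT IGNORE also suppresses other insert errors, so it is worth comparing row counts before dropping `jobs_old`.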

I found a simple way. (It keeps the latest row, i.e. the one with the highest primary_id.)

DELETE t1 FROM table_name t1 INNER JOIN table_name t2 
WHERE t1.primary_id < t2.primary_id 
AND t1.check_duplicate_col_1 = t2.check_duplicate_col_1 
AND t1.check_duplicate_col_2 = t2.check_duplicate_col_2
...

Use a DELETE JOIN statement to remove the duplicate rows:

DELETE t1 FROM table_name t1
JOIN table_name t2
WHERE
    t1.id < t2.id AND
    t1.title = t2.title AND t1.company = t2.company AND t1.site_id = t2.site_id;
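
The statement above keeps the row with the highest id in each group. If the oldest row should survive instead, one possible variant is to flip the id comparison, as in this sketch against the same `jobs` columns:

```sql
-- Delete every row that has a duplicate with a LOWER id,
-- so the lowest id in each group is the one that remains.
DELETE t1
FROM jobs AS t1
JOIN jobs AS t2
  ON t1.id > t2.id
 AND t1.title   = t2.title
 AND t1.company = t2.company
 AND t1.site_id = t2.site_id;
```

Because this uses plain `=`, rows where any of the compared columns is NULL never match each other; replacing `=` with `<=>` would treat NULLs as equal.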

You can easily delete the duplicate records with this code.

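// Assumes $db is an open mysqli connection; the mysql_* functions used
// originally were removed in PHP 7.
$cities = $db->query("SELECT city FROM cities");
$select = $db->prepare("SELECT town_id FROM cities2 WHERE city = ? ORDER BY town_id");
$delete = $db->prepare("DELETE FROM cities2 WHERE town_id = ?");

while ($city_row = $cities->fetch_assoc()) {
    // Collect the ids of every cities2 row for this city.
    $select->bind_param('s', $city_row['city']);
    $select->execute();
    $select->bind_result($town_id);
    $town_ids = array();   // reset for each city so ids do not accumulate
    while ($select->fetch()) {
        $town_ids[] = $town_id;
    }
    $select->free_result();

    // Keep the first row and delete the remaining duplicates.
    foreach (array_slice($town_ids, 1) as $duplicate_id) {
        $delete->bind_param('s', $duplicate_id);
        $delete->execute();
    }
}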