I have a table with the following fields:
id (Unique)
url (Unique)
title
company
site_id
Now I need to remove the rows that have the same title, company, and site_id. One way to do it is with the following SQL plus a script (PHP):
SELECT title, site_id, location, id, count( * )
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1
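After running this query, I can remove the duplicates with a server-side script.
But I want to know whether this can be done with an SQL query alone.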
This solution moves the duplicates into one table and the unique rows into another.
-- speed up creating the uniques table if dealing with many rows
CREATE INDEX temp_idx ON jobs(site_id, company, title, location);

-- fill the uniques table with one row per (site_id, company, title, location) group
-- (assumes jobs_uniques and jobs_dupes already exist with the same columns as jobs;
--  SELECT * with GROUP BY only runs on MySQL with ONLY_FULL_GROUP_BY disabled)
INSERT INTO jobs_uniques SELECT * FROM
(
SELECT *
FROM jobs
GROUP BY site_id, company, title, location
HAVING count(1) > 1
UNION
SELECT *
FROM jobs
GROUP BY site_id, company, title, location
HAVING count(1) = 1
) x;

-- fill the dupes table with every row that was not kept as a unique
INSERT INTO jobs_dupes
SELECT *
FROM jobs
WHERE id NOT IN
(SELECT id FROM jobs_uniques);

-- confirm that uniques + dupes add up to the original row count
SELECT COUNT(1)
AS jobs,
(SELECT COUNT(1) FROM jobs_dupes) + (SELECT COUNT(1) FROM jobs_uniques)
AS sum
FROM jobs;
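If the server runs with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the SELECT * with GROUP BY above is rejected. A minimal sketch of the same idea that avoids the problem is to pick one id per group explicitly and join back to the table. It assumes, as above, that jobs_uniques already exists with the same columns as jobs; keep_id and the aliases j and k are names made up for this illustration.

```sql
-- Keep the lowest id from each (site_id, company, title, location) group.
INSERT INTO jobs_uniques
SELECT j.*
FROM jobs AS j
JOIN (
    SELECT MIN(id) AS keep_id          -- hypothetical alias: the row to keep
    FROM jobs
    GROUP BY site_id, company, title, location
) AS k
  ON j.id = k.keep_id;
```

Choosing MIN(id) makes the surviving row deterministic; swap in MAX(id) to keep the newest row instead.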
I have this query snippet for SQL Server, but I think it can be used in other DBMSs with small changes:
DELETE
FROM Table
WHERE Table.idTable IN (
SELECT MAX(idTable)
FROM Table
GROUP BY field1, field2, field3
HAVING COUNT(*) > 1)
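I forgot to tell you that this query does not remove the row with the lowest id among the duplicates. It deletes only the highest id in each group, so if a group has more than two rows you need to run it again until no rows are affected. If that works for you, try this query: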
DELETE
FROM jobs
WHERE jobs.id IN (
SELECT MAX(id)
FROM jobs
GROUP BY site_id, company, title, location
HAVING COUNT(*) > 1)
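On MySQL the statement above fails with error 1093, because the subquery reads from the same table that is being deleted from. A rough alternative, sketched here under the assumption that duplicates are defined by site_id, company, and title as in the question, is MySQL's multi-table DELETE with a self-join. The aliases j1 and j2 are illustrative only.

```sql
-- Delete every row for which an older duplicate (lower id) exists,
-- so only the lowest id in each group survives. Runs in a single pass.
DELETE j1
FROM jobs AS j1
JOIN jobs AS j2
  ON  j1.site_id = j2.site_id
  AND j1.company = j2.company
  AND j1.title   = j2.title
  AND j1.id      > j2.id;
```

Rows with NULL in any of the compared columns are never matched by `=`, so they are left untouched. Add `AND j1.location = j2.location` if location should be part of the duplicate key as well.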
A very simple way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword, like so:
ALTER IGNORE TABLE jobs
ADD UNIQUE INDEX idx_name (site_id, title, company);
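This will drop all the duplicate rows. As an added benefit, future INSERTs that would create duplicates will error out. As always, you may want to take a backup before running something like this…
Edit: this no longer works in MySQL 5.7+.
The IGNORE clause for ALTER TABLE was deprecated in MySQL 5.6 and removed in MySQL 5.7, so the statement above fails on newer versions.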
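On MySQL 5.7 and later, a similar effect can probably be had by copying the rows into a new table that already carries the unique index and letting INSERT IGNORE skip the duplicates. This is only a sketch: jobs_dedup and jobs_old are hypothetical table names, and it assumes the three columns are short enough to be indexed without a prefix length.

```sql
-- 1. Empty copy of the table (columns and indexes, but not foreign keys).
CREATE TABLE jobs_dedup LIKE jobs;

-- 2. Add the unique index while the table is still empty.
ALTER TABLE jobs_dedup
  ADD UNIQUE INDEX idx_name (site_id, title, company);

-- 3. Copy the rows; any row that violates the unique index is skipped.
INSERT IGNORE INTO jobs_dedup
SELECT * FROM jobs ORDER BY id;

-- 4. Swap the tables, keeping the original around as a backup.
RENAME TABLE jobs TO jobs_old, jobs_dedup TO jobs;
```

Note that INSERT IGNORE also downgrades other errors (such as data truncation) to warnings, so it is worth checking `SHOW WARNINGS` and comparing row counts before dropping jobs_old.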