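I have a table with a varchar column, and I want to find all records that have duplicate values in that column. What query can I use to find the duplicates?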


Do a SELECT with a GROUP BY clause. Assuming name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This returns a result with the name value in the first column and, in the second column, the number of times that value appears.
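If you want to try the GROUP BY ... HAVING pattern without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The people table and its rows are hypothetical. The query matches the one above, except that HAVING repeats COUNT(*) instead of using the alias c. That is the portable form: MySQL accepts the alias, but PostgreSQL, for example, does not.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

# One row per name that occurs more than once, together with its count.
dupes = conn.execute(
    """
    SELECT name, COUNT(*) AS c
    FROM people
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()
print(dupes)  # [('alice', 3), ('bob', 2)]
```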


SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;

SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

Assuming your table is named TableABC, the column you want is Col, and the primary key of TableABC is Key:

SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key

The advantage of this approach over the answers above is that it gives you the Key.
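The self-join with a.Key <> b.Key reports every duplicate pair twice, once in each direction. The sketch below is a minimal illustration of this, using SQLite through Python's sqlite3 as a stand-in for MySQL. It uses a hypothetical items table, where id plays the role of Key and col the role of Col. It also shows a variant that uses < instead of <> and so keeps each pair only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "id" stands in for Key, "col" for Col.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany(
    "INSERT INTO items (id, col) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "z")],
)

# Same condition as the answer: "<>" yields both (1, 3) and (3, 1).
both_orders = conn.execute(
    """
    SELECT a.id, b.id, a.col
    FROM items a JOIN items b ON a.col = b.col AND a.id <> b.id
    ORDER BY a.id
    """
).fetchall()

# Variant: "<" keeps only the ordering with the smaller key first.
one_order = conn.execute(
    """
    SELECT a.id, b.id, a.col
    FROM items a JOIN items b ON a.col = b.col AND a.id < b.id
    ORDER BY a.id
    """
).fetchall()
print(both_orders)  # [(1, 3, 'x'), (3, 1, 'x')]
print(one_order)    # [(1, 3, 'x')]
```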


SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )
ORDER BY varchar_column

This query returns complete records, not just the distinct varchar_column values.

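This query does not use COUNT(*). When there are many duplicates, COUNT(*) is expensive, and you don't need the full count anyway: you only need to know whether at least two rows share the same value.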

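This is done with the LIMIT 1, 1 at the end of the correlated subquery, which essentially means "return the second row". EXISTS is true only if that second row exists, i.e. if at least two rows have the same varchar_column value.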

Of course, an index on varchar_column will speed this query up considerably.
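Here is a minimal sketch of the EXISTS approach, run on SQLite through Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical. SQLite's LIMIT 1 OFFSET 1 is equivalent to MySQL's LIMIT 1, 1 (in the comma form the offset comes first), so the subquery returns a row only when a second matching row exists. The query also creates the index the answer recommends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, varchar_column TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, varchar_column) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")],
)
# Index on the compared column, as suggested above.
conn.execute("CREATE INDEX idx_varchar_column ON mytable (varchar_column)")

rows = conn.execute(
    """
    SELECT *
    FROM mytable AS mto
    WHERE EXISTS (
        SELECT 1
        FROM mytable AS mti
        WHERE mti.varchar_column = mto.varchar_column
        LIMIT 1 OFFSET 1  -- MySQL: LIMIT 1, 1 (skip one row, return the next)
    )
    ORDER BY varchar_column, id
    """
).fetchall()
print(rows)
```

Only the three rows with the value 'a' qualify. 'b' and 'c' have no second row, so their subquery returns nothing.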


SELECT t.*,(select count(*) from city as tt
  where tt.name=t.name) as count
  FROM `city` as t
  where (
     select count(*) from city as tt
     where tt.name=t.name
  ) > 1 order by count desc

Replace city with your table name, and name with your column name.
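The correlated-subquery version attaches each row's group size to the row itself. Here is a minimal sketch on SQLite via Python's sqlite3, standing in for MySQL, with hypothetical city rows. The count column is aliased cnt rather than count to avoid confusion with the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO city (name) VALUES (?)",
    [("Paris",), ("Rome",), ("Paris",), ("Oslo",), ("Rome",), ("Paris",)],
)

# Each duplicated row plus how often its name occurs, most frequent first.
rows = conn.execute(
    """
    SELECT t.id, t.name,
           (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) AS cnt
    FROM city AS t
    WHERE (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) > 1
    ORDER BY cnt DESC, t.id
    """
).fetchall()
for row in rows:
    print(row)
```

Because the subquery runs once per outer row, this approach is simple but can be slow on large tables unless name is indexed.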


SELECT 
    t.*,
    (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
FROM `city` AS t 
WHERE 
    (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC

SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

SELECT * 
FROM `dps` 
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)

Building on levik's answer: to get the ids of the duplicate rows, you can use GROUP_CONCAT if your server supports it. It returns a comma-separated list of ids.

SELECT GROUP_CONCAT(id), name, COUNT(*) c
FROM documents
GROUP BY name
HAVING c > 1;
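Here is a minimal sketch of the GROUP_CONCAT query on SQLite through Python's sqlite3, standing in for MySQL, with a hypothetical documents table. SQLite does not guarantee the order of the concatenated ids, so the sketch parses and sorts them in Python. In MySQL you can write GROUP_CONCAT(id ORDER BY id) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO documents (id, name) VALUES (?, ?)",
    [(1, "a.txt"), (2, "b.txt"), (3, "a.txt"), (4, "c.txt"), (5, "b.txt"), (6, "a.txt")],
)

rows = conn.execute(
    """
    SELECT GROUP_CONCAT(id), name, COUNT(*) AS c
    FROM documents
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

# GROUP_CONCAT returns a string such as "1,3,6"; split it back into sorted ints.
dupe_ids = {name: sorted(int(x) for x in ids.split(",")) for ids, name, c in rows}
counts = {name: c for ids, name, c in rows}
print(dupe_ids)
print(counts)
```

In MySQL, GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default). Raise that setting if a value has very many duplicates.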

To find which values in the name column of employee are duplicated, the following query is useful:

Select name from employee group by name having count(*)>1;

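Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

(With the ONLY_FULL_GROUP_BY SQL mode, enabled by default since MySQL 5.7.5, this query is rejected unless column_name1 and column_name2 are also listed in GROUP BY, wrapped in ANY_VALUE(), or functionally dependent on column_name.)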

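If you want to remove duplicate rows based on multiple fields, first concatenate those fields into a new key that identifies each distinct row, then use GROUP BY to drop the rows that share the same new key: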

Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;
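Here is a minimal sketch of the same dedup idea on SQLite through Python's sqlite3, standing in for MySQL. The table and rows are hypothetical. It deliberately swaps in a simpler technique: it groups on f1, f2 directly instead of on a concatenated key, and it keeps the row with the smallest id from each group. This avoids the temporary table and keeps rows such as ('ab', 'c') and ('a', 'bc') apart, which CONCAT(f1,f2) would merge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table: an id plus the two fields that define a duplicate.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "y"), (5, "b", "y")],
)

# Keep one row per distinct (f1, f2) pair: the one with the smallest id.
conn.execute(
    """
    CREATE TABLE unduptable AS
    SELECT MIN(id) AS id, f1, f2
    FROM mytable
    GROUP BY f1, f2
    """
)
undup = conn.execute("SELECT id, f1, f2 FROM unduptable ORDER BY id").fetchall()
print(undup)  # [(1, 'a', 'x'), (3, 'b', 'y'), (4, 'a', 'y')]
```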

CREATE TABLE tbl_master
    (`id` int, `email` varchar(15));

INSERT INTO tbl_master
    (`id`, `email`) VALUES
    (1, 'test1@gmail.com'),
    (2, 'test2@gmail.com'),
    (3, 'test1@gmail.com'),
    (4, 'test2@gmail.com'),
    (5, 'test5@gmail.com');

Query:

SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1);
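To check this answer end to end, the sketch below loads the answer's own tbl_master sample rows into SQLite, used as a stand-in for MySQL through Python's sqlite3, and runs the same IN query. ORDER BY id is added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_master (id INTEGER, email VARCHAR(15))")
conn.executemany(
    "INSERT INTO tbl_master (id, email) VALUES (?, ?)",
    [(1, "test1@gmail.com"), (2, "test2@gmail.com"), (3, "test1@gmail.com"),
     (4, "test2@gmail.com"), (5, "test5@gmail.com")],
)

# Every row whose email appears more than once in the table.
rows = conn.execute(
    """
    SELECT id, email FROM tbl_master
    WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1)
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)
```

Rows 1 to 4 come back. Row 5 (test5@gmail.com) is unique, so it is excluded.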

The answers and queries above work well if you only need to check whether the values of a single column, such as email, are duplicated.

But if you need to check several columns and look for duplicate combinations of them, this query works:

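SELECT COUNT(*) AS tot,
       name,
       email
FROM users
GROUP BY name, email
HAVING tot > 1;

This query lists each name/email combination that occurs more than once, along with its count. Grouping by the two columns directly, rather than by CONCAT(name,email), avoids false matches between values that concatenate to the same string. It also satisfies ONLY_FULL_GROUP_BY, because name and email appear in the GROUP BY.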
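This sketch shows why grouping on a concatenation is risky. It runs on SQLite through Python's sqlite3 as a stand-in for MySQL, uses hypothetical users rows, and uses SQLite's || operator in place of MySQL's CONCAT. The rows ('ab', 'c') and ('a', 'bc') are different, but both concatenate to 'abc'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("ab", "c"), ("a", "bc"), ("ann", "ann@x.com"), ("ann", "ann@x.com")],
)

# Grouping on the concatenation: the two different rows collide on 'abc'.
by_concat = conn.execute(
    """
    SELECT name || email AS k, COUNT(*)
    FROM users
    GROUP BY name || email
    HAVING COUNT(*) > 1
    ORDER BY k
    """
).fetchall()

# Grouping on the columns themselves: only the real duplicate remains.
by_columns = conn.execute(
    """
    SELECT name, email, COUNT(*) AS tot
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(by_concat)   # [('abc', 2), ('annann@x.com', 2)]
print(by_columns)  # [('ann', 'ann@x.com', 2)]
```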

The following query finds every product_id that is used more than once. You get exactly one record per product_id.

SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1

Code taken from: http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html


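A very late contribution, in case it helps anyone later. In a banking application, I had to find matching pairs of transactions (in fact both sides of account-to-account transfers) to identify the "from" and "to" of each inter-account transfer. We ended up with this: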

SELECT 
    LEAST(primaryid, secondaryid) AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
    SELECT table1.transactionid AS primaryid, 
        table2.transactionid AS secondaryid
    FROM financial_transactions table1
    INNER JOIN financial_transactions table2 
    ON table1.accountid = table2.accountid
    AND table1.transactionid <> table2.transactionid 
    AND table1.transactiondate = table2.transactiondate
    AND table1.sourceref = table2.destinationref
    AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;

The DuplicateResultsTable subquery returns rows containing matching (i.e. duplicate) transactions, but it returns each pair twice, the second time with the two transaction IDs reversed. The outer SELECT uses LEAST and GREATEST so that the two IDs always appear in the same order. That makes it safe to GROUP BY the first one, which eliminates the duplicate matches. It ran through nearly a million records and identified 12,000+ matches in just under 2 seconds. Of course, transactionid is the primary index, which really helped.
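Here is a minimal sketch of the LEAST/GREATEST normalization on SQLite through Python's sqlite3, standing in for MySQL. It relies on SQLite's multi-argument scalar min() and max(), which play the roles of LEAST and GREATEST. The schema is a simplified, hypothetical version that keeps only the date and amount conditions. The outer query uses SELECT DISTINCT instead of GROUP BY transactionid1, which gives the same result when each transaction matches at most one other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: only the columns needed for the pairing rule.
conn.execute(
    "CREATE TABLE financial_transactions "
    "(transactionid INTEGER PRIMARY KEY, transactiondate TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO financial_transactions VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (2, "2024-01-01", -100),
     (3, "2024-01-02", 50), (4, "2024-01-02", -50),
     (5, "2024-01-03", 70)],
)

# Inner self-join: each matching pair appears twice, once per direction.
inner_sql = """
    SELECT t1.transactionid AS primaryid, t2.transactionid AS secondaryid
    FROM financial_transactions t1
    JOIN financial_transactions t2
      ON t1.transactionid <> t2.transactionid
     AND t1.transactiondate = t2.transactiondate
     AND t1.amount = (0 - t2.amount)
"""
raw = conn.execute(inner_sql + " ORDER BY primaryid, secondaryid").fetchall()

# Outer query: min()/max() fix the order inside each pair; DISTINCT drops the mirror copy.
pairs = conn.execute(
    f"""
    SELECT DISTINCT
           min(primaryid, secondaryid) AS transactionid1,
           max(primaryid, secondaryid) AS transactionid2
    FROM ({inner_sql}) AS DuplicateResultsTable
    ORDER BY transactionid1
    """
).fetchall()
print(raw)    # [(1, 2), (2, 1), (3, 4), (4, 3)]
print(pairs)  # [(1, 2), (3, 4)]
```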


My final query combines several of the useful answers here: GROUP BY, COUNT and GROUP_CONCAT.

SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
FROM product_variant 
GROUP BY `magento_simple` HAVING c > 1;

This gives the ids of both examples (comma-separated), the barcode I needed, and the number of duplicates.

Change the table and column names accordingly.


Taking @maxyfc's answer a step further, I needed to find all the rows containing the returned duplicate values so that I could edit them in MySQL Workbench:

SELECT * FROM table
   WHERE field IN (
     SELECT field FROM table GROUP BY field HAVING count(*) > 1
   ) ORDER BY field

I don't see any JOIN approach here, and joins have many uses when dealing with duplicates.

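This approach gives you the actual duplicated rows. Note that each row is returned once for every other row that shares its name, so a value that appears three times produces six rows. Use SELECT DISTINCT t1.* if you want each row only once.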

SELECT t1.* FROM my_table as t1 
LEFT JOIN my_table as t2 
ON t1.name=t2.name and t1.id!=t2.id 
WHERE t2.id IS NOT NULL 
ORDER BY t1.name
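Here is a quick check of how many rows the JOIN query returns. It is a sketch on SQLite through Python's sqlite3, standing in for MySQL, with a hypothetical my_table in which 'a' appears three times. It runs the query as written and then with DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "b")],
)

join_sql = """
    FROM my_table AS t1
    LEFT JOIN my_table AS t2
      ON t1.name = t2.name AND t1.id != t2.id
    WHERE t2.id IS NOT NULL
    ORDER BY t1.name, t1.id
"""
# As written: every 'a' row is paired with the two other 'a' rows.
rows = conn.execute("SELECT t1.* " + join_sql).fetchall()
# With DISTINCT: each duplicated row appears once.
distinct_rows = conn.execute("SELECT DISTINCT t1.* " + join_sql).fetchall()
print(len(rows))       # 6
print(distinct_rows)   # [(1, 'a'), (2, 'a'), (3, 'a')]
```

Since the WHERE clause discards the unmatched rows, a plain INNER JOIN with the same ON condition returns the same result.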

I prefer using window functions (MySQL 8.0+) to find duplicates, because I can see the whole row:

WITH cte AS (
  SELECT *
    ,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
    ,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
  FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;
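Here is a minimal sketch of the window-function query on SQLite through Python's sqlite3, standing in for MySQL 8.0. This assumes the bundled SQLite is 3.25 or newer, the first version with window functions. The items table and its rows are hypothetical, and items replaces the placeholder name table, which is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the answer's placeholder "table".
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col_name TEXT, col_name2 INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 30), (2, "b", 10), (3, "a", 10), (4, "a", 20), (5, "c", 5)],
)

rows = conn.execute(
    """
    WITH cte AS (
      SELECT *,
             COUNT(*) OVER (PARTITION BY col_name) AS num_of_duplicates_group,
             ROW_NUMBER() OVER (PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
      FROM items
    )
    SELECT id, col_name, col_name2, num_of_duplicates_group, pos_in_group
    FROM cte
    WHERE num_of_duplicates_group > 1
    ORDER BY col_name, pos_in_group
    """
).fetchall()
for row in rows:
    print(row)
```

pos_in_group numbers the rows inside each duplicate group. Filtering on pos_in_group > 1 instead therefore selects the extra copies, which is handy if you later want to delete them.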



Try this query:

SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;

If you want to remove duplicates, use DISTINCT.

Otherwise, use this query:

SELECT users.*, COUNT(user_ID) AS user FROM users GROUP BY user_name HAVING user > 1;


To get all the rows that contain duplicated data, I used the following:

SELECT * FROM TableName INNER JOIN(
  SELECT DupliactedData FROM TableName GROUP BY DupliactedData HAVING COUNT(DupliactedData) > 1 order by DupliactedData)
  temp ON TableName.DupliactedData = temp.DupliactedData;

TableName = the table you are working with.

DupliactedData = the column containing the duplicated data you are looking for.


I improved on this:

SELECT 
    col, 
    COUNT(col)
FROM
    table_name
GROUP BY col
HAVING COUNT(col) > 1; 

As a variant of levik's answer that lets you find the ids of the duplicate results, I used the following:

SELECT * FROM table1 WHERE column1 IN (SELECT column1 AS duplicate_value FROM table1 GROUP BY column1 HAVING COUNT(*) > 1)

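Thanks to @novocaine for the great answer; his solution worked for me. I changed it slightly to include the percentage of recurring values, which I needed in my case. Here is the modified version. It rounds the percentage to two decimal places. Change the 2 to 0 to show no decimals, to 1 to show one decimal, and so on.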

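SELECT GROUP_CONCAT(id), name, COUNT(*) c,
-- Window functions are evaluated after GROUP BY and HAVING, so COUNT(*) OVER()
-- is the number of duplicate groups in this result, not the number of rows in the table.
COUNT(*) OVER() AS totalRecords,
CONCAT(FORMAT(COUNT(*)/COUNT(*) OVER()*100,2),'%') as recurringPercentage
FROM table
GROUP BY name
HAVING c > 1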
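If the goal is each name's share of all rows in the table, one option is to take the total from a scalar subquery instead of COUNT(*) OVER(). The sketch below runs on SQLite through Python's sqlite3, standing in for MySQL, with a hypothetical people table of 10 rows, and computes both values side by side. COUNT(*) OVER() evaluates to the number of groups that survive HAVING (3 here), while the subquery returns the table's row count (10). SQLite has no MySQL-style FORMAT, so ROUND is used for the two decimal places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
names = ["a", "a", "b", "c", "a", "b", "d", "d", "e", "f"]
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])

rows = conn.execute(
    """
    SELECT name,
           COUNT(*) AS c,
           COUNT(*) OVER () AS groups_in_result,         -- counts groups left after HAVING
           (SELECT COUNT(*) FROM people) AS total_rows,  -- counts every row in the table
           ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM people), 2) AS pct
    FROM people
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()
for row in rows:
    print(row)
```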