I have a PostgreSQL table called "user_links" that currently allows duplicates in the following fields:

year, user_id, sid, cid

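The only unique constraint at the moment is on the first field, "id". I now want to add a constraint ensuring that the combination of year, user_id, sid and cid is unique, but I cannot apply it because duplicate values that violate it already exist.

Is there a way to find all the duplicates?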


Current answer

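In your case, because of the constraint, you need to delete the duplicated records.

1. Find the duplicated rows.
2. Order them by their created_at date (in this example I kept the oldest one).
3. Delete the records, with USING to filter the right rows.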

-- Step 1: ids that occur more than once
WITH duplicated AS ( 
    SELECT id,
        count(*) 
    FROM products 
    GROUP BY id 
    HAVING count(*) > 1), 
-- Step 2: rank the rows of each duplicated id, oldest first
ordered AS ( 
    SELECT p.id, 
        p.created_at, 
        rank() OVER (PARTITION BY p.id ORDER BY p.created_at) AS rnk 
    FROM products p 
    JOIN duplicated d ON d.id = p.id ), 
-- Step 3: every row except the oldest one is marked for deletion
products_to_delete AS ( 
    SELECT id, 
        created_at 
    FROM   ordered 
    WHERE  rnk > 1
) 
DELETE 
FROM products 
USING products_to_delete 
WHERE products.id = products_to_delete.id 
    AND products.created_at = products_to_delete.created_at;
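
Applied to the table from the question, the same idea might look like the sketch below. It assumes that id is unique in user_links and that keeping the row with the smallest id in each group is acceptable (the question does not mention a created_at column). The constraint name user_links_year_user_id_sid_cid_key is made up for illustration.

```sql
BEGIN;

-- Number the rows within each (year, user_id, sid, cid) group, lowest id first.
WITH ranked AS (
    SELECT id,
           row_number() OVER (
               PARTITION BY year, user_id, sid, cid
               ORDER BY id
           ) AS rn
    FROM user_links
)
-- Delete every row except the first one of its group.
DELETE FROM user_links ul
USING ranked r
WHERE ul.id = r.id
  AND r.rn > 1;

-- With the duplicates gone, the constraint can be added.
-- The constraint name is hypothetical.
ALTER TABLE user_links
    ADD CONSTRAINT user_links_year_user_id_sid_cid_key
    UNIQUE (year, user_id, sid, cid);

COMMIT;
```

Running this inside a transaction means you can issue ROLLBACK instead of COMMIT if the reported number of deleted rows is not what you expected.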

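Other answers

For simplicity, I assume you want to apply the unique constraint only to the column year, and that the primary key is a column named id.

To find the duplicate values, run: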

SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);

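The SQL statement above returns a table with all the duplicated years in your table. To delete every duplicate entry except the latest one, use the following SQL statement:

-- Self-join: B is the same table as A
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;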
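
Before running a delete like this, it can help to preview which rows it would remove. A sketch for the four columns from the question, assuming id is unique in user_links:

```sql
-- Rows that have a duplicate with a higher id, i.e. the rows a
-- self-join DELETE comparing all four columns would remove.
SELECT a.*
FROM user_links a
WHERE EXISTS (
    SELECT 1
    FROM user_links b
    WHERE a.year = b.year
      AND a.user_id = b.user_id
      AND a.sid = b.sid
      AND a.cid = b.cid
      AND a.id < b.id
)
ORDER BY a.year, a.user_id, a.sid, a.cid, a.id;
```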

The following SQL syntax gives better performance when checking for duplicate rows:

SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1

From "Find duplicate rows with PostgreSQL", here is a clever solution:

select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
  FROM tbl
) dups
where 
dups.Row > 1
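
The query above returns only the second and later rows of each group. If you would rather see every member of a duplicated group, including the first, a window count can be used instead of ROW_NUMBER(). This is a variation on the technique, not the linked solution itself. It is sketched here for the user_links table from the question, and the names group_size and grouped are placeholders.

```sql
-- All rows whose (year, user_id, sid, cid) combination occurs more than once.
SELECT *
FROM (
    SELECT ul.*,
           count(*) OVER (PARTITION BY year, user_id, sid, cid) AS group_size
    FROM user_links ul
) grouped
WHERE grouped.group_size > 1
ORDER BY year, user_id, sid, cid, id;
```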

The basic idea is to use a nested query with a count aggregate:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.


Another good solution was mentioned in the comments (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
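
For the table in the question, the whole-row variant would probably find nothing, because each row has its own id and so no two complete rows are equal. The column-list variant fits better. Here is a sketch for user_links that also collects the ids of each group. The aliases copies and ids are placeholders.

```sql
-- One row per duplicated combination, with its size and the ids it contains.
SELECT year, user_id, sid, cid,
       count(*) AS copies,
       array_agg(id ORDER BY id) AS ids
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING count(*) > 1
ORDER BY copies DESC;
```

Note that GROUP BY puts NULL values into the same group, whereas a unique constraint by default treats NULLs as distinct, so a group that only matches on NULL columns would not by itself block the constraint.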