How to find duplicate records in PostgreSQL

I have a PostgreSQL table called "user_links" that currently allows duplicates across the following fields:

year, user_id, sid, cid

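The only unique constraint at the moment is on the first field, "id". I now want to add a constraint ensuring that the combination of year, user_id, sid and cid is unique, but I can't apply it because rows that violate it already exist.

Is there a way to find all the duplicates?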

Current answer

Inspired by Sandro Wiggers' answer, I did something similar:

WITH ordered AS ( 
  SELECT id,year, user_id, sid, cid,
    rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk 
  FROM user_links 
), 
to_delete AS ( 
  SELECT id
  FROM   ordered 
  WHERE  rnk > 1
) 
DELETE 
FROM user_links
USING to_delete 
WHERE user_links.id = to_delete.id;

If you want to test it first, change it slightly:

WITH ordered AS ( 
  SELECT id,year, user_id, sid, cid,
    rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk 
  FROM user_links 
), 
to_delete AS ( 
  SELECT id,year,user_id,sid, cid
  FROM   ordered 
  WHERE  rnk > 1
) 
SELECT * FROM to_delete;

This gives an overview of what will be deleted. (Keeping year, user_id, sid, cid in the to_delete query does no harm when running the delete, but they aren't needed there.)
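Once the duplicates are gone, the constraint from the question can be added. The following is a minimal sketch, not the answerer's code: it uses a temporary table named user_links (which shadows any real table of that name for the session) with made-up integer rows, and the constraint name user_links_year_user_id_sid_cid_key is hypothetical. It uses row_number(), which gives the same result as rank() here because id is unique within each partition. One caveat: PARTITION BY puts NULLs in the same partition, while a default UNIQUE constraint treats NULLs as distinct (PostgreSQL 15 added UNIQUE NULLS NOT DISTINCT), so this delete can remove rows containing NULLs that the constraint would have accepted.

```sql
-- Hypothetical sample data in a temporary table (shadows a real user_links).
CREATE TEMP TABLE user_links (
    id      serial PRIMARY KEY,
    year    int,
    user_id int,
    sid     int,
    cid     int
);

INSERT INTO user_links (year, user_id, sid, cid) VALUES
    (2021, 1, 10, 100),
    (2021, 1, 10, 100),  -- duplicate of the previous row
    (2021, 1, 10, 101);

-- Delete every row except the one with the lowest id in its group.
DELETE FROM user_links u
USING (
    SELECT id,
           row_number() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rn
    FROM user_links
) d
WHERE u.id = d.id
  AND d.rn > 1;

-- No duplicates remain, so the unique constraint can now be created.
ALTER TABLE user_links
    ADD CONSTRAINT user_links_year_user_id_sid_cid_key
    UNIQUE (year, user_id, sid, cid);
```

With this data, the row with id 2 is deleted and ids 1 and 3 remain.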

2022-01-17 18:48:48

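Other answers

The following SQL, a single GROUP BY ... HAVING pass, can give better performance when checking for duplicate rows. Substitute the column(s) that must be unique for id: grouping by a primary key never finds duplicates.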

SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1
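Applied to the question's table, the same pattern groups by the four columns that should be unique rather than by the primary key. A sketch on made-up rows in a temporary table; array_agg(id ORDER BY id) is an addition that lists which ids belong to each duplicate group, which is handy when deciding which rows to keep.

```sql
-- Hypothetical sample data in a temporary table.
CREATE TEMP TABLE user_links (
    id      serial PRIMARY KEY,
    year    int,
    user_id int,
    sid     int,
    cid     int
);

INSERT INTO user_links (year, user_id, sid, cid) VALUES
    (2021, 1, 10, 100),
    (2021, 1, 10, 100),
    (2021, 1, 10, 100),
    (2022, 2, 20, 200);

-- One output row per duplicated (year, user_id, sid, cid) combination.
SELECT year, user_id, sid, cid,
       count(*)                  AS copies,
       array_agg(id ORDER BY id) AS ids
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING count(*) > 1;
```

With this data the query returns a single row: 2021 | 1 | 10 | 100 | 3 | {1,2,3}.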

2023-01-23 04:21:26

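For simplicity, I'll assume you want the unique constraint only on the column year, and that the primary key is a column named id.

To find the duplicate values, run: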

SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);

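The query above returns every year that appears more than once in the table. To delete all duplicates except the latest entry (the one with the highest id), use the following SQL: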

DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;
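The same self-join extends to the question's four columns. A sketch on made-up rows in a temporary table; like the answer, it keeps the row with the highest id in each group. Because = never evaluates to true when either side is NULL, rows with a NULL in any of the four columns are never deleted, which matches a default UNIQUE constraint (it does not treat NULLs as equal either).

```sql
-- Hypothetical sample data in a temporary table.
CREATE TEMP TABLE user_links (
    id      serial PRIMARY KEY,
    year    int,
    user_id int,
    sid     int,
    cid     int
);

INSERT INTO user_links (year, user_id, sid, cid) VALUES
    (2021, 1,    10, 100),
    (2021, 1,    10, 100),
    (2021, NULL, 10, 100),
    (2021, NULL, 10, 100);

-- Delete a row whenever another row with the same four values has a larger id.
DELETE FROM user_links a
USING user_links b
WHERE a.year    = b.year
  AND a.user_id = b.user_id
  AND a.sid     = b.sid
  AND a.cid     = b.cid
  AND a.id      < b.id;
```

Here only id 1 is deleted; both NULL rows survive. Replacing = with IS NOT DISTINCT FROM would make NULLs count as equal.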

2019-12-02 16:02:44

The basic idea is to use a nested query with a count aggregate:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the WHERE clause in the inner query to narrow the search.

Another good solution was mentioned in the comments (not everyone reads them):
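For the question's table, the inner query has to compare all four columns. The sketch below is a variant of the same idea that swaps the count subquery for EXISTS, so the search for each row can stop at the first match; the rows are made up and live in a temporary table.

```sql
-- Hypothetical sample data in a temporary table.
CREATE TEMP TABLE user_links (
    id      serial PRIMARY KEY,
    year    int,
    user_id int,
    sid     int,
    cid     int
);

INSERT INTO user_links (year, user_id, sid, cid) VALUES
    (2021, 1, 10, 100),
    (2021, 1, 10, 100),
    (2021, 1, 10, 101);

-- Return every row that has at least one other row with the same four values.
SELECT o.*
FROM user_links o
WHERE EXISTS (
    SELECT 1
    FROM user_links i
    WHERE i.year    = o.year
      AND i.user_id = o.user_id
      AND i.sid     = o.sid
      AND i.cid     = o.cid
      AND i.id     <> o.id
)
ORDER BY o.id;
```

With this data the query returns the rows with ids 1 and 2.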

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or, shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
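Note that grouping by yourTable.* compares whole rows, including a unique id, so on the question's table it would find nothing. A sketch of the same short form restricted to the four relevant columns with a ROW(...) constructor cast to text; the data is made up and lives in a temporary table.

```sql
-- Hypothetical sample data in a temporary table.
CREATE TEMP TABLE user_links (
    id      serial PRIMARY KEY,
    year    int,
    user_id int,
    sid     int,
    cid     int
);

INSERT INTO user_links (year, user_id, sid, cid) VALUES
    (2021, 1, 10, 100),
    (2021, 1, 10, 100),
    (2022, 2, 20, 200);

-- Build a text key from only the columns that should be unique, leaving out id.
SELECT ROW(year, user_id, sid, cid)::text AS dup_key, count(*)
FROM user_links
GROUP BY 1
HAVING count(*) > 1;
```

With this data the query returns one row, (2021,1,10,100) with a count of 2.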

2015-01-26 19:16:02

begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into  user_links(year, user_id, sid, cid) values (null,null,null,null),
 (null,null,null,null), (null,null,null,null),
 (1,2,3,4), (1,2,3,4),
 (1,2,3,4),(1,1,3,8),
 (1,1,3,9),
 (1,null,null,null),(1,null,null,null);
commit;

Using DISTINCT ON and the EXCEPT set operation:

(select id, year, user_id, sid, cid from user_links)
except
(select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
 from user_links order by year, user_id, sid, cid, id)
order by 1;

EXCEPT ALL works as well, because the serial id makes every row unique.

(select id, year, user_id, sid, cid from user_links)
except all
(select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
 from user_links order by year, user_id, sid, cid, id)
order by 1;

So far this works for both NULL and non-NULL values. To delete:

with a as (
  (select id, year, user_id, sid, cid from user_links)
  except all
  (select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
   from user_links order by year, user_id, sid, cid, id)
)
delete from user_links using a where user_links.id = a.id returning *;

2022-04-04 06:49:16

Here's a clever solution from "Find duplicate rows with PostgreSQL":

select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
  FROM tbl
) dups
where 
dups.Row > 1

2015-08-28 07:26:25
