I have a PostgreSQL database table called "user_links" that currently allows the following fields to be duplicated:

year, user_id, sid, cid

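The only unique constraint at the moment is on the first field, called "id". I now want to add a constraint ensuring that year, user_id, sid and cid are unique together, but I cannot apply it because duplicate values that violate the constraint already exist.

Is there a way to find all the duplicates?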


Current answer

The basic idea is to use a nested query with a count aggregate:

select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1

You can adjust the where clause in the inner query to narrow the search.
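For the table in the question, the inner where clause could compare all four columns instead of just sid. This is only a sketch: it assumes the table and column names from the question, and it uses IS NOT DISTINCT FROM so that two NULLs count as equal (switch to plain = if rows containing NULLs should not be reported).

```sql
-- Sketch: every row of user_links that shares (year, user_id, sid, cid)
-- with at least one other row.
-- IS NOT DISTINCT FROM is a NULL-safe equality test: NULL matches NULL.
SELECT ou.*
FROM user_links ou
WHERE (SELECT count(*)
       FROM user_links inr
       WHERE inr.year    IS NOT DISTINCT FROM ou.year
         AND inr.user_id IS NOT DISTINCT FROM ou.user_id
         AND inr.sid     IS NOT DISTINCT FROM ou.sid
         AND inr.cid     IS NOT DISTINCT FROM ou.cid) > 1
ORDER BY ou.year, ou.user_id, ou.sid, ou.cid, ou.id;
```

The correlated subquery runs once per outer row, so on a large table this form is likely to be slower than a GROUP BY over the same columns.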


Another good solution was mentioned in the comments (but not everyone reads them):

select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1

Or shorter:

SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
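Applied to the question's table, the GROUP BY form might look like the sketch below. The column names come from the question; the aliases copies and ids are made up for illustration. Note that the whole-row variant cannot find these duplicates in user_links, because the id column makes every whole row distinct.

```sql
-- Sketch: one output row per duplicated (year, user_id, sid, cid) combination,
-- with the number of copies and the ids that carry it.
-- GROUP BY puts NULLs into the same group, so all-NULL rows are reported too.
SELECT year, user_id, sid, cid,
       count(*)                  AS copies,
       array_agg(id ORDER BY id) AS ids
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING count(*) > 1;
```

Collecting the ids with array_agg makes it easier to decide afterwards which row of each group to keep.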

Other answers

The following SQL syntax may perform better when checking for duplicate rows.

SELECT id, count(id)
FROM table1
GROUP BY id
HAVING count(id) > 1

From "Find duplicate rows with PostgreSQL", here is a clever solution:

select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
  FROM tbl
) dups
where 
dups.Row > 1
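Adapted to the question's table, the same window-function approach could be sketched as follows. It assumes the user_links columns from the question; the alias rn is just an illustrative name.

```sql
-- Sketch: number the rows inside each (year, user_id, sid, cid) group by id,
-- then keep only rows numbered 2 and up, i.e. the "extra" copies.
-- PARTITION BY groups NULLs together, like GROUP BY does.
SELECT id, year, user_id, sid, cid
FROM (
  SELECT ul.*,
         row_number() OVER (PARTITION BY year, user_id, sid, cid
                            ORDER BY id) AS rn
  FROM user_links ul
) ranked
WHERE rn > 1
ORDER BY id;
```

The row with the lowest id in each group gets rn = 1 and is left out, so the result lists exactly the rows that would have to go before a unique constraint can be added.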

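You can join the table to itself on the fields that would be duplicated, and then anti-join on the id field. Select the id field from the first table alias (tn1), then use the array_agg function on the id field of the second table alias. Finally, for array_agg to work properly, group the results by the tn1.id field. This produces a result set containing the id of each record together with an array of all the ids that match the join conditions.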

select tn1.id,
       array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on 
    tn1.year = tn2.year 
    and tn1.sid = tn2.sid 
    and tn1.user_id = tn2.user_id 
    and tn1.cid = tn2.cid
    and tn1.id <> tn2.id
group by tn1.id;

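Obviously, the ids in the duplicate_entries array of one id will also have their own entries in the result set. You will have to use this result set to decide which id should become the source of "truth", the one record that should not be deleted. Maybe you could do something like this: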

with dupe_set as (
select tn1.id,
       array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on 
    tn1.year = tn2.year 
    and tn1.sid = tn2.sid 
    and tn1.user_id = tn2.user_id 
    and tn1.cid = tn2.cid
    and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists 
 (select de from unnest(ds.duplicate_entries) as de where de < ds.id)

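This selects the lowest-numbered id in each set of duplicates (assuming the id is an incrementing integer PK). These are the ids you would keep.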
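Once you know which ids to keep, the remaining rows can be removed and the constraint added. The following is a rough sketch, not something taken from the answer above: it assumes the same placeholder table_name, keeps the lowest id of each group, and uses a made-up constraint name.

```sql
BEGIN;

-- Delete every row for which a row with the same four values
-- and a smaller id exists; the lowest id of each group survives.
-- Plain = never matches NULLs, so rows with NULLs are left untouched.
DELETE FROM table_name d
USING table_name k
WHERE d.year    = k.year
  AND d.user_id = k.user_id
  AND d.sid     = k.sid
  AND d.cid     = k.cid
  AND d.id      > k.id;

-- Hypothetical constraint name; choose whatever fits your naming scheme.
ALTER TABLE table_name
  ADD CONSTRAINT table_name_year_user_id_sid_cid_key
  UNIQUE (year, user_id, sid, cid);

COMMIT;
```

Running both statements in one transaction means that if the ALTER TABLE still fails, the delete is rolled back as well.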

begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into  user_links(year, user_id, sid, cid) values (null,null,null,null),
 (null,null,null,null), (null,null,null,null),
 (1,2,3,4), (1,2,3,4),
 (1,2,3,4),(1,1,3,8),
 (1,1,3,9),
 (1,null,null,null),(1,null,null,null);
commit;

Use DISTINCT ON together with the EXCEPT set operation.

(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid 
from user_links order by 1;
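One thing to be aware of: a trailing ORDER BY that is not inside parentheses applies to the result of the whole EXCEPT, not to the DISTINCT ON branch, so which row DISTINCT ON keeps from each group is not guaranteed. A sketch that makes the choice explicit (keeping the lowest id) could look like this; the subquery alias keepers is only an illustrative name.

```sql
-- Sketch: all rows of user_links minus one explicitly chosen "keeper" per group.
SELECT id, year, user_id, sid, cid
FROM user_links
EXCEPT
SELECT id, year, user_id, sid, cid
FROM (
  -- DISTINCT ON keeps the first row of each group in ORDER BY order,
  -- here the row with the smallest id.
  SELECT DISTINCT ON (year, user_id, sid, cid)
         id, year, user_id, sid, cid
  FROM user_links
  ORDER BY year, user_id, sid, cid, id
) keepers
ORDER BY 1;
```

With the sample rows inserted above (assuming the serial ids run from 1 to 10), this should return ids 2, 3, 5, 6 and 10.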

EXCEPT ALL works as well, because the serial id makes every row unique.

(select id, year, user_id, sid, cid from user_links order by 1) 
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid  from user_links order by 1;

So far this works for both null and non-null values. To delete:

with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid  from user_links order by 1)
delete from user_links using a  where user_links.id = a.id returning *;
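After the duplicates are gone, the constraint from the question can be added. A minimal sketch, with a made-up constraint name; keep in mind that by default a unique constraint treats NULLs as distinct from each other, so several rows with NULL in these columns would still be accepted.

```sql
-- Default behavior: NULLs are considered distinct, so repeated
-- (NULL, NULL, NULL, NULL) rows do not violate the constraint.
ALTER TABLE user_links
  ADD CONSTRAINT user_links_year_user_id_sid_cid_key
  UNIQUE (year, user_id, sid, cid);

-- Alternative for PostgreSQL 15 and later only: treat NULLs as equal,
-- so rows that differ only by being NULL also count as duplicates.
-- ALTER TABLE user_links
--   ADD CONSTRAINT user_links_year_user_id_sid_cid_key
--   UNIQUE NULLS NOT DISTINCT (year, user_id, sid, cid);
```

Which variant fits depends on whether rows with NULLs should count as duplicates in your data.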