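I have a PostgreSQL database table called "user_links" which currently allows duplicates in the following fields:
year, user_id, sid, cid
The only unique constraint at the moment is on the first field, called "id". I now want to add a constraint that ensures the combination of year, user_id, sid and cid is unique, but I cannot apply it because duplicate values that violate the constraint already exist.
Is there a way to find all the duplicates?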
Current answer
From "Find duplicate rows with PostgreSQL", here is a clever solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
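Applied to the table in the question, a grouped query is another way to see the duplicates. This is only a sketch: it assumes the columns are exactly id, year, user_id, sid and cid as described, and the output names copies and ids are made up for illustration. GROUP BY puts NULLs in the same group, so rows that are NULL in the same columns are reported as duplicates of each other.

```sql
-- List each duplicated (year, user_id, sid, cid) combination once,
-- with the number of rows sharing it and the ids of those rows.
SELECT year, user_id, sid, cid,
       count(*)                  AS copies,  -- hypothetical output name
       array_agg(id ORDER BY id) AS ids      -- hypothetical output name
FROM user_links
GROUP BY year, user_id, sid, cid
HAVING count(*) > 1
ORDER BY copies DESC;
```

The window-function query lists every extra row individually, while this one gives a single line per duplicated combination, which is easier to scan before deciding which rows to keep.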
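Other answers
In your case, because of the constraint, you need to delete the duplicated records.
The steps are: find the duplicated rows; order them by created_at (in this example I keep the oldest one); delete the records with USING to filter the right rows.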
WITH duplicated AS (
SELECT id,
count(*)
FROM products
GROUP BY id
HAVING count(*) > 1),
ordered AS (
SELECT p.id,
created_at,
rank() OVER (partition BY p.id ORDER BY p.created_at) AS rnk
FROM products p
JOIN duplicated d ON d.id = p.id ),
products_to_delete AS (
SELECT id,
created_at
FROM ordered
WHERE rnk > 1
)
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
AND products.created_at = products_to_delete.created_at;
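For simplicity, I assume that you want to apply the unique constraint only to the column year, and that the primary key is a column named id.
To find the duplicated values, run: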
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
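The SQL statement above returns a table containing all the duplicated years in your table. To delete every duplicate entry except the latest one, use the SQL statement below.
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;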
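The same self-join can be extended to the four columns from the question. The following is a rough sketch, assuming the table is user_links and that the row with the highest id in each group is the one to keep. It compares with IS NOT DISTINCT FROM instead of =, because = never matches two NULLs and would leave NULL duplicates behind.

```sql
-- Delete every row for which another row with the same
-- (year, user_id, sid, cid) and a higher id exists.
-- IS NOT DISTINCT FROM treats two NULLs as equal.
DELETE FROM user_links a
USING user_links b
WHERE a.year    IS NOT DISTINCT FROM b.year
  AND a.user_id IS NOT DISTINCT FROM b.user_id
  AND a.sid     IS NOT DISTINCT FROM b.sid
  AND a.cid     IS NOT DISTINCT FROM b.cid
  AND a.id < b.id;
```

To keep the oldest row instead, flip the last condition to a.id > b.id.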
Inspired by Sandro Wiggers, I did something similar:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id
FROM ordered
WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;
If you want to test it first, change it slightly:
WITH ordered AS (
SELECT id,year, user_id, sid, cid,
rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
FROM user_links
),
to_delete AS (
SELECT id,year,user_id,sid, cid
FROM ordered
WHERE rnk > 1
)
SELECT * FROM to_delete;
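This gives an overview of what is going to be deleted (keeping year, user_id, sid, cid in the to_delete query when running the delete does no harm, but they are not needed there).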
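Once the duplicates are gone, the constraint the question asks for can be added. A minimal sketch, where the constraint name is invented for illustration:

```sql
-- Fails with an error if any duplicate
-- (year, user_id, sid, cid) combination is still present.
ALTER TABLE user_links
    ADD CONSTRAINT user_links_year_user_id_sid_cid_key  -- hypothetical name
    UNIQUE (year, user_id, sid, cid);
```

Keep in mind that a plain UNIQUE constraint treats NULLs as distinct, so rows that contain a NULL in any of the four columns can still repeat. On PostgreSQL 15 or later, writing UNIQUE NULLS NOT DISTINCT (year, user_id, sid, cid) changes that behaviour.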
begin;
create table user_links(id serial,year bigint, user_id bigint, sid bigint, cid bigint);
insert into user_links(year, user_id, sid, cid) values (null,null,null,null),
(null,null,null,null), (null,null,null,null),
(1,2,3,4), (1,2,3,4),
(1,2,3,4),(1,1,3,8),
(1,1,3,9),
(1,null,null,null),(1,null,null,null);
commit;
Use DISTINCT ON and the EXCEPT set operation.
(select id, year, user_id, sid, cid from user_links order by 1)
except
select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
from user_links order by 1;
EXCEPT ALL works as well, because id serial makes every row unique.
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1;
So far this works for both null and non-null values. To delete:
with a as(
(select id, year, user_id, sid, cid from user_links order by 1)
except all
select distinct on (year, user_id, sid, cid)
id, year, user_id, sid, cid from user_links order by 1)
delete from user_links using a where user_links.id = a.id returning *;
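As a quick sanity check after the delete, something like the query below can confirm that nothing is left over. It is a sketch only, and the alias names are arbitrary.

```sql
-- Count the (year, user_id, sid, cid) combinations that still
-- occur more than once; after a successful cleanup this is 0.
SELECT count(*) AS remaining_duplicate_groups  -- hypothetical output name
FROM (
    SELECT 1
    FROM user_links
    GROUP BY year, user_id, sid, cid
    HAVING count(*) > 1
) AS d;
```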