我有以下UPSERT在PostgreSQL 9.5:

INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;

如果没有冲突,则返回如下内容:

----------
    | id |
----------
  1 | 50 |
----------
  2 | 51 |
----------

但是如果有冲突,它不会返回任何行:

----------
    | id |
----------

如果没有冲突,我想返回新的id列,或者返回冲突列的现有id列。 这能做到吗?如果有,怎么做?


当前回答

根据Erwin上面的回答(顺便说一句,这是一个很棒的答案,如果没有它,我永远不会走到这里!),这就是我最终的结果。它解决了两个额外的潜在问题——通过对输入集执行不同的选择,它允许重复(否则会抛出错误),并且它确保返回的id与输入集完全匹配,包括相同的顺序和允许重复。

Additionally, and one part that was important for me, it significantly reduces the number of unnecessary sequence advancements using the new_rows CTE to only try inserting the ones that aren't already in there. Considering the possibility of concurrent writes, it will still hit some conflicts in that reduced set, but the later steps will take care of that. In most cases, sequence gaps aren't a big deal, but when you're doing billions of upserts, with a high percentage of conflicts, it can make the difference between using an int or a bigint for the ID.

尽管它又大又丑,但性能非常好。我用数百万个upserts、高并发性和大量冲突对它进行了广泛测试。坚如磐石。

我将其打包为一个函数,但如果这不是您想要的,那么应该很容易看到如何将其转换为纯SQL。我还将示例数据更改为简单的内容。

CREATE TABLE foo
(
  bar varchar PRIMARY KEY,
  id  serial
);
CREATE TYPE ids_type AS (id integer);
CREATE TYPE bars_type AS (bar varchar);

CREATE OR REPLACE FUNCTION upsert_foobars(_vals bars_type[])
  RETURNS SETOF ids_type AS
$$
BEGIN
  RETURN QUERY
    WITH
      all_rows AS (
        SELECT bar, ordinality
        FROM UNNEST(_vals) WITH ORDINALITY
      ),
      dist_rows AS (
        SELECT DISTINCT bar
        FROM all_rows
      ),
      new_rows AS (
        SELECT d.bar
        FROM dist_rows d
             LEFT JOIN foo f USING (bar)
        WHERE f.bar IS NULL
      ),
      ins AS (
        INSERT INTO foo (bar)
          SELECT bar
          FROM new_rows
          ORDER BY bar
          ON CONFLICT DO NOTHING
          RETURNING bar, id
      ),
      sel AS (
        SELECT bar, id
        FROM ins
        UNION ALL
        SELECT f.bar, f.id
        FROM dist_rows
             JOIN foo f USING (bar)
      ),
      ups AS (
        INSERT INTO foo AS f (bar)
          SELECT d.bar
          FROM dist_rows d
               LEFT JOIN sel s USING (bar)
          WHERE s.bar IS NULL
          ORDER BY bar
          ON CONFLICT ON CONSTRAINT foo_pkey DO UPDATE
            SET bar = f.bar
          RETURNING bar, id
      ),
      fin AS (
        SELECT bar, id
        FROM sel
        UNION ALL
        TABLE ups
      )
    SELECT f.id
    FROM all_rows a
         JOIN fin f USING (bar)
    ORDER BY a.ordinality;
END
$$ LANGUAGE plpgsql;

其他回答

Upsert是INSERT查询的扩展,在约束冲突的情况下可以用两种不同的行为定义:DO NOTHING或DO UPDATE。

INSERT INTO upsert_table VALUES (2, 6, 'upserted')
   ON CONFLICT DO NOTHING RETURNING *;

 id | sub_id | status
----+--------+--------
 (0 rows)

还要注意,RETURNING没有返回任何东西,因为没有插入任何元组。现在使用DO UPDATE,可以在与之冲突的元组上执行操作。首先请注意,定义一个约束是很重要的,它将用于定义存在冲突。

INSERT INTO upsert_table VALUES (2, 2, 'inserted')
   ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
   DO UPDATE SET status = 'upserted' RETURNING *;

 id | sub_id |  status
----+--------+----------
  2 |      2 | upserted
(1 row)

如果您只想插入一行

然后你可以使用一个简单的EXISTS检查来简化事情:

WITH
  extant AS (
    SELECT id FROM chats WHERE ("user", "contact") = ($1, $2)
  ),
  inserted AS (
    INSERT INTO chats ("user", "contact", "name")
    SELECT $1, $2, $3
    WHERE NOT EXISTS (SELECT NULL FROM extant)
    RETURNING id
  )
SELECT id FROM inserted
UNION ALL
SELECT id FROM extant

因为没有ON CONFLICT子句,所以没有更新——只有一个插入,而且只有在必要时才有。所以没有不必要的更新,没有不必要的写锁,没有不必要的序列增量。也不需要强制转换。

如果写锁是用例中的一个特性,则可以在现有表达式中使用SELECT FOR UPDATE。

如果你需要知道是否插入了新行,你可以在顶级的UNION中添加一个标志列:

SELECT id, TRUE AS inserted FROM inserted
UNION ALL
SELECT id, FALSE FROM extant

我也遇到了同样的问题,我使用“做更新”而不是“什么都不做”来解决它,尽管我没有什么可更新的。在你的情况下,它会是这样的:

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") 
DO UPDATE SET 
    name=EXCLUDED.name 
RETURNING id;

这个查询将返回所有的行,无论它们是刚刚插入的还是之前已经存在的。

最简单、最高效的解决方案是

BEGIN;

INSERT INTO chats ("user", contact, name) 
    VALUES ($1, $2, $3), ($2, $1, NULL) 
ON CONFLICT ("user", contact) DO UPDATE
  SET name = excluded.name
  WHERE false
RETURNING id;

SELECT id
FROM chats
WHERE (user, contact) IN (($1, $2), ($2, $1));

COMMIT;

DO UPDATE WHERE false锁定但不更新行,这是一个特性,而不是一个错误,因为它确保了另一个事务不能删除行。

有些注释想要区分更新的行和创建的行。

在这种情况下,只需将创建的txid_current() = xmin AS添加到select。

对于单个项的插入,我可能会在返回id时使用一个coalesce:

WITH new_chats AS (
    INSERT INTO chats ("user", "contact", "name")
    VALUES ($1, $2, $3)
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
) SELECT COALESCE(
    (SELECT id FROM new_chats),
    (SELECT id FROM chats WHERE user = $1 AND contact = $2)
);

对于多个项的插入,你可以把这些值放在一个临时WITH中,并在以后引用它们:

WITH chats_values("user", "contact", "name") AS (
    VALUES ($1, $2, $3),
           ($4, $5, $6)
), new_chats AS (
    INSERT INTO chats ("user", "contact", "name")
    SELECT * FROM chat_values
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
) SELECT id
    FROM new_chats
   UNION
  SELECT chats.id
    FROM chats, chats_values
   WHERE chats.user = chats_values.user
     AND chats.contact = chats_values.contact;

注意:根据Erwin的评论,在您的应用程序将尝试'upsert'相同的数据并发(两个worker尝试同时插入<unique_field> = 1),并且这样的数据还不存在于表中,您应该在运行'upsert'之前更改事务的隔离级别:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

在这种特定情况下,两个事务中的一个将被中止。如果这种情况在您的应用程序中经常发生,您可能希望只执行2个单独的查询,否则,处理错误并重新执行查询会更容易、更快。