我有以下UPSERT在PostgreSQL 9.5:

INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;

如果没有冲突,则返回如下内容:

----------
    | id |
----------
  1 | 50 |
----------
  2 | 51 |
----------

但是如果有冲突,它不会返回任何行:

----------
    | id |
----------

如果没有冲突,我想返回新的id列,或者返回冲突列的现有id列。 这能做到吗?如果有,怎么做?


当前回答

如果您只想插入一行

然后你可以使用一个简单的EXISTS检查来简化事情:

WITH
  extant AS (
    SELECT id FROM chats WHERE ("user", "contact") = ($1, $2)
  ),
  inserted AS (
    INSERT INTO chats ("user", "contact", "name")
    SELECT $1, $2, $3
    WHERE NOT EXISTS (SELECT NULL FROM extant)
    RETURNING id
  )
SELECT id FROM inserted
UNION ALL
SELECT id FROM extant

因为没有ON CONFLICT子句,所以没有更新——只有一个插入,而且只有在必要时才有。所以没有不必要的更新,没有不必要的写锁,没有不必要的序列增量。也不需要强制转换。

如果写锁是用例中的一个特性,则可以在现有表达式中使用SELECT FOR UPDATE。

如果你需要知道是否插入了新行,你可以在顶级的UNION中添加一个标志列:

SELECT id, TRUE AS inserted FROM inserted
UNION ALL
SELECT id, FALSE FROM extant

其他回答

Upsert是INSERT查询的扩展,在约束冲突的情况下可以用两种不同的行为定义:DO NOTHING或DO UPDATE。

INSERT INTO upsert_table VALUES (2, 6, 'upserted')
   ON CONFLICT DO NOTHING RETURNING *;

 id | sub_id | status
----+--------+--------
 (0 rows)

还要注意,RETURNING没有返回任何东西,因为没有插入任何元组。现在使用DO UPDATE,可以在与之冲突的元组上执行操作。首先请注意,定义一个约束是很重要的,它将用于定义存在冲突。

INSERT INTO upsert_table VALUES (2, 2, 'inserted')
   ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
   DO UPDATE SET status = 'upserted' RETURNING *;

 id | sub_id |  status
----+--------+----------
  2 |      2 | upserted
(1 row)
WITH e AS(
    INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
)
SELECT * FROM e
UNION
    SELECT id FROM chats WHERE user=$1, contact=$2;

使用ON冲突DO NOTHING的主要目的是避免抛出错误,但它不会导致行返回。因此,我们需要另一个SELECT来获取现有的id。

在这个SQL中,如果它在冲突上失败,它将不返回任何东西,然后第二个SELECT将获得现有的行;如果插入成功,那么将会有两条相同的记录,然后我们需要UNION来合并结果。

我修改了Erwin Brandstetter的惊人答案,它不会增加序列,也不会写锁任何行。我对PostgreSQL比较陌生,所以如果你看到这个方法的任何缺点,请随时告诉我:

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1')  -- type casts in first row
    , ('foo2', 'bar2', 'bob2')
    -- more?
   )
, new_rows AS (
   SELECT 
     c.usr
     , c.contact
     , c.name
     , r.id IS NOT NULL as row_exists
   FROM input_rows AS r
   LEFT JOIN chats AS c ON r.usr=c.usr AND r.contact=c.contact
   )
INSERT INTO chats (usr, contact, name)
SELECT usr, contact, name
FROM new_rows
WHERE NOT row_exists
RETURNING id, usr, contact, name

这假设表聊天对列(usr, contact)有唯一的约束。

更新:添加了来自spatar的建议修订(如下)。谢谢!

根据Revinand的评论,还有一个更新:

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1')  -- type casts in first row
    , ('foo2', 'bar2', 'bob2')
    -- more?
   )
, new_rows AS (
   INSERT INTO chats (usr, contact, name)
   SELECT 
     c.usr
     , c.contact
     , c.name
   FROM input_rows AS r
   LEFT JOIN chats AS c ON r.usr=c.usr AND r.contact=c.contact
   WHERE r.id IS NULL
   RETURNING id, usr, contact, name
   )
SELECT id, usr, contact, name, 'new' as row_type
FROM new_rows
UNION ALL
SELECT id, usr, contact, name, 'update' as row_type
FROM input_rows AS ir
INNER JOIN chats AS c ON ir.usr=c.usr AND ir.contact=c.contact

我没有测试上面的内容,但是如果您发现新插入的行被多次返回,那么您可以将UNION ALL更改为只是UNION,或者(更好),完全删除第一个查询。

我也遇到了同样的问题,我使用“做更新”而不是“什么都不做”来解决它,尽管我没有什么可更新的。在你的情况下,它会是这样的:

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") 
DO UPDATE SET 
    name=EXCLUDED.name 
RETURNING id;

这个查询将返回所有的行,无论它们是刚刚插入的还是之前已经存在的。

根据Erwin上面的回答(顺便说一句,这是一个很棒的答案,如果没有它,我永远不会走到这里!),这就是我最终的结果。它解决了两个额外的潜在问题——通过对输入集执行不同的选择,它允许重复(否则会抛出错误),并且它确保返回的id与输入集完全匹配,包括相同的顺序和允许重复。

Additionally, and one part that was important for me, it significantly reduces the number of unnecessary sequence advancements using the new_rows CTE to only try inserting the ones that aren't already in there. Considering the possibility of concurrent writes, it will still hit some conflicts in that reduced set, but the later steps will take care of that. In most cases, sequence gaps aren't a big deal, but when you're doing billions of upserts, with a high percentage of conflicts, it can make the difference between using an int or a bigint for the ID.

尽管它又大又丑,但性能非常好。我用数百万个upserts、高并发性和大量冲突对它进行了广泛测试。坚如磐石。

我将其打包为一个函数,但如果这不是您想要的,那么应该很容易看到如何将其转换为纯SQL。我还将示例数据更改为简单的内容。

CREATE TABLE foo
(
  bar varchar PRIMARY KEY,
  id  serial
);
CREATE TYPE ids_type AS (id integer);
CREATE TYPE bars_type AS (bar varchar);

CREATE OR REPLACE FUNCTION upsert_foobars(_vals bars_type[])
  RETURNS SETOF ids_type AS
$$
BEGIN
  RETURN QUERY
    WITH
      all_rows AS (
        SELECT bar, ordinality
        FROM UNNEST(_vals) WITH ORDINALITY
      ),
      dist_rows AS (
        SELECT DISTINCT bar
        FROM all_rows
      ),
      new_rows AS (
        SELECT d.bar
        FROM dist_rows d
             LEFT JOIN foo f USING (bar)
        WHERE f.bar IS NULL
      ),
      ins AS (
        INSERT INTO foo (bar)
          SELECT bar
          FROM new_rows
          ORDER BY bar
          ON CONFLICT DO NOTHING
          RETURNING bar, id
      ),
      sel AS (
        SELECT bar, id
        FROM ins
        UNION ALL
        SELECT f.bar, f.id
        FROM dist_rows
             JOIN foo f USING (bar)
      ),
      ups AS (
        INSERT INTO foo AS f (bar)
          SELECT d.bar
          FROM dist_rows d
               LEFT JOIN sel s USING (bar)
          WHERE s.bar IS NULL
          ORDER BY bar
          ON CONFLICT ON CONSTRAINT foo_pkey DO UPDATE
            SET bar = f.bar
          RETURNING bar, id
      ),
      fin AS (
        SELECT bar, id
        FROM sel
        UNION ALL
        TABLE ups
      )
    SELECT f.id
    FROM all_rows a
         JOIN fin f USING (bar)
    ORDER BY a.ordinality;
END
$$ LANGUAGE plpgsql;