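I started by googling and found the article "How to write INSERT if NOT EXISTS queries in standard SQL", which talks about mutex tables.
I have a table with about 14 million records. If I want to add more data in the same format, is there a way to ensure the record I want to insert does not already exist without using a pair of queries (i.e., one query to check and one to insert if the result set is empty)?
Does a unique constraint on a field guarantee that the insert will fail if the value is already there?
It seems that with only a constraint in place, the script fails with an error when I issue the insert via PHP.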
Current answer
Solution:
INSERT INTO `table` (`value1`, `value2`)
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1)
Explanation:
The innermost query
SELECT * FROM `table`
WHERE `value1`='stuff for value1' AND `value2`='stuff for value2' LIMIT 1
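is used as the WHERE NOT EXISTS condition, which detects whether a row with the data to be inserted already exists. The query can stop as soon as one such row is found, hence the LIMIT 1 (a micro-optimization that may be omitted).
The intermediate query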
SELECT 'stuff for value1', 'stuff for value2' FROM DUAL
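represents the values to be inserted. DUAL refers to a special one-row, one-column table present by default in all Oracle databases (see https://en.wikipedia.org/wiki/DUAL_table). On MySQL Server 5.7.26 I got a valid query when omitting FROM DUAL, but older versions (such as 5.5.60) seem to require the FROM clause. Because of the WHERE NOT EXISTS condition, the intermediate query returns an empty result set if the innermost query finds matching data.
The outer query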
INSERT INTO `table` (`value1`, `value2`)
inserts the data returned by the intermediate query, if any.
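The three-level query described above can be tried end to end with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the table name `demo` and the helper `insert_if_absent` are hypothetical. SQLite accepts a SELECT without FROM, so FROM DUAL is omitted here, and the values are passed as bound parameters instead of string literals.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY, value1 TEXT, value2 TEXT)"
)


def insert_if_absent(conn, value1, value2):
    """Insert (value1, value2) only if no identical row exists.

    Returns the number of rows inserted (1 or 0).
    """
    cur = conn.execute(
        # Outer INSERT <- intermediate SELECT of the values <- innermost
        # NOT EXISTS check against the same table.
        "INSERT INTO demo (value1, value2) "
        "SELECT ?, ? "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM demo WHERE value1 = ? AND value2 = ?"
        ")",
        (value1, value2, value1, value2),
    )
    conn.commit()
    return cur.rowcount


first = insert_if_absent(conn, "stuff for value1", "stuff for value2")
second = insert_if_absent(conn, "stuff for value1", "stuff for value2")
total = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

The second call inserts nothing because the innermost query finds the row written by the first call. Note that without a unique constraint this pattern by itself may not protect against two concurrent sessions inserting the same row.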
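Other answers
It is worth noting that INSERT IGNORE still increments the auto-increment primary key counter whether or not the statement succeeds, just like a normal INSERT.
This causes gaps in your primary keys, which might make a programmer mentally unstable. And if your application is poorly designed and depends on perfectly sequential primary keys, it might become a headache.
Look into innodb_autoinc_lock_mode = 0 (a server setting that comes with a slight performance hit), or use a SELECT first to make sure your query will not fail (which also comes with a performance hit and extra code).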
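If an exception is acceptable, any simple constraint will do the job. Examples: a primary key, if it is not a surrogate; a unique constraint on a column; a multi-column unique constraint.
Sorry if this seems deceptively simple. I know it looks bad compared with the link you shared with us. ;-(
But I give this answer anyway, because it seems to meet your need. (If not, it may prompt you to update your requirements, which would also be "a Good Thing"(TM).)
If an insert would violate a unique constraint, an exception is raised at the database level and relayed by the driver. It will certainly stop your script with a failure. It must be possible to handle that case in PHP...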
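As a rough illustration of handling the constraint violation rather than letting it stop the script, here is a minimal sketch. It assumes Python's built-in sqlite3 module instead of PHP and MySQL; the `users` table and the `try_insert` helper are hypothetical. The same idea applies to any driver that reports a duplicate-key error as a catchable exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the UNIQUE constraint is what rejects duplicates.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")


def try_insert(conn, email):
    """Return True if the row was inserted, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The database rejected the row; the driver relays it as an exception.
        return False


results = [try_insert(conn, "a@example.com"), try_insert(conn, "a@example.com")]
```

Only the integrity error is caught, so any other database failure still propagates to the caller.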
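Use INSERT IGNORE INTO table.
There is also the INSERT … ON DUPLICATE KEY UPDATE syntax; you can find an explanation in 13.2.6.2 INSERT ... ON DUPLICATE KEY UPDATE Statement.
Post from bogdan.org.ua, according to Google's webcache: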
18th October 2007
To start: as of the latest MySQL, syntax presented in the title is not possible. But there are several very easy ways to accomplish what is expected using existing functionality.
There are 3 possible solutions: using INSERT IGNORE, REPLACE, or INSERT … ON DUPLICATE KEY UPDATE.
Imagine we have a table:
CREATE TABLE `transcripts` (
  `ensembl_transcript_id` varchar(20) NOT NULL,
  `transcript_chrom_start` int(10) unsigned NOT NULL,
  `transcript_chrom_end` int(10) unsigned NOT NULL,
  PRIMARY KEY (`ensembl_transcript_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now imagine that we have an automatic pipeline importing transcripts meta-data from Ensembl, and that due to various reasons the pipeline might be broken at any step of execution. Thus, we need to ensure two things:
repeated executions of the pipeline will not destroy our database;
repeated executions will not die due to ‘duplicate primary key’ errors.
Method 1: using REPLACE
It’s very simple:
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
    `transcript_chrom_start` = 12345,
    `transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet exist, it will be created. However, using this method isn’t efficient for our case: we do not need to overwrite existing records, it’s fine just to skip them.
Method 2: using INSERT IGNORE
Also very simple:
INSERT IGNORE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
    `transcript_chrom_start` = 12345,
    `transcript_chrom_end` = 12678;
Here, if the ‘ensembl_transcript_id’ is already present in the database, it will be silently skipped (ignored). (To be more precise, here’s a quote from MySQL reference manual: “If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted.”.) If the record doesn’t yet exist, it will be created.
This second method has several potential weaknesses, including non-abortion of the query in case any other problem occurs (see the manual). Thus it should be used if previously tested without the IGNORE keyword.
Method 3: using INSERT … ON DUPLICATE KEY UPDATE
Third option is to use INSERT … ON DUPLICATE KEY UPDATE syntax, and in the UPDATE part just do some meaningless (empty) operation, like calculating 0+0 (Geoffray suggests doing the id=id assignment for the MySQL optimization engine to ignore this operation). Advantage of this method is that it only ignores duplicate key events, and still aborts on other errors.
As a final notice: this post was inspired by Xaprb. I’d also advise to consult his other post on writing flexible SQL queries.
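The difference between skipping and overwriting a duplicate can be observed with a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL, so the spelling differs: SQLite writes MySQL's INSERT IGNORE as INSERT OR IGNORE, while REPLACE INTO is accepted by both. The table mirrors the `transcripts` example from the quoted post with simplified column types; the replacement values 1 and 2 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcripts ("
    " ensembl_transcript_id TEXT PRIMARY KEY,"
    " transcript_chrom_start INTEGER NOT NULL,"
    " transcript_chrom_end INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 12345, 12678),
)

SELECT_RANGE = "SELECT transcript_chrom_start, transcript_chrom_end FROM transcripts"

# Skip on duplicate: the existing row is left untouched.
conn.execute(
    "INSERT OR IGNORE INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 1, 2),
)
after_ignore = conn.execute(SELECT_RANGE).fetchall()

# Overwrite on duplicate: the old row is deleted and the new one inserted.
conn.execute(
    "REPLACE INTO transcripts VALUES (?, ?, ?)",
    ("ENSORGT00000000001", 1, 2),
)
after_replace = conn.execute(SELECT_RANGE).fetchall()
```

After the ignored insert the row still holds the original coordinates; after REPLACE it holds the new ones, and in both cases the table contains exactly one row.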
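Here is a PHP function that inserts a row only if no existing row in the table has all of the specified column values.
If one of the columns differs, the row will be added.
If the table is empty, the row will be added.
If a row exists where all the specified columns have the specified values, the row won't be added.
// Note: the mysql_* functions used here were removed in PHP 7.0;
// on current PHP versions this needs porting to mysqli or PDO.
function insert_unique($table, $vars)
{
    if (count($vars)) {
        $table = mysql_real_escape_string($table);
        // Only the values are escaped; column names (the array keys)
        // are interpolated as-is and must come from trusted code.
        $vars = array_map('mysql_real_escape_string', $vars);

        $req = "INSERT INTO `$table` (`". join('`, `', array_keys($vars)) ."`) ";
        $req .= "SELECT '". join("', '", $vars) ."' FROM DUAL ";
        $req .= "WHERE NOT EXISTS (SELECT 1 FROM `$table` WHERE ";

        foreach ($vars AS $col => $val)
            $req .= "`$col`='$val' AND ";

        // Strip the trailing " AND " (5 characters) before closing the subquery.
        $req = substr($req, 0, -5) . ") LIMIT 1";

        $res = mysql_query($req) OR die();
        return mysql_insert_id();
    }
    return False;
}
Example usage: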
<?php
insert_unique('mytable', array(
'mycolumn1' => 'myvalue1',
'mycolumn2' => 'myvalue2',
'mycolumn3' => 'myvalue3'
)
);
?>
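Update or insert without a known primary key
If you already have a unique or primary key, the other answers using either INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO ... should work fine (note that REPLACE INTO deletes the row if it exists and then inserts, so it does not partially update existing values).
But suppose you have values for some_column_id and some_type, the combination of which is known to be unique, and you want to update some_value if the row exists or insert it if it does not, all in a single query (to avoid using a transaction). Then this might be a solution: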
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
SELECT id, some_column_id, some_type, some_value
FROM my_table
WHERE some_column_id = ? AND some_type = ?
UNION ALL
SELECT s.id, s.some_column_id, s.some_type, s.some_value
FROM (SELECT NULL AS id, ? AS some_column_id, ? AS some_type, ? AS some_value) AS s
) AS t
LIMIT 1
ON DUPLICATE KEY UPDATE
some_value = ?
Basically, the query executes this way (it is less complicated than it may look):
Select an existing row via the WHERE clause match.
Union that result with a potential new row (table s), where the column values are explicitly given (s.id is NULL, so it will generate a new auto-increment identifier).
If an existing row is found, then the potential new row from table s is discarded (due to LIMIT 1 on table t), and it will always trigger an ON DUPLICATE KEY which will UPDATE the some_value column.
If an existing row is not found, then the potential new row is inserted (as given by table s).
Note: every table in a relational database should have at least a primary auto-increment id column. If you don't have one, add it, even if you don't need it at first sight. It is definitely needed for this "trick".