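I started by googling and found the article How to write INSERT if NOT EXISTS queries in standard SQL, which talks about mutex tables.
I have a table with ~14 million records. If I want to add more data in the same format, is there a way to ensure the record I want to insert does not already exist without using a pair of queries (i.e., one query to check and one to insert if the result set is empty)?
Does a unique constraint on a field guarantee the insert will fail if the value is already there?
It seems that with merely a constraint, when I issue the insert via PHP, the script dies with an error.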
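Answer
Use INSERT IGNORE INTO table.
There is also the INSERT … ON DUPLICATE KEY UPDATE syntax; see section 13.2.6.2, "INSERT ... ON DUPLICATE KEY UPDATE Statement", in the MySQL reference manual.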
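The behaviour asked about in the question can be reproduced quickly. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration: SQLite spells INSERT IGNORE as INSERT OR IGNORE and raises sqlite3.IntegrityError where MySQL reports a duplicate-key error); the records table is hypothetical. A plain INSERT of a duplicate value is rejected, which is what stops the PHP script, while the IGNORE form turns it into a no-op.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (code TEXT NOT NULL UNIQUE, payload TEXT)")

def plain_insert(code, payload):
    """Plain INSERT: True if the row went in, False if the UNIQUE constraint rejected it."""
    try:
        conn.execute("INSERT INTO records (code, payload) VALUES (?, ?)", (code, payload))
        return True
    except sqlite3.IntegrityError:
        # Counterpart of MySQL's duplicate-key error: the statement fails, nothing is inserted.
        return False

def ignore_insert(code, payload):
    """SQLite's INSERT OR IGNORE (MySQL: INSERT IGNORE): returns the number of rows inserted."""
    cur = conn.execute("INSERT OR IGNORE INTO records (code, payload) VALUES (?, ?)",
                       (code, payload))
    return cur.rowcount

results = [plain_insert("A1", "first"), plain_insert("A1", "second"),
           ignore_insert("A1", "third"), ignore_insert("B2", "fourth")]
rows = conn.execute("SELECT code, payload FROM records ORDER BY code").fetchall()
print(results)  # [True, False, 0, 1]
print(rows)     # [('A1', 'first'), ('B2', 'fourth')]
```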
Post from bogdan.org.ua, according to Google's web cache:
18th October 2007
To start: as of the latest MySQL, syntax presented in the title is not possible. But there are several very easy ways to accomplish what is expected using existing functionality.
There are 3 possible solutions: using INSERT IGNORE, REPLACE, or INSERT … ON DUPLICATE KEY UPDATE.
Imagine we have a table:
CREATE TABLE `transcripts` (
  `ensembl_transcript_id` varchar(20) NOT NULL,
  `transcript_chrom_start` int(10) unsigned NOT NULL,
  `transcript_chrom_end` int(10) unsigned NOT NULL,
  PRIMARY KEY (`ensembl_transcript_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now imagine that we have an automatic pipeline importing transcripts meta-data from Ensembl, and that due to various reasons the pipeline might be broken at any step of execution. Thus, we need to ensure two things:
Repeated executions of the pipeline will not destroy our database.
Repeated executions will not die due to 'duplicate primary key' errors.
Method 1: using REPLACE
It's very simple:
REPLACE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
If the record exists, it will be overwritten; if it does not yet exist, it will be created. However, using this method isn't efficient for our case: we do not need to overwrite existing records, it's fine just to skip them.
Method 2: using INSERT IGNORE
Also very simple:
INSERT IGNORE INTO `transcripts`
SET `ensembl_transcript_id` = 'ENSORGT00000000001',
`transcript_chrom_start` = 12345,
`transcript_chrom_end` = 12678;
Here, if the 'ensembl_transcript_id' is already present in the database, it will be silently skipped (ignored). (To be more precise, here's a quote from the MySQL reference manual: "If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted.") If the record doesn't yet exist, it will be created.
This second method has several potential weaknesses, including non-abortion of the query in case any other problem occurs (see the manual). Thus it should be used only after the query has been tested without the IGNORE keyword.
Method 3: using INSERT … ON DUPLICATE KEY UPDATE
The third option is to use the INSERT … ON DUPLICATE KEY UPDATE syntax, and in the UPDATE part just do some meaningless (empty) operation, like calculating 0+0 (Geoffray suggests doing the id=id assignment for the MySQL optimization engine to ignore this operation). The advantage of this method is that it only ignores duplicate key events, and still aborts on other errors.
As a final notice: this post was inspired by Xaprb. I'd also advise consulting his other post on writing flexible SQL queries.
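A sketch of the difference between methods 2 and 3, using SQLite through Python's sqlite3 as a stand-in for MySQL (assumptions: INSERT OR IGNORE plays the part of INSERT IGNORE, and SQLite's ON CONFLICT ... DO UPDATE with a self-assignment plays the part of ON DUPLICATE KEY UPDATE id=id). Both skip the duplicate, but only the IGNORE form also silently swallows an unrelated NOT NULL violation:

```python
import sqlite3

# SQLite stand-in for the `transcripts` table above (MySQL types simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcripts (
        ensembl_transcript_id TEXT NOT NULL PRIMARY KEY,
        transcript_chrom_start INTEGER NOT NULL,
        transcript_chrom_end INTEGER NOT NULL
    )
""")

COLUMNS = "(ensembl_transcript_id, transcript_chrom_start, transcript_chrom_end)"

# Method 2 analogue: SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE.
IGNORE_SQL = f"INSERT OR IGNORE INTO transcripts {COLUMNS} VALUES (?, ?, ?)"

# Method 3 analogue: upsert with a no-op self-assignment, like MySQL's id=id trick.
UPSERT_SQL = (f"INSERT INTO transcripts {COLUMNS} VALUES (?, ?, ?) "
              "ON CONFLICT(ensembl_transcript_id) "
              "DO UPDATE SET transcript_chrom_start = transcript_chrom_start")

row = ("ENSORGT00000000001", 12345, 12678)
conn.execute(IGNORE_SQL, row)  # new key: inserted
conn.execute(IGNORE_SQL, row)  # duplicate key: silently skipped
conn.execute(UPSERT_SQL, row)  # duplicate key: no-op update, no error

# A NULL in a NOT NULL column is a different error from a duplicate key.
bad_row = ("ENSORGT00000000002", None, 12678)
conn.execute(IGNORE_SQL, bad_row)  # IGNORE hides this error too: row skipped silently
try:
    conn.execute(UPSERT_SQL, bad_row)  # the upsert clause covers only the key conflict
    upsert_raised = False
except sqlite3.IntegrityError:
    upsert_raised = True

count = conn.execute("SELECT COUNT(*) FROM transcripts").fetchone()[0]
print(count, upsert_raised)  # 1 True
```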
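Other answers
In MySQL, ON DUPLICATE KEY UPDATE or INSERT IGNORE can be viable solutions.
An example of ON DUPLICATE KEY UPDATE based on mysql.com: if column a is declared UNIQUE and already contains the value 1, the following two statements have a similar effect: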
INSERT INTO t1 (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE t1 SET c=c+1 WHERE a=1;
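The equivalence can be checked in SQLite, whose upsert clause ON CONFLICT(a) DO UPDATE is the counterpart of MySQL's ON DUPLICATE KEY UPDATE. Assumptions of this sketch: SQLite via Python's sqlite3 stands in for MySQL, and t1 is a hypothetical table in which a is UNIQUE and already holds 1.

```python
import sqlite3

def run(statement):
    """Start from the same one-row table, run one statement, return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: column a is UNIQUE, as the example assumes.
    conn.execute("CREATE TABLE t1 (a INTEGER UNIQUE, b INTEGER, c INTEGER)")
    conn.execute("INSERT INTO t1 (a, b, c) VALUES (1, 2, 3)")
    conn.execute(statement)
    return conn.execute("SELECT a, b, c FROM t1").fetchall()

# SQLite spelling of INSERT ... ON DUPLICATE KEY UPDATE c=c+1; bare c is the existing row's c.
upsert_rows = run("INSERT INTO t1 (a, b, c) VALUES (1, 2, 3) "
                  "ON CONFLICT(a) DO UPDATE SET c = c + 1")
update_rows = run("UPDATE t1 SET c = c + 1 WHERE a = 1")
print(upsert_rows, update_rows)  # [(1, 2, 4)] [(1, 2, 4)]
```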
The INSERT syntax from mysql.com, showing where the optional IGNORE keyword goes:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
SET col_name={expr | DEFAULT}, ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
Or:
INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE
col_name=expr
[, col_name=expr] ... ]
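For the bulk case in the question (adding more data to a 14-million-row table), the INSERT ... SELECT form with IGNORE can copy a whole batch from a staging table in one statement, skipping rows whose key already exists. A sketch with SQLite via Python's sqlite3, where INSERT OR IGNORE stands in for MySQL's INSERT IGNORE; the target and staging tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table (the big one) and a staging table holding the new batch.
conn.execute("CREATE TABLE target (code TEXT PRIMARY KEY NOT NULL, payload TEXT)")
conn.execute("CREATE TABLE staging (code TEXT, payload TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [("A", "old"), ("B", "old")])
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("B", "new"), ("C", "new"), ("D", "new")])

# INSERT OR IGNORE ... SELECT plays the role of MySQL's INSERT IGNORE ... SELECT.
cur = conn.execute("INSERT OR IGNORE INTO target (code, payload) "
                   "SELECT code, payload FROM staging")
inserted = cur.rowcount  # counts only rows actually inserted
rows = conn.execute("SELECT code, payload FROM target ORDER BY code").fetchall()
print(inserted)  # 2
print(rows)      # [('A', 'old'), ('B', 'old'), ('C', 'new'), ('D', 'new')]
```

Existing rows (here B) keep their old values; use the ON DUPLICATE KEY UPDATE form instead if the batch should overwrite them.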
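It's worth noting that INSERT IGNORE still advances the AUTO_INCREMENT counter of the primary key whether or not the row is actually inserted, just as a normal INSERT would.
This causes gaps in your primary keys that might make a programmer mentally unstable. Or, if your application is poorly designed and depends on perfectly incremental primary keys, it might become a headache.
Look into innodb_autoinc_lock_mode = 0 (a server setting that comes with a slight performance hit), or run a SELECT first to make sure your query will not fail (which also comes with a performance hit and extra code).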
Here is a PHP function that inserts a row only if no existing row already has all of the specified column values.
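If one of the columns differs, the row will be added. If the table is empty, the row will be added. If a row exists where all the specified columns have the specified values, the row won't be added.
// The mysql_* extension was removed in PHP 7.0, so this version uses mysqli.
// Inserts $vars (column => value) into $table unless a row with ALL of those values exists.
// Returns the new AUTO_INCREMENT id, or false if no row was inserted.
function insert_unique(mysqli $link, $table, array $vars) {
    if (!count($vars)) {
        return false;
    }
    // Identifiers cannot be escaped with mysqli_real_escape_string; double any backticks instead.
    $quote_id = function ($name) {
        return '`' . str_replace('`', '``', $name) . '`';
    };
    $cols = array_map($quote_id, array_keys($vars));
    // Values become escaped, single-quoted string literals.
    $vals = array();
    foreach ($vars as $val) {
        $vals[] = "'" . mysqli_real_escape_string($link, $val) . "'";
    }
    $conds = array();
    foreach ($cols as $i => $col) {
        $conds[] = "$col = {$vals[$i]}";
    }
    $req = "INSERT INTO " . $quote_id($table) . " (" . implode(', ', $cols) . ") "
         . "SELECT " . implode(', ', $vals) . " FROM DUAL "
         . "WHERE NOT EXISTS (SELECT 1 FROM " . $quote_id($table)
         . " WHERE " . implode(' AND ', $conds) . ") LIMIT 1";
    if (!mysqli_query($link, $req)) {
        return false; // query failed (when mysqli is not set to throw); see mysqli_error($link)
    }
    return mysqli_affected_rows($link) > 0 ? mysqli_insert_id($link) : false;
}
Usage example:
<?php
$link = mysqli_connect('localhost', 'user', 'password', 'mydb'); // hypothetical credentials
insert_unique($link, 'mytable', array(
    'mycolumn1' => 'myvalue1',
    'mycolumn2' => 'myvalue2',
    'mycolumn3' => 'myvalue3'
));
?>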
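The same WHERE NOT EXISTS idea can be sketched with bound parameters instead of string escaping. This is a hypothetical Python port of insert_unique using SQLite via the built-in sqlite3 module as a stand-in for MySQL (SQLite allows SELECT without FROM, so no DUAL is needed). Table and column names cannot be bound as parameters, so they are quoted as identifiers instead.

```python
import sqlite3

def quote_ident(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def insert_unique(conn, table, values):
    """Insert `values` (column -> value) unless a row already has ALL of those values.

    Hypothetical port of the PHP insert_unique; returns the new rowid or None.
    Note: NULL never compares equal with '=', so None values never count as duplicates.
    """
    if not values:
        return None
    cols = [quote_ident(c) for c in values]
    params = list(values.values())
    placeholders = ", ".join("?" for _ in cols)
    conditions = " AND ".join(f"{c} = ?" for c in cols)
    sql = (f"INSERT INTO {quote_ident(table)} ({', '.join(cols)}) "
           f"SELECT {placeholders} "  # SELECT without FROM is valid in SQLite
           f"WHERE NOT EXISTS (SELECT 1 FROM {quote_ident(table)} WHERE {conditions})")
    # The values are bound twice: once for the SELECT list, once for the NOT EXISTS check.
    cur = conn.execute(sql, params + params)
    return cur.lastrowid if cur.rowcount == 1 else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycolumn1 TEXT, mycolumn2 TEXT, mycolumn3 TEXT)")
row = {"mycolumn1": "myvalue1", "mycolumn2": "myvalue2", "mycolumn3": "myvalue3"}
first = insert_unique(conn, "mytable", row)
second = insert_unique(conn, "mytable", row)
print(first, second)  # 1 None
```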
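Update or insert without a known primary key
If you already have a unique or primary key, the other answers with either INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO ... should work fine (note that REPLACE INTO deletes the existing row, if there is one, and then inserts, so it does not partially update existing values).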
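A sketch of that difference, using SQLite through Python's sqlite3 as a stand-in: SQLite's REPLACE INTO likewise deletes the conflicting row and inserts a new one, and its ON CONFLICT ... DO UPDATE upsert stands in for ON DUPLICATE KEY UPDATE. The users table is hypothetical. The column that REPLACE does not mention loses its value, while the upsert keeps it:

```python
import sqlite3

def fresh():
    """New database with one user; email is the unique key, visits is not passed later."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
                 "name TEXT, visits INTEGER)")
    conn.execute("INSERT INTO users (email, name, visits) VALUES ('a@example.com', 'Ann', 7)")
    return conn

# REPLACE: the old row is deleted and a new one inserted, so visits falls back to NULL.
conn = fresh()
conn.execute("REPLACE INTO users (email, name) VALUES ('a@example.com', 'Anne')")
after_replace = conn.execute("SELECT name, visits FROM users").fetchall()

# Upsert: only the named column changes; id and visits are kept.
conn = fresh()
conn.execute("INSERT INTO users (email, name) VALUES ('a@example.com', 'Anne') "
             "ON CONFLICT(email) DO UPDATE SET name = excluded.name")
after_upsert = conn.execute("SELECT id, name, visits FROM users").fetchall()

print(after_replace)  # [('Anne', None)]
print(after_upsert)   # [(1, 'Anne', 7)]
```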
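However, suppose you have values for some_column_id and some_type, whose combination is known to be unique, and you want to update some_value if the row exists or insert it if it does not, all in a single query (to avoid using a transaction). This might be a solution: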
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
SELECT id, some_column_id, some_type, some_value
FROM my_table
WHERE some_column_id = ? AND some_type = ?
UNION ALL
SELECT s.id, s.some_column_id, s.some_type, s.some_value
FROM (SELECT NULL AS id, ? AS some_column_id, ? AS some_type, ? AS some_value) AS s
) AS t
LIMIT 1
ON DUPLICATE KEY UPDATE
some_value = ?
Basically, the query executes like this (it is less complicated than it looks):
1. Select an existing row via the WHERE clause match.
2. Union that result with a potential new row (table s), where the column values are explicitly given (s.id is NULL, so it will generate a new auto-increment identifier).
3. If an existing row is found, then the potential new row from table s is discarded (due to LIMIT 1 on table t), and it will always trigger an ON DUPLICATE KEY which will UPDATE the some_value column.
4. If an existing row is not found, then the potential new row is inserted (as given by table s).
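The same trick can be exercised in SQLite through Python's sqlite3 as a stand-in, with SQLite-specific adjustments that are assumptions of this sketch, not part of the original answer: the upsert clause is ON CONFLICT(id) DO UPDATE instead of ON DUPLICATE KEY UPDATE, the INSERT ... SELECT carries a WHERE clause (WHERE 1) so the parser does not read ON as a join condition, and LIMIT 1 sits inside the derived table. Like the original, it relies on UNION ALL returning the existing row first, which SQL does not formally guarantee without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; (some_column_id, some_type) is unique by convention, as in the answer.
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, some_column_id INTEGER, "
             "some_type TEXT, some_value TEXT)")

UPSERT_BY_PAIR = """
INSERT INTO my_table (id, some_column_id, some_type, some_value)
SELECT t.id, t.some_column_id, t.some_type, t.some_value
FROM (
    SELECT id, some_column_id, some_type, some_value
    FROM my_table
    WHERE some_column_id = ? AND some_type = ?
    UNION ALL
    SELECT NULL, ?, ?, ?
    LIMIT 1
) AS t
WHERE 1
ON CONFLICT(id) DO UPDATE SET some_value = ?
"""

def set_value(column_id, type_, value):
    """Update some_value for the (column_id, type_) pair, or insert a new row."""
    conn.execute(UPSERT_BY_PAIR, (column_id, type_, column_id, type_, value, value))

set_value(42, "color", "red")    # no matching row: inserted with a new id
set_value(42, "color", "blue")   # existing row found: its id conflicts, so it is updated
set_value(43, "color", "green")  # different pair: inserted
rows = conn.execute("SELECT id, some_column_id, some_type, some_value "
                    "FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 42, 'color', 'blue'), (2, 43, 'color', 'green')]
```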
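Note: every table in a relational database should have at least a primary auto-increment id column. If you don't have one, add it, even if you don't need it at first glance. It is definitely needed for this "trick".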
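INSERT INTO table_name (columns) VALUES (values) ON CONFLICT (id) DO NOTHING;
This ON CONFLICT form is PostgreSQL (9.5+) and SQLite (3.24+) syntax; MySQL does not support it, so in MySQL use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE instead.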
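A quick way to try the ON CONFLICT ... DO NOTHING clause locally is SQLite through Python's built-in sqlite3 module, since SQLite 3.24.0 and later accept this upsert syntax; the table below is hypothetical and reuses the placeholder names from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table matching the placeholder names in the answer.
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, label TEXT)")

def insert_if_absent(row_id, label):
    """Insert the row unless its id already exists; returns rows inserted (1 or 0)."""
    cur = conn.execute(
        "INSERT INTO table_name (id, label) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
        (row_id, label),
    )
    return cur.rowcount

outcomes = [insert_if_absent(1, "first"), insert_if_absent(1, "second")]
labels = conn.execute("SELECT label FROM table_name").fetchall()
print(outcomes, labels)  # [1, 0] [('first',)]
```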