在我的生产错误日志中,我偶尔会看到:

SQLSTATE[HY000]:一般错误:1205 超过锁等待超时;试一试 重新启动事务

我知道哪个查询在那个时刻试图访问数据库,但是是否有一种方法可以找出哪个查询在那个精确的时刻拥有锁?


当前回答

在记录中,锁等待超时异常也发生在出现死锁且MySQL无法检测到它时,因此它只是超时。另一个原因可能是一个运行时间非常长的查询,但是它更容易解决/修复,我在这里不描述这种情况。

如果死锁在两个事务中被“正确地”构造,MySQL通常能够处理死锁。然后MySQL只是杀死/回滚一个拥有较少锁的事务(不太重要,因为它影响的行较少),并让另一个事务完成。

现在,让我们假设有两个进程A和B和3个事务:

Process A Transaction 1: Locks X
Process B Transaction 2: Locks Y
Process A Transaction 3: Needs Y => Waits for Y
Process B Transaction 2: Needs X => Waits for X
Process A Transaction 1: Waits for Transaction 3 to finish

(see the last two paragraph below to specify the terms in more detail)

=> deadlock 

这是一个非常不幸的设置,因为MySQL无法看到有死锁(跨越3个事务)。MySQL做的是…没有什么!它只是等待,因为它不知道该做什么。它等待直到第一个获得的锁超过超时(进程A事务1:锁X),然后这将解锁锁X,从而解锁事务2等。

The art is to find out what (which query) causes the first lock (Lock X). You will be able to see easily (show engine innodb status) that Transaction 3 waits for Transaction 2, but you will not see which transaction Transaction 2 is waiting for (Transaction 1). MySQL will not print any locks or query associated with Transaction 1. The only hint will be that at the very bottom of the transaction list (of the show engine innodb status printout), you will see Transaction 1 apparently doing nothing (but in fact waiting for Transaction 3 to finish).

关于如何找到哪个SQL查询导致锁(lock X)被授予正在等待的给定事务的技术,在此处描述了在长时间运行的事务中跟踪MySQL查询历史

如果您想知道示例中的流程和事务究竟是什么。该进程是一个PHP进程。Transaction是由innodb-trx-table定义的事务。在我的例子中,我有两个PHP进程,在每个进程中我手动启动一个事务。有趣的是,即使我在一个进程中启动了一个事务,MySQL内部实际上使用了两个独立的事务(我不知道为什么,也许MySQL开发人员可以解释)。

MySQL is managing its own transactions internally and decided (in my case) to use two transactions to handle all the SQL requests coming from the PHP process (Process A). The statement that Transaction 1 is waiting for Transaction 3 to finish is an internal MySQL thing. MySQL "knew" the Transaction 1 and Transaction 3 were actually instantiated as part of one "transaction" request (from Process A). Now the whole "transaction" was blocked because Transaction 3 (a subpart of "transaction") was blocked. Because "transaction" was not able to finish the Transaction 1 (also a subpart of the "transaction") was marked as not finished as well. This is what I meant by "Transaction 1 waits for Transaction 3 to finish".

其他回答

暴露这一点的是“事务”这个词。从语句中可以明显看出,该查询试图更改一个或多个InnoDB表中的至少一行。

因为您知道这个查询,所以所有被访问的表都可能是罪魁祸首。

从那里,你应该可以运行SHOW ENGINE INNODB STATUS\G

您应该能够看到受影响的表

你会得到各种额外的锁定和互斥信息。

下面是我的一个客户的例子:

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
110514 19:44:14 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 9014315, signal count 7805377
Mutex spin waits 0, rounds 11487096053, OS waits 7756855
RW-shared spins 722142, OS waits 211221; RW-excl spins 787046, OS waits 39353
------------------------
LATEST FOREIGN KEY ERROR
------------------------
110507 21:41:35 Transaction:
TRANSACTION 0 606162814, ACTIVE 0 sec, process no 29956, OS thread id 1223895360 updating or deleting, thread declared inside InnoDB 499
mysql tables in use 1, locked 1
14 lock struct(s), heap size 3024, 8 row lock(s), undo log entries 1
MySQL thread id 3686635, query id 124164167 10.64.89.145 viget updating
DELETE FROM file WHERE file_id in ('6dbafa39-7f00-0001-51f2-412a450be5cc' )
Foreign key constraint fails for table `backoffice`.`attachment`:
,
  CONSTRAINT `attachment_ibfk_2` FOREIGN KEY (`file_id`) REFERENCES `file` (`file_id`)
Trying to delete or update in parent table, in index `PRIMARY` tuple:
DATA TUPLE: 17 fields;
 0: len 36; hex 36646261666133392d376630302d303030312d353166322d343132613435306265356363; asc 6dbafa39-7f00-0001-51f2-412a450be5cc;; 1: len 6; hex 000024214f7e; asc   $!O~;; 2: len 7; hex 000000400217bc; asc    @   ;; 3: len 2; hex 03e9; asc   ;; 4: len 2; hex 03e8; asc   ;; 5: len 36; hex 65666635323863622d376630302d303030312d336632662d353239626433653361333032; asc eff528cb-7f00-0001-3f2f-529bd3e3a302;; 6: len 40; hex 36646234376337652d376630302d303030312d353166322d3431326132346664656366352e6d7033; asc 6db47c7e-7f00-0001-51f2-412a24fdecf5.mp3;; 7: len 21; hex 416e67656c73204e6f7720436f6e666572656e6365; asc Angels Now Conference;; 8: len 34; hex 416e67656c73204e6f7720436f6e666572656e6365204a756c7920392c2032303131; asc Angels Now Conference July 9, 2011;; 9: len 1; hex 80; asc  ;; 10: len 8; hex 8000124a5262bdf4; asc    JRb  ;; 11: len 8; hex 8000124a57669dc3; asc    JWf  ;; 12: SQL NULL; 13: len 5; hex 8000012200; asc    " ;; 14: len 1; hex 80; asc  ;; 15: len 2; hex 83e8; asc   ;; 16: len 4; hex 8000000a; asc     ;;

But in child table `backoffice`.`attachment`, in index `PRIMARY`, there is a record:
PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 30; hex 36646261666133392d376630302d303030312d353166322d343132613435; asc 6dbafa39-7f00-0001-51f2-412a45;...(truncated); 1: len 30; hex 38666164663561652d376630302d303030312d326436612d636164326361; asc 8fadf5ae-7f00-0001-2d6a-cad2ca;...(truncated); 2: len 6; hex 00002297b3ff; asc   "   ;; 3: len 7; hex 80000040070110; asc    @   ;; 4: len 2; hex 0000; asc   ;; 5: len 30; hex 416e67656c73204e6f7720436f6e666572656e636520446f63756d656e74; asc Angels Now Conference Document;;

------------
TRANSACTIONS
------------
Trx id counter 0 620783814
Purge done for trx's n:o < 0 620783800 undo n:o < 0 0
History list length 35
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1192212800
MySQL thread id 5341758, query id 189708501 127.0.0.1 lwdba
show innodb status
---TRANSACTION 0 620783788, not started, process no 29956, OS thread id 1196472640
MySQL thread id 5341773, query id 189708353 10.64.89.143 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1223895360
MySQL thread id 5341667, query id 189706152 10.64.89.145 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1227888960
MySQL thread id 5341556, query id 189699857 172.16.135.63 lwdba
---TRANSACTION 0 620781112, not started, process no 29956, OS thread id 1222297920
MySQL thread id 5341511, query id 189696265 10.64.89.143 viget
---TRANSACTION 0 620783736, not started, process no 29956, OS thread id 1229752640
MySQL thread id 5339005, query id 189707998 10.64.89.144 viget
---TRANSACTION 0 620783785, not started, process no 29956, OS thread id 1198602560
MySQL thread id 5337583, query id 189708349 10.64.89.145 viget
---TRANSACTION 0 620783469, not started, process no 29956, OS thread id 1224161600
MySQL thread id 5333500, query id 189708478 10.64.89.144 viget
---TRANSACTION 0 620781240, not started, process no 29956, OS thread id 1198336320
MySQL thread id 5324256, query id 189708493 10.64.89.145 viget
---TRANSACTION 0 617458223, not started, process no 29956, OS thread id 1195141440
MySQL thread id 736, query id 175038790 Has read all relay log; waiting for the slave I/O thread to update it
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
519878 OS file reads, 18962880 OS file writes, 13349046 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 6.25 writes/s, 4.50 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1190, seg size 1192,
174800 inserts, 174800 merged recs, 54439 merges
Hash table size 35401603, node heap has 35160 buffer(s)
0.50 hash searches/s, 11.75 non-hash searches/s
---
LOG
---
Log sequence number 28 1235093534
Log flushed up to   28 1235093534
Last checkpoint at  28 1235091275
0 pending log writes, 0 pending chkp writes
12262564 log i/o's done, 3.25 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 18909316674; in additional pool allocated 1048576
Dictionary memory allocated 2019632
Buffer pool size   1048576
Free buffers       175763
Database pages     837653
Modified db pages  6
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 770138, created 108485, written 7795318
0.00 reads/s, 0.00 creates/s, 4.25 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 29956, id 1185823040, state: sleeping
Number of rows inserted 6453767, updated 4602534, deleted 3638793, read 388349505551
0.25 inserts/s, 1.25 updates/s, 0.00 deletes/s, 2.75 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set, 1 warning (0.00 sec)

你应该考虑通过设置innodb_lock_wait_timeout来增加InnoDB的锁等待超时时间,默认值是50秒

mysql> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.01 sec)

你可以用这一行在/etc/my.cnf中永久地将它设置为更高的值

[mysqld]
innodb_lock_wait_timeout=120

重新启动mysql。如果你不能在这个时候重新启动mysql,运行这个:

SET GLOBAL innodb_lock_wait_timeout = 120; 

您也可以将它设置为会话的持续时间

SET innodb_lock_wait_timeout = 120; 

然后是你的问题

这个异常的最大问题是,它通常在测试环境中不可重现,当它发生在prod上时,我们无法运行innodb引擎状态。所以在其中一个项目中,我把下面的代码放入了这个异常的catch块中。这帮助我在异常发生时捕捉引擎状态。这帮了大忙。

Statement st = con.createStatement();
ResultSet rs =  st.executeQuery("SHOW ENGINE INNODB STATUS");
while(rs.next()){
    log.info(rs.getString(1));
    log.info(rs.getString(2));
    log.info(rs.getString(3));
}

下面是我最终不得不做的事情,以找出是什么“其他查询”导致了锁定超时问题。在应用程序代码中,我们在专用于此任务的单独线程上跟踪所有挂起的数据库调用。如果任何DB调用的时间超过n秒(对我们来说是30秒),我们记录:

-- Pending InnoDB transactions
SELECT * FROM information_schema.innodb_trx ORDER BY trx_started; 

-- Optionally, log what transaction holds what locks
SELECT * FROM information_schema.innodb_locks;

通过上述方法,我们能够精确定位锁定导致死锁的行的并发查询。在我的例子中,它们是像INSERT…与普通SELECT不同,SELECT锁定底层行。然后可以重新组织代码或使用不同的事务隔离(如read uncommitted)。

好运!

正如有人在关于这个问题的众多SO线程中的一个线程中提到的:有时已经锁定表的进程在进程列表中显示为休眠!我非常着急,直到我杀死了数据库中打开的所有睡眠线程(当时没有一个是活动的)。这最终解锁了表并让更新查询运行。

评论者说了类似于“有时MySQL线程锁定了一个表,然后在等待与MySQL无关的事情发生时进入睡眠状态。”

在重新检查了show engine innodb状态日志后(一旦我找到了负责锁的客户端),我注意到被卡住的线程被列在事务列表的最底部,在由于锁冻结而即将出错的活动查询下面:

------------------
---TRANSACTION 2744943820, ACTIVE 1154 sec(!!)
2 lock struct(s), heap size 376, 2 row lock(s), undo log entries 1
MySQL thread id 276558, OS thread handle 0x7f93762e7710, query id 59264109 [ip] [database] cleaning up
Trx read view will not see trx with id >= 2744943821, sees < 2744943821

(不确定“Trx读视图”消息是否与冻结锁相关,但与其他活动事务不同的是,这个事务不显示与发出的查询一起,而是声称事务正在“清理”,但有多个行锁)

这个故事的寓意是,即使线程处于睡眠状态,事务也可以是活动的。

看看pt-deadlock-logger实用程序的手册页:

brew install percona-toolkit
pt-deadlock-logger --ask-pass server_name

它从上面提到的引擎innodb状态中提取信息 它可以用来创建一个守护进程,每30秒运行一次。