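In addition to the difference in local/global variable store times, opcode prediction makes the function faster.
As the other answers explain, the function uses the STORE_FAST opcode in its loop. Here's the bytecode for the function's loop: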
>> 13 FOR_ITER 6 (to 22) # get next value from iterator
16 STORE_FAST 0 (x) # set local variable
19 JUMP_ABSOLUTE 13 # back to FOR_ITER
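If you want to check this on your own interpreter, a minimal sketch along these lines should work. The names opnames, in_function and module_code are made up for this illustration, and the exact opcodes and offsets vary between CPython versions (newer releases use a different jump opcode, for example):

```python
import dis


def opnames(code):
    """Return the opcode names of a function or code object, in order."""
    return [ins.opname for ins in dis.get_instructions(code)]


def in_function():
    # Inside a function, x is a local variable.
    for x in range(10):
        pass


# The same loop compiled as module-level code, where x is a global name.
module_code = compile("for x in range(10):\n    pass\n", "<module>", "exec")

function_ops = opnames(in_function)
module_ops = opnames(module_code)

print(function_ops)
print(module_ops)
```

The function version should show a STORE_FAST right after FOR_ITER, while the module-level version shows STORE_NAME in the same position.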
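Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.
In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.
On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does not make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation loop, which has obvious implications for the speed at which the loop is executed.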
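A rough way to observe the difference is to time the same loop in both forms. This is only a sketch: N, LOOP_SRC and the two helper functions are names invented for this example, and the measured gap also includes the cost of storing into a namespace dict rather than a local slot, so it does not isolate the effect of prediction. The numbers depend heavily on the CPython version and the machine.

```python
import timeit

N = 100_000  # hypothetical loop size, chosen only to get a measurable time
LOOP_SRC = "for x in range(N):\n    pass\n"


def time_module_level(repeat=5):
    """Time the loop compiled as module-level code (x stored with STORE_NAME)."""
    code = compile(LOOP_SRC, "<module-level>", "exec")
    namespace = {"N": N}
    return min(timeit.repeat(lambda: exec(code, namespace), number=1, repeat=repeat))


def time_function_level(repeat=5):
    """Time the same loop inside a function (x stored with STORE_FAST)."""
    namespace = {"N": N}
    exec("def f():\n    for x in range(N):\n        pass\n", namespace)
    return min(timeit.repeat(namespace["f"], number=1, repeat=repeat))


module_time = time_module_level()
function_time = time_function_level()
print(f"module level:   {module_time:.6f} s")
print(f"function level: {function_time:.6f} s")
```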
To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):
Some opcodes tend to come in pairs thus making it possible to
predict the second code when the first is run. For example,
GET_ITER is often followed by FOR_ITER. And FOR_ITER is often
followed by STORE_FAST or UNPACK_SEQUENCE.
Verifying the prediction costs a single high-speed test of a register
variable against a constant. If the pairing was good, then the
processor's own internal branch predication has a high likelihood of
success, resulting in a nearly zero-overhead transition to the
next opcode. A successful prediction saves a trip through the eval-loop
including its two unpredictable branches, the HAS_ARG test and the
switch-case. Combined with the processor's internal branch prediction,
a successful PREDICT has the effect of making the two opcodes run as if
they were a single new opcode with the bodies combined.
We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:
case FOR_ITER: // the FOR_ITER opcode case
v = TOP();
x = (*v->ob_type->tp_iternext)(v); // x is the next value from iterator
if (x != NULL) {
PUSH(x); // put x on top of the stack
PREDICT(STORE_FAST); // predict STORE_FAST will follow - success!
PREDICT(UNPACK_SEQUENCE); // this and everything below is skipped
continue;
}
// error-checking and more code for when the iterator ends normally
The PREDICT macro expands to if (*next_instr == op) goto PRED_##op, i.e. we just jump to the start of the predicted opcode. In this case, we jump here:
PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
v = POP(); // pop x back off the stack
SETLOCAL(oparg, v); // set it as the new local variable
goto fast_next_opcode;
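Now the local variable is set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.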
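To make the saving concrete, here is a toy model of the shortcut in Python. It is not CPython's real loop: the opcode constants, the run function and the two sample programs are all hypothetical, and the iterator is kept off the stack for simplicity. It only counts how many times execution passes through the top of the dispatch loop, with and without a successful prediction.

```python
# Toy model only: hypothetical opcodes and counters, not CPython's real loop.
FOR_ITER, STORE_FAST, STORE_NAME, JUMP_ABSOLUTE, STOP = range(5)

# Each instruction is an (opcode, argument) pair; jump targets are list indexes.
FAST_LOOP = [(FOR_ITER, 3), (STORE_FAST, "x"), (JUMP_ABSOLUTE, 0), (STOP, None)]
NAME_LOOP = [(FOR_ITER, 3), (STORE_NAME, "x"), (JUMP_ABSOLUTE, 0), (STOP, None)]


def run(program, iterable, predict=True):
    """Execute a tiny loop program; return (variables, dispatch count)."""
    iterator = iter(iterable)
    stack, variables = [], {}
    pc = dispatches = 0
    while True:
        dispatches += 1  # one trip through the top of the eval loop
        op, arg = program[pc]
        pc += 1
        if op == FOR_ITER:
            try:
                stack.append(next(iterator))
            except StopIteration:
                pc = arg  # iterator exhausted: jump past the loop
                continue
            # Stand-in for PREDICT(STORE_FAST): peek at the next opcode and,
            # on a match, run its body now instead of dispatching again.
            if predict and program[pc][0] == STORE_FAST:
                variables[program[pc][1]] = stack.pop()
                pc += 1
        elif op in (STORE_FAST, STORE_NAME):
            variables[arg] = stack.pop()
        elif op == JUMP_ABSOLUTE:
            pc = arg
        elif op == STOP:
            return variables, dispatches


print(run(FAST_LOOP, range(3)))  # prediction succeeds on every iteration
print(run(NAME_LOOP, range(3)))  # no prediction is made for STORE_NAME
```

In this model the predicted loop needs two dispatches per iteration instead of three, which is the "two opcodes run as if they were a single new opcode" effect described in the ceval.c comment.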
The Python wiki page has more information about how CPython's virtual machine works.