PEP 8规定:

导入总是放在文件的顶部,就在任何模块注释和文档字符串之后,在模块全局变量和常量之前。

然而,如果我导入的类/方法/函数只在很少的情况下使用,那么在需要时进行导入肯定会更有效吗?

这不是:

class SomeClass(object):

    def not_often_called(self)
        from datetime import datetime
        self.datetime = datetime.now()

比这更有效率?

from datetime import datetime

class SomeClass(object):

    def not_often_called(self)
        self.datetime = datetime.now()

当前回答

Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.

将这些信息与可读性参数结合起来,我会说import语句最好在模块范围内。

其他回答

有趣的是,到目前为止,没有一个回答提到并行处理,当序列化的函数代码被推到其他核心时,可能需要将导入放在函数中,例如在ipyparallel的情况下。

I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.

所以我的答案是否定的,不要总是把导入放在模块的顶部。

在函数中导入变量/局部作用域可以提高性能。这取决于函数中导入对象的使用情况。如果你多次循环并访问一个模块全局对象,将它导入为本地会有帮助。

test.py

X=10
Y=11
Z=12
def add(i):
  i = i + 10

runlocal.py

from test import add, X, Y, Z

    def callme():
      x=X
      y=Y
      z=Z
      ladd=add 
      for i  in range(100000000):
        ladd(i)
        x+y+z

    callme()

run.py

from test import add, X, Y, Z

def callme():
  for i in range(100000000):
    add(i)
    X+Y+Z

callme()

在Linux上的时间显示了一个小的增益

/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py 
    0:17.80 real,   17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py 
    0:14.23 real,   14.22 user, 0.01 sys

真实的是挂钟。用户是程序中的时间。Sys是系统调用的时间。

https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names

我很惊讶没有看到重复负载检查的实际成本数字,尽管有很多很好的解释。

如果你在顶部导入,不管发生什么,你都要加载命中。这非常小,但通常是毫秒级,而不是纳秒级。

If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).

下面是一个简单测试的结果,该测试从函数内部导入了一些内容。报告的时间(在2.3 GHz Intel Core i7上的Python 2.7.14中)如下所示(第2个调用比后面的调用多似乎是一致的,尽管我不知道为什么)。

 0 foo:   14429.0924 µs
 1 foo:      63.8962 µs
 2 foo:      10.0136 µs
 3 foo:       7.1526 µs
 4 foo:       7.8678 µs
 0 bar:       9.0599 µs
 1 bar:       6.9141 µs
 2 bar:       7.1526 µs
 3 bar:       7.8678 µs
 4 bar:       7.1526 µs

代码:

from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print("    %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print("    %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1

当函数被调用0次或1次时,第一种变体确实比第二种更有效。然而,对于第二次和后续调用,“导入每个调用”方法实际上效率较低。请参阅此链接,了解一种通过“惰性导入”将两种方法的优点结合起来的惰性加载技术。

但除了效率之外,还有其他原因可以解释为什么你会更喜欢其中一种。一种方法是让阅读代码的人更清楚地了解这个模块所具有的依赖关系。它们也有非常不同的失败特征——如果没有“datetime”模块,第一个将在加载时失败,而第二个直到方法被调用才会失败。

补充说明:在IronPython中,导入可能比在CPython中要昂贵一些,因为代码基本上是在导入时被编译的。