是否有理由更喜欢使用map()而不是列表理解,反之亦然?它们中的任何一个通常比另一个更有效或被认为更python化吗?


当前回答

我尝试了@alex-martelli的代码,但发现了一些差异

python -mtimeit -s "xs=range(123456)" "map(hex, xs)"
1000000 loops, best of 5: 218 nsec per loop

python -mtimeit -s "xs=range(123456)" "[hex(x) for x in xs]"
10 loops, best of 5: 19.4 msec per loop

即使对于非常大的范围,Map也需要相同的时间,而从我的代码中可以明显看出,使用列表理解需要花费大量时间。所以除了被认为是“非python的”,我没有遇到任何与map使用有关的性能问题。

其他回答

情况下

Common case: Almost always, you will want to use a list comprehension in python because it will be more obvious what you're doing to novice programmers reading your code. (This does not apply to other languages, where other idioms may apply.) It will even be more obvious what you're doing to python programmers, since list comprehensions are the de-facto standard in python for iteration; they are expected. Less-common case: However if you already have a function defined, it is often reasonable to use map, though it is considered 'unpythonic'. For example, map(sum, myLists) is more elegant/terse than [sum(x) for x in myLists]. You gain the elegance of not having to make up a dummy variable (e.g. sum(x) for x... or sum(_) for _... or sum(readableName) for readableName...) which you have to type twice, just to iterate. The same argument holds for filter and reduce and anything from the itertools module: if you already have a function handy, you could go ahead and do some functional programming. This gains readability in some situations, and loses it in others (e.g. novice programmers, multiple arguments)... but the readability of your code highly depends on your comments anyway. Almost never: You may want to use the map function as a pure abstract function while doing functional programming, where you're mapping map, or currying map, or otherwise benefit from talking about map as a function. In Haskell for example, a functor interface called fmap generalizes mapping over any data structure. This is very uncommon in python because the python grammar compels you to use generator-style to talk about iteration; you can't generalize it easily. (This is sometimes good and sometimes bad.) You can probably come up with rare python examples where map(f, *lists) is a reasonable thing to do. The closest example I can come up with would be sumEach = partial(map,sum), which is a one-liner that is very roughly equivalent to:

def sumEach(myLists):
    return [sum(_) for _ in myLists]

Just using a for-loop: You can also of course just use a for-loop. While not as elegant from a functional-programming viewpoint, sometimes non-local variables make code clearer in imperative programming languages such as python, because people are very used to reading code that way. For-loops are also, generally, the most efficient when you are merely doing any complex operation that is not building a list like list-comprehensions and map are optimized for (e.g. summing, or making a tree, etc.) -- at least efficient in terms of memory (not necessarily in terms of time, where I'd expect at worst a constant factor, barring some rare pathological garbage-collection hiccuping).

“蟒蛇主义”

我不喜欢“pythonic”这个词,因为我觉得pythonic在我眼里并不总是优雅的。然而,map和filter以及类似的函数(比如非常有用的itertools模块)在风格上可能被认为是非python的。

懒惰

在效率方面,像大多数函数式编程结构一样,MAP可以是LAZY,实际上在python中是LAZY。这意味着你可以这样做(在python3中),你的计算机将不会耗尽内存,丢失所有未保存的数据:

>>> map(str, range(10**100))
<map object at 0x2201d50>

试着用一个列表理解来做这个:

>>> [str(n) for n in range(10**100)]
# DO NOT TRY THIS AT HOME OR YOU WILL BE SAD #

请注意,列表推导式本身也是惰性的,但python选择将它们实现为非惰性的。尽管如此,python确实支持生成器表达式形式的惰性列表推导,如下所示:

>>> (str(n) for n in range(10**100))
<generator object <genexpr> at 0xacbdef>

你基本上可以想到[…]将生成器表达式传递给列表构造函数,如list(x for x in range(5))。

简单的例子

from operator import neg
print({x:x**2 for x in map(neg,range(5))})

print({x:x**2 for x in [-y for y in range(5)]})

print({x:x**2 for x in (-y for y in range(5))})

列表推导式是非惰性的,因此可能需要更多内存(除非使用生成器推导式)。方括号[…]通常会让事情变得很明显,尤其是当括号乱七八糟的时候。另一方面,有时您最终会变得冗长,例如在....中键入[x for x只要保持迭代器变量短小,如果不缩进代码,列表推导式通常会更清晰。但是你总是可以缩进你的代码。

print(
    {x:x**2 for x in (-y for y in range(5))}
)

或者分手:

rangeNeg5 = (-y for y in range(5))
print(
    {x:x**2 for x in rangeNeg5}
)

python3的效率比较

地图现在是惰性的:

% python3 -mtimeit -s 'xs=range(1000)' 'f=lambda x:x' 'z=map(f,xs)'
1000000 loops, best of 3: 0.336 usec per loop            ^^^^^^^^^

Therefore if you will not be using all your data, or do not know ahead of time how much data you need, map in python3 (and generator expressions in python2 or python3) will avoid calculating their values until the last moment necessary. Usually this will usually outweigh any overhead from using map. The downside is that this is very limited in python as opposed to most functional languages: you only get this benefit if you access your data left-to-right "in order", because python generator expressions can only be evaluated the order x[0], x[1], x[2], ....

然而,假设我们有一个预先创建的函数f,我们想要映射,我们忽略了map的惰性,立即强制用list(…)求值。我们得到了一些非常有趣的结果:

% python3 -mtimeit -s 'xs=range(1000)' 'f=lambda x:x' 'z=list(map(f,xs))'                                                                                                                                                
10000 loops, best of 3: 165/124/135 usec per loop        ^^^^^^^^^^^^^^^
                    for list(<map object>)

% python3 -mtimeit -s 'xs=range(1000)' 'f=lambda x:x' 'z=[f(x) for x in xs]'                                                                                                                                      
10000 loops, best of 3: 181/118/123 usec per loop        ^^^^^^^^^^^^^^^^^^
                    for list(<generator>), probably optimized

% python3 -mtimeit -s 'xs=range(1000)' 'f=lambda x:x' 'z=list(f(x) for x in xs)'                                                                                                                                    
1000 loops, best of 3: 215/150/150 usec per loop         ^^^^^^^^^^^^^^^^^^^^^^
                    for list(<generator>)

In results are in the form AAA/BBB/CCC where A was performed with on a circa-2010 Intel workstation with python 3.?.?, and B and C were performed with a circa-2013 AMD workstation with python 3.2.1, with extremely different hardware. The result seems to be that map and list comprehensions are comparable in performance, which is most strongly affected by other random factors. The only thing we can tell seems to be that, oddly, while we expect list comprehensions [...] to perform better than generator expressions (...), map is ALSO more efficient that generator expressions (again assuming that all values are evaluated/used).

重要的是要意识到这些测试假设一个非常简单的函数(恒等函数);不过,这也没关系,因为如果函数很复杂,那么与程序中的其他因素相比,性能开销就可以忽略不计了。(用f= x:x+x等简单的东西进行测试可能仍然很有趣)

如果你擅长阅读python程序集,你可以使用dis模块来查看幕后是否真的发生了什么:

>>> listComp = compile('[f(x) for x in xs]', 'listComp', 'eval')
>>> dis.dis(listComp)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x2511a48, file "listComp", line 1>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_NAME                0 (xs) 
              9 GET_ITER             
             10 CALL_FUNCTION            1 
             13 RETURN_VALUE         
>>> listComp.co_consts
(<code object <listcomp> at 0x2511a48, file "listComp", line 1>,)
>>> dis.dis(listComp.co_consts[0])
  1           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                18 (to 27) 
              9 STORE_FAST               1 (x) 
             12 LOAD_GLOBAL              0 (f) 
             15 LOAD_FAST                1 (x) 
             18 CALL_FUNCTION            1 
             21 LIST_APPEND              2 
             24 JUMP_ABSOLUTE            6 
        >>   27 RETURN_VALUE

 

>>> listComp2 = compile('list(f(x) for x in xs)', 'listComp2', 'eval')
>>> dis.dis(listComp2)
  1           0 LOAD_NAME                0 (list) 
              3 LOAD_CONST               0 (<code object <genexpr> at 0x255bc68, file "listComp2", line 1>) 
              6 MAKE_FUNCTION            0 
              9 LOAD_NAME                1 (xs) 
             12 GET_ITER             
             13 CALL_FUNCTION            1 
             16 CALL_FUNCTION            1 
             19 RETURN_VALUE         
>>> listComp2.co_consts
(<code object <genexpr> at 0x255bc68, file "listComp2", line 1>,)
>>> dis.dis(listComp2.co_consts[0])
  1           0 LOAD_FAST                0 (.0) 
        >>    3 FOR_ITER                17 (to 23) 
              6 STORE_FAST               1 (x) 
              9 LOAD_GLOBAL              0 (f) 
             12 LOAD_FAST                1 (x) 
             15 CALL_FUNCTION            1 
             18 YIELD_VALUE          
             19 POP_TOP              
             20 JUMP_ABSOLUTE            3 
        >>   23 LOAD_CONST               0 (None) 
             26 RETURN_VALUE

 

>>> evalledMap = compile('list(map(f,xs))', 'evalledMap', 'eval')
>>> dis.dis(evalledMap)
  1           0 LOAD_NAME                0 (list) 
              3 LOAD_NAME                1 (map) 
              6 LOAD_NAME                2 (f) 
              9 LOAD_NAME                3 (xs) 
             12 CALL_FUNCTION            2 
             15 CALL_FUNCTION            1 
             18 RETURN_VALUE 

似乎用……更好。语法比list(…)遗憾的是,map类对分解来说有点不透明,但我们可以通过速度测试来实现。

在某些情况下,Map可能会快得多(当您没有为此目的而使用lambda,而是在Map和列表推导中使用相同的函数时)。在其他情况下,列表推导式可能更快,大多数(不是所有)python主义者认为它们更直接、更清晰。

当使用完全相同的函数时,map的微小速度优势的例子:

$ python -m timeit -s'xs=range(10)' 'map(hex, xs)'
100000 loops, best of 3: 4.86 usec per loop

$ python -m timeit -s'xs=range(10)' '[hex(x) for x in xs]'
100000 loops, best of 3: 5.58 usec per loop

当map需要lambda时,性能比较完全颠倒的示例:

$ python -m timeit -s'xs=range(10)' 'map(lambda x: x+2, xs)'
100000 loops, best of 3: 4.24 usec per loop

$ python -m timeit -s'xs=range(10)' '[x+2 for x in xs]'
100000 loops, best of 3: 2.32 usec per loop

我用perfplot(我的一个项目)计算了一些结果。

正如其他人所注意到的,map实际上只返回一个迭代器,因此它是一个常量时间操作。当通过list()实现迭代器时,它与列表推导式相当。根据不同的表达方式,任何一种都可能有轻微的优势,但并不显著。

注意,像x ** 2这样的算术运算在NumPy中要快得多,特别是如果输入数据已经是NumPy数组的话。

hex:

X ** 2:


代码重现图:

import perfplot


def standalone_map(data):
    return map(hex, data)


def list_map(data):
    return list(map(hex, data))


def comprehension(data):
    return [hex(x) for x in data]


b = perfplot.bench(
    setup=lambda n: list(range(n)),
    kernels=[standalone_map, list_map, comprehension],
    n_range=[2 ** k for k in range(20)],
    equality_check=None,
)
b.save("out.png")
b.show()
import perfplot
import numpy as np


def standalone_map(data):
    return map(lambda x: x ** 2, data[0])


def list_map(data):
    return list(map(lambda x: x ** 2, data[0]))


def comprehension(data):
    return [x ** 2 for x in data[0]]


def numpy_asarray(data):
    return np.asarray(data[0]) ** 2


def numpy_direct(data):
    return data[1] ** 2


b = perfplot.bench(
    setup=lambda n: (list(range(n)), np.arange(n)),
    kernels=[standalone_map, list_map, comprehension, numpy_direct, numpy_asarray],
    n_range=[2 ** k for k in range(20)],
    equality_check=None,
)
b.save("out2.png")
b.show()

实际上,在Python 3语言中,map和list推导式的行为非常不同。看一下下面的Python 3程序:

def square(x):
    return x*x
squares = map(square, [1, 2, 3])
print(list(squares))
print(list(squares))

你可能希望它打印“[1,4,9]”这一行两次,但实际上它打印的是“[1,4,9]”后面跟着“[]”。当你第一次看到正方形时,它似乎表现为一个由三个元素组成的序列,但第二次则是一个空的序列。

在Python 2语言中,map返回一个普通的旧列表,就像两种语言中的列表推导一样。关键是Python 3中的map(以及Python 2中的imap)的返回值不是一个列表——它是一个迭代器!

与遍历列表不同,元素是在遍历迭代器时使用的。这就是为什么在最后一个print(list(squares))行中squares看起来是空的。

总结:

在处理迭代器时,必须记住它们是有状态的,并且在遍历时发生变化。 列表更容易预测,因为只有当你显式地改变它们时,它们才会改变;它们是容器。 还有一个好处:数字、字符串和元组甚至更可预测,因为它们根本不能改变;它们是价值观。

我运行了一个快速测试,比较了调用对象方法的三种方法。在这种情况下,时间差可以忽略不计,这是函数的问题(参见@Alex Martelli的回复)。在这里,我研究了以下方法:

# map_lambda
list(map(lambda x: x.add(), vals))

# map_operator
from operator import methodcaller
list(map(methodcaller("add"), vals))

# map_comprehension
[x.add() for x in vals]

我查看了整数(Python int)和浮点数(Python float)的列表(存储在变量vals中),以增加列表的大小。考虑以下虚拟类DummyNum:

class DummyNum(object):
    """Dummy class"""
    __slots__ = 'n',

    def __init__(self, n):
        self.n = n

    def add(self):
        self.n += 5

具体来说,就是add方法。__slots__属性是Python中的一个简单优化,用于定义类(属性)所需的总内存,减少内存大小。 这里是结果图。

如前所述,所使用的技术只会产生最小的差异,您应该以对您来说最易读的方式进行编码,或者在特定的情况下进行编码。在这种情况下,列表理解(map_comprehension技术)对于对象中的两种类型的添加是最快的,特别是对于较短的列表。

访问这个粘贴文件以获取用于生成图表和数据的源。