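Understanding generators in Python

I am reading the Python Cookbook at the moment and am currently looking at generators. I'm finding it hard to get my head around them.

As I come from a Java background, is there a Java equivalent? The book talks about "Producer / Consumer", but when I hear that I think of threading.

What is a generator and why would you use one? Without quoting any books, obviously (unless you can find a decent, simple answer straight from a book). Perhaps with examples, if you're feeling generous!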

Current answer

Performance difference:

macOS Big Sur 11.1
MacBook Pro (13-inch, M1, 2020)
Chip Apple M1
Memory 8 GB

Case 1

import random
import psutil # pip install psutil
import os
from datetime import datetime


def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(2 ** 20)
    return '{:.2f} MB'.format(mem)


names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

print('Memory (Before): {}'.format(memory_usage_psutil()))


def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result


t1 = datetime.now()
people = people_list(1000000)
t2 = datetime.now()


print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))

Output:

Memory (Before): 50.38 MB
Memory (After) : 1140.41 MB
Took 0:00:01.056423 Seconds

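This function returns a list of one million records. At the bottom I print the memory usage and the total time. Baseline memory usage was about 50.38 MB; after building the list of one million records it rose to about 1140.41 MB (an increase of roughly 1090 MB), and building the list took about 1.06 seconds.

Case 2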

import random
import psutil # pip install psutil
import os
from datetime import datetime

def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(2 ** 20)
    return '{:.2f} MB'.format(mem)


names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

print('Memory (Before): {}'.format(memory_usage_psutil()))

def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person


t1 = datetime.now()
people = people_generator(1000000)
t2 = datetime.now()

print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))

Output:

Memory (Before): 50.52 MB
Memory (After) : 50.73 MB
Took 0:00:00.000008 Seconds

After running this, memory is almost exactly the same, because the generator hasn't actually done anything yet: it isn't holding those million values in memory, it is waiting for me to ask for the next one. It took essentially no time because calling people_generator only creates the generator object; the body doesn't start running until the first value is requested, and it then pauses at each yield. Note that the timing above therefore measures only the creation of the generator, not the work of producing a million records; that work still happens when you iterate.

I think the generator version is a little more readable, and it can give you a big saving in memory, and in execution time as well if you don't end up needing every value. You can still use generators with comprehensions and generator expressions, so you don't lose anything in that area. Those are a few reasons to use generators and some of the advantages that come with them.
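To make the laziness concrete, here is a minimal sketch. The `noisy_generator` helper and the `events` list are hypothetical names of mine, not part of the benchmark above; they only record when the generator body really runs.

```python
events = []  # records when the generator body actually executes


def noisy_generator(n):
    # Hypothetical helper for illustration only.
    events.append('body started')
    for i in range(n):
        events.append('yielding {}'.format(i))
        yield i


gen = noisy_generator(3)
events_after_call = list(events)    # still empty: calling the function ran nothing

first = next(gen)                   # runs the body up to the first yield
events_after_first = list(events)

rest = list(gen)                    # drains the remaining values

print(events_after_call)
print(first, events_after_first)
print(rest)
```

Creating the generator is nearly free, and each value is only paid for when it is requested.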

2021-01-09 22:22:21

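Other answers

The only thing I can add to Stephan202's answer is a recommendation that you take a look at David Beazley's PyCon '08 presentation "Generator Tricks for Systems Programmers", which is the best single explanation of the how and why of generators that I've seen anywhere. It is what took me from "Python looks kind of fun" to "This is what I've been looking for." It's at http://www.dabeaz.com/generators/.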

2009-11-18 17:54:00

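For those with a background in programming languages and computing, I like to describe generators in terms of stack frames.

In many languages there is a stack, on top of which sits the current stack "frame". The stack frame includes the space allocated for the function's local variables, including the arguments passed in to that function.

When you call a function, the current point of execution (the "program counter" or equivalent) is pushed onto the stack and a new stack frame is created. Execution then transfers to the beginning of the function being called.

With regular functions, at some point the function returns a value and the stack is "popped". The function's stack frame is discarded and execution resumes at the previous location.

When a function is a generator, it can return a value without its stack frame being discarded, by using the yield statement. The values of the local variables and the program counter within the function are preserved. This allows the generator to be resumed later, with execution continuing from the yield statement, so that it can run more code and return another value.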
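A small sketch of the preserved frame, assuming a hypothetical `countdown` generator of my own. The standard-library function `inspect.getgeneratorstate` reports whether the generator has not started, is suspended at a yield, or has finished.

```python
import inspect


def countdown(start):
    # `remaining` is a local variable; it survives across yields because
    # the generator's frame is kept alive while the generator is suspended.
    remaining = start
    while remaining > 0:
        yield remaining
        remaining -= 1


gen = countdown(3)
state_before = inspect.getgeneratorstate(gen)     # not started yet
first = next(gen)                                 # runs up to the first yield
state_suspended = inspect.getgeneratorstate(gen)  # paused at the yield
second = next(gen)                                # resumes right after the yield
remaining_values = list(gen)                      # runs the generator to completion
state_after = inspect.getgeneratorstate(gen)      # frame has been discarded

print(state_before, first, state_suspended, second, remaining_values, state_after)
```

Between the two `next` calls nothing resets `remaining`, which is the "state is kept" behaviour described above.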

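Before Python 2.5 this was all generators did. Python 2.5 added the ability to pass values back in to the generator as well. The passed-in value becomes the result of the yield expression at which the generator had temporarily returned control (and a value).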
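As a rough illustration of passing values back in, here is a sketch using the generator method `send`. The `running_average` generator is a hypothetical example of mine, not something from the answer.

```python
def running_average():
    # Hypothetical example: receives numbers via send() and yields the
    # average of everything received so far.
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # the sent value becomes the result of yield
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)                 # prime the generator: run it to the first yield
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(30)
avg.close()               # stop the otherwise endless loop

print(a1, a2, a3)
```

The initial `next(avg)` is needed because a value can only be sent to a generator that is already paused at a yield.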

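The key advantage of generators is that the "state" of the function is preserved, unlike regular functions, where every time the stack frame is discarded you lose all of that "state". A secondary advantage is that some of the function call overhead (creating and deleting stack frames) is avoided, though this is usually a minor benefit.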

2009-12-19 10:50:33

Generators can be thought of as shorthand for creating an iterator. They behave like a Java Iterator. Example:

>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> next(g)   # the built-in next() works in Python 2.6+ and Python 3
0
>>> next(g)
1
>>> next(g)
2
>>> list(g)   # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> next(g)   # iterator is at the end; calling next again raises StopIteration
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
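To show what "shorthand for creating an iterator" means, here is a sketch comparing a hand-written iterator class, roughly in the style of a Java Iterator, with a generator function. Both `CountUpTo` and `count_up_to` are hypothetical names of mine.

```python
class CountUpTo:
    # Hand-written iterator: the state lives in instance attributes.
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


def count_up_to(limit):
    # The same behaviour as a generator: the state lives in local variables.
    current = 0
    while current < limit:
        yield current
        current += 1


from_class = list(CountUpTo(5))
from_generator = list(count_up_to(5))
print(from_class, from_generator)
```

The generator version lets Python supply `__iter__`, `__next__` and the `StopIteration` bookkeeping for you.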

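Hope that helps, and is what you are looking for.

Update:

As many other answers show, there are different ways to create a generator. You can use the parentheses syntax as in my example above, or you can use yield. Another interesting feature is that generators can be "infinite": iterators that never stop: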

>>> def infinite_gen():
...     n = 0
...     while True:
...         yield n
...         n = n + 1
... 
>>> g = infinite_gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
...
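An infinite generator is only useful if the consumer decides when to stop. Here is a minimal sketch of two common ways, using `itertools.islice` from the standard library and a plain `break`; the variable names are mine.

```python
from itertools import islice


def infinite_gen():
    n = 0
    while True:
        yield n
        n = n + 1


# Take a fixed number of values from the endless stream.
first_five = list(islice(infinite_gen(), 5))

# Or stop on a condition instead of a fixed count.
small_squares = []
for n in infinite_gen():
    if n * n >= 20:
        break
    small_squares.append(n * n)

print(first_five)
print(small_squares)
```

Calling `list(infinite_gen())` directly would never finish, so always bound the consumption in some way.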

2009-11-18 13:53:35

I use this piece of code to explain 3 key concepts about generators:

def numbers():
    for i in range(10):
        yield i

gen = numbers()  # this line only returns a generator object; it does not run the code inside numbers

for i in gen:  # iterating over the generator runs its body and prints the values
    print(i)

# the generator is now exhausted

for i in gen:  # so this for block does not print anything
    print(i)
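If you need the values a second time, a sketch of the usual options is to call the generator function again or to store the values in a list. The variable names below are mine.

```python
def numbers():
    for i in range(10):
        yield i


gen = numbers()
first_pass = list(gen)        # consumes the generator
second_pass = list(gen)       # already exhausted, so this is empty
fresh_pass = list(numbers())  # a new call gives a new, independent generator

print(first_pass)
print(second_pass)
print(fresh_pass)
```

Storing the values in a list gives up the memory advantage, so it only makes sense when the sequence is small.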

2020-02-13 21:13:04

There is no equivalent in Java.

Here is a somewhat contrived example:

#! /usr/bin/python
def mygen(n):
    x = 0
    while x < n:
        x = x + 1
        if x % 3 == 0:
            yield x

for a in mygen(100):
    print(a)  # print() with a single argument works in Python 2 and 3

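There is a loop in the generator that counts x from 0 up to n, and whenever the incremented value is a multiple of 3, the generator yields it.

On each iteration of the for loop the generator is resumed. The first time, it starts from the beginning of the function; after that, it continues from the point where it last yielded.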
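The for loop above can be spelled out by hand to show when the generator is resumed. This is a sketch of roughly what the loop does, not the exact interpreter mechanics; `collected` is a name of mine.

```python
def mygen(n):
    x = 0
    while x < n:
        x = x + 1
        if x % 3 == 0:
            yield x


collected = []
iterator = iter(mygen(10))
while True:
    try:
        value = next(iterator)   # resume the generator until its next yield
    except StopIteration:        # raised when the generator function returns
        break
    collected.append(value)

print(collected)
```

Each call to `next` runs the generator only as far as the next yield, which is why the loop body and the generator body take turns.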

2009-11-18 13:58:13
