我正在阅读Python烹饪书,目前正在研究生成器。我觉得很难理解。
由于我有Java背景,那么Java中是否有对等的语言?这本书讲的是“生产者/消费者”,但当我听到这个词时,我想到了线程。
什么是发电机,你为什么要用它?显然,没有引用任何书籍(除非你能直接从一本书中找到一个体面、简单的答案)。如果你慷慨的话,还可以举个例子!
我正在阅读Python烹饪书,目前正在研究生成器。我觉得很难理解。
由于我有Java背景,那么Java中是否有对等的语言?这本书讲的是“生产者/消费者”,但当我听到这个词时,我想到了线程。
什么是发电机,你为什么要用它?显然,没有引用任何书籍(除非你能直接从一本书中找到一个体面、简单的答案)。如果你慷慨的话,还可以举个例子!
当前回答
我给出了这段代码,解释了关于生成器的3个关键概念:
def numbers():
for i in range(10):
yield i
gen = numbers() #this line only returns a generator object, it does not run the code defined inside numbers
for i in gen: #we iterate over the generator and the values are printed
print(i)
#the generator is now empty
for i in gen: #so this for block does not print anything
print(i)
其他回答
使用列表推导式的经验表明,它们在Python中具有广泛的实用性。然而,许多用例不需要在内存中创建一个完整的列表。相反,它们每次只需要迭代一个元素。
例如,下面的求和代码将在内存中构建一个完整的方块列表,遍历这些值,当引用不再需要时,删除列表:
Sum ([x*x for x in range(10)])
通过使用生成器表达式来节省内存:
求和(x*x for x in range(10))
容器对象的构造函数也有类似的好处:
s = Set(word for line in page for word in line.split())
d = dict( (k, func(k)) for k in keylist)
生成器表达式对于sum(), min()和max()这样的函数特别有用,它们将可迭代输入减少为单个值:
max(len(line) for line in file if line.strip())
more
首先,术语生成器最初在Python中定义不清,导致了很多混乱。你可能指的是迭代器和可迭代对象(参见这里)。然后在Python中还有生成器函数(返回生成器对象)、生成器对象(迭代器)和生成器表达式(求值为生成器对象)。
根据generator的术语表条目,现在的官方术语似乎是generator是“generator function”的缩写。在过去,文档对术语的定义不一致,但幸运的是,这个问题已经得到了解决。
在没有进一步说明的情况下,精确地避免使用术语“生成器”可能仍然是一个好主意。
我给出了这段代码,解释了关于生成器的3个关键概念:
def numbers():
for i in range(10):
yield i
gen = numbers() #this line only returns a generator object, it does not run the code defined inside numbers
for i in gen: #we iterate over the generator and the values are printed
print(i)
#the generator is now empty
for i in gen: #so this for block does not print anything
print(i)
生成器可以看作是创建迭代器的简写。它们的行为类似于Java迭代器。例子:
>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> list(g) # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> g.next() # iterator is at the end; calling next again will throw
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
希望这有助于/是你正在寻找的。
更新:
正如许多其他答案所示,有不同的方法来创建生成器。你可以像上面的例子一样使用圆括号语法,也可以使用yield。另一个有趣的特性是生成器可以是“无限的”——迭代器不会停止:
>>> def infinite_gen():
... n = 0
... while True:
... yield n
... n = n + 1
...
>>> g = infinite_gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
...
性能差异:
macOS Big Sur 11.1
MacBook Pro (13-inch, M1, 2020)
Chip Apple M1
Memory 8gb
案例1
import random
import psutil # pip install psutil
import os
from datetime import datetime
def memory_usage_psutil():
# return the memory usage in MB
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / float(2 ** 20)
return '{:.2f} MB'.format(mem)
names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']
print('Memory (Before): {}'.format(memory_usage_psutil()))
def people_list(num_people):
result = []
for i in range(num_people):
person = {
'id': i,
'name': random.choice(names),
'major': random.choice(majors)
}
result.append(person)
return result
t1 = datetime.now()
people = people_list(1000000)
t2 = datetime.now()
print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
输出:
Memory (Before): 50.38 MB
Memory (After) : 1140.41 MB
Took 0:00:01.056423 Seconds
函数,返回一个包含100万个结果的列表。 在底部,我打印出内存使用情况和总时间。 基本内存使用大约是50.38兆字节,在我创建了100万条记录的列表之后,你可以看到它增加了近1140.41兆字节,花了1.1秒。
案例2
import random
import psutil # pip install psutil
import os
from datetime import datetime
def memory_usage_psutil():
# return the memory usage in MB
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / float(2 ** 20)
return '{:.2f} MB'.format(mem)
names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']
print('Memory (Before): {}'.format(memory_usage_psutil()))
def people_generator(num_people):
for i in range(num_people):
person = {
'id': i,
'name': random.choice(names),
'major': random.choice(majors)
}
yield person
t1 = datetime.now()
people = people_generator(1000000)
t2 = datetime.now()
print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
输出:
Memory (Before): 50.52 MB
Memory (After) : 50.73 MB
Took 0:00:00.000008 Seconds
After I ran this that the memory is almost exactly the same and that's because the generator hasn't actually done anything yet it's not holding those million values in memory it's waiting for me to grab the next one. Basically it didn't take any time because as soon as it gets to the first yield statement it stops. I think that it is generator a little bit more readable and it also gives you big performance boosts not only with execution time but with memory. As well and you can still use all of the comprehensions and this generator expression here so you don't lose anything in that area. So those are a few reasons why you would use generators and also some of the advantages that come along with that.