迭代器和生成器之间的区别是什么?举一些例子来说明你在什么时候使用每种情况会很有帮助。


当前回答

如果没有另外两个概念:可迭代对象和迭代器协议,就很难回答这个问题。

What is difference between iterator and iterable? Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice: One difference is that iterator has __next__ method, iterable does not. Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself. This can help to distinguish iterator and iterable in practice.

>>> x = [1, 2, 3]
>>> dir(x) 
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator

What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here). How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.

我可以给你一些更有趣的例子,展示这些概念在实践中的一些令人困惑的用法:

in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator); if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology; we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);

其他回答

iterator是一个更通用的概念:任何具有__next__方法(Python 2中的next)和__iter__方法且返回self的对象。

每个生成器都是迭代器,反之亦然。生成器是通过调用具有一个或多个yield表达式(yield语句,在Python 2.5及更早版本中)的函数来构建的,它是一个满足上一段对迭代器定义的对象。

当你需要一个具有复杂状态维护行为的类,或者想公开__next__(以及__iter__和__init__)之外的其他方法时,你可能想使用自定义迭代器,而不是生成器。大多数情况下,一个生成器(有时,对于足够简单的需求,一个生成器表达式)就足够了,而且编码更简单,因为状态维护(在合理的范围内)基本上是由框架挂起和恢复“为您完成”的。

例如,一个生成器,如:

def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)

或等效的生成器表达式(genexp)

generator = (i*i for i in range(a, b))

将需要更多的代码来构建自定义迭代器:

class Squares(object):
    def __init__(self, start, stop):
       self.start = start
       self.stop = stop
    def __iter__(self): return self
    def __next__(self): # next in Python 2
       if self.start >= self.stop:
           raise StopIteration
       current = self.start * self.start
       self.start += 1
       return current

iterator = Squares(a, b)

但是,当然,使用类Squares,你可以很容易地提供额外的方法。

def current(self):
    return self.start

如果您的应用程序中确实需要这些额外的功能。

每个人都有一个非常漂亮和冗长的答案,我真的很感激。我只是想给那些在概念上还不太清楚的人一个简短的回答:

If you create your own iterator, it is a little bit involved - you have to create a class and at least implement the iter and the next methods. But what if you don't want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return "something" which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator

希望这能澄清一点。

如果没有另外两个概念:可迭代对象和迭代器协议,就很难回答这个问题。

What is difference between iterator and iterable? Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice: One difference is that iterator has __next__ method, iterable does not. Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself. This can help to distinguish iterator and iterable in practice.

>>> x = [1, 2, 3]
>>> dir(x) 
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator

What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here). How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.

我可以给你一些更有趣的例子,展示这些概念在实践中的一些令人困惑的用法:

in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator); if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology; we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);

对于相同的数据,你可以比较两种方法:

def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)

print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

此外,如果检查内存占用,生成器占用的内存要少得多,因为它不需要同时将所有值存储在内存中。

之前的回答忽略了这一点:生成器有close方法,而典型的迭代器没有。close方法在生成器中触发StopIteration异常,该异常可能在迭代器中的finally子句中被捕获,以获得运行一些清理的机会。这种抽象使得它在大型迭代器中比简单迭代器更有用。可以像关闭文件一样关闭生成器,而不必担心下面有什么。

也就是说,我个人对第一个问题的回答是:iteratable只有__iter__方法,典型的迭代器只有__next__方法,生成器既有__iter__又有__next__,还有一个附加的close。

For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.

对于控制流而言,生成器是一个与承诺同样重要的概念:两者都是抽象的和可组合的。