一般来说,有没有一种有效的方法可以知道Python中的迭代器中有多少个元素,而不用遍历每个元素并计数?


当前回答

关于你最初的问题,答案仍然是,在Python中通常没有办法知道迭代器的长度。

Given that you question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributer to PySAM and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.

其他回答

在计算机上有两种方法来获取“某物”的长度。

第一种方法是存储一个计数——这需要任何接触文件/数据的东西来修改它(或者一个只公开接口的类——但归根结底是一样的)。

另一种方法是遍历它并计算它有多大。

关于你最初的问题,答案仍然是,在Python中通常没有办法知道迭代器的长度。

Given that you question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributer to PySAM and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.

def count_iter(iter):
    sum = 0
    for _ in iter: sum += 1
    return sum

迭代器只是一个对象,它有一个指向下一个对象的指针,由某种缓冲区或流读取,它就像一个LinkedList,在那里你不知道你有多少东西,直到你遍历它们。迭代器是高效的,因为它们所做的一切都是通过引用而不是使用索引告诉你下一个是什么(但是正如你所看到的,你失去了查看下一个条目有多少的能力)。

一个简单的方法是使用内置函数set()或list():

答:set()在迭代器中没有重复项的情况下(最快的方式)

iter = zip([1,2,3],['a','b','c'])
print(len(set(iter)) # set(iter) = {(1, 'a'), (2, 'b'), (3, 'c')}
Out[45]: 3

or

iter = range(1,10)
print(len(set(iter)) # set(iter) = {1, 2, 3, 4, 5, 6, 7, 8, 9}
Out[47]: 9

B: list()以防迭代器中有重复的项

iter = (1,2,1,2,1,2,1,2)
print(len(list(iter)) # list(iter) = [1, 2, 1, 2, 1, 2, 1, 2]
Out[49]: 8
# compare with set function
print(len(set(iter)) # set(iter) = {1, 2}
Out[51]: 2