我如何并行遍历两个列表?

Building on the answer by @unutbu, I have compared the iteration performance of two identical lists when using Python 3.6's zip() functions, Python's enumerate() function, using a manual counter (see count() function), using an index-list, and during a special scenario where the elements of one of the two lists (either foo or bar) may be used to index the other list. Their performances for printing and creating a new list, respectively, were investigated using the timeit() function where the number of repetitions used was 1000 times. One of the Python scripts that I had created to perform these investigations is given below. The sizes of the foo and bar lists had ranged from 10 to 1,000,000 elements.

结果:

For printing purposes: The performances of all the considered approaches were observed to be approximately similar to the zip() function, after factoring an accuracy tolerance of +/-5%. An exception occurred when the list size was smaller than 100 elements. In such a scenario, the index-list method was slightly slower than the zip() function while the enumerate() function was ~9% faster. The other methods yielded similar performance to the zip() function. For creating lists: Two types of list creation approaches were explored: using the (a) list.append() method and (b) list comprehension. After factoring an accuracy tolerance of +/-5%, for both of these approaches, the zip() function was found to perform faster than the enumerate() function, than using a list-index, than using a manual counter. The performance gain by the zip() function in these comparisons can be 5% to 60% faster. Interestingly, using the element of foo to index bar can yield equivalent or faster performances (5% to 20%) than the zip() function.

这些结果的意义:

程序员必须确定每个操作有意义或有意义的计算时间。

For example, for printing purposes, if this time criterion is 1 second, i.e. 10**0 sec, then looking at the y-axis of the graph that is on the left at 1 sec and projecting it horizontally until it reaches the monomials curves, we see that lists sizes that are more than 144 elements will incur significant compute cost and significance to the programmer. That is, any performance gained by the approaches mentioned in this investigation for smaller list sizes will be insignificant to the programmer. The programmer will conclude that the performance of the zip() function to iterate print statements is similar to the other approaches.

结论

通过使用zip()函数在列表创建期间并行遍历两个列表，可以获得显著的性能。当并行遍历两个列表以打印出两个列表的元素时，zip()函数将产生与enumerate()函数类似的性能，如使用手动计数器变量、使用索引列表，以及在两个列表之一(foo或bar)的元素可用于索引另一个列表的特殊场景中。

用于研究列表创建的Python 3.6脚本。

import timeit
import matplotlib.pyplot as plt
import numpy as np


def test_zip( foo, bar ):
    store = []
    for f, b in zip(foo, bar):
        #print(f, b)
        store.append( (f, b) )

def test_enumerate( foo, bar ):
    store = []
    for n, f in enumerate( foo ):
        #print(f, bar[n])
        store.append( (f, bar[n]) )

def test_count( foo, bar ):
    store = []
    count = 0
    for f in foo:
        #print(f, bar[count])
        store.append( (f, bar[count]) )
        count += 1

def test_indices( foo, bar, indices ):
    store = []
    for i in indices:
        #print(foo[i], bar[i])
        store.append( (foo[i], bar[i]) )

def test_existing_list_indices( foo, bar ):
    store = []
    for f in foo:
        #print(f, bar[f])
        store.append( (f, bar[f]) )


list_sizes = [ 10, 100, 1000, 10000, 100000, 1000000 ]
tz = []
te = []
tc = []
ti = []
tii= []

tcz = []
tce = []
tci = []
tcii= []

for a in list_sizes:
    foo = [ i for i in range(a) ]
    bar = [ i for i in range(a) ]
    indices = [ i for i in range(a) ]
    reps = 1000

    tz.append( timeit.timeit( 'test_zip( foo, bar )',
                              'from __main__ import test_zip, foo, bar',
                              number=reps
                              )
               )
    te.append( timeit.timeit( 'test_enumerate( foo, bar )',
                              'from __main__ import test_enumerate, foo, bar',
                              number=reps
                              )
               )
    tc.append( timeit.timeit( 'test_count( foo, bar )',
                              'from __main__ import test_count, foo, bar',
                              number=reps
                              )
               )
    ti.append( timeit.timeit( 'test_indices( foo, bar, indices )',
                              'from __main__ import test_indices, foo, bar, indices',
                              number=reps
                              )
               )
    tii.append( timeit.timeit( 'test_existing_list_indices( foo, bar )',
                               'from __main__ import test_existing_list_indices, foo, bar',
                               number=reps
                               )
                )

    tcz.append( timeit.timeit( '[(f, b) for f, b in zip(foo, bar)]',
                               'from __main__ import foo, bar',
                               number=reps
                               )
                )
    tce.append( timeit.timeit( '[(f, bar[n]) for n, f in enumerate( foo )]',
                               'from __main__ import foo, bar',
                               number=reps
                               )
                )
    tci.append( timeit.timeit( '[(foo[i], bar[i]) for i in indices ]',
                               'from __main__ import foo, bar, indices',
                               number=reps
                               )
                )
    tcii.append( timeit.timeit( '[(f, bar[f]) for f in foo ]',
                                'from __main__ import foo, bar',
                                number=reps
                                )
                 )

print( f'te  = {te}' )
print( f'ti  = {ti}' )
print( f'tii = {tii}' )
print( f'tc  = {tc}' )
print( f'tz  = {tz}' )

print( f'tce  = {te}' )
print( f'tci  = {ti}' )
print( f'tcii = {tii}' )
print( f'tcz  = {tz}' )

fig, ax = plt.subplots( 2, 2 )
ax[0,0].plot( list_sizes, te, label='enumerate()', marker='.' )
ax[0,0].plot( list_sizes, ti, label='index-list', marker='.' )
ax[0,0].plot( list_sizes, tii, label='element of foo', marker='.' )
ax[0,0].plot( list_sizes, tc, label='count()', marker='.' )
ax[0,0].plot( list_sizes, tz, label='zip()', marker='.')
ax[0,0].set_xscale('log')
ax[0,0].set_yscale('log')
ax[0,0].set_xlabel('List Size')
ax[0,0].set_ylabel('Time (s)')
ax[0,0].legend()
ax[0,0].grid( b=True, which='major', axis='both')
ax[0,0].grid( b=True, which='minor', axis='both')

ax[0,1].plot( list_sizes, np.array(te)/np.array(tz), label='enumerate()', marker='.' )
ax[0,1].plot( list_sizes, np.array(ti)/np.array(tz), label='index-list', marker='.' )
ax[0,1].plot( list_sizes, np.array(tii)/np.array(tz), label='element of foo', marker='.' )
ax[0,1].plot( list_sizes, np.array(tc)/np.array(tz), label='count()', marker='.' )
ax[0,1].set_xscale('log')
ax[0,1].set_xlabel('List Size')
ax[0,1].set_ylabel('Performances ( vs zip() function )')
ax[0,1].legend()
ax[0,1].grid( b=True, which='major', axis='both')
ax[0,1].grid( b=True, which='minor', axis='both')

ax[1,0].plot( list_sizes, tce, label='list comprehension using enumerate()',  marker='.')
ax[1,0].plot( list_sizes, tci, label='list comprehension using index-list()',  marker='.')
ax[1,0].plot( list_sizes, tcii, label='list comprehension using element of foo',  marker='.')
ax[1,0].plot( list_sizes, tcz, label='list comprehension using zip()',  marker='.')
ax[1,0].set_xscale('log')
ax[1,0].set_yscale('log')
ax[1,0].set_xlabel('List Size')
ax[1,0].set_ylabel('Time (s)')
ax[1,0].legend()
ax[1,0].grid( b=True, which='major', axis='both')
ax[1,0].grid( b=True, which='minor', axis='both')

ax[1,1].plot( list_sizes, np.array(tce)/np.array(tcz), label='enumerate()', marker='.' )
ax[1,1].plot( list_sizes, np.array(tci)/np.array(tcz), label='index-list', marker='.' )
ax[1,1].plot( list_sizes, np.array(tcii)/np.array(tcz), label='element of foo', marker='.' )
ax[1,1].set_xscale('log')
ax[1,1].set_xlabel('List Size')
ax[1,1].set_ylabel('Performances ( vs zip() function )')
ax[1,1].legend()
ax[1,1].grid( b=True, which='major', axis='both')
ax[1,1].grid( b=True, which='minor', axis='both')

plt.show()

2020-06-19 23:15:11

如果你想在使用zip()遍历多个列表时保留下标，你可以将zip对象传递给enumerate():

for i, (f, b) in enumerate(zip(foo, bar)):
    # do something

例如，如果你想打印出两个列表中值不同的位置，你可以这样做。

foo, bar = ['a', 'b', 'c'], ['a', 'a', 'c']

for i, (f, b) in enumerate(zip(foo, bar)):
    if f != b:
        print(f"items at index {i} are different")
    
# items at index 1 are different

如果您的列表没有相同的长度，则zip()迭代直到最短的列表结束。如果希望迭代到最长的列表结束，请使用内置itertools模块中的zip_longest。默认情况下，它会用None填充缺失的值(但是您可以使用fillvalue参数将其更改为任何您想要的值)。

from itertools import zip_longest
for f, b in zip_longest(foo, bar):
    # do something

2022-09-09 06:09:26

Python 3

for f, b in zip(foo, bar):
    print(f, b)

当较短的foo或bar停止时，Zip停止。

在Python 3中，zip 返回元组的迭代器，如itertools。Python2中的izip。得到一份清单对于元组，使用list(zip(foo, bar))。和压缩，直到两个迭代器都为累了，你会用 itertools.zip_longest。

Python 2

在Python 2中，zip 返回一个元组列表。当foo和bar不是巨大的时候，这很好。如果它们都是巨大的，那么形成zip(foo,bar)是不必要的巨大临时变量，应由itertools替换。izip或 itertools。Izip_longest，它返回一个迭代器而不是一个列表。

import itertools
for f,b in itertools.izip(foo,bar):
    print(f,b)
for f,b in itertools.izip_longest(foo,bar):
    print(f,b)

当foo或bar耗尽时，Izip停止。当foo和bar耗尽时，Izip_longest停止。当较短的迭代器耗尽时，izip_longest将在该迭代器对应的位置生成一个具有None的元组。如果你愿意，你也可以设置一个不同的填充值。请看这里了解完整的故事。

还要注意，zip及其类似zip的兄弟可以接受任意数量的可迭代对象作为参数。例如,

for num, cheese, color in zip([1,2,3], ['manchego', 'stilton', 'brie'], 
                              ['red', 'blue', 'green']):
    print('{} {} {}'.format(num, color, cheese))

打印

1 red manchego
2 blue stilton
3 green brie

2009-11-02 21:28:27

Building on the answer by @unutbu, I have compared the iteration performance of two identical lists when using Python 3.6's zip() functions, Python's enumerate() function, using a manual counter (see count() function), using an index-list, and during a special scenario where the elements of one of the two lists (either foo or bar) may be used to index the other list. Their performances for printing and creating a new list, respectively, were investigated using the timeit() function where the number of repetitions used was 1000 times. One of the Python scripts that I had created to perform these investigations is given below. The sizes of the foo and bar lists had ranged from 10 to 1,000,000 elements.

结果:

For printing purposes: The performances of all the considered approaches were observed to be approximately similar to the zip() function, after factoring an accuracy tolerance of +/-5%. An exception occurred when the list size was smaller than 100 elements. In such a scenario, the index-list method was slightly slower than the zip() function while the enumerate() function was ~9% faster. The other methods yielded similar performance to the zip() function. For creating lists: Two types of list creation approaches were explored: using the (a) list.append() method and (b) list comprehension. After factoring an accuracy tolerance of +/-5%, for both of these approaches, the zip() function was found to perform faster than the enumerate() function, than using a list-index, than using a manual counter. The performance gain by the zip() function in these comparisons can be 5% to 60% faster. Interestingly, using the element of foo to index bar can yield equivalent or faster performances (5% to 20%) than the zip() function.

这些结果的意义:

程序员必须确定每个操作有意义或有意义的计算时间。

For example, for printing purposes, if this time criterion is 1 second, i.e. 10**0 sec, then looking at the y-axis of the graph that is on the left at 1 sec and projecting it horizontally until it reaches the monomials curves, we see that lists sizes that are more than 144 elements will incur significant compute cost and significance to the programmer. That is, any performance gained by the approaches mentioned in this investigation for smaller list sizes will be insignificant to the programmer. The programmer will conclude that the performance of the zip() function to iterate print statements is similar to the other approaches.

结论

通过使用zip()函数在列表创建期间并行遍历两个列表，可以获得显著的性能。当并行遍历两个列表以打印出两个列表的元素时，zip()函数将产生与enumerate()函数类似的性能，如使用手动计数器变量、使用索引列表，以及在两个列表之一(foo或bar)的元素可用于索引另一个列表的特殊场景中。

用于研究列表创建的Python 3.6脚本。

import timeit
import matplotlib.pyplot as plt
import numpy as np


def test_zip( foo, bar ):
    store = []
    for f, b in zip(foo, bar):
        #print(f, b)
        store.append( (f, b) )

def test_enumerate( foo, bar ):
    store = []
    for n, f in enumerate( foo ):
        #print(f, bar[n])
        store.append( (f, bar[n]) )

def test_count( foo, bar ):
    store = []
    count = 0
    for f in foo:
        #print(f, bar[count])
        store.append( (f, bar[count]) )
        count += 1

def test_indices( foo, bar, indices ):
    store = []
    for i in indices:
        #print(foo[i], bar[i])
        store.append( (foo[i], bar[i]) )

def test_existing_list_indices( foo, bar ):
    store = []
    for f in foo:
        #print(f, bar[f])
        store.append( (f, bar[f]) )


list_sizes = [ 10, 100, 1000, 10000, 100000, 1000000 ]
tz = []
te = []
tc = []
ti = []
tii= []

tcz = []
tce = []
tci = []
tcii= []

for a in list_sizes:
    foo = [ i for i in range(a) ]
    bar = [ i for i in range(a) ]
    indices = [ i for i in range(a) ]
    reps = 1000

    tz.append( timeit.timeit( 'test_zip( foo, bar )',
                              'from __main__ import test_zip, foo, bar',
                              number=reps
                              )
               )
    te.append( timeit.timeit( 'test_enumerate( foo, bar )',
                              'from __main__ import test_enumerate, foo, bar',
                              number=reps
                              )
               )
    tc.append( timeit.timeit( 'test_count( foo, bar )',
                              'from __main__ import test_count, foo, bar',
                              number=reps
                              )
               )
    ti.append( timeit.timeit( 'test_indices( foo, bar, indices )',
                              'from __main__ import test_indices, foo, bar, indices',
                              number=reps
                              )
               )
    tii.append( timeit.timeit( 'test_existing_list_indices( foo, bar )',
                               'from __main__ import test_existing_list_indices, foo, bar',
                               number=reps
                               )
                )

    tcz.append( timeit.timeit( '[(f, b) for f, b in zip(foo, bar)]',
                               'from __main__ import foo, bar',
                               number=reps
                               )
                )
    tce.append( timeit.timeit( '[(f, bar[n]) for n, f in enumerate( foo )]',
                               'from __main__ import foo, bar',
                               number=reps
                               )
                )
    tci.append( timeit.timeit( '[(foo[i], bar[i]) for i in indices ]',
                               'from __main__ import foo, bar, indices',
                               number=reps
                               )
                )
    tcii.append( timeit.timeit( '[(f, bar[f]) for f in foo ]',
                                'from __main__ import foo, bar',
                                number=reps
                                )
                 )

print( f'te  = {te}' )
print( f'ti  = {ti}' )
print( f'tii = {tii}' )
print( f'tc  = {tc}' )
print( f'tz  = {tz}' )

print( f'tce  = {te}' )
print( f'tci  = {ti}' )
print( f'tcii = {tii}' )
print( f'tcz  = {tz}' )

fig, ax = plt.subplots( 2, 2 )
ax[0,0].plot( list_sizes, te, label='enumerate()', marker='.' )
ax[0,0].plot( list_sizes, ti, label='index-list', marker='.' )
ax[0,0].plot( list_sizes, tii, label='element of foo', marker='.' )
ax[0,0].plot( list_sizes, tc, label='count()', marker='.' )
ax[0,0].plot( list_sizes, tz, label='zip()', marker='.')
ax[0,0].set_xscale('log')
ax[0,0].set_yscale('log')
ax[0,0].set_xlabel('List Size')
ax[0,0].set_ylabel('Time (s)')
ax[0,0].legend()
ax[0,0].grid( b=True, which='major', axis='both')
ax[0,0].grid( b=True, which='minor', axis='both')

ax[0,1].plot( list_sizes, np.array(te)/np.array(tz), label='enumerate()', marker='.' )
ax[0,1].plot( list_sizes, np.array(ti)/np.array(tz), label='index-list', marker='.' )
ax[0,1].plot( list_sizes, np.array(tii)/np.array(tz), label='element of foo', marker='.' )
ax[0,1].plot( list_sizes, np.array(tc)/np.array(tz), label='count()', marker='.' )
ax[0,1].set_xscale('log')
ax[0,1].set_xlabel('List Size')
ax[0,1].set_ylabel('Performances ( vs zip() function )')
ax[0,1].legend()
ax[0,1].grid( b=True, which='major', axis='both')
ax[0,1].grid( b=True, which='minor', axis='both')

ax[1,0].plot( list_sizes, tce, label='list comprehension using enumerate()',  marker='.')
ax[1,0].plot( list_sizes, tci, label='list comprehension using index-list()',  marker='.')
ax[1,0].plot( list_sizes, tcii, label='list comprehension using element of foo',  marker='.')
ax[1,0].plot( list_sizes, tcz, label='list comprehension using zip()',  marker='.')
ax[1,0].set_xscale('log')
ax[1,0].set_yscale('log')
ax[1,0].set_xlabel('List Size')
ax[1,0].set_ylabel('Time (s)')
ax[1,0].legend()
ax[1,0].grid( b=True, which='major', axis='both')
ax[1,0].grid( b=True, which='minor', axis='both')

ax[1,1].plot( list_sizes, np.array(tce)/np.array(tcz), label='enumerate()', marker='.' )
ax[1,1].plot( list_sizes, np.array(tci)/np.array(tcz), label='index-list', marker='.' )
ax[1,1].plot( list_sizes, np.array(tcii)/np.array(tcz), label='element of foo', marker='.' )
ax[1,1].set_xscale('log')
ax[1,1].set_xlabel('List Size')
ax[1,1].set_ylabel('Performances ( vs zip() function )')
ax[1,1].legend()
ax[1,1].grid( b=True, which='major', axis='both')
ax[1,1].grid( b=True, which='minor', axis='both')

plt.show()

2020-06-19 23:15:11

你应该使用'zip'函数。下面是您自己的zip函数的示例

def custom_zip(seq1, seq2):
    it1 = iter(seq1)
    it2 = iter(seq2)
    while True:
        yield next(it1), next(it2)

2017-04-23 11:09:20

您可以使用推导式将第n个元素捆绑到元组或列表中，然后使用生成器函数将它们传递出去。

def iterate_multi(*lists):
    for i in range(min(map(len,lists))):
        yield tuple(l[i] for l in lists)

for l1, l2, l3 in iterate_multi([1,2,3],[4,5,6],[7,8,9]):
    print(str(l1)+","+str(l2)+","+str(l3))

2020-03-25 02:00:15

我如何并行遍历两个列表?

推荐文章

最新文章

标签