在Python中扁平化一个浅列表

是否有一种简单的方法可以用列表理解来扁平化一个可迭代对象列表，或者如果没有，你们都认为什么是扁平化这样一个浅列表的最好方法，平衡性能和可读性?

我尝试用一个嵌套的列表理解来扁平化这样一个列表，就像这样:

[image for image in menuitem for menuitem in list_of_menuitems]

但是我遇到了NameError的麻烦，因为名字‘menuitem’没有定义。在google和Stack Overflow上搜索之后，我用一个reduce语句得到了想要的结果:

reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))

但是这个方法是相当不可读的，因为我需要调用list(x)因为x是Django QuerySet对象。

结论:

感谢每个为这个问题做出贡献的人。以下是我所学到的一份总结。我也把它变成了一个社区维基，以防其他人想要添加或纠正这些观察。

我原来的reduce语句是多余的，最好这样写:

>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))

这是嵌套列表理解的正确语法(聪明的总结dF!)

>>> [image for mi in list_of_menuitems for image in mi]

但这两种方法都不如使用itertools.chain有效:

>>> from itertools import chain
>>> list(chain(*list_of_menuitems))

正如@cdleary所指出的那样，使用chain.from_iterable来避免*操作符魔法可能是更好的风格，就像这样:

>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
>>> [1, 2, 3, 5, 89, 6]

当前回答

Sum (list_of_lists，[])将使它变平。

l = [['image00', 'image01'], ['image10'], []]
print sum(l,[]) # prints ['image00', 'image01', 'image10']

2014-02-27 19:03:12

其他回答

如果您只是希望迭代一个扁平的数据结构版本，并且不需要可索引序列，请考虑itertools。连锁店和公司。

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

它可以在任何可迭代的东西上工作，其中应该包括Django的可迭代QuerySets，它似乎是你在问题中使用的。

编辑:无论如何，这可能和reduce一样好，因为reduce将有相同的开销将项复制到正在扩展的列表中。如果你在最后运行list(Chain)， Chain只会引起这种(相同的)开销。

Meta-Edit:实际上，它的开销比问题提出的解决方案要少，因为当您使用临时列表扩展原始列表时，会丢弃您创建的临时列表。

编辑:正如J.F.塞巴斯蒂安说itertools.chain.from_iterable避免解包，你应该使用它来避免*魔术，但timeit应用程序显示可以忽略的性能差异。

2009-01-02 06:49:50

@。洛特:你启发我写了一个timeit应用程序。

我认为它也会根据分区的数量(容器列表中的迭代器的数量)而变化——你的评论没有提到这30个项目中有多少个分区。这个图在每次运行中平摊1000个项目，使用不同数量的分区。这些物品均匀地分布在各个分区中。

代码(Python 2.6):

#!/usr/bin/env python2.6

"""Usage: %prog item_count"""

from __future__ import print_function

import collections
import itertools
import operator
from timeit import Timer
import sys

import matplotlib.pyplot as pyplot

def itertools_flatten(iter_lst):
    return list(itertools.chain(*iter_lst))

def itertools_iterable_flatten(iter_iter):
    return list(itertools.chain.from_iterable(iter_iter))

def reduce_flatten(iter_lst):
    return reduce(operator.add, map(list, iter_lst))

def reduce_lambda_flatten(iter_lst):
    return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))

def comprehension_flatten(iter_lst):
    return list(item for iter_ in iter_lst for item in iter_)

METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda',
           'comprehension']

def _time_test_assert(iter_lst):
    """Make sure all methods produce an equivalent value.
    :raise AssertionError: On any non-equivalent value."""
    callables = (globals()[method + '_flatten'] for method in METHODS)
    results = [callable(iter_lst) for callable in callables]
    if not all(result == results[0] for result in results[1:]):
        raise AssertionError

def time_test(partition_count, item_count_per_partition, test_count=10000):
    """Run flatten methods on a list of :param:`partition_count` iterables.
    Normalize results over :param:`test_count` runs.
    :return: Mapping from method to (normalized) microseconds per pass.
    """
    iter_lst = [[dict()] * item_count_per_partition] * partition_count
    print('Partition count:    ', partition_count)
    print('Items per partition:', item_count_per_partition)
    _time_test_assert(iter_lst)
    test_str = 'flatten(%r)' % iter_lst
    result_by_method = {}
    for method in METHODS:
        setup_str = 'from test import %s_flatten as flatten' % method
        t = Timer(test_str, setup_str)
        per_pass = test_count * t.timeit(number=test_count) / test_count
        print('%20s: %.2f usec/pass' % (method, per_pass))
        result_by_method[method] = per_pass
    return result_by_method

if __name__ == '__main__':
    if len(sys.argv) != 2:
        raise ValueError('Need a number of items to flatten')
    item_count = int(sys.argv[1])
    partition_counts = []
    pass_times_by_method = collections.defaultdict(list)
    for partition_count in xrange(1, item_count):
        if item_count % partition_count != 0:
            continue
        items_per_partition = item_count / partition_count
        result_by_method = time_test(partition_count, items_per_partition)
        partition_counts.append(partition_count)
        for method, result in result_by_method.iteritems():
            pass_times_by_method[method].append(result)
    for method, pass_times in pass_times_by_method.iteritems():
        pyplot.plot(partition_counts, pass_times, label=method)
    pyplot.legend()
    pyplot.title('Flattening Comparison for %d Items' % item_count)
    pyplot.xlabel('Number of Partitions')
    pyplot.ylabel('Microseconds')
    pyplot.show()

编辑:决定让它成为社区维基。

注意:METHODS可能应该使用装饰器进行积累，但我认为这样更容易让人们阅读。

2009-01-02 23:16:43

下面是使用列表推导式的正确解决方案(它们在这个问题中是落后的):

>>> join = lambda it: (y for x in it for y in x)
>>> list(join([[1,2],[3,4,5],[]]))
[1, 2, 3, 4, 5]

对你来说就是这样

[image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]

或者你可以用join，然后说

join(menuitem.image_set.all() for menuitem in list_of_menuitems)

在这两种情况下，问题在于for循环的嵌套。

2009-01-02 07:33:49

根据我的经验，最有效的将列表的列表扁平化的方法是:

flat_list = []
map(flat_list.extend, list_of_list)

有时与其他提出的方法进行比较:

list_of_list = [range(10)]*1000
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 119 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#1000 loops, best of 3: 210 µs per loop
%timeit flat_list=[i for sublist in list_of_list for i in sublist]
#1000 loops, best of 3: 525 µs per loop
%timeit flat_list=reduce(list.__add__,list_of_list)
#100 loops, best of 3: 18.1 ms per loop

现在，当处理更长的子列表时，效率增益会更好:

list_of_list = [range(1000)]*10
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 60.7 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#10000 loops, best of 3: 176 µs per loop

这个方法也适用于任何迭代对象:

class SquaredRange(object):
    def __init__(self, n): 
        self.range = range(n)
    def __iter__(self):
        for i in self.range: 
            yield i**2

list_of_list = [SquaredRange(5)]*3
flat_list = []
map(flat_list.extend, list_of_list)
print flat_list
#[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]

2014-01-20 15:19:35

Pylab提供了一个flatten: 链接到numpy flatten

2014-02-22 17:01:33

在Python中扁平化一个浅列表

推荐文章

最新文章

标签