如何将任意长度的列表拆分为大小相等的块?


请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。

对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。


当前回答

senderle答案的一个线性版本:

from itertools import islice
from functools import partial

seq = [1,2,3,4,5,6,7]
size = 3
result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ()))
assert result == [(1, 2, 3), (4, 5, 6), (7,)]

其他回答

考虑使用matplotlib.cbook片段

例如:

import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s

由于我必须这样做,下面是我的解决方案,给出了一个生成器和一个批量大小:

def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(g.next())
        return elems
    except StopIteration:
        return elems

下面是一个处理任意可迭代项的生成器:

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

例子:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
>>> def f(x, n, acc=[]): return f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>> 

如果你在括号里-我拿起了一本关于Erlang的书:)

我很惊讶没有人想到使用iter的双参数形式:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

这适用于任何可迭代的对象,并延迟生成输出。它返回元组而不是迭代器,但我认为它还是有一定的优雅。它也不会垫;如果您需要填充,上面的一个简单变体就足够了:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

演示:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

与基于izip_longest的解决方案一样,上面的解决方案也始终适用。据我所知,对于可选pad的函数,没有单行或双线itertools配方。通过结合以上两种方法,这一方法非常接近:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

我相信这是提议的提供可选填充的最短的分块器。

正如Tomasz Gandor所观察到的,如果两个填充块遇到一长串填充值,它们会意外停止。以下是以合理方式解决该问题的最后一个变体:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

演示:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]