如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
当前回答
使用Python 3.8中的赋值表达式,它变得非常好:
import itertools
def batch(iterable, size):
it = iter(iterable)
while item := list(itertools.islice(it, size)):
yield item
这适用于任意可迭代的对象,而不仅仅是列表。
>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
更新
从Python 3.12开始,这个精确的实现可以作为itertools.batch获得
其他回答
根据这个答案,得票最多的答案在最后留下一个“矮子”。这是我的解决方案,可以在没有矮子的情况下,尽可能地获得大小均匀的块。它基本上试图准确选择应该拆分列表的小数点,但只需将其舍入到最接近的整数:
from __future__ import division # not needed in Python 3
def n_even_chunks(l, n):
"""Yield n as even chunks as possible from l."""
last = 0
for i in range(1, n+1):
cur = int(round(i * (len(l) / n)))
yield l[last:cur]
last = cur
演示:
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
[78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71, 72],
[73, 74, 75, 76, 77, 78, 79, 80, 81],
[82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
与排名前几的答案进行比较:
>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
[77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
我很惊讶没有人想到使用iter的双参数形式:
from itertools import islice
def chunk(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())
演示:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
这适用于任何可迭代的对象,并延迟生成输出。它返回元组而不是迭代器,但我认为它还是有一定的优雅。它也不会垫;如果您需要填充,上面的一个简单变体就足够了:
from itertools import islice, chain, repeat
def chunk_pad(it, size, padval=None):
it = chain(iter(it), repeat(padval))
return iter(lambda: tuple(islice(it, size)), (padval,) * size)
演示:
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
与基于izip_longest的解决方案一样,上面的解决方案也始终适用。据我所知,对于可选pad的函数,没有单行或双线itertools配方。通过结合以上两种方法,这一方法非常接近:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
if padval == _no_padding:
it = iter(it)
sentinel = ()
else:
it = chain(iter(it), repeat(padval))
sentinel = (padval,) * size
return iter(lambda: tuple(islice(it, size)), sentinel)
演示:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
我相信这是提议的提供可选填充的最短的分块器。
正如Tomasz Gandor所观察到的,如果两个填充块遇到一长串填充值,它们会意外停止。以下是以合理方式解决该问题的最后一个变体:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
it = iter(it)
chunker = iter(lambda: tuple(islice(it, size)), ())
if padval == _no_padding:
yield from chunker
else:
for ch in chunker:
yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
演示:
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
使用列表综合:
l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
我专门为此写了一个小图书馆,这里有。库的分块函数特别有效,因为它是作为生成器实现的,因此在某些情况下可以节省大量内存。它也不依赖切片表示法,因此可以使用任意迭代器。
import iterlib
print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
这适用于v2/v3,可内联,基于生成器,仅使用标准库:
import itertools
def split_groups(iter_in, group_size):
return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))