如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
当前回答
不调用len(),这对大型列表很有用:
def splitter(l, n):
i = 0
chunk = l[:n]
while chunk:
yield chunk
i += n
chunk = l[i:i+n]
这是可迭代的:
def isplitter(l, n):
l = iter(l)
chunk = list(islice(l, n))
while chunk:
yield chunk
chunk = list(islice(l, n))
上述产品的功能风味:
def isplitter2(l, n):
return takewhile(bool,
(tuple(islice(start, n))
for start in repeat(iter(l))))
OR:
def chunks_gen_sentinel(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return iter(imap(tuple, continuous_slices).next,())
OR:
def chunks_gen_filter(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return takewhile(bool,imap(tuple, continuous_slices))
其他回答
由于我必须这样做,下面是我的解决方案,给出了一个生成器和一个批量大小:
def pop_n_elems_from_generator(g, n):
elems = []
try:
for idx in xrange(0, n):
elems.append(g.next())
return elems
except StopIteration:
return elems
根据这个答案,得票最多的答案在最后留下一个“矮子”。这是我的解决方案,可以在没有矮子的情况下,尽可能地获得大小均匀的块。它基本上试图准确选择应该拆分列表的小数点,但只需将其舍入到最接近的整数:
from __future__ import division # not needed in Python 3
def n_even_chunks(l, n):
"""Yield n as even chunks as possible from l."""
last = 0
for i in range(1, n+1):
cur = int(round(i * (len(l) / n)))
yield l[last:cur]
last = cur
演示:
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
[78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71, 72],
[73, 74, 75, 76, 77, 78, 79, 80, 81],
[82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
与排名前几的答案进行比较:
>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
[77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
用户@tzot的解决方案zip_langest(*[iter(lst)]*n,fillvalue=padvalue)非常优雅,但如果lst的长度不能被n整除,它会填充最后一个子列表,以保持其长度与其他子列表的长度匹配。然而,如果这不可取,那么只需使用zip()生成类似的循环zip,并将lst的剩余元素(不能生成“完整”子列表)附加到输出即可。
输出示例为ABCDEFG,3->ABC DEF G。
单线版本(Python>=3.8):
list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])
A函数:
def chunkify(lst, chunk_size):
nested = list(map(list, zip(*[iter(lst)]*chunk_size)))
rest = lst[len(lst)//chunk_size*chunk_size: ]
if rest:
nested.append(rest)
return nested
生成器(尽管每个批次都是一个元组):
def chunkify(lst, chunk_size):
for tup in zip(*[iter(lst)]*chunk_size):
yield tup
rest = tuple(lst[len(lst)//chunk_size*chunk_size: ])
if rest:
yield rest
它比这里的一些最流行的答案产生相同的输出更快。
my_list, n = list(range(1_000_000)), 12
%timeit list(chunks(my_list, n)) # @Ned_Batchelder
# 36.4 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [my_list[i:i+n] for i in range(0, len(my_list), n)] # @Ned_Batchelder
# 34.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit it = iter(my_list); list(iter(lambda: list(islice(it, n)), [])) # @senderle
# 60.6 ms ± 5.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit list(mit.chunked(my_list, n)) # @pylang
# 59.4 ms ± 4.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit chunkify(my_list, n)
# 25.8 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
同样,从Python 3.12开始,这个功能将作为itertools模块中的批处理方法来实现(目前是一个配方),因此这个答案很可能会被Python 3.12淘汰。
使用列表综合:
l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
不调用len(),这对大型列表很有用:
def splitter(l, n):
i = 0
chunk = l[:n]
while chunk:
yield chunk
i += n
chunk = l[i:i+n]
这是可迭代的:
def isplitter(l, n):
l = iter(l)
chunk = list(islice(l, n))
while chunk:
yield chunk
chunk = list(islice(l, n))
上述产品的功能风味:
def isplitter2(l, n):
return takewhile(bool,
(tuple(islice(start, n))
for start in repeat(iter(l))))
OR:
def chunks_gen_sentinel(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return iter(imap(tuple, continuous_slices).next,())
OR:
def chunks_gen_filter(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return takewhile(bool,imap(tuple, continuous_slices))