如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
如何将任意长度的列表拆分为大小相等的块?
请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。
对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。
当前回答
让r是块大小,L是初始列表,您可以这样做。
chunkL = [ [i for i in L[r*k:r*(k+1)] ] for k in range(len(L)/r)]
其他回答
一个简单的解决方案
OP已请求“相等大小的块”。我将“等尺寸”理解为“平衡”尺寸:如果尺寸不可能相等(例如,23/5),我们正在寻找尺寸大致相同的物品组。
这里的输入是:
项目列表:input_list(例如,23个数字的列表)要拆分这些项目的组数:n个组(例如5个)
输入:
input_list = list(range(23))
n_groups = 5
连续元素组:
approx_sizes = len(input_list)/n_groups
groups_cont = [input_list[int(i*approx_sizes):int((i+1)*approx_sizes)]
for i in range(n_groups)]
“每N个”元素组:
groups_leap = [input_list[i::n_groups]
for i in range(n_groups)]
后果
print(len(input_list))
print('Contiguous elements lists:')
print(groups_cont)
print('Leap every "N" items lists:')
print(groups_leap)
将输出:23连续元素列表:[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]跳过每“N”个项目列表:[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]
我非常喜欢tzot和J.F.Sebastian提出的Python文档版本,但它有两个缺点:
它不是很明确我通常不希望在最后一个块中有填充值
我在代码中经常使用这个:
from itertools import islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield tuple(islice(iterable, n)) or iterable.next()
更新:一个懒块版本:
from itertools import chain, islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield chain([next(iterable)], islice(iterable, n-1))
我不喜欢按块大小拆分元素的想法,例如,脚本可以将101到3个块划分为[50,50,1]。为了我的需要,我需要按比例分配,保持秩序不变。首先我写了自己的剧本,效果很好,而且很简单。但我后来看到了这个答案,剧本比我的好,我想是这样的。这是我的脚本:
def proportional_dividing(N, n):
"""
N - length of array (bigger number)
n - number of chunks (smaller number)
output - arr, containing N numbers, diveded roundly to n chunks
"""
arr = []
if N == 0:
return arr
elif n == 0:
arr.append(N)
return arr
r = N // n
for i in range(n-1):
arr.append(r)
arr.append(N-r*(n-1))
last_n = arr[-1]
# last number always will be r <= last_n < 2*r
# when last_n == r it's ok, but when last_n > r ...
if last_n > r:
# ... and if difference too big (bigger than 1), then
if abs(r-last_n) > 1:
#[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
# we need to give unnecessary numbers to first elements back
diff = last_n - r
for k in range(diff):
arr[k] += 1
arr[-1] = r
# and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
return arr
def split_items(items, chunks):
arr = proportional_dividing(len(items), chunks)
splitted = []
for chunk_size in arr:
splitted.append(items[:chunk_size])
items = items[chunk_size:]
print(splitted)
return splitted
items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)
和输出:
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
我很好奇不同方法的性能,这里是:
在Python 3.5.1上测试
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
if not arr:
break
tmp = arr[0:batch_size]
arr = arr[batch_size:-1]
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
sourceiter = iter(iterable)
while True:
batchiter = islice(sourceiter, size)
yield chain([next(batchiter)], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
"""Yield successive n-sized chunks from l."""
for i in range(0, len(l), n):
yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
tmp = x
print(time.time() - start)
结果:
slice
31.18285083770752
index
0.02184295654296875
batches 1
0.03503894805908203
batches 2
0.22681021690368652
chunks
0.019841909408569336
grouper
0.006506919860839844
任何可迭代的通用分块器,使用户可以选择如何在结尾处处理部分分块。
在Python 3上测试。
分块.py
from enum import Enum
class PartialChunkOptions(Enum):
INCLUDE = 0
EXCLUDE = 1
PAD = 2
ERROR = 3
class PartialChunkException(Exception):
pass
def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
"""
A chunker yielding n-element lists from an iterable, with various options
about what to do about a partial chunk at the end.
on_partial=PartialChunkOptions.INCLUDE (the default):
include the partial chunk as a short (<n) element list
on_partial=PartialChunkOptions.EXCLUDE
do not include the partial chunk
on_partial=PartialChunkOptions.PAD
pad to an n-element list
(also pass pad=<pad_value>, default None)
on_partial=PartialChunkOptions.ERROR
raise a RuntimeError if a partial chunk is encountered
"""
on_partial = PartialChunkOptions(on_partial)
iterator = iter(iterable)
while True:
vals = []
for i in range(n):
try:
vals.append(next(iterator))
except StopIteration:
if vals:
if on_partial == PartialChunkOptions.INCLUDE:
yield vals
elif on_partial == PartialChunkOptions.EXCLUDE:
pass
elif on_partial == PartialChunkOptions.PAD:
yield vals + [pad] * (n - len(vals))
elif on_partial == PartialChunkOptions.ERROR:
raise PartialChunkException
return
return
yield vals
测试.py
import chunker
chunk_size = 3
for it in (range(100, 107),
range(100, 109)):
print("\nITERABLE TO CHUNK: {}".format(it))
print("CHUNK SIZE: {}".format(chunk_size))
for option in chunker.PartialChunkOptions.__members__.values():
print("\noption {} used".format(option))
try:
for chunk in chunker.chunker(it, chunk_size, on_partial=option):
print(chunk)
except chunker.PartialChunkException:
print("PartialChunkException was raised")
print("")
test.py的输出
ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised
ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]