如何将任意长度的列表拆分为大小相等的块?


请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。

对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。


当前回答

[AA[i:i+SS] for i in range(len(AA))[::SS]]

其中AA是数组,SS是块大小。例如:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

要扩展py3中的范围,请执行以下操作

(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]

其他回答

如果您知道列表大小:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

如果没有(迭代器):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

在后一种情况下,如果您可以确保序列始终包含给定大小的整数个块(即没有不完整的最后一个块),则可以用更漂亮的方式重新表述。

我很好奇不同方法的性能,这里是:

在Python 3.5.1上测试

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

结果:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

这是一个生成大小均匀的块的生成器:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

对于Python 2,使用xrange代替range:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

下面是一行理解列表。不过,上面的方法更可取,因为使用命名函数使代码更容易理解。对于Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

对于Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

这个问题让我想起Raku(以前的Perl6).comb(n)方法。它将字符串分成n个大小的块。(还有更多,但我会省略细节。)

在Python3中实现一个类似的函数作为lambda表达式非常简单:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

然后你可以这样称呼它:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

当然,您不必将生成器分配给变量;你可以直接这样循环:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

另外,此comb()函数还对字符串进行操作:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']

这适用于v2/v3,可内联,基于生成器,仅使用标准库:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))