如何将任意长度的列表拆分为大小相等的块?


请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。

对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。


当前回答

我不喜欢按块大小拆分元素的想法,例如,脚本可以将101到3个块划分为[50,50,1]。为了我的需要,我需要按比例分配,保持秩序不变。首先我写了自己的剧本,效果很好,而且很简单。但我后来看到了这个答案,剧本比我的好,我想是这样的。这是我的脚本:

def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr, containing N numbers, diveded roundly to n chunks
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if difference too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give unnecessary numbers to first elements back
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)

和输出:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]

其他回答

def chunk(input, size):
    return map(None, *([iter(input)] * size))

任何可迭代的通用分块器,使用户可以选择如何在结尾处处理部分分块。

在Python 3上测试。

分块.py

from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
                     include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
                     do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
                     pad to an n-element list 
                     (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
                     raise a RuntimeError if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)        

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

测试.py

import chunker

chunk_size = 3

for it in (range(100, 107),
          range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

test.py的输出


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised


ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

itertools模块中的配方提供了两种方法来实现这一点,具体取决于您希望如何处理最终的奇数大小的批次(保留它、用填充值填充它、忽略它或引发异常):

from itertools import islice, izip_longest

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

我意识到这个问题已经过时了(在谷歌上被它绊倒了),但肯定像下面这样的问题比任何复杂的建议都要简单和清晰得多,而且只使用切片:

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...

这个问题让我想起Raku(以前的Perl6).comb(n)方法。它将字符串分成n个大小的块。(还有更多,但我会省略细节。)

在Python3中实现一个类似的函数作为lambda表达式非常简单:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

然后你可以这样称呼它:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

当然,您不必将生成器分配给变量;你可以直接这样循环:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

另外,此comb()函数还对字符串进行操作:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']