如何将任意长度的列表拆分为大小相等的块?


请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。

对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。


当前回答

任何可迭代的通用分块器,使用户可以选择如何在结尾处处理部分分块。

在Python 3上测试。

分块.py

from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
                     include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
                     do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
                     pad to an n-element list 
                     (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
                     raise a RuntimeError if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)        

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

测试.py

import chunker

chunk_size = 3

for it in (range(100, 107),
          range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

test.py的输出


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised


ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

其他回答

直接从(旧的)Python文档(itertools的配方):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

J.F.Sebastian建议的当前版本:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

我猜圭多的时间机器工作了,会工作的。

这些解决方案之所以有效,是因为[iter(iterable)]*n(或早期版本中的等价物)创建了一个迭代器,在列表中重复n次。izip_length然后有效地执行“每个”迭代器的循环;因为这是同一个迭代器,所以每一个这样的调用都会使它前进,从而导致每个这样的zip循环生成一个由n个项组成的元组。

用户@tzot的解决方案zip_langest(*[iter(lst)]*n,fillvalue=padvalue)非常优雅,但如果lst的长度不能被n整除,它会填充最后一个子列表,以保持其长度与其他子列表的长度匹配。然而,如果这不可取,那么只需使用zip()生成类似的循环zip,并将lst的剩余元素(不能生成“完整”子列表)附加到输出即可。

输出示例为ABCDEFG,3->ABC DEF G。

单线版本(Python>=3.8):

list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])

A函数:

def chunkify(lst, chunk_size):
    nested = list(map(list, zip(*[iter(lst)]*chunk_size)))
    rest = lst[len(lst)//chunk_size*chunk_size: ]
    if rest:
        nested.append(rest)
    return nested

生成器(尽管每个批次都是一个元组):

def chunkify(lst, chunk_size):
    for tup in zip(*[iter(lst)]*chunk_size):
        yield tup
    rest = tuple(lst[len(lst)//chunk_size*chunk_size: ])
    if rest:
        yield rest

它比这里的一些最流行的答案产生相同的输出更快。

my_list, n = list(range(1_000_000)), 12

%timeit list(chunks(my_list, n))                                         # @Ned_Batchelder
# 36.4 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [my_list[i:i+n] for i in range(0, len(my_list), n)]              # @Ned_Batchelder
# 34.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit it = iter(my_list); list(iter(lambda: list(islice(it, n)), []))  # @senderle
# 60.6 ms ± 5.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(mit.chunked(my_list, n))                                    # @pylang
# 59.4 ms ± 4.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit chunkify(my_list, n)
# 25.8 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

同样,从Python 3.12开始,这个功能将作为itertools模块中的批处理方法来实现(目前是一个配方),因此这个答案很可能会被Python 3.12淘汰。

def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out
def main():
  print(chunkify([1,2,3,4,5,6],2))

def chunkify(list, n):
  chunks = []
  for i in range(0, len(list), n):
    chunks.append(list[i:i+n])
  return chunks

main()

我认为这很简单,可以为您提供数组的一部分。

假设列表是第一个

import math

# length of the list len(lst) is ln
# size of a chunk is size

for num in range ( math.ceil(ln/size) ):
    start, end = num*size, min((num+1)*size, ln)
    print(lst[start:end])