如何将任意长度的列表拆分为大小相等的块?


请参阅如果数据结果将直接用于循环,并且不需要存储,则如何以块形式遍历列表。

对于字符串输入的同一问题,请参见每n个字符拆分字符串?。相同的技术通常适用,但也有一些变化。


当前回答

我知道这有点过时,但还没有人提到numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)

结果:

[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

其他回答

def main():
  print(chunkify([1,2,3,4,5,6],2))

def chunkify(list, n):
  chunks = []
  for i in range(0, len(list), n):
    chunks.append(list[i:i+n])
  return chunks

main()

我认为这很简单,可以为您提供数组的一部分。

一种老式的方法,不需要itertools,但仍然可以使用任意生成器:

def chunks(g, n):
  """divide a generator 'g' into small chunks
  Yields:
    a chunk that has 'n' or less items
  """
  n = max(1, n)
  buff = []
  for item in g:
    buff.append(item)
    if len(buff) == n:
      yield buff
      buff = []
  if buff:
    yield buff

senderle答案的一个线性版本:

from itertools import islice
from functools import partial

seq = [1,2,3,4,5,6,7]
size = 3
result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ()))
assert result == [(1, 2, 3), (4, 5, 6), (7,)]

用户@tzot的解决方案zip_langest(*[iter(lst)]*n,fillvalue=padvalue)非常优雅,但如果lst的长度不能被n整除,它会填充最后一个子列表,以保持其长度与其他子列表的长度匹配。然而,如果这不可取,那么只需使用zip()生成类似的循环zip,并将lst的剩余元素(不能生成“完整”子列表)附加到输出即可。

输出示例为ABCDEFG,3->ABC DEF G。

单线版本(Python>=3.8):

list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])

A函数:

def chunkify(lst, chunk_size):
    nested = list(map(list, zip(*[iter(lst)]*chunk_size)))
    rest = lst[len(lst)//chunk_size*chunk_size: ]
    if rest:
        nested.append(rest)
    return nested

生成器(尽管每个批次都是一个元组):

def chunkify(lst, chunk_size):
    for tup in zip(*[iter(lst)]*chunk_size):
        yield tup
    rest = tuple(lst[len(lst)//chunk_size*chunk_size: ])
    if rest:
        yield rest

它比这里的一些最流行的答案产生相同的输出更快。

my_list, n = list(range(1_000_000)), 12

%timeit list(chunks(my_list, n))                                         # @Ned_Batchelder
# 36.4 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [my_list[i:i+n] for i in range(0, len(my_list), n)]              # @Ned_Batchelder
# 34.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit it = iter(my_list); list(iter(lambda: list(islice(it, n)), []))  # @senderle
# 60.6 ms ± 5.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(mit.chunked(my_list, n))                                    # @pylang
# 59.4 ms ± 4.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit chunkify(my_list, n)
# 25.8 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

同样,从Python 3.12开始,这个功能将作为itertools模块中的批处理方法来实现(目前是一个配方),因此这个答案很可能会被Python 3.12淘汰。

任何可迭代的通用分块器,使用户可以选择如何在结尾处处理部分分块。

在Python 3上测试。

分块.py

from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
                     include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
                     do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
                     pad to an n-element list 
                     (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
                     raise a RuntimeError if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)        

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

测试.py

import chunker

chunk_size = 3

for it in (range(100, 107),
          range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

test.py的输出


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised


ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]