我有一个Python脚本,它把一个整数列表作为输入,我需要一次处理四个整数。不幸的是,我无法控制输入,否则我将它作为一个四元素元组列表传入。目前,我以这种方式迭代它:

for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

不过,它看起来很像“C-think”,这让我怀疑有一种更python的方式来处理这种情况。该列表在迭代后被丢弃,因此不需要保留。也许这样会更好?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

不过,感觉还是不太对。: - /

相关问题:在Python中如何将列表分割成大小均匀的块?


当前回答

另一种方法是使用双参数形式的iter:

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

这可以很容易地适应使用填充(这类似于Markus Jarderot的答案):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

这些甚至可以组合为可选的填充:

_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad == _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

其他回答

我需要一个解决方案,也将工作与集和生成器。我写不出很短很漂亮的东西,但至少可读性很好。

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

列表:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Set:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

发电机:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

首先,我将它设计为将字符串拆分为子字符串以解析包含十六进制的字符串。 今天我把它变成复杂的,但仍然简单的生成器。

def chunker(iterable, size, reductor, condition):
    it = iter(iterable)
    def chunk_generator():
        return (next(it) for _ in range(size))
    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())

参数:

明显的

Iterable是任何包含/生成/迭代输入数据的Iterable /迭代器/生成器, 当然,大小是你想要得到的块的大小,

更有趣的

reductor is a callable, which receives generator iterating over content of chunk. I'd expect it to return sequence or string, but I don't demand that. You can pass as this argument for example list, tuple, set, frozenset, or anything fancier. I'd pass this function, returning string (provided that iterable contains / generates / iterates over strings): def concatenate(iterable): return ''.join(iterable) Note that reductor can cause closing generator by raising exception. condition is a callable which receives anything what reductor returned. It decides to approve & yield it (by returning anything evaluating to True), or to decline it & finish generator's work (by returning anything other or raising exception). When number of elements in iterable is not divisible by size, when it gets exhausted, reductor will receive generator generating less elements than size. Let's call these elements lasts elements. I invited two functions to pass as this argument: lambda x:x - the lasts elements will be yielded. lambda x: len(x)==<size> - the lasts elements will be rejected. replace <size> using number equal to size

关于J.F. Sebastian给出的解决方案:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

它很聪明,但有一个缺点——总是返回元组。如何获得字符串代替? 当然,你可以写“.join(chunker(…))”,但无论如何都要构造临时元组。

你可以通过编写自己的zip来摆脱临时元组,就像这样:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

使用示例:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'
import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

另一种方法:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4

我从来不想填充我的块,所以这个要求是必要的。我发现在任何可迭代对象上工作的能力也是必需的。鉴于此,我决定扩展公认的答案,https://stackoverflow.com/a/434411/1074659。

如果由于需要比较和筛选填充值而不需要填充,则这种方法的性能会受到轻微的影响。然而,对于大块大小,这个实用程序是非常高性能的。

#!/usr/bin/env python3
from itertools import zip_longest


_UNDEFINED = object()


def chunker(iterable, chunksize, fillvalue=_UNDEFINED):
    """
    Collect data into chunks and optionally pad it.

    Performance worsens as `chunksize` approaches 1.

    Inspired by:
        https://docs.python.org/3/library/itertools.html#itertools-recipes

    """
    args = [iter(iterable)] * chunksize
    chunks = zip_longest(*args, fillvalue=fillvalue)
    yield from (
        filter(lambda val: val is not _UNDEFINED, chunk)
        if chunk[-1] is _UNDEFINED
        else chunk
        for chunk in chunks
    ) if fillvalue is _UNDEFINED else chunks