我有一个Python脚本,它把一个整数列表作为输入,我需要一次处理四个整数。不幸的是,我无法控制输入,否则我将它作为一个四元素元组列表传入。目前,我以这种方式迭代它:

for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

不过,它看起来很像“C-think”,这让我怀疑有一种更python的方式来处理这种情况。该列表在迭代后被丢弃,因此不需要保留。也许这样会更好?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

不过,感觉还是不太对。: - /

相关问题:在Python中如何将列表分割成大小均匀的块?


当前回答

关于J.F. Sebastian给出的解决方案:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

它很聪明,但有一个缺点——总是返回元组。如何获得字符串代替? 当然,你可以写“.join(chunker(…))”,但无论如何都要构造临时元组。

你可以通过编写自己的zip来摆脱临时元组,就像这样:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

使用示例:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

其他回答

这个问题的理想解决方案是使用迭代器(而不仅仅是序列)。它还应该是快速的。

这是itertools文档提供的解决方案:

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

在我的mac book air上使用ipython的%timeit,我每次循环得到47.5 us。

然而,这真的不适合我,因为结果被填充为偶数大小的组。没有填充的解决方案稍微复杂一些。最天真的解决方案可能是:

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            yield out
            break
        
        yield out

简单,但相当慢:每循环693个

我能想到的最好的解决方案是使用islice进行内循环:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

对于同样的数据集,我每循环得到305 us。

由于无法更快地得到一个纯粹的解决方案,我提供了以下解决方案,但有一个重要的警告:如果您的输入数据中有filldata的实例,则可能会得到错误的答案。

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    # itertools.zip_longest on Python 3
    for x in itertools.izip_longest(*args, fillvalue=fillvalue):
        if x[-1] is fillvalue:
            yield tuple(v for v in x if v is not fillvalue)
        else:
            yield x

我真的不喜欢这个答案,但它明显更快。每回路124 us

我希望通过将迭代器从列表中删除,我不是简单地复制列表的一部分。生成器可以被切片,它们将自动仍然是一个生成器,而列表将被切片成1000个条目的大块,这是较低的效率。

def iter_group(iterable, batch_size:int):
    length = len(iterable)
    start = batch_size*-1
    end = 0
    while(end < length):
        start += batch_size
        end += batch_size
        if type(iterable) == list:
            yield (iterable[i] for i in range(start,min(length-1,end)))
        else:
            yield iterable[start:end]

用法:

items = list(range(1,1251))

for item_group in iter_group(items, 1000):
    for item in item_group:
        print(item)

首先,我将它设计为将字符串拆分为子字符串以解析包含十六进制的字符串。 今天我把它变成复杂的,但仍然简单的生成器。

def chunker(iterable, size, reductor, condition):
    it = iter(iterable)
    def chunk_generator():
        return (next(it) for _ in range(size))
    chunk = reductor(chunk_generator())
    while condition(chunk):
        yield chunk
        chunk = reductor(chunk_generator())

参数:

明显的

Iterable是任何包含/生成/迭代输入数据的Iterable /迭代器/生成器, 当然,大小是你想要得到的块的大小,

更有趣的

reductor is a callable, which receives generator iterating over content of chunk. I'd expect it to return sequence or string, but I don't demand that. You can pass as this argument for example list, tuple, set, frozenset, or anything fancier. I'd pass this function, returning string (provided that iterable contains / generates / iterates over strings): def concatenate(iterable): return ''.join(iterable) Note that reductor can cause closing generator by raising exception. condition is a callable which receives anything what reductor returned. It decides to approve & yield it (by returning anything evaluating to True), or to decline it & finish generator's work (by returning anything other or raising exception). When number of elements in iterable is not divisible by size, when it gets exhausted, reductor will receive generator generating less elements than size. Let's call these elements lasts elements. I invited two functions to pass as this argument: lambda x:x - the lasts elements will be yielded. lambda x: len(x)==<size> - the lasts elements will be rejected. replace <size> using number equal to size

在Python 3.8中,您可以使用walrus操作符和itertools.islice。

from itertools import islice

list_ = [i for i in range(10, 100)]

def chunker(it, size):
    iterator = iter(it)
    while chunk := list(islice(iterator, size)):
        print(chunk)
In [2]: chunker(list_, 10)                                                         
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

似乎没有一个漂亮的方法来做到这一点。下面是一个有很多方法的页面,包括:

def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq