我需要一个滚动窗口(又名滑动窗口)可迭代的序列/迭代器/生成器。(默认的Python迭代可以被认为是一种特殊情况,其中窗口长度为1。)我目前正在使用以下代码。我怎样才能做得更优雅和/或更有效?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:   
   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

对于window_size == 2的特定情况(即,在序列中迭代相邻的重叠对),请参见如何从列表中迭代重叠(当前,下一个)值对?


当前回答

def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]

其他回答

让我们让它变懒!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

在Python 3.10中,我们有itertools.pairwise(iterable)函数来滑动包含两个元素的窗口:

医生是这样说的:

返回从输入可迭代对象中获取的连续重叠对。 输出迭代器中2元组的数量将比输入的数量少1。如果输入可迭代对象的值少于两个,则返回空值。 大致相当于: def成对(iterable): # pairwise('ABCDEFG')——> AB BC CD DE EF FG A, b = tee(可迭代) 下一(b,没有) 返回zip(a, b)

这似乎是为collections.deque定制的,因为您实际上有一个FIFO(添加到一端,从另一端删除)。然而,即使你使用列表,你也不应该切片两次;相反,您应该只从列表中弹出(0)并追加()新项。

下面是一个基于deque的优化实现:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

在我的测试中,它在大多数时候都轻松击败了这里发布的其他所有东西,尽管pillmuncher的tee版本在大可迭代对象和小窗口方面击败了它。在较大的窗口上,deque再次以原始速度领先。

Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque's strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher's). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.

deque窗口的一个轻微修改版本,使其成为一个真正的滚动窗口。因此,它开始只填充一个元素,然后增长到它的最大窗口大小,然后缩小,因为它的左边缘接近结束:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

这给了

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]

在旧版本的Python文档中有一个itertools示例:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

文档中的那个更简洁一点,我想它使用了itertools来达到更好的效果。


如果你的迭代器是一个简单的列表/元组,用指定的窗口大小滑动它的简单方法是:

seq = [0, 1, 2, 3, 4, 5]
window_size = 3

for i in range(len(seq) - window_size + 1):
    print(seq[i: i + window_size])

输出:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]