How do I split (partition) a list based on a condition?

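My take on it: I propose a lazy, single-pass partition function that preserves the relative order of elements in the output subsequences.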

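1. Requirements

I assume the requirements are:

- maintain the elements' relative order (hence, no sets and dictionaries)
- evaluate the condition only once for every element (hence, no (i)filter or groupby)
- allow lazy consumption of either output sequence (if we can afford to precompute both, the naïve implementation is likely to be acceptable too)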
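To make the second requirement concrete, here is a small sketch showing that a two-pass filter approach evaluates the condition twice per element, while a single loop evaluates it once. The counting wrapper `counted` and the sample data are my own illustration, not part of any library.

```python
from itertools import filterfalse

calls = []  # one entry per evaluation of the condition

def counted(x):
    """Hypothetical condition that records every evaluation."""
    calls.append(x)
    return x % 2 == 0

data = [1, 2, 3, 4, 5]

# Two-pass approach: one filter per output list.
evens = list(filter(counted, data))
odds = list(filterfalse(counted, data))
two_pass_calls = len(calls)

# Single-pass approach: each element is tested exactly once.
calls.clear()
evens_once, odds_once = [], []
for x in data:
    (evens_once if counted(x) else odds_once).append(x)
single_pass_calls = len(calls)

print(two_pass_calls, single_pass_calls)  # 10 5
```

This matters mostly when the condition is expensive or has side effects; for a cheap membership test the difference is small.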

2. The split library

My partition function (introduced below) and other similar functions have made it into a small library:

python-split

It can be installed the usual way from PyPI:

pip install --user split

To split a list based on a condition, use the partition function:

>>> from split import partition
>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33, '.jpg')]
>>> list(other)
[('file2.avi', 999, '.avi')]

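3. Partition function explained

Internally we need to build both subsequences at once, because consuming only one output sequence forces the other one to be computed too. We also need to keep state between user requests, that is, to store elements that have been processed but not yet requested. To keep that state, I use two double-ended queues (deques):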

from collections import deque
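
As a quick illustration of why a deque fits here (a sketch with made-up values, independent of the library): elements are appended on the right and taken from the left, so buffered items come back out in their original order.

```python
from collections import deque

pending = deque()          # buffer for elements seen but not yet requested
for item in ("a", "b", "c"):
    pending.append(item)   # enqueue on the right, O(1)

first = pending.popleft()  # dequeue from the left, O(1)
rest = list(pending)

print(first, rest)         # a ['b', 'c']
```

A plain list would also preserve order, but `list.pop(0)` is O(n), which adds up when one output sequence lags far behind the other.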

The SplitSeq class takes care of the housekeeping:

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)

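The magic happens in its .getNext() method. It is almost like the .next() of an iterator, but lets us specify which kind of element we want this time. Behind the scenes it does not discard the rejected elements; instead it puts them in one of the two queues: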

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while True:  # exit on StopIteration
                n = next(self.seq)  # built-in next() works on Python 2.6+ and 3
                if cond(n):
                    return n
                else:
                    those.append(n)

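The end user is supposed to use the partition function. It takes a condition function and a sequence (just like map or filter) and returns two generators. The first one builds the subsequence of elements for which the condition holds; the second one builds the complementary subsequence. Iterators and generators allow lazy splitting of even long or infinite sequences.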

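def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition is None
    ss = SplitSeq(cond, sequence)
    def goods():
        try:
            while True:
                yield ss.getNext(getGood=True)
        except StopIteration:
            return  # source exhausted; needed on Python 3.7+ (PEP 479)
    def bads():
        try:
            while True:
                yield ss.getNext(getGood=False)
        except StopIteration:
            return  # source exhausted; needed on Python 3.7+ (PEP 479)
    return goods(), bads()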
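
To show the laziness on an endless input, here is a condensed, self-contained variant. The name `lazy_partition` and the closure-based layout are mine, not the library's; it is only a sketch of the same buffering idea, assuming the condition is safe to call once per element.

```python
from collections import deque
from itertools import count, islice

def lazy_partition(condition, iterable):
    """Condensed stand-in for partition(): lazy, single-pass, order-preserving."""
    source = iter(iterable)
    queues = {True: deque(), False: deque()}

    def stream(wanted):
        own = queues[wanted]
        while True:
            if own:
                yield own.popleft()          # serve buffered elements first
                continue
            try:
                item = next(source)
            except StopIteration:
                return                       # source exhausted
            verdict = bool(condition(item))
            if verdict == wanted:
                yield item
            else:
                queues[verdict].append(item)  # buffer for the other stream

    return stream(True), stream(False)

# count(1) never ends, so only a lazy split can work here.
evens, odds = lazy_partition(lambda n: n % 2 == 0, count(1))
first_evens = list(islice(evens, 3))   # pulls 1..6 from the source
first_odds = list(islice(odds, 4))     # 1, 3, 5 come from the buffer, 7 is fetched fresh

print(first_evens, first_odds)         # [2, 4, 6] [1, 3, 5, 7]
```

One caveat of this design: if only one stream is ever consumed, the buffer for the other stream grows without bound.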

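I chose to make the test function the first argument to facilitate partial application in the future (similar to how map and filter take the function as their first argument).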
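
A minimal sketch of that partial application using functools.partial. Here `partition` is a simple eager stand-in with the same (condition, sequence) signature, and `split_images` is a hypothetical name chosen for illustration.

```python
from functools import partial

def partition(condition, sequence):
    """Eager stand-in with the same signature as the lazy partition()."""
    goods, bads = [], []
    for item in sequence:
        (goods if condition(item) else bads).append(item)
    return goods, bads

image_types = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')

# Bind the condition once, then reuse the resulting splitter on any sequence.
split_images = partial(partition, lambda f: f[-1] in image_types)

images, other = split_images([('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')])
print(images)  # [('file1.jpg', 33, '.jpg')]
print(other)   # [('file2.avi', 999, '.avi')]
```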

2011-10-24 19:42:33

Inspired by @gnibbler's great (but terse!) answer, we can apply the same approach to map elements into multiple partitions:

from collections import defaultdict

def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""

    results = defaultdict(list)

    for x in l:
        results[mapper(x)].append(x)

    return results

splitter can then be used as follows:

>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
>>> sorted(split.items())
[(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]

This also works for more than two partitions with a more complicated mapping (and on iterators, too):

>>> import math
>>> l = range(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> sorted(split.items())
[(0, [1]),
 (1, [2]),
 (2, [3]),
 (3, [4, 5, 6]),
 (4, [7, 8, 9]),
 (5, [10, 11, 12, 13, 14, 15]),
 (6, [16, 17, 18, 19, 20, 21, 22])]

Or using a dictionary to do the mapping:

>>> mapping = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}  # avoids shadowing the built-in map
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, mapping.get)
>>> sorted(split.items())
[(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]

2013-03-14 11:03:24

Iterate manually, using the condition to select the list to which each element will be appended:

good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)
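
The indexing trick works because bool is a subclass of int, so False selects index 0 and True selects index 1. A small sketch with made-up data (the contents of `mylist` and `goodvals` are assumptions, since the answer does not define them):

```python
mylist = ['a.jpg', 'b.avi', 'c.png']   # sample data, assumed
goodvals = {'a.jpg', 'c.png'}          # sample "good" values, assumed

good, bad = [], []
for x in mylist:
    # (bad, good)[False] is bad, (bad, good)[True] is good
    (bad, good)[x in goodvals].append(x)

print(good)  # ['a.jpg', 'c.png']
print(bad)   # ['b.avi']
```

Note that the condition has to yield a real bool (or 0/1); a merely truthy value such as a non-empty string would not work as an index.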

2012-08-27 00:51:47

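I basically like Anders' approach because it is very general. Here is a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).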

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d
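
A possible usage, sketched with made-up file names and grouping by extension (the sample data and the `by_ext` name are assumptions, not from the answer). The function is repeated so the snippet runs on its own.

```python
from collections import defaultdict
import os.path

def categorize(func, seq):
    """Return mapping from categories to lists of categorized items."""
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d

names = ['a.jpg', 'b.avi', 'c.jpg', 'd.txt']   # sample data, assumed
by_ext = categorize(lambda n: os.path.splitext(n)[1], names)

print(by_ext['.jpg'])  # ['a.jpg', 'c.jpg']
print(by_ext['.avi'])  # ['b.avi']
```

Because the result is a defaultdict, looking up a category that never occurred returns an empty list (and inserts that key) rather than raising KeyError.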

2010-07-19 14:20:02

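The problem with all the proposed solutions is that they scan the list and apply the filtering function twice. I would write a simple small function like this: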

def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

That way you do not process anything twice, and you do not repeat code either.

2009-06-04 08:10:50