我如何划分(分裂，划分)一个基于条件的列表?

我有这样的代码:

good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]

目标是根据mylist的内容是否满足条件，将它们拆分为另外两个列表。

我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?

当前回答

如果你不介意使用一个外部库，有两个我知道本机实现这个操作:

>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')

iteration_utilities.partition: >>> from iteration_utilities import partition >>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES) >>> notimages [('file2.avi', 999, '.avi')] >>> images [('file1.jpg', 33, '.jpg')] more_itertools.partition >>> from more_itertools import partition >>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files) >>> list(notimages) # returns a generator so you need to explicitly convert to list. [('file2.avi', 999, '.avi')] >>> list(images) [('file1.jpg', 33, '.jpg')]

2016-12-19 16:24:39

其他回答

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

检查这个

2015-02-24 09:26:53

我的看法。我提出一个惰性单次配分函数，它保持输出子序列的相对顺序。

1. 需求

我认为这些要求是:

维护元素的相对顺序(因此，没有集合和字典) 对于每个元素只计算condition一次(因此不使用 (i)筛选或分组) 允许任意一个序列的惰性消耗(如果我们能够负担得起的话) 预先计算它们，那么naïve实现很可能是可接受)

2. 把图书馆

我的配分函数(下面介绍)和其他类似的函数把它变成了一个小图书馆:

python-split

它通常可以通过PyPI安装:

pip install --user split

要根据条件拆分列表，使用partition函数:

>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]

3.配分函数说明

在内部，我们需要同时构建两个子序列，因此需要消耗只有一个输出序列强制计算另一个输出序列了。我们需要在用户请求之间保持状态(存储已处理) 但还没有请求的元素)。为了保持状态，我使用了两个双端队列(双端队列):

from collections import deque

SplitSeq类负责内部管理:

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)

魔术发生在它的. getnext()方法中。就像。next() 的迭代器，但允许指定我们想要的元素类型这一次。在幕后，它并没有丢弃被拒绝的元素，而是把它们放在两个队列中的一个:

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)

最终用户应该使用配分函数。它需要条件函数和序列(就像映射或过滤器)，以及返回两个生成器。的子序列元素，则第二个元素将构建互补的子序列。迭代器和生成器允许延迟偶长序列或无限序列的分裂。

def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()

为了方便起见，我选择test函数作为第一个参数将来的部分应用(类似于如何映射和过滤将test函数作为第一个参数)。

2011-10-24 19:42:33

如果你不想用两行代码来完成一个语义只需要一次的操作，你可以把上面的一些方法(甚至是你自己的方法)包装在一个函数中:

def part_with_predicate(l, pred):
    return [i for i in l if pred(i)], [i for i in l if not pred(i)]

这不是一种惰性计算方法，它确实对列表进行了两次迭代，但是它允许您在一行代码中对列表进行分区。

2012-05-04 20:53:55

这里已经有很多解了，但另一种方法是

anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]

只在列表上迭代一次，看起来更python化，因此对我来说是可读的。

>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>

2014-03-08 03:40:43

之前的答案似乎并不能满足我所有的四种强迫症:

尽可能的懒惰，只对原始Iterable求值一次每个项只计算谓词一次提供良好的类型注释(适用于python 3.7)

我的解决方案并不漂亮，我不认为我可以推荐使用它，但它是:

def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
    deque_predicate_true = deque()
    deque_predicate_false = deque()
    
    # define a generator function to consume the input iterable
    # the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded 
    def shared_generator(definitely_an_iterator):
        for item in definitely_an_iterator:
            print("Evaluate predicate.")
            if predicate(item):
                deque_predicate_true.appendleft(item)
                yield True
            else:
                deque_predicate_false.appendleft(item)
                yield False
    
    # consume input iterable only once,
    # converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
    shared_gen = shared_generator(
        iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
    )
    
    # define a generator function for each predicate outcome and queue
    def iter_for(predicate_value, hold_queue):
        def consume_shared_generator_until_hold_queue_contains_something():
            if not hold_queue:
                try:
                    while next(shared_gen) != predicate_value:
                        pass
                except:
                    pass
        
        consume_shared_generator_until_hold_queue_contains_something()
        while hold_queue:
            print("Yield where predicate is "+str(predicate_value))
            yield hold_queue.pop()
            consume_shared_generator_until_hold_queue_contains_something()
    
    # return a tuple of two generators  
    return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)

用下面的测试，我们从print语句中得到如下输出:

t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]

2020-11-24 03:30:35

我如何划分(分裂，划分)一个基于条件的列表?

推荐文章

最新文章

标签