我有这样的代码:

good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]

目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。

我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?


当前回答

简单的生成器版本,在内存中保存尽可能少的值,并且只调用pred一次:

from collections import deque
from typing import Callable, TypeVar, Iterable
_T = TypeVar('_T')

def iter_split(pred: Callable[[_T], bool],
               iterable: Iterable[_T]) -> tuple[Iterable[_T], Iterable[_T]]:
    """Split an iterable into two iterables based on a predicate.
    
    The predicate will only be called once per element.
    
    Returns:
        A tuple of two iterables, the first containing all elements for which
        the predicate returned True, the second containing all elements for
        which the predicate returned False.
    """
    iterator = iter(iterable)
    true_values: deque[_T] = deque()
    false_values: deque[_T] = deque()
    
    def true_generator():
        while True:
            while true_values:
                yield true_values.popleft()
            
            for item in iterator:
                if pred(item):
                    yield item
                    break
                false_values.append(item)
            else:
                break
            
    def false_generator():
        while True:
            while false_values:
                yield false_values.popleft()
            
            for item in iterator:
                if not pred(item):
                    yield item
                    break
                true_values.append(item)
            else:
                break

    return true_generator(), false_generator()

其他回答

所有提出的解决方案的问题是,它将扫描和应用过滤功能两次。我会做一个简单的小函数,像这样:

def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

这样你就不会重复处理任何东西,也不会重复代码。

为了提高性能,请尝试itertools。

itertools模块标准化了一组快速、内存高效的核心工具,这些工具单独使用或组合使用都很有用。它们一起构成了一个“迭代器代数”,使得用纯Python简洁有效地构造专门的工具成为可能。

出现看到itertools。过滤器或imap。

itertools。iterable ifilter(谓词) 创建一个迭代器,从iterable中过滤元素,只返回谓词为True的元素

如果你不想用两行代码来完成一个语义只需要一次的操作,你可以把上面的一些方法(甚至是你自己的方法)包装在一个函数中:

def part_with_predicate(l, pred):
    return [i for i in l if pred(i)], [i for i in l if not pred(i)]

这不是一种惰性计算方法,它确实对列表进行了两次迭代,但是它允许您在一行代码中对列表进行分区。

一个基于生成器的版本,如果你能忍受一个或两个原始列表的反转。

设置…

random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]

至于分裂……

(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])

Another list of tuples of locations and each element is built in reverse order. In a generator wrapped round that each element is tested against the predicate (here test for even) and if True then the element is poped using previously computed locations. We are working backwards along the list so poping elements out does not change positions closer to the beginning of the list. A wrapping list() evaluates the generator and a final revers [::-1] puts the elements back in the right order. The original list "a" now only contains elements that for which the predicate is False.

下面是惰性迭代器方法:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

它对每个项计算一次条件,并返回两个生成器,第一个生成条件为真时序列中的值,另一个生成条件为假时序列中的值。

因为它是惰性的,你可以在任何迭代器上使用它,甚至是无限迭代器:

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

通常情况下,非惰性列表返回方法会更好:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

编辑:对于您更具体的用例,将项目按某些键分割到不同的列表中,这里有一个通用函数:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

用法:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']