我有这样的代码:

good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]

目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。

我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?


当前回答

这个问题已经有很多答案了,但似乎都不如我最喜欢的解决这个问题的方法,这种方法只遍历和测试每个项目一次,并使用列表理解的速度来构建两个输出列表之一,因此它只需要使用相对较慢的附加来构建一个输出列表:

bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

In my answer to a similar question, I explain how this approach works (a combination of Python's greedy evaluation of or refraining from executing the append for "good" items, and append returning a false-like value which leaves the if condition false for "bad" items), and I show timeit results indicating that this approach outcompetes alternatives like those suggested here, especially in cases where the majority of items will go into the list built by list-comprehension (in this case, the good list).

其他回答

第一步(pre-OP-edit):使用集合:

mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]

myset = set(mylist)
goodset = set(goodvals)

print list(myset.intersection(goodset))  # [1, 3, 7]
print list(myset.difference(goodset))    # [2, 4, 5, 6]

这对可读性(IMHO)和性能都有好处。

第二步(post-OP-edit):

创建一个好的扩展列表:

IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])

这将提高性能。否则,你现在的情况在我看来还不错。

itertools。Groupby几乎可以满足您的要求,除了它要求对条目进行排序以确保您获得一个连续的范围之外,因此您需要首先根据键进行排序(否则您将为每种类型获得多个交错的组)。如。

def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]

for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)

给:

False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]

与其他解决方案类似,可以将键func定义为任意数量的组。

我将采用2步方法,将谓词的求值与列表的过滤分离:

def partition(pred, iterable):
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]

就性能而言(除了在iterable的每个成员上只对pred求值一次之外),这样做的好处在于它将大量逻辑从解释器中移出,转移到高度优化的迭代和映射代码中。这可以加快长迭代对象的迭代速度,就像回答中描述的那样。

在表达性方面,它利用了像理解和映射这样的表达性习语。

你可以在Python中进行惰性函数编程,像这样:

partition = lambda l, c: map(
  lambda iii: (i for ii in iii for i in ii),
  zip(*(([], [e]) if c(e) else ([e], []) for e in l)))

函数式编程很优雅,但在Python中不是这样。如果你知道你的列表中没有None值,也可以参考这个例子:

partition = lambda l, c: map(
  filter(lambda x: x is not None, l),
  zip(*((None, e) if c(e) else (e, None) for e in l)))

受到@gnibbler伟大(但简洁!)回答的启发,我们可以应用该方法映射到多个分区:

from collections import defaultdict

def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""

    results = defaultdict(list)

    for x in l:
        results[mapper(x)] += [x]

    return results

然后可以使用splitter,如下所示:

>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]

这适用于有更复杂映射的两个以上分区(也适用于迭代器):

>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
 (1, [2]),
 (2, [3]),
 (3, [4, 5, 6]),
 (4, [7, 8, 9]),
 (5, [10, 11, 12, 13, 14, 15]),
 (6, [16, 17, 18, 19, 20, 21, 22])]

或者用字典来映射:

>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]